Redacting sensitive information with Application Insights in C#

Introduction

Sensitive information should be protected in transit and at rest. Sometimes, sensitive information shouldn't even reach the destination system and should be redacted before it's sent to a cloud service or storage container of any sort.

There are two distinct cases. Sometimes you encrypt information so that you can decrypt it later. Other times, you never want the information to reach the destination storage in the first place. The latter is what we're looking at in this post.

What is sensitive information and PII?

Sensitive information refers to any data or information that is considered confidential or private and requires special protection to prevent unauthorized access, disclosure, or use. It can include a wide range of data types such as financial information, medical records, personally identifiable information (PII), and trade secrets.

PII in logs

Logs are a record of events that occur in a system, application, or network. Audit logs, in particular, are logs that record every action taken within a system or application. These logs can contain sensitive information and PII, such as usernames and passwords, IP addresses, and other network information. It is essential to ensure that sensitive information and PII are adequately protected in logs, and that access to these logs is strictly controlled.

Considerations for logging sensitive information

In Application Insights, any trace information is displayed in the logs and is therefore also accessible from any tool, service, or report that uses data from that log table.

For example, here's a standard set of trace messages logged in Application Insights, including sensitive information.

Screenshot showing Personally Identifiable Information (PII) in Application Insights logs.

However, we can take actions to help automate redaction of sensitive information in various ways. Ideally, we obfuscate the output before it is sent over the wire to Application Insights. Here's one way it could look if we achieved our desired result:

Screenshot of a log table in Application Insights with redacted personally identifiable information (PII)

In the picture, you can see that the e-mail addresses are redacted. It's not the ideal replacement message, but it proves the point. You could opt for something that replaces it with a mock e-mail format to indicate it was an e-mail, such as ***@***.***. Unless, of course, you want to avoid disclosing the format of the redacted information, too.

Build it

In the example above, I redacted any e-mail address that was being ingested in the Application Insights telemetry. I will take you through how to build that yourself now. Tag along!

Know your regex: E-mail addresses

Using regex, you can quickly create a representation of the information you want to look for in strings. For me, it's an e-mail regex with this pattern:

\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

Before copy-pasting any regex you find online, make sure you validate it against your own test cases. An easy way to do that is by using tools like https://regexr.com/
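If you prefer to keep those test cases next to your code, here is a minimal sketch of the same idea in C#. The class name `EmailPatternCheck` and the sample inputs are hypothetical and only for illustration; the pattern is the one shown above. Swap in the cases that matter for your own data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailPatternCheck
{
    // Same pattern as in the article.
    public const string EmailPattern = @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*";

    // Returns true when the pattern matches the whole input, not just a part of it.
    public static bool MatchesWholeInput(string input)
    {
        Match match = Regex.Match(input, EmailPattern);
        return match.Success && match.Value == input;
    }

    // Hypothetical test cases: the input, and whether it should be detected as an e-mail address.
    public static readonly (string Input, bool Expected)[] Cases =
    {
        ("user@example.com", true),
        ("first.last+tag@sub.example.co.uk", true),
        ("user@localhost", false),
        ("@example.com", false),
        ("no address here", false),
    };

    // Throws if any case disagrees with the pattern, so it can run from a unit test.
    public static void Verify()
    {
        foreach (var (input, expected) in Cases)
        {
            if (MatchesWholeInput(input) != expected)
            {
                throw new InvalidOperationException($"Unexpected result for '{input}'.");
            }
        }
    }
}
```

Calling `EmailPatternCheck.Verify()` from a test project gives you a quick safety net whenever you change the pattern.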

In C#, we can replace string contents with a Regex.Replace() like this:

Regex.Replace(myString, @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*", "[PII REDACTED]");

In my string, myString, we find any e-mail address and replace it with the text [PII REDACTED].
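If you call this in more than one place, it may be worth wrapping it in a small helper. The sketch below is one way to do that, not the code from the demo solution: `PiiRedactor` and `RedactEmails` are names I made up for illustration. It compiles the regex once and lets you pick the replacement text, for example the mock format ***@***.*** mentioned earlier.

```csharp
using System.Text.RegularExpressions;

public static class PiiRedactor
{
    // Compiled once and reused; the pattern is the one from the article.
    private static readonly Regex EmailRegex = new Regex(
        @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
        RegexOptions.Compiled);

    // Replaces every e-mail address in the input with the replacement text.
    // Null or empty input is returned unchanged.
    public static string RedactEmails(string input, string replacement = "[PII REDACTED]")
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        return EmailRegex.Replace(input, replacement);
    }
}
```

For example, `PiiRedactor.RedactEmails("Contact: name@thecompany123.io;another@mail.com")` returns `Contact: [PII REDACTED];[PII REDACTED]`. Keep in mind that Regex.Replace treats `$` in the replacement text as a substitution marker, so avoid it in your replacement string unless you mean it.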

ITelemetryInitializer

Next, we build a new ITelemetryInitializer to ensure filtering of telemetry messages and to remove sensitive info. We will later inject this into our logger.

using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;
using System.Text.RegularExpressions;

namespace ApplicationInsights.RedactSensitiveInformation
{
    /// <summary>
    /// Redacts standardized sensitive information from the trace messages.
    /// </summary>
    internal class SensitivityRedactionTelemetryInitializer : ITelemetryInitializer
    {
        public void Initialize(ITelemetry t)
        {
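            // Only trace telemetry (log messages) is handled here.
            if (t is TraceTelemetry traceTelemetry)
            {
                // Use Regex to replace any e-mail address with a replacement string.
                // Message can be null, and Regex.Replace throws on null input, so fall back to an empty string.
                traceTelemetry.Message = Regex.Replace(traceTelemetry.Message ?? string.Empty, @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*", "[PII REDACTED]");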
                
                // If we don't remove this CustomDimension, the telemetry message will still contain the PII in the "OriginalFormat" property.
                traceTelemetry.Properties.Remove("OriginalFormat");
            }
        }
    }
}

In the code block, we create a new class that implements ITelemetryInitializer, and for any TraceTelemetry item we receive, we run the Regex.Replace() code to redact any e-mail addresses.

You can, and should, consider the scope of your implementation in the ITelemetryInitializer. Do you need to cover more than traces, and how can you ensure the desired end-to-end redaction of PII across all ingested messages? I'll leave that to you to expand on.
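As one possible starting point for widening the scope, here is a rough sketch that scrubs every custom property value instead of only the trace message. It relies on the fact that telemetry types implementing ISupportProperties expose their custom dimensions as an IDictionary<string, string>. The helper name `PropertyRedactor` is hypothetical, and the sketch deliberately avoids any SDK types so you can test it on its own.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PropertyRedactor
{
    // The e-mail pattern from the article, compiled once.
    private static readonly Regex EmailRegex = new Regex(
        @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
        RegexOptions.Compiled);

    // Redacts e-mail addresses in every value of the dictionary, in place.
    public static void RedactValues(IDictionary<string, string> properties)
    {
        if (properties == null)
        {
            return;
        }

        // Copy the keys first: writing to a dictionary while enumerating it can throw.
        foreach (string key in properties.Keys.ToList())
        {
            string value = properties[key];
            if (!string.IsNullOrEmpty(value))
            {
                properties[key] = EmailRegex.Replace(value, "[PII REDACTED]");
            }
        }
    }
}
```

Inside the initializer you could then call it with something like `if (t is ISupportProperties p) PropertyRedactor.RedactValues(p.Properties);`. This only covers custom properties. Other fields, such as request URLs or exception messages, would need their own handling.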

Important: ⚠ Remove custom dimensions!

With the above code, we are almost there. However, the CustomDimensions property will contain the original text in a property, OriginalFormat:

Screenshot of a log entry that displays the customDimensions property with an OriginalFormat value.

We need to remove that in the ITelemetryInitializer, too. If you look at the last line of code above before the method exits, you can see that I remove the OriginalFormat property, and therefore the logs look cleaner:

Screenshot of a log entry that does not display the OriginalFormat customDimension property.

Inject the ITelemetryInitializer into your logger

Now, register the ITelemetryInitializer with the service collection in your main app, where the logging is wired up:

// NOTE: Injecting the SensitivityRedaction initializer.
services.AddSingleton<ITelemetryInitializer, SensitivityRedactionTelemetryInitializer>();

Here's a full example of the main demo app, wiring up Application Insights logging and then injecting the new SensitivityRedactionTelemetryInitializer we just created:

// Necessary using statements.
using ApplicationInsights.RedactSensitiveInformation;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.ApplicationInsights.WorkerService;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.ApplicationInsights;

// DEMO ONLY: Don't put credentials in code - use Azure Key Vault, or applicable protected configuration services.
// I am using the connection string in code for clarity and avoiding unnecessary logic that distracts from the focus of the demo.
const string connectionString = "";

#region Wire-up

//
// Wire-up.
// 
IServiceCollection services = new ServiceCollection();

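// Add a log filter for the ApplicationInsightsLoggerProvider ("Category" is a placeholder category name).
// The provider itself is registered by AddApplicationInsightsTelemetryWorkerService below.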
services.AddLogging(loggingBuilder => loggingBuilder.AddFilter<ApplicationInsightsLoggerProvider>("Category", LogLevel.Information));

// Add Application Insights logic (ApplicationInsightsTelemetryWorkerService)
services.AddApplicationInsightsTelemetryWorkerService((ApplicationInsightsServiceOptions options) => options.ConnectionString = connectionString);


// NOTE: Injecting the SensitivityRedaction initializer.
services.AddSingleton<ITelemetryInitializer, SensitivityRedactionTelemetryInitializer>();


IServiceProvider serviceProvider = services.BuildServiceProvider();

#endregion


//
// NOTE: Program logic to demonstrate
//

// Get the app insights ILogger from the service provider. 
ILogger<Program> logger = serviceProvider.GetRequiredService<ILogger<Program>>();


// Sending a few log messages. Some include PII, some do not.
logger.LogWarning("This is a log message without PII.");
logger.LogWarning("This is a log message with an e-mail: spam@idjfidjfidfjidj.net");
logger.LogWarning("This is another message with spam@dikfjdifjdifjdij.net, and random@unicorn-demo-haiku.ai");
logger.LogWarning("Users access restrictions changed for: name@thecompany123.io;another@mail.com, new access level is 'Reader' on resource '123'");


// For demo purposes in our console app.
// Flush the buffer explicitly, then wait briefly so the telemetry has time to be sent before the app exits.
var telemetryClient = serviceProvider.GetRequiredService<TelemetryClient>();
telemetryClient.Flush();
Task.Delay(5000).Wait();
           

Et voilà. A simple demonstration of how to use an ITelemetryInitializer to adjust the telemetry before it is ingested and make sure this information isn't persisted in storage.

Get the solution

Find the full solution file and source code on GitHub.