Hi everyone, is there a way to add “middleware” to log4net that would apply regex logic to mask sensitive data (names, DOB, etc.)?
You could probably write your own appender that wrapped an existing one, if that’s what you wanted to do. That seems like a hard problem to solve, though, unless your data is extremely regular.
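Here is a rough sketch of the wrapper idea. Log4net appenders are C#, so this uses Python's standard `logging` module to illustrate the pattern: a handler (the `logging` equivalent of an appender) masks the message and then forwards a copy to an ordinary inner handler. `MaskingHandler` and the ISO-date pattern are hypothetical, for illustration only.

```python
import io
import logging
import re


class MaskingHandler(logging.Handler):
    """Hypothetical wrapper: masks each message, then delegates to an inner handler."""

    def __init__(self, inner: logging.Handler, pattern: str):
        super().__init__()
        self.inner = inner
        self.regex = re.compile(pattern)

    def emit(self, record: logging.LogRecord) -> None:
        # getMessage() merges %-style args, so values passed as args are masked too.
        masked = self.regex.sub(lambda m: "*" * len(m.group()), record.getMessage())
        # Mask a copy so any other handlers still receive the original record.
        clone = logging.makeLogRecord(record.__dict__)
        clone.msg, clone.args = masked, None
        self.inner.handle(clone)


# Usage: wrap an ordinary StreamHandler and mask ISO-style dates (assumed format).
stream = io.StringIO()
logger = logging.getLogger("wrapper_demo")
logger.propagate = False
logger.setLevel(logging.INFO)
logger.addHandler(MaskingHandler(logging.StreamHandler(stream), r"\d{4}-\d{2}-\d{2}"))
logger.info("DOB %s", "1990-01-31")
output = stream.getvalue()
```

Wrapping instead of subclassing means the masking step can sit in front of any existing output (console, file, database) without touching it. As noted above, this only works well if the sensitive values follow a regular pattern.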
It might be better to have log4net log everything to a database.
Yes, the appender with a filter finds the messages, but applying the logic to replace the matched strings with ‘*’ is proving difficult.
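One way to handle the replacement step is to pass `re.sub` a callback instead of a fixed string, and to capture only the sensitive part in a named group so that the label stays readable. The sketch below is in Python; the `dob=` and `ssn:` formats and the group name `val` are assumptions, not anything log4net provides.

```python
import re

# Hypothetical patterns: each captures the sensitive value in a group named "val",
# so labels such as "dob=" stay readable and only the value is starred out.
PATTERNS = [
    re.compile(r"(?i)\bdob[=:]\s*(?P<val>\d{4}-\d{2}-\d{2})"),
    re.compile(r"(?i)\bssn[=:]\s*(?P<val>\d{3}-\d{2}-\d{4})"),
]


def star_value(m: re.Match) -> str:
    """Replace only the 'val' group of a match with asterisks of the same length."""
    offset = m.start()
    start, end = m.span("val")
    text = m.group(0)
    return text[: start - offset] + "*" * (end - start) + text[end - offset :]


def mask(message: str) -> str:
    # Apply every pattern in turn; each sub() call rewrites all of its matches.
    for pattern in PATTERNS:
        message = pattern.sub(star_value, message)
    return message


masked = mask("user=bob dob=1990-01-31 ssn: 123-45-6789")
```

Keeping the star count equal to the value's length makes masked logs easy to line up against originals, but it also leaks the length. Returning a fixed `"***"` from the callback avoids that.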
You are better off not logging that information in the first place, or modifying the logging code to remove it. Names will be impossible to detect in a performant way. DOB is just a date. Plus, PII can be all sorts of things.
My thoughts exactly. We don’t know the context, but I think it’s rarely the case that you should put such information in logs, even at trace level.
I’d rather modify every place where you write to the log and include only the information that is relevant to you. I’d guess the name, for example, isn’t needed, since you want to hide it anyway.
Exactly, but researching and trying to implement such a POC is interesting.
You may want to look for a different logger, as log4net is no longer being maintained and the latest version has an unpatched vulnerability.
Serilog is a popular alternative with tons of community-contributed middleware, including a masking library: https://github.com/evjenio/masking.serilog
Have you got any details for that? It looks like the last release was a month ago, so it certainly looks like it’s being maintained.
I saw this done a long time ago with Castle DynamicProxy and an interceptor.
I think we just filtered out any fields matching *Name, *Password, *DOB, *Address, etc. You could easily build a dictionary of PII field names to exclude.
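Castle DynamicProxy is a .NET library, so here is only a Python analog of the same idea: a decorator plays the interceptor, binds the call's arguments to their parameter names, and logs them with any name matching a wildcard pattern replaced by `***`. The `SENSITIVE` list, `logged`, `redact`, and `register` are hypothetical names chosen for illustration.

```python
import fnmatch
import functools
import inspect
import logging

# Hypothetical exclusion list mirroring the *Name / *Password / *DOB / *Address idea.
SENSITIVE = ["*name", "*password", "*dob", "*address"]

log = logging.getLogger("interceptor_demo")


def redact(arguments: dict) -> dict:
    """Replace values whose (case-insensitive) key matches a sensitive pattern."""
    return {
        key: "***" if any(fnmatch.fnmatchcase(key.lower(), p) for p in SENSITIVE) else value
        for key, value in arguments.items()
    }


def logged(func):
    """Log each call's arguments, redacted, before running the wrapped function."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)  # maps positional args to parameter names
        log.info("%s(%s)", func.__name__, redact(dict(bound.arguments)))
        return func(*args, **kwargs)

    return wrapper


@logged
def register(user_id, first_name, dob, plan):
    return user_id


shown = redact({"user_id": 7, "FirstName": "Ann", "dob": "1990-01-31", "plan": "pro"})
result = register(7, "Ann", "1990-01-31", "pro")
```

Matching on field names rather than values sidesteps the "names are impossible to detect" problem, at the cost of relying on consistent naming.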
I’ve been told dictionaries are out; just simple strings. The idea is to create an appender for log4net and an enricher for Serilog that take in the message, search for string patterns, and replace matches with ‘*’.
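A minimal sketch of that pipeline, using Python's `logging.Filter` as a stand-in for a log4net appender or Serilog enricher. The patterns are passed as plain strings (no dictionary), compiled once, and applied to every message. `RegexMaskFilter` and the SSN/email patterns are assumptions for illustration; real PII formats will need their own patterns.

```python
import io
import logging
import re


class RegexMaskFilter(logging.Filter):
    """Hypothetical filter: rewrites each record's message using plain pattern strings."""

    def __init__(self, patterns: list[str], mask_char: str = "*"):
        super().__init__()
        # Compile once; patterns arrive as simple strings (e.g. from a config file).
        self.regexes = [re.compile(p) for p in patterns]
        self.mask_char = mask_char

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge %-style args before masking
        for regex in self.regexes:
            message = regex.sub(lambda m: self.mask_char * len(m.group()), message)
        record.msg, record.args = message, None
        return True  # never drop the record, only rewrite it


# Usage: attach to the logger so every handler sees the masked message.
stream = io.StringIO()
logger = logging.getLogger("mask_filter_demo")
logger.propagate = False
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(stream))
logger.addFilter(RegexMaskFilter([r"\b\d{3}-\d{2}-\d{4}\b", r"[\w.+-]+@[\w-]+\.[\w.]+"]))
logger.info("contact %s, ssn %s", "ann@example.com", "123-45-6789")
output = stream.getvalue()
```

Unlike a wrapping handler, this filter mutates the record in place, so every downstream handler gets the masked text. That is usually what you want for PII.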
I’m not familiar with log4net myself, so I can’t suggest anything in that regard. Your problem did pique my interest, though, as recognizing PII and redacting it seems like it would present some challenges, especially if your data isn’t very regular and predictably structured, as the other user mentioned.
Anyway, I’m not sure whether this applies to your situation at all, but you could use some kind of text-analysis AI to classify possible PII and then redact it. It turns out the Azure Text Analytics API does exactly that and can recognize names, phone numbers, etc. I’m not sure how effective or accurate it would be in practice, as I haven’t used it, but it definitely seems like an interesting problem to solve.
Like I said, this might not be useful in your case, but it’s definitely worth checking out, since using regex logic seems like an uphill battle. You could also log raw output as usual and have a background process or service scan the saved logs and redact any PII.
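A sketch of the background-scan option, assuming the service only touches log files that have already been rotated or closed (rewriting a file the logger is still appending to would race with it). The patterns and the `redact_file` helper are hypothetical. The rewrite goes to a temporary file in the same directory and is then swapped in with `os.replace`, so a crash mid-scan never leaves a half-written log.

```python
import os
import re
import tempfile

# Hypothetical patterns; a real service would load these from configuration.
PATTERNS = [re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]


def redact_line(line: str) -> str:
    for p in PATTERNS:
        line = p.sub(lambda m: "*" * len(m.group()), line)
    return line


def redact_file(path: str) -> int:
    """Rewrite a closed log file in place; returns the number of lines changed."""
    changed = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so os.replace is a same-filesystem rename.
    with open(path, encoding="utf-8") as src, tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", dir=directory, delete=False
    ) as dst:
        for line in src:
            new = redact_line(line)
            changed += new != line
            dst.write(new)
    os.replace(dst.name, path)
    return changed


# Usage: redact a small throwaway log file.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "app.log")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("startup ok\nDOB 1990-01-31\n")
changed = redact_file(demo_path)
with open(demo_path, encoding="utf-8") as f:
    redacted = f.read()
```

One trade-off is that the raw PII sits on disk until the scan runs, which may or may not be acceptable for your compliance requirements.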
One of my goals is a robust appender, since the domains could vary from project to project.
Why are you logging that? Seems like the real question.
Maybe you can use some sort of [DoNotLog] attribute. I’d search around; it seems unlikely you are the first person to encounter this problem.
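C# attributes have no direct Python equivalent, but dataclass field metadata gives a similar opt-in marker. The sketch below is an analog, not a log4net feature: fields tagged with the hypothetical `DO_NOT_LOG` metadata are masked by a hypothetical `loggable` helper before the object is logged. `Customer` is an invented example type.

```python
from dataclasses import dataclass, field, fields

# Hypothetical marker, a rough Python stand-in for a C# [DoNotLog] attribute.
DO_NOT_LOG = {"do_not_log": True}


@dataclass
class Customer:
    customer_id: int
    full_name: str = field(metadata=DO_NOT_LOG)
    dob: str = field(metadata=DO_NOT_LOG)
    plan: str = "basic"


def loggable(obj) -> dict:
    """Return the object's fields, with marked fields replaced by a mask."""
    return {
        f.name: "***" if f.metadata.get("do_not_log") else getattr(obj, f.name)
        for f in fields(obj)
    }


snapshot = loggable(Customer(42, "Ann Lee", "1990-01-31", "pro"))
```

Marking fields where the type is defined puts the decision next to the data, which avoids guessing with regexes at log time.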