Hi,
I have inherited 0.5 mio C# LOC across 180 projects written by complete lunatics. Everything is public, ReSharper has more warnings than LOC and even the unit test project itself has less than 10% coverage (the rest is below 0.1%).
Using identical inputs will sometimes result in wrong outputs. It’s about 5-10% error rate and depends on CPU speed (faster = more errors).
ReSharper does show issues regarding fields accessed with and without locking. But walking the code quickly showed fields accessed from multiple threads without any locking.
Is there a tool for this in the C# world?
I’m usually on C++, I have right now VS 2019, ReSharper ultimate and about 3 weeks to fix this or at least lower the error rate…
Not really. You could try filtering the resharper warnings only for thread related issues, and also you might try install ing some roslyn analyzer which helps with thread ing issues. But éven with theese you won’t be able to just alt+enter your way to a correct solution. Good luck tho.
That’s what I’m asking here: where to find such a “Roslyn analyser”? What is the toolchain in this case?
Start with the refactor. I hope that would be manageable… 3 weeks ain’t much but it would be quite sufficient to stabilize quite a lot of things.
I don’t believe that multithreading malpractice can be solved by debugging alone and applying hotfixes here and there… It would be a huge timewaste… I might be wrong, the end decision is up to you 😉
Workflow refactor must take place first. You would need someone with domain expertise to be sure that you don’t miss something. Then take on lower level implementation details one by one. Then VS debugger can help you to test if everything works correctly. System.Threading setting current thread.name (only once per thread or u will break it) and freezing threads will be your new best friend.
Best of luck.
I would usually just opt for a full rewrite and not use the code at all.
But this code is in production. The rewrite has already started, is 50% feature complete at 25k LOC. But the rewrite will take another at least another 6 months…. 6 months with this current code staying in production. That’s why I’m tasked with patching and hot fixing the mess. It’s not about pretty, maintainability,… It’s about survival (as in running the code 3x and taking majority vote on the result).
I was hoping for something like helgrind (part of valgrind) for .NET. Helgrind marks all memory locations accessed from different threads without locking. It’s a runtime thing, it’s not complete, and I would definitely prefer something static. But it does identify the hot spots very efficiently.
Woah. Be afraid of the stuff that is not covered by tests! If the few tests that exist show this concurrency problems… just wait until you find the real good bugs…
The bugs are in production, the unit tests are happy.
Yes, it’s that bad.
One thing you can do that is relatively straightforward is to see whether some issues are caused by improperly accessed collections and replace them with their concurrent counterparts.
Then lock all the things… Obviously sacrificing performance in process.
But overall, my condolences. Shitty situation to be in. Debugging race conditions is always the worst.
I would resist the urge to start making sweeping changes; you’re likely to do more harm than good given the lack of tests (not that unit tests are very helpful in testing concurrency), and the code is going to be sunsetted soon anyway so there’s not much return on the effort. Assuming the rewrite is really happening and will really be done in 6 months per your comments.
Instead, triage bugs and tackle them one by one to buy you time until the rewrite is done.
Be aware that introducing locks around fields designed to be accessed by multiple threads simultaneously carries a risk of introducing deadlocks (this might be why there are no locks in the first place). Consider adding log messages (in debug) just prior to waiting on a lock, after obtaining a lock, and prior to releasing it. Also consider using ReaderWriterLockSlim
with EnterReadLock
and EnterWriteLock
to block threads for writing only; this will help reduce the deadlock risk.
Have you considered locking the process to a single cpu? Or if configurable, set the process to only use 1 thread at a time?
I’ve completely removed badly written threaded code and still had acceptable performance- especially given the trade off was to have correct results every time.
It runs “fine” if I limit the production server to 1GHz. Limiting th VM to one CPU thread, but 3GHz results in errors.
It explicitly starts new threads all over the place. Some are short lived, some don’t seem to produce a result ever, some exit with exceptions,…
If it produces fewer errors while it’s slow, you can make it run slow much faster than fix every little bug
How bad is it? There are no tools that I am aware of that will help you.
You have at least one unit test that has this concurrency error.
Try fix this specific one. If you are lucky, there is just a single place where all the bugs come from. But fixing this could take days. Any more than a few and you cannot fix this in 3 weeks.
What kind of errors do you get?
Lists, collections or queues that are shared between threads can be changed to their thread save CuncurrentXXXXX counterparts.
I’d love to have a look at the code. Those are my kind of puzzles I’d like to solve.
If you add lock around ONE usage of such a field, you will get warnings that the other usages are not locked. You could start with that. I’m not sure if the warning comes from resharper or VS.
Newbie question: what is “0.5 mio”?