Goal: A Governor for 10,000 tasks
My goal was to put a little sanity around 10k tasks, each of which pushed load against an HTTP endpoint. Starting all of them at once isn’t realistic or kind or stable. It can cause port exhaustion on the OS, and other unpleasant side-effects.
A little sanity would look like a governor that allowed X number of these to run in parallel at a time.
That sounds like the System.Threading.Tasks.Parallel class in the .NET framework!
Nope, Actually Wrong
One would think that a class in the System.Threading.Tasks namespace would play well with Tasks.
Parallel.For() will betray you if you pass it a bunch of tasks to run. You would think it would start some subset of them based on the specified or calculated degree of parallelism supported by your machine. Instead, all the tasks are started at nearly the same time, and Parallel.For() doesn’t wait for them to finish before deciding that everything is done. The reason is that an async lambda is converted to an Action<int> (effectively async void), so each iteration looks finished as soon as it reaches its first await.
Here’s a little LinqPad script to demonstrate. Notice that:
- The synchronous code works.
- The Async lambda does not do the same thing.
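A minimal sketch of that comparison, written as top-level statements. It assumes Thread.Sleep and Task.Delay as stand-ins for real work and a hypothetical count of 20 iterations; none of these names come from the original script.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

const int count = 20; // hypothetical small workload standing in for the 10k tasks

// Synchronous body: Parallel.For blocks until every iteration has finished.
int syncDone = 0;
Parallel.For(0, count, i =>
{
    Thread.Sleep(100);                       // stand-in for blocking work
    Interlocked.Increment(ref syncDone);
});
Console.WriteLine($"sync:  {syncDone} of {count} finished when Parallel.For returned");

// Async lambda: it becomes an Action<int> (async void), so Parallel.For treats
// each iteration as finished at its first await and returns early.
int asyncDone = 0;
Parallel.For(0, count, async i =>
{
    await Task.Delay(500);                   // stand-in for an HTTP call
    Interlocked.Increment(ref asyncDone);
});
int asyncDoneAtReturn = Volatile.Read(ref asyncDone);
Console.WriteLine($"async: {asyncDoneAtReturn} of {count} finished when Parallel.For returned");

// The orphaned iterations keep running in the background and finish later.
await Task.Delay(1000);
Console.WriteLine($"async: {Volatile.Read(ref asyncDone)} of {count} finished one second later");
```

The first line should report every iteration complete, while the second usually reports few or none, because Parallel.For has no Task to wait on.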
Well, You see Nathan…
Joe Dev: “Well, you see Nathan, you need to understand that asynchronous execution is not the same as multi-threading or parallel execution on multiple cores.”
Joe Dev: “Parallel.For() is about multi-threading. Async doesn’t really even need multiple threads. It’s about interleaving bits of work.”
Me: “Mmm hmm.”
Joe Dev: “So don’t you see how silly your mistake was?”
Me: “How silly of me to pass a Task to the Parallel class in the System.Threading.Tasks namespace. O_o”
Hindsight = Duh?
In retrospect, as with most hindsight… “Of course… that’s totally obvious!”
Except that it wasn’t obvious on the front side of this experience.
From the Microsoft docs page:
The Parallel class provides library-based data parallel replacements for common operations such as for loops, for each loops, and execution of a set of statements.
Notice that little gem of a word “data” in that description? Yeah, me neither. That’s the hint that the implementation is focused on CPU-intensive work. Parallel.For() (and its sibling methods) is great for spreading work across your CPU cores. That’s wonderful for crunching numbers, processing images, etc.
Async IO is not the same sort of beast.
Parallel is not Async friendly.
There are numerous options that do work well with async code. They vary in how they operate, and have some slight trade-offs.
Task.WhenAll() – Variant #1 – Divide and Conquer
This approach divides the work into N buckets, with one Task to govern (exec) each bucket of tasks.
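A minimal sketch of the bucket idea, under stated assumptions: CallEndpointAsync is a hypothetical stand-in for the real HTTP call, the bucket count of 4 and the 100 work items are arbitrary, and the work is held as Func<Task> so that nothing starts until its bucket reaches it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

const int bucketCount = 4;   // hypothetical "X": how many calls may run at once
const int totalCalls = 100;  // hypothetical workload size
int inFlight = 0, peak = 0, completed = 0;
object gate = new object();

// Hypothetical stand-in for one HTTP call; it also records observed concurrency.
async Task CallEndpointAsync(int id)
{
    lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
    await Task.Delay(20);
    lock (gate) { inFlight--; completed++; }
}

// Each bucket is governed by one Task that runs its items one after another.
async Task RunBucketAsync(IEnumerable<Func<Task>> bucket)
{
    foreach (var start in bucket)
        await start();
}

// Keep the work as delegates: a Task that already exists is already running.
List<Func<Task>> work = Enumerable.Range(0, totalCalls)
    .Select(i => (Func<Task>)(() => CallEndpointAsync(i)))
    .ToList();

// Deal the work round-robin into bucketCount buckets.
var buckets = work
    .Select((item, index) => (item, index))
    .GroupBy(x => x.index % bucketCount, x => x.item);

await Task.WhenAll(buckets.Select(b => RunBucketAsync(b)));
Console.WriteLine($"completed {completed} calls, peak concurrency {peak}");
```

One trade-off of this shape: the buckets are fixed up front, so a bucket that draws several slow calls finishes late while the others sit idle.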
Task.WhenAll() – Variant #2 – SemaphoreSlim
Greg Bair pointed out that the Semaphore is built for almost exactly this purpose. Brilliant. And we now have an async-friendly version via SemaphoreSlim.
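A minimal sketch of the SemaphoreSlim approach, under the same kind of assumptions: ThrottledCallAsync is a hypothetical wrapper, Task.Delay stands in for the HTTP call, and the limit of 4 and the 100 calls are arbitrary.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int maxParallel = 4;   // hypothetical "X": how many calls may run at once
const int totalCalls = 100;  // hypothetical workload size
int running = 0, peakRunning = 0, finished = 0;
object gate = new object();
var throttle = new SemaphoreSlim(maxParallel);

// Hypothetical wrapper: every call must take a slot before doing its work.
async Task ThrottledCallAsync(int id)
{
    await throttle.WaitAsync();              // waits without blocking a thread
    try
    {
        lock (gate) { running++; peakRunning = Math.Max(peakRunning, running); }
        await Task.Delay(20);                // stand-in for the HTTP call
        lock (gate) { running--; finished++; }
    }
    finally
    {
        throttle.Release();                  // always hand the slot back
    }
}

// All tasks are created immediately, but only maxParallel get past WaitAsync at a time.
await Task.WhenAll(Enumerable.Range(0, totalCalls).Select(i => ThrottledCallAsync(i)));
Console.WriteLine($"finished {finished} calls, peak concurrency {peakRunning}");
```

Unlike fixed buckets, a freed slot goes to whichever call is waiting next, so one slow call does not hold up a whole group.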