Last week, in my post on updating my Windows Phone 7 application to Windows 8, I shared some code from Michael L. Perry that protects access to a shared resource with a critical section in a way that works comfortably with the new async/await keywords. Protecting shared resources like files is a little more subtle now that asynchrony is so easy. We'll see this more and more as Windows 8 and Windows Phone 8 promote the idea that apps shouldn't block for anything.
After that post, my friend and mentor (he doesn't know he's my mentor but I just decided that he is just now) Stephen Toub, expert on all things asynchronous, sent me an email with some excellent thoughts and feedback on this technique. I include some of that email here with permission as it will help us all learn!
"I hadn’t seen the Awaitable Critical Section helper you mention below before, but I just took a look at it, and while it’s functional, it’s not ideal. For a client-side solution like this, it’s probably fine. If this were a server-side solution, though, I’d be concerned about the overhead associated with this particular implementation."
I love Stephen Toub's feedback in all things. Always firm but kind. Stephen Cleary makes a similar observation in the comments and also points out that immediately disabling the button works too. ;) It's also worth noting that Cleary's excellent AsyncEx library has lots of async-ready primitives and supports both Windows Phone 8 and 7.5.
"The SemaphoreSlim class was updated on .NET 4.5 (and Windows Phone 8) to support async waits. You would have to build your own IDisposable Release, though. (In the situation you describe, I usually just disable the button at the beginning of the async handler and re-enable it at the end; but async synchronization would work too)."
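Here's a minimal sketch of the async wait Cleary is describing, assuming .NET 4.5 or later. It uses a plain try/finally rather than an IDisposable releaser, and the names SemaphoreSketch, SaveAsync, and MaxInside are hypothetical, made up for illustration. The Task.Delay call stands in for real work such as writing a shared file.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SemaphoreSketch
{
    // One permit: only one caller at a time may hold the semaphore.
    private static readonly SemaphoreSlim s_gate = new SemaphoreSlim(1, 1);
    private static int s_inside;   // callers currently inside the protected region
    public static int MaxInside;   // highest value s_inside ever reached

    // Hypothetical stand-in for the real work, e.g. saving a shared file.
    public static async Task SaveAsync()
    {
        await s_gate.WaitAsync();  // waits asynchronously; no thread is blocked
        try
        {
            int now = Interlocked.Increment(ref s_inside);
            MaxInside = Math.Max(MaxInside, now);  // safe: we hold the permit
            await Task.Delay(10);                  // simulated async I/O
            Interlocked.Decrement(ref s_inside);
        }
        finally
        {
            s_gate.Release();      // always hand the permit back
        }
    }

    public static async Task RunDemoAsync()
    {
        // Simulate five rapid button taps that all start a save.
        var taps = new Task[5];
        for (int i = 0; i < taps.Length; i++)
            taps[i] = SaveAsync();
        await Task.WhenAll(taps);
        Console.WriteLine("Max callers inside at once: " + MaxInside);
    }

    public static void Main()
    {
        RunDemoAsync().Wait();
    }
}
```

Because the semaphore has a single permit, the demo should report that at most one caller was ever inside the protected region. Wrapping the Release call in an IDisposable is what lets you swap the try/finally for a "using" block.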
Ultimately what we're trying to do is create "Async Coordination Primitives" and Toub talked about this in February.
"Here, we’ll look at building support for an async mutual exclusion mechanism that supports scoping via ‘using.’"
"I previously blogged about a similar solution (http://blogs.msdn.com/b/pfxteam/archive/2012/02/12/10266988.aspx), which would result in a helper class like this:"
Here Toub uses the new lightweight SemaphoreSlim class and indulges our love of the "using" pattern to create something very lightweight.
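using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class AsyncLock
{
    private readonly SemaphoreSlim m_semaphore = new SemaphoreSlim(1, 1);
    private readonly Task<IDisposable> m_releaser;

    public AsyncLock()
    {
        // Cache one already-completed task so the uncontended path allocates nothing.
        m_releaser = Task.FromResult((IDisposable)new Releaser(this));
    }

    public Task<IDisposable> LockAsync()
    {
        var wait = m_semaphore.WaitAsync();
        // Fast path: the semaphore was free, so hand back the cached task.
        // Slow path: return the releaser once the pending wait completes.
        return wait.IsCompleted ?
            m_releaser :
            wait.ContinueWith((_, state) => (IDisposable)state,
                m_releaser.Result, CancellationToken.None,
                TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
    }

    // Disposing the releaser (at the end of a 'using' block) frees the lock.
    private sealed class Releaser : IDisposable
    {
        private readonly AsyncLock m_toRelease;
        internal Releaser(AsyncLock toRelease) { m_toRelease = toRelease; }
        public void Dispose() { m_toRelease.m_semaphore.Release(); }
    }
}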
How lightweight and how is this different from the previous solution? Here's Stephen Toub, emphasis mine.
"There are a few reasons I’m not enamored with the referenced AwaitableCriticalSection solution.
"First, it has unnecessary allocations; again, not a big deal for a client library, but potentially more impactful for a server-side solution. An example of this is that often with locks, when you access them they’re uncontended, and in such cases you really want acquiring and releasing the lock to be as low-overhead as possible; in other words, accessing uncontended locks should involve a fast path. With AsyncLock above, you can see that on the fast path where the task we get back from WaitAsync is already completed, we’re just returning a cached already-completed task, so there’s no allocation (for the uncontended path where there’s still count left in the semaphore, WaitAsync will use a similar trick and will not incur any allocations)."
Lots here to parse. One of the interesting meta-points is that a simple client-side app with a user interacting (like my app) has VERY different behaviors than a high-throughput server-side application. Translation? I can get away with a lot more on the client side...but should I when I don't have to?
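To make that fast path a little more concrete, here's a small sketch of the two SemaphoreSlim behaviors it relies on: WaitAsync returns an already-completed task when a permit is available, and a pending task when it isn't. The FastPathSketch class is hypothetical and exists only for this illustration. AsyncLock's check of wait.IsCompleted is what tells those two cases apart.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FastPathSketch
{
    // Returns { firstCompleted, secondPendingBeforeRelease }.
    public static async Task<bool[]> RunAsync()
    {
        var semaphore = new SemaphoreSlim(1, 1);

        // Uncontended: a permit is free, so the returned task is already complete.
        Task first = semaphore.WaitAsync();
        bool firstCompleted = first.IsCompleted;

        // Contended: the only permit is taken, so this task stays pending.
        Task second = semaphore.WaitAsync();
        bool secondPendingBeforeRelease = !second.IsCompleted;

        // Releasing hands the permit to the pending waiter.
        semaphore.Release();
        await second;

        return new[] { firstCompleted, secondPendingBeforeRelease };
    }

    public static void Main()
    {
        bool[] result = RunAsync().Result;
        Console.WriteLine("Uncontended wait completed immediately: " + result[0]);
        Console.WriteLine("Contended wait pending until Release:   " + result[1]);
    }
}
```

Both lines should print True. In the first case there is nothing to wait for, which is why AsyncLock can return its cached releaser task without creating anything new.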
His solution requires fewer allocations and zero garbage collections.
"Overall, it’s also just much more unnecessary overhead. A basic microbenchmark shows that in the uncontended case, AsyncLock above is about 30x faster with 0 GCs (versus a bunch of GCs in the AwaitableCriticalSection example). And in the contended case, it looks to be about 10-15x faster."
Here's the microbenchmark comparing the two...remembering of course there's, "lies, damned lies, and microbenchmarks," but this one is pretty useful. ;)
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Requires the AsyncLock class above and Michael L. Perry's AwaitableCriticalSection.
class Program
{
    static void Main()
    {
        const int ITERS = 100000;
        while (true) // repeat forever so the numbers settle after warm-up
        {
            Run("Uncontended AL ", () => TestAsyncLockAsync(ITERS, false));
            Run("Uncontended ACS", () => TestAwaitableCriticalSectionAsync(ITERS, false));
            Run("Contended AL ", () => TestAsyncLockAsync(ITERS, true));
            Run("Contended ACS", () => TestAwaitableCriticalSectionAsync(ITERS, true));
            Console.WriteLine();
        }
    }

    // Times one test and prints the elapsed milliseconds.
    static void Run(string name, Func<Task> test)
    {
        var sw = Stopwatch.StartNew();
        test().Wait();
        sw.Stop();
        Console.WriteLine("{0}: {1}", name, sw.ElapsedMilliseconds);
    }

    static async Task TestAsyncLockAsync(int iters, bool contended)
    {
        var mutex = new AsyncLock();
        if (contended)
        {
            // Queue up every waiter while the lock is held, then drain them in order.
            var waits = new Task<IDisposable>[iters];
            using (await mutex.LockAsync())
                for (int i = 0; i < iters; i++)
                    waits[i] = mutex.LockAsync();
            for (int i = 0; i < iters; i++)
                using (await waits[i]) { }
        }
        else
        {
            for (int i = 0; i < iters; i++)
                using (await mutex.LockAsync()) { }
        }
    }

    static async Task TestAwaitableCriticalSectionAsync(int iters, bool contended)
    {
        var mutex = new AwaitableCriticalSection();
        if (contended)
        {
            // Same shape as the AsyncLock test so the timings are comparable.
            var waits = new Task<IDisposable>[iters];
            using (await mutex.EnterAsync())
                for (int i = 0; i < iters; i++)
                    waits[i] = mutex.EnterAsync();
            for (int i = 0; i < iters; i++)
                using (await waits[i]) { }
        }
        else
        {
            for (int i = 0; i < iters; i++)
                using (await mutex.EnterAsync()) { }
        }
    }
}
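The benchmark above only prints elapsed milliseconds, so the "0 GCs" observation doesn't show up in its output. Here's a rough sketch of one way to count collections around a test, using GC.CollectionCount(0). The Measure helper and the bare SemaphoreSlim workload are assumptions for illustration only; the post doesn't say how the quoted GC numbers were gathered, and no particular count is promised here.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class GcCountSketch
{
    // Hypothetical helper: times a test and returns the number of
    // generation-0 collections that happened while it ran.
    public static int Measure(string name, Func<Task> test)
    {
        // Start from a clean heap so earlier garbage doesn't skew the count.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int gen0Before = GC.CollectionCount(0);
        var sw = Stopwatch.StartNew();
        test().Wait();
        sw.Stop();
        int gen0 = GC.CollectionCount(0) - gen0Before;

        Console.WriteLine("{0}: {1} ms, {2} gen-0 GCs", name, sw.ElapsedMilliseconds, gen0);
        return gen0;
    }

    // Stand-in workload: an uncontended acquire/release loop.
    public static async Task UncontendedAsync(int iters)
    {
        var semaphore = new SemaphoreSlim(1, 1);
        for (int i = 0; i < iters; i++)
        {
            await semaphore.WaitAsync();
            semaphore.Release();
        }
    }

    public static void Main()
    {
        Measure("Uncontended SemaphoreSlim", () => UncontendedAsync(100000));
    }
}
```

Swapping the workload for the TestAsyncLockAsync and TestAwaitableCriticalSectionAsync methods from the benchmark would let you compare allocation pressure as well as time.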
Stephen Toub is using SemaphoreSlim, the "lightest weight" option available, rather than RegisterWaitForSingleObject:
"Second, and more importantly, the AwaitableCriticalSection is using a fairly heavy synchronization mechanism to provide the mutual exclusion. The solution is using Task.Factory.FromAsync(IAsyncResult, …), which is just a wrapper around ThreadPool.RegisterWaitForSingleObject (see http://blogs.msdn.com/b/pfxteam/archive/2012/02/06/10264610.aspx). Each call to this is asking the ThreadPool to have a thread block waiting on the supplied ManualResetEvent, and then to complete the returned Task when the event is set. Thankfully, the ThreadPool doesn’t burn one thread per event, and rather groups multiple events together per thread, but still, you end up wasting some number of threads (IIRC, it’s 63 events per thread), so in a server-side environment, this could result in degraded behavior."
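To see the difference in shape, here's a rough sketch that puts the two kinds of wait side by side. WaitOneAsync is a hypothetical helper that calls ThreadPool.RegisterWaitForSingleObject directly (skipping the Task.Factory.FromAsync wrapper mentioned above), so a pool thread ends up watching the event. The SemaphoreSlim version needs no wait handle at all. This is not the AwaitableCriticalSection code, just an illustration of the mechanism underneath it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitHandleSketch
{
    // Heavier route: ask the ThreadPool to watch a kernel wait handle and
    // complete a Task when it is signaled.
    public static Task WaitOneAsync(WaitHandle handle)
    {
        var tcs = new TaskCompletionSource<bool>();
        RegisteredWaitHandle registration = ThreadPool.RegisterWaitForSingleObject(
            handle,
            (state, timedOut) => tcs.TrySetResult(true), // runs when the handle is set
            null,                                        // no state object
            Timeout.Infinite,                            // never time out
            true);                                       // fire only once
        // Drop the registration once the task has completed.
        tcs.Task.ContinueWith(_ => registration.Unregister(null));
        return tcs.Task;
    }

    // Returns true if the handle-based wait was still pending before Set().
    public static async Task<bool> RunAsync()
    {
        var evt = new ManualResetEvent(false);
        Task heavy = WaitOneAsync(evt);
        bool pendingBeforeSet = !heavy.IsCompleted;
        evt.Set();
        await heavy;

        // Lighter route: SemaphoreSlim tracks its async waiters itself.
        var semaphore = new SemaphoreSlim(0, 1);
        Task light = semaphore.WaitAsync();
        semaphore.Release();
        await light;

        return pendingBeforeSet;
    }

    public static void Main()
    {
        Console.WriteLine("Handle wait pending before Set: " + RunAsync().Result);
    }
}
```

Both waits end up as a Task you can await, which is why the two approaches look interchangeable from the calling code. The cost difference is in what has to happen behind that Task.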
All in all, an education for me - and I hope you, Dear Reader - as well as a few important lessons.
- Know what's happening underneath if you can.
- Code Reviews are always a good thing.
- Ask someone smarter.
- Performance may not matter in one context but it can in another.
- You can likely get away with this or that, until you totally can't. (Client vs. Server)
Thanks Stephen Toub and Stephen Cleary!
Related Reading
- Building Async Coordination Primitives, Part 1: AsyncManualResetEvent
- Building Async Coordination Primitives, Part 2: AsyncAutoResetEvent
- Building Async Coordination Primitives, Part 3: AsyncCountdownEvent
- Building Async Coordination Primitives, Part 4: AsyncBarrier
- Building Async Coordination Primitives, Part 5: AsyncSemaphore
- Building Async Coordination Primitives, Part 6: AsyncLock
© 2012 Scott Hanselman. All rights reserved.