A code generation (AKA "codegen") issue in RyuJIT in the .NET Framework 4.6 has been discovered that affects a calling pattern called Tail Call Optimization. The RyuJIT team has fixed the issue and has started the process of producing a .NET Framework 4.6 patch that will be freely available for anyone to download and install.
A workaround is available for the .NET Framework 4.6, and it is supported for production use: enable a RyuJIT config switch that disables tail call optimization. See the recommendation below for a detailed explanation of how to proceed.
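For readers unfamiliar with the term, a tail call is a call that is the last action a method performs, so the caller has nothing left to do once the callee returns. The JIT can sometimes turn such a call into a jump that reuses the caller's stack frame. The sketch below only illustrates that calling shape; it does not reproduce the bug, the class and method names are hypothetical, and whether the JIT actually applies the optimization depends on the method signatures and its own heuristics.

```csharp
using System;

static class TailCallShape
{
    // Hypothetical helper: the recursive call is the last thing the method does,
    // so it is in tail position and is a candidate for tail call optimization.
    public static long SumTo(int n, long acc)
    {
        if (n == 0)
        {
            return acc;
        }
        return SumTo(n - 1, acc + n); // tail position: nothing runs after this call
    }

    // Not a tail call: the addition still has to run after the recursive call returns.
    public static long SumToNonTail(int n)
    {
        if (n == 0)
        {
            return 0;
        }
        return n + SumToNonTail(n - 1);
    }

    static void Main()
    {
        Console.WriteLine(SumTo(10, 0));     // 55
        Console.WriteLine(SumToNonTail(10)); // 55
    }
}
```

Both methods compute the same result; only the first has the shape that the optimization discussed in this post targets.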
Description of the Issue
This issue can affect apps running on the .NET Framework 4.6 in 64-bit processes. For example, you may have an app that was built for the .NET Framework 4.0. If you upgrade your machine to the .NET Framework 4.6, it could potentially be affected. Apps running in 32-bit processes are not affected by this issue. Note that the default process type for client apps (e.g. WPF, Windows Forms) is 32-bit.
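If you are unsure whether a given app runs as a 64-bit process, a quick check along these lines can help. It is a minimal sketch: `Environment.Is64BitProcess` and `Environment.Is64BitOperatingSystem` are standard .NET Framework properties, while the `ProcessBitness` class and `MayBeAffected` helper are hypothetical names used for illustration. The check does not tell you which .NET Framework version is installed or whether your code matches the pattern that triggers the bug.

```csharp
using System;

static class ProcessBitness
{
    // Hypothetical helper: returns true when the current process is 64-bit,
    // the only process type this advisory applies to.
    public static bool MayBeAffected()
    {
        return Environment.Is64BitProcess;
    }

    static void Main()
    {
        Console.WriteLine("64-bit OS:      " + Environment.Is64BitOperatingSystem);
        Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);

        if (MayBeAffected())
        {
            Console.WriteLine("This process is 64-bit; the advisory may apply.");
        }
        else
        {
            Console.WriteLine("This process is 32-bit; the advisory does not apply.");
        }
    }
}
```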
This issue is narrow in nature. Your code has to use specific data types, pass them in specific ways and execute specific operations. Very few programs will satisfy all of these characteristics, all of which are required to trigger this codegen bug. We have reviewed this issue to determine if it is exploitable. We have not identified an exploit, but are pushing the change through our process at the same pace as we would an exploit.
The following annotated C# repro provides a detailed explanation of the bug.
The following F# repro provides the F# version of the issue.
Customer Bug Report
Nick Craver and Marc Gravell, a team of two at Stack Exchange (which runs Stack Overflow), reached out to us on Thursday of last week on this issue. They were scouting the .NET Framework 4.6 to see if it was ready for their use in production, and ran into some unexpected product behavior. They went the extra mile and reduced what they were seeing into a minimal repro. Thanks! Clearly, a very solid set of engineers.
We were able to diagnose the issue by Friday and provide a simple work-around to disable the specific RyuJIT optimization.
Advisory
Nick Craver published his own customer advisory yesterday, "Why you should wait on upgrading to .Net 4.6". It's a good post that you should read if you are deploying the .NET Framework 4.6.
The .NET team has concluded a detailed analysis of tens of thousands of test assets and internal customer data. The data suggests that the vast majority of .NET developers will not experience this same issue. We have extensive tests for the .NET Framework libraries (e.g. System.Xml). We have not been able to find a single case of this issue across that very large body of code. From a production standpoint, big Microsoft web properties have been running on pre-release versions of .NET Framework 4.6 for months without hitting this issue.
This bug requires a significant set of conditions to be present in order to trigger it. It's unlikely that many developers have actually written matching code. We recognize that this bug is very real to Stack Exchange, and conclude that they are one of the few cases that have hit it or will hit it.
Recommendation
Our recommendation to Stack Exchange and to any other customer is the following:
- Scout the .NET Framework 4.6 in your environment.
- If you run into an issue that you cannot diagnose, try disabling RyuJIT.
- If disabling RyuJIT resolves the issue, please re-enable RyuJIT and disable tail call optimization.
- If your issue is mitigated with the tail call optimization disabled, then you know that your app is subject to this issue. You can run your app in production in that configuration (tail call optimization disabled) to get the other .NET Framework 4.6 benefits. This workaround disables only the tail call optimization feature and should not negatively impact performance.
- If your issue is not mitigated with the tail call optimization disabled, but is mitigated with RyuJIT disabled, we want to hear from you on .NET Framework Connect. You can also run your app in production in this configuration (RyuJIT disabled).
- If your issue is not mitigated by disabling RyuJIT or tail call optimization, then it is something else and unrelated to this advisory.
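The steps above rely on two switches. As a rough sketch of what they typically look like: the `useLegacyJit` runtime element is the documented per-app way to fall back to the legacy 64-bit JIT, while the `COMPLUS_TailCallOpt` environment variable name is an assumption that this post itself does not state. Confirm both against the official guidance before relying on them in production.

```xml
<!-- app.config or web.config: disable RyuJIT for this app only by
     falling back to the legacy 64-bit JIT (JIT64). -->
<configuration>
  <runtime>
    <useLegacyJit enabled="1" />
  </runtime>
</configuration>
```

```bat
rem Assumed setting name; verify before use.
rem Keeps RyuJIT enabled but turns off tail call optimization
rem for processes started from this command prompt.
set COMPLUS_TailCallOpt=0
```

Apply one switch at a time, in the order described in the list, so that you can tell which of the two actually mitigates your issue.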
You may be wondering how it is OK to run the .NET Framework 4.6 in production with RyuJIT disabled. It's very similar in nature to the way that the .NET Framework 4.5 CLR runs, which doesn't have RyuJIT at all. The .NET Framework 4.6 includes both JIT64 and RyuJIT, providing additional flexibility for both testing and production use.
F# developers are encouraged to wait to deploy the .NET Framework 4.6. This issue affects F# programs more commonly. We will post an update on the blog when we are ready to give the all-clear on .NET Framework 4.6 use for F# developers. We apologize for that situation. We are in the process of increasing our F# test coverage.
Closing
Thanks again to the Stack Exchange team for reaching out to us with this issue and for getting the word out about it.
As stated at the start of the post, we have already started producing a RyuJIT patch for the .NET Framework 4.6. We will post an update when it is available.
We know that you rely on us to provide high-quality software. We take that very seriously. It's no accident that RyuJIT has several configuration settings. We use them for our own testing, and we expected that someone somewhere would find an issue that required investigation and potentially a fix. These settings enabled us to quickly root-cause the issue, and they also provide a way of safely running the .NET Framework 4.6 without risk of running into this codegen issue.
The .NET Framework 4.6 is a great release that we can continue to recommend deploying. It is perfectly safe to run the .NET Framework 4.6 with tail call optimizations disabled, while you are waiting for the patch. Your app will get the benefit of other .NET Framework 4.6 improvements.