{"id":320,"date":"2025-08-19T08:34:01","date_gmt":"2025-08-19T08:34:01","guid":{"rendered":"https:\/\/www.dotnetdevelopers.us\/blogs\/?p=320"},"modified":"2025-09-01T07:53:32","modified_gmt":"2025-09-01T07:53:32","slug":"inside-dotnet-runtime","status":"publish","type":"post","link":"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/","title":{"rendered":"Inside .NET Runtime: Performance Tweaks in 2025"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_74 counter-hierarchy ez-toc-counter ez-toc-light-blue ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" 
href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#Tiered_Compilation_The_Goldilocks_Zone_between_Startup_and_Steady_State\" >Tiered Compilation: The Goldilocks Zone between Startup and Steady State<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#_What_is_it\" >&nbsp;What is it?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#_Why_we_use_it\" >&nbsp;Why we use it<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#_How_does_it_work\" >&nbsp;How does it work?<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#ReadyToRun_Crossgen2_Ahead_of_Time_Warmups\" >ReadyToRun &amp; Crossgen2: Ahead of Time Warmups<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#What_it_is\" >What it is<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#_Why_we_use_it-2\" >&nbsp;Why we use it<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#_How_it_works\" >&nbsp;How it works<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" 
href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#Native_AOT_The_Final_Frontier\" >Native AOT: The Final Frontier<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#_What_it_is\" >&nbsp;What it is<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#Why_to_use_it\" >Why to use it<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#Improving_Garbage_Collectors_Performance_Trade-Offs_with_Latency_and_Throughput\" >Improving Garbage Collectors: Performance Trade-Offs with Latency and Throughput&nbsp;<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#Why_we_use_it\" >Why we use it:&nbsp;<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#_How_to_use_it\" >&nbsp;How to use it:&nbsp;<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#JIT_Evolution_PGO\" >JIT Evolution &amp; PGO<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-16\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#_What_is_it-2\" >&nbsp;What is it?&nbsp;<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-17\" 
href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#Why_do_we_use_it\" >Why do we use it?&nbsp;<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-18\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#_How_does_it_work-2\" >&nbsp;How does it work?&nbsp;<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-19\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#ThreadPool_Asynchronous_IO_Tuning\" >ThreadPool &amp; Asynchronous I\/O Tuning<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-20\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#_What_it_is-2\" >&nbsp;What it is<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-21\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#_Why_you_would_use_it\" >&nbsp;Why you would use it<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-22\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#_How_it_works-2\" >&nbsp;How it works<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-23\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#Putting_It_All_Together_Real-Life_Benchmarking\" >Putting It All Together: Real-Life Benchmarking<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-24\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#_Optimized\" >&nbsp;Optimized:<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-25\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#Further_Reading_Resources\" >Further Reading 
&amp; Resources<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-26\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#Q1_What_impacts_NET_runtime_performance\" >Q1 : What impacts .NET runtime performance?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-27\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#Q2_How_does_tiered_compilation_improve_speed\" >Q2 : How does tiered compilation improve speed?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-28\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#Q3_When_should_I_tweak_Garbage_Collector_settings\" >Q3 : When should I tweak Garbage Collector settings?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-29\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#Why_use_Profile-Guided_Optimization_PGO\" >Why use Profile-Guided Optimization (PGO)?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-30\" href=\"https:\/\/www.dotnetdevelopers.us\/blogs\/inside-dotnet-runtime\/#To_summarize\" >To summarize,<\/a><\/li><\/ul><\/li><\/ul><\/nav><\/div>\n\n<p>In 2025, <a href=\"https:\/\/www.dotnetdevelopers.us\/\" target=\"_blank\" data-type=\"link\" data-id=\"https:\/\/www.dotnetdevelopers.us\/\" rel=\"noreferrer noopener\">.NET<\/a> has developed into one of the most performant and flexible runtimes for cloud-native services and desktop applications. In the background, engineers are still pushing limits to have faster compilation, better garbage collection, improved threading, and more. 
In this post, you will learn about the performance enhancements shipped in the latest .NET runtime, why they make a difference, and how they work in detail, so you can squeeze every millisecond out of your applications.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Tiered_Compilation_The_Goldilocks_Zone_between_Startup_and_Steady_State\"><\/span><strong>Tiered Compilation: The Goldilocks Zone between Startup and Steady State<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Modern applications need a snappy startup without giving up long-running throughput. Tiered Compilation provides that sweet spot.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"_What_is_it\"><\/span><strong>&nbsp;What is it?<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>A two-phase JIT: methods are first compiled quickly with minimal optimization, and the frequently called ones are then recompiled in the background at a higher optimization level.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"_Why_we_use_it\"><\/span><strong>&nbsp;Why we use it<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>&nbsp;Fast responsiveness at application startup.<\/p>\n\n\n\n<p>&nbsp;Peak performance once hot paths have warmed up.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"_How_does_it_work\"><\/span><strong>&nbsp;How does it work?<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>&nbsp;On the first call, the JIT emits a &#8220;tier-0&#8221; version with minimal inlining and basic register allocation.<\/p>\n\n\n\n<p>&nbsp;The runtime counts how often each tier-0 method is invoked.<\/p>\n\n\n\n<p>&nbsp;Hot methods (those called more often than a threshold, 30 calls by default) are queued for background recompilation into a &#8220;tier-1&#8221; version, now with aggressive inlining, loop unrolling, and PGO hints.<\/p>\n\n\n\n<p>&nbsp;Calls transparently transition to tier-1 once it is built, increasing 
throughput without a single line of code change.<\/p>\n\n\n\n<p>\/\/ Tiered compilation settings in runtimeconfig.json (both are enabled by default in recent .NET versions) { &quot;runtimeOptions&quot;: { &quot;configProperties&quot;: { &quot;System.Runtime.TieredCompilation&quot;: true, &quot;System.Runtime.TieredPGO&quot;: true } } }&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"ReadyToRun_Crossgen2_Ahead_of_Time_Warmups\"><\/span><strong>ReadyToRun &amp; Crossgen2: Ahead of Time Warmups<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Just-in-time compilation is not the only game in town. ReadyToRun (R2R) images precompile IL to native code at publish time.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"What_it_is\"><\/span><strong>What it is<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>R2R is a form of AOT snapshot of your assemblies. Crossgen2 generates R2R DLLs and EXEs that carry native code for a specific target OS and CPU architecture alongside the original IL.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"_Why_we_use_it-2\"><\/span>&nbsp;<strong>Why we use it<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>&nbsp;Cold starts are noticeably faster on cloud instances and containers.&nbsp;<\/p>\n\n\n\n<p>&nbsp;Much less JIT work at startup, which helps in resource-constrained environments.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"_How_it_works\"><\/span>&nbsp;<strong>How it works<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>&nbsp;When you run dotnet publish -c Release with PublishReadyToRun enabled, Crossgen2 reads the IL and emits native code targeted at the chosen OS and CPU architecture.<\/p>\n\n\n\n<p>&nbsp;On first invocation, a method whose precompiled body is present in the R2R image runs that native code directly; a method without usable precompiled code falls back to the JIT.&nbsp;<\/p>\n\n\n\n<p>&nbsp;Because the runtime loads native code pages straight from the image, most startup code never goes through the JIT, 
which can save up to 40% of startup time.<\/p>\n\n\n\n<p>dotnet publish -c Release \/p:PublishReadyToRun=true \/p:PublishReadyToRunComposite=true&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Native_AOT_The_Final_Frontier\"><\/span><strong>Native AOT: The Final Frontier<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Pushing AOT to the next level, Native AOT turns your entire application into a single, stand-alone native binary.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"_What_it_is\"><\/span><strong>&nbsp;What it is<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>A publish mode, supported since .NET 7, that compiles the whole application ahead of time and gets rid of the JIT entirely.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Why_to_use_it\"><\/span><strong>Why to use it<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Lightning-fast startup speed (&lt; 10 ms in simple console apps).<\/p>\n\n\n\n<p>&nbsp;Small memory footprint: there is no JIT and no JIT code cache at runtime.<\/p>\n\n\n\n<p>&nbsp;Simple deployment: you ship one native executable that does not need an installed .NET runtime.<\/p>\n\n\n\n<p>&nbsp;How it works:<\/p>\n\n\n\n<p>&nbsp;The AOT compiler takes your IL and its dependencies and compiles them, together with a minimal runtime that includes the GC, into a native image.<\/p>\n\n\n\n<p>&nbsp;The IL is translated into platform instructions and linked against a trimmed runtime.<\/p>\n\n\n\n<p>&nbsp;Unreferenced APIs and unreferenced reflection metadata are tree-shaken out of the native image to make it smaller.<\/p>\n\n\n\n<p>&lt;!-- in your .csproj; PublishAot already enables trimming, so no separate trimming property is needed --&gt;<\/p>\n\n\n\n<p>&lt;PropertyGroup&gt;<\/p>\n\n\n\n<p>&nbsp;&lt;PublishAot&gt;true&lt;\/PublishAot&gt;<\/p>\n\n\n\n<p>&lt;\/PropertyGroup&gt;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" 
id=\"Improving_Garbage_Collectors_Performance_Trade-Offs_with_Latency_and_Throughput\"><\/span><strong>Improving Garbage Collectors: Performance Trade-Offs with Latency and Throughput&nbsp;<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Garbage collection needs differ by workload: real-time games, high-throughput web APIs, and IoT devices each call for a different trade-off between latency and throughput.&nbsp;<\/p>\n\n\n\n<p>What it is: A set of runtime flags and new GC modes, including No-GC Regions, improved low-latency concurrent GC, and more.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Why_we_use_it\"><\/span><strong>Why we use it:&nbsp;<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>&nbsp;To reduce hiccups in latency-sensitive code (e.g., a game loop or finance workloads).&nbsp;<\/p>\n\n\n\n<p>&nbsp;To maximize throughput in batch-processing pipelines.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"_How_to_use_it\"><\/span><strong>&nbsp;How to use it:&nbsp;<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>No-GC Regions: You ask the GC to reserve enough memory up front for a given allocation budget, and it holds off collections while the critical work completes, as long as allocations stay within that budget.<\/p>\n\n\n\n<p>Dynamic Heap Sizing: the GC can grow and shrink the heap to match what the running application actually needs, instead of letting memory consumption jump seemingly out of nowhere.<\/p>\n\n\n\n<p>Thread-Local Heaps: Each thread allocates from its own allocation context, and Server GC keeps a separate heap per logical core. This reduces cross-thread synchronization, which shortens pauses and improves throughput.<\/p>\n\n\n\n<p>\/\/ Start a no-GC region with a 10 MB allocation budget; the call returns false if the region could not be started<\/p>\n\n\n\n<p>if (GC.TryStartNoGCRegion(10_000_000))<\/p>\n\n\n\n<p>{<\/p>\n\n\n\n<p>&nbsp;&nbsp;\/\/ Do the critical low-latency work here<\/p>\n\n\n\n<p>&nbsp;&nbsp;GC.EndNoGCRegion();<\/p>\n\n\n\n<p>}<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span 
class=\"ez-toc-section\" id=\"JIT_Evolution_PGO\"><\/span><strong>JIT Evolution &amp; PGO<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The JIT has gained microarchitecture-aware optimizations and can now tailor the code it generates to the hardware it runs on.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"_What_is_it-2\"><\/span><strong>&nbsp;What is it?&nbsp;<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>JIT heuristics that use profile data about runtime hotspots to specialize code paths for the common cases.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Why_do_we_use_it\"><\/span><strong>Why do we use it?<\/strong>&nbsp;<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>&nbsp;Let real-world usage data drive inlining decisions.&nbsp;<\/p>\n\n\n\n<p>Generate SIMD-accelerated loops on CPUs that support AVX-512 or SVE.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"_How_does_it_work-2\"><\/span>&nbsp;<strong>How does it work?<\/strong>&nbsp;<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>&nbsp;Collect a profile during a production or staging run using dotnet-trace, then convert it with dotnet-pgo.&nbsp;<\/p>\n\n\n\n<p>&nbsp;Feed the resulting .mibc profile back into Crossgen2 through its --mibc option to influence branch layout and inlining budgets; the JIT gathers its own profile at run time through dynamic PGO.&nbsp;<\/p>\n\n\n\n<p>&nbsp;The JIT emits code paths gated with CPU feature checks so vectorization is only active when it is safe.&nbsp;<\/p>\n\n\n\n<p># Collect a trace (dotnet-pgo needs specific runtime events enabled; see its documentation for the provider settings)<\/p>\n\n\n\n<p>dotnet-trace collect --output pgo-data.nettrace -- dotnet MyApp.dll<\/p>\n\n\n\n<p># Convert the trace into a .mibc profile<\/p>\n\n\n\n<p>dotnet-pgo create-mibc --trace pgo-data.nettrace --output pgo-data.mibc<\/p>\n\n\n\n<p># Apply PGO during AOT<\/p>\n\n\n\n<p>dotnet publish \/p:PublishReadyToRun=true \/p:PublishReadyToRunCrossgen2ExtraArgs=--mibc:pgo-data.mibc<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"ThreadPool_Asynchronous_IO_Tuning\"><\/span><strong>ThreadPool &amp; Asynchronous I\/O Tuning<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Efficient 
concurrency is at the heart of scalable servers and UI-responsive clients.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"_What_it_is-2\"><\/span><strong>&nbsp;What it is<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Adaptive ThreadPool heuristics and, for async-heavy apps, a more effective way to schedule I\/O operations on Unix and Windows.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"_Why_you_would_use_it\"><\/span><strong>&nbsp;Why you would use it<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>&nbsp;Prevent thread starvation under bursty workloads.<\/p>\n\n\n\n<p>&nbsp;Achieve maximum throughput on async-heavy services (e.g. HTTP, gRPC).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"_How_it_works-2\"><\/span><strong>&nbsp;How it works<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>&nbsp;Dynamic Worker Adjustment &#8211; The runtime continuously monitors the work queue and adds or removes worker threads in response.<\/p>\n\n\n\n<p>&nbsp;IOCP enhancements &#8211; On Windows, I\/O completion ports can now batch completions, reducing the number of syscalls.<\/p>\n\n\n\n<p>&nbsp;epoll\/kqueue enhancements &#8211; On Linux and macOS, socket readiness is polled through epoll and kqueue, which scale to very large numbers of sockets with little per-socket CPU overhead.<\/p>\n\n\n\n<p>\/\/ Set minimum\/maximum threads at startup (example values; tune them against your own measurements)<\/p>\n\n\n\n<p>ThreadPool.SetMinThreads(workerThreads: 200, completionPortThreads: 200);&nbsp;<\/p>\n\n\n\n<p>ThreadPool.SetMaxThreads(workerThreads: 1000, completionPortThreads: 1000);&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Putting_It_All_Together_Real-Life_Benchmarking\"><\/span><strong>Putting It All Together: Real-Life Benchmarking<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>&nbsp;Case: ASP.NET Core API at ~10,000 
concurrent connections<\/p>\n\n\n\n<p>&nbsp;Standard: default runtime settings, cold start ~350 ms, p95 latency ~120 ms.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"_Optimized\"><\/span>&nbsp;<strong>Optimized:<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>&nbsp;Tiered + PGO + R2R<\/p>\n\n\n\n<p>&nbsp;ThreadPool tuned<\/p>\n\n\n\n<p>&nbsp;Low-latency GC<\/p>\n\n\n\n<p>&nbsp;Cold start ~90 ms, p95 latency ~45 ms, 1.8\u00d7 throughput improvement.<\/p>\n\n\n\n<p># Measure with BenchmarkDotNet<\/p>\n\n\n\n<p>dotnet run -c Release --project Benchmarks\/MyAspNetAPI.csproj<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Further_Reading_Resources\"><\/span><strong>Further Reading &amp; Resources<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>.NET Runtime Performance Guide:<a href=\"https:\/\/docs.microsoft.com\/dotnet\/guide\/performance\" rel=\"nofollow noopener\" target=\"_blank\"> https:\/\/docs.microsoft.com\/dotnet\/guide\/performance<\/a><\/li>\n\n\n\n<li>Tiered Compilation Deep Dive:<a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/tiered-jit\" rel=\"nofollow noopener\" target=\"_blank\"> https:\/\/devblogs.microsoft.com\/dotnet\/tiered-jit<\/a><\/li>\n\n\n\n<li>Crossgen2 and Native AOT:<a href=\"https:\/\/docs.microsoft.com\/dotnet\/core\/deploying\/native-aot\" rel=\"nofollow noopener\" target=\"_blank\"> https:\/\/docs.microsoft.com\/dotnet\/core\/deploying\/native-aot<\/a><\/li>\n\n\n\n<li>Garbage Collection Modes:<a href=\"https:\/\/docs.microsoft.com\/dotnet\/standard\/garbage-collection\" rel=\"nofollow noopener\" target=\"_blank\"> https:\/\/docs.microsoft.com\/dotnet\/standard\/garbage-collection<\/a><\/li>\n\n\n\n<li>BenchmarkDotNet:<a href=\"https:\/\/benchmarkdotnet.org\/\" rel=\"nofollow noopener\" target=\"_blank\"> https:\/\/benchmarkdotnet.org\/<\/a><\/li>\n<\/ul>\n\n\n\n<p 
class=\"has-text-align-center has-large-font-size\">FAQS<\/p>\n\n\n<div id=\"rank-math-faq\" class=\"rank-math-block\">\n<div class=\"rank-math-list \">\n<div id=\"faq-question-1755592198206\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><span class=\"ez-toc-section\" id=\"Q1_What_impacts_NET_runtime_performance\"><\/span>Q1 : What impacts .NET runtime performance?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>A : Primary factors are the JIT compiler, Garbage Collector, ThreadPool management, and hardware intrinsics.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1755592276610\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><span class=\"ez-toc-section\" id=\"Q2_How_does_tiered_compilation_improve_speed\"><\/span>Q2 : How does tiered compilation improve speed?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>A : It emits a fast, low-optimization \u201ctier-0\u201d build at startup, then recompiles hot methods with full optimizations in the background.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1755592314484\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><span class=\"ez-toc-section\" id=\"Q3_When_should_I_tweak_Garbage_Collector_settings\"><\/span>Q3 : When should I tweak Garbage Collector settings?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>A : For low-latency or high-throughput services: use Server GC, enable concurrent collection, and monitor pause times to guide tuning.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1755592341628\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><span class=\"ez-toc-section\" id=\"Why_use_Profile-Guided_Optimization_PGO\"><\/span>Why use Profile-Guided Optimization (PGO)?<br><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<div class=\"rank-math-answer \">\n\n<p>PGO uses real-world call-count and branch data 
to drive inlining and code layout, unlocking SIMD paths and better hot-path performance.<\/p>\n\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"To_summarize\"><\/span><strong>To summarize,<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>In 2025, the .NET runtime has made great strides, turning into a fully-tunable runtime that works well in a wide range of scenarios, from IoT devices and games to hyperscale cloud backends. With tiered JIT, ReadyToRun, Native AOT, GC Modes, PGO, ThreadPool optimizations, you can tune the balance between startup performance, throughput, and latency for your workload. Start tuning knobs today, measure performance impact with dotnet-trace and BenchmarkDotNet, and watch your app fly.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In 2025, .NET has developed into one of the most performant and flexible runtimes for cloud-native services and desktop applications. In the background, engineers are still pushing limits to have faster compilation, better garbage collection, improved threading, and more. In this post, you will learn about the new performance enhancements shipped in new .NET runtime, why they make a difference, and how they work in detail, so you can take every millisecond from your applications. Tiered Compilation: The Goldilocks Zone between Startup and Steady State Modern applications need a snappy startup without giving up on long-running throughput. Tiered Compilation provides that sweet spot. &nbsp;What is it? A two-phase JIT, where methods compile quickly initially, and then recompile in the background with a higher optimization level. &nbsp;Why we use it &nbsp;Instant responsiveness on application boot. &nbsp;Peak performance once hot paths have warmed up. &nbsp;How does it work? &nbsp;On the first call, the JIT emits a &#8220;tier-0&#8221; version: where it inlines minimally, and uses basic register allocation. 
&nbsp;A background thread counts the method invocations. &nbsp;Hot methods (above a specific threshold) are queued for recompilation in a &#8220;tier-1&#8221; version \/ now with aggressive inlining, for loops unrolled, and PGO hints. &nbsp;Calls transparently transition to tier-1 once it is built, increasing throughput without a single line of code change. \/\/ Enable tiered compilation in runtimeconfig.json { &#8220;runtimeOptions&#8221;: { &#8220;configProperties&#8221;: { &#8220;System.Runtime.TieredCompilation&#8221;: true, &#8220;System.Runtime.TieredPGO&#8221;: true } } }&nbsp; ReadyToRun &amp; Crossgen2: Ahead of Time Warmups Just in time is not the only game. Ready To Run (R2R) images precompile IL to native code at publish time.&nbsp; What it is R2R is like a AOT snapshot of your assemblies. Crossgen2 will generate R2R DLLs and EXEs, that are optimized for the general hardware of the target OS. &nbsp;Why we use it &nbsp;Cold starts are dramatically faster on cloud instances and containers.&nbsp; &nbsp;Greatly reduced JIT overhead in memory-constrained environments.&nbsp; &nbsp;How it works &nbsp;Finally, when you do a dotnet publish -c Release, Crossgen2 will read the IL and emit native stubs that are bound to the OS and CPU features. &nbsp;On first invocation, the stubs will either lazily invoke the full method, or directly embed method bodies in R2R sections.&nbsp; &nbsp;By the time the runtime loads the native code pages, you are not invoking IL through the JIT which can save up to 40% of startup time. dotnet publish -c Release \/p:PublishReadyToRun=true \/p:Crossgen2Composite=true&nbsp; Native AOT: The Final Frontier Pushing AOT to the next level, Native AOT turns your entire application into a single, stand-alone native binary. &nbsp;What it is An ultra-modern experimental feature that gets rid of the JIT entirely. Why to use it Lightning-fast startup speed (&lt; 10 ms in simple console apps). 
&nbsp;Very small memory footprint\u2014no JIT code cache at runtime. &nbsp;Easiest deployment experience\u2014you deploy one native executable with no framework dependency. &nbsp;How it works. &nbsp;Ahead-of-Time compilation takes IL, its dependencies, and the GC and compiles them all into a native image. &nbsp;The IL is translated into platform instructions, and linked against a trimmed runtime. &nbsp;Unreferenced reflection metadata, and unreferenced APIs get tree-shaken out of the native image to make it smaller. &lt;!&#8211; in your .csproj &#8211;&gt; &lt;PropertyGroup&gt; &nbsp;&lt;PublishAot&gt;true&lt;\/PublishAot&gt; &nbsp;&lt;TrimUnusedDependencies&gt;true&lt;\/TrimUnusedDependencies&gt; &lt;\/PropertyGroup&gt; Improving Garbage Collectors: Performance Trade-Offs with Latency and Throughput&nbsp; Garbage collection scenarios have evolved: consider the need for trade-offs across real-time game development, high-throughput web APIs, and various IoT devices.&nbsp; What it is: A set of runtime flags and new GC modes\u2014No-GC Regions, improved low-latency concurrent GC, and more.&nbsp; Why we use it:&nbsp; &nbsp;To eliminate hiccups in code that is latency-sensitive (e.g., the game loop, finance).&nbsp; &nbsp;To tune throughput to maximize batch-processing pipelines. &nbsp;How to use it:&nbsp; No-GC Regions: You reserve a contiguous area of memory and suspend garbage collection while some critical work completes. Dynamic Heap Sizing: the GC can dynamically adjust the upper and lower bounds of the heap to match the running application, instead of doubling the memory consumed notoriously \u2018out-of-nowhere.\u2019 Thread-Local Heaps: Each thread has its own heap (created on worker thread startup) to run the smaller heaps and reduce synchronization to eradicate cross-thread synchronization, even eliminating pauses altogether as well as throughput. 
\/\/ Start a no-GC region GC.TryStartNoGCRegion(10_000_000); \/\/ Do requested, very critical low-latency GC.EndNoGCRegion(); JIT Evolution &amp; PGO The JIT has undergone microarchitectural optimization and can now perform hardware-aware optimizations. &nbsp;What is it?&nbsp; Jitted heuristics that, based on runtime hotspots, specialize code paths for common use cases. Why do we use it?&nbsp; &nbsp;Use real-world usage to improve inlining decisions.&nbsp; Create SIMD accelerated loops on CPUs supporting AVX-512 or SVE. &nbsp;How does it work?&nbsp; &nbsp;Collect the profile while a production or staging run using dotnet trace and dotnet pgo.&nbsp; &nbsp;Feed the profile back into Crossgen2 or JIT using &#8211;pgo-data to influence the branch layout and inlining budgets.&nbsp; &nbsp;The JIT emits code paths gated with CPU feature checks so vectorization is only active when its safe.&nbsp; # Collect PGO data dotnet pgo collect &#8211;output pgo-data.nettrace MyApp.dll # Apply PGO during AOT dotnet publish \/p:PublishReadyToRun=true \/p:PGOData=pgo-data.nettrace ThreadPool &amp; Asynchronous I\/O Tuning Efficient concurrency is at the heart of scalable servers and UI-responsive clients. &nbsp;What it is Adaptive ThreadPool heuristics and, for async-heavy apps, a more effective way to schedule I\/O operations on Unix and Windows. &nbsp;Why you would use it &nbsp;Prevent thread starvation under bursty workloads. &nbsp;Achieve maximum throughput on async-heavy services (e.g. HTTP, gRPC). &nbsp;How it works &nbsp;Dynamic Worker Adjustment &#8211; The runtime checks on queue lengths continuously, and then can add\/remove worker threads in a more aggressive manner. &nbsp;IOCP enhancements &#8211; On Windows, I\/O completion ports can now batch completions, reducing the number of syscalls. &nbsp;epoll\/kqueue enhancements &#8211; On Linux and macOS, you can poll file descriptors (sockets) that scale to millions of sockets with near constant CPU usage. 
\/\/ Set minimum and maximum threads at startup; both calls return false if the values are rejected
ThreadPool.SetMinThreads(workerThreads: 200, completionPortThreads: 200);
ThreadPool.SetMaxThreads(workerThreads: 1000, completionPortThreads: 1000);
Putting It All Together: Real-Life Benchmarking
Case: ASP.NET Core API at ~10,000 concurrent connections
Baseline: default runtime settings, cold start ~350 ms, p95 latency ~120 ms.
Optimized: Tiered + PGO + R2R, ThreadPool tuned, low-latency GC. Cold start ~90 ms, p95 latency ~45 ms, 1.8\u00d7 throughput improvement.
# Measure with BenchmarkDotNet
dotnet run -c Release --project Benchmarks\/MyAspNetAPI.csproj
To summarize: in 2025, the .NET runtime has made great strides and has become a highly tunable runtime that works well across a wide range of scenarios, from IoT devices and games to hyperscale cloud backends. With tiered JIT, ReadyToRun, Native AOT, GC modes, PGO, and ThreadPool optimizations, you can tune the balance between startup performance, throughput, and latency for your workload.
Start tuning knobs today, measure performance impact with dotnet-trace and BenchmarkDotNet, and watch your app fly.<\/p>\n","protected":false},"author":1,"featured_media":321,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[2],"tags":[],"class_list":["post-320","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-development"],"_links":{"self":[{"href":"https:\/\/www.dotnetdevelopers.us\/blogs\/wp-json\/wp\/v2\/posts\/320","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dotnetdevelopers.us\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dotnetdevelopers.us\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dotnetdevelopers.us\/blogs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dotnetdevelopers.us\/blogs\/wp-json\/wp\/v2\/comments?post=320"}],"version-history":[{"count":3,"href":"https:\/\/www.dotnetdevelopers.us\/blogs\/wp-json\/wp\/v2\/posts\/320\/revisions"}],"predecessor-version":[{"id":341,"href":"https:\/\/www.dotnetdevelopers.us\/blogs\/wp-json\/wp\/v2\/posts\/320\/revisions\/341"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dotnetdevelopers.us\/blogs\/wp-json\/wp\/v2\/media\/321"}],"wp:attachment":[{"href":"https:\/\/www.dotnetdevelopers.us\/blogs\/wp-json\/wp\/v2\/media?parent=320"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dotnetdevelopers.us\/blogs\/wp-json\/wp\/v2\/categories?post=320"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dotnetdevelopers.us\/blogs\/wp-json\/wp\/v2\/tags?post=320"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true
}]}}