By 2025, .NET has grown into one of the most performant and flexible runtimes for everything from cloud-native services to desktop applications. Behind the scenes, engineers keep pushing the limits: faster compilation, smarter garbage collection, improved threading, and more. In this post, you will learn about the performance enhancements shipping in the latest .NET runtime, why they matter, and how they work in detail, so you can squeeze every millisecond out of your applications.
Tiered Compilation: The Goldilocks Zone between Startup and Steady State
Modern applications need a snappy startup without giving up on long-running throughput. Tiered Compilation provides that sweet spot.
What is it?
A two-phase JIT: methods are compiled quickly at first, then recompiled in the background at a higher optimization level.
Why we use it
Instant responsiveness on application boot.
Peak performance once hot paths have warmed up.
How does it work?
On the first call, the JIT emits a “tier-0” version: minimal inlining, basic register allocation.
The runtime counts method invocations in the background.
Hot methods (those that cross a call-count threshold) are queued for recompilation as a “tier-1” version, now with aggressive inlining, loop unrolling, and PGO hints.
Once the tier-1 code is built, calls transition to it transparently, increasing throughput without a single line of code change.
// Enable tiered compilation in runtimeconfig.json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Runtime.TieredCompilation": true,
      "System.Runtime.TieredPGO": true
    }
  }
}
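You can also opt individual methods out of tiering from code. A minimal sketch using the BCL's `MethodImplOptions.AggressiveOptimization` flag (the method and its body here are illustrative, not from the article):

```csharp
using System;
using System.Runtime.CompilerServices;

// Opt a known-hot method out of tier-0: it is compiled fully
// optimized on the first call, trading a little startup time
// for immediate steady-state speed.
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
static long SumSquares(int n)
{
    long total = 0;
    for (int i = 1; i <= n; i++)
        total += (long)i * i;
    return total;
}

Console.WriteLine(SumSquares(3)); // 1 + 4 + 9 = 14
```

Use this sparingly: methods marked this way never benefit from tiered warm-up, so reserve it for paths you have measured as hot.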
ReadyToRun & Crossgen2: Ahead-of-Time Warmups
Just-in-time compilation is not the only game in town. ReadyToRun (R2R) images precompile IL to native code at publish time.
What it is
R2R is like an AOT snapshot of your assemblies. Crossgen2 generates R2R DLLs and EXEs optimized for the general hardware of the target OS.
Why we use it
Cold starts are dramatically faster on cloud instances and containers.
Greatly reduced JIT overhead in memory-constrained environments.
How it works
When you run dotnet publish -c Release, Crossgen2 reads the IL and emits native stubs bound to the target OS and CPU features.
On first invocation, a stub either falls back to the full method lazily or jumps straight to a method body embedded directly in the R2R sections.
Because the runtime loads ready-made native code pages, you are not compiling IL through the JIT at startup, which can save up to 40% of startup time.
dotnet publish -c Release /p:PublishReadyToRun=true /p:PublishReadyToRunComposite=true
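The same switches can live in the project file instead of on the command line; assuming a current SDK, the corresponding MSBuild properties are:

```
<PropertyGroup>
  <PublishReadyToRun>true</PublishReadyToRun>
  <PublishReadyToRunComposite>true</PublishReadyToRunComposite>
</PropertyGroup>
```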
Native AOT: The Final Frontier
Pushing AOT to the next level, Native AOT turns your entire application into a single, stand-alone native binary.
What it is
A fully supported publishing model that gets rid of the JIT entirely.
Why we use it
Lightning-fast startup speed (< 10 ms in simple console apps).
Very small memory footprint—no JIT code cache at runtime.
Easiest deployment experience—you deploy one native executable with no framework dependency.
How it works
Ahead-of-time compilation takes your IL, all of its dependencies, and the runtime (including the GC) and compiles them into a single native image.
The IL is translated into platform instructions, and linked against a trimmed runtime.
Unreferenced reflection metadata and unused APIs are tree-shaken out of the native image to reduce its size.
<!-- in your .csproj -->
<PropertyGroup>
<PublishAot>true</PublishAot>
<StripSymbols>true</StripSymbols>
</PropertyGroup>
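Publishing then produces a single native executable for the target runtime identifier, e.g.:

```
dotnet publish -c Release -r linux-x64
```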
Improving the Garbage Collector: Latency and Throughput Trade-Offs
Garbage collection scenarios have evolved: real-time games, high-throughput web APIs, and constrained IoT devices all demand different trade-offs.
What it is: A set of runtime flags and new GC modes—No-GC Regions, improved low-latency concurrent GC, and more.
Why we use it:
To eliminate hiccups in code that is latency-sensitive (e.g., the game loop, finance).
To tune throughput to maximize batch-processing pipelines.
How to use it:
No-GC Regions: You reserve a contiguous area of memory and suspend garbage collection while some critical work completes.
Dynamic Heap Sizing: the GC dynamically adjusts the upper and lower bounds of the heap to match the running application, instead of notoriously doubling memory consumption ‘out of nowhere.’
Thread-Local Heaps: each thread gets its own small heap (created at worker-thread startup), which reduces cross-thread synchronization, shortens pauses, and improves throughput.
// Start a no-GC region with a 10 MB allocation budget
if (GC.TryStartNoGCRegion(10_000_000))
{
    // ... critical low-latency work here ...
    GC.EndNoGCRegion();
}
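A lighter-touch alternative to no-GC regions is the latency-mode knob the BCL already exposes via `GCSettings`; a minimal sketch:

```csharp
using System;
using System.Runtime;

// Request sustained low latency: the GC avoids full blocking
// collections while this mode is active.
GCLatencyMode previous = GCSettings.LatencyMode;
try
{
    GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
    // ... latency-sensitive work here ...
}
finally
{
    GCSettings.LatencyMode = previous; // always restore the old mode
}
```

Restoring the previous mode in a finally block matters: leaving SustainedLowLatency on indefinitely trades memory footprint for latency across the whole process.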
JIT Evolution & PGO
The JIT has continued to evolve and can now perform hardware-aware, profile-guided optimizations.
What is it?
JIT heuristics that, based on runtime hotspots, specialize code paths for common use cases.
Why do we use it?
Use real-world usage data to improve inlining decisions.
Create SIMD-accelerated loops on CPUs supporting AVX-512 or SVE.
How does it work?
Collect a profile during a production or staging run using dotnet-trace and dotnet-pgo.
Feed the profile back into Crossgen2 or the JIT using --pgo-data to influence branch layout and inlining budgets.
The JIT emits code paths gated by CPU feature checks, so vectorization is only active when it's safe.
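The feature-gated dispatch described above can be imitated by hand. This is a hand-rolled sketch using `System.Numerics.Vector<T>` and the intrinsics API (the `Sum` helper is illustrative, not the JIT's internal mechanism):

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics.X86;

// Choose a vectorized or scalar path based on what the CPU supports,
// mirroring the feature-gated paths the JIT emits.
static float Sum(float[] values)
{
    float total = 0f;
    int i = 0;
    if (Vector.IsHardwareAccelerated)
    {
        var acc = Vector<float>.Zero;
        for (; i <= values.Length - Vector<float>.Count; i += Vector<float>.Count)
            acc += new Vector<float>(values, i); // SIMD partial sums
        total = Vector.Sum(acc);
    }
    for (; i < values.Length; i++) // scalar tail (or full fallback)
        total += values[i];
    return total;
}

Console.WriteLine(Avx2.IsSupported
    ? $"AVX2 available, vector width {Vector<float>.Count}"
    : "AVX2 not available, scalar fallback");
Console.WriteLine(Sum(new[] { 1f, 2f, 3f, 4f, 5f }));
```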
# Collect PGO data
dotnet pgo collect --output pgo-data.nettrace MyApp.dll
# Apply PGO during AOT
dotnet publish /p:PublishReadyToRun=true /p:PGOData=pgo-data.nettrace
ThreadPool & Asynchronous I/O Tuning
Efficient concurrency is at the heart of scalable servers and UI-responsive clients.
What it is
Adaptive ThreadPool heuristics and, for async-heavy apps, a more effective way to schedule I/O operations on Unix and Windows.
Why you would use it
Prevent thread starvation under bursty workloads.
Achieve maximum throughput on async-heavy services (e.g. HTTP, gRPC).
How it works
Dynamic Worker Adjustment – the runtime continuously monitors queue lengths and can add or remove worker threads more aggressively.
IOCP enhancements – On Windows, I/O completion ports can now batch completions, reducing the number of syscalls.
epoll/kqueue enhancements – on Linux and macOS, socket polling now scales to millions of file descriptors with near-constant CPU usage.
// Set minimum/maximum threads at startup
ThreadPool.SetMinThreads(workerThreads: 200, completionPortThreads: 200);
ThreadPool.SetMaxThreads(workerThreads: 1000, completionPortThreads: 1000);
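Before reaching for SetMinThreads/SetMaxThreads, it helps to observe what the pool is actually doing. The BCL exposes live counters (available since .NET Core 3.0):

```csharp
using System;
using System.Threading;

// Snapshot the ThreadPool's current state.
Console.WriteLine($"Worker threads alive:   {ThreadPool.ThreadCount}");
Console.WriteLine($"Work items queued:      {ThreadPool.PendingWorkItemCount}");
Console.WriteLine($"Work items completed:   {ThreadPool.CompletedWorkItemCount}");

// Inspect the current floor before raising it.
ThreadPool.GetMinThreads(out int minWorkers, out int minIo);
Console.WriteLine($"Min workers: {minWorkers}, min IO threads: {minIo}");
```

A sustained, growing PendingWorkItemCount under load is the classic signal that the minimum worker count is too low for your burst profile.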
Putting It All Together: Real-Life Benchmarking
Case: ASP.NET Core API at ~10,000 concurrent connections
Standard: default runtime settings, cold start ~350 ms, p95 latency ~120 ms.
Optimized:
Tiered + PGO + R2R
ThreadPool tuned
Low-latency GC
Cold start ~90 ms, p95 latency ~45 ms, 1.8× throughput improvement.
# Measure with BenchmarkDotNet
dotnet run -c Release --project Benchmarks/MyAspNetAPI.csproj
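For quick, informal checks before setting up a full BenchmarkDotNet project, a warm-up-then-measure loop with Stopwatch gives a rough shape of the numbers (the `Measure` helper is a hypothetical stand-in, not a substitute for BenchmarkDotNet's statistical rigor):

```csharp
using System;
using System.Diagnostics;

// Crude micro-measurement: warm up once so tier-1 / cached code is
// in play, then time a batch of iterations.
static TimeSpan Measure(Action action, int iterations)
{
    action(); // warm-up call
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
        action();
    sw.Stop();
    return sw.Elapsed;
}

var elapsed = Measure(() => { long s = 0; for (int i = 0; i < 1_000; i++) s += i; }, 100);
Console.WriteLine($"Total for 100 iterations: {elapsed.TotalMilliseconds:F3} ms");
```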
Further Reading & Resources
- .NET Runtime Performance Guide: https://docs.microsoft.com/dotnet/guide/performance
- Tiered Compilation Deep Dive: https://devblogs.microsoft.com/dotnet/tiered-jit
- Crossgen2 and Native AOT: https://docs.microsoft.com/dotnet/core/deploying/native-aot
- Garbage Collection Modes: https://docs.microsoft.com/dotnet/standard/garbage-collection
- BenchmarkDotNet: https://benchmarkdotnet.org/
FAQs
Q1 : What impacts .NET runtime performance?
A : Primary factors are the JIT compiler, Garbage Collector, ThreadPool management, and hardware intrinsics.
Q2 : How does tiered compilation improve speed?
A : It emits a fast, low-optimization “tier-0” build at startup, then recompiles hot methods with full optimizations in the background.
Q3 : When should I tweak Garbage Collector settings?
A : For low-latency or high-throughput services: use Server GC, enable concurrent collection, and monitor pause times to guide tuning.
Q4 : Why use Profile-Guided Optimization (PGO)?
A : PGO uses real-world call-count and branch data to drive inlining and code layout, unlocking SIMD paths and better hot-path performance.
To Summarize
In 2025, the .NET runtime has made great strides, turning into a fully tunable runtime that works well across a wide range of scenarios, from IoT devices and games to hyperscale cloud backends. With tiered JIT, ReadyToRun, Native AOT, GC modes, PGO, and ThreadPool optimizations, you can tune the balance between startup performance, throughput, and latency for your workload. Start turning the knobs today, measure the impact with dotnet-trace and BenchmarkDotNet, and watch your app fly.