Linux CPU Thread Scheduling Analysis
Heterogeneous CPU topologies are becoming commonplace, but how are they utilized in practice? By default, the Linux thread scheduler should schedule work on P-cores until all P-core threads are busy, then start scheduling on E-cores. Perfetto provides a convenient mechanism for visualizing this scheduling policy.
Thread Topology
lstopo provides a useful visual summary of the CPU core topology. As an example, here’s the view of my i9-12900KF (Alder Lake) CPU with all cores online. Threads 0-15 belong to the 8 P-cores; there are two threads per core because Hyper-Threading is enabled. Threads 16-23 are the E-cores.
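The same thread-to-core pairing can be cross-checked from sysfs without a graphical tool. The sketch below is a minimal example, not part of the original workflow: the function name list_thread_siblings and the CPU_SYSFS override are hypothetical, and the override exists only so the helper can be pointed at a scratch directory. It relies on the standard topology/thread_siblings_list file that Linux exposes for each CPU.

```shell
#!/bin/sh
# Root of the CPU sysfs tree; overridable (hypothetical knob) for experimentation.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# Print "cpuN: <siblings>" for every CPU that exposes topology information.
# Hardware threads that share a physical core report the same sibling list.
list_thread_siblings() {
    for d in "$CPU_SYSFS"/cpu[0-9]*; do
        f="$d/topology/thread_siblings_list"
        if [ -r "$f" ]; then
            printf '%s: %s\n' "${d##*/}" "$(cat "$f")"
        fi
    done
}

list_thread_siblings
```

On a machine laid out like the one above, each P-core thread would be expected to list two siblings and each E-core thread only itself. Entries are printed in glob order (cpu0, cpu1, cpu10, ...), not numeric order.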
Cores can be taken offline by echoing 0 to /sys/devices/system/cpu/cpuN/online; offline cores are not present in the lstopo visualization.
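As a rough sketch of that step, the helper below writes 0 or 1 to a CPU's online file. The function name set_cpu_online and the CPU_SYSFS override are hypothetical and used only for illustration; on the real sysfs tree the write requires root, and the CPU range in the commented example assumes the thread numbering described above.

```shell
#!/bin/sh
# Root of the CPU sysfs tree; overridable (hypothetical knob) for experimentation.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# set_cpu_online N STATE: write STATE (0 = offline, 1 = online) to cpuN/online.
set_cpu_online() {
    printf '%s\n' "$2" > "$CPU_SYSFS/cpu$1/online"
}

# Example (run as root): take the E-cores (threads 16-23 here) offline,
# then list which CPUs remain online.
# for n in $(seq 16 23); do set_cpu_online "$n" 0; done
# cat "$CPU_SYSFS/online"
```

Writing 1 to the same file brings the CPU back online.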
Scheduling Visualization
The excellent Perfetto tool makes it simple to visualize thread scheduling. To visualize the scheduling for the Cyberpunk 2077 benchmark:
- Build Perfetto: https://perfetto.dev/docs/quickstart/linux-tracing
- In a terminal, run the tracing command:
sudo out/linux/tracebox -o trace_file.perfetto-trace --txt -c test/configs/scheduling.cfg
- In a second terminal, launch the benchmark:
steam -applaunch 1091500 --launcher-skip -benchmark
Once the benchmark is complete, run sudo chmod 755 trace_file.perfetto-trace to make the trace readable by non-root users, then open it in the Perfetto UI.
Here’s an example run:
This data shows work evenly distributed across the available CPU threads. This is good, because Cyberpunk is a DX12 game that should leverage multi-threading where possible.
The default trace configuration captures just a few megabytes of data, discarding the rest. This behavior can be modified by editing the trace configuration.
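One way to experiment with a larger capture is to write a separate config rather than editing the shipped one. The sketch below is an assumption-laden example: the file name scheduling_large.cfg, the buffer size, the duration, and the choice of ftrace events are illustrative values, not taken from test/configs/scheduling.cfg. The fields themselves (buffers, size_kb, fill_policy, data_sources, duration_ms) are part of Perfetto's TraceConfig text format.

```shell
#!/bin/sh
# Output path for the custom config (hypothetical name).
CFG="${CFG:-scheduling_large.cfg}"

# Write a minimal scheduling trace config with a 256 MiB buffer.
# fill_policy DISCARD stops recording once the buffer is full;
# RING_BUFFER would overwrite the oldest data instead.
cat > "$CFG" <<'EOF'
buffers {
  size_kb: 262144
  fill_policy: DISCARD
}
data_sources {
  config {
    name: "linux.ftrace"
    ftrace_config {
      ftrace_events: "sched/sched_switch"
      ftrace_events: "sched/sched_wakeup"
    }
  }
}
duration_ms: 60000
EOF

# Then trace with the same command as before, pointing -c at the new file:
# sudo out/linux/tracebox -o trace_file.perfetto-trace --txt -c "$CFG"
```

A larger buffer produces a larger trace file, so the size is worth tuning to the length of the benchmark run.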
Scheduling Constraints
To constrain the benchmark such that it only runs on P-cores:
killall steam && taskset -c 0-15 steam -applaunch 1091500 --launcher-skip -benchmark
We must terminate Steam before launching the benchmark; otherwise the new Steam process will signal the already-running instance and exit, and the benchmark will not run inside our affinity-constrained environment.
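To confirm that the constraint actually reached the game, the allowed CPU list of a running process can be read from /proc. This is a minimal sketch: the helper name allowed_cpus is hypothetical, and the process name in the commented usage line is an assumption that may differ on your system. Child processes inherit the affinity mask of their parent, so a game launched by the constrained Steam process should report the same list.

```shell
#!/bin/sh
# allowed_cpus PID: print the CPU list the process may be scheduled on,
# as reported by the Cpus_allowed_list field of /proc/PID/status.
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "/proc/$1/status"
}

# Affinity of the current shell:
allowed_cpus "$$"

# Affinity of the game (process name is an assumption):
# allowed_cpus "$(pgrep -n -f Cyberpunk2077)"
```

With the taskset invocation above, the expected value for the game process is 0-15. The same information is available from taskset -cp PID.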