Heterogeneous CPU topologies are becoming commonplace, but how are they utilized in practice? By default, the Linux thread scheduler should schedule work on P-cores until all P-core threads are busy, then start scheduling on E-cores. Perfetto provides a convenient mechanism for visualizing this scheduling policy.

Thread Topology

lstopo provides a useful visual summary of the CPU core topology. As an example, here’s the view of my i9-12900KF (Alder Lake) CPU with all cores online. Threads 0-15 belong to the 8 P-cores, two threads per core because Hyper-Threading is enabled. Threads 16-23 are the 8 E-cores, one thread each.

ADL CPU Core Topology
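The same thread-to-core grouping can also be read from sysfs without a GUI. The following is a minimal sketch, not something lstopo itself does: list_core_siblings is a hypothetical helper name, and the optional argument (an alternate sysfs root) exists only to make the function easy to try against a fake directory tree.

```shell
# list_core_siblings [ROOT]
# Print one line per physical core, listing the hardware threads on it.
# ROOT defaults to the real sysfs CPU directory (hypothetical helper).
list_core_siblings() {
  root="${1:-/sys/devices/system/cpu}"
  for f in "$root"/cpu[0-9]*/topology/thread_siblings_list; do
    # Offline CPUs expose no topology directory, so skip unreadable entries.
    [ -r "$f" ] && cat "$f"
  done | sort -u | sort -n
}
```

On a machine like the one above, I would expect each P-core line to name two threads and each E-core line to name a single thread.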

Cores can be taken offline by echoing 0 to /sys/devices/system/cpu/cpuN/online (as root); offline cores do not appear in the lstopo visualization.
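As a rough sketch of the offlining step, the helper below wraps the write to the online node. set_cpu_online is a hypothetical name, and the third argument (an alternate sysfs root) is only there for dry runs against a scratch directory. It needs to run in a root shell, since sudo does not apply to shell functions or redirections.

```shell
# set_cpu_online CPU STATE [ROOT]
# Write 0 (offline) or 1 (online) to the given CPU's sysfs "online" node.
# ROOT defaults to the real sysfs CPU directory (hypothetical helper).
set_cpu_online() {
  node="${3:-/sys/devices/system/cpu}/cpu$1/online"
  if [ ! -w "$node" ]; then
    echo "cannot write $node (not root, or this CPU cannot be offlined)" >&2
    return 1
  fi
  echo "$2" > "$node"
}

# Example (root shell): take threads 16-23, the E-cores on my CPU, offline.
# for n in $(seq 16 23); do set_cpu_online "$n" 0; done
```

Writing 1 to the same node brings the thread back online.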

Scheduling Visualization

The excellent Perfetto tool makes it simple to visualize thread scheduling. To visualize the scheduling for the Cyberpunk 2077 benchmark:

  1. Build Perfetto: https://perfetto.dev/docs/quickstart/linux-tracing
  2. In a terminal, run the tracing command:
sudo out/linux/tracebox -o trace_file.perfetto-trace --txt -c test/configs/scheduling.cfg
  3. In a second terminal, launch the benchmark:
steam -applaunch 1091500 --launcher-skip -benchmark

Once the benchmark is complete, run sudo chmod 755 trace_file.perfetto-trace to make the trace readable by non-root users and open it in the Perfetto UI. Here’s an example run:

Perfetto Scheduling Viz

This data shows work evenly distributed across the available CPU threads. This is good, because Cyberpunk is a DX12 game that should leverage multi-threading where possible.

The default trace configuration captures just a few megabytes of data, discarding the rest. This behavior can be modified by editing the trace configuration.
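One way to make that edit without touching the stock file is to write a modified copy. This is a sketch under assumptions: it assumes the stock config limits capture through a size_kb field in its buffers block and a top-level duration_ms field (both are Perfetto trace config fields, but I have not reproduced the file here), and bump_trace_config is a hypothetical helper name. The values in the example are illustrative, not recommendations.

```shell
# bump_trace_config SRC DST SIZE_KB DURATION_MS
# Copy SRC to DST, replacing the buffer size and trace duration.
# Assumes SRC contains "size_kb: N" and "duration_ms: N" entries.
bump_trace_config() {
  sed -e "s/size_kb: *[0-9][0-9]*/size_kb: $3/" \
      -e "s/duration_ms: *[0-9][0-9]*/duration_ms: $4/" "$1" > "$2"
}

# Example: 256 MiB buffer and a two-minute trace.
# bump_trace_config test/configs/scheduling.cfg /tmp/scheduling_long.cfg 262144 120000
```

The copy is then passed to tracebox with -c in place of test/configs/scheduling.cfg.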

Scheduling Constraints

To constrain the benchmark such that it only runs on P-cores:

killall steam && taskset -c 0-15 steam -applaunch 1091500 --launcher-skip -benchmark

We must terminate Steam before launching the benchmark; otherwise the new steam process will simply signal the already-running instance and exit, and the benchmark will not run inside our affinity-constrained environment.
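To confirm the constraint actually reached the game, the allowed CPU list of a running process can be read from /proc. Child processes inherit the affinity mask of their parent, so the game launched by the constrained Steam should report the same list. This is a minimal sketch: allowed_cpus is a hypothetical helper, its second argument (an alternate /proc root) is only for dry runs, and the process name pattern in the example is an assumption that may differ on your system.

```shell
# allowed_cpus PID [PROC_ROOT]
# Print the list of CPUs the process is permitted to run on,
# taken from the Cpus_allowed_list line of its status file.
allowed_cpus() {
  sed -n 's/^Cpus_allowed_list:[[:space:]]*//p' "${2:-/proc}/$1/status"
}

# Example: inspect the newest process whose command line matches the game.
# The pattern "Cyberpunk2077" is a guess at the process name.
# allowed_cpus "$(pgrep -n -f Cyberpunk2077)"
```

With the taskset command above, the expected list is 0-15. The same information is available from taskset -cp PID.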