Thermal management on embedded Linux platforms is not simply a matter of preventing overheating; it is a highly integrated balancing act involving hardware design, kernel subsystems, and runtime performance control. In embedded devices—whether a single-board computer tucked inside an IoT gateway, an automotive infotainment unit, or an industrial controller—the environmental conditions and workload profiles can vary dramatically. These systems often run in constrained enclosures, with limited or no active cooling, and must still deliver consistent performance while ensuring long-term reliability of the hardware. The Linux kernel provides two tightly connected frameworks to handle this challenge: the thermal subsystem and the CPU frequency scaling (CPUFreq) subsystem.
At the heart of thermal management is the concept of a thermal zone—a logical grouping of temperature sensors that the kernel monitors to make cooling decisions. A thermal zone may represent a physical SoC die, a CPU cluster, a GPU, or even a power management IC. Each zone can have multiple trip points, which are temperature thresholds triggering specific actions. Passive trip points generally reduce system performance to lower heat generation, while active trip points can trigger cooling hardware, such as fans or liquid pumps. The elegance of the Linux thermal framework is that it allows these trip points to directly influence CPU operating frequencies, GPU clock rates, and other subsystem power states.
Understanding the Kernel Thermal Subsystem
When Linux boots, it enumerates thermal sensors exposed by the platform. These can be integrated into the SoC itself (e.g., ARM thermal monitor units), provided by a PMIC over I²C, or even external digital sensors connected via SPI. The kernel exposes them under /sys/class/thermal/, where each zone appears as a thermal_zoneX directory. The naming and count depend on the hardware and its device tree (or ACPI tables on x86).
For example:
ls /sys/class/thermal/
cooling_device0 cooling_device1 thermal_zone0 thermal_zone1

A typical thermal zone directory might contain:
cat /sys/class/thermal/thermal_zone0/type
x86_pkg_temp
cat /sys/class/thermal/thermal_zone0/temp
65000

Here, the temp value is reported in millidegrees Celsius, so 65000 means 65°C. This zone could have trip points defined:
cat /sys/class/thermal/thermal_zone0/trip_point_0_temp
75000
cat /sys/class/thermal/thermal_zone0/trip_point_0_type
passive

This means that at 75°C, the kernel will begin passive cooling, which usually involves reducing CPU or GPU frequencies.
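The sysfs layout described above can be walked with a short script. This is a minimal sketch, not a definitive tool: the function name list_thermal_zones and its optional root argument are illustrative choices, and it assumes trip points are exposed as trip_point_N_temp and trip_point_N_type, as in the example above.

```shell
# Print one summary line per thermal zone, followed by its trip points.
# $1: sysfs thermal root (hypothetical parameter, defaults to /sys/class/thermal).
list_thermal_zones() {
    root="${1:-/sys/class/thermal}"
    for zone in "$root"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue      # skip missing or unreadable zones
        ztype=$(cat "$zone/type" 2>/dev/null)
        # sysfs reports millidegrees Celsius; convert to degrees
        temp_c=$(awk '{ printf "%.1f", $1 / 1000 }' "$zone/temp" 2>/dev/null)
        echo "$(basename "$zone") type=$ztype temp=${temp_c}C"
        for trip in "$zone"/trip_point_*_temp; do
            [ -r "$trip" ] || continue       # zone without trip points
            kind=$(cat "${trip%_temp}_type" 2>/dev/null)
            trip_c=$(awk '{ printf "%.1f", $1 / 1000 }' "$trip" 2>/dev/null)
            echo "  trip ${kind:-unknown} at ${trip_c}C"
        done
    done
}
```

Calling list_thermal_zones with no argument reads the live sysfs tree; passing a directory makes it easy to try the function against a copied or mocked tree.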
The thermal subsystem works hand in hand with cooling devices, which are drivers implementing methods to lower temperature. Cooling devices can be:
- Passive: Lower CPU frequency via CPUFreq.
- Active: Engage fans, pumps, or blowers.
- Device-specific: Lower GPU clocks, DDR speed, or disable blocks.
Cooling devices are mapped to thermal zones via cooling maps, often defined in the device tree.
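To see which cooling devices a platform registers and how far each is engaged, the cooling_deviceX directories can be read in the same way. The sketch below assumes each device exposes type, cur_state, and max_state; the function name list_cooling_devices and its root argument are hypothetical.

```shell
# Show each cooling device's type and its current cooling state.
# cur_state runs from 0 (no cooling) up to max_state (maximum cooling).
# $1: sysfs thermal root (hypothetical parameter, defaults to /sys/class/thermal).
list_cooling_devices() {
    root="${1:-/sys/class/thermal}"
    for dev in "$root"/cooling_device*; do
        [ -r "$dev/cur_state" ] || continue  # skip when no device matches
        echo "$(basename "$dev") type=$(cat "$dev/type") state=$(cat "$dev/cur_state")/$(cat "$dev/max_state")"
    done
}
```

A state of 0 generally means the device is idle; for a CPUFreq-backed device, higher states correspond to lower frequency caps.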
Dynamic CPU Frequency Scaling (CPUFreq)
The CPUFreq subsystem allows the kernel to dynamically adjust CPU clock speeds based on load, thermal needs, or power constraints. On embedded Linux systems, this often works with Dynamic Voltage and Frequency Scaling (DVFS), where both CPU frequency and voltage are adjusted together to balance performance and power usage.
CPUFreq policies are visible under:
ls /sys/devices/system/cpu/cpu0/cpufreq/

Key files include:
- scaling_cur_freq: current CPU frequency.
- scaling_available_frequencies: all frequencies the driver supports.
- scaling_governor: the active frequency scaling policy.
- scaling_max_freq: the maximum allowed frequency.
Example:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
1200000

The value is in kHz, so CPU0 is currently running at 1.2 GHz.
Governors define how frequency changes are made:
- performance: always run at the maximum frequency.
- powersave: always run at the lowest frequency.
- ondemand: quickly scale up when load increases.
- schedutil: integrate scaling with the kernel scheduler.
When thermal events occur, the thermal governor can temporarily lower the maximum frequency (scaling_max_freq) until the temperature drops.
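One way to spot such a cap is to compare scaling_max_freq with cpuinfo_max_freq, the hardware maximum reported in the same directory. The sketch below is only an indicator: a lowered limit can also come from a user or a power-management daemon, not just from thermal throttling. The function name cpufreq_status is hypothetical.

```shell
# Summarize one CPUFreq policy directory and flag a lowered frequency cap.
# $1: cpufreq directory (hypothetical parameter, defaults to cpu0's).
cpufreq_status() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpufreq}"
    [ -r "$dir/scaling_cur_freq" ] || { echo "no cpufreq data in $dir" >&2; return 1; }
    cur=$(cat "$dir/scaling_cur_freq")   # kHz, current frequency
    cap=$(cat "$dir/scaling_max_freq")   # kHz, current upper limit
    hw=$(cat "$dir/cpuinfo_max_freq")    # kHz, hardware maximum
    gov=$(cat "$dir/scaling_governor")
    echo "governor=$gov cur=$((cur / 1000))MHz cap=$((cap / 1000))MHz hw_max=$((hw / 1000))MHz"
    if [ "$cap" -lt "$hw" ]; then
        echo "capped: limit is below the hardware maximum"
    else
        echo "uncapped"
    fi
}
```

Running it before and during a sustained load shows whether the cap moves as the temperature rises.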
Device Tree Integration for Thermal Zones and DVFS
On embedded platforms without ACPI, the Device Tree (DT) describes thermal zones, sensors, trip points, and cooling maps. A typical thermal zone DT snippet might look like:
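thermal-zones {
    cpu_thermal: cpu-thermal {
        polling-delay-passive = <250>; /* ms, while passive cooling is active */
        polling-delay = <1000>;        /* ms, otherwise */
        thermal-sensors = <&tsens0 0>;

        trips {
            cpu_alert0: trip0 {
                temperature = <75000>; /* millidegrees Celsius */
                hysteresis = <2000>;
                type = "passive";
            };
            cpu_crit: trip1 {
                temperature = <95000>;
                hysteresis = <2000>;
                type = "critical";
            };
        };

        cooling-maps {
            map0 {
                trip = <&cpu_alert0>;
                /* The two extra cells are the minimum and maximum cooling state;
                 * THERMAL_NO_LIMIT comes from <dt-bindings/thermal/thermal.h>. */
                cooling-device = <&cpu0_cooling THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
            };
        };
    };
};

Here, the cpu_thermal zone uses sensor tsens0 and defines two trip points. When cpu_alert0 is reached, it uses the cpu0_cooling device to lower CPU frequency. Reaching cpu_crit at 95°C makes the kernel shut the system down to protect the hardware.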
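The hysteresis property deserves a worked example. Under the device tree thermal binding, a cooling action is maintained until the temperature falls below the trip temperature minus the hysteresis. The function below is a simplified model of that rule, not the kernel's implementation; trip_active and its argument order are hypothetical.

```shell
# Decide whether a trip is active for a new temperature sample.
# $1: previous state (0 or 1), $2: temperature, $3: trip temperature,
# $4: hysteresis; all temperatures in millidegrees Celsius.
trip_active() {
    if [ "$2" -ge "$3" ]; then
        echo 1      # at or above the trip temperature: active
    elif [ "$1" -eq 1 ] && [ "$2" -ge $(( $3 - $4 )) ]; then
        echo 1      # already active and not yet below trip - hysteresis
    else
        echo 0
    fi
}
```

With the values from cpu_alert0 (75000 and 2000), a zone heating through 74°C stays inactive, becomes active at 75°C, and remains active while cooling until the temperature drops below 73°C. This keeps the system from toggling throttling on and off around the threshold.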
big.LITTLE and Per-Cluster DVFS
On ARM big.LITTLE systems, different CPU clusters may have separate thermal and frequency scaling policies. For example, high-performance cores may be throttled first to maintain responsiveness on low-power cores. The thermal subsystem allows separate cooling devices per CPU policy, meaning one cluster can be slowed down independently of the other.
You can inspect policies:
ls /sys/devices/system/cpu/cpufreq/
policy0 policy4

Here, policy0 might control the LITTLE cores and policy4 the big cores.
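To confirm which CPUs belong to which policy, and to compare per-cluster limits, each policy directory can be summarized from its related_cpus, scaling_cur_freq, and scaling_max_freq files. This is a sketch; the function name list_cpufreq_policies and its root argument are hypothetical.

```shell
# Print the CPUs, current frequency, and frequency limit of every policy.
# $1: cpufreq root (hypothetical parameter,
#     defaults to /sys/devices/system/cpu/cpufreq).
list_cpufreq_policies() {
    root="${1:-/sys/devices/system/cpu/cpufreq}"
    for pol in "$root"/policy*; do
        [ -r "$pol/related_cpus" ] || continue   # skip when no policy matches
        echo "$(basename "$pol") cpus=[$(cat "$pol/related_cpus")] cur=$(cat "$pol/scaling_cur_freq")kHz max=$(cat "$pol/scaling_max_freq")kHz"
    done
}
```

On a throttled big.LITTLE system, the max value of one policy may drop while the other stays unchanged.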
Practical Debugging and Testing
To observe thermal behavior in action, generate CPU load in one terminal and watch the temperature and frequency in two others:
stress-ng --cpu 4 --timeout 60s
watch -n 1 cat /sys/class/thermal/thermal_zone0/temp
watch -n 1 cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq

You should see the temperature rise and, once the passive trip point is crossed, the CPU frequency drop.
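When several terminals are inconvenient, for example over a serial console, temperature and frequency can be sampled side by side from a single loop. The sketch below assumes the same two sysfs files as the commands above; the function name sample_temp_freq and its arguments are hypothetical.

```shell
# Print "<nominal seconds> <temp in C> <freq in MHz>" a fixed number of times.
# $1: sample count, $2: interval in seconds,
# $3: temperature file, $4: frequency file (default to zone 0 and cpu0).
sample_temp_freq() {
    count="${1:-60}"
    interval="${2:-1}"
    tfile="${3:-/sys/class/thermal/thermal_zone0/temp}"
    ffile="${4:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq}"
    i=0
    while [ "$i" -lt "$count" ]; do
        # millidegrees -> degrees, kHz -> MHz (integer division)
        echo "$((i * interval)) $(( $(cat "$tfile") / 1000 )) $(( $(cat "$ffile") / 1000 ))"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Starting stress-ng in the background and then calling sample_temp_freq 60 1 gives one line per second that can be redirected to a file and plotted later.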
Kernel logs may also record thermal events, such as critical trips:
dmesg | grep -i thermal

Real-World Embedded Scenarios
- Fanless Industrial Gateway: Uses only passive cooling. Aggressive trip points keep temperature under 80°C but sacrifice peak performance under sustained loads.
- Automotive Infotainment: Uses active fans at high trip points to maintain smooth playback and UI responsiveness even in hot climates.
- IoT Camera Module: Reduces both CPU and image signal processor (ISP) clocks under thermal stress to avoid heat distortion in sensors.
Long-Duration Thermal Validation
In production validation, devices undergo thermal soak testing:
stress-ng --cpu 4 --vm 2 --vm-bytes 256M --timeout 6h

Meanwhile, temperatures are logged from a second shell:
while true; do
    date >> temp.log
    cat /sys/class/thermal/thermal_zone0/temp >> temp.log
    sleep 1
done

This verifies that thermal controls prevent overheating without excessive performance loss.
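After a long soak run, the log needs to be reduced to a few numbers. The sketch below assumes the format written by the loop above, alternating date lines and raw millidegree readings, and treats any line that is a plain integer as a reading. The function name summarize_temp_log is hypothetical.

```shell
# Summarize a log of alternating date lines and millidegree readings.
# $1: log file (defaults to temp.log). Lines that are not plain integers,
# such as the date stamps, are ignored.
summarize_temp_log() {
    awk '
        /^-?[0-9]+$/ {
            t = $1 / 1000                      # millidegrees -> degrees
            if (n == 0 || t < min) min = t
            if (n == 0 || t > max) max = t
            sum += t; n++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "samples=%d min=%.1fC max=%.1fC avg=%.1fC\n", n, min, max, sum / n
        }
    ' "${1:-temp.log}"
}
```

Comparing the reported maximum with the passive and critical trip temperatures gives a quick pass or fail signal for the run.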