Thermal Guard Daemon
The thermal guard daemon watches local temperatures and top CPU consumers, logs snapshots/actions, and applies targeted mitigations when configured rules trip.
Files
- Runtime: /home/david/random/bin/thermal_guard_daemon.py
- User config: /home/david/.config/thermal-guard/config.json
- User unit: /home/david/.config/systemd/user/thermal-guard.service
- Latest snapshot: ~/.local/state/thermal-guard/latest.json
- Event log: ~/.local/state/thermal-guard/events.jsonl
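For a quick look at recent activity, the event log can be read as ordinary JSON Lines. The sketch below is a minimal reader, not part of the daemon: the helper names `parse_jsonl` and `tail_events` are hypothetical, and because the record schema is not documented here, each record is treated as an opaque JSON value.

```python
import json
from pathlib import Path

# Path taken from the file list above.
EVENT_LOG = Path("~/.local/state/thermal-guard/events.jsonl").expanduser()


def parse_jsonl(lines):
    """Parse an iterable of JSONL lines, skipping blank or malformed ones."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # Assumption: a half-written trailing line may appear while the
            # daemon is appending, so unparseable lines are ignored.
            continue
    return records


def tail_events(path=EVENT_LOG, count=5):
    """Return the last `count` parseable records, or [] if the log is missing."""
    if count <= 0 or not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return parse_jsonl(handle)[-count:]


if __name__ == "__main__":
    for event in tail_events():
        print(json.dumps(event, indent=2))
```

Reading the whole file is fine for a small log; for a large one, seeking from the end would be a better fit.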
Commands
```bash
python3 /home/david/random/bin/thermal_guard_daemon.py once
python3 /home/david/random/bin/thermal_guard_daemon.py status
python3 /home/david/random/bin/thermal_guard_daemon.py daemon
python3 /home/david/random/bin/thermal_guard_daemon.py daemon --dry-run
```
Rule Model
Each rule can match by command regex, temperature source, temperature threshold, minimum CPU percent, required consecutive breaches, cooldown, and action.
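To make the interaction between consecutive breaches and cooldown concrete, here is a minimal sketch of how such a rule might be evaluated. It is an illustration under assumptions, not the daemon's actual code: the `Rule` class, every field name, the `cpu_package` source name, and the use of `>=` comparisons are hypothetical, and the real `config.json` keys may differ. For simplicity the breach counter is kept per rule rather than per process.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    # All field names are hypothetical; the real config keys may differ.
    command_regex: str
    temp_source: str
    temp_threshold_c: float
    min_cpu_percent: float
    consecutive_breaches: int
    cooldown_s: float
    action: str
    _breaches: int = field(default=0, init=False, repr=False)
    _last_fired: float = field(default=float("-inf"), init=False, repr=False)

    def breached(self, command, cpu_percent, temps):
        """True when one sample meets the temperature, CPU, and command conditions."""
        temp = temps.get(self.temp_source)
        return (
            temp is not None
            and temp >= self.temp_threshold_c
            and cpu_percent >= self.min_cpu_percent
            and re.search(self.command_regex, command) is not None
        )

    def should_fire(self, command, cpu_percent, temps, now):
        """Update the breach counter and report whether the action is due."""
        if not self.breached(command, cpu_percent, temps):
            self._breaches = 0  # any clean sample resets the streak
            return False
        self._breaches += 1
        if self._breaches < self.consecutive_breaches:
            return False
        if now - self._last_fired < self.cooldown_s:
            return False  # still cooling down from the previous action
        self._last_fired = now
        self._breaches = 0
        return True


if __name__ == "__main__":
    rule = Rule(r"ollama runner", "cpu_package", 90.0, 50.0, 3, 300.0, "signal")
    temps = {"cpu_package": 95.0}
    cmd = "ollama runner --ollama-engine"
    print([rule.should_fire(cmd, 80.0, temps, now=t) for t in (0, 10, 20, 30)])
    # prints [False, False, True, False]
```

Requiring several consecutive breaches filters out short spikes, while the cooldown keeps a rule from firing repeatedly against the same workload.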
Supported actions:
- signal
- signal_group
- systemctl_stop_user
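The action names suggest what each one does, but their exact semantics are not spelled out here. The sketch below shows one plausible mapping, assuming `signal` targets a single PID, `signal_group` targets that PID's process group, and `systemctl_stop_user` runs `systemctl --user stop` on a unit. The `apply_action` function and its parameters are hypothetical; the `dry_run` flag only mirrors the spirit of the daemon's `--dry-run` option.

```python
import os
import signal
import subprocess


def apply_action(action, *, pid=None, unit=None, sig=signal.SIGTERM, dry_run=True):
    """Apply one mitigation; with dry_run=True only describe what would happen."""
    sig_name = signal.Signals(sig).name
    if action == "signal":
        description = f"send {sig_name} to pid {pid}"
    elif action == "signal_group":
        description = f"send {sig_name} to the process group of pid {pid}"
    elif action == "systemctl_stop_user":
        description = f"systemctl --user stop {unit}"
    else:
        raise ValueError(f"unsupported action: {action!r}")

    if dry_run:
        return description

    if action == "signal":
        os.kill(pid, sig)
    elif action == "signal_group":
        # Assumed semantics: signal every process in the target's group.
        os.killpg(os.getpgid(pid), sig)
    else:
        subprocess.run(["systemctl", "--user", "stop", unit], check=True)
    return description


if __name__ == "__main__":
    print(apply_action("signal_group", pid=12345))
    print(apply_action("systemctl_stop_user", unit="example.service"))
```

Defaulting to `dry_run=True` means the sketch never signals or stops anything unless the caller opts in explicitly.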
Default Rules
The default config targets the specific classes of runaway jobs that recently caused thermal spikes:
- sol_ingest.py ... build
- ollama runner --ollama-engine
- headless Playwright browser jobs
- sol-chat-vision-fast.service via its Gemma fast-vision backend
The defaults intentionally do not kill broader long-running services like masterbot.service; add those only if you explicitly want thermal guard to stop them.