Leveraging Systemd for Hardware Watchdog Control in Embedded Linux

Introduction

In embedded Linux systems, the hardware watchdog timer is a crucial yet often overlooked feature. Many modern systems-on-chip (SoCs) include an internal watchdog timer with upstream Linux driver support. As the software stack of a custom Embedded Linux system grows larger and more complex, so does the chance that the system hangs or becomes unresponsive. In field applications, such a hang is unacceptable: the device stays down until someone intervenes.

A hardware watchdog timer is a type of timer that, once enabled, requires regular “kicking” or “petting” (resetting the count) to prevent it from triggering a system reset when the timer expires. While modern Linux platforms may include a hardware watchdog and associated driver, there is additional software work and configuration required to fully utilize the device in the running system. This article explores how engineers can leverage the hardware watchdog timer in modern Linux systems with ease by utilizing the built-in functionality of systemd.
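
The kick-or-reset behavior described above can be illustrated with a small model. The following Python sketch is only a simulation under simple assumptions: the class name SimulatedWatchdog and its methods are hypothetical, time is passed in by hand, and no real watchdog device is touched.

```python
class SimulatedWatchdog:
    """Toy model of a hardware watchdog countdown (no real hardware involved)."""

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_kick = now  # enabling the timer starts the countdown

    def kick(self, now: float) -> None:
        """Restart the countdown, unless the timer has already expired."""
        if not self.expired(now):
            self.last_kick = now

    def expired(self, now: float) -> bool:
        """True once a full timeout has passed without a kick."""
        return now - self.last_kick >= self.timeout_s


wd = SimulatedWatchdog(timeout_s=30.0)
wd.kick(now=15.0)        # kicked in time, so the countdown restarts
print(wd.expired(40.0))  # only 25 s since the last kick
print(wd.expired(45.0))  # a full 30 s since the last kick
```

On real hardware, the point at which expired() becomes true is the point at which the SoC resets.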

This article does not cover the hardware or reset configuration needed to enable a hardware watchdog timer; it focuses on the systemd configuration of the watchdog. The assumption is that the user already has a watchdog device at /dev/watchdog0. Depending on the interest and feedback received on this article, we may write a future article that focuses on specific watchdog hardware configurations.

Systemd Configuration

Systemd, a system and service manager widely adopted in modern Linux systems, plays a crucial role in the boot process. It initializes and loads the essential components and services necessary for Linux to function effectively. Systemd’s popularity extends beyond traditional Linux machines: it is used extensively in Embedded Linux, where it often serves as the default choice in Yocto distributions from various vendors. Despite its perceived complexity compared to its predecessor, SysVinit, systemd offers a broad feature set and good compatibility, making it a valuable tool for software engineers.

One of systemd’s many features is support for configuring and operating a hardware watchdog timer. To enable and configure the timer, create a drop-in configuration file in /etc/systemd/system.conf.d/ that sets the watchdog parameters:

# /etc/systemd/system.conf.d/99-watchdog.conf
[Manager]
RuntimeWatchdogSec=30s
RebootWatchdogSec=10min

The primary parameter, RuntimeWatchdogSec, tells systemd to program the watchdog timer with this timeout (30 seconds in this example). If the hardware watchdog is not “kicked” within that time, it resets the system. If the parameter is set to “off” or “0”, systemd does not open, configure, or kick the watchdog device.

The RebootWatchdogSec parameter configures the hardware watchdog for the reboot phase and acts as a safety net for reboots that do not complete cleanly. A clean reboot sequence can sometimes become stuck or delayed. If a software-initiated reboot does not finish within the specified time (10 minutes in this example), the watchdog expires and the hardware forces the reset, so the system does not remain stuck in a half-finished shutdown.

Additional official documentation on these parameters can be found at the link below:

https://www.freedesktop.org/software/systemd/man/latest/systemd-system.conf.html

After creating the configuration file, the watchdog should be enabled on the next boot. Systemd will ensure that the watchdog is “kicked” at least once in every half of the specified timeout interval, which means at least every 15 seconds for the 30-second timeout above.
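
As a rough illustration of that relationship, the Python sketch below reads a [Manager] section like the drop-in file above and derives the longest gap to expect between kicks. The function names parse_timespan and max_kick_interval are hypothetical, and the time parser is deliberately simplified: it handles a bare number or a single value with one unit and rejects everything else, whereas systemd itself accepts more forms.

```python
import configparser

# Units handled by this simplified parser; systemd accepts more forms
# (for example compound values such as "1min 30s").
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "min": 60.0, "h": 3600.0}


def parse_timespan(value: str) -> float:
    """Convert a single-unit value such as "30s" or "10min" to seconds."""
    value = value.strip()
    if value in ("off", "0"):
        return 0.0
    # Try longer suffixes first so "ms" is not mistaken for "s".
    for unit in sorted(_UNIT_SECONDS, key=len, reverse=True):
        if value.endswith(unit):
            return float(value[: -len(unit)]) * _UNIT_SECONDS[unit]
    return float(value)  # a bare number is taken as seconds


def max_kick_interval(conf_text: str) -> float:
    """Half of RuntimeWatchdogSec, in seconds (0.0 if the watchdog is off)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    timeout = parse_timespan(parser["Manager"]["RuntimeWatchdogSec"])
    return timeout / 2


example = """\
[Manager]
RuntimeWatchdogSec=30s
RebootWatchdogSec=10min
"""
print(max_kick_interval(example))
```

The value returned for the example is the upper bound to compare against when observing the kicks on a running system.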

Verifying Watchdog Functionality

It is important that the user verifies that the watchdog is indeed being enabled and kicked by systemd. More often than not, hardware watchdogs support only specific timeout intervals, dictated by hardware registers. If the user chooses an unsupported value, the watchdog may run with a different timeout than requested, or not be set up at all, and the user may wrongly assume it is protecting the system.
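
One way to check which timeout the driver actually accepted is the watchdog sysfs interface. The Python sketch below assumes a kernel built with CONFIG_WATCHDOG_SYSFS, so that attributes such as identity, state and timeout appear under /sys/class/watchdog/watchdog0; the helper names read_watchdog_sysfs and timeout_matches are hypothetical. If the directory or an attribute is missing, it is simply left out of the result.

```python
from pathlib import Path


def read_watchdog_sysfs(base: str = "/sys/class/watchdog/watchdog0") -> dict:
    """Return the attributes of interest that exist under `base`."""
    info = {}
    for name in ("identity", "state", "timeout"):
        attr = Path(base) / name
        if attr.is_file():
            info[name] = attr.read_text().strip()
    return info


def timeout_matches(requested_s: int, info: dict) -> bool:
    """True if the driver reports exactly the requested timeout in seconds."""
    return info.get("timeout") == str(requested_s)


info = read_watchdog_sysfs()
print(info)
print("timeout matches request:", timeout_matches(30, info))
```

A mismatch here does not necessarily mean the watchdog is inactive, only that the hardware is running with a timeout other than the one requested.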

Beyond examining the system logs, a straightforward check is to see whether the watchdog is being kicked at the expected interval. The user can use the strace utility to monitor PID 1 for watchdog kicks: each kick is an ioctl call with the WDIOC_KEEPALIVE request. See below for example output when the watchdog is enabled with the 30-second timeout above, showing a kick every 15 seconds:

root@linux-board:~# strace -tt -e trace=ioctl -p 1 2>&1 | grep -e attached -e WDIOC_KEEPALIVE
strace: Process 1 attached
15:32:00.000000 ioctl(3, WDIOC_KEEPALIVE, 0x7fff4b3c67d0) = 0
15:32:15.000000 ioctl(3, WDIOC_KEEPALIVE, 0x7fff4b3c67d0) = 0
15:32:30.000000 ioctl(3, WDIOC_KEEPALIVE, 0x7fff4b3c67d0) = 0
15:32:45.000000 ioctl(3, WDIOC_KEEPALIVE, 0x7fff4b3c67d0) = 0
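
The spacing of the kicks can also be checked programmatically. The Python sketch below assumes strace output with timestamps in the format shown above and computes the gap between consecutive WDIOC_KEEPALIVE calls; the function name keepalive_intervals is hypothetical, and a capture that crosses midnight is not handled.

```python
import re
from datetime import datetime

# Matches lines such as:
# 15:32:00.000000 ioctl(3, WDIOC_KEEPALIVE, 0x7fff4b3c67d0) = 0
_KEEPALIVE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d+)\s+ioctl\(\d+, WDIOC_KEEPALIVE")


def keepalive_intervals(strace_output: str) -> list:
    """Return the gaps in seconds between consecutive WDIOC_KEEPALIVE calls."""
    times = []
    for line in strace_output.splitlines():
        match = _KEEPALIVE.match(line.strip())
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


sample = """\
strace: Process 1 attached
15:32:00.000000 ioctl(3, WDIOC_KEEPALIVE, 0x7fff4b3c67d0) = 0
15:32:15.000000 ioctl(3, WDIOC_KEEPALIVE, 0x7fff4b3c67d0) = 0
15:32:30.000000 ioctl(3, WDIOC_KEEPALIVE, 0x7fff4b3c67d0) = 0
15:32:45.000000 ioctl(3, WDIOC_KEEPALIVE, 0x7fff4b3c67d0) = 0
"""
gaps = keepalive_intervals(sample)
print(gaps)
# With RuntimeWatchdogSec=30s, no gap should exceed half the timeout.
print(all(gap <= 30 / 2 for gap in gaps))
```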

Summary

This article has discussed the importance of a watchdog timer in Embedded Linux designs, how it can be easily enabled using built-in systemd features, and finally how to verify that systemd is using it as intended. While these steps do not address the underlying causes of system lock-ups or failures, they do ensure the system can recover quickly if a lock-up or failure occurs, reducing downtime and maximizing system availability.

Have you previously used the watchdog timer in your Linux system? How did you typically enable and control the timer? We would love to hear your thoughts and feedback on the matter. Please leave a comment below and let us know if you have any questions!