Tracing the Root of Linux Kernel Problems with Error Messages

Introduction

When setting up Linux on a new board, you are bound to encounter issues. These can range from the kernel failing to boot, to warnings being emitted, to drivers failing to register or probe. Regardless of the specific issue, the result is a system that does not function as intended. Debugging is the natural next step, but it can be daunting, especially for those unfamiliar with Linux kernel development.

This article presents a straightforward and fundamental debugging technique that is my go-to approach when starting to tackle such problems.

The authors of the various kernel subsystems and drivers have, in most cases, been diligent enough to add warning or error messages to the code. Your mileage may vary here: some drivers give enough information to determine the problem at face value, while in others it seems as though the author was trying to play the problem down 😛. Nevertheless, we should use these messages to our advantage to help trace our problems to their root.

Debugging Through Log Messages

Let us imagine our system emitted the following message during boot:

[ 1.743164] pca953x 0-0034: Failed to enable regulator: -5

This little breadcrumb is what we will start with. Since the Linux kernel is open source, we have access to the code that printed it, and that gives us some guidance. Even if you are not an expert in kernel drivers, with enough patience and a willingness to follow the code, you can usually track down the root cause of an issue.
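The trailing -5 in the message is a negated errno value, so part of the breadcrumb is knowing which error it names. The sketch below looks the number up in the kernel's generic errno header. The function name errno_name is hypothetical, and the default header path assumes the command is run from the top of a kernel source tree; values above 34 live in the neighbouring errno.h instead.

```shell
# errno_name VALUE [HEADER]   (hypothetical helper, not part of the kernel tree)
# Print the #define line whose numeric value matches VALUE; a leading
# minus sign is ignored, so "-5" and "5" are looked up the same way.
errno_name() {
    value=${1#-}    # "-5" -> "5"
    # Assumed default: generic errno header, relative to the kernel tree root.
    header=${2:-include/uapi/asm-generic/errno-base.h}
    grep -E "^#define[[:space:]]+E[A-Z0-9]+[[:space:]]+${value}([[:space:]]|\$)" "$header"
}

# Usage, from the top of a kernel source tree:
#   errno_name -5
```

For the message above, this should point at EIO (I/O error), which tells us the regulator enable call reported an I/O failure rather than, say, a missing resource.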

To locate the source of the log message, we can begin by searching the Linux source code. However, searching for the entire log line exactly as it appears will usually not yield any results. This is because parts of the line are not in the source: the timestamp and the driver and device names are added by the logging functions, and values such as the trailing -5 are substituted for format specifiers like %d. Thus, you cannot search for the line literally. However, with a regex or a partial search on the fixed text, you should be able to arrive at the line in question. Which tool you use is not important; I sometimes use grep, other times just the search in my IDE (VS Code), and I have also used tools like cscope, which provide a more structured search.
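As a rough illustration of the partial-search idea, the following sketch reduces the example log line to the fixed text that is likely to appear in the source. The sed expressions assume this particular layout (a bracketed timestamp, then a two-word "driver device:" prefix, then a trailing number); other messages will need different patterns.

```shell
# Log line exactly as it appeared on the console.
log_line='[ 1.743164] pca953x 0-0034: Failed to enable regulator: -5'

# 1. Drop the "[ timestamp]" prefix.
# 2. Drop the "driver device: " prefix (assumed to be exactly two words).
# 3. Drop the trailing number, which was substituted for a format specifier.
fragment=$(printf '%s\n' "$log_line" \
    | sed -E -e 's/^\[ *[0-9]+\.[0-9]+\] *//' \
             -e 's/^[^ ]+ [^ ]+: //' \
             -e 's/ *-?[0-9]+$//')

printf '%s\n' "$fragment"
```

What remains is the string "Failed to enable regulator:", which can be passed to grep, an IDE search, or cscope.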

After a few tries of searching, you should be able to track down the line of code that emitted the error or warning message. From here, you can start tracing back the issue. Start by examining the code around the message, and follow things from there.

For our failure, a search using grep looks like this:

$ grep -rni 'Failed to enable regulator:' ./drivers/
./drivers/leds/leds-lm3692x.c:180:				"Failed to enable regulator: %d\n", ret);
./drivers/gpu/drm/stm/dw_mipi_dsi-stm.c:469:		DRM_ERROR("Failed to enable regulator: %d\n", ret);
./drivers/gpu/drm/stm/dw_mipi_dsi-stm.c:568:		DRM_ERROR("Failed to enable regulator: %d\n", ret);
./drivers/gpu/drm/panel/panel-samsung-sofef00.c:121:		dev_err(dev, "Failed to enable regulator: %d\n", ret);
./drivers/input/mouse/elan_i2c_core.c:1229:		dev_err(dev, "Failed to enable regulator: %d\n", error);
./drivers/tty/serial/sccnxp.c:909:				"Failed to enable regulator: %i\n", ret);
./drivers/gpio/gpio-pca953x.c:1278:			dev_err(dev, "Failed to enable regulator: %d\n", ret);
./drivers/opp/core.c:990:			dev_warn(dev, "Failed to enable regulator: %d", ret);
./drivers/pci/controller/dwc/pcie-tegra194.c:1407:		dev_err(pcie->dev, "Failed to enable regulator: %d\n", ret);
./drivers/usb/host/ohci-da8xx.c:100:			dev_err(dev, "Failed to enable regulator: %d\n", ret)

The log line was prefixed with pca953x, so in our results the obvious emitter is ./drivers/gpio/gpio-pca953x.c:1278. From here we can continue tracing down the root of the issue.
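When a search returns many hits, as it did here, one heuristic is to keep only the hits whose file path mentions the driver name from the log prefix. The sketch below does that; find_emitter is a hypothetical helper, and the heuristic assumes the driver name appears in the file name, which is common but not guaranteed.

```shell
# find_emitter DRIVER FRAGMENT [SRC_DIR]   (hypothetical helper)
# List source lines containing FRAGMENT, keeping only hits whose file
# path contains DRIVER. FRAGMENT is matched literally (-F), not as a regex.
find_emitter() {
    driver=$1
    fragment=$2
    src_dir=${3:-./drivers/}
    # grep prints "path:line:text"; awk filters on the path field only.
    grep -rnF -- "$fragment" "$src_dir" \
        | awk -F: -v d="$driver" 'index($1, d) > 0'
}

# Usage, from the top of a kernel source tree:
#   find_emitter pca953x 'Failed to enable regulator:'
```

If the filter returns nothing, fall back to the unfiltered search and compare the candidates by hand, since the driver name registered in the code may differ from the file name.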

Summary

While this may seem like a very simple technique, it is an extremely effective way to start debugging a kernel issue. This can work not just for driver errors or warnings, but also in cases where the failure causes the system to hang. In the latter case, you can use the last emitted messages in the log to help track down the potential cause of the hang.

With over a decade of expertise, Cornersoft Solutions is committed to sharing our knowledge and insights to help others succeed in resolving Linux kernel issues through informative articles like the one you just read. We delve into the techniques we have developed over the years to help others learn from our approach and achieve success in debugging their own Linux kernel issues. We provide detailed explanations, real-world examples, and practical tips that you can apply directly to your work.

If you are facing a specific Linux issue, we welcome the opportunity to discuss it with you and your team. We offer personalized consultations where we can analyze your problem, identify the root cause, and develop a tailored solution to help you resolve it effectively.
