Thursday, August 7, 2014

dm-live issue and fix. Also debugging with kexec

I noticed that booting to a root filesystem connected via USB failed on certain hardware. I initially assumed that the failure might be caused by a kernel bug, by specific motherboard chipsets, or by a flaw in my understanding of udev. I was stumped and invested some time in figuring out what was wrong. This screenshot shows a typical bootup failure. Googling the failure shows that others are experiencing the same trouble, perhaps for similar reasons, but I didn't see any obvious solutions. For the last while I have mostly been working around the problem by using a root filesystem connected directly over SATA.

I stumbled on the solution by chance:

Include the ehci-pci module among those that are loaded at boot. That ensures the best performance for the external drive, be it a flash disc or a magnetic disc.
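
As a rough illustration of the fix, here is a minimal sketch that makes sure ehci-pci appears in a colon-separated module list, which is the format that Slackware's mkinitrd takes with its -m option. The helper name ensure_ehci_pci and the starting list are made up for the example, and it assumes the initrd is built with mkinitrd; a dm-live setup that loads its modules some other way would need the equivalent change there.

```shell
#!/bin/sh
# Hypothetical helper (not part of dm-live or Slackware): add ehci-pci to a
# colon-separated module list unless it is already there.
ensure_ehci_pci() {
    list="$1"
    case ":$list:" in
        *:ehci-pci:*|*:ehci_pci:*)
            # Already present under either spelling; leave the list alone.
            printf '%s\n' "$list" ;;
        "::")
            # Empty list: ehci-pci becomes the only entry.
            printf '%s\n' "ehci-pci" ;;
        *)
            printf '%s\n' "$list:ehci-pci" ;;
    esac
}

# Example module list; the real one depends on the hardware and filesystem.
MODULES=$(ensure_ehci_pci "usb-storage:ext4")
echo "module list: $MODULES"

# The actual rebuild is left commented out because it needs root and a
# matching kernel tree under /lib/modules:
#   mkinitrd -c -k 3.10.51 -m "$MODULES"
```

On a running system, lsmod | grep ehci is a quick way to see whether the module actually got loaded.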

Because booting to a root filesystem on USB is the heart of a live USB install, it's good to know that it was a case of PEBKAC and not something more endemic.

With that fix out of the way, there are some other changes to my dm-live startup environment, including tracking the stable kernel releases in the 3.10.x series. Currently, I am testing with kernel 3.10.51.

By the way, in the debug phase of this problem, I was pointed to kdump. I played around with the kexec utility to learn some of the basics of kernel debugging and how it is designed to work. I found right away that the kernel configuration I have been using, which is extremely similar to the default Slackware kernel, is configured not to generate debug symbols, not to enable boot-time reservation of memory using the crashkernel= parameter, and not to be relocatable. Using kdump requires a few configuration changes, including adding full debug symbols to the compiled objects. In general, I agree with Slackware's decision not to include the symbols, because they increase the size of the resulting build by a factor of 5 or 6. In the end, throughout my attempt to debug the above problem, I wasn't able to generate the right kind of crash, i.e. one that actually writes its crash data. I was pretty sure I was doing it right because kexec -l and kexec -e were working as expected. The panic kernel loaded with kexec -p just wasn't tripped for whatever reason. It was too hard a crash, I guess.
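
For anyone wanting to try the same thing, here is a small sketch that reports which kdump-related options are not built in for a given kernel .config. The option names are the ones I understand the kernel's kdump documentation to ask for on x86; the helper name check_kdump_config is made up, and the paths and device names in the trailing comments are placeholders, not my actual setup.

```shell
#!/bin/sh
# Sketch: print every kdump-related option that is not set to =y in the
# kernel config file given as the first argument.
check_kdump_config() {
    config="$1"
    for opt in CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE \
               CONFIG_RELOCATABLE CONFIG_DEBUG_INFO; do
        # An option counts as enabled only if the line reads exactly OPT=y.
        if ! grep -q "^${opt}=y\$" "$config"; then
            echo "missing: $opt"
        fi
    done
}

# Example call (path is an assumption):
#   check_kdump_config /usr/src/linux/.config
#
# With those options enabled and the system booted with crashkernel=...,
# the capture kernel is loaded and a test panic forced roughly like this
# (image path and root device are placeholders):
#   kexec -p /boot/vmlinuz-kdump --append="root=/dev/sdXN irqpoll maxcpus=1 reset_devices"
#   echo c > /proc/sysrq-trigger
```

No output from the function means all five options are enabled. Forcing a panic through sysrq is a much gentler crash than the one I was chasing, so it only shows that the kdump plumbing itself works.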
