How one line in one file made me reinstall Gentoo
Posted by David Zaslavsky
Hey, internet. Long time no see.
(It’s often claimed “long time no see” is a literal translation from Chinese “好久不见”, “hǎo jiǔ bú jiàn”, but actually nobody knows for sure.)
On Thursday, my computer crashed. And it didn’t just crash: it somehow corrupted itself so badly that I couldn’t even boot it. It survived for two seconds after being turned on, before bailing out with this error:
init[1]: segfault at 0 ip 00007ff10ea3fe05 sp 0007ffff7cb49148 error 4 in libc-2.19.so[7ff10e919000+19e000]
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
This is so early in the startup process that nothing is running. It’s pretty much just the kernel and init (which, if you don’t know, is the very first program to run when a Linux system starts, the one that runs all other programs).
So of course I recompiled (it’s Gentoo) and reinstalled the kernel, init, and libc several times, along with several other programs that I’m pretty sure didn’t even get a chance to start, so how could they have anything to do with this crash? Still, better safe than sorry I guess.
None of it made any difference. After replacing every program that could possibly be running at the time of the error, init still crashed. This is probably the most frustrating error I’ve seen in 15 years as a Linux user.
So I spent my weekend reinstalling everything on my computer.
Verifying downloads before GPG
One of the easy ways a nefarious group could hack into my computer is by intercepting the source code I download when I’m installing a program and modifying it to do, well, whatever they want. China’s government tries to maintain pretty strict control over the internet in their country, so I wouldn’t put it past them to do this. (To be fair, I wouldn’t put it past the US government either.) One way I could get around that is to download everything from a trusted server in the US or some other country, but the internet access here is kind of slow, especially going in or out of the country. It’d be a lot quicker to download all my source code within China, and I need all the time I can get.
The alternative is to use cryptography. Gentoo already does this, to some extent: whenever I install a program, the installer first checks the SHA256 hash of the source code against an internal database to make sure the source code hasn’t been tampered with. But that internal database is also something I need to download, and how do I know that hasn’t been tampered with?
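For the curious, the hash check itself is simple enough to sketch by hand. This is not Portage’s actual code, just a minimal illustration of the idea; verify_sha256 is a name I made up, and it assumes sha256sum from coreutils is available.

```shell
#!/bin/bash
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper (not part of Portage): succeeds only if FILE's
# SHA256 digest equals EXPECTED_HEX, compared case-insensitively.
verify_sha256() {
    local file expected actual
    file=$1
    # Normalize the expected digest to lowercase, which is what sha256sum prints.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # Read the file on stdin so the output is just "<digest>  -".
    actual=$(sha256sum < "$file") || return 2
    actual=${actual%% *}    # keep only the digest, drop the trailing "  -"
    [ "$actual" = "$expected" ]
}
```

Something like `verify_sha256 some-tarball.tar.gz <digest from the database>` then returns success only when the two digests agree. The file name here is a placeholder.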
Of course, the Gentoo Release Engineering Team (the people who maintain this internal database) have thought about this, and they cryptographically sign the database files. I’ll skip the details, but through the “magic” of public-key cryptography, I can check that the SHA256 hashes I download are the same ones they upload by checking one single 40-letter key fingerprint, just once.
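Comparing fingerprints by eye is error-prone, so here is a small sketch of doing it mechanically. fingerprints_match is a hypothetical helper of mine, and it assumes the usual 40-hex-digit fingerprint format, possibly printed in space-separated groups.

```shell
#!/bin/bash
# fingerprints_match A B
# Hypothetical helper: compares two key fingerprints after removing
# whitespace and normalizing case. Anything that is not exactly
# 40 hex digits is rejected outright.
fingerprints_match() {
    local a b
    a=$(printf '%s' "$1" | tr -d '[:space:]' | tr 'a-f' 'A-F')
    b=$(printf '%s' "$2" | tr -d '[:space:]' | tr 'a-f' 'A-F')
    # Reject any character outside the hex alphabet.
    case $a in
        *[!0123456789ABCDEF]*) return 1 ;;
    esac
    [ "${#a}" -eq 40 ] && [ "$a" = "$b" ]
}
```

The first argument would be the fingerprint published by the project and the second the one printed by whatever tool imported the key. Getting the published fingerprint from a source you trust is still the part no script can do for you.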
Here’s the catch: to verify the hashes, I need to have GPG installed, and to install GPG, I need to verify the hashes of its source code and all the other code it depends on. My bright idea to get around this was to write this little script, a fake version of GPG:
#!/bin/bash
# Fake gpg: show what was requested and let the user verify it by hand.
echo "gpg invoked: gpg $*"
echo "please do this manually and press ok if successful"
read -r -n 2
# The spaces around == matter: without them, [[ ]] sees one non-empty string and is always true.
if [[ "$REPLY" == "ok" ]]; then
    echo
    exit 0
else
    echo
    exit 1
fi
That goes into /usr/local/bin/gpg in the system I’m trying to install. This way, when I run emerge-webrsync to download the hashes, and it gets to the point where it’s going to check the file’s signature, it will pause so I can manually check the signature.
I spent way too much time coming up with that.
About that crash
I did eventually figure out the problem. It turned out to be a line in /etc/ld.so.preload, which is a file specifying libraries of code that the system should preload for every new program it starts up. Essentially you’re patching extra computer code into the program when it runs. Extra code that it probably wasn’t written to deal with. That makes an environment ripe for conflicts between the preloaded code and the original code. (The fact that this feature exists is shocking, most of the time, but it does have a few legitimate uses.)
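If you ever want to see what a system is preloading, a quick sketch like this will list the entries. list_preload_entries is a hypothetical helper, and it assumes entries are separated by whitespace or colons and that # starts a comment, which is my reading of how the file is parsed rather than something I have verified against every libc.

```shell
#!/bin/bash
# list_preload_entries FILE
# Hypothetical helper: prints each library named in an
# ld.so.preload-style file, one per line.
list_preload_entries() {
    # 1. Strip '#' comments.
    # 2. Turn spaces, tabs and colons into newlines.
    # 3. Drop the empty lines that are left over.
    sed 's/#.*//' "$1" | tr ' \t:' '\n\n\n' | sed '/^$/d'
}
```

From a rescue environment with the broken system mounted at /mnt/gentoo, that would be `list_preload_entries /mnt/gentoo/etc/ld.so.preload`. On a healthy system the file is often missing or empty, so any output at all is worth a second look.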
I had tried to install the Astrill VPN client, which as part of its installation adds this line to /etc/ld.so.preload:
/lib64/$LIB/liblsp.so
I’m guessing liblsp.so is Astrill’s way of intercepting everything that a program tries to send or receive over the internet. It might work for most programs (might; I’m not going to keep it around to find out), but clearly, it has a major conflict with init (which doesn’t even need to access the internet anyway).
It wasn’t easy to find the culprit, actually; I had to boot my computer using the System Rescue CD (on a USB drive, despite the name) and find all files that were modified at the exact time I installed Astrill:
find /mnt/gentoo -mmin +537 -mmin -539
This finds all files which were last modified 538±1 minutes ago. There were about 50 of them, and it wasn’t hard to pick out the one that could affect how the very first program on the computer starts up.
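Counting minutes backwards gets tedious, so if you know the wall-clock time of the install, an absolute window is easier. Here is a sketch using the -newermt test, which GNU find supports; find_modified_between is a hypothetical helper and the timestamps in the usage line are placeholders, not the real time of my install.

```shell
#!/bin/bash
# find_modified_between DIR START END
# Hypothetical helper: lists everything under DIR whose modification
# time is after START and not after END. Both are date strings that
# find understands, such as "2014-01-01 12:00:00".
find_modified_between() {
    # -newermt T is true for files modified more recently than T,
    # so negating it for END closes the window at the top.
    find "$1" -newermt "$2" ! -newermt "$3"
}
```

For example, `find_modified_between /mnt/gentoo "2014-01-01 11:59:00" "2014-01-01 12:01:00"` would list whatever changed in that two-minute window.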
Still, I can’t believe I reinstalled my entire operating system because of one line in one file… life of a Gentoo user, I guess. :-P