udev and grey hair

It won't be long before I grow grey hair if udev keeps pulling tricks on me. Yesterday, an upgrade to 0.087-1 hosed one of my systems. The system is sarge-based, but it is connected to a cable modem that needs the cdc_ether driver, which 2.6.8 does not have. Since I don't expect backports of 2.6.15 to sarge (Update: backports.org has a 2.6.15 backport, but it seems it won't solve my problem), but also don't want to migrate my system to testing, I simply decided to pin linux-image-2.6.15-1-686 and all its dependencies to unstable with APT:

Package: linux-image-2.6.15-1-686
Pin: release a=unstable
Pin-Priority: 600

Package: initramfs-tools
Pin: release a=unstable
Pin-Priority: 600

Package: udev
[...]

The install worked, and off I went to reboot... but the machine did not come back up, and booting 2.6.8 with the new udev also failed. Great. At first, I thought I knew the problem, but on closer inspection, it was something else: udev's ide.agent hung and timed out. It turns out it was looking for hd141 instead of hda, and once I found that out, it didn't take long to put two and two together: 141 is the octal form of 97, the ASCII code for 'a'. And if you echo hd\141 just like that, the shell will swallow the backslash. Marco, the udev maintainer, blamed a broken shell, and I identified busybox-cvs-static as the culprit; replacing it with busybox from unstable fixed the issue.
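To make the hd141 versus hda mix-up concrete, here is a small Python sketch. It is not taken from ide.agent, and I am assuming the script deals with a string of the form hd\141; the helper name expand_octal_escapes is made up for illustration. It shows the arithmetic, what you get when the escape is interpreted, and what you get when the backslash is simply dropped as it is for an unquoted shell word.

```python
import re
import shlex


def expand_octal_escapes(text):
    """Replace each backslash followed by 1-3 octal digits with that character."""
    return re.sub(r"\\([0-7]{1,3})", lambda m: chr(int(m.group(1), 8)), text)


# Assumed form of the string involved; the real script may differ.
raw = r"hd\141"

# 141 in octal is 97 in decimal, which is the ASCII code for 'a'.
assert int("141", 8) == 97 and chr(97) == "a"

# Escape interpreted: the device name comes out as intended.
decoded = expand_octal_escapes(raw)

# Unquoted word under POSIX shell rules: the backslash is removed and the
# digits stay, which is the name ide.agent ended up waiting for.
unquoted = shlex.split("echo " + raw)[1]

print(decoded, unquoted)
```

Whether a given echo interprets such escapes by itself differs between shells, which fits Marco's remark about the shell being at fault.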

Now all that remains is to convince Marco that the bug has nothing to do with initramfs-tools when it occurs in a script provided by udev. initramfs-tools depends on busybox-cvs-static | busybox since it works with either. If udev doesn't work with busybox-cvs-static, it has to conflict with it, which is not really an option though, due to a libc6 upgrade loop. Fortunately, the 2.6.16 kernel will make ide.agent obsolete, so the problem should go up in smoke.

With one problem solved, I woke up this morning to find another. I use udev's network interface renaming feature to ensure that my interfaces always have the names I expect, and that their names give me a hint as to what they're connected to. Sure, using /etc/modules to ensure a defined load order would work fine, but I have too many machines under my control to want to remember that eth2 on this machine is the wireless LAN.

So I use the following udev rules:

wall:~# cat /etc/udev/rules.d/local-interfaces.rules 
KERNEL="eth*", SYSFS{address}="00:02:8a:80:21:31", NAME="internet"
KERNEL="eth*", SYSFS{address}="08:00:46:b1:2d:ee", NAME="lan"
KERNEL="eth*", SYSFS{address}="00:50:04:5b:ec:b3", NAME="wlan"
KERNEL="eth*", SYSFS{address}="00:04:23:72:4e:6c", NAME="wifibackup"

Update: Bas Zoetekouw suggested matching against something other than MAC addresses, as a test. Thus, I tried PCI IDs (using the topmost SYSFS{device} entry in the udevinfo output):

KERNEL="eth*", SYSFS{device}="0x24c4", NAME="internet"
KERNEL="eth*", SYSFS{device}="0x103d", NAME="lan"
KERNEL="eth*", SYSFS{device}="0x5157", NAME="wlan"
KERNEL="eth*", SYSFS{device}="0x1043", NAME="wifibackup"

The problem remains exactly the same. It also remains the same if I completely remove the wlan and wifibackup rules.

When I woke up this morning, I found the following mess:

wall:~# ip addr
2: internet: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 08:00:46:b1:2d:ee brd ff:ff:ff:ff:ff:ff
3: eth0_ifrename: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:02:8a:80:21:31 brd ff:ff:ff:ff:ff:ff
4: wifibackup: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:04:23:72:4e:6c brd ff:ff:ff:ff:ff:ff
5: wlan: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:04:5b:ec:b3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.14.129/25 brd 192.168.14.255 scope global wlan
    inet6 fe80::250:4ff:fe5b:ecb3/64 scope link 
      valid_lft forever preferred_lft forever

The wlan and wifibackup interfaces are configured correctly (I use wifibackup to hook into the various open WLANs around, when my provider goes down, or I need more bandwidth). But internet was assigned to the LAN interface, and eth0_ifrename, well... that's just whacked.

Looking at the udev code, this seems to be due to a patch Marco pulled from Ubuntu, which is meant to guard against race conditions in the renaming. For instance, if eth0 needs to become eth1 and vice versa, udev renames the first to eth0_ifrename and waits until the other has finished its identity change. The patch, however, is a hack: it tries endlessly to rename the interface to its final target name, which, in my case, obviously goes on forever.
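The idea of parking an interface under a temporary name can be sketched in a few lines of Python. This is a toy model, not the Ubuntu patch or any udev code: the interface table is a plain dict, the MAC values "aa" and "bb" are placeholders, and the function names and the on_retry hook are invented for illustration. Unlike the behaviour described above, the retry loop here is bounded so that it terminates.

```python
def rename(table, old, new):
    """Rename an interface; refuse if the new name is already in use."""
    if new in table:
        return False
    table[new] = table.pop(old)
    return True


def rename_via_temp(table, old, target, max_tries=5, on_retry=None):
    """Move `old` to `target`, parking it under a temporary name if needed.

    Returns the name the interface ends up with.
    """
    if rename(table, old, target):
        return target
    temp = old + "_ifrename"
    if not rename(table, old, temp):
        return old  # even the temporary name is taken; leave things alone
    for _ in range(max_tries):
        if on_retry:
            on_retry(table)  # stands in for the other interface's own rename
        if rename(table, temp, target):
            return target
    return temp  # gave up; an unbounded loop would spin here forever


def other_card_moves(table):
    """The second card vacates eth1 by taking over the now free eth0."""
    if table.get("eth1") == "bb":
        rename(table, "eth1", "eth0")


# Healthy swap: the other card vacates the target name in time.
swapped = {"eth0": "aa", "eth1": "bb"}
healthy = rename_via_temp(swapped, "eth0", "eth1", on_retry=other_card_moves)

# Stuck case: nobody ever frees the target name.
stuck_table = {"eth0": "aa", "eth1": "bb"}
stuck = rename_via_temp(stuck_table, "eth0", "eth1")

print(healthy, swapped)
print(stuck, stuck_table)
```

In the stuck case the interface is left under its temporary name, which is the state an interface called eth0_ifrename suggests.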

10:18  * Md just copied it from the Ubuntu package
10:18 < madduck> why???
10:19 < Md> because it worked in my artificial setup

Unfortunately, this isn't the first time that Ubuntu's "giving back to Debian" (which requires Debian to go out and fetch) is two steps back rather than one forward. I would hope that maintainers of critical packages (such as udev) would exercise more care when pulling from Ubuntu. And that Ubuntu would please stop adding hacks to packages and instead concentrate on fixing issues at the root, the Debian way.

So my problem still persists, and even given Ubuntu's ifrename patch problems, I can't figure out what is actually going on. It does not help that udev also suddenly stopped logging interface name changes. Yes, just like that.

KERNEL="eth*", SYSFS{address}="00:02:8a:80:21:31", NAME="internet"
KERNEL="eth*", SYSFS{address}="08:00:46:b1:2d:ee", NAME="lan"

How can these two rules actually trigger the rename conflict? The only way I could imagine is that udev gets confused and falsely renames 08:00:46:b1:2d:ee to internet. Then, when it gets to the other card, a name collision occurs, udev chooses eth0_ifrename as temporary workaround, and then tries forever to rename eth0_ifrename to internet, which will never succeed.
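The hypothesis can be replayed in Python to check that it at least produces the state seen in the ip addr output. This is purely a model of my guess, not of what udev does internally: the replay function is invented, and I am assuming both cards show up as eth0 in turn, the second one after the first has been renamed.

```python
# The two rules in question, as a MAC address to name mapping.
RULES = {
    "00:02:8a:80:21:31": "internet",
    "08:00:46:b1:2d:ee": "lan",
}


def replay(events):
    """Apply (kernel_name, mac, chosen_name) events in order.

    On a name collision the interface is parked as <kernel_name>_ifrename.
    Returns a dict mapping final interface name to MAC address.
    """
    table = {}
    for kernel_name, mac, chosen in events:
        if chosen in table:
            chosen = kernel_name + "_ifrename"
        table[chosen] = mac
    return table


# Assumed boot order and kernel names; not confirmed by any log.
boot_order = [("eth0", "08:00:46:b1:2d:ee"), ("eth0", "00:02:8a:80:21:31")]

# Rules applied as written: distinct MACs map to distinct names, no collision.
correct = replay([(k, mac, RULES[mac]) for k, mac in boot_order])

# Hypothesised confusion: the first card is wrongly given the name internet.
confused = replay([(k, mac, "internet") for k, mac in boot_order])

print(correct)
print(confused)
```

Applied as written, the rules cannot collide, so the model only reaches the observed state if the first assignment goes wrong. That says nothing about why it goes wrong.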

So why does udev get confused in the first place? Why would it ever name the interface 08:00:46:b1:2d:ee internet? Beats me. But I'd better end here because the world surely doesn't need just another udev rant.

Update: I forgot to mention that the renaming works just fine when I unload/load modules from the command line. It's only during the boot process that things go wild.

Update 2: I should note that the renaming does fail some of the time if the modules are loaded in quick succession.