It won’t be long before I grow grey hair if udev
keeps pulling tricks on me. Yesterday, an upgrade to 0.087-1 hosed
one of my systems. The system is sarge-based, but it is connected
to a cable modem needing the cdc_ether driver, which
2.6.8 does not have. Since I don’t expect backports of 2.6.15 to
sarge (Update: backports.org has a 2.6.15
backport, but it won’t solve my problem it seems), but also
don’t want to migrate my system to testing, I simply
decided to pin linux-image-2.6.15-1-686 and all its
dependencies to unstable with APT:
Package: linux-image-2.6.15-1-686
Pin: release a=unstable
Pin-Priority: 600

Package: initramfs-tools
Pin: release a=unstable
Pin-Priority: 600

Package: udev
[...]
The install worked, and off I went to reboot… but the machine
did not come back up, and booting 2.6.8 with the new
udev also failed. Great. At first, I thought I
knew the problem, but on closer inspection, it was something
else: udev’s ide.agent hung and
timed out. It turns out it was looking for hd141
instead of hda, and once I found that out, it didn’t
take long to put two and two together: octal 141 is decimal 97, which is ASCII ‘a’. And
if you echo hd\141 just like that, the shell will
swallow the backslash. Marco, the udev maintainer,
blamed a broken shell, and I identified
busybox-cvs-static as the culprit; replacing it with
busybox from unstable fixed the
issue.
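For anyone who wants to check the arithmetic, here is a minimal sketch in plain POSIX sh. The helper names octal_to_char and octal_to_decimal are made up for illustration and are not part of udev or busybox. It only shows that octal 141 is the letter ‘a’ and that an unquoted backslash never reaches echo; it does not reproduce the busybox-cvs-static behaviour itself.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Turn octal digits such as 141 into the corresponding character.
# POSIX printf interprets \NNN in its format string as an octal byte value.
octal_to_char() {
    printf "\\$1"
}

# Turn octal digits into decimal; a leading 0 marks the argument as octal.
octal_to_decimal() {
    printf '%d' "0$1"
}

octal_to_char 141; echo        # prints: a
octal_to_decimal 141; echo     # prints: 97

# Unquoted, the shell strips the backslash before echo ever sees it.
echo hd\141                    # prints: hd141

# Quoted, the backslash survives. Whether echo then expands the escape
# differs between echo implementations, which is where shells disagree.
echo 'hd\141'
```

The last line is deliberately left without an expected output, since the result depends on which shell provides echo.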
Now all that remains is to convince Marco that the bug has
nothing to do with initramfs-tools when it occurs in a
script provided by udev. initramfs-tools
depends on busybox-cvs-static | busybox since it
works with either. If udev doesn’t work with
busybox-cvs-static, it has to conflict with it, which is not
really an option though, due to a libc6 upgrade loop.
Fortunately, the 2.6.16 kernel will make ide.agent
obsolete, so the problem should go up in smoke.
With one problem solved, I woke up this morning to find another.
I use udev’s network interface renaming feature to ensure that my
interfaces always have the names I expect, and that their names
give me a hint as to what they’re connected to. Sure, using
/etc/modules to ensure a defined load order would work
fine, but I have too many machines under my control to want to
remember that eth2 on this machine is the wireless
LAN.
So I use the following udev rules:
wall:~# cat /etc/udev/rules.d/local-interfaces.rules
KERNEL="eth*", SYSFS{address}="00:02:8a:80:21:31", NAME="internet"
KERNEL="eth*", SYSFS{address}="08:00:46:b1:2d:ee", NAME="lan"
KERNEL="eth*", SYSFS{address}="00:50:04:5b:ec:b3", NAME="wlan"
KERNEL="eth*", SYSFS{address}="00:04:23:72:4e:6c", NAME="wifibackup"
Update: Bas Zoetekouw suggested matching
against something other than MAC addresses, for testing. Thus, I
tried PCI IDs (using the topmost SYSFS{device} entry
in the udevinfo output):
KERNEL="eth*", SYSFS{device}="0x24c4", NAME="internet"
KERNEL="eth*", SYSFS{device}="0x103d", NAME="lan"
KERNEL="eth*", SYSFS{device}="0x5157", NAME="wlan"
KERNEL="eth*", SYSFS{device}="0x1043", NAME="wifibackup"
The problem remains exactly the same. It also remains the same if
I completely remove wlan and
wifibackup.
When I woke up this morning, I found the following mess:
wall:~# ip addr
2: internet: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 08:00:46:b1:2d:ee brd ff:ff:ff:ff:ff:ff
3: eth0_ifrename: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:02:8a:80:21:31 brd ff:ff:ff:ff:ff:ff
4: wifibackup: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
    link/ether 00:04:23:72:4e:6c brd ff:ff:ff:ff:ff:ff
5: wlan: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:04:5b:ec:b3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.14.129/25 brd 192.168.14.255 scope global wlan
    inet6 fe80::250:4ff:fe5b:ecb3/64 scope link
       valid_lft forever preferred_lft forever
The wlan and wifibackup interfaces are
configured correctly (I use wifibackup to hook into
the various open WLANs around, when my provider goes down, or I
need more bandwidth). But internet was assigned to the
LAN interface, and eth0_ifrename, well… that’s just
whacked.
Looking at the udev code, this seems to be due to a
patch Marco pulled from Ubuntu, which is to guard against race
conditions in the renaming. For instance, if eth0
needs to become eth1 and vice versa,
udev renames the first to eth0_ifrename
and waits until the other has finished its identity change. The
patch, however, is a hack: it tries endlessly to rename the
interface to its final target name, which, in my case, obviously
goes on forever.
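To make the failure mode concrete, here is a toy simulation in POSIX sh. It is not the Ubuntu patch and touches no real devices: the list of names and the helpers name_in_use, try_rename and rename_with_retries are all invented for illustration. It assumes the situation I seem to be in, where the name internet is already taken by the wrong card, and shows that retrying the final rename can never succeed, so a bounded number of attempts is the least one would want.

```shell
#!/bin/sh
# Toy model only: interface names currently in use, space-separated.
# "internet" is (wrongly) held by the LAN card; eth0 is the real one.
NAMES_IN_USE="internet eth0"

# Succeed if the given name is currently taken.
name_in_use() {
    case " $NAMES_IN_USE " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Rename $1 to $2; fail if $1 does not exist or $2 is already taken.
try_rename() {
    name_in_use "$1" || return 1
    name_in_use "$2" && return 1
    new_list=""
    for n in $NAMES_IN_USE; do
        [ "$n" = "$1" ] && n="$2"
        new_list="$new_list $n"
    done
    NAMES_IN_USE="${new_list# }"
}

# Retry a rename at most $3 times instead of looping forever.
rename_with_retries() {
    old=$1 new=$2 tries=$3
    while [ "$tries" -gt 0 ]; do
        try_rename "$old" "$new" && return 0
        tries=$((tries - 1))
    done
    return 1
}

# Park the card under a temporary name, then attempt the final rename.
try_rename eth0 eth0_ifrename
if rename_with_retries eth0_ifrename internet 3; then
    echo "renamed; names now: $NAMES_IN_USE"
else
    echo "gave up; names now: $NAMES_IN_USE"
fi
```

In this model the final rename only goes through once whoever holds internet lets go of it, for example after try_rename internet lan. Nothing in my setup makes that happen.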
10:18 * Md just copied it from the Ubuntu package
10:18 < madduck> why???
10:19 < Md> because it worked in my artificial setup
Unfortunately, this isn’t the first time that Ubuntu’s “giving
back to Debian” (which requires Debian to go out and fetch) is two
steps back rather than one forward. I would hope that maintainers
of critical packages (such as udev) would exercise
more care when pulling from Ubuntu. And that Ubuntu would please
stop adding hacks to packages and instead concentrate on fixing
issues at the root the Debian way.
So my problem still persists, and even given Ubuntu’s
ifrename patch problems, I can’t figure out what is
actually going on. It does not help that udev also
suddenly stopped logging interface name changes. Yes, just like
that.
KERNEL="eth*", SYSFS{address}="00:02:8a:80:21:31", NAME="internet"
KERNEL="eth*", SYSFS{address}="08:00:46:b1:2d:ee", NAME="lan"
How can these two rules actually trigger the rename conflict?
The only way I could imagine is that udev gets
confused and falsely renames 08:00:46:b1:2d:ee to
internet. Then, when it gets to the other card, a name
collision occurs, udev chooses
eth0_ifrename as a temporary workaround, and then tries
forever to rename eth0_ifrename to
internet, which will never succeed.
So why does udev get confused in the first place?
Why would it ever name the interface 08:00:46:b1:2d:ee
internet? Beats me. But I had better end here because the
world surely doesn’t need just another udev rant.
Update: I forgot to mention that the renaming works just fine when I unload/load modules from the command line. It’s only during the boot process that things go wild.
Update 2: I should note that it does fail some of the time when the modules are loaded in quick succession.

