Linux pains #517: unregistering network devices

Somehow my network device does not want to recognise that it actually has a carrier. This being Linux (and not Windows), I thus decide to reload the driver, after taking down the network interface:

lapse:~# ip l show dev eth0
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0a:e4:30:43:16 brd ff:ff:ff:ff:ff:ff
# notice: not UP
lapse:~# rmmod e1000
lapse kernel: unregister_netdevice: waiting for eth0 to become free.
  Usage count = 1
lapse kernel: unregister_netdevice: waiting for eth0 to become free.
  Usage count = 1
[...]
# goes on forever, uninterruptible; thus new shell:
lapse:~# ps u -C rmmod
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      5624  0.0  0.0   1656   480 pts/2    D+   00:02   0:00 rmmod e1000

As we all well know, state 'D' — uninterruptible sleep — means: reboot. Especially since I cannot ssh into the box anymore (doh!), and su and sudo just end up in state 'D' as well, if I try them.
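
For anyone wanting to see what else is stuck, filtering the STAT column of ps is usually enough. The following is only a minimal sketch, assuming a procps-style ps and a POSIX awk; the function name dstate is made up for illustration and is not a standard tool.

```shell
# dstate: read "PID STAT COMMAND..." lines on stdin and print only those
# whose STAT field starts with D (uninterruptible sleep).
# The name "dstate" is hypothetical, chosen for this example.
dstate() {
    awk '$2 ~ /^D/'
}

# Against the live process table one would feed it something like:
#   ps -eo pid=,stat=,args= | dstate
# (the trailing "=" on each column suppresses the header line)
```

On the box above this would have listed the rmmod process, plus any su or sudo that got stuck behind it.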

So, on to reboot. Or wait, that doesn't work as the shutdown process will hang long before the filesystems get unmounted. Great.

It's become impossible to argue that Linux is the more stable operating system…

Update: a few people responded with workarounds. These range from using a watchdog to automate the reboot (this is my laptop, so it's not really necessary), to various combinations of flags to /sbin/shutdown, to using sysrq to force a reboot without cleaning up or unmounting filesystems, which is essentially the same as pressing the reset button.
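
The sysrq variant can be triggered from a root shell that is still alive, without touching the keyboard. What follows is a hedged sketch rather than a recommendation: it assumes the kernel was built with magic SysRq support and that /proc is mounted. The function name force_reboot and its optional path argument are hypothetical, the latter only there so the function can be tried without actually rebooting.

```shell
# force_reboot: ask the kernel to reboot immediately via magic SysRq.
# Writing 'b' reboots without syncing or unmounting anything, much like
# pressing the reset button. Must be run as root.
# The optional first argument (hypothetical, for dry runs) replaces the
# trigger path with an ordinary file.
force_reboot() {
    trigger=${1:-/proc/sysrq-trigger}
    echo b > "$trigger"
}
```

Calling force_reboot with no argument reboots on the spot. Writing 's' and then 'u' to the same file first (sync, then remount read-only) is the gentler sequence, provided the kernel still gets around to it.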

Ben Hutchings says:

I think this happens when the driver repeatedly attempts to fix an error condition and keeps scheduling future work (retaining a reference to the device) rather than giving up.

I thus ended up filing a bug against the kernel.