One thing I will never understand about this Linux hype is how miserably the operating system fails in the wake of hardware problems. Coming home from dinner, I find this all over my consoles:
Message from syslogd@piper at Sat Jan 7 22:53:35 2006 ...
piper kernel: Oops: 0002 [18]
Message from syslogd@piper at Sat Jan 7 22:53:35 2006 ...
piper kernel: CR2: 0000004000000004
Since then, I cannot start new processes anymore (though apache2 and old processes work just fine), which means I cannot SSH into the box (which is far away from where I am right now), and thus it’s become useless.
The problem could be anything: corrupt memory, a broken CPU, a
hard drive with bad blocks in the swap area, etc. Since Linux
is obviously able to wall in response to a
problem, I only wish it wouldn’t pout as a consequence but
would handle the event more gracefully.
If only I had the time and energy to finally wave goodbye and choose NetBSD…
Update: I managed to get a dmesg output:
<1>Unable to handle kernel paging request at 0000004000000004 RIP:
<ffffffff80152c54>{find_get_pages+36}
PGD 72cee067 PUD 0
Oops: 0002 [18]
CPU 0
Modules linked in: rfcomm l2cap ipv6 af_packet ipt_REJECT ipt_state
iptable_filter iptable_nat ip_conntrack ip_tables deflate zlib_deflate
twofish serpent aes blowfish des sha256 sha1 md5 crypto_null af_key usbhid
hci_usb bluetooth raid5 xor dm_mod sbp2 ide_generic ide_cd eth1394
snd_seq_dummy snd_seq_oss snd_seq_midi snd_seq_midi_event snd_seq
snd_via82xx gameport snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd
i2c_viapro soundcore i2c_core ehci_hcd via82cxxx ohci1394 shpchp pci_hotplug
sk98lin uhci_hcd ide_core ieee1394 rtc parport_pc parport floppy psmouse
pcspkr serio_raw evdev xfs exportfs sr_mod cdrom sd_mod sata_via
sata_promise libata sg scsi_mod raid1 md unix fbcon tileblit font bitblit
vesafb cfbcopyarea cfbimgblt cfbfillrect softcursor
Pid: 136, comm: kswapd0 Not tainted 2.6.12-1-amd64-k8
RIP: 0010:[<ffffffff80152c54>] <ffffffff80152c54>{find_get_pages+36}
RSP: 0018:ffff81007f90dcc8 EFLAGS: 00010002
RAX: 0000004000000000 RBX: ffff81007f90dd08 RCX: ffff81007f90dd10
RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff81000289f678
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff81007113f670
R10: 0000000000000040 R11: 0000000000000000 R12: ffff8100705e54e0
R13: 000000000000007a R14: ffffffffffffffff R15: ffff8100705e55f8
FS: 00002aaaab730d70(0000) GS:ffffffff8040f940(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000004000000004 CR3: 0000000072925000 CR4: 00000000000006e0
Process kswapd0 (pid: 136, threadinfo ffff81007f90c000, task ffff81007f906760)
Stack: ffff81007f90dcf8 ffffffff8015bca7 ffff8100705e54f0 ffffffff8015c905
ffff810001172200 0000000000000000 0000000000000000 0000000000000000
ffff81000289f678 0000004000000000
Call Trace:<ffffffff8015bca7>{pagevec_lookup+23}
<ffffffff8015c905>{invalidate_mapping_pages+245}
<ffffffff8018a403>{shrink_icache_memory+259}
<ffffffff8012d144>{recalc_task_prio+324}
<ffffffff8015ca01>{shrink_slab+193}
<ffffffff8015dcaf>{balance_pgdat+623}
<ffffffff8015df37>{kswapd+295}
<ffffffff80144e00>{autoremove_wake_function+0}
<ffffffff8010f0f7>{child_rip+8} <ffffffff8015de10>{kswapd+0}
<ffffffff8010f0ef>{child_rip+0}
Code: ff 40 04 ff c2 48 83 c1 08 39 d6 75 f0 fb 5b 89 f0 c3 66 66
RIP <ffffffff80152c54>{find_get_pages+36} RSP <ffff81007f90dcc8>
CR2: 0000004000000004
Now, how does a mere mortal diagnose this? This was the third in
a series of oopses; the others occurred in a find
process. The machine runs a software RAID5, so it really could be
anything, right?
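As a first step, the numbers in the oops can at least be read mechanically. The sketch below is only that, a sketch: the function names are made up for illustration, and the values are copied by hand from the dmesg output above. It decodes the page-fault error code printed after "Oops:" using the standard x86 bit layout, and compares the faulting address in CR2 with RAX. Assuming the Code: line starts at the faulting instruction, the bytes ff 40 04 appear to be an increment of the 32-bit value at RAX+4, which would match CR2.

```python
# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
PAGE_FAULT_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code):
    """Return a human-readable description of each relevant error-code bit."""
    description = []
    for mask, if_set, if_clear in PAGE_FAULT_BITS:
        text = if_set if code & mask else if_clear
        if text is not None:
            description.append(text)
    return description


def set_bits(value):
    """Return the positions of all bits that are set in value."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


# Values copied from the oops above (hypothetical variable names).
error_code = 0x0002          # "Oops: 0002"
rax = 0x0000004000000000     # RAX at the time of the fault
cr2 = 0x0000004000000004     # faulting address

print(decode_oops_code(error_code))  # what kind of access faulted
print(cr2 - rax)                     # offset of the faulting address from RAX
print(set_bits(rax))                 # which bits of the bogus pointer are set
```

Read this way, the kernel tried to write to a non-present page in kernel mode, at offset 4 from a pointer in RAX that has exactly one bit set (bit 38). A pointer like that is at least consistent with a single flipped bit in what should have been a NULL or otherwise valid value, but it proves nothing by itself and certainly does not tell RAM from CPU from disk.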

