forcedeth: nVidia network chips are broken

Yesterday, I removed my new server from the rack and brought it back home, after the problems with the nVidia network chip (forcedeth) took down the NIC to the point that the IPMI chip, which is routed through the primary interface, wasn't reachable anymore.

Even though a soft reboot fixed the problem, a bit of large-packet traffic, like downloading via IPv6, broke the card again. Since IPMI is also affected, I cannot remotely manage the machine and thus can't leave it in the rack.

In fact, I am strongly considering making use of the try&buy contract and returning the thing. I cannot rule out a software problem, but given that the NIC goes into a state in which it becomes unusable even to the IPMI system, which is completely independent of Linux, I somewhat doubt it.

Instead, I suspect a hardware error, beyond the known problems with nVidia network chips and segmentation/checksum offloading.
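For reference, the usual workaround for the known offloading problems is to switch the offloads off with ethtool. Here is a minimal sketch in Python that only builds and prints the command. The interface name eth0 and the exact list of features are assumptions for illustration, and nothing here suggests that it would cure the failure described in this post.

```python
import shlex

# Offload features to switch off. These are standard `ethtool -K` keywords;
# which of them matter for forcedeth is an assumption, not a tested result.
OFFLOADS = ("rx", "tx", "sg", "tso", "gso")


def offload_off_command(iface="eth0"):
    """Build the ethtool argument list that disables the offloads above.

    The default interface name eth0 is hypothetical.
    """
    cmd = ["ethtool", "-K", iface]
    for feature in OFFLOADS:
        cmd += [feature, "off"]
    return cmd


if __name__ == "__main__":
    # Only print the command; review it and run it as root by hand.
    print(shlex.join(offload_off_command()))
```

Printing rather than executing is deliberate: changing offload settings on the only interface of a remote machine is something to do by hand, with console access at the ready.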

On the other hand, it's not news that drivers can break hardware, and the fact that I am using Linux is reason for hope.

One alternative, which would allow me to potentially help in fixing this bug, is to use a riser card to stuff a different network card into the server. This would mean I hadn't invested all that time into the server for nothing. Unfortunately, this workaround comes with extra costs, and would require Init Seven to allocate another switch port for the IPMI card, which I am not sure they'll be keen on.

The bottom line of the story is that I will avoid nVidia even more in the future, and you might want to do so as well. Companies that produce crap hardware and do not cooperate with people writing free drivers for them do not deserve the money.

Unfortunately, almost everyone out there uses the MCP55 chipset these days, and denies any problems with it. I guess I will take a look at HP and IBM, although I'd really prefer not to pay for their brands, and not to enslave myself to their customer support standards.

NP: Neil Young: Dead Man