
Apr 29 2014

ESXi 5.1 U1 Purple Screen of Death

A few days ago I was contacted by a customer running vSphere 5.1 U1, which is about one year old. For various reasons they have not yet upgraded to a later release of vSphere 5.1 or to vSphere 5.5. It had been quite some time since I had seen a purple screen of death (PSOD), but that was the reason my customer called and emailed me: two of their ESXi hosts crashed within 20 minutes of each other with exactly the same PSOD:

[Screenshot: the PSOD shown on both crashed ESXi hosts]

I know it is not the best screenshot, but a closer look at the PSOD message actually gives you an idea of the root cause of the problem.
Both E1000PollRxRing and E1000DevRx point to the virtual machine (VM) E1000 and/or E1000e vNIC driver, and this made me verify two things:

  • Are any VMs using the E1000 and/or E1000e vNIC driver?
  • Is there an existing VMware KB article reporting issues related to this PSOD message? I remembered quite a few discussions about this a few months back, so I figured it was a known problem.
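The first check can also be done offline by reading each VM's .vmx configuration file, where the type of every virtual NIC is recorded in an ethernetN.virtualDev entry (for example ethernet0.virtualDev = "e1000e"). Below is a minimal Python sketch, assuming you already have the .vmx text at hand; the function names are hypothetical, an adapter with no ethernetN.present entry is assumed to be present, and adapters that have no virtualDev entry at all are not reported.

```python
import re

# Matches one vmx assignment line, e.g.  ethernet0.virtualDev = "e1000e"
_VMX_LINE = re.compile(r'^\s*([A-Za-z0-9_.:\-]+)\s*=\s*"([^"]*)"\s*$')

# vNIC types covered by VMware KB 2059053.
E1000_FAMILY = {"e1000", "e1000e"}


def parse_vmx(text: str) -> dict[str, str]:
    """Parse .vmx text into {lower-cased key: value}; non-matching lines are skipped."""
    entries = {}
    for line in text.splitlines():
        match = _VMX_LINE.match(line)
        if match:
            entries[match.group(1).lower()] = match.group(2)
    return entries


def e1000_adapters(text: str) -> dict[str, str]:
    """Return {adapter: type} for each present vNIC whose virtualDev is e1000 or e1000e.

    Assumption: an adapter with no ethernetN.present entry is treated as present.
    Adapters without a virtualDev entry are not reported.
    """
    entries = parse_vmx(text)
    found = {}
    for key, value in entries.items():
        if key.startswith("ethernet") and key.endswith(".virtualdev"):
            adapter = key[: -len(".virtualdev")]
            present = entries.get(adapter + ".present", "true").lower() == "true"
            if present and value.lower() in E1000_FAMILY:
                found[adapter] = value.lower()
    return found


if __name__ == "__main__":
    sample = (
        'ethernet0.present = "TRUE"\n'
        'ethernet0.virtualDev = "e1000e"\n'
        'ethernet1.present = "TRUE"\n'
        'ethernet1.virtualDev = "vmxnet3"\n'
    )
    print(e1000_adapters(sample))  # {'ethernet0': 'e1000e'}
```

Running it over every .vmx file in a datastore gives a rough list of VMs that would need their vNIC changed.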

My first investigation showed that around 150 VMs used the E1000 and/or E1000e vNIC driver, and I also found VMware KB article 2059053, which describes the following:

  • ESXi 5.x host fails with a purple diagnostic screen. Pay attention to the "ESXi 5.x" in the description: it means that ESXi 5.0, ESXi 5.1 and ESXi 5.5 are all affected.
  • This is a known issue affecting ESXi 5.0, 5.1, and 5.5 hosts and virtual machines using the E1000 and E1000e virtual network adapters.

The issue is resolved in ESXi 5.1 Update 2 and ESXi 5.5 Update 1; for ESXi 5.0 it is resolved in patch ESXi500-201401001.
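To decide which hosts still need patching, the fix levels above can be encoded as a small lookup. This is a hypothetical Python helper that only models the three fix levels named in this post: for ESXi 5.0 it simply checks for the patch ID ESXi500-201401001 and knows nothing about later patch bundles that may also contain the fix, so treat a False result there as "verify manually".

```python
# Fix levels for the E1000/E1000e PSOD as stated in this post (VMware KB 2059053).
# Hypothetical helper: it models only these fix levels, nothing else.
FIXED_IN_UPDATE = {"5.1": 2, "5.5": 1}
ESXI50_FIX_PATCH = "ESXi500-201401001"


def psod_fix_applied(version: str, update: int,
                     installed_patches: frozenset = frozenset()) -> bool:
    """Return True if an ESXi 5.x host is at or above the documented fix level."""
    if version == "5.0":
        # Only the named patch is recognised; later bundles that may also carry
        # the fix are not modelled, so False here means "verify manually".
        return ESXI50_FIX_PATCH in installed_patches
    if version in FIXED_IN_UPDATE:
        return update >= FIXED_IN_UPDATE[version]
    raise ValueError(f"ESXi {version} is not one of the 5.x releases covered by the KB")


if __name__ == "__main__":
    print(psod_fix_applied("5.1", 1))  # False: a 5.1 U1 host is still exposed
    print(psod_fix_applied("5.1", 2))  # True
```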

I identified one VM, running Windows Server 2012 R2, that had been running on both ESXi hosts before they PSODed and that was using considerably more resources than on previous days. That same VM also used the E1000e vNIC driver, so we contacted the VM's sysadmin and asked him whether he:

  • was running any specific tasks on the VM, since it was using more resources than on previous days.
  • could change the vNIC driver to VMXNET3.

The increased resource utilization was caused by performance testing, and the sysadmin told us that the VM had crashed twice that day, about 10 minutes after the performance test was started. This correlates perfectly with the ESXi host PSODs.
After the sysadmin changed the vNIC driver, the 30-minute performance test completed successfully, with no ESXi host PSOD.

To be on the safe side, we also started the process of updating the ESXi hosts to 5.1 Update 2 within the next few days. The customer will also start changing the vNIC driver from E1000/E1000e to VMXNET3 where possible.

I guess this is just another good reason to use the paravirtualized vNIC driver, VMXNET3, instead of the E1000 and/or E1000e.

8 comments

1 ping


  1. wojcieh

    Long live VMware best practices to always use vmxnet3.

  2. khjerpe

    One-liner to find those bad bad E1000 VMs .. :
    Get-VM | Where-Object { Get-NetworkAdapter -VM $_ | Where-Object { $_.Type -like "*e1000*" } }

  3. magander3

    Totally agree (as long as supported by guest OS)

  4. magander3

    Thanks Kenth

  5. Tim

    It’s troubling that so many OVAs from the likes of Firemon and Cisco are still shipped with E1000 NICs. In fact, some of these (e.g. Cisco ISE) won’t work with anything else. Makes it hard to get rid of E1000s when vendors keep using them! Great article.

  6. magander3

    Thanks Tim, glad you liked it. Yes, there are still quite a lot of vendors shipping their appliances without the VMXNET3 vNIC driver.

    //Magnus

  7. Paul

    Thank you for this article. IMO VMware should set the default vNIC (when creating a VM) to VMXNET3 instead of E1000.

  8. magander3

    Totally agree that all templates should be configured with VMXNET3, and if you are not using templates, the automated and/or manual process of creating VMs should specify VMXNET3 where possible.

    //Magnus

  1. Newsletter: May 4, 2014 | Notes from MWhite

    […] and yet it isn’t enough.  So I share things out too.  But in any-case here is a good article about what happens if you are using the Intel E1000 with servers using various versions of […]
