
Feb 16 2016

ESXi host disconnected from vCenter Server & the ESXi localcli command


Today I experienced an ESXi host, version 6.0 U1a, being disconnected from vCenter Server. I tried to connect directly to the ESXi host using the ESXi embedded host client, the vSphere Client and SSH, all without success. The last attempt was to use the IPMI connection, and that screen showed me the following:

[Screenshot: IPMI console, 2016-02-16 at 21.45.02]

That is usually not a good sign, but I was lucky: the ESXi DCUI worked, so I could enable the ESXi Shell and access it via the F1 screen option. Based on the log investigation that followed, I decided to disable the local ESXi firewall, and when trying to do so I encountered another issue.

The commands I was trying to run were the following:

  • esxcli network firewall unload
  • esxcli network firewall set --enabled false

They resulted in different messages indicating an out-of-memory condition:

  • 2016-0x-xxT20:59:02.335Z cpu3:3582147)WARNING: UserMem: 7019: Failed to allocate pagetables for mmInfo: 0x43194e1ce180, startAddr: ff98f000, length: 536576, pagePool: 18446744073709551615, status: Out of memory
  • 2016-0x-xxT20:59:57.977Z cpu32:3582449)WARNING: UserParam: 1301: could not change group to <host/vim/vimuser/terminal/ssh>: Admission check failed for memory resource
  • 2016-0x-xxT20:59:59.838Z cpu38:3582469)WARNING: User: 5366: Error in exec’d cartel setup: Failed to map section: Admission check failed for memory resource
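When a vmkernel log is long, it can help to filter it down to just this kind of warning. The following is a minimal Python sketch, not a VMware tool: the regular expression is only inferred from the three lines quoted above, and the names `WARNING_RE`, `MEMORY_PHRASES` and `find_memory_warnings` are made up for illustration.

```python
import re

# Layout inferred from the warnings quoted in this post (an assumption):
# <timestamp> cpu<N>:<world id>)WARNING: <component>: <line>: <message>
WARNING_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+cpu(?P<cpu>\d+):(?P<world>\d+)\)"
    r"WARNING:\s+(?P<component>\w+):\s+(?P<line>\d+):\s+(?P<message>.*)$"
)

# Phrases taken from the log excerpts in this post.
MEMORY_PHRASES = ("Out of memory", "Admission check failed for memory resource")


def find_memory_warnings(lines):
    """Return the parsed fields of every warning that mentions a memory failure."""
    hits = []
    for raw in lines:
        match = WARNING_RE.match(raw.strip())
        if match is None:
            continue  # not a WARNING line in the expected layout
        if any(phrase in match["message"] for phrase in MEMORY_PHRASES):
            hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    sample = [
        "2016-0x-xxT20:59:59.838Z cpu38:3582469)WARNING: User: 5366: "
        "Failed to map section: Admission check failed for memory resource",
        "2016-0x-xxT21:00:00.000Z cpu1:1000)some unrelated line",
    ]
    for hit in find_memory_warnings(sample):
        print(hit["timestamp"], hit["component"], hit["message"])
```

The function takes any iterable of lines, so it could be fed an open log file copied off the host; the sample lines in the `__main__` block are shortened for readability.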

As mentioned in the vSphere 6 documentation, there is a way to bypass hostd when it is not responding: use the localcli command instead of esxcli. You should use localcli only when instructed by support, and this warning is also included in the VMware documentation:

If you use a localcli command, an inconsistent system state and potential failure can result. 

However, I didn’t have any other option, so I ran the following commands:

  • localcli network firewall unload
  • localcli network firewall set --enabled false
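The order of operations (try esxcli first, use localcli only as a deliberate last resort) can be sketched in Python as below. This is only an illustration under stated assumptions: `disable_firewall`, `run_cli`, `FIREWALL_STEPS` and the `allow_localcli` switch are hypothetical names, the only CLI arguments used are the ones shown in this post, and on a host that is already out of memory the Python interpreter itself may fail to start.

```python
import subprocess

# The two firewall operations from this post, without the CLI name in front.
FIREWALL_STEPS = [
    ["network", "firewall", "unload"],
    ["network", "firewall", "set", "--enabled", "false"],
]


def run_cli(argv):
    """Default runner: execute argv and return its exit code."""
    try:
        return subprocess.run(argv, check=False).returncode
    except OSError:
        # The binary could not be started at all; report it as a failure.
        return 1


def disable_firewall(allow_localcli=False, runner=run_cli):
    """Run each step with esxcli; retry with localcli only if explicitly allowed.

    Returns a list of (argv, exit_code) tuples, one per command attempted.
    """
    attempts = []
    for step in FIREWALL_STEPS:
        argv = ["esxcli"] + step
        code = runner(argv)
        attempts.append((argv, code))
        if code != 0 and allow_localcli:
            # localcli bypasses hostd, so it is opt-in (see the warning above).
            argv = ["localcli"] + step
            attempts.append((argv, runner(argv)))
    return attempts
```

Calling `disable_firewall()` never touches localcli; `disable_firewall(allow_localcli=True)` mirrors what was done manually here. The `runner` parameter exists so the logic can be exercised with a stand-in function instead of real commands.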

When done, I saw the following in the vobd.log:

  • [netCorrelator] 12406134193us: [vob.net.firewall.config.changed] Firewall configuration has changed.
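To confirm the change without reading the whole file, the vobd.log lines can be searched for that event ID. A small Python sketch follows; the pattern is inferred from the single entry quoted above, the `us` suffix is assumed to mean microseconds, and `VOB_RE` and `firewall_change_events` are illustrative names rather than anything VMware ships.

```python
import re

# Layout inferred from the vobd.log entry quoted in this post (an assumption):
# [<correlator>] <number>us: [<event id>] <description>
VOB_RE = re.compile(
    r"\[(?P<correlator>\w+)\]\s+(?P<stamp_us>\d+)us:\s+"
    r"\[(?P<event>[\w.]+)\]\s+(?P<text>.*)"
)

# Event ID copied from the log entry above.
FIREWALL_CHANGED = "vob.net.firewall.config.changed"


def firewall_change_events(lines):
    """Return (stamp_us, text) for every firewall configuration change event."""
    events = []
    for raw in lines:
        match = VOB_RE.search(raw)
        if match and match["event"] == FIREWALL_CHANGED:
            events.append((int(match["stamp_us"]), match["text"].strip()))
    return events


if __name__ == "__main__":
    sample = [
        "[netCorrelator] 12406134193us: [vob.net.firewall.config.changed] "
        "Firewall configuration has changed."
    ]
    print(firewall_change_events(sample))
```

An empty result after running the firewall commands would suggest that the change did not take effect, or that the log format differs from the one assumed here.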

About 60 seconds after disabling the ESXi host local firewall, the ESXi host was connected to vCenter Server again. After using localcli you are instructed to restart hostd; in my case I put the ESXi host in maintenance mode and restarted it.

A case has been created with VMware and I’ll update the blog post when we find the root cause.

 

6 comments


  1. Vaibhav

    Hi Magnus,

    Thanks for the article; just wanted to ask a couple of things.

    What logs did you see that made you realize that it’s the firewall which is causing the issue?

    Were there any VMs running on this host? If yes, once the local firewall was disabled on the host and it rejoined the VC, were you able to vMotion the VMs to a different host?

    Last, when you say ESXi host local firewall, are you referring to the ESXi firewall that we can see under Configuration tab >> Security >> Services, or is it something else?

    Thanks
    Vaibhav

  2. magander3

    Hi, we checked the vobd and vmkernel log files and also the storage system logs to verify access patterns. Yes, there were VMs running on the ESXi host, and they could successfully be migrated to another ESXi host once the host was back in vCenter Server.
    That’s the firewall I’m talking about. When it is disabled you’ll see no rules in the vSphere Web Client or vSphere Client for that ESXi host.

    //Magnus

  3. Kanth

    Hello mate,

    Did you get to know why this issue arose all of a sudden? Did the VMware support guys clarify why you had this firewall issue? Were there any changes made that triggered this issue?

    Regards,
    Kanth

  4. magander3

    Hi,
    The case is still in progress.

    I will update when I get the final solution.

    //Magnus

  5. Christian G.

    Any updates on this? I’m having a similar issue with the same version U1a (upgraded to U1b about a week ago). Can you check in the performance data whether there’s increased usage of network resources when the disconnection happens?

    TIA

  6. magander3

    Hi,
    No, unfortunately not. VMware is still debugging the issue.

    //Magnus
