NSX Troubleshooting – Management and Control Plane

This is the first article in my VMware NSX Troubleshooting series. I am aware that many other blog posts on this topic have already been published, but for me this post is also part of my learning process with VMware NSX. Writing about a topic helps me to find and close my knowledge gaps. I will start the series with the NSX Management and Control Plane.

The new VMware vCenter HTML5 Client integration for NSX gives a great summary of the environment state, but in this blog post we will dig a little deeper into the individual components of the NSX Management and Control Plane.

NSX Troubleshooting - Management and Control Plane - System Overview

To get started with NSX troubleshooting, I recommend the VMware NSX Troubleshooting Guide and the NSX Command Line Quick Reference. The great NSX 6.4 Control Plane Logical View from Tim Sandy will help you understand the relationships between the NSX Management, Control and Data Plane.

Troubleshooting NSX Manager

The NSX Manager is the first component of the NSX Management and Control Plane. To be exact, it is one of two components of the Management Plane; the other one is VMware vCenter itself. Most environments have only one NSX Manager, but a cross-vCenter NSX environment can have one primary and multiple secondary NSX Managers.

Check NSX Manager file system

Show the file system usage on the NSX Manager.

NSX Troubleshooting - Management and Control Plane - Show Filesystems

Monitor NSX Manager processes

Show currently running processes on the NSX Manager.

NSX Troubleshooting - Management and Control Plane - Show process Monitor

Check NSX Manager Logs

Show the appmgmt, manager, or system log of the NSX Manager.

Additional options:

Option     Description
follow     Update the displayed log
reverse    Show the log in reverse chronological order
last n     Show the last n events in the log

NSX Manager Packet Capture

Display all packets captured on an NSX Manager interface. This example shows all HTTP and HTTPS packets on the management interface (the expression is a tcpdump-formatted string):

Note: This command requires enabled mode.

NSX Troubleshooting - Management and Control Plane - NSX Manager debug packet

Verify NSX Manager date and time

Show the current time and date of the NSX Manager.

Troubleshooting NSX Controller

The NSX Controller Cluster is the second part of the NSX Management and Control Plane we look at in this post. In VMware NSX 6.4.0, the Controller Cluster needs exactly three nodes for a proper setup. NSX Edges and NSX DLR Control VMs can also be counted as part of the Control Plane, but I will cover these components in a separate post.

Identify NSX Controllers

Show all controller nodes in the Controller Cluster.

NSX Manager

NSX Troubleshooting - Management and Control Plane - Show Controller list all

NSX UI

NSX Troubleshooting - Management and Control Plane - Show Controller in UI

Check NSX Controller Interfaces

Show the IP configuration of the NSX Controller.

NSX Troubleshooting - Management and Control Plane - Show Network Interface

Show NSX Controller TCP Connections

Show active TCP connections of the NSX Controller.

NSX Troubleshooting - Management and Control Plane - Show Network Connection

Show NSX Controller TCP Dump

Run a tcpdump on the NSX Controller management interface.

Show NSX Controller Cluster Status

Show the Controller Cluster status per controller. In case of a problem, this should be verified on each controller in the cluster.

NSX Troubleshooting - Management and Control Plane - Show Control-Cluster Status

Show Controller Cluster Roles

Show active roles per controller. In case of a problem, this should be verified on each controller in the cluster.

NSX Troubleshooting - Management and Control Plane - Show Control-Cluster Roles

Show Controller Cluster Connections

Show Controller Cluster connections for the individual roles. In case of a problem, this should be verified on each controller in the cluster.

NSX Troubleshooting - Management and Control Plane - Show Control-Cluster Connections

Show NSX Controller Cluster History

Show the event history of the Controller Cluster.

NSX Troubleshooting - Management and Control Plane - Show Control-Cluster History

Check NSX Controller Logs

Check the NSX Controller logs for known issues, errors and warnings.

Slow Disk:

NSX Troubleshooting - Management and Control Plane - Show Controller Logs

Disk space usage:

Main controller log – warnings and errors:

NSX Troubleshooting - Management and Control Plane - Show Controller Logs

Troubleshooting ESXi Host

In my opinion, at least some components of the ESXi Hosts belong to the NSX Management and Control Plane, although the boundary to the Data Plane is blurry.

Verify the NSX VIB installation

Verify that the NSX-V VIB is installed on the ESXi Host.

NSX Troubleshooting - Management and Control Plane - esxcli software vib list
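If the VIB check has to be repeated on many hosts, the listing can be evaluated with a few lines of Python. This is only a minimal sketch: it assumes the output of `esxcli software vib list` has been saved as text, that the VIB name is in the first column and the version in the second, and that the NSX-V VIB is called `esx-nsxv` (as on recent 6.3/6.4 builds). The sample rows and version numbers are made up for illustration.

```python
from typing import Optional

# Illustrative sample only: the version numbers and dates are invented.
SAMPLE_VIB_LIST = """\
Name        Version              Vendor  Acceptance Level  Install Date
----------  -------------------  ------  ----------------  ------------
esx-base    6.5.0-1.36.7388607   VMware  VMwareCertified   2018-03-01
esx-nsxv    6.5.0-0.0.7563456    VMware  VMwareCertified   2018-03-02
"""


def installed_vib_version(vib_list: str, name: str = "esx-nsxv") -> Optional[str]:
    """Return the version column of the named VIB, or None if it is not listed."""
    for line in vib_list.splitlines():
        fields = line.split()
        # Assumption: first column is the VIB name, second column the version.
        if len(fields) >= 2 and fields[0] == name:
            return fields[1]
    return None


if __name__ == "__main__":
    version = installed_vib_version(SAMPLE_VIB_LIST)
    print("NSX-V VIB missing" if version is None else f"NSX-V VIB version: {version}")
```

A return value of `None` means the VIB does not appear in the listing, which would point to a failed or missing host preparation.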

Verify currently loaded NSX Modules

Verify that all the NSX modules are currently loaded in the ESXi system.

NSX Troubleshooting - Management and Control Plane - vmkload_mod -l

Verify VXLAN IP Connection between ESXi Hosts

Verify the connection between all the VTEPs in your environment.

Identify the NSX VMkernel interfaces:

NSX Troubleshooting - Management and Control Plane - esxcli network ip interface list

Run IP connection test:

NSX Troubleshooting - Management and Control Plane - VXLAN Ping
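The connection test is most useful when it also proves that the transport network carries full-size VXLAN frames. The following sketch shows the arithmetic behind the ping size and builds a matching `vmkping` command line. It assumes an MTU of 1600 bytes on the transport network and IPv4 without options; the function names and the target address are hypothetical.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def vmkping_payload_size(transport_mtu: int) -> int:
    """Largest ICMP payload that still fits into one unfragmented IPv4 packet."""
    overhead = IP_HEADER_BYTES + ICMP_HEADER_BYTES
    if transport_mtu <= overhead:
        raise ValueError("MTU is too small for an ICMP echo request")
    return transport_mtu - overhead


def vmkping_command(remote_vtep: str, transport_mtu: int = 1600) -> str:
    """Build a vmkping call for the VXLAN TCP/IP stack with 'don't fragment' set."""
    size = vmkping_payload_size(transport_mtu)
    # ++netstack=vxlan selects the VXLAN stack, -d forbids fragmentation,
    # -s sets the ICMP payload size in bytes.
    return f"vmkping ++netstack=vxlan -d -s {size} {remote_vtep}"


if __name__ == "__main__":
    # 192.0.2.52 is a placeholder for the VTEP address of another host.
    print(vmkping_command("192.0.2.52"))
```

If the ping succeeds with a small payload but fails with the computed size, the MTU on the path between the VTEPs is the first thing to check.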

View Routing table for the VXLAN TCP/IP Stack:

NSX Troubleshooting - Management and Control Plane - esxcli network ip route

View ARP table for the VXLAN TCP/IP Stack:

NSX Troubleshooting - Management and Control Plane - esxcli network ip neighbor

Run NSX Host Health Check

Show details of the health status of the specified ESXi Host.

Identify ESXi Host ID:

NSX Troubleshooting - Management and Control Plane - show cluster all

NSX Troubleshooting - Management and Control Plane - show cluster

Run NSX Host Health Check:

NSX Troubleshooting - Management and Control Plane - show host health-status

Query API for ESXi Management and Control Plane connection

Query the NSX Manager API for the Management and Control Plane connection state of all ESXi Hosts.

NSX Troubleshooting - Management and Control Plane - NSX Manager API Host Connection State

Query API for ESXi Management and Control Plane connection details

There is another NSX Manager API query for a more detailed status of a single ESXi Host.

NSX Troubleshooting - Management and Control Plane - NSX Manager API Host Connection State Details

Known Error Codes:

1255602: Incomplete Controller Certificate
1255603: SSL Handshake Failure
1255604: Connection Refused
1255605: Keep-alive Timeout
1255606: SSL Exception
1255607: Bad Message
1255620: Unknown Error
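When the API responses are processed by a script, the codes listed above can be kept in a small lookup table. This is just a sketch built from that list; the function name and the wording of the fallback text are my own.

```python
# Error codes and descriptions as listed above.
CONTROL_PLANE_ERROR_CODES = {
    1255602: "Incomplete Controller Certificate",
    1255603: "SSL Handshake Failure",
    1255604: "Connection Refused",
    1255605: "Keep-alive Timeout",
    1255606: "SSL Exception",
    1255607: "Bad Message",
    1255620: "Unknown Error",
}


def describe_error_code(code: int) -> str:
    """Translate a connection error code into its description."""
    # Codes that are not in the list get a neutral fallback text.
    return CONTROL_PLANE_ERROR_CODES.get(code, f"Unlisted error code {code}")


if __name__ == "__main__":
    print(describe_error_code(1255604))
```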

Verify Control Plane Agent Status

Verify the Control Plane Agent (netcpad) status on the ESXi Hosts.

NSX Troubleshooting - Management and Control Plane - netcpad status

Verify Stateful Firewall Service

Verify the Stateful Firewall Service (vShield-Stateful-Firewall) status on the ESXi Hosts.

NSX Troubleshooting - Management and Control Plane - vShield-Stateful-Firewall status

Check the Control Plane Agent Configuration

Check the control plane agent configuration on the ESXi Hosts. The IP addresses of all NSX Controllers should be listed.

NSX Troubleshooting - Management and Control Plane - config-by-vsm.xml
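The comparison between the configured and the expected controllers can also be scripted. The sketch below assumes that each controller appears in config-by-vsm.xml as a `<server>` element inside a `<connection>` entry; this layout, the sample content and all IP addresses are assumptions for illustration and should be checked against a real file.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Set

# Illustrative sample only: element names and addresses are assumptions.
SAMPLE_CONFIG = """
<config>
  <connectionList>
    <connection id="0000">
      <port>1234</port>
      <server>192.0.2.31</server>
    </connection>
    <connection id="0001">
      <port>1234</port>
      <server>192.0.2.32</server>
    </connection>
  </connectionList>
</config>
"""


def missing_controllers(config_xml: str, expected_ips: Iterable[str]) -> Set[str]:
    """Return the expected controller IPs that the agent configuration lacks."""
    root = ET.fromstring(config_xml.strip())
    # Collect the text of every <server> element, wherever it is nested.
    configured = {el.text.strip() for el in root.iter("server") if el.text}
    return set(expected_ips) - configured


if __name__ == "__main__":
    expected = ["192.0.2.31", "192.0.2.32", "192.0.2.33"]
    print(missing_controllers(SAMPLE_CONFIG, expected))
```

An empty result means that every expected controller is present in the configuration.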

Verify that Stateful Firewall Service is configured

Verify that the Stateful Firewall Service is configured with the IP address of the NSX Manager.

NSX Troubleshooting - Management and Control Plane - RmqIpAddress

Verify IP connection to the Controllers

Verify the active IP connections from the ESXi Host to all controllers in the NSX Controller Cluster.

NSX Troubleshooting - Management and Control Plane - Controller IP Connection

Verify IP connection to the Manager

Verify the active IP connections from the ESXi Host to the NSX Manager.

NSX Troubleshooting - Management and Control Plane - Manager IP Connection
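Both connection checks above come down to filtering the host's connection table by remote port. As a rough sketch, the following Python function does this for a saved `esxcli network ip connection list` output. It assumes the default ports (TCP 1234 towards the controllers, TCP 5671 towards the message bus on the NSX Manager), IPv4 addresses and the column order shown in the sample; the sample rows and addresses are invented.

```python
from typing import Set

# Illustrative sample only: addresses, world IDs and names are invented.
SAMPLE_CONNECTIONS = """\
Proto  Recv Q  Send Q  Local Address      Foreign Address   State        World ID  CC Algo  World Name
tcp         0       0  192.0.2.51:19481   192.0.2.31:1234   ESTABLISHED     35185  newreno  netcpa-worker
tcp         0       0  192.0.2.51:35024   192.0.2.32:1234   ESTABLISHED     35188  newreno  netcpa-worker
tcp         0       0  192.0.2.51:40011   192.0.2.33:1234   SYN_SENT        35190  newreno  netcpa-worker
tcp         0       0  192.0.2.51:52674   192.0.2.20:5671   ESTABLISHED     35141  newreno  vsfwd
"""


def established_peers(connection_list: str, port: int) -> Set[str]:
    """Return the remote IPs with an ESTABLISHED TCP connection to the given port."""
    peers = set()
    for line in connection_list.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state, ...
        if len(fields) < 6 or fields[0] != "tcp":
            continue
        foreign, state = fields[4], fields[5]
        host, _, foreign_port = foreign.rpartition(":")
        if state == "ESTABLISHED" and foreign_port == str(port):
            peers.add(host)
    return peers


if __name__ == "__main__":
    print("Controllers:", sorted(established_peers(SAMPLE_CONNECTIONS, 1234)))
    print("Manager:", sorted(established_peers(SAMPLE_CONNECTIONS, 5671)))
```

In a healthy three-node cluster the first set should contain all three controller addresses; in the invented sample one controller is stuck in SYN_SENT and is therefore not reported.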

Check the Control Plane for the Logical Switches

Check the state of the VXLAN control plane for the logical switches on the ESXi Host.

Identify the VXLAN DVS: 

NSX Troubleshooting - Management and Control Plane - VXLAN VIB

Check VXLAN control plane state:

NSX Troubleshooting - Management and Control Plane - VXLAN DVS

Show Control Plane Agent Log Files

Show the Control Plane Agent log files on the ESXi Host.

NSX Troubleshooting - Management and Control Plane - netcpa.log

Show Stateful Firewall Service Log Files

Show the Stateful Firewall Service (message bus client) log files on the ESXi Host.

NSX Troubleshooting - Management and Control Plane - vsfwd.log
