This is the first article of my VMware NSX Troubleshooting series. I am aware that a lot of other blog posts on this topic have already been published, but for me this post is also part of my learning process with VMware NSX. Writing about a topic helps me to uncover and close my existing knowledge gaps. I will start this series with the NSX Management and Control Plane.
The new VMware vCenter HTML5 Client Integration for NSX has a great summary of the environment state, but within this blog post we will dig a little bit deeper into the individual components of the NSX Management and Control Plane.
To get started with NSX troubleshooting, I recommend the VMware NSX Troubleshooting Guide and the NSX Command Line Quick Reference. The great NSX 6.4 Control Plane Logical View from Tim Sandy will help you to understand the relationships between the NSX Management, Control and Data Plane.
- 1 Troubleshooting NSX Manager
- 2 Troubleshooting NSX Controller
- 2.1 Identify NSX Controllers
- 2.2 Check NSX Controller Interfaces
- 2.3 Show NSX Controller TCP Connections
- 2.4 Show NSX Controller TCP Dump
- 2.5 Show NSX Controller Cluster Status
- 2.6 Show Controller Cluster Roles
- 2.7 Show Controller Cluster Connections
- 2.8 Show NSX Controller Cluster History
- 2.9 Check NSX Controller Logs
- 3 Troubleshooting ESXi Host
- 3.1 Verify the NSX VIB installation
- 3.2 Verify currently loaded NSX Modules
- 3.3 Verify VXLAN IP Connection between ESXi Hosts
- 3.4 Run NSX Host Health Check
- 3.5 Query API for ESXi Management and Control Plane connection
- 3.6 Query API for ESXi Management and Control Plane connection details
- 3.7 Verify Control Plane Agent Status
- 3.8 Verify Stateful Firewall Service
- 3.9 Check the Control Plane Agent Configuration
- 3.10 Verify that Stateful Firewall Service is configured
- 3.11 Verify IP connection to the Controllers
- 3.12 Verify IP connection to the Manager
- 3.13 Check the Control Plane for the Logical Switches
- 3.14 Show Control Plane Agent Log Files
- 3.15 Show Stateful Firewall Service Log Files
- 4 External references
Troubleshooting NSX Manager
The NSX Manager is the first component of the NSX Management and Control Plane. To be exact, it is one of two components of the Management Plane; the other one is VMware vCenter itself. Most environments have only one NSX Manager, but a cross-vCenter NSX environment can also run one primary and multiple secondary NSX Managers.
Check NSX Manager file system
Show the file system usage on the NSX Manager.
show filesystems
Monitor NSX Manager processes
Show currently running processes on the NSX Manager.
show process monitor
Check NSX Manager Logs
Shows the appmgmt, manager, or system log of the NSX Manager.
show log manager reverse
show log system reverse
| Option | Description |
|---|---|
| follow | Keep updating the displayed log |
| reverse | Show the log in reverse chronological order |
| last n | Show the last n events in the log |
NSX Manager Packet Capture
Display all packets captured on an NSX Manager interface. This example shows all HTTP and HTTPS packets on the management interface. The expression is a tcpdump filter string in which the spaces are replaced by underscores:
debug packet display interface mgmt port_80_or_port_443
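Longer filters are easy to mistype once the spaces are gone. The following is a minimal Python sketch that builds the command line from an ordinary tcpdump expression. It assumes that replacing every space with an underscore is all the CLI expects, which is inferred from the example above; the function names are made up for illustration.

```python
def to_nsx_capture_filter(tcpdump_expression: str) -> str:
    """Join the words of a tcpdump expression with underscores.

    Assumption: the NSX Manager CLI only needs the spaces replaced,
    as in the example port_80_or_port_443.
    """
    return "_".join(tcpdump_expression.split())


def capture_command(interface: str, tcpdump_expression: str) -> str:
    """Build the full 'debug packet display' command line."""
    capture_filter = to_nsx_capture_filter(tcpdump_expression)
    return f"debug packet display interface {interface} {capture_filter}"


if __name__ == "__main__":
    print(capture_command("mgmt", "port 80 or port 443"))
    # prints: debug packet display interface mgmt port_80_or_port_443
```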
Verify NSX Manager date and time
Show the current time and date of the NSX Manager.
show clock
Troubleshooting NSX Controller
The NSX Controller Cluster is the second part of the NSX Management and Control Plane we look at in this post. In VMware NSX 6.4.0, the Controller Cluster needs exactly three nodes for a proper setup. NSX Edges and NSX DLR Control VMs can also be counted as part of the Control Plane, but I will cover these components in a separate post.
Identify NSX Controllers
Show all controller nodes in the Controller Cluster.
show controller list all
Check NSX Controller Interfaces
Show the IP configuration of the NSX Controller.
show network interface
show network interface eth0
Show NSX Controller TCP Connections
Show active TCP connections of the NSX Controller.
show network connections of-type tcp
Show NSX Controller TCP Dump
Run a packet capture (tcpdump) on the NSX Controller management interface.
watch network interface eth0 traffic
Show NSX Controller Cluster Status
Show the Controller Cluster status per controller. In case of a problem, this should be verified on each controller in the cluster.
show control-cluster status
Show Controller Cluster Roles
Show active roles per controller. In case of a problem, this should be verified on each controller in the cluster.
show control-cluster roles
Show Controller Cluster Connections
Show Controller Cluster connections for the individual roles. In case of a problem, this should be verified on each controller in the cluster.
show control-cluster connections
Show NSX Controller Cluster History
Show the event history of the Controller Cluster.
show control-cluster history
Check NSX Controller Logs
Check the NSX Controller logs for known issues, errors and warnings.
Disk latency (ZooKeeper fsync warnings):
show log cloudnet/cloudnet_java-zookeeper.<timestamp>.log filtered-by fsync
Disk space usage:
show log syslog filtered-by freespace:
Main controller log – warnings and errors:
show log cloudnet/cloudnet.nsx-controller.vmware-nsx.log.ERROR.<timestamp>
show log cloudnet/cloudnet.nsx-controller.vmware-nsx.log.WARNING.<timestamp>
Troubleshooting ESXi Host
In my opinion, at least some components of the ESXi Hosts belong to the NSX Management and Control Plane, although the boundary to the Data Plane is blurry.
Verify the NSX VIB installation
Verify that the NSX-V VIB is installed on the ESXi Host.
esxcli software vib list | grep -e nsxv
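When several hosts have to be compared, it can help to reduce the listing to the VIB name and version. The sketch below is one way to do that in Python. The sample listing, including the version string and the install date, is purely illustrative and not taken from a real host; the column order (name first, version second) is an assumption about the `esxcli software vib list` output.

```python
from typing import Dict

# Illustrative sample only: the versions and dates are made up.
SAMPLE_VIB_LIST = """\
Name          Version             Vendor  Acceptance Level  Install Date
------------  ------------------  ------  ----------------  ------------
esx-base      6.5.0-1.0.0000000   VMware  VMwareCertified   2018-01-01
esx-nsxv      6.5.0-0.0.0000000   VMware  VMwareCertified   2018-01-01
"""


def find_nsx_vibs(vib_list_output: str, pattern: str = "nsxv") -> Dict[str, str]:
    """Return {vib_name: version} for every VIB whose name contains pattern."""
    vibs = {}
    for line in vib_list_output.splitlines():
        fields = line.split()
        # Skip the header, the separator line and anything too short to parse.
        if len(fields) < 2 or fields[0] == "Name" or set(fields[0]) == {"-"}:
            continue
        if pattern in fields[0]:
            vibs[fields[0]] = fields[1]
    return vibs


if __name__ == "__main__":
    print(find_nsx_vibs(SAMPLE_VIB_LIST))
```

An empty result corresponds to the `grep` above returning nothing, which means the VIB is not installed on that host.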
Verify currently loaded NSX Modules
Verify that all the NSX modules are currently loaded in the ESXi system.
vmkload_mod -l | grep nsx
Verify VXLAN IP Connection between ESXi Hosts
Verify the connection between all the VTEPs in your environment.
Identify the NSX VMkernel interfaces:
esxcli network ip interface list --netstack=vxlan
Run the IP connectivity test with the don't-fragment bit set (-d) and a 1572-byte payload (-s):
ping -S vxlan -d -s 1572 -I <VMK> <Remote IP>
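The payload size of 1572 bytes is not arbitrary. With an 8-byte ICMP header and a 20-byte IPv4 header (no IP options), it produces a 1600-byte packet, which matches the MTU that VXLAN transport networks are commonly configured with. The small Python sketch below does this arithmetic for other MTU values; the function names are chosen for illustration only.

```python
ICMP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20  # assumes an IPv4 header without options


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP payload that fits into one unfragmented packet of size mtu."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError(f"MTU must be larger than {overhead} bytes")
    return mtu - overhead


def mtu_tested_by_payload(payload: int) -> int:
    """Packet size on the wire (IP layer) for a given ping payload size."""
    return payload + IPV4_HEADER_BYTES + ICMP_HEADER_BYTES


if __name__ == "__main__":
    print(mtu_tested_by_payload(1572))  # 1600
    print(ping_payload_for_mtu(1600))   # 1572
    print(ping_payload_for_mtu(9000))   # 8972
```

If the ping with `-d` fails while a smaller payload succeeds, some device on the path between the VTEPs most likely has a smaller MTU than expected.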
View the routing table of the VXLAN TCP/IP stack:
esxcli network ip route ipv4 list -N vxlan
View the ARP table of the VXLAN TCP/IP stack:
esxcli network ip neighbor list -N vxlan
Run NSX Host Health Check
Show the detailed health status of the specified ESXi Host. These commands are run from the NSX Manager central CLI.
Identify ESXi Host ID:
show cluster all
show cluster <Cluster-Id>
Run NSX Host Health Check:
show host <Host-Id> health-status detail
Query API for ESXi Management and Control Plane connection
Query the NSX Manager API for the Management and Control Plane connection state of all ESXi Hosts.
Query API for ESXi Management and Control Plane connection details
There is another NSX Manager API query for a more detailed status of a single ESXi Host.
Known Error Codes:
1255602: Incomplete Controller Certificate
1255603: SSL Handshake Failure
1255604: Connection Refused
1255605: Keep-alive Timeout
1255606: SSL Exception
1255607: Bad Message
1255620: Unknown Error
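When the codes show up in API responses or scripts, a small lookup table saves the trip back to this list. This is a minimal Python sketch that uses only the codes listed above; the function name and the fallback text for unlisted codes are my own additions.

```python
# Error codes and descriptions exactly as listed above.
CONTROL_PLANE_ERROR_CODES = {
    1255602: "Incomplete Controller Certificate",
    1255603: "SSL Handshake Failure",
    1255604: "Connection Refused",
    1255605: "Keep-alive Timeout",
    1255606: "SSL Exception",
    1255607: "Bad Message",
    1255620: "Unknown Error",
}


def describe_error_code(code: int) -> str:
    """Translate an error code into its description.

    Codes that are not in the list above get a generic fallback text.
    """
    return CONTROL_PLANE_ERROR_CODES.get(code, f"Unlisted error code {code}")


if __name__ == "__main__":
    print(describe_error_code(1255604))  # Connection Refused
    print(describe_error_code(1255999))  # Unlisted error code 1255999
```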
Verify Control Plane Agent Status
Verify the Control Plane Agent (netcpad) status on the ESXi Hosts.
/etc/init.d/netcpad status
Verify Stateful Firewall Service
Verify the Stateful Firewall Service (vShield-Stateful-Firewall) status on the ESXi Hosts.
/etc/init.d/vShield-Stateful-Firewall status
Check the Control Plane Agent Configuration
Check the Control Plane Agent configuration on the ESXi Hosts. The IP addresses of all NSX Controllers should be listed.
more /etc/vmware/netcpa/config-by-vsm.xml
Verify that Stateful Firewall Service is configured
Verify that the Stateful Firewall Service is configured with the NSX Manager IP address.
esxcfg-advcfg -g /UserVars/RmqIpAddress
Verify IP connection to the Controllers
Verify the active IP connections from the ESXi Host to all controllers in the NSX Controller Cluster (TCP port 1234).
esxcli network ip connection list | grep 1234
Verify IP connection to the Manager
Verify the active IP connections from the ESXi Host to the NSX Manager (message bus, TCP port 5671).
esxcli network ip connection list | grep 5671
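A plain `grep` for the port number also matches lines where the number happens to appear as a local port, and it does not tell you which peer is missing. The following Python sketch checks the two connection tests above more strictly. It assumes the usual column order of `esxcli network ip connection list` (protocol, two queue counters, local address, foreign address, state); the sample output and all IP addresses are invented for illustration.

```python
from typing import Iterable, List, Set

# Illustrative sample only: addresses, world IDs and world names are made up.
SAMPLE_OUTPUT = """\
Proto  Recv Q  Send Q  Local Address         Foreign Address       State        World ID  CC Algo  World Name
-----  ------  ------  --------------------  --------------------  -----------  --------  -------  -------------
tcp         0       0  192.168.110.51:46916  192.168.110.201:1234  ESTABLISHED     35185  newreno  netcpa-worker
tcp         0       0  192.168.110.51:46917  192.168.110.202:1234  ESTABLISHED     35185  newreno  netcpa-worker
tcp         0       0  192.168.110.51:46918  192.168.110.203:1234  SYN_SENT        35185  newreno  netcpa-worker
tcp         0       0  192.168.110.51:23251  192.168.110.42:5671   ESTABLISHED     35854  newreno  vsfwd
"""


def established_peers(connection_list: str, port: int) -> Set[str]:
    """Return the remote IPs with an ESTABLISHED TCP connection to the given remote port."""
    peers = set()
    for line in connection_list.splitlines():
        fields = line.split()
        # Only TCP rows carry a state column; this also skips header and separator.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote_ip, _, remote_port = fields[4].rpartition(":")
        if remote_port == str(port) and fields[5] == "ESTABLISHED":
            peers.add(remote_ip)
    return peers


def missing_peers(connection_list: str, expected_ips: Iterable[str], port: int) -> List[str]:
    """Return the expected peers that have no established connection on the port."""
    connected = established_peers(connection_list, port)
    return sorted(ip for ip in expected_ips if ip not in connected)


if __name__ == "__main__":
    controllers = ["192.168.110.201", "192.168.110.202", "192.168.110.203"]
    print(missing_peers(SAMPLE_OUTPUT, controllers, 1234))         # ['192.168.110.203']
    print(missing_peers(SAMPLE_OUTPUT, ["192.168.110.42"], 5671))  # []
```

In the sample, the third controller is stuck in SYN_SENT, so it is reported as missing even though a simple `grep 1234` would still print a line for it.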
Check the Control Plane for the Logical Switches
Check the state of the VXLAN control plane for the logical switches on the ESXi Host.
Identify the VXLAN DVS:
esxcli network vswitch dvs vmware vxlan list
Check VXLAN control plane state:
esxcli network vswitch dvs vmware vxlan network list --vds-name=<DVS name>
Show Control Plane Agent Log Files
Show the Control Plane Agent log files on the ESXi Host.
tail -f /var/log/netcpa.log
Show Stateful Firewall Service Log Files
Show the Stateful Firewall Service (message bus client) log files on the ESXi Host.
tail -f /var/log/vsfwd.log