
NSX Troubleshooting - Management and Control Plane

This is the first article of my VMware NSX Troubleshooting series. I am aware that a lot of blog posts on this topic have already been published, but for me this post is also part of my learning process with VMware NSX: writing about a topic helps me uncover the gaps in my own knowledge. I will start this series with the NSX Management and Control Plane.

The new VMware vCenter HTML5 Client integration for NSX provides a great summary of the environment state, but in this blog post we will dig a little deeper into the individual components of the NSX Management and Control Plane.

[Figure: NSX system overview in the vCenter HTML5 Client]

To get started with NSX troubleshooting, I recommend the VMware NSX Troubleshooting Guide and the NSX Command Line Quick Reference. The great NSX 6.4 Control Plane Logical View by Tim Sandy helps you understand the relations between the NSX Management, Control and Data Plane.

# Troubleshooting NSX Manager

The NSX Manager is the first component of the NSX Management and Control Plane. To be exact, it is one of the two components of the Management Plane; the other one is VMware vCenter itself. Most environments have only one NSX Manager, but in a cross-vCenter NSX environment a setup with one primary and multiple secondary NSX Managers is also possible.

# Check NSX Manager file system

Show the file system usage on the NSX Manager.

```
show filesystems
```


# Monitor NSX Manager processes

Show currently running processes on the NSX Manager.

```
show process monitor
```


# Check NSX Manager Logs

Show the appmgmt, manager or system log of the NSX Manager.

```
show log manager reverse
show log system reverse
```

Additional options:

| Option | Description |
|--------|-------------|
| follow | Continuously update the displayed log |
| reverse | Show the log in reverse chronological order |
| last n | Show only the last n events of the log |

# NSX Manager Packet Capture

Display all packets captured on an NSX Manager interface. This example shows all HTTP and HTTPS packets on the management interface (the expression is a tcpdump-formatted string with underscores in place of spaces):

```
debug packet display interface mgmt port_80_or_port_443
```

Note: privileged (enable) mode is required.
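The example suggests that the CLI expects the tcpdump expression as a single token, with underscores replacing the spaces (port 80 or port 443 becomes port_80_or_port_443). If you keep your capture filters in their usual tcpdump form, a small Python helper can do the conversion. The function names are hypothetical, and the only rule assumed is the one visible in the example above: spaces become underscores.

```python
def to_nsx_capture_expression(tcpdump_filter: str) -> str:
    """Turn a tcpdump filter into the underscore-joined form seen in the NSX CLI example."""
    # split() without arguments drops leading/trailing blanks and collapses runs of spaces.
    return "_".join(tcpdump_filter.split())


def debug_packet_command(interface: str, tcpdump_filter: str) -> str:
    """Build a 'debug packet display interface' command line (hypothetical helper)."""
    return f"debug packet display interface {interface} {to_nsx_capture_expression(tcpdump_filter)}"


print(debug_packet_command("mgmt", "port 80 or port 443"))
# debug packet display interface mgmt port_80_or_port_443
```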


# Verify NSX Manager date and time

Show the current time and date of the NSX Manager.

```
show clock
```

# Troubleshooting NSX Controller

The NSX Controller Cluster is the second part of the NSX Management and Control Plane that we look at in this post. In VMware NSX 6.4.0, a proper Controller Cluster setup requires exactly three nodes. NSX Edges and NSX DLR Control VMs can also be counted as part of the Control Plane, but I will cover these components in a separate post.

# Identify NSX Controllers

Show all controller nodes in the Controller Cluster.

NSX Manager:

```
show controller list all
```


NSX UI:

[Figure: NSX Controller nodes in the NSX UI]

# Check NSX Controller Interfaces

Show the IP configuration of the NSX Controller.

```
show network interface
show network interface eth0
```


# Show NSX Controller TCP Connections

Show active TCP connections of the NSX Controller.

```
show network connections of-type tcp
```


# Show NSX Controller TCP Dump

Capture and display the traffic on the NSX Controller management interface (eth0), similar to tcpdump.

```
watch network interface eth0 traffic
```

# Show NSX Controller Cluster Status

Show the Controller Cluster status per controller. In case of a problem, this should be verified on each controller in the cluster.

```
show control-cluster status
```
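Among other things, the cluster status tells you whether the node is connected to the cluster majority. The Controller Cluster relies on a majority of its nodes being able to reach each other, which is why three nodes is the smallest size that tolerates the loss of one controller. The generic majority arithmetic, as a short Python sketch (illustrative only, not NSX code):

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Nodes that can be lost while a majority is still reachable."""
    return cluster_size - majority(cluster_size)


def has_majority(nodes_up: int, cluster_size: int) -> bool:
    """True if the reachable nodes still form a majority of the cluster."""
    return nodes_up >= majority(cluster_size)


for size in (1, 2, 3):
    print(f"{size} node(s): majority {majority(size)}, tolerates {tolerated_failures(size)} failure(s)")
```

A two-node cluster tolerates no more failures than a single node, so the third controller is what actually buys redundancy.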


# Show Controller Cluster Roles

Show active roles per controller. In case of a problem, this should be verified on each controller in the cluster.

```
show control-cluster roles
```


# Show Controller Cluster Connections

Show Controller Cluster connections for the individual roles. In case of a problem, this should be verified on each controller in the cluster.

```
show control-cluster connections
```


# Show NSX Controller Cluster History

Show the event history of the Controller Cluster.

```
show control-cluster history
```


# Check NSX Controller Logs

Check the NSX Controller logs for known issues, errors and warnings.

Slow Disk:

```
show log cloudnet/cloudnet_java-zookeeper.<timestamp>.log filtered-by fsync
```
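A slow disk typically shows up as ZooKeeper warnings that an fsync of its write-ahead log took too long. If you have a copy of the log (for example from a controller tech support bundle), a short Python sketch can pull out the reported durations. The regular expression assumes the warning contains "fsync" followed by "took <n>ms", as ZooKeeper's slow-fsync message does; adjust it if your log reads differently. The 1000 ms default threshold is an assumption that mirrors ZooKeeper's own default warning threshold.

```python
import re

# Assumed message shape, modelled on ZooKeeper's slow-fsync warning, e.g.
# "fsync-ing the write ahead log in SyncThread:1 took 2500ms which will ..."
FSYNC_RE = re.compile(r"fsync.*?took\s+(\d+)\s*ms")


def slow_fsyncs(lines, threshold_ms=1000):
    """Return (duration_ms, line) for each fsync warning at or above threshold_ms."""
    hits = []
    for line in lines:
        match = FSYNC_RE.search(line)
        if match:
            duration = int(match.group(1))
            if duration >= threshold_ms:
                hits.append((duration, line.rstrip("\n")))
    return hits
```

Since the function accepts any iterable of lines, an open file object can be passed directly, e.g. slow_fsyncs(open(path)).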


Disk space usage:

```
show log syslog filtered-by freespace:
```

Main controller log - warnings and errors:

```
show log cloudnet/cloudnet.nsx-controller.vmware-nsx.log.ERROR.<timestamp>
show log cloudnet/cloudnet.nsx-controller.vmware-nsx.log.WARNING.<timestamp>
```


# Troubleshooting ESXi Host

In my opinion, at least some components of the ESXi hosts belong to the NSX Management and Control Plane, although the boundary to the Data Plane becomes blurry here.

# Verify the NSX VIB installation

Verify that the NSX-V VIB is installed on the ESXi host.

```
esxcli software vib list | grep -e nsxv
```


# Verify currently loaded NSX Modules

Verify that all the NSX modules are currently loaded in the ESXi system.

```
vmkload_mod -l | grep nsx
```


# Verify VXLAN IP Connection between ESXi Hosts

Verify the connection between all the VTEPs in your environment.

Identify the NSX VMkernel interfaces:

```
esxcli network ip interface list --netstack=vxlan
```


Run an IP connectivity test:

```
ping -S vxlan -d -s 1572 -I <VMK> <Remote IP>
```
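The payload size is not arbitrary. NSX requires an MTU of at least 1600 bytes on the VXLAN transport network, and -s sets only the ICMP payload; the 8-byte ICMP header and the 20-byte IPv4 header come on top. With -d (don't fragment) set, the ping only succeeds if every hop forwards a full 1600-byte packet. The arithmetic as a small Python sketch (helper names are illustrative):

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def ip_packet_size(icmp_payload: int) -> int:
    """Size of the IP packet produced by 'ping -s <icmp_payload>'."""
    return icmp_payload + ICMP_HEADER_BYTES + IPV4_HEADER_BYTES


def max_unfragmented_payload(mtu: int) -> int:
    """Largest -s value that still fits into a single packet when -d is set."""
    return mtu - ICMP_HEADER_BYTES - IPV4_HEADER_BYTES


print(ip_packet_size(1572))            # 1600
print(max_unfragmented_payload(1600))  # 1572
print(max_unfragmented_payload(1500))  # 1472
```

If the 1572-byte ping fails while a 1472-byte one succeeds, a device in the VTEP path is most likely still using the default 1500-byte MTU.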


View the routing table of the VXLAN TCP/IP stack:

```
esxcli network ip route ipv4 list -N vxlan
```


View the ARP table of the VXLAN TCP/IP stack:

```
esxcli network ip neighbor list -N vxlan
```


# Run NSX Host Health Check

Show the detailed health status of a specific ESXi host. These commands run on the NSX Manager.

Identify the cluster ID and the ESXi host ID:

```
show cluster all
```


```
show cluster <Cluster-Id>
```


Run NSX Host Health Check:

```
show host <Host-Id> health-status detail
```


# Query API for ESXi Management and Control Plane connection

Query the NSX Manager API for the Management and Control Plane connection states of all ESXi hosts.

```
{{NSX-URL}}/api/2.0/vdn/host/status
```
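This query is a plain authenticated GET request against the NSX Manager REST API, which answers in XML. The following Python sketch only prepares such a request with HTTP Basic authentication. The function name, the example hostname and the credentials are placeholders, and sending the request is deliberately left out so that certificate handling can be adapted to your environment.

```python
import base64
import urllib.request


def build_nsx_get(nsx_url: str, path: str, user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET against the NSX Manager API."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        nsx_url.rstrip("/") + path,
        headers={"Authorization": f"Basic {token}", "Accept": "application/xml"},
        method="GET",
    )
```

Usage: build_nsx_get("https://nsxmgr.example.local", "/api/2.0/vdn/host/status", "admin", password) returns the request, which urllib.request.urlopen() can then send. If the NSX Manager uses a self-signed certificate, pass an ssl context that trusts it rather than disabling verification.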


# Query API for ESXi Management and Control Plane connection details

There is another NSX Manager API query for a more detailed status of a single ESXi Host.

```
{{NSX-URL}}/api/2.0/vdn/inventory/host/<Host-ID>/connection/status
```


Known Error Codes:

  • 1255602: Incomplete Controller Certificate
  • 1255603: SSL Handshake Failure
  • 1255604: Connection Refused
  • 1255605: Keep-alive Timeout
  • 1255606: SSL Exception
  • 1255607: Bad Message
  • 1255620: Unknown Error
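When you evaluate the detailed connection status in a script, a small lookup table turns the numeric codes into the labels listed above. This Python sketch only knows these seven codes and reports anything else as unlisted instead of guessing a meaning:

```python
# Codes and labels exactly as listed above; nothing else is assumed.
CONNECTION_ERROR_CODES = {
    1255602: "Incomplete Controller Certificate",
    1255603: "SSL Handshake Failure",
    1255604: "Connection Refused",
    1255605: "Keep-alive Timeout",
    1255606: "SSL Exception",
    1255607: "Bad Message",
    1255620: "Unknown Error",
}


def describe_error(code) -> str:
    """Map a numeric (or numeric string) error code to its label."""
    return CONNECTION_ERROR_CODES.get(int(code), f"Unlisted error code {code}")
```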

# Verify Control Plane Agent Status

Verify the control plane Agent (netcpad) status on the ESXi Hosts.

```
/etc/init.d/netcpad status
```


# Verify Stateful Firewall Service

Verify the Stateful Firewall Service (vShield-Stateful-Firewall) status on the ESXi Hosts.

```
/etc/init.d/vShield-Stateful-Firewall status
```


# Check the Control Plane Agent Configuration

Check the control plane agent configuration on the ESXi Hosts. The IP addresses of all NSX Controllers should be listed.

```
more /etc/vmware/netcpa/config-by-vsm.xml
```
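Instead of reading the XML by eye, the check can be phrased as "every IP from show controller list all appears in the file". The Python sketch below deliberately does not depend on the file's XML element names (they are not covered here); it simply extracts all valid IPv4 addresses from the text. The function names are hypothetical.

```python
import ipaddress
import re

# Any dotted-quad candidate; validated with ipaddress below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def ipv4_addresses(text):
    """Return the set of valid IPv4 addresses that appear anywhere in text."""
    found = set()
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            found.add(str(ipaddress.IPv4Address(candidate)))
        except ipaddress.AddressValueError:
            pass  # e.g. "999.1.1.1" matches the pattern but is not an address
    return found


def missing_controllers(config_text, controller_ips):
    """Controller IPs that do not appear in the config-by-vsm.xml text."""
    present = ipv4_addresses(config_text)
    return {ip for ip in controller_ips if ip not in present}
```

An empty result means every controller IP you passed in is present in the file.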


# Verify that Stateful Firewall Service is configured

Verify that the Stateful Firewall Service uses the NSX Manager IP address as its message bus (RabbitMQ) broker.

```
esxcfg-advcfg -g /UserVars/RmqIpAddress
```
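On a healthy host, this value is the NSX Manager IP address. The Python sketch below assumes the usual esxcfg-advcfg output form "Value of RmqIpAddress is <IP>"; verify this against your own host before relying on it. The manager_ip parameter is simply the address you expect your NSX Manager to have.

```python
import re


def rmq_ip(advcfg_output: str):
    """Extract the value from 'Value of RmqIpAddress is <IP>' (assumed format), or None."""
    match = re.search(r"RmqIpAddress is\s+(\S+)", advcfg_output)
    return match.group(1) if match else None


def points_to_manager(advcfg_output: str, manager_ip: str) -> bool:
    """True if the configured message bus IP equals the expected NSX Manager IP."""
    return rmq_ip(advcfg_output) == manager_ip
```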


# Verify IP connection to the Controllers

Verify the active IP connections from the ESXi host to all controllers in the NSX Controller Cluster (TCP port 1234).

```
esxcli network ip connection list | grep 1234
```


# Verify IP connection to the Manager

Verify the active IP connections from the ESXi host to the NSX Manager (TCP port 5671).

```
esxcli network ip connection list | grep 5671
```
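Port 1234 (controllers) and port 5671 (NSX Manager message bus) are what the two grep commands look for, but grep matches the number anywhere in the line, for example in a local port such as 41234. The hypothetical Python helper below matches only the remote side of each connection. It assumes the esxcli column order Proto, Recv Q, Send Q, Local Address, Foreign Address, State, and so on, so compare it with the header of your own output first.

```python
def connections_to_port(esxcli_output: str, port: int):
    """Return (foreign_address, state) for rows whose foreign address ends in :port."""
    suffix = f":{port}"
    rows = []
    for line in esxcli_output.splitlines():
        fields = line.split()
        # Assumed column order: Proto, Recv Q, Send Q, Local Address,
        # Foreign Address, State, ... so the foreign address is fields[4].
        # The header and the dashed separator line never end in ":<port>".
        if len(fields) >= 6 and fields[4].endswith(suffix):
            rows.append((fields[4], fields[5]))
    return rows
```

With three controllers, connections_to_port(output, 1234) should return three ESTABLISHED entries, one per controller IP.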


# Check the Control Plane for the Logical Switches

Check the state of the VXLAN control plane for the logical switches on the ESXi Host.

Identify the VXLAN DVS: 

```
esxcli network vswitch dvs vmware vxlan list
```


Check VXLAN control plane state:

```
esxcli network vswitch dvs vmware vxlan network list --vds-name=<DVS name>
```


# Show Control Plane Agent Log Files

Show the control plane Agent log files on the ESXi Host.

```
tail -f /var/log/netcpa.log
```


# Show Stateful Firewall Service Log Files

Show the Stateful Firewall Service (message bus client) log files on the ESXi Host.

```
tail -f /var/log/vsfwd.log
```

