Sunday, October 15, 2017

A look at pktcap-uw, the packet capture tool on ESXi

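Today's topic is pktcap-uw, the packet capture tool that ships with ESXi 5.5 and later. (Before 5.5 the capture tool on ESXi was tcpdump-uw, which can only capture on VMkernel ports and is therefore somewhat limited.)
For a network engineer, being good at capturing packets is an essential skill. This is especially true with NSX, which deploys an additional layer-3 logical router (the VDR) inside every ESXi host, so each host now contains one more network node. Learning the capture tool also deepens your understanding of the data path.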

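Let's start with the vSwitch. The data path of a layer-2 switch is very simple.
The figure below is a logical view: a packet travels from the physical switch to the ESXi physical NIC, through the vSwitch, and on to the virtual NIC of the virtual machine.

pktcap-uw can capture at a vSwitch port and at an uplink separately. Note that a pktcap-uw capture needs a direction to be specified.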



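1. Get the vSwitch port ID. There are several ways to do this; two are listed here.
Method 1:
The advantage is that a single command returns all the necessary information.
net-stats -l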
[root@localhost:/tmp] net-stats -l
PortNum          Type SubType SwitchName       MACAddress         ClientName
33554434           4       0 vSwitch0         00:50:56:b6:2b:53  vmnic0
33554437           3       0 vSwitch0         00:50:56:62:59:80  vmk1
50331650           4       0 DvsPortset-0     00:50:56:b6:2e:e3  vmnic1
67108866           4       0 DvsPortset-1     00:50:56:b6:60:fd  vmnic2
67108872           3       0 DvsPortset-1     00:50:56:b6:2b:53  vmk0
67108874           5       6 DvsPortset-1     00:50:56:96:5b:99  linux_mirror.eth0
67108875           5       6 DvsPortset-1     00:50:56:96:5b:8e  linux_source_231.eth0
67108877           5       6 DvsPortset-1     00:50:56:96:3d:35  linux_dst_233.eth0
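
The lookup can also be scripted by filtering the listing with awk. The following is only a minimal sketch: `port_id_for` is a hypothetical helper name (not an ESXi command), and it assumes the client name contains no spaces and is the last column, as in the sample output above.

```shell
# port_id_for CLIENT
# Reads `net-stats -l` output on stdin and prints the PortNum of every row
# whose last column (ClientName) equals CLIENT. Hypothetical helper.
port_id_for() {
    # NR > 1 skips the header row; $NF is the last whitespace-separated field.
    awk -v client="$1" 'NR > 1 && $NF == client { print $1 }'
}

# Usage on an ESXi host (kept as a comment so the block has no side effects):
#   net-stats -l | port_id_for linux_mirror.eth0
```

With the sample listing above, the helper would print 67108874 for linux_mirror.eth0.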

Method 2:
The advantage of this method is that if the uplinks are teamed, it also shows which uplink is in use.
esxcli network vm list lists the World ID
[root@localhost:/tmp] esxcli network vm list
World ID  Name              Num Ports  Networks
-------- ----------------  ---------  --------------
  711273  linux_mirror              1 dvportgroup-43
  711281  linux_source_231          1 dvportgroup-43
  714723  linux_dst_233             1  dvportgroup-43



  esxcli network vm port list -w worldID


[root@localhost:/tmp] esxcli network vm port list -w 711273
   Port ID: 67108874
   vSwitch: dvSwitch
   Portgroup:dvportgroup-43
   DVPort ID: 18
   MAC Address:00:50:56:96:5b:99
   IP Address: 0.0.0.0
   Team Uplink: vmnic2
   Uplink Port ID:67108866
   Active Filters:
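
To pull a single value such as the port ID or the teamed uplink out of this output, a small awk filter is enough. This is a sketch under assumptions: `vm_port_field` is a hypothetical helper name, and it assumes every line has the "Label: value" shape shown above.

```shell
# vm_port_field LABEL
# Reads `esxcli network vm port list -w <worldID>` output on stdin and prints
# the value that follows "LABEL:". Hypothetical helper.
vm_port_field() {
    awk -v key="$1" '
        {
            pos = index($0, ":")            # split at the first colon only,
            if (pos == 0) next              # so MAC addresses stay intact
            label = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", label)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (label == key) print value
        }'
}

# Usage on an ESXi host (comment only):
#   esxcli network vm port list -w 711273 | vm_port_field "Port ID"
#   esxcli network vm port list -w 711273 | vm_port_field "Team Uplink"
```

Matching the whole label keeps "Port ID" from also matching the "Uplink Port ID" line.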

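Run the following commands to capture traffic on a port ID. Note that the port ID must match the actual one on your host. The proto parameter specifies the IP protocol number; 0x01 is the protocol number for ICMP.
Some commonly used protocol numbers:
TCP  0x06
UDP 0x11
OSPF 0x59
ISIS    0x7C
Full list: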
https://en.wikipedia.org/wiki/List_of_IP_protocol_numbers

This command captures the ICMP packets received by the VM on the vSwitch port and stores them in /tmp/33554437_in.pcap:
pktcap-uw --switchport 33554437 --dir 0 --proto 0x01 -o /tmp/33554437_in.pcap

This command captures the ICMP packets sent by the VM on the vSwitch port and stores them in /tmp/33554437_out.pcap:
pktcap-uw --switchport 33554437 --dir 1 --proto 0x01 -o /tmp/33554437_out.pcap

Capture outbound ICMP packets on the physical NIC vmnic6 and store them in /tmp/vmnic6_out.pcap:
pktcap-uw --uplink vmnic6 --dir 1 --proto 0x01 -o /tmp/vmnic6_out.pcap

Capture inbound ICMP packets on the physical NIC vmnic6 and store them in /tmp/vmnic6_in.pcap:
pktcap-uw --uplink vmnic6 --dir 0 --proto 0x01 -o /tmp/vmnic6_in.pcap
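
Since the four commands differ only in port, direction, and protocol, they can be generated from a small helper. This is a sketch, not an official wrapper: `proto_hex` and `switchport_capture_cmd` are hypothetical names, the protocol table is the one listed in this post, and the function only prints the command so it can be reviewed before being run.

```shell
# proto_hex NAME: map a protocol name to the hex number used by --proto.
proto_hex() {
    case "$1" in
        icmp) echo 0x01 ;;
        tcp)  echo 0x06 ;;
        udp)  echo 0x11 ;;
        ospf) echo 0x59 ;;
        isis) echo 0x7C ;;
        *)    echo "unknown protocol: $1" >&2; return 1 ;;
    esac
}

# switchport_capture_cmd PORT_ID DIR PROTO
# Prints (does not run) a pktcap-uw command in the same form as above.
# DIR is 0 or 1; the file suffix follows this post's in/out naming.
switchport_capture_cmd() {
    proto=$(proto_hex "$3") || return 1
    case "$2" in
        0) suffix=in ;;
        1) suffix=out ;;
        *) echo "direction must be 0 or 1" >&2; return 1 ;;
    esac
    echo "pktcap-uw --switchport $1 --dir $2 --proto $proto -o /tmp/${1}_${suffix}.pcap"
}
```

For example, `switchport_capture_cmd 33554437 0 icmp` prints the first command shown above.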

When the layer-3 VDR is involved, the data path inside the ESXi host looks like this (figure from the NSX Troubleshooting Guide):



Logical path of layer-3 traffic across hosts:
VM <---> vSwitch port <---> VDR port <---> uplink

The switch port is again obtained with net-stats -l.

Get the VDR port:
[root@localhost:~] net-vdr -C -l

Host locale Id:            4236bed1-638a-8c8a-4e44-f9b2b0de8aa7

Connection Information:
-----------------------

DvsName          VdrPort           NumLifs  VdrVmac
-------          -------           ------- -------
dvSwitch         vdrPort           4        02:50:56:56:44:52
    Vdr Switch Port: 100663301
    Teaming Policy:Default Teaming
    Uplink   : dvUplink1(100663298):00:50:56:f7:6a:ce(Team member)


   Stats : PktDropped      Pkt Replaced     Pkt Skipped
   Input : 0                0                1102180957
  Output : 33               0                103174
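
The VDR switch port number can be extracted from this output and then used like any other switch port ID. A minimal sketch, assuming the "Vdr Switch Port:" line appears exactly as in the output above; `vdr_switch_port` is a hypothetical helper name.

```shell
# vdr_switch_port
# Reads `net-vdr -C -l` output on stdin and prints the number that follows
# "Vdr Switch Port:". Hypothetical helper.
vdr_switch_port() {
    awk -F: '/Vdr Switch Port/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Usage on an ESXi host with NSX (comment only; the output path is an example):
#   port=$(net-vdr -C -l | vdr_switch_port)
#   pktcap-uw --switchport "$port" --dir 0 -o /tmp/vdr_in.pcap
```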


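Note that the last command in the official figure, the capture command at the pNIC, is wrong. The correct command format for capturing VXLAN-encapsulated packets is as follows:

pktcap-uw --uplink vmnic0 --capture UplinkSnd
pktcap-uw --uplink vmnic0 --capture UplinkRcv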


Command list:
esxcli network vm list 
esxcli network vm port list -w worldID
net-stats -l
net-vdr -C -l

References
https://docs.vmware.com/en/VMware-NSX-for-vSphere/6.3/nsx_63_troubleshooting.pdf
https://docs.vmware.com/en/VMware-vSphere/6.0/com.vmware.vsphere.networking.doc/GUID-5CE50870-81A9-457E-BE56-C3FCEEF3D0D5.html

Saturday, November 19, 2016

Build a multi-distro TFTP boot server for ESXi and CentOS 7

I use CentOS 7.1 as the TFTP boot server. (I also tried Cobbler, but it did not support the latest ESXi version.)

Summary of the steps
  1. Install and config tftp server
  2. Install and config dnsmasq for dhcp server
  3. Install and config syslinux
  4. Install and config http server for Linux installation
  5. Config pxe boot menu for CentOS7
  6. Config pxe boot menu for ESXi and prepare installation file
  7. Prepare ks file

Step 1: Install and config tftp server

Commands:

yum install tftp-server 


systemctl start tftp.socket
systemctl enable tftp.socket

Config file:
# default: off
# description: The tftp server serves files using the trivial file transfer \
#       protocol.  The tftp protocol is often used to boot diskless \
#       workstations, download configuration files to network-aware printers, \
#       and to start the installation process for some operating systems.
service tftp
{
        socket_type             = dgram
        protocol                = udp
        wait                    = yes
        user                    = root
        server                  = /usr/sbin/in.tftpd
        server_args             = -s /var/lib/tftpboot
        disable                 = no
        per_source              = 11
        cps                     = 100 2
        flags                   = IPv4
}


Step 2: Install and config dnsmasq for dhcp server

Commands:

yum install dnsmasq

systemctl start dnsmasq
systemctl enable dnsmasq

Config file

interface=eth1,lo
#bind-interfaces
domain=centos7.local
# DHCP range-leases
dhcp-range= eth1,192.168.100.100,192.168.100.254,255.255.255.0,360h
# PXE
dhcp-boot=pxelinux.0,pxeserver,192.168.100.1
# Gateway
dhcp-option=3,192.168.100.1
# DNS
dhcp-option=6,192.168.100.1, 8.8.8.8
server=8.8.4.4
# Broadcast Address
# NTP Server
dhcp-option=42,0.0.0.0
pxe-prompt="Press F8 for menu.", 60
pxe-service=x86PC, "pxe network server 192.168.1.1 by Shen", pxelinux
enable-tftp
tftp-root=/var/lib/tftpboot


Step 3: Install and config syslinux

commands:
yum install syslinux
cp -r /usr/share/syslinux/* /var/lib/tftpboot

Step 4: Install and config http server for Linux installation

Install httpd commands
yum install httpd

systemctl enable httpd
systemctl start httpd


Mount the CentOS 7 ISO and share the installation files over HTTP:
mkdir /mnt/CentOS7
mount /dev/cdrom /mnt/CentOS7
mkdir /var/www/html/CentOS7
cp -r /mnt/CentOS7/. /var/www/html/CentOS7/
chmod -R 755 /var/www/html/CentOS7/

Check http server

Step 5: Config pxe boot menu for CentOS 7

commands
mkdir /var/lib/tftpboot/pxelinux.cfg
touch /var/lib/tftpboot/pxelinux.cfg/default

content of default:

DEFAULT menu.c32
MENU TITLE ESXi Linux  Boot Menu
PROMPT 0
TIMEOUT 600
LABEL CentOS7
  KERNEL vmlinuz
  APPEND initrd=CentOS7/initrd.img  ks=http://192.168.100.1/ks/CentOS7.cfg net.ifnames=0 biosdevname=0 ksdevice=eth0 ip=dhcp devfs=nomount
  MENU LABEL ^1 CentOS7


I added the kernel boot options "net.ifnames=0 biosdevname=0" so that the NIC names start from eth0.


Step 6: Config pxe boot menu for ESXi and prepare installation file


Prepare the installation files:
mkdir /mnt/ESXi6
mount /dev/cdrom /mnt/ESXi6
mkdir /var/lib/tftpboot/ESXi6
cp -r /mnt/ESXi6/. /var/lib/tftpboot/ESXi6/


Modify /var/lib/tftpboot/ESXi6/boot.cfg:
Add the line "prefix=ESXi6" as line 3
Remove the slashes, e.g. change "modules=/b.b00" to "modules=b.b00":
sed -e "s#/##g" -i.bak boot.cfg
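
Both boot.cfg changes can be done in one pass. The following is only a sketch: `patch_boot_cfg` is a hypothetical helper name, it assumes the prefix line belongs on the third line of the file, and like the sed command above it removes every slash in the file. It writes the result to stdout instead of editing in place.

```shell
# patch_boot_cfg FILE PREFIX
# Prints FILE with all slashes removed and "prefix=PREFIX" inserted as line 3.
# Hypothetical helper; FILE itself is not modified.
patch_boot_cfg() {
    sed -e 's#/##g' "$1" | awk -v prefix="$2" '
        NR == 3 { print "prefix=" prefix }   # insert before the old line 3
        { print }
        END { if (NR < 3) print "prefix=" prefix }'
}

# Usage (comment only):
#   cp boot.cfg boot.cfg.orig
#   patch_boot_cfg boot.cfg.orig ESXi6 > boot.cfg
```

If the boot.cfg on your ISO already contains a prefix= line, edit that line instead of inserting a second one.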



update the boot menu
DEFAULT menu.c32
MENU TITLE ESXi Linux  Boot Menu
PROMPT 0
TIMEOUT 600
LABEL CentOS7
  KERNEL vmlinuz
  APPEND initrd=CentOS7/initrd.img  ks=http://192.168.100.1/ks/CentOS7.cfg net.ifnames=0 biosdevname=0 ksdevice=eth0 ip=dhcp devfs=nomount
  MENU LABEL ^1 CentOS7
LABEL ESXi 6.0
  KERNEL /ESXi6/mboot.c32
  APPEND -c /ESXi6/boot.cfg ks=http://192.168.100.1/ks/ESXi6.cfg
  MENU LABEL ^2 Esxi 6.0
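
Before booting a client, it helps to confirm that every file the menu refers to actually exists under the TFTP root. A minimal sketch: `check_pxe_tree` is a hypothetical helper name, and the file list is simply the set of paths named in the dnsmasq config and the boot menu above.

```shell
# check_pxe_tree [ROOT]
# Reports files referenced by the PXE setup that are missing under ROOT
# (default /var/lib/tftpboot). Returns 0 if all exist, 1 otherwise.
check_pxe_tree() {
    root=${1:-/var/lib/tftpboot}
    missing=0
    for f in pxelinux.0 menu.c32 pxelinux.cfg/default \
             vmlinuz CentOS7/initrd.img \
             ESXi6/mboot.c32 ESXi6/boot.cfg; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $root/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (comment only):
#   check_pxe_tree /var/lib/tftpboot && echo "PXE tree looks complete"
```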



Step 7: Prepare ks files

mkdir /var/www/html/ks
touch /var/www/html/ks/CentOS7.cfg
touch /var/www/html/ks/ESXi6.cfg

ks sample for CentOS7.cfg
#version=RHEL7
# System authorization information
auth --enableshadow --passalgo=sha512

# Use http install
install
url --url="http://192.168.100.1/CentOS7/"
# Use text mode install
text
# Run the Setup Agent on first boot
firstboot --enable
ignoredisk --only-use=sda
# Keyboard layouts
keyboard --vckeymap=us --xlayouts='us'
# System language
lang en_US.UTF-8
#time zone
timezone --utc Asia/Shanghai
# Network information
network  --bootproto=dhcp --device=eth0 --onboot=on --ipv6=off
# Root password
rootpw --iscrypted $6$vMgre45.Qllg
# Do not configure the X Window System
skipx
# System timezone
timezone Asia/Hong_Kong --isUtc
# System bootloader configuration
bootloader --append=" crashkernel=auto" --location=mbr --boot-drive=sda
autopart --type=lvm
# Partition clearing information
clearpart --all --initlabel --drives=sda

%packages
@core
kexec-tools
tcpdump
vim
net-tools
%end

%addon com_redhat_kdump --enable --reserve-mb='auto'

%end


ks sample for ESXi6.cfg
# Sample scripted installation file
# Accept the VMware End User License Agreement
vmaccepteula
# Set the root password for the DCUI and ESXi Shell
rootpw vmware123
# Install on the first local disk available on machine
install --firstdisk --overwritevmfs
# Set the network to DHCP on the first network adapter and do not create a portgroup for the VMs
network --bootproto=dhcp --device=vmnic0 --addvmportgroup=0
# reboots the host after the scripted installation is completed
reboot

%firstboot --interpreter=busybox
# Enable SSH and the ESXi Shell
vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh
vim-cmd hostsvc/enable_esx_shell
vim-cmd hostsvc/start_esx_shell


Note: I disabled SELinux and firewalld on CentOS 7.


Reference link:
http://www.vcritical.com/2011/07/vmware-esxi-5-interactive-pxe-installation-improvements/
http://www.virtuallyghetto.com/vmware-kickstart
http://www.tecmint.com/install-pxe-network-boot-server-in-centos-7/
http://www.bo-yang.net/2015/08/31/centos7-install-tftp-server
http://everythingshouldbevirtual.com/build-tftp-server-esxi-installs


Sunday, November 13, 2016

Analyzing esxtop data for performance, Part II

This is a follow-up to Part I.

My work platform is an Ubuntu server that has no pandas library, so I looked for a way to replace the Python script with a bash script.

The first question I ran into: how do I get awk to use a variable as the column number? Specifically, I want awk to use a variable passed in from the shell. Thanks to stackoverflow.com, where someone had already asked the same question: pass the value with -v and reference it as $var.

The second question: how do I combine the temporary CSV files?
That is easy with the paste command.
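
Both answers fit in a few lines. This is a small sketch: `extract_column` is a hypothetical helper name, and it assumes no field contains an embedded comma, since awk splits on every comma.

```shell
# extract_column N
# Prints field N of comma-separated input read from stdin.
# The shell value reaches awk through -v, and $col selects that field.
extract_column() {
    awk -F, -v col="$1" '{ print $col }'
}

# Usage (comment only; esxtop.csv stands for any esxtop batch output file):
#   extract_column 1 < esxtop.csv > 1.csv
#   extract_column 7 < esxtop.csv > temp7.csv
#   paste -d, 1.csv temp7.csv > new.csv   # join the columns side by side
```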

I will continue working on the tool scripts and share them on GitHub.
Here is my GitHub:
https://github.com/songshen06/esxtoptools

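#!/bin/bash
# $1 is the original esxtop data (CSV).
# The file "column" in the current directory lists the column numbers
# to extract, one per line.
if [ -z "$1" ]; then
    echo "Please input the original esxtop data!" >&2
    exit 1
fi

# Keep column 1 (the timestamp in esxtop batch output) as the first column.
awk -F, '{print $1}' "$1" > 1.csv
# Extract each requested column; -v passes the shell variable to awk.
while read -r LINE; do
    gawk -F, -v var="$LINE" '{print $var}' "$1" > "temp$LINE.csv"
done < column
# Join the per-column files side by side, then clean up.
paste -d, 1.csv temp*.csv > new.csv
rm 1.csv temp*.csv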

Tuesday, October 18, 2016

A script to monitor APD and take tcpdump

Recently I ran into an NFS APD (All Paths Down) issue, so I needed a pcap dump covering the time of the issue.
But in a production environment where the NFS storage carries heavy traffic, it is very hard to capture exactly the traffic you want.

I wrote a script to do that. Here is the idea:

  1. Run the command "tcpdump-uw" to capture traffic.
  2. Monitor vobd.log for the pattern "apd.start". When the script sees "apd.start", it waits 10 seconds and then stops tcpdump.
Here is the script:

#!/bin/sh
# $1 is the interface to capture on
NFS_IP=172.21.86.1              # set customer NFS server IP
PCAP_PATH="/vmfs/volumes/local" # set path for pcap file
# Rotate through 10 files of about 100 MB each; the filter expression goes last.
tcpdump-uw -i "$1" -s 0 -C 100 -W 10 -w "$PCAP_PATH/mycap.pcap" host "$NFS_IP" &
TCPDUMP_PID=$!

# Watch vobd.log; 10 seconds after the first "apd.start", stop the capture.
tail -f -n 1 /var/run/log/vobd.log | \
while read -r line; do
    if echo "$line" | grep "apd.start"; then
        sleep 10
        kill "$TCPDUMP_PID"
        pkill tail
        exit 44
    fi
done

Tuesday, August 23, 2016

Build your own self-configuring OVF or OVA

More and more companies are moving products from physical servers to virtualization platforms.
This is very helpful for deployment. For example:
When I worked at PLCM, I supported an enterprise web application. The standard deployment process was:
1 Install CentOS
2 Configure the system IP and upload the installation files
3 Install the web application
The problems I met:
1 I had to write a lot of documentation to teach partner engineers how to install Linux, configure the IP, and turn off SELinux
2 Running the install script: with CentOS installed on a Dell server, the NIC name is em0, which caused the installation to fail

Now an OVA can help with that:

1         We can configure the Linux system and install the application inside the OVA
2         During OVA deployment, the customer can configure the IP very easily.




How do you make a self-configuring OVA?
I read VMware's public documentation and used the vApp options to make an OVA:
https://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.vsphere.vm_admin.doc%2FGUID-A6F34BAC-BF8B-4513-AB8C-14891B439D2D.html



During the deployment I could enter an IP, but after the system started up there was no IP on the system. Why?
VMware Tools and a configuration script are needed to resolve this.
The standard process is:

  1. Deploy the OVF or OVA, which writes the IP information to the OVF environment
  2. Run the command vmtoolsd --cmd 'info-get guestinfo.ovfEnv'
  3. Import the values from the OVF settings into the guest OS
  4. Run a bash or Python script to configure Linux
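
The import step can be sketched with sed. This is an illustration under assumptions: `ovf_property` is a hypothetical helper name, the property keys "ip" and "netmask" are examples (the real keys are whatever you defined in the vApp options), and it assumes each Property element sits on its own line with oe:key before oe:value.

```shell
# ovf_property KEY
# Reads the OVF environment XML on stdin and prints the oe:value of the
# Property element whose oe:key equals KEY. Hypothetical helper.
ovf_property() {
    sed -n 's/.*<Property oe:key="'"$1"'" oe:value="\([^"]*\)".*/\1/p'
}

# Usage inside the guest (comment only; "ip" and "netmask" are example keys):
#   env_xml=$(vmtoolsd --cmd 'info-get guestinfo.ovfEnv')
#   ip=$(printf '%s\n' "$env_xml" | ovf_property ip)
#   mask=$(printf '%s\n' "$env_xml" | ovf_property netmask)
```

A real XML parser is the safer choice if the environment document is formatted differently; the sed version only keeps the example dependency-free.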





Friday, May 13, 2016

Troubleshooting network issues, Part 1: introducing some commands

ESXi offers several commands and tools for troubleshooting network issues. Here I share my experience.


Monitor network statistics


Check vSwitch port statistics


You can check vSwitch port statistics just as you would on a physical switch.


Port check on a physical switch:




We can get a similar result from an ESXi vSwitch port.

First, run "net-stats -l" to get the port ID and client name of the VM.

Second, run "esxcli network port stats get -p portID".
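
The two steps can be chained so that one stats command is generated per port. A minimal sketch: `port_stats_cmds` is a hypothetical helper name, and it assumes the port ID is the first column and the client name the last column of the net-stats -l listing. It only prints the commands.

```shell
# port_stats_cmds
# Reads `net-stats -l` output on stdin and prints one
# `esxcli network port stats get` command per port, with the client name
# as a trailing comment. Hypothetical helper.
port_stats_cmds() {
    awk 'NR > 1 && $1 ~ /^[0-9]+$/ {
        print "esxcli network port stats get -p " $1 "  # " $NF
    }'
}

# Usage on an ESXi host (comment only):
#   net-stats -l | port_stats_cmds        # review the commands
#   net-stats -l | port_stats_cmds | sh   # run them
```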


Check the physical NIC stats

esxcli network nic stats get -n vmnicX

The following command shows more details from the NIC driver:

ethtool --statistics vmnicX

Capture packet tools

There are two tools for capturing packets on ESXi: tcpdump-uw and pktcap-uw.

Keep in mind that tcpdump-uw can only capture VMkernel traffic. When you are troubleshooting vMotion, vSAN, or HA related network issues, tcpdump-uw is very helpful.
But if you want to troubleshoot VM traffic issues, pktcap-uw is your best option.
pktcap-uw is a new tool available since ESXi 5.5; I will discuss this command in my next blog post.