Saturday, November 19, 2016

Build a multi-distro TFTP boot server for ESXi and CentOS 7

I use CentOS 7.1 for the TFTP boot server. (I also tried Cobbler, but Cobbler cannot support the latest ESXi version.)

Summary of the steps
  1. Install and configure the tftp server
  2. Install and configure dnsmasq as the DHCP server
  3. Install and configure syslinux
  4. Install and configure an HTTP server for the Linux installation
  5. Configure the PXE boot menu for CentOS 7
  6. Configure the PXE boot menu for ESXi and prepare the installation files
  7. Prepare the ks files

Step 1: Install and configure the tftp server

commands

yum install tftp-server 


systemctl start tftp.socket
systemctl enable tftp.socket

config file (xinetd format, typically /etc/xinetd.d/tftp):
# default: off
# description: The tftp server serves files using the trivial file transfer \
#       protocol.  The tftp protocol is often used to boot diskless \
#       workstations, download configuration files to network-aware printers, \
#       and to start the installation process for some operating systems.
service tftp
{
        socket_type             = dgram
        protocol                = udp
        wait                    = yes
        user                    = root
        server                  = /usr/sbin/in.tftpd
        server_args             = -s /var/lib/tftpboot
        disable                 = no
        per_source              = 11
        cps                     = 100 2
        flags                   = IPv4
}


Step 2: Install and configure dnsmasq as the DHCP server

commands:

yum install dnsmasq

systemctl start dnsmasq
systemctl enable dnsmasq

Config file (/etc/dnsmasq.conf):

interface=eth1,lo
#bind-interfaces
domain=centos7.local
# DHCP range-leases
dhcp-range=eth1,192.168.100.100,192.168.100.254,255.255.255.0,360h
# PXE
dhcp-boot=pxelinux.0,pxeserver,192.168.100.1
# Gateway
dhcp-option=3,192.168.100.1
# DNS
dhcp-option=6,192.168.100.1,8.8.8.8
server=8.8.4.4
# Broadcast Address
# NTP Server
dhcp-option=42,0.0.0.0
pxe-prompt="Press F8 for menu.", 60
pxe-service=x86PC, "pxe network server 192.168.1.1 by Shen", pxelinux
enable-tftp
tftp-root=/var/lib/tftpboot


Step 3: Install and configure syslinux

commands:
yum install syslinux
cp -r /usr/share/syslinux/* /var/lib/tftpboot

Step 4: Install and configure an HTTP server for the Linux installation

Install httpd commands
yum install httpd

systemctl enable httpd
systemctl start httpd


Mount the CentOS 7 ISO and share the installation files over HTTP
mkdir /mnt/CentOS7
mount /dev/cdrom /mnt/CentOS7
mkdir /var/www/html/CentOS7
cp -r /mnt/CentOS7/. /var/www/html/CentOS7/
chmod -R 755 /var/www/html/CentOS7/

Check that the HTTP server is serving the files, for example by opening http://192.168.100.1/CentOS7/ in a browser.

Step 5: Configure the PXE boot menu for CentOS 7

commands
mkdir /var/lib/tftpboot/pxelinux.cfg
touch /var/lib/tftpboot/pxelinux.cfg/default

content of default:

DEFAULT menu.c32
MENU TITLE ESXi Linux  Boot Menu
PROMPT 0
TIMEOUT 600
LABEL CentOS7
  KERNEL vmlinuz
  APPEND initrd=CentOS7/initrd.img  ks=http://192.168.100.1/ks/CentOS7.cfg net.ifnames=0 biosdevname=0 ksdevice=eth0 ip=dhcp devfs=nomount
  MENU LABEL ^1 CentOS7


I added the kernel boot options "net.ifnames=0 biosdevname=0" so that the NICs are named starting from eth0.


Step 6: Configure the PXE boot menu for ESXi and prepare the installation files


Prepare the installation files:
mkdir /mnt/ESXi6
mount /dev/cdrom /mnt/ESXi6
mkdir /var/lib/tftpboot/ESXi6
cp -r /mnt/ESXi6/. /var/lib/tftpboot/ESXi6/


Modify /var/lib/tftpboot/ESXi6/boot.cfg:
Add the line "prefix=ESXi6" as line 3.
Remove the slashes from the file names, e.g. change "modules=/b.b00" to "modules=b.b00":
sed -e "s#/##g" -i.bak boot.cfg
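
The sed command above removes every slash in boot.cfg. As a more targeted alternative, here is a minimal Python sketch of the same edit, assuming boot.cfg is a list of key=value lines as on the ESXi 6 ISO. The function name relocate_boot_cfg and the sample text are hypothetical; only the leading slash of each file name on the kernel= and modules= lines is removed, and the prefix line is inserted as line 3 as described above.

```python
def relocate_boot_cfg(text, prefix="ESXi6"):
    """Return boot.cfg text adjusted for serving from a TFTP subdirectory."""
    out = []
    has_prefix = False
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in ("kernel", "modules"):
            # Drop only the leading slash of each file name: /b.b00 -> b.b00
            value = " ".join(word.lstrip("/") for word in value.split(" "))
            line = key + sep + value
        elif sep and key.strip() == "prefix":
            # Reuse an existing prefix line instead of adding a second one
            line = "prefix=" + prefix
            has_prefix = True
        out.append(line)
    if not has_prefix:
        # Insert as the third line, following the step described in the post
        out.insert(min(2, len(out)), "prefix=" + prefix)
    return "\n".join(out) + "\n"


# Hypothetical, shortened boot.cfg content used only for illustration
SAMPLE_BOOT_CFG = (
    "bootstate=0\n"
    "title=Loading ESXi installer\n"
    "kernel=/tboot.b00\n"
    "kernelopt=runweasel\n"
    "modules=/b.b00 --- /jumpstrt.gz --- /useropts.gz\n"
)

if __name__ == "__main__":
    print(relocate_boot_cfg(SAMPLE_BOOT_CFG), end="")
```

To apply it to the real file, read /var/lib/tftpboot/ESXi6/boot.cfg, pass its content through the function, and write the result back after keeping a backup copy.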



Update the boot menu (/var/lib/tftpboot/pxelinux.cfg/default):
DEFAULT menu.c32
MENU TITLE ESXi Linux  Boot Menu
PROMPT 0
TIMEOUT 600
LABEL CentOS7
  KERNEL vmlinuz
  APPEND initrd=CentOS7/initrd.img  ks=http://192.168.100.1/ks/CentOS7.cfg net.ifnames=0 biosdevname=0 ksdevice=eth0 ip=dhcp devfs=nomount
  MENU LABEL ^1 CentOS7
LABEL ESXi 6.0
  KERNEL /ESXi6/mboot.c32
  APPEND -c /ESXi6/boot.cfg ks=http://192.168.100.1/ks/ESXi6.cfg
  MENU LABEL ^2 Esxi 6.0



Step 7: Prepare the ks files

mkdir /var/www/html/ks
touch /var/www/html/ks/CentOS7.cfg
touch /var/www/html/ks/ESXi6.cfg

ks sample for CentOS7.cfg
#version=RHEL7
# System authorization information
auth --enableshadow --passalgo=sha512

# Use http install
install
url --url="http://192.168.100.1/CentOS7/"
# Use text mode install
text
# Run the Setup Agent on first boot
firstboot --enable
ignoredisk --only-use=sda
# Keyboard layouts
keyboard --vckeymap=us --xlayouts='us'
# System language
lang en_US.UTF-8
#time zone
timezone --utc Asia/Shanghai
# Network information
network  --bootproto=dhcp --device=eth0 --onboot=on --ipv6=off
# Root password
rootpw --iscrypted $6$vMgre45.Qllg
# Do not configure the X Window System
skipx
# System timezone
timezone Asia/Hong_Kong --isUtc
# System bootloader configuration
bootloader --append=" crashkernel=auto" --location=mbr --boot-drive=sda
autopart --type=lvm
# Partition clearing information
clearpart --all --initlabel --drives=sda

%packages
@core
kexec-tools
tcpdump
vim
net-tools
%end

%addon com_redhat_kdump --enable --reserve-mb='auto'

%end


ks sample for ESXi6.cfg
# Sample scripted installation file
# Accept the VMware End User License Agreement
vmaccepteula
# Set the root password for the DCUI and ESXi Shell
rootpw vmware123
# Install on the first local disk available on machine
install --firstdisk --overwritevmfs
# Set the network to DHCP on the first network adapter, use the specified hostname and do not create a portgroup for the VMs
network --bootproto=dhcp --device=vmnic0 --addvmportgroup=0
# reboots the host after the scripted installation is completed
reboot

%firstboot --interpreter=busybox
# Enable SSH and the ESXi Shell
vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh
vim-cmd hostsvc/enable_esx_shell
vim-cmd hostsvc/start_esx_shell


Note: I disabled SELinux and firewalld on the CentOS 7 server.


Reference link:
http://www.vcritical.com/2011/07/vmware-esxi-5-interactive-pxe-installation-improvements/
http://www.virtuallyghetto.com/vmware-kickstart
http://www.tecmint.com/install-pxe-network-boot-server-in-centos-7/
http://www.bo-yang.net/2015/08/31/centos7-install-tftp-server
http://everythingshouldbevirtual.com/build-tftp-server-esxi-installs


Sunday, November 13, 2016

Analyzing the esxtop data for performance, part II

This is a follow-up to part I.

My work platform is an Ubuntu server that has no pandas library, so I tried to find a bash script to replace the Python one.

The first question I ran into: how do I make awk use a variable as the column number? Specifically, I want awk to use a variable passed in from the shell. Thanks to stackoverflow.com, where someone had asked the same question.

The second question: how do I combine the temporary CSV files?
That is easy with the paste command.

I will continue working on the tool scripts and share them on GitHub.
Here is my GitHub repository:
https://github.com/songshen06/esxtoptools

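#!/bin/bash
# $1 is the original esxtop data file (CSV)
# The file "column" lists the wanted column numbers, one per line
if [ -z "$1" ]
  then
    echo "Please input the original esxtop data file!"
    exit 1
fi

# Column 1 holds the timestamp
awk -F , '{print $1}' "$1" > 1.csv
# Extract each wanted column into its own temporary file
cat column | while read LINE
do gawk -F , -v var="$LINE" '{print $var}' "$1" > temp"$LINE".csv
done
# Join the temporary files side by side
paste -d, 1.csv temp*.csv > new.csv
rm 1.csv
rm temp*.csv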

Tuesday, October 18, 2016

A script to monitor APD and take a tcpdump

Recently I hit an NFS APD (All Paths Down) issue, so I needed a packet capture (pcap) covering the time of the issue.
But in a production environment, where the NFS storage carries heavy traffic, it is very hard to capture just the traffic you want.

I wrote a script to do that. Here is my idea:

  1. Run the command "tcpdump-uw" to capture traffic.
  2. Monitor vobd.log for the pattern "apd.start". If the script sees "apd.start", it waits 10 seconds and then stops tcpdump.
Here is the script:

#!/bin/sh
NFS_IP=172.21.86.1 # set customer NFS server IP
PCAP_PATH="/vmfs/volumes/local" # set path for pcap file
# $1 is the interface to capture on
tcpdump-uw -i $1 -s 0 host $NFS_IP -C 100M -W 10 -w $PCAP_PATH/mycap.pcap &

# Follow vobd.log and stop the capture 10 seconds after APD starts
tail -fn1 /var/run/log/vobd.log | \
while read line ; do
        echo "$line" | grep "apd.start"
        if [ $? = 0 ]
           then
            sleep 10
            kill $(lsof |grep tcpdump-uw |awk '{print $1}'| sort -u)
            pkill tail
            exit 44
        fi
done

Tuesday, August 23, 2016

Build your own self-configuring OVF or OVA

More and more companies are moving their physical server products to virtualization platforms.
This is very helpful for deployment. For example, when I worked at PLCM, I supported an enterprise web application whose standard deployment process was:
1 Install CentOS
2 Configure the system IP and upload the installation files
3 Install the web application
The problems I met:
1 I had to write a lot of documentation to teach partner engineers how to install Linux, configure the IP, and turn off SELinux
2 Running the install script: when CentOS is installed on a Dell server, the NIC name is em0, which caused the installation to fail

But now we can use an OVA to help with that:

1         We can configure the Linux system and install the application inside the OVA
2         While deploying the OVA, the customer can configure the IP very easily.




How to make a self-configuring OVA?
I read the VMware public documentation and used the vApp options to make an OVA:
https://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.vsphere.vm_admin.doc%2FGUID-A6F34BAC-BF8B-4513-AB8C-14891B439D2D.html



During the deployment I can enter an IP, but after the system starts up there is no IP on the system. Why?
I need VMware Tools and a configuration script to resolve this issue.
The standard process is:

  1. Deploy the OVF or OVA, which writes the IP information to the OVF environment
  2. Run the command vmtoolsd --cmd 'info-get guestinfo.ovfEnv' to import the values from the OVF settings into the guest OS
  3. Run a bash or Python script to configure Linux
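
As a rough illustration of the import step, here is a minimal Python sketch that turns the XML printed by vmtoolsd --cmd 'info-get guestinfo.ovfEnv' into a dictionary. It assumes the standard OVF environment namespace; the function name parse_ovf_properties and the property keys ip0 and netmask0 are hypothetical, since the real keys are whatever was defined in the vApp options.

```python
import xml.etree.ElementTree as ET

# Namespace used by OVF environment documents
OVF_ENV_NS = "http://schemas.dmtf.org/ovf/environment/1"


def parse_ovf_properties(ovf_env_xml):
    """Return the vApp properties in an OVF environment document as a dict."""
    root = ET.fromstring(ovf_env_xml)
    props = {}
    for prop in root.iter("{%s}Property" % OVF_ENV_NS):
        # Each Property element carries namespaced key and value attributes
        key = prop.get("{%s}key" % OVF_ENV_NS)
        value = prop.get("{%s}value" % OVF_ENV_NS)
        if key is not None:
            props[key] = value
    return props


# Hypothetical sample of the XML returned by vmtoolsd (shortened)
SAMPLE_OVF_ENV = """
<Environment xmlns="http://schemas.dmtf.org/ovf/environment/1"
             xmlns:oe="http://schemas.dmtf.org/ovf/environment/1">
  <PropertySection>
    <Property oe:key="ip0" oe:value="192.168.100.50"/>
    <Property oe:key="netmask0" oe:value="255.255.255.0"/>
  </PropertySection>
</Environment>
"""

if __name__ == "__main__":
    for name, value in parse_ovf_properties(SAMPLE_OVF_ENV).items():
        print(name, value)
```

In the guest, the XML could be obtained by running the vmtoolsd command with subprocess and passing its output to the function; the resulting values can then be written into the network configuration by the configuration script.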





Friday, May 13, 2016

Troubleshooting network issues, part 1: introducing some commands

ESXi offers some commands and tools for troubleshooting network issues. Here I share my experience.


Monitor the network statistics


Check vSwitch port statistics


You can check vSwitch port statistics just as you would on a physical switch.


Physical switch port check:


We can get a similar result from an ESXi vSwitch port.

First, run "net-stats -l" to get the port ID and client name of the VM.

Second, run "esxcli network port stats get -p portID".


Check the physical NIC stats

esxcli network nic stats get -n vmnicX

The following command shows more details from the NIC driver:

ethtool --statistics vmnicX

Packet capture tools

There are two tools for capturing packets on ESXi: tcpdump-uw and pktcap-uw.

Keep in mind that tcpdump-uw can only capture VMkernel traffic. It is very helpful when you troubleshoot vMotion, vSAN, or HA related network issues.
But if you want to troubleshoot VM traffic issues, pktcap-uw is your best option.
pktcap-uw is a new tool introduced in ESXi 5.5. I will discuss this command in my next post.


Monday, April 25, 2016

Analyzing the esxtop data for performance issues (part 1)

When you meet a performance issue on the ESXi platform, you will need the esxtop tool for troubleshooting.


This article only discusses how to find the useful data in the esxtop performance data.

Get the performance data

For how to collect the data, you can read the KB:

Gathering esxtop performance data at specific times using crontab (1033346)


Or write a simple bash script to collect 24 hours of performance data. Change the esxtop parameters to suit your environment.
#!/bin/bash
for i in $(seq 1 24)
do
 echo $i >> esxtop.log
 esxcli system time get >> esxtop.log 
 esxtop -lab -d 2 -n 1800 >$i.csv
 esxcli system time get >> esxtop.log 
done
I usually end up with many CSV files, but the issue time falls within just one of them. Use the following commands.

1 Find the interesting CSV file

 for i in $(ls *.csv); do echo ======$i====== ;head -2 $i |awk -F , '{print $1}'; done

2 Generate the headers (for this part I referred to http://virtuallyhyper.com/)

$ head -1 esxtop-b.csv | sed 's/\,/\n/g' > headers
Now checking out the ‘headers’ file, I saw the following:
$ head headers
"(PDH-CSV 4.0) (UTC)(0)"
"\\local\Memory\Memory Overcommit (1 Minute Avg)"
"\\local\Memory\Memory Overcommit (5 Minute Avg)"
"\\local\Memory\Memory Overcommit (15 Minute Avg)"
"\\local\Physical Cpu Load\Cpu Load (1 Minute Avg)"
"\\local\Physical Cpu Load\Cpu Load (5 Minute Avg)"
"\\local\Physical Cpu Load\Cpu Load (15 Minute Avg)"
"\\local\Physical Cpu(0)\% Processor Time"
"\\local\Physical Cpu(1)\% Processor Time"
"\\local\Physical Cpu(2)\% Processor Time" 
The columns are now separated by newlines, which is what I want.
Search for a keyword and get the line numbers:

 grep -n 'SCSI.*Average Driver MilliSec/Command' headers
27183:"\\localhost\Physical Disk SCSI Device(naa.6005076801828732d800000000000151)\Average Driver MilliSec/Command"
29286:"\\localhost\Physical Disk SCSI Device(naa.6234567890abcde01e30fcc911b435f5)\Average Driver MilliSec/Command"

You can use awk with >> to generate a new file "column" that stores the column numbers you want.
grep -n 'SCSI.*Average Driver MilliSec/Command' headers | awk -F :  '{print $1}' >> column
Check the content:

$ head column
27183
29286
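
The grep -n step can also be sketched in Python using only the standard library, which may help when header names need proper CSV parsing. This is a minimal sketch, not the method used in this post; the function name find_columns is hypothetical, and it returns 1-based numbers so they match what grep -n and awk use.

```python
import csv
import re


def find_columns(header_line, pattern):
    """Return 1-based column numbers whose header matches the regex pattern."""
    # csv.reader handles the quoted header fields of the esxtop CSV
    names = next(csv.reader([header_line]))
    return [i for i, name in enumerate(names, start=1) if re.search(pattern, name)]


def find_columns_in_file(csv_path, pattern):
    """Read only the first line of the CSV file and search its headers."""
    with open(csv_path, newline="") as f:
        return find_columns(f.readline(), pattern)
```

For example, find_columns_in_file('esxtop-b.csv', 'SCSI.*Average Driver MilliSec/Command') would return the same numbers that were written to the column file above.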
If you only need to pick out 2 or 3 columns, you can use the following command:

 awk -F , '{print $1","$27183","$29286}' esxtop-b.csv > latency.csv

But if you need to pick out ten or more columns, you need a Python script. Thanks to my friend Haojie for the help with that.

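import pandas as pd

# Read without a header row so the columns are labelled 0, 1, 2, ...
df = pd.read_csv('esxtop-b.csv', header=None)
# Column 0 holds the timestamp
df_new = pd.DataFrame(df[[0]])

with open('column') as f:
    for line in f:
        # grep -n numbers are 1-based; pandas column labels are 0-based
        df_new = df_new.join(df[[int(line) - 1]])

df_new.to_csv('out.csv')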

Here is a summary:
1 Create the headers file from the CSV
2 Write the column numbers you want to a column file
3 Extract the data you are interested in.
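
For machines where pandas is not installed, step 3 can be sketched with Python's standard csv module. This is a minimal sketch under the same conventions as above: the column file holds 1-based numbers, one per line, and column 1 (the timestamp) is always kept. The function names extract_columns and extract_file are hypothetical.

```python
import csv


def extract_columns(rows, column_numbers):
    """Yield each row reduced to column 1 plus the given 1-based columns."""
    wanted = [1] + [n for n in column_numbers if n != 1]
    for row in rows:
        # Skip numbers beyond the end of a short row instead of failing
        yield [row[n - 1] for n in wanted if n <= len(row)]


def extract_file(src_path, column_path, dst_path):
    """Read column numbers from column_path and write the reduced CSV."""
    with open(column_path) as f:
        numbers = [int(line) for line in f if line.strip()]
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(extract_columns(csv.reader(src), numbers))
```

A call such as extract_file('esxtop-b.csv', 'column', 'out.csv') reads the file row by row, so it does not need to load the whole esxtop CSV into memory.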

Sunday, April 10, 2016

How to prepare a PowerCLI environment on Windows 7 and set up PowerShell ISE

I am a newbie to PowerShell and PowerCLI, so these are just my notes.

Step one: Install PowerShell on Windows 7

 1.1 Install .NET Framework 4.0 or later
 I downloaded the .NET Framework 4.5.2 offline installer.

 1.2 Install the Windows Management Framework 3.0

  • If you have Windows 7 64-bit, you want the file named: Windows6.1-KB2506143-x64.msu
  • If you have Windows 7 32-bit, you want the file named: Windows6.1-KB2506143-x86.msu

Step two: Install PowerCLI
 Go to the VMware PowerCLI site and download the latest version.


I also need a development environment for PowerShell. The first option is the

Integrated Scripting Environment (ISE)

To use ISE, just type the command "Powershell_ise" in the vSphere PowerCLI console.

After that, I hit a new issue: ISE did not load the vSphere module.

Here is my simple fix: write a PowerShell ISE profile.

"C:\Users\shensong\Documents\WindowsPowershell\Microsoft.PowerShellISE_profile.ps1"

The content:
& 'C:\Program Files (x86)\VMware\Infrastructure\vSphere PowerCLI\Scripts\Initialize-PowerCLIEnvironment.ps1'

You can type "echo $profile" to check your current PowerShell profile. Note that PowerShell ISE loads a different profile than PowerShell.



Monday, March 21, 2016

Troubleshooting an inventory service issue, case 1

What is the inventory service?


  1. All virtual infrastructure objects defined by VMware
  2. All physical objects like hosts, disks, arrays that are under virtualization management
  3. Partner defined virtual objects
  4. Physical objects like blades, enclosures, power systems that live outside a host
  5. Metadata information such as privileges

The inventory service is actually a Tomcat server, so we can configure the heap size or maximum memory for the JVM.

Case description:
The customer's search function behaved abnormally.
The customer said he had added two new ESXi hosts but could not find the new VMs with the search function.



Here is my troubleshooting flow.


First step: check inv-svc.log. For its location, refer to the KB
Location of VMware vCenter Server 6.0 log files (2110014)

Around the issue time there was no useful information, so I also searched for the keyword "xdb".
xdb is the file that stores object information, e.g. VM objects.

Ha, got it:

2016-03-18T09:37:17.187+08:00 [provider-manager-task-66  ERROR com.vmware.vim.query.server.provider.impl.ProviderManagerServiceImpl  opId=] Caught unexpected exception:
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2367)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
        at java.lang.StringBuilder.append(StringBuilder.java:204)
        at com.xhive.xDB_10_2_0.h.b(xdb:159)
        at com.xhive.xDB_10_2_0.f.a(xdb:201)
        at com.xhive.xDB_10_2_0.f.parseWithContext(xdb:47)
        at com.vmware.vim.query.server.store.impl.AtomFeedProcessor$EntryUtil.create(AtomFeedProcessor.java:756)
        at com.vmware.vim.query.server.store.impl.AtomFeedProcessor$PullEntryProcessor.processEntries(AtomFeedProcessor.java:329)
        at com.vmware.vim.query.server.store.impl.AtomFeedProcessor$EntryProcessor.call(AtomFeedProcessor.java:117)
        at com.vmware.vim.query.server.store.impl.StoreImpl.updateVmomiPullAtomFeed(StoreImpl.java:2539)
        at com.vmware.vim.query.server.store.impl.StoreImpl.update(StoreImpl.java:2412)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
        at org.springframework.aop.interceptor.JamonPerformanceMonitorInterceptor.invokeUnderTrace(JamonPerformanceMonitorInterceptor.java:108)
        at org.springframework.aop.interceptor.AbstractTraceInterceptor.invoke(AbstractTraceInterceptor.java:111)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
        at com.sun.proxy.$Proxy46.update(Unknown Source)
        at com.vmware.vim.query.server.provider.AbstractAtomPullProviderBase.checkFeed(AbstractAtomPullProviderBase.java:150)
        at com.vmware.vim.query.server.provider.impl.ProviderManagerServiceImpl$FeedPump.run(ProviderManagerServiceImpl.java:1107)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)


It looks like a memory issue in the JVM.
Second step: check wrapper.log. For what wrapper.log is, see the following KB:

Deleting or rotating the wrapper.log file in vFabric tc Server (2010299)

I found that the inventory service had been working abnormally since 1/21.


16190 INFO   | jvm 1    | 2016/01/21 10:48:32 | java.lang.OutOfMemoryError: GC overhead limit exceeded
 16191 INFO   | jvm 1    | 2016/01/21 10:48:32 | Dumping heap to D:\ProgramData\VMware\vCenterServer\logs\invsvc\java_pid3320.hprof ...
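
To see on which day the errors began, the wrapper.log entries can be counted per date. Here is a minimal Python sketch, assuming the pipe-separated line layout shown in the excerpt above (level, JVM, timestamp, message); the function name oom_errors_per_day is hypothetical.

```python
from collections import Counter


def oom_errors_per_day(lines):
    """Count OutOfMemoryError lines per date in wrapper.log-style lines."""
    counts = Counter()
    for line in lines:
        if "java.lang.OutOfMemoryError" not in line:
            continue
        fields = [field.strip() for field in line.split("|")]
        if len(fields) >= 4:
            # Third field is the timestamp "YYYY/MM/DD HH:MM:SS"; keep the date
            counts[fields[2].split(" ")[0]] += 1
    return counts
```

Called with the open wrapper.log file as the lines argument, it returns a counter keyed by date. Because the dates are in YYYY/MM/DD form, min(counts) gives the first day on which an OutOfMemoryError was logged.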

The Java process crashed and dumped the heap many times.
I suspected a memory leak in inventory service 6.0 and checked the known issues in the VMware KB. The answer was no.

I continued by checking the customer environment. The customer uses a Windows vCenter Server, and the Windows OS has 8192 MB of memory.

Then I asked another question: how many VMs and how many ESXi hosts are under this vCenter Server's control? More than 100 VMs now.


Oh....

Third step
We need to check the "vCenter Server for Windows Hardware Requirements".

 We have to increase the Windows OS memory to 16 GB.

Also consider the JVM settings in:

D:\ProgramData\VMware\vCenterServer\cfg/java/vmware-invsvc_jvm.conf

change 

-Xmx950m -XX:MaxPermSize=256m -XX:ThreadStackSize=256 -XX:ParallelGCThreads=1
to


-Xmx1506m -XX:MaxPermSize=256m -XX:ThreadStackSize=256 -XX:ParallelGCThreads=1
After changing the above settings, the issue was gone.


KB Links :
http://kb.vmware.com/kb/2110014

Location of VMware vCenter Server 6.0 log files (2110014)

http://kb.vmware.com/kb/2010299
Deleting or rotating the wrapper.log file in vFabric tc Server (2010299)

Besides, I did not suggest resetting the DB first, because I did not see any xdb file I/O errors or read errors in inv-svc.log.