Okay, experts, take it easy on me…
As you know, I have been writing various posts around building your VMware Workstation lab. One of the key points I am trying to drive home during this lab build is how to get your environment to match a production environment as closely as you can. Obviously networking is a very broad subject, especially when it comes to implementing a production vSphere environment. I am going to attempt to do this topic justice by sharing some key points that should help you understand how to design your network. I am not going to attempt to do what Kendrick Coleman does (because his designs are solid); I am only going to provide talking points and recommendations about traffic separation: what should be separated, why, and the details. Also, keep in mind that one of the most important factors in networking is whether or not to go N+1. I will say that VMware highly recommends making your physical networking N+1 so you can benefit further from High Availability. So let's get started with the traffic types.
- Management (High Availability)
- vMotion
- Fault Tolerance (Not in all cases)
- VM Networks
- Backup (Not in all cases)
- Storage/NAS (Depends on the type)
Note: Backup and storage say "depends" because in some cases you may or may not have iSCSI/NAS storage, or may not be running network backups for your virtual machines, especially if you use a product like Veeam or CommVault. Fault Tolerance isn't really used much, and I believe that even when it does get better it still may not be worth it, considering the bigger workloads it can't cover and the licensing cost as well. Here are my recommendations and the best practices I follow for dedicating traffic:
- Management: If possible, VLAN it and separate the traffic (onto a different switch). Use teaming or a single NIC (if you set up a management VMkernel port on another port group). You can run/share this traffic with vMotion, Fault Tolerance, backup, and storage/NAS; if you do share, use some sort of QoS or Network I/O Control. Be mindful that running management alongside all this traffic isn't recommended, but it does give you a way to run it all over a separate switch, apart from production VM traffic. If you have plenty of NICs, you can run management over the VM production network (though you don't want to expose it to that network), but you must separate it somehow with a different subnet or VLAN. In most cases I see vMotion and management being shared with Fault Tolerance (FT on big 10Gb networks). Your NIC teaming should use explicit failover with override so your vMotion/FT traffic goes over a separate interface than your management traffic.
- vMotion/FT/Backup/Storage/NAS: This is L2 traffic and hopefully doesn't have to be routed. In most cases I see it shared with management traffic, especially on 10Gb. Combine vMotion + FT + backup + NAS if you don't have a ton of connections. On this particular setup it would be good to configure jumbo frames. You wouldn't want this traffic running over production if possible, so a dedicated switch would be really good; VMware recommends using a dedicated storage switch anyway.
- VM Networks: I usually dedicate two NICs to VM production traffic and create separate port groups for each type of VM-related traffic. In some cases you may have a customer who requires separating this out over different NICs. Again, this is one of those things you have to look at based on the requirements at the time; normally the latter is good enough.
- Storage/NAS and Backup: In most cases businesses may have their own backup network. You could run storage and backup traffic over those switches if you choose; in that case, you might as well run vMotion and FT over them too.
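To make the explicit-failover and jumbo-frame recommendations above a bit more concrete, here is a minimal PowerCLI sketch. All the names in it (vcenter.lab.local, esx01.lab.local, vSwitch1, the "vMotion" port group, vmk1, vmnic0/vmnic1) are my assumptions for illustration, not anything from a real build, and your physical switch ports must also be configured for jumbo frames end to end:

```powershell
# Hedged sketch: pin Management to vmnic0 (vmnic1 standby) and vMotion the
# opposite way with explicit failover, then enable jumbo frames on the
# storage/vMotion vSwitch and its VMkernel port.
Connect-VIServer -Server vcenter.lab.local        # hypothetical vCenter
$vmhost = Get-VMHost -Name esx01.lab.local        # hypothetical host

# Explicit failover order: management active on vmnic0, standby on vmnic1.
Get-VirtualPortGroup -VMHost $vmhost -Name "Management Network" |
    Get-NicTeamingPolicy |
    Set-NicTeamingPolicy -LoadBalancingPolicy Failover_Explicit `
        -MakeNicActive vmnic0 -MakeNicStandby vmnic1

# vMotion gets the mirror image, so each traffic type normally rides
# its own uplink but can still fail over to the other.
Get-VirtualPortGroup -VMHost $vmhost -Name "vMotion" |
    Get-NicTeamingPolicy |
    Set-NicTeamingPolicy -LoadBalancingPolicy Failover_Explicit `
        -MakeNicActive vmnic1 -MakeNicStandby vmnic0

# Jumbo frames: MTU 9000 on the vSwitch and on the VMkernel interface.
Get-VirtualSwitch -VMHost $vmhost -Name vSwitch1 |
    Set-VirtualSwitch -Mtu 9000 -Confirm:$false
Get-VMHostNetworkAdapter -VMHost $vmhost -VMKernel |
    Where-Object { $_.Name -eq "vmk1" } |
    Set-VMHostNetworkAdapter -Mtu 9000 -Confirm:$false
```

Afterwards you can sanity-check the jumbo path from the ESXi shell with `vmkping -d -s 8972 <other-host-vmkernel-ip>`, which fails if anything along the way drops 9000-byte frames.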
Switches and considerations:
You usually want at least two switches if that is all you can do; four is even better, because then you can look at N+1 and separate the big traffic from the little traffic (management). If you cannot separate it with dedicated switches, then use QoS or NIOC to control the traffic. Managed switches can be expensive, but remember that vSphere 5 introduced support for LLDP, so you don't have to buy Cisco gear just to get that CDP-style discovery information. If you do not plan on using a converged network architecture (FCoE), be sure to buy enough 1Gb NICs. These things are cheap, so load up even if you may not use them all; things like migrations come up, and if you only buy exactly what you need you'll end up robbing Peter to pay Paul.
This is really just a quick overview with recommendations. Unfortunately, in most cases we only have what we are given, and we also work off budgets. I am going to cover some lab exercises that break this down even further. General stuff… I hope you enjoy it, and I am sure I will be updating it as well.
Here are some notes I find useful for anyone wanting to learn more about the latest that HDS has to offer. Please note the date of this post; some things are subject to change.
1. Thin Provisioning > UNMAP primitive will NOT be available until Q1 2012. Per HDS, the reason is that the UNMAP primitive, when issued from VMware ESXi (vSphere 5), causes all the workloads on the provisioned storage pool to take a performance hit, because UNMAP is issued with a "high priority". The KB explains how to disable this feature; it is NOT disabled by default, and the command is run locally on each ESXi 5 host.
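As a hedged sketch of that disable step done through PowerCLI instead of each host's local shell: the advanced option name below (VMFS3.EnableBlockDelete, where 0 = disabled) is my recollection of what the VMware KB references, so verify it against the KB before running it anywhere real.

```powershell
# Hedged sketch: disable the automatic UNMAP behavior on every host.
# The option name VMFS3.EnableBlockDelete is assumed from memory of the
# VMware KB; double-check it before applying in production.
Get-VMHost | ForEach-Object {
    Set-VMHostAdvancedConfiguration -VMHost $_ `
        -Name "VMFS3.EnableBlockDelete" -Value 0
}
```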
2. Storage vMotions and VMDK deletions on ANY thin-provisioned array will always show as space still allocated (the space is never reclaimed). To counter this negative effect UNMAP was introduced; however, it is not production ready and causes problems.
3. Round Robin PSP is the HDS best practice for multipathing. HOWEVER, we cannot do any clustering (MSCS or Oracle RAC) with this plug-in, i.e. with physical compatibility mode Raw Device Mappings (VMs accessing physical SAN LUNs vs. VMFS). HDS has released HDLM, the HDS-branded VMware multipath plug-in, and you can do both with it.
4. The block size on HDS is 42 MB; with vSphere 5's unified 1 MB block size, HDS arrays have to delete more blocks due to the unified block sizing. HDS stated that proper alignment between SAN, VMFS, and VMs is important.
5. Tier 1 applications need special consideration depending on the use case; we may need to look at dedicating a particular level of guaranteed I/O or creating separate Dynamic Provisioning storage pools to meet the workloads. There is HDS documentation for using VSP with vCloud Director, but it doesn't cover application workloads. I am waiting on these from HDS, as we virtualize MANY different workloads in our cloud and it needs to scale easily on demand.
6. vCenter plug-in: more to come…
7. I/O and performance measurement on the SAN side is done through Performance Tuner from HDS. More to come.
8. No firmware update should be needed on the HDS side for the VSP array to utilize the two new vSphere 5 VAAI primitives. Not sure about AMS…
9. VASA is coming quickly, and this will give customers visibility and deeper reporting, especially for VMware environments using HDS storage.
10. HDS recommends provisioning VMware guest OSes thick and the SAN array thin; however, there is not a huge performance difference between the two. Our direction is to use thin-on-thin provisioning for reporting purposes, etc.
For more HDS white papers and such, you can get to those under the resources tab here:
Hitachi Data Systems VMware resources
***Disclaimer: The thoughts and views expressed on VirtualNoob.wordpress.com and by Chad King in no way reflect the views or thoughts of his employer or any other company. These are his personal opinions, formed on his own. Also, products improve over time and some things may be out of date. Please feel free to contact us and request an update and we will be happy to assist. Thanks!~
Well, as anyone knows, starting a new job you almost always hit that point where things get a little slow and you have to find things to do. If your VMware environments are anything like the majority of them out there, chances are you can do some remediation. First off, you have to give credit where it's due, and I can honestly say that Alan Renouf and Luc Dekens both do a fabulous job of bringing all kinds of cool scripts to the table. Anyway, let's get on with it.
First things first: run over to Quest and grab PowerGUI Free, then get the VMware Quest PowerPack and the VMware Community PowerPack. You also want to pick up Alan's vCheck, which is one of the most excellent tools ever!
(Note: props to Alan and Kirk, who spent the majority of their time working on these excellent tools!)
Here is usually where I start:
1. Modify the vCheck to your liking (refer to the link on Alan's blog for any questions). The things I usually end up modifying in this script are:
- Snapshot age (I change it to 3 days; 72 hours is long enough)
- Update NTP to your NTP server
- Change Datastore free space remaining
- Disable detecting dead path to LUN (seems to hang for me at times)
- Can adjust the VM free space threshold (though personally I think it needs to be %-based, not MB free)
- Change vCenter alerts to something appropriate (I use 7 days)
- Change VM removal time frame (I also use 7 days)
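If you want to eyeball the snapshot-age item outside of vCheck, a quick PowerCLI equivalent (a sketch, not vCheck's own code, using the same 3-day window I tweak it to) looks something like this:

```powershell
# Hedged sketch: list snapshots older than 3 days, oldest first.
Get-VM | Get-Snapshot |
    Where-Object { $_.Created -lt (Get-Date).AddDays(-3) } |
    Sort-Object Created |
    Select-Object VM, Name, Created, SizeMB
```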
2. Now run the script and check out your remediation items. Pay attention to certain issues like:
- vMotion restraints because of CD-Roms attached
- Datastores low on space (powerpack can help with this)
- VMware tools out of date, issues, or not installed at all
- The above appear to be the more common ones; I run this script weekly.
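For the two items I chase most often, here are rough standalone PowerCLI equivalents (sketches of my own, not the PowerPack or vCheck code):

```powershell
# Hedged sketch: VMs holding a CD-ROM connection (a classic vMotion blocker).
Get-VM | Get-CDDrive |
    Where-Object { $_.IsoPath -or $_.HostDevice } |
    Select-Object Parent, IsoPath, HostDevice

# Hedged sketch: VMs whose VMware Tools are out of date, not running,
# or not installed at all.
Get-VM |
    Where-Object { $_.ExtensionData.Guest.ToolsStatus -ne "toolsOk" } |
    Select-Object Name, @{Name="ToolsStatus"; Expression={$_.ExtensionData.Guest.ToolsStatus}}
```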
3. From the PowerPacks I usually run the following scripts:
- Best Practice Queries >Disk Queries > Orphaned VMDKs
- Best Practice Queries > Disk Queries > RDM’s
- Best Practice Queries > Disk Queries > Thin Disk
- Community PowerPack > Resource Pools > Ballooning Script
- Virtual Machine > VM with over X number CPU’s
- CD-ROMs mounted to VM
4. On an interim basis I will rerun vCheck, or I will run single scripts on an as-needed basis:
- Snapshots > All Snapshots
- Virtual Machines > HAL Information (not really an issue with Win2k8)
- Virtual Machines > CPU Ready %
- Virtual Machines > VM with active memory ballooning
- Waste Finder > If I feel like doing some deeper Datastore Cleanup
- Powered off VM
- Scan VMs for NIC drivers (update/install VMXNET3 if they are using E1000)
- Check the disk alignment of all your VMs, both Linux and Windows, and every drive; then update your templates
- Enable large page support (LPS) for certain VMs if needed. Windows doesn't enable it by default, but ESX 3.5 and up does; it yields memory savings depending on the app.
- Check to ensure windows 2008 templates and VMs have the WDDM display driver
At first glance some of these items may not make sense, but you have to consider your own environment. HAL is a good one to run, really more so the first time around, just to make sure your older stuff (Windows 2k3 and 2k) is using the right HAL for the vCPU count. I also like to run through and dismount all ISOs from the VMs. You may want to make sure it's not a VMware Tools ISO mounted to the VM; if it is, you can get a pop-up question on a Linux VM, and it will appear to be unresponsive until someone answers the pop-up with a yes or no. Keep in mind remediation is about starting with the quick and easy and then working your way down. It takes time and creativity.
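The dismount-everything pass I just described can be sketched in PowerCLI like this; the `-notmatch "tools"` filter is my own rough guard against the VMware Tools ISO case, not a bulletproof check:

```powershell
# Hedged sketch: disconnect every mounted ISO, skipping anything whose
# path looks like a VMware Tools image (to avoid the pop-up hang).
Get-VM | Get-CDDrive |
    Where-Object { $_.IsoPath -and $_.IsoPath -notmatch "tools" } |
    Set-CDDrive -NoMedia -Confirm:$false
```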
Now, you will have challenges when remediating some things, like snapshots when they are really big. I will add a secondary part describing what I normally do; in most cases a clone fixes the issue.
(NOTE: I will be adding additional links later on)