Category Archives: Troubleshooting
So the sad truth.
Regarding my post from a few days ago allow me to elaborate.
VMturbo already does what vCOPS (now vROps 6.0) has only just introduced. Before 6.0, that capability depended on integration with vCO.
Long story short, VMturbo has been doing this for a long time. It’s time to trust something that can both automate capacity and properly balance any workload based on SLA.
I have mad respect for VMware and VMturbo, but they have been at this for so long. What took me months in vROps takes me seconds or hours in VMturbo. I will say vROps 6.0 has NEARLY caught up to VMturbo…
When I decided to trust VMturbo to manage my workloads the ROI became real. I have worked for service providers and enterprises, and in my experience VMturbo deserves a fair chance with any business. When you automate DAY 1 operations, that is pretty darn nice.
I won’t get into the nitty gritty, but it comes down to patented analytics that recommend actions based on performance data. In other words, KPIs mixed with Super Metrics and an “Insert vROps expert here” tag…
May the vForce be with you..
VMturbo or Virtual ops.
Automation out of the box or SME pro MLG stuff?
Why bother with it to begin with if it is a fail.
#CISCOUCS #WINNING #BIGDATA #UCSDIRECTOR
So I was doing some testing in vCloud Director 1.5 and noticed I wasn’t able to enable Virtual CPU hot add on my RHEL 5 vApp.
I went in and checked my vCenter settings to see what the deal was:
Changing the setting in vCenter updated it in vCloud Director.
The alternative to this workaround would be to change the template version within vCloud Director to RHEL 6.
You will notice the Virtual CPU hot add option becomes available to check. I used this method on existing templates and it did not seem to break them.
However, if you are creating new templates set to RHEL 6 with a RHEL 5 OS installed, you may want to make sure your SCSI controller is correct. Again, changing it on my vApps seemed to have no impact on the OS currently installed.
It’s apparently a bug in vCloud Director, and @Lamw was kind enough to help me out.
So this article is more of an FYI than anything. I just wanted to bring some attention to this, as some may really be puzzled by why the hypervisor stinks at performing large copies. @Lamw can verify as well, especially when working with the VM disk files. I think it is important to highlight the distinct difference: the CP command is for files (although a VM by definition is a set of files per VMware) but not for the virtual disk files. I am sure there are plenty of conspiracy theories for why this is the case, but this has actually been around for a while. If I were one of the age-old VMware guys out there this probably would not have caught me off guard, because it has been around, or published I should say, since VI3 (ESX 3). But since I have not finished my Back to the Future DeLorean ride yet, well, I just didn’t know.
During a particular situation I was copying some data from one ESX host to another. This was basically a copy using the Datastore Browser in the vSphere client. I had staged some files on an NFS mount and wanted to copy them over to the SAN datastores. The NFS mount was read only, so a storage migration would not work because it would require removing the VMDK files on the NFS mount after the copy. I could do some clones, but only so many at a time. So I decided to pop open the Datastore Browser and copy and paste from the NFS datastore to the SAN datastore. It’s also important to understand that the Datastore Browser uses HTTP GET and PUT, not CP. Keep in mind this was over 10Gb Ethernet (NFS), copying to the SAN over a 4Gb FC HBA. It took a while to do the copy but I didn’t really notice.
After staging all the data on the new SAN datastore I then had to turn it over to another ESX host that had yet another datastore, separate from the one hosting all the VMDK files. So there again, another copy… This time I noticed how slow it was really going, even from datastore to datastore. I knew that the copy process would more than likely run over the management interface, but even that was on a 10Gb Ethernet connection, so it should have been screaming as well. Not the case…
So as a last test I decided to try a copy between two datastores mounted to the same host. I still averaged around 20-50kbs, which is pretty terrible. So no matter how I went about it, performance was terrible. I pretty much knew it had to do with the process at this point, although I wasn’t sure why. Across these scenarios I used different methods: SCP applications, the Datastore Browser, and CP in the ESXi shell.
Trying a Different Approach
So after talking with VMware support and confirming my suspicion that the issue was the process itself (using CP), we went through the very same scenarios I noted above to rule out any other issues: different-protocol datastores, non-shared datastore copies, shared datastore copies, and local datastore-to-datastore copies. All had the same effect, even when copying just a single disk. Of course at this point the support guy was a little stumped and had to get off the line to go talk to someone else. Usually that means they need someone with a fresh set of eyes or more experience to help out, and sure enough he came back with another suggestion: use cloning and storage migrations as a test. I of course hadn’t thought of this, but when he mentioned it I pretty much had a Homer Simpson “DOH!” moment. I guess by then my head was hurting from trying to figure this stuff out.
When we did the storage migrations and clones it was actually MUCH faster. In fact, after the support call we did some testing. I could do 10 storage migrations in the time it took to copy 1 VM using the CP command. In some cases it was 10+ to one VM copy. Granted, I now had the additional step of adding the VM guest to inventory, but that wasn’t as bad as taking 1 hour to copy 1 virtual machine.
Note: The array was not VAAI capable.
What does this mean?
Yeah, so that is the million dollar question, isn’t it? Well, CP has pretty much been deprecated since VI3, but it’s better said as “not to be used for handling virtual disks”. To better understand, read for yourself: http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1000936
See also page 3 of http://www.vmware.com/pdf/esx_3p_scvcons.pdf
NOTE: Notice the words “SIGNIFICANT PERFORMANCE IMPROVEMENTS”.
So all this to tell you that CP is not a very good solution for doing mass copies or datastore copies. For me this presents a problem when using other tools like VEEAM SCP, PuTTY SCP, etc. So make sure you know what you want to accomplish beforehand, as you don’t want to end up with the headaches I did. I know some of you may think it was a waste of a VMware case, but any time I can find information like this and share it with others is invaluable to me. To add to my findings, I should also mention that vmkfstools ensures the integrity of the disk and is more suited to these things by design. I think VMware intentionally focused on vmkfstools as the solution because CP was never intended for this, given its lack of functionality. It may have something to do with licensing as well.
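As a rough illustration of the vmkfstools route, here is a minimal Python sketch that only builds the clone command for one virtual disk and prints it, so you can review it before running it in the ESXi shell. The `-i` (clone) and `-d` (disk format) options are real vmkfstools options; the datastore paths and the function name are hypothetical and not taken from my environment.

```python
import shlex


def vmkfstools_clone_cmd(src_vmdk, dst_vmdk, disk_format="thin"):
    """Build (but do not run) a vmkfstools command that clones one virtual disk."""
    # -i clones the source virtual disk to the destination path;
    # -d selects the format of the destination disk.
    return ["vmkfstools", "-i", src_vmdk, dst_vmdk, "-d", disk_format]


# Hypothetical datastore paths, for illustration only.
cmd = vmkfstools_clone_cmd(
    "/vmfs/volumes/nfs_staging/vm01/vm01.vmdk",
    "/vmfs/volumes/san_ds01/vm01/vm01.vmdk",
)

# Print the command so it can be reviewed and pasted into the ESXi shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

Note this only clones the disk. You would still need to bring over the VM configuration files and add the VM to inventory, just like with the storage migration test above.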
One Last Thing:
This was a huge pain at the time of moving data between the NFS and SAN datastores because I really didn’t have an automated solution for doing the copies. Many of you know that VEEAM FAST SCP, before the new version, did not have 64-bit support. I didn’t have any 32-bit machines and I didn’t want to waste time hacking away. However, I did want to mention that VEEAM released a new version of the product, known as VEEAM free backup; you can get that here. I also did some testing and was very impressed with the copying speeds compared to the CP command. Another nice thing is that even if you have no virtual machines registered in vCenter, it still picks them up as VMs in the copy process. Not to mention you can get statistics and automate and schedule copy jobs with the application. For me and with what I do this is priceless. Simplicity, automation, and reporting, all free! I love it! Thanks to VEEAM for listening to all those out there wanting an improved solution. They did a good job.
NOTE: Thanks again to @Lamw for pointing this out. The Datastore Browser uses HTTP GET/PUT, not CP. I will correct this in the post later.
Alright, this is going to be difficult for me to really explain so I will do my best to do it justice. First, I am not a coder and I do not know the ins and outs of the API and code. What I will attempt to explain is how you can reproduce this issue on your VCD instance. I also want to note this is vanilla VCD 1.5 with no updates yet. I currently have a case open with VMware and it has yet to be resolved.
Let’s get to the nitty gritty.
First off, I want to say that I am not 100% sure whether other queries produce the same effect. This issue seems to happen only with the adminVM query.
First I would recommend reading about connecting to the REST API on Will’s blog over at VMware:
Now that you have read that and understand how to connect to the REST API, I will show you an example of a basic adminVM query.
(Note: you need to have over 128 VCD vApps to reproduce this type of issue)
This showed me that I had 333 results returned, however on the 1st page I only found 128. The way the script talked to the VCD API was rather plain: it basically ran this query and dumped the output to an XML file. The idea was that this would be similar to the 1.0 API, where I could get all the data I wanted dumped into an XML file. This wasn’t the case. It seems I couldn’t get around this 128 limit. So I decided to try the next query:
After running it I still got 333 results returned but only 128 on the single page, EVEN after specifying pageSize=999, so this isn’t the end of it… let’s dig deeper. After further research I actually found documented proof that this was a hard setting somewhere.
Page 212 of the VCD 1.5 API Guide taken from here: http://www.vmware.com/pdf/vcd_15_api_guide.pdf
So it became obvious to me at this point that no matter what your query is, it would always be limited to 128 objects per page. So I also tried the following to change this hard setting (at the recommendation of someone), located in a global.properties file in the following directory on the vCloud Director cells:
add/change the following: restapi.queryservice.maxPageSize=1024
I added this to the global.properties file and the VCD cell services were also restarted. Can you guess what still happened? Nothing… this didn’t change anything at all. In fact, it still remained broken. Folks, this still wasn’t the worst part about it. Let’s cover the part that I believe is a true bug in the API; someone on Twitter also commented that there is a possible bug in the adminVM query.
Let’s say I do a query with pageSize=135 and my query returns 153 results. We get the usual 128 results per page. Here is an example of the commands I used:
Sorting ascending gives me an alphabetical list of all my vApp names, so I can find the breaking point for my virtual machines (I know my ABC’s and what should be next, so to speak). So I copy and paste the results into Notepad++ and it shows me 128 entries for the page size of 135 (give or take a few lines returned that are not relevant to the query). The bug as discussed is evident: it doesn’t show the other 7 entries it should be showing. Remember, we set the page size to 135. So now let’s take a peek at page 2.
So after you run this query you will see the rest of the 153 results. However, if you take notes you will notice that it is in fact completely missing the 7 other entries. So basically your query takes the 7 it could NOT list and dumps them out somewhere in the Cloud… So what does this mean, aside from the fact that there is a bug?
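To make the missing entries easier to picture, here is a small Python model of the behaviour I observed. It is only a simulation under an assumption: the server works out where each page starts from the page size you asked for, but never hands back more than 128 records. None of the names below come from the VCD API.

```python
SERVER_CAP = 128  # per-page limit observed in this post


def buggy_page(records, page, page_size):
    """Model one page: the offset honours page_size, the length is capped."""
    start = (page - 1) * page_size
    return records[start:start + min(page_size, SERVER_CAP)]


def missing_records(total, page_size):
    """Return the record positions that never show up on any page."""
    records = list(range(total))
    seen = []
    page = 1
    while (page - 1) * page_size < total:
        seen.extend(buggy_page(records, page, page_size))
        page += 1
    return sorted(set(records) - set(seen))


print(missing_records(153, 135))         # positions 128 through 134
print(len(missing_records(2000, 500)))   # 372 lost on each of 4 pages
```

With 153 results and a page size of 135, page 1 stops at 128 entries and page 2 starts at entry 136, which matches the 7 entries I could not find anywhere.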
You will need to use a looping construct and not specify a page size greater than 128. (see Will’s comments below)
This is a bug and I don’t think I could make it any clearer. I wish I could’ve provided some screenshots, but I think if someone does their due diligence they will see what I am talking about. If you have 2000 VCD vApps and you use a page size of 500, you would lose 372 results on each page (500 requested minus the 128 returned). No matter how you specify the page size or modify global.properties, it’s just broken, plain and simple. If someone would like to provide some screenshots I would be happy to put them up here to show some better detail.
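The looping construct mentioned above could look something like this minimal Python sketch. I am not a coder, so treat it as an outline only: `fetch_page` is a hypothetical stand-in for whatever function actually calls the REST API and returns one page of records, and `fake_fetch` just serves 333 pretend records so the loop can be tried without a VCD instance.

```python
def fetch_all(fetch_page, page_size=128):
    """Collect every record by walking pages until a short or empty page appears."""
    if page_size > 128:
        # Anything above 128 silently drops records on vanilla VCD 1.5.
        raise ValueError("page_size must not exceed 128")
    results = []
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        results.extend(batch)
        if len(batch) < page_size:
            break  # last page reached
        page += 1
    return results


# Stand-in for a real REST call (hypothetical): serves 333 pretend records.
def fake_fetch(page, page_size, _data=list(range(333))):
    start = (page - 1) * page_size
    return _data[start:start + page_size]


print(len(fetch_all(fake_fetch)))
```

The loop stops on the first page that comes back shorter than the page size, so it costs one extra request only when the total happens to be an exact multiple of the page size.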
If you want to discuss in further detail feel free to comment and I will follow up.
UPDATE: After reviewing some things with VMware I found out this is actually a true bug in the vCloud 1.5 API. The good news is that there is a fix slated to be published in August; perhaps they will allow for a private fix if you really need it. Stay tuned. If anyone has information beyond this, please provide it and I will link it! Thanks again. Also, this is not related to any particular query parameter; it has more to do with how the Query service works.
Well, I wasn’t sure how to name this blog as VMware continues to use all kinds of different lingo for all of their bells and whistles. I had the unique opportunity to begin migrating management interfaces, also known as vmkernel interfaces, from VSS to DVS switching. This presents a lot of struggles, but it seems to me that VMware has really improved this functionality in the later versions of vSphere. I recall running into many kinds of issues when doing this on 4.0. So far, testing with a vCenter 5 server and a mix of 4.1 and 5.0 hosts has proved to be seamless and non-interruptive. However, I would still highly recommend considering all your options and testing this method THOROUGHLY before ever touching production environments.
I was able to migrate a single physical NIC running ESXi management from a VSS to a VDS. This video covers how I did that. The reason for the video was that I got all kinds of senseless Google links when trying to search for something documented. So, I did myself a favor and published one.
Remember, this is a test and it is only applicable for me to use in a few environments. In most cases I use redundant NICs. Now the real kicker is that migrating from a VDS back to a VSS requires a bit more thinking and planning, especially if you only have access to a single PNIC. Maybe I will cover that some other time… for now try to use two. Also, this may be a solution for environments that run a single 10Gb NIC and need to use PVLANs or centralize management.
What’s the deal man?
Well, to be honest I have run into two very specific issues, and what I want to reiterate is how crucial it is to review updates before just deploying a normal vSphere 5 implementation. First off, I want to say that Update 1 was released in the middle of my upgrade to vSphere 5. With that comes the dilemma. Coordinating the update process and procedure should always be critical. You should also do your due diligence and review the updates along with their bugs. I honestly have to give credit to the VMware Community, which has definitely allowed me to identify problems beforehand and learn how to avoid and work around them. Now on to my issues.
Issue Number 1:Broken sVmotion (Storage Migration)
Well, this one was obvious, but being the optimist I am, I didn’t think I would run into this little issue. However, it appears to be an ESXi special feature for vSphere 5! I would highly recommend reviewing the following issues if you are having problems performing storage vMotions on vSphere 5/vCloud 1.5. I believe it is actually an issue with the ESXi hypervisor, because prior to Update 1 there was a patch you could install on your ESXi box. Please see the following references for resolution:
FIX? > Install UPDATE 1 or ESXi Patch 2
Issue Number 2: vCenter Network Alarm Feature!
So, the key words to stress in this issue are probably ones that make many CRINGE. Test and Prod should always be the SAME > We all know how important that is, but SERIOUSLY, how many of us actually MIRROR everything, even the alarms? This is more of an issue with standards and procedures than anything… again I am reminded of the 9 parts planning and 1 part implementing, or the “Your poor planning doesn’t account for an emergency on my part”.
If the following statement doesn’t tell you what happened then this KB most certainly can.. 🙂
FIX? > Yeah, just read the KB, it’s quite ridiculous… oh wait, just install Update 1?
So, I am writing this to tell you that I would recommend applying or using the in-place upgrade for vSphere 5 Update 1. Oh, and just so you know I warned you, it still doesn’t support the following build number:
NOTE: I would highly recommend updating from vanilla 4.1 to avoid the special VMware feature of PSOD.
Last but not least I equally thought it would be important to highlight a video that we can all share and relate to when facing unexpected results…. It’s not exactly the same but I can definitely relate to the frustrations..
This is a super simple tutorial that I wanted to do on how to rename VMnics. This is great for network card replacement and is just a good thing to know in case something does go south on a hardware replacement. Let’s move on.
SSH to your host, and keep a DRAC or iLO session in mind just in case. There should be no need for that if you are not touching the management or service console uplinks for ESX or ESXi.
- cp -p /etc/vmware/esx.conf /etc/vmware/esx.conf.backup (This will back up the current configuration, see KB here)
- vi /etc/vmware/esx.conf
- scroll down and locate the Dev/ids; these will be followed by =vmnic#
- Type i
- go over to the VMnic you wish to modify
- delete or backspace the old name
- Type in the right name
- Hit “ESC”
- Type “:wq” yes that is a colon wq.
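If you would rather preview the change before touching vi, the edit in the steps above can be sketched in Python. This is only an illustration under an assumption about the file layout: it treats the device entries as lines of the form `/device/<id>/vmkname = "vmnicN"`, and the sample lines and function name below are hypothetical. Check your own esx.conf before relying on the pattern.

```python
import re

# Assumed layout of a device entry: /device/<id>/vmkname = "vmnicN"
DEVICE_LINE = re.compile(r'^(/device/\S+/vmkname\s*=\s*")(vmnic\d+)(")', re.MULTILINE)


def rename_vmnics(conf_text, mapping):
    """Return conf_text with vmnic names on device lines swapped per mapping."""
    def swap(match):
        old = match.group(2)
        # Names missing from the mapping are left untouched.
        return match.group(1) + mapping.get(old, old) + match.group(3)
    # A single pass means swapping two names cannot clobber each other.
    return DEVICE_LINE.sub(swap, conf_text)


# Hypothetical sample entries, not copied from a real host.
sample = (
    '/device/000:002:00.0/vmkname = "vmnic0"\n'
    '/device/000:003:00.0/vmkname = "vmnic1"\n'
)
print(rename_vmnics(sample, {"vmnic0": "vmnic1", "vmnic1": "vmnic0"}))
```

The function works on text only and never writes to the host, so you can paste in a copy of your backup file and compare the result with what you intend to type in vi.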
You can either reboot or try cycling services, but that is pretty much all there really is to it. Enjoy and thanks for stopping by!
This video explains how I solved my own issue after I upgraded to VMware Workstation 8. Going through the process of removing and adding virtual devices narrowed mine down to the A: drive, aka the floppy disk. I simply disconnected it through VMware Workstation, and I would recommend removing it entirely if you do not need it. To do this you would need to power down the VM. Just for the record, I did an upgrade of VMware Workstation, however it completely uninstalls the old version and installs the new one.
1. Power Down the VM
2. Right Click and go to “Settings”
3. Click on the Floppy Drive
4. Click Remove
5. Click Ok
*NOTE* You can also just uncheck the “Connect at power on” option. I hope this fixes your issue as well.
To Build a lab:
I have been thinking a lot about how there seem to be a few gaps in the VMware community when it comes to learning to set up a VMware vSphere lab environment. So I thought I would take the time to put together a full-on post dedicated to resources on building a VMware lab. When I first thought about this I wasn’t sure if I wanted to do a full A-Z build covering every single feature or deployment, but I would rather not re-invent the wheel. There are MANY posts covering how to do this in general, but I wanted to make a point of identifying the types of labs that you can set up and exactly how to go about it. The key word is “lab”, so you don’t want to spend a ton of money (unless you have it) on your lab. To start off, there are a multitude of setups you can do and many ways you can do them. I also want to stress that if you are getting ready for your test then YOU need to have one of these labs.
vSphere Lab video 2 Cents and quick overview! (this is my first video post)
Nested VMware vSphere Lab
- Hosted on a Desktop Virtualization Product Like VMware Workstation 7 or 8
- Allows for easy HCL compliance
- Does require a robust desktop
- Can get slow depending on what you’re doing (design)
- Networking is all virtualized (plus)
- Storage can be virtualized or something like iSCSI can be used
- Mobility (can move VM’s around between desktops and laptops)
Physical VMware vSphere Lab
- Runs ESXi as bare metal
- Is more expensive
- “Real World” set up so is truly a lab
- Must meet HCL
- Will need Physical Networking (Managed networking highly recommended)
- Takes longer to build out or rebuild
- Can run nested labs on top of ESXi (pretty much using ESXi in the way you would use VMware Workstation)
- Storage can be virtualized or something like iSCSI can be used
- Can move hosted VM’s but the physical systems are not portable/mobile (depends I guess)
In a nutshell, I will be covering the nested setup since that seems to be the less expensive rig. I also love the fact that I can move it between my laptop and desktop, which is quite handy. It is also fairly easy to back up.
***Disclaimer: The thoughts and views expressed on VirtualNoob.wordpress.com and by Chad King in no way reflect the views or thoughts of his employer or of any other company. These are his personal opinions, which are formed on his own. Also, products improve over time and some things may be out of date. Please feel free to contact us and request an update and we will be happy to assist. Thanks!~