Category Archives: vSphere 5

Cloud Management – Turbonomic is Awesome!

Hello my friends. It has been quite a while since I last blogged, but I wanted to take some time to share some of my experience from the past couple of years. I have had the opportunity to work with some great companies and people, and it has definitely been a very enlightening experience.

I had the privilege of being a part of a special project nearly 5 years ago, which began my career in the cloud. I got to engineer and deploy one of the nation’s first ever GSA clouds, which was a great experience. As time rolled on and cloud was adopted, many things came to light. Being a VMware-savvy guy, I really didn’t have the time to spend learning all these new, directly competing technologies. At this time, Amazon was getting big, VMware was about to release vRA, and the market stood still… or so it felt.

Microsoft had launched their on-prem cloud, and before we knew it we had to start getting serious about the cost of our delivery and compute. If you have never had the pleasure of working for a service provider, let me tell you – it’s all about cost. So we put Azure to the test, compared it, vetted it, and did anything we could to ensure it could be operationally supported. It was a very interesting time and a nice comparison to our existing IaaS architecture. We definitely had our work cut out for us.

Since then the challenges of hybrid cloud have become real. Although some vendors had good solutions like Cisco Intercloud Fabric or vCloud Connector… (insert whatever else here), we always seemed to have unique enough requirements to disqualify them. Needless to say, we still deployed them, stood them up, tested them, and found great value, but it still wasn’t enough to justify a change. Being a service provider isn’t about offloading to another cloud… it’s about how you can upsell your services and provide more value for customers.

As time went on, people adopted Cisco UCS into their infrastructures, and eventually updating and maintaining infrastructure became critical; the speed of delivery is only hindered by how fast we can adopt new offerings. If we cannot seamlessly update, migrate, or refresh to new, then what can we do?

“It’s so old it’s not even supported!”
“Wow, no new firmware for 5 years?!”
“Support for VMware has lapsed :(”

“Who cares?”

You can automate this pain away easily. Just because one vendor doesn’t support a feature or a new version does not mean you have to burden your IT staff. If you could standardize operational processes, visibility, integration, and support across your cloud(s) – would you?

The biggest challenge is getting out of the old and into the new. Most legacy infrastructure runs on VMware, and you can make that move with Turbonomic and a variety of other tools. One of the benefits of going third party is that you aren’t locked in to any one infrastructure or software stack. You can size it, optimize it, price it, and compare it to ensure things run as they should. Versioning, upgrades, and the like will always be a challenge, but as long as you can ensure compliance, provisioning, optimization, and performance, they won’t be an afterthought. I found Turbonomic always got the job done and always responded in a way that provided a solution, and more than that… at the push of a button.

Some of the benefits:
– Agnostic Integration with a large set of vendors
– Automated Provisioning for various types of compute
– Easily retrofit existing infrastructure for migration
– Elastic compute models
– Cost Comparison, Pricing of Existing Workloads, Etc. (e.g. Amazon AWS, Azure)
– Track and exceed your ROI Goals
– Eliminate Resource Contention
– Automate and Schedule Migrations between Compute Platforms (IaaS > DBaaS)
– Assured performance, control, and automated re-sizing
– Not version dependent, and can be used in a wide variety of scenarios (I can elaborate if needed)
– Get rolling almost instantly with it…

Five years on, I still think Turbonomic is a great product. I used it extensively in the early days and also worked with it during the vCloud integration piece. The free version is also amazing and very helpful. Spending time checking capacity, double checking data, ensuring things are proper and standard: all that stuff you can forget about. Configure your clouds, whether private, public, or dedicated, into Turbonomic quickly.

You just have to trust proven software, especially if it’s been 7 years in the making and exceeds capabilities that most tools require significant configuration for. Also, always keep in mind that Turbonomic can learn your environment, and the value of understanding the platform and providing insight can be huge. You have to admit that some admins may not understand or know other platforms. This simplifies all that by simply understanding the workload and the infrastructure it runs on.

Other Great Information or References:
Cisco One Enterprise Suite – Cisco Workload Optimization Manager:
CWOM offered with

Click to access solution-overview-c22-739078.pdf

Click to access at-a-glance-c45-739098.pdf

https://www.sdxcentral.com/articles/news/cisco-launches-new-ucs-servers-hybrid-cloud-management-software/2017/07/

Turbonomic and BMC:
“Running it Red Hot with Turbonomics”
https://turbonomic.com/resources/videos/cloud-economics-on-prem-or-off-with-turbonomic-bmc/

 

vCloud Director – 1.5 RHEL 5 Bug Hot CPU ADD – Quick Work Around

So I was doing some testing in vCloud Director 1.5 and noticed my RHEL Linux 5 vApp wasn’t able to enable Virtual CPU Hot add.

I went in and checked my vCenter settings to see what the deal was:

Changing the setting in vCenter updated it in vCloud Director.

The alternative to this workaround would be to change the template version within vCloud Director to RHEL version 6.

You will notice the Virtual CPU hot add becomes available to check. I used this method on existing templates and it did not seem to break the templates.
However, if you are trying to create new RHEL 6 templates with a RHEL 5 OS, you may want to make sure your SCSI controller is correct. Again, changing it on my vApps seemed to have no impact on the OS currently installed.

It’s apparently a bug in vCloud Director, and @Lamw was kind enough to help me out.

vSphere 5 – Storage pt.3 LUN Sizing – Why it matters..

Well, I guess I am on a roll this week. A lot of my themes this week have been around storage and VMware. I don’t think that is a bad thing, but I am seeing some gaps out there as far as considerations and recommendations. My only point in this post is to share my thoughts on what you should consider when facing this after your vSphere 5 upgrade or install. I have to wonder just how many enterprises out there have seriously pushed the envelope of LUN sizing in VMware. One has to think: “If you are carving up large LUNs, does that mean you’re scaling up?” There are so many implications to consider when designing your storage. Among the more critical pieces are IOPS, cluster size, and your target workload. With bigger LUNs this is something you have to consider, and I do think it is common knowledge for the most part.

There are so many things one should consider when deciding on a LUN size for vSphere 5. I sincerely believe VMware is sometimes putting us all in a situation of scaling up. The limitations of SDRS and Fast Provisioning have really got my mind thinking. It’s going to be hard to justify a design around what “used to be” a 16 node cluster when you are trying to make a call on whether you really want to use some of these other features. You have heard me say this before, but I will say it again: it seems more and more that VMware is targeting small to medium sized businesses, while larger companies (with much bigger clusters) now have to invest even more time in reviewing their current designs and standards before they can use some of these features – Hey, that could be a good thing 🙂 . Standards to me are a huge factor for any organization. That part seems to take the longest to define and in some cases even longer to get other teams to agree to. I don’t think VMware thought about some of those implications, but I am sure they did their homework and knew just where a lot of this was going to land…

With that being said I will stop my rambling on about these things and get to the heart of the matter or better yet heart of the storage.

So, after performing an upgrade I have been wondering what LUN size would work best. I believe I have some pretty tough storage and a solid platform (Cisco UCS), so we can handle some IOPS. I wanted to share some numbers with you that I found very, VERY interesting. I have begun to entertain the notion of utilizing thin provisioning even further. However, we are all aware that VMware still has an issue with the UNMAP command, which I have pointed out in previous blogs. Being between a rock and a hard place, I believe Update 1 to vSphere 5 at least addressed half of my concern. The other half is that I now have to defer to a manual process that involves an outage to reclaim that thin provisioned space… I guess that is a problem I can live with given the way we use our storage today. It doesn’t cause us too much pain, but it is a pain nonetheless.

Anyways, so here is my homework on LUN sizing and how to get your numbers (Estimates):
(Note: This is completely hypothetical and not related to any specific company or customer; this will also include Thin Provisioning and Thick)

  • Factor an average IOPS per LUN (if you can, from your storage vendor, vCenter, or an ESXi host)

    Take the total IOPS across all production LUNs and divide it by the number of datastores

    Total # IOPS / # of Datastores

  • Gather the average numbers of virtual machines per datastore

    Total # VM’s / # of Datastores

    Try to use Real World production virtual machines

  • Decide on the new LUN size and use your current size as the baseline for a multiplication factor.

    So if you want to use 10TB datastores and you are currently using 2TB datastores, you can take your current numbers and multiply them by the size ratio:

    10TB / 2TB = 5 (this is your multiplication factor for IOPS and the VM:Datastore ratio)
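
The three steps above boil down to two divisions and one multiplication. Here is a minimal Python sketch of that arithmetic. The function names and the sample totals (24,000 IOPS and 320 VMs across 20 datastores) are made up for illustration, so plug in whatever your storage vendor or vCenter reports.

```python
def baseline_per_datastore(total_iops, total_vms, datastore_count):
    """Average IOPS and VMs per datastore across the current environment."""
    if datastore_count <= 0:
        raise ValueError("datastore_count must be positive")
    return total_iops / datastore_count, total_vms / datastore_count


def scale_to_new_lun(avg_iops, avg_vms, current_lun_tb, target_lun_tb):
    """Scale the per-datastore baseline by the LUN size ratio."""
    factor = target_lun_tb / current_lun_tb
    return factor, avg_iops * factor, avg_vms * factor


# Hypothetical totals, not taken from any real environment.
avg_iops, avg_vms = baseline_per_datastore(
    total_iops=24000, total_vms=320, datastore_count=20
)
factor, new_iops, new_vms = scale_to_new_lun(
    avg_iops, avg_vms, current_lun_tb=2, target_lun_tb=10
)
print(f"factor={factor:g}, IOPS per datastore={new_iops:g}, VMs per datastore={new_vms:g}")
```

Keep in mind this is a straight linear estimate built on averages; it does not account for peak load.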

So now let’s use an example to put this to practical use… and remember to factor in free space for maintenance; I always keep 10% free.

Let’s say we have a customer with the following numbers before:

16 VM’s per Datastore

1200 IOPS average per Datastore (we will have to account for peak too)

2TB Datastore LUNs

Now for the math (let’s say the customer is moving to 10TB LUNs, so this would be a factor of 5):

16 x 5 = 80 VMs per Datastore (Thick Provisioned)

1200 x 5 = 6000 IOPS per Datastore…

Not bad at all, but now let’s seriously take a look at thin provisioning, where the numbers are QUITE different. Let’s say we check our storage software and it tells us that on average a 2TB LUN only really uses 500GB of space for its 16 VMs. Let’s go ahead and factor some room in here (10% for alerting and maintenance purposes this time around). You can also download RVTools to get a glimpse of actual versus provisioned VM usage for some thin numbers.

First off:

16 VM per 500GB so that times 4 for the 2TB LUN; Makes 64 Thin VMs per 2TB Datastore.

Multiply that by the new LUN size factor: 9TB / 2TB = 4.5 (10TB minus 10% reserved for alerting purposes and maintenance; this could also be considered conservative)

64 x 4.5 = 288 average VMs per 10TB Datastore (with that 1TB reserved too!)

We aren’t done yet; here come the IOPS, and let’s use 1500 IOPS. Since we multiplied the VMs by a factor of 4, we want to do the same for the average IOPS as well:

1500 x 4 = 6000 per 2TB LUN; Using thin provisioning on VMs

6000 x 4.5 = 27000 IOPS per 10TB LUN.

So this leaves us with the following numbers for thick and thin:

VM to 10TB Datastore ratios:

80 Thick

288 Thin

IOPS to 10TB Datastore ratios:

6000 IOPS Thick Provisioning

27000 IOPS Thin Provisioning
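
If you want to rerun the thick and thin estimates with your own figures, the same arithmetic can be sketched in Python. This is only an illustration: the function and parameter names are my own, 2TB is treated as 2000GB to match the rounding used in the example, and the inputs are the hypothetical numbers from this post.

```python
def thick_estimate(vms_per_ds, iops_per_ds, current_lun_gb, target_lun_gb):
    """Linear scale-up: same VM density, bigger LUN."""
    factor = target_lun_gb / current_lun_gb
    return vms_per_ds * factor, iops_per_ds * factor


def thin_estimate(vms_per_ds, iops_per_ds, used_gb, current_lun_gb,
                  target_lun_gb, reserve_pct=10):
    """Thin estimate: pack VMs by actual usage, then scale to the usable
    part of the new LUN (target size minus the reserved percentage)."""
    # How many times the current VM set fits by actual usage (2000 / 500 = 4).
    density = current_lun_gb / used_gb
    # Space left on the new LUN after the reserve (10000GB less 10% = 9000GB).
    usable_gb = target_lun_gb * (100 - reserve_pct) / 100
    # Size ratio between the usable new LUN and the current LUN (9000 / 2000 = 4.5).
    factor = usable_gb / current_lun_gb
    thin_vms_now = vms_per_ds * density
    thin_iops_now = iops_per_ds * density
    return thin_vms_now * factor, thin_iops_now * factor


# Inputs are the hypothetical customer numbers used in this post.
thick_vms, thick_iops = thick_estimate(16, 1200, 2000, 10000)
thin_vms, thin_iops = thin_estimate(16, 1500, 500, 2000, 10000)
print(f"Thick: {thick_vms:g} VMs, {thick_iops:g} IOPS per 10TB datastore")
print(f"Thin:  {thin_vms:g} VMs, {thin_iops:g} IOPS per 10TB datastore")
```

Note that the IOPS figures assume load grows in step with VM count, which is the same simplification the manual math makes.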

So, I hope this brings to light some things you will have to think about when choosing a LUN size. Also note that this is probably more of a service provider type of scenario; we all know most may just use a single 64TB LUN, though I am not sure I would recommend that. It all comes down to use-case and how it can be applied. This also raises the question of what the point of some of those other features is if you leverage thin provisioning. Here are some closing thoughts and things I would recommend:

  • Consider peak loads for your design; the maximum IOPS you may be looking at in some cases
  • Get an average and max VM-per-datastore ratio (locate your biggest thin VM)
  • Consider tiered storage and how it could be better utilized
  • Administration and management overhead; essentially, the larger the LUN, the less overall provisioning time and so on
  • A VAAI capable array for those thin benefits (running that UNMAP reclaim script…)
  • Benchmark and test with some other tools on that bigger LUN to ensure stability at higher IOPS
  • Lastly, the storage array benchmarks and overall design/implementation
  • The number of VMs you can scale onto a LUN can affect your cluster design; you may not want to enable your customers to scale that much
  • Alerting considerations and how you will manage them efficiently so they are not counterproductive
  • Consider other things like SDRS (Fast Provisioning gets ridiculous with thin provisioning)
  • Storage latency and things like queues can be a pain point

I hope this helps some of those out there who have been wondering about some of this stuff. For me, the LUN size dramatically affects my cluster design and what I am looking to achieve. You also want to load test your array or at least get some proven specs on it. I currently work with HDS VSP arrays and these things can handle anything you can throw at them. Whatever additional capacity you need, whether it be capacity, IOPS, processing, or whatnot, you can easily scale it out or up. Please share your thoughts on this as well. Here are some great references:

http://www.yellow-bricks.com/2011/07/29/vmfs-5-lun-sizing/
http://serverfault.com/questions/346436/vmware-vmfs5-and-lun-sizing-multiple-smaller-datastores-or-1-big-datastore
http://communities.vmware.com/thread/334553
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2014849 

Note: these numbers are hypothetical, but it’s all in the numbers.

vSphere – vCloud – Fast Provisioning – My Thoughts…

Yeah, some would say this post is probably overdue, but lately I have sincerely been thinking: have we been drinking some Kool-Aid around this feature? I couldn’t help but have some concerns around possible implementation of this feature in VCD installments. I, in particular, am not completely sold on it. Here are just some quick reasons why.

  1. It’s a very “new” feature in VCD, which is itself still early in its years as a cloud platform.
  2. There is currently no way of updating those linked clones, unlike VMware View. (There is some admin overhead as well from using local and shared catalogs.)
  3. Added complexity (with linked images, snapshot chains, and how you have to handle Storage vMotion)
  4. By default ALL linked clone images are misaligned. (VMware has yet to address this problem.) In some cases this could be a compounding factor causing some additional I/O overhead.
  5. Design has to be carefully considered and evaluated with a max of 8 node clusters (this will affect current installments as well)

So yeah, I know I look like the bad guy, but I seriously think this release was targeted more at SMB than anything. IMO, this is more a feature for smaller businesses, because now they don’t have to go out and spend all that crazy dough on a VAAI capable array (Hooray for them :)), which raises the question…

Why do you need to enable this feature if you already leverage VAAI capable arrays?

It just seems to me that Fast Provisioning is a little premature in its release. Although VCD continues to improve, I think this feature needs some serious improvement before bigger shops decide to utilize it. The other downside is that we have yet to see any real progress on the UNMAP problem, and it’s now treated as a manual task we should run during certain times… or outages, I should say. That really blows, because we all know what kinds of benefits and problems thin provisioning on some arrays can cause. For the most part, it’s just really bad reporting… lol.

Here are some other sources I would seriously recommend reading so you can learn for yourself if it’s really worth it. Also, be careful not to put the cart before the horse, and do your homework. Some people drink the Kool-Aid and don’t think to question or ask “What’s really under the hood?” Fast Provisioning should never be compared to VMware View… it’s similar but not identical. I would definitely recommend reading Nick’s blog, which opened my eyes to what he calls the “Fallacies”, and of course Chris has a good read.

http://datacenterdude.com/vmware/vcd-fast-provisioning-vaai-netapp/
http://www.chriscolotti.us/vmware/info-vcloud-director-fast-provisioned-catalog-virtual-machines/
http://www.kendrickcoleman.com/index.php?/Tech-Blog/vcloud-director-15-features-that-effect-limitation-and-design.html

vSphere – Networking – ESXi Single NIC VDS Management Migration

Well, I wasn’t sure how to name this blog, as VMware continues to use all kinds of different lingo for all of their bells and whistles. I had the unique opportunity to begin working on migrating management interfaces, also known as vmkernel interfaces, from a VSS to a VDS. This used to present a lot of struggles, but it seems to me that VMware has really improved this functionality in the later versions of vSphere. I recall running into many kinds of issues when doing this on 4.0. So far, using a vCenter 5 server with a mix of 4.1 and 5.0 hosts, testing has proved to be seamless and non-disruptive. However, I would still highly recommend considering all your options and testing this method THOROUGHLY before ever touching production environments.

I was able to migrate a single physical NIC running ESXi management from a VSS to a VDS. This video covers how I did that. The reason for the video is that I got all kinds of senseless Google links when trying to search for something documented. So, I did myself a favor and published one.

Remember, this is a test and is only applicable for me in a few environments. In most cases I use redundant NICs. Now the real kicker is that migrating from a VDS back to a VSS requires a bit more thinking and planning, especially if you only have access to a single pNIC. Maybe I will cover that some other time… for now try to use two. Also, this may be a solution for environments that run a single 10Gb NIC and need to use PVLANs or centralize management.