Category Archives: Hardware
Overview of non-vendor-specific solutions and considerations for a virtual platform.
Well, I guess I am on a roll this week; a lot of my themes have been around storage and VMware. I don’t think that is a bad thing, but I am seeing some gaps out there as far as considerations and recommendations. My only point in this post is to share my thoughts on what you should consider when facing this after your vSphere 5 upgrade or fresh install. I have to wonder just how many enterprises out there have seriously pushed the envelope of LUN sizing in VMware. One has to think: “If you are carving up large LUNs, does that mean you’re scaling up?” There are so many implications to consider when designing your storage. One of the more critical pieces is IOps, the cluster size, and what your target workload is. With bigger LUNs this is something you have to consider, and I do think it is common knowledge for the most part.
There are so many things to consider when deciding on a LUN size for vSphere 5. I sincerely believe VMware is sometimes putting us all in a situation of scaling up. The limitations of SDRS and Fast Provisioning have really got me thinking. It’s going to be hard to justify a design scenario of a 16-node “used to be” cluster when you are trying to decide whether you really want to use some of these other features. You have heard me say this before, but I will say it again: it seems more and more that VMware is targeting Small to Medium sized businesses with some of these features, while larger companies (with much bigger clusters) now have to invest even more time in reviewing their current designs and standards. Hey, that could be a good thing 🙂 . Standards to me are a huge factor for any organization. That part seems to take the longest to define and in some cases even longer to get other teams to agree to. I don’t think VMware thought about some of those implications, but I am sure they did their homework and knew just where a lot of this was going to land…
With that being said, I will stop rambling on about these things and get to the heart of the matter, or better yet, the heart of the storage.
So, after performing an upgrade I have been wondering what LUN size would work best. I believe I have some pretty tough storage and a solid platform (Cisco UCS), so we can handle some IOps. I wanted to share some numbers with you that I found very interesting. I have begun to entertain the notion of utilizing Thin Provisioning even further. However, we are all aware that VMware still has an issue with the UNMAP command, which I have pointed out in previous blogs (here). Being that I have been put between a rock and a hard place, I believe Update 1 to vSphere 5 at least addressed half of my concern. The other half it didn’t address is the fact that I now have to defer to a manual process that involves an outage to reclaim that thin-provisioned space… I guess that is a problem I can live with given the way we use our storage today. It doesn’t cause us too much pain, but it is a pain nonetheless.
Anyways, so here is my homework on LUN sizing and how to get your numbers (Estimates):
(Note: This is completely hypothetical and not related to any specific company or customer; this will also include Thin Provisioning and Thick)
Factor an Average IOps per LUN (if you can from your storage vendor or from vCenter or an ESXi host)
Take the total IOps across all production LUNs and divide by the number of datastores
Total # IOps / # of Datastores
Gather the average numbers of virtual machines per datastore
Total # VM’s / # of Datastores
Try to use real-world production virtual machines
Decide on the new LUN size and derive a multiplication factor from your current baseline.
So if you want to use 10TB datastores and you are currently using 2TB datastores:
10TB / 2TB = 5 (this is your multiplication factor for the IOps and VM:Datastore ratios)
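The baseline steps above can be sketched in a few lines of Python. This is a hypothetical helper, not a real tool; the sample inputs are made-up numbers:

```python
def baseline_per_datastore(total_iops, total_vms, num_datastores):
    """Average IOps and VM count per datastore from cluster-wide totals."""
    return total_iops / num_datastores, total_vms / num_datastores

def scale_factor(new_lun_tb, old_lun_tb):
    """Multiplication factor when moving to a larger LUN size."""
    return new_lun_tb / old_lun_tb

# Example: 24,000 total IOps and 320 VMs spread over 20 datastores,
# moving from 2TB to 10TB LUNs.
iops_per_ds, vms_per_ds = baseline_per_datastore(24000, 320, 20)
factor = scale_factor(10, 2)
print(iops_per_ds, vms_per_ds, factor)  # 1200.0 16.0 5.0
```

Whatever the factor works out to, it applies to both the VM:datastore ratio and the average IOps per datastore.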
So now let’s use an example to put this to practical use… and remember to factor in free space for maintenance; I always keep 10% free.
Let’s say we have a customer with the following numbers before:
16 VM’s per Datastore
1200 IOps average per Datastore (we will have to account for peak too)
2TB Datastore LUNS
Now for the math (let’s say the customer is moving to 10TB LUNs, so this would be a factor of 5):
16 x 5 = 80 VM’s per Datastore (Thick Provisioned)
1200 x 5 = 6000 IOps per Datastore…
Not bad at all, but now let’s seriously take a look at thin provisioning, which gives QUITE different numbers. Let’s say we check our storage software and it tells us that on average a 2TB LUN only really uses 500GB of space for its 16 VMs. Let’s go ahead and factor in some room here (10% for alerting and maintenance purposes this time around). You can also download RVTools to get a glimpse of actual VM usage versus provisioned space for some thin numbers.
16 VMs per 500GB, times 4 for the 2TB LUN, makes 64 thin VMs per 2TB datastore.
Multiply that by the factor for the new LUN size: 9TB / 2TB = 4.5 (10% of the 10TB is reserved for alerting and maintenance purposes; this could also be considered conservative).
64 x 4.5 = 288 average VMs per 10TB Datastore (and that 1TB is reserved, too!)
We aren’t done yet; here come the IOps. Since we multiplied the VMs by a factor of 4 for thin provisioning, we want to do the same for the average IOps as well:
1200 x 4 = 4800 IOps per 2TB LUN when thin provisioning the VMs
4800 x 4.5 = 21600 IOps per 10TB LUN.
So this leaves us with the following numbers for thick and thin:
VM to 10TB Datastore ratios:
80 VMs Thick Provisioning
288 VMs Thin Provisioning
IOps to 10TB Datastore ratios:
6000 IOps Thick Provisioning
21600 IOps Thin Provisioning
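The whole worked example can be sketched in Python. The inputs are the hypothetical numbers from above, and the sketch assumes average IOps per VM stays constant as density rises, which real workloads will not honor exactly:

```python
# Hypothetical baseline: 16 VMs and 1200 IOps per 2TB datastore,
# with thin-provisioned VMs actually consuming ~500 GB of the 2TB.
OLD_LUN_TB, NEW_LUN_TB = 2, 10
VMS_PER_OLD_DS = 16
IOPS_PER_OLD_DS = 1200
THIN_USED_TB = 0.5
RESERVE = 0.10  # 10% held back for alerting/maintenance

# Thick: straight multiplication factor.
thick_factor = NEW_LUN_TB / OLD_LUN_TB                   # 5.0
thick_vms = VMS_PER_OLD_DS * thick_factor                # 80
thick_iops = IOPS_PER_OLD_DS * thick_factor              # 6000

# Thin: density goes up 4x (2TB / 500GB), and the usable factor
# drops to 4.5 once the 10% reserve comes off the new LUN.
thin_density = OLD_LUN_TB / THIN_USED_TB                 # 4.0
thin_factor = (NEW_LUN_TB * (1 - RESERVE)) / OLD_LUN_TB  # 4.5
thin_vms = VMS_PER_OLD_DS * thin_density * thin_factor   # 288
thin_iops = IOPS_PER_OLD_DS * thin_density * thin_factor # 21600

print(thick_vms, thick_iops, thin_vms, thin_iops)
```

Note how quickly the thin numbers outrun the thick ones; that per-datastore IOps figure is the one to check against what your array can actually sustain.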
So, I hope this brings to light some things you will have to think about when choosing a LUN size. Also note that this is probably more of a service provider type of scenario; we all know some may use a single 64TB LUN, though I am not sure I would recommend that. It all comes down to use-case and how it can be applied. This also begs the question: what’s the point of some of those other features if you leverage Thin Provisioning? Here are some closing thoughts and things I would recommend:
- Consider Peak loads for your design; the maximum IOps you may be looking for in some cases
- Get an average/max per VM datastore ratio (locate your biggest Thin VM)
- Consider tiered storage and how it could be better utilized
- Administration and management overhead; essentially, the larger the LUN, the less overall provisioning time and so on.
- VAAI capable array for those Thin benefits (running that reclaim UNMAP script..)
- Benchmark, Test using some other tools on that bigger LUN to ensure stability at higher IOps
- Lastly the storage array benchmarks and overall design/implementation
- The number of VMs you can scale on a LUN can affect your cluster design; you may not want to enable your customers to scale that much
- Alerting considerations and how you will manage them efficiently so as not to be counterproductive.
- Consider other things like SDRS (fast provisioning gets ridiculous with Thin Provisioning)
- Storage latency and things like Queues can be a pain point.
I hope this helps some of those out there that have been wondering about this stuff. The LUN size for me dramatically affects my cluster design and what I am looking to achieve. You also want to load test your array or at least get some proven specs on it. I currently work with HDS VSP arrays and these things can handle anything you can throw at them. Whether you need additional capacity, IOps, or processing, you can easily scale out or up. Please share your thoughts on this as well. Here are some great references:
Note: these numbers are hypothetical, but it’s all in the numbers.
Yeah, some would say this post is probably overdue, but lately I have sincerely been thinking: have we been drinking some Kool-Aid around this feature? I couldn’t help but have some concerns around possible implementations of this feature in VCD installations. I, in particular, am not sold on it completely. Here are just some quick reasons that didn’t exactly sell me:
- It’s a very “new” feature in regard to VCD, which is itself still early in its life as a cloud platform.
- There is currently no way of updating those linked clones, unlike VMware View (some admin overhead, as well as the use of local and shared catalogs).
- Added complexity (with linked images, snap chains, and how you have to handle storage motion).
- By default ALL linked-clone images are misaligned (VMware has yet to address this problem). In some cases this could be a compounding factor causing additional I/O overhead.
- Design has to be carefully considered and evaluated with a max of 8-node clusters (this will affect current installations as well).
So yeah, I know I look like the bad guy, but I seriously think this release was targeted more at SMB than anything. IMO, this is more of a feature for smaller businesses, because now they don’t have to go out and spend all that crazy dough on a VAAI-capable array (hooray for them :)), which begs the question…
Why do you need to enable this feature if you already leverage VAAI capable arrays?
It just seems to me that Fast Provisioning is a little premature in its release. Although VCD continues to improve, I think this feature needs some serious improvement before bigger shops decide to utilize it. The other downside is that we have yet to see any real progress on the UNMAP problem, and it’s now treated as a manual task we should run during certain times… or outages, I should say. That really blows, because we all know what kinds of benefits and problems thin provisioning on an array can cause. For the most part, it’s just really bad reporting… lol.
Here are some other sources I would recommend reading; I seriously think you should read them and decide for yourself if it’s really worth it. Also, be careful not to put the cart before the ox, and do your homework. Some people drink the Kool-Aid and don’t think to question or ask “What’s really under the hood?”. Fast Provisioning should never be compared to VMware View… it’s similar but not identical. I would definitely recommend reading Nick’s blog; it opened my eyes to what he calls the “Fallacies”, and of course Chris has a good read.
VMware® vStorage Virtual Machine File System (VMFS) is a high-performance cluster file system that provides storage virtualization optimized for virtual machines. Each virtual machine is encapsulated in a small set of files, and VMFS is the default storage system for these files on physical SCSI disks and partitions. This file system enables the use of VMware® cluster features such as DRS, High Availability, and other storage enhancements.
For more information please see the following document here and the following KB here.
There are two ways to upgrade to VMFS-5 from the previous VMFS-3.xx. An important consideration when upgrading to VMFS-5 or provisioning new VMFS-5 is that legacy ESX hosts will not be able to see the new VMFS partitions. This is because of the enhancements made to ESX and the partitioning. Upgrading to VMFS-5 is irreversible, so always consider what you are doing. Lastly, there are many ways to provision VMFS-5; these are just two of the more common ways of doing it.
Method 1: Online Upgrade
Although an online upgrade does give you some of the new features in VMFS-5, it does not give you all of them. However, it is the least impacting and can be performed at any time without an outage. Below are the features you will not gain by doing an in-place upgrade:
- VMFS-5 upgraded from VMFS-3 continues to use the previous file block size, which may be larger than the unified 1MB file block size.
- VMFS-5 upgraded from VMFS-3 continues to use 64KB sub-blocks and not the new 8K sub-blocks.
- VMFS-5 upgraded from VMFS-3 continues to have a file limit of 30720 rather than the new file limit of > 100000 for newly created VMFS-5.
- VMFS-5 upgraded from VMFS-3 continues to use the MBR (Master Boot Record) partition type; when the VMFS-5 volume is grown above 2TB, it automatically & seamlessly switches from MBR to GPT (GUID Partition Table) with no impact to the running VMs.
- VMFS-5 upgraded from VMFS-3 continues to have its partition starting on sector 128; newly created VMFS-5 partitions will have their partition starting at sector 2048.
RDM – Raw Device Mappings
- There is now support for passthru RDMs to be ~ 60TB in size.
- Non-passthru RDMs are still limited to 2TB – 512 bytes.
- Both upgraded VMFS-5 & newly created VMFS-5 support the larger passthru RDM.
The end result of using the in-place upgrade can be the following:
- Performance is not optimal
- Non-standard configurations can still be in place
- Disk alignment will be a consistent issue with older environments
- The file limit can be impacting in some cases
Method 1: How to perform an “Online” upgrade for VMFS-5
Upgrading a VMFS-3 to a VMFS-5 file system is a single-click operation. Once you have upgraded the host to VMware ESXi™ 5.0, go to the Configuration tab > Storage view. Select the VMFS-3 datastore, and above the Datastore Details window, an option Upgrade to VMFS-5 will be displayed:
Figure 3. Upgrade to VMFS-5
The upgrade process is online and non-disruptive. Virtual machines can continue to run on the VMFS-3 datastore while it is being upgraded. Upgrading the VMFS file system version is a one-way operation. There is no option to reverse the upgrade once it is executed. Additionally, once a file system has been upgraded, it will no longer be accessible by older ESX/ESXi 4.x hosts, so you need to ensure that all hosts accessing the datastore are running ESXi 5.0. In fact, there are checks built into vSphere which will prevent you from upgrading to VMFS-5 if any of the hosts accessing the datastore are running a version of ESX/ESXi that is older than 5.0.
As with any upgrade, VMware recommends that a backup of your file system is made prior to upgrading your VMFS-3 file system to VMFS-5.
Once the VMFS-5 volume is in place, the size can be extended to 64TB, even if it is a single extent, and ~2TB Virtual Machine Disks (VMDKs) can be created, no matter what the underlying file-block size is. These features are available ‘out of the box’ without any additional configuration steps.
NOTE: Some of this documentation is excerpted from VMware documentation and sources.
Method 2: Provisioning New VMFS-5
This method explains how to move to VMFS-5 without performing an “online” upgrade. Essentially this is the normal process of provisioning a new VMFS LUN for ESXi 5. Here are the listed benefits of provisioning VMFS-5 fresh rather than doing an “online” upgrade:
- VMFS-5 has improved scalability and performance.
- VMFS-5 does not use SCSI-2 Reservations, but uses the ATS VAAI primitives.
- VMFS-5 uses GPT (GUID Partition Table) rather than MBR, which allows for pass-through RDM files greater than 2TB.
- Newly created VMFS-5 datastores use a single block size of 1MB.
- VMFS-5 has support for very small files (<1KB) by storing them in the metadata rather than in the file blocks.
- VMFS-5 uses sub-blocks of 8K rather than 64K, which reduces the space used by small files.
- VMFS-5 uses SCSI_READ16 and SCSI_WRITE16 cmds for I/O (VMFS-3 used SCSI_READ10 and SCSI_WRITE10 cmds for I/O).
- Disk alignment for guest OSes becomes transparent and has less impact.
- Performance, I/O, and scalability are greater on a newly created VMFS-5 than on an online-upgraded one.
As you can see, normal provisioning of VMFS-5 is a lot more robust in features and offers a great deal of improvement over just performing an “online” upgrade. The online upgrade is easy and seamless, but all benefits should be weighed. In my case the chosen method would be Method 2. The only instance in which an “online” upgrade would be considered under normal circumstances would be if you were already at capacity on an existing array; in that scenario it could be viewed as more beneficial. Also, if you did not have Storage vMotion licensed through VMware, further consideration of how to migrate to the new VMFS would have to be made, and migrating workloads to new VMFS-5 would be a bit more of a challenge in that case as well. However, this is not an issue under most circumstances.
Method 2: How To provision new VMFS-5 for ESXi
- Connect to vSphere vCenter with vSphere Client
- Highlight a host and click the “Configuration” tab in the right pane.
- Click on “Storage”
- In the right pane click “Add Storage” (See image)
- Select the LUN you wish to add
- Expand the Name column to record the last four digits (this will be on the naa name) In this case it will be 0039. Click “Next”
- Select to use “VMFS-5” option
- Current Disk Layout – Click “Next”
- Name the datastore using abbreviations for the customers name with the type of storage followed by the LUN LDEV (Yes, a standard). This example would be “Cust-Name”=name “SAN”=type “01”= Datastore Number “LDEV” = 0038. (cus-nam-san-01-1234)
- Select the radio button “Maximum available space” click > Next
- Click Finish and watch for the “task” to complete on the bottom of vSphere client
- After the task completes go to the Home > Inventory > Datastores
- Make sure there is a Fiber Storage folder created. Under that folder create a tenant folder and relocate the datastores in the new tenant name folder.
- After moving the folder you may need to provision this datastore for vCloud. Proceed to the optional method for this below.
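The naming standard from the steps above can be captured in a small helper. This is a hypothetical sketch of that convention (the function name and field layout are mine, not an official tool):

```python
def datastore_name(customer, storage_type, ds_number, ldev):
    """Build a datastore name per the convention above:
    <customer>-<type>-<datastore #>-<ldev>, all lowercase."""
    return "-".join([customer.lower(), storage_type.lower(),
                     f"{ds_number:02d}", ldev.lower()])

# Example from the naming step: customer "cus-nam", SAN storage,
# datastore number 1, LDEV 0038.
print(datastore_name("cus-nam", "san", 1, "0038"))  # cus-nam-san-01-0038
```

Codifying the standard like this keeps names consistent across teams and makes it trivial to parse the LDEV back out when chasing a LUN on the array side.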
Note: Some of the information contained in this blog post is provided by VMware Articles and on their website http://www.vmware.com
Don’t update if it’s not broken; if it breaks, then update it. If you are having issues with false alerts, you may want to update the firmware. I saw this with 1.4j.
The issue is not with the IOM but with the chassis communication bus (the I2C bus), and hence the IOM is not getting detected and the backplane ports never come up. If you are seeing alerts related to PSUs and those types of things, then you may want to pay attention.
I2C is a bus that provides connectivity between different components in the chassis.
The PCA9541 is an I2C part that helps us control access to the shared devices in a chassis; the chassis serial eeproms, power supplies, and fan modules.
The 9541 I2C mux has known hardware/hang issues that can cause failures to access hardware components on the chassis. This can result in failures to read fan and PSU sensor data (such as fan speed and temp), triggering faults to be raised for the component (such as fan inoperable).
Some early PCA9541s that were used have a bug that if they are switched back and forth between IOM1 access and IOM2 access too quickly, they will get stuck and not allow any connection to the devices behind them.
You are required to upgrade to firmware version 1.4(3q) or above.
Workaround to be followed before going for firmware upgrade:
• Reseat all the PSUs one by one in the chassis. Wait for 10 minutes after inserting each unit so that it can stabilize.
• Reseat all the fan units on the back side of the chassis. Wait for 3 minutes before going to the next one.
• Reseat both IO modules. Wait for 20 minutes before going to the next one.
• Verify the I2C counters for the chassis.
• (Requires downtime) Power cycle to reset all counters to fix issues in the running version.
• (Requires downtime) Upgrade to firmware version 1.4(3q) or above (the 2.0 release) for a permanent fix.
Please follow the link to download the 1.4(3q) bundle:
Related Issue with firmware version used:
Incorrect behavior of the I2C bus, or the CMC software misinterpreting I2C transactions?
- Fans (count 8 or fewer) and PSUs (count 4 or fewer) can be reported as inoperable; the state is never cleared.
- Fans run at a 100% rotation rate.
- UCSM cannot retrieve detailed PSU/fan part information.
- Transient errors indicating a fan is inoperable, cleared within a one-minute interval.
- LED state does not match the faults reported in UCSM or the actual health of the system.
- Incorrectly reported thermal errors on blades and chassis.
Fixes that are promised for 1.4(3q):
- CSCtl74710 I2C bus access improvements for 9541
PCA9541 (NXP I2C bus multiplexer) workaround to improve bus access for parts built prior to mid 2009. The workaround assures that if the internal clock fails to start, it gets retried. The change is designed for and works as expected with both PCA9541 and PCA9541A parts from NXP. PCA9541 parts, due to the internal clocking bug, had a high number of bus_lost events.
- CSCtn87821 Minor I2C driver fixes and instrumentation
The new Linux I2C driver has optimizations to handle I2C controller and slave device synchronization. With the older driver, a simple synchronization error could appear as uncorrectable device errors.
- CSCtl77244 Transient FAN inoperable transition
During a UCS (CMC) firmware upgrade and the switch to new master/slave mode, the CMC erroneously takes information from the slave IOM and evaluates fans as inoperable based on stale data.
- CSCtl43716 9541 device error. Fan Modules reported inoperable, running 100%
A software code routine bug where a single bus_lost event followed by a successful retry results in an infinite loop. As a result, fans are reported as inoperable and are not controlled by the CMC.
Also removed is an artificial cumulative threshold that enabled the amber LED upon reaching 1000 bus_lost events. This was implemented as a monitoring mechanism to simplify identification of the PCA9541 devices; it is no longer needed since a proper software workaround is implemented.
Since this email, we have started the update to firmware 2.0. That is a separate blog post I am going to write, because that too was pretty intense. I will provide some additional steps that we performed to lessen the impact. One thing is for certain: don’t expect it NOT to be impacting…
Here are some notes I find useful for anyone wanting to learn more about the latest that HDS has to offer. Please note the date of this post and that some things are subject to change.
1. Thin Provisioning > The UNMAP primitive will NOT be available until Q1 2012.
Per HDS, the reason is that the UNMAP primitive, when issued from VMware ESXi (vSphere 5), causes all the workloads on the provisioned storage pool to take a performance hit. This is because the UNMAP is issued with a “high priority”.
The KB explains how to disable this feature. It is NOT disabled by default; this is a command run locally on each ESXi 5 host.
2. sVMotions and VMDK deletions on any thin-provisioned array will always be shown as allocated space (the space is never reclaimed). To counter this negative effect, UNMAP was introduced; however, it is not production ready and causes problems.
3. The Round Robin PSP is the HDS best practice for multi-pathing. HOWEVER, we cannot do any clustering (MSCS or Oracle RAC) with this plug-in, i.e. physical-compatibility Raw Device Mappings (VMs accessing physical SAN LUNs vs. VMFS). However, HDS has released HDLM, the HDS-branded VMware multi-path plug-in, and you can do both with it.
4. The block (page) size on HDS is 42MB; with the vSphere 5 unified block size of 1MB, HDS arrays have to delete more blocks due to the unified block sizing. HDS stated that proper alignment is important between the SAN, VMFS, and VMs.
5. Tier 1 applications need special consideration depending on the use case; we may need to look at dedicating a particular level of guaranteed I/O or creating separate dynamic storage pools to meet the workloads. There is HDS documentation for using VSP with vCloud Director, but it doesn’t cover application workloads; waiting on these from HDS, as we virtualize MANY different workloads in our cloud and it needs to scale easily on demand.
7. vCenter plug in – more to come..
8. I/O and measurements for performance on the SAN side is done through Performance Tuner from HDS. More to come.
9. No firmware update should be needed on the HDS side for VSP array to utilize the two new vSphere 5 VAAI primitives. Not sure about AMS…
10. VASA is coming quickly and this will give customers the visibility and deeper reporting for other things especially for VMware environments using HDS storage.
11. HDS recommends provisioning VMware guest OSes thick and the SAN array thin. However, there is not a huge performance difference between the two. Our direction is to use thin-on-thin provisioning, for reporting purposes among other things.
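On point 4 above, here is a hypothetical sketch of why alignment matters when a 42MB array page sits under 1MB VMFS blocks: an array can typically only return a page to the pool once every block inside it is free, so a freed region that straddles page boundaries yields fewer whole pages than its raw size suggests. The 42MB page size comes from the note above; the reclaim model itself is a simplification of mine:

```python
PAGE_MB = 42   # HDS dynamic-provisioning page size (per the note above)
BLOCK_MB = 1   # unified VMFS-5 block size

def whole_pages_reclaimable(start_mb, length_mb, page_mb=PAGE_MB):
    """Count whole array pages fully covered by a freed extent.
    Only fully-freed pages can be returned to the pool."""
    first_full = -(-start_mb // page_mb)            # round start up
    last_full = (start_mb + length_mb) // page_mb   # round end down
    return max(0, last_full - first_full)

# Freeing 84 MB starting exactly on a page boundary reclaims 2 pages...
print(whole_pages_reclaimable(0, 84))   # 2
# ...but the same 84 MB starting 1 MB off a boundary reclaims only 1.
print(whole_pages_reclaimable(1, 84))   # 1
```

That offset-by-one case is exactly the kind of waste that SAN/VMFS/guest alignment is meant to avoid.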
For more HDS white papers and such, you can get to those here under a resources tab:
Hitachi Data Systems VMware resources
When you look at cloud today in the context of VMware, what is your biggest concern? For some of us it may be networking, for others storage, or maybe even something from a broader perspective like availability, scalability, and BU/DR. Since I have been working with vCloud Director day in and day out, I have been asking some deeper technical questions centered around the scalability of storage and other components related to the overall design. I have been challenged in various ways because of this technology. Prior to vCloud it was vSphere, and a lot of how you implemented and managed vSphere was much less complex. Cloud brings another level of complexity, especially if your initial “design and management” is poor to begin with. Usually you end up spending more time and money going back to address issues related to simple best practices that most architects and engineers should already know. In some cases it’s a disconnect between the design and infrastructure team and the help desk. This may not always be the case, but in my experience it seems to happen more often than not.
I am sure we could all spend plenty of time talking about operations, procedures, protocols, standards, and blah blah…. but this isn’t the point of this blog…. Even though these things are of the highest importance and the more effort that is put into this the better the results you will get and the less cost you will end up spending. Anyways…
So, as I was saying, vCloud has challenged me in several ways. Now not only do I have to consider the design of vSphere, but I also have to look at the design of vCloud Director and how we manage all these different components. Simply adding vCloud Director doesn’t mean that’s the end of it all: more complexity comes with the integration of other applications, application availability, and backup and DR. I have been amazed at how many things I see as oversights due to the lack of expertise in this area. No offense to anyone, but really, VMware is still in its infancy when running against other markets, though I strongly sense that VMware is going to hold the majority market share for a while.
Crossing the gaps:
Since I have been studying day in and day out, covering VMware best practices and other companies’ best practices (not VMware’s), I continue to see a lot of disconnects in certain areas (vCloud Director). Storage guys have no idea or clue about running virtualized workloads on arrays, and oftentimes they don’t care to learn about VMware. Usually they already have plenty to do, but this disconnect will on some level affect the implementation. I honestly say that in most cases the architect should be the one researching and ensuring that all the components which make up the cloud computing stack are standardized and implemented correctly; even so, these gaps still cause setbacks. Which now leads me to the networking side of things. Networking engineers, I see, are beginning to come up to speed more quickly on virtualization. The main factor in this is Cisco UCS and how it appeals to those network administrators and engineers; add to that FCoE/CNAs. However, the disconnect once again lies in that knowledge transfer of the virtual platform: how it works and the best practices designed around VMware. I am the first one to say that many don’t really get the choice, especially if a company just threw you into the fire. It’s like right now: we are looking at giving our network team the keys to the kingdom (Cisco UCS), but yet they have nearly ZERO understanding and training in how any of it works… scary, right? We have to cross these gaps, people. We need to make sure that we have people positioned in areas who can understand and impart that training, or have someone available as a resource.
My Real Concerns:
vCloud Director was something totally new and alien to me when I first stepped into the cloud. I had to learn, and quickly. Given my background, I quickly go to the manuals, read the blogs, get plugged into good sources, learn even more, read books, and start auditing. I start looking at designs that may be questionable and start asking the questions “Is it ignorance?” or “What the … was he thinking?”, and quickly find that usually it was the former: simply ignorance. No one really is to blame, because we have to understand that YES, it is a NEW technology. BUT how much more critical is it that we research and ensure that we are implementing a design that is “rock” solid before rolling it out? Yes, I know deadlines are deadlines, but it is what it is either way: you either spend a lot more money in the long run or spend a little bit more to get it right the first time. We are now having to go back and perform a second phase, and for the past couple of months we have been remediating a lot of different things that could’ve been done right had a simple template been designed correctly. We now spend countless additional hours updating and working more issues because of this one simple thing. This isn’t even getting into the storage and other concerns I have.
Cloud and What’s Scary?:
Yeah, I know, right? Scary? I don’t know about yours, but some of the clouds I have seen are. Here is what scares the heck out of me: customer ABC decides to deploy a truckload of Oracle, MSSQL, IIS, WebLogic, etc. virtual machines all on the fly. The next thing we know, we see some latency on the storage back-end and some impact to performance. Come to find out, a bunch of cloning operations are kicking off… I/O is spiking, the VMs are driving many kinds of IOps, and in a matter of about 12 hours we are having some major issues. This is called “scalability”, or sometimes “elasticity”, whatever you want to call it. Some catalogs host every kind of application, and the majority of the apps are all tier 1 virtualized workloads. This isn’t the little stuff most corporations virtualize; they usually put that off for later, because the need for a high-performance server and old traditional thinking still tells them not to do it (playing it safe). Scaling a cloud to accommodate tier 1 workloads is something I think we are going to be seeing a lot more of. In fact, most vendors provide documentation on implementing solutions on VMware vCloud Director, but they almost NEVER cover the application workloads. I am speaking of storage, networking, and server hardware. This is probably because, due to the mixed nature you can have in an environment, you should do THOROUGH testing to ensure that you can scale out and run an optimal number of workloads… some would call it vBlock…
Anyways, I didn’t mean to write a blog post this long, but I have had a lot on my mind lately, and I will continue to write more as I continue my VMware cloud journey.
Well, so I have been at it again, attempting to learn enough about Cisco UCS to better understand what it can do. I already know there is a lot of potential and that we probably don’t utilize it to its capacity.
The other day a colleague and I were talking about slowness in general in cloud environments, and he mentioned how we could improve performance for all the VMs by moving from the E1000 to the VMXNET3. Now, I am fully aware of all the benefits and features of the VMXNET3, but I have to say I was very reluctant to buy into every VM now getting a 10Gb link. In my opinion, that terrifies me at first thought. What if a VM all of a sudden decided to GO NUTS and completely saturate the link? That would impact other VMs, would it not? At first, yes, that could happen on a “RARE” occasion, but you obviously have to understand your design and how Cisco UCS works.
Now on to the other observations and misconceptions I had about VMXNET3. I have to say, from what I have researched and gathered, most articles do point to an increase in overall performance. Others reported that host-to-host communication increased even more than the percentages seen in outbound traffic; one blog post claimed nearly a 300% increase > that’s very impressive. So now I can confidently say that if you are using Cisco UCS, you should definitely consider the VMXNET3 driver. (NOTE: You cannot use FT with VMXNET3.)
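To put the “one VM goes nuts and saturates the 10Gb link” worry in perspective, here is a toy back-of-the-envelope model of how a shares-based scheme (in the spirit of vSphere Network I/O Control, though this is my own sketch, not VMware’s algorithm) divides an uplink. All VM names, shares, and demand numbers below are hypothetical:

```python
def divide_uplink(capacity_gbps, vms):
    """Toy proportional-share allocator (water-filling).

    vms is a list of (name, shares, demand_gbps) tuples. Each VM gets
    at most its demand; leftover bandwidth is redistributed among the
    still-hungry VMs in proportion to their shares.
    """
    alloc = {name: 0.0 for name, _, _ in vms}
    active = list(vms)
    remaining = capacity_gbps
    while active and remaining > 1e-9:
        total_shares = sum(shares for _, shares, _ in active)
        satisfied = []
        for name, shares, demand in active:
            fair = remaining * shares / total_shares
            if demand <= fair:
                alloc[name] = demand          # fully satisfied, frees up bandwidth
                satisfied.append((name, shares, demand))
        if not satisfied:
            # Everyone left wants more than their fair share: split it.
            for name, shares, demand in active:
                alloc[name] = remaining * shares / total_shares
            break
        for entry in satisfied:
            active.remove(entry)
        remaining = capacity_gbps - sum(alloc.values())
    return alloc

# One greedy VM cannot starve the quiet ones; it only gets the slack.
print(divide_uplink(10.0, [("noisy", 100, 10.0),
                           ("web",   100, 1.0),
                           ("db",    100, 2.0)]))
```

With equal shares, the web and db VMs still get their full 1 and 2 Gbps, and the noisy VM is held to the remaining 7 Gbps, which is roughly the behavior that made me stop panicking about the shared 10Gb link.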
So how exactly does all of this tie into my Cisco UCS post?
In short, it’s this link here.
“The revolutionary Cisco® UCS M81KR Virtual Interface Card (VIC) helps increase application performance and consolidation ratios, with 38 percent greater network throughput, complementing the latest increases in Cisco Unified Computing System™ CPU performance and memory capacity. The virtual interface card and the Cisco Unified Computing System together set a new standard for balanced performance and efficiency.”
Now, the VIC card seems pretty cool, but what I thought was a little disappointing is that most companies will only really use something like this for a particular “use case”. It’s also curious that they don’t get into other things, like upstream traffic and how it would affect host-to-host communication. The other disappointing factor was that they tested this using RHEL, which I can understand, but it wasn’t really a real-world test; all they wanted to prove was that by offloading network traffic to UCS you get better performance. Now, that doesn’t mean I still wouldn’t want to know what it is capable of. Even so, they did show how much further traffic improved with the interface card and VMXNET3 together.
Now down to the nitty-gritty:
1) There is a limit on the total number of network interfaces for VMs:
a) Half-height blades can have only 1 VIC = 128 virtual interfaces
b) Full-height blades can have a maximum of 2 VICs = 128-256 virtual interfaces
2) It doesn’t really benchmark Windows, and that matters in the scheme of things, considering MOST environments RUN Windows.
3) It doesn’t really go into detail on how you would bind these NICs between UCS and the vSphere hypervisor, only allocating a MAC in UCS and then using VMDirectPath for the NIC (this is probably simpler than I think).
4) They don’t cover host-to-host, but they do cover chassis-to-chassis, and it is great to see that kind of performance. But come on, show us host-to-host!!!
5) Scenario 3 isn’t really clear on the VM Ethernet interface used. It says “default enic”, so my guess is they couldn’t use anything other than VMXNET3; I’m not sure why it says that.
6) There are no statistics for how CPU performance was affected per scenario.
7) Does this mean there is no need for Nexus 1000V switching, since you can use the VIC to set up your interfaces within UCS itself? (This would be my biggest reason > hand off to Net Eng = WIN!)
8) Lastly, VMware vCloud Director uses templates and is automated.. how could you creatively design this to work with an automated cloud solution? (I mean, heck, I would love the performance; the only thing I can think of is a vCO plug-in for UCS tied into the vCO/vCD plug-in, maybe? That’s why I say “use case”.)
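A quick sketch of the vNIC math behind point 1, using the 128-virtual-interfaces-per-VIC figure from the list above. The blade counts per chassis are my own assumption (a UCS 5108 chassis fits 8 half-height or 4 full-height blades):

```python
VIFS_PER_VIC = 128  # virtual interfaces per M81KR VIC, per the list above

def chassis_vif_capacity(half_height_blades, full_height_blades, vics_per_full=2):
    """Upper bound on virtual interfaces for one chassis.

    Half-height blades hold 1 VIC each; full-height blades hold up to 2.
    """
    half = half_height_blades * 1 * VIFS_PER_VIC
    full = full_height_blades * min(vics_per_full, 2) * VIFS_PER_VIC
    return half + full

# A chassis of 8 half-height blades and one of 4 dual-VIC full-height
# blades both top out at the same number of virtual interfaces.
print(chassis_vif_capacity(8, 0))                    # 8 x 1 x 128
print(chassis_vif_capacity(0, 4))                    # 4 x 2 x 128
print(chassis_vif_capacity(0, 4, vics_per_full=1))   # 4 x 1 x 128
```

Divide those totals by the number of VMs you plan to run per blade and the per-VM vNIC budget shrinks fast, which is why the limitation in point 1 is worth doing the arithmetic on before committing to a design.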
Obviously this is a lot of information, but I would honestly like to test this in my own environment and see how well it performs. Our cloud platform offers everything from WebLogic to Oracle to SQL and more. Anyway, let me know your thoughts, and any other information would be greatly appreciated! Yes, I know I am a noob 🙂 .
I got an email from EMC at my personal account broadcasting a live/on-demand event about them breaking a record. I just now caught it, and it looks pretty exciting. I cannot wait to see what the record is..
check it out here.
The things that excited me were the offerings for SMBs, and the value sounds pretty promising. I will be looking forward to the future of EMC.
Things to note:
EMC’s previous records were covered, going as far back as 1994:
– 1997 – Introduced multi-terabyte storage
– 2002 – Introduced the CAS system – tight security + data protection.
– 2005 – Introduced the first petabyte storage – (1,000 terabytes, 1,000,000 gigabytes, or 1,000,000,000 megabytes)
– 2009 – Introduced FAST storage – tiered storage – changing the management of storage.
– 1997-2010 – Number 1 in storage market share for over thirteen years.
– 3-5% IT spending growth – no surprise really.
– IT spending is really high on IT cost.
– Stored data will be growing past the zettabyte mark.
– Storage vendors currently have over a zettabyte of data.
– Though most numbers were down due to the recession, information growth was up over 60%.
– Talked about the VNXe and how it works with Unisphere
– The VNXe is priced lower than most competitors in the SMB market, including Dell, NetApp, IBM, and others.
– They will be offering several different families of VNX series storage – all featuring the next generation of storage architecture:
Intel Westmere processors
6 Gbps SAS back-end
Expanded UltraFlex I/O – several different protocols
File, block, and object
The Flash Suite drives flash adoption
VNX is 3 times more efficient – and continues to boast the best efficiency on the market
Record-breaking performance on Oracle, SQL, and VMware
– Shipped the most flash storage – over 10 petabytes
Oh yeah, don’t forget the world record of 26 people fitting inside a Mini Cooper.
Feel free to comment.
***Disclaimer: The thoughts and views expressed on VirtualNoob.wordpress.org and by Chad King in no way reflect the views or thoughts of his employer or of any other company. These are his personal opinions, formed on his own. Also, products improve over time, and some things may be out of date. Please feel free to contact us to request an update and we will be happy to assist. Thanks!~
So a colleague and I had a conversation about storage and what we are doing in our environments at this moment in time. He mentioned that they use XIV and described some of the things it can do when it comes to provisioning, load balancing, and things of that nature, not to mention the ability to add capacity without downtime and so forth.
I am of course kind of a NetApp fan, but in the end I always want to be as unbiased as possible, as I don’t really work for any vendor out there. In fact, we only started using NetApp recently, and in our current VDI/DDV environment we use NFS in particular. In our server virtualization environment we use it for FC and of course run ESX on the BL460c blades – that’s HP, of course.
After this conversation I decided to do some digging on my own, and as always I came across some old blog posts, so I decided that if anyone else is interested in learning about IBM XIV, they too can take a look. Of course, don’t try to pass it off as a hot topic; it seems the dust may have settled, from what I could read. If you have any good reading links that you would like me to post, leave them in the comments and I will certainly add them. As always, thanks!
That’s right, you heard me say it alright. Cisco UCS, or in Twitter terms, #CiscoUCS #Cloud. Tonight I got my first stab at actually researching and reading up on Cisco UCS, and I have to say, it does sound promising. Right now, though, I haven’t given much thought to the cost of such a system. Lately we see a lot of different offerings when it comes to hardware platforms to run a virtual shop on, and up until recently I hadn’t even read about or seen a Cisco server in a while. In fact, the last time I saw a Cisco server was when Call Manager was running on Windows 2000 SP4 (HP MCS hardware) back on version 5.5. I guess I am beginning to get old…
Enough Said… let’s move on… Nothing to see here..
The first reading I did on Cisco UCS was today on Ciscos site: http://bit.ly/grL4EY
Joe wrote about inter-fabric communication on the Cisco blade servers. It piqued my interest, seeing how UCS is uniquely designed to handle communication.
You can run the fabric interconnects in two separate modes: End-Host Mode and Switch Mode. Most users typically choose End-Host Mode for simplicity. It took me a while for it all to sink in, but I think I finally got it in a nutshell. The big point is that you can have 10GbE, and if you need to manage traffic more effectively at the host level, you can utilize vSphere switching such as the vSS, the vDS, and the Cisco Nexus 1000V. Essentially, the Nexus 1000V is what you can use to make it even more manageable. It also seems definitely more geared toward the cloud due to the so-called simplicity. You will still have to utilize 10GbE networks, which can still cost a pretty penny. I am just glad it is finally beginning to make sense… at least right now..
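To make the End-Host Mode idea concrete, here is a toy sketch (my own illustration, not Cisco code; all names are hypothetical) of the behavior that finally made it click for me: in End-Host Mode the fabric interconnect pins each server vNIC to exactly one uplink, so it never forwards traffic between uplinks and never needs to run spanning tree:

```python
def pin_vnics(vnics, uplinks):
    """Toy End-Host Mode pinning: each server vNIC is assigned to exactly
    one uplink (round-robin here). A vNIC's traffic only ever leaves via
    its pinned uplink, so no loops can form between uplinks."""
    if not uplinks:
        raise ValueError("End-Host Mode needs at least one active uplink")
    return {vnic: uplinks[i % len(uplinks)] for i, vnic in enumerate(vnics)}

def repin_on_failure(pinning, failed_uplink, uplinks):
    """When an uplink fails, re-pin only the affected vNICs onto the
    surviving uplinks; everyone else keeps their pin."""
    survivors = [u for u in uplinks if u != failed_uplink]
    if not survivors:
        raise ValueError("no surviving uplinks to re-pin to")
    new_pinning = dict(pinning)
    moved = [v for v, u in pinning.items() if u == failed_uplink]
    for i, vnic in enumerate(moved):
        new_pinning[vnic] = survivors[i % len(survivors)]
    return new_pinning

# Four vNICs spread across two uplinks, then one uplink dies.
pins = pin_vnics(["vnic0", "vnic1", "vnic2", "vnic3"], ["eth1/1", "eth1/2"])
print(pins)
print(repin_on_failure(pins, "eth1/1", ["eth1/1", "eth1/2"]))
```

That pinning-plus-repinning behavior is why End-Host Mode looks like a host (not a switch) to the upstream network, which is a big part of the “simplicity” everyone keeps mentioning.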
Props to Joe, who did a good job, and I think he even knows a thing or two about VMware. 😉
Thanks to Adam, the hashtag has been corrected!