
Cisco UCS Journey – When to update firmware

Don’t update it if it’s not broken; if it breaks, then update it. That said, if you are getting false alerts, you may want to update the firmware. I saw this with 1.4j.

The issue is not with the IOM itself but with the chassis communication bus (the I2C bus). As a result, the IOM is not detected and the backplane ports never come up. If you are seeing alerts related to PSUs, fans, and similar chassis components, pay attention.

I2C is a bus that provides connectivity between different components in the chassis.

The PCA9541 is an I2C part that helps control access to the shared devices in a chassis: the chassis serial EEPROMs, power supplies, and fan modules.

The 9541 I2C mux has known hardware hang issues that can cause failures to access hardware components in the chassis. This can result in failures to read fan and PSU sensor data (such as fan speed and temperature), which in turn raises faults for the component (such as fan inoperable).

Some early PCA9541 parts have a bug: if they are switched back and forth between IOM1 access and IOM2 access too quickly, they get stuck and block all access to the devices behind them.

Action Required:

Upgrade the firmware to version 1.4(3q) or later.
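Checking whether a given system falls under this requirement is just a version comparison. Here is a minimal Python sketch, assuming UCS version strings follow the `major.minor(maintenance number + letter)` pattern seen in "1.4(3q)" and that the letters sort in release order within a maintenance release. `parse_ucs_version` and `needs_i2c_fix_upgrade` are hypothetical helpers, not part of any Cisco tool, and the sample versions in the usage loop are illustrative only.

```python
import re
from typing import Tuple

# Hypothetical parser for UCS-style versions such as "1.4(3q)" or "2.0(1m)":
# major.minor(maintenance number + patch letter).
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")


def parse_ucs_version(text: str) -> Tuple[int, int, int, str]:
    """Split "1.4(3q)" into (1, 4, 3, "q") so versions compare as tuples."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized UCS version string: {text!r}")
    major, minor, maint, patch = match.groups()
    return int(major), int(minor), int(maint), patch


def needs_i2c_fix_upgrade(running: str, minimum: str = "1.4(3q)") -> bool:
    """True when the running version sorts before the minimum fixed release."""
    # Assumption: patch letters sort in release order within one maintenance release.
    return parse_ucs_version(running) < parse_ucs_version(minimum)


if __name__ == "__main__":
    # Illustrative version strings only.
    for version in ("1.4(3i)", "1.4(3q)", "2.0(1m)"):
        print(version, "upgrade needed" if needs_i2c_fix_upgrade(version) else "ok")
```

Comparing tuples keeps the logic to one line: the major, minor, and maintenance numbers compare numerically, and only then does the patch letter break the tie.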

Workaround to follow before the firmware upgrade:

• Reseat the PSUs in the chassis one at a time. Wait 10 minutes after inserting each unit so that it can stabilize.

• Reseat the fan modules on the back of the chassis one at a time. Wait 3 minutes before moving on to the next one.

• Reseat both IO modules (IOMs), one at a time. Wait 20 minutes before moving on to the next one.

• Verify the I2C counters for the chassis.

• (Requires downtime) Power cycle the chassis to reset all counters; this clears the issues on the running version.

• (Requires downtime) Upgrade to firmware version 1.4(3q) or later (or the 2.0 release) for a permanent fix.
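Because each reseat is done one unit at a time with a settle period, the workaround takes a while to get through. The minimal Python sketch below expands the reseat steps into a per-unit checklist and totals the waiting time. It assumes a fully populated chassis with 4 PSUs, 8 fan modules, and 2 IOMs (matching the counts in the symptom list further down) and counts a settle period after every unit, including the last one. `ReseatStep`, `build_plan`, and `total_wait_minutes` are hypothetical names; the I2C counter check and the power cycle are left out because they depend on your environment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReseatStep:
    component: str     # what gets reseated
    count: int         # how many units of it are in the chassis (assumed)
    wait_minutes: int  # settle time after each unit, from the workaround steps


def build_plan(steps: List[ReseatStep]) -> List[str]:
    """Expand each step into one checklist line per unit, reseated one at a time."""
    plan = []
    for step in steps:
        for unit in range(1, step.count + 1):
            plan.append(f"Reseat {step.component} {unit}, then wait {step.wait_minutes} min")
    return plan


def total_wait_minutes(steps: List[ReseatStep]) -> int:
    """Sum of all settle periods, counting one after every unit."""
    return sum(step.count * step.wait_minutes for step in steps)


# Assumed fully populated chassis: 4 PSUs, 8 fan modules, 2 IOMs.
STEPS = [
    ReseatStep("PSU", 4, 10),
    ReseatStep("fan module", 8, 3),
    ReseatStep("IOM", 2, 20),
]

if __name__ == "__main__":
    for line in build_plan(STEPS):
        print(line)
    print(f"Total settle time: {total_wait_minutes(STEPS)} min")
```

With those counts, the settle time alone comes to 104 minutes (40 + 24 + 40), before any verification or downtime work.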

Please follow the link to download the 1.4(3q) bundle:

Related issues with the firmware version in use:

Is the I2C bus behaving incorrectly, or is the CMC software misinterpreting I2C transactions?

  1. Fans (up to 8) and PSUs (up to 4) can be reported as inoperable, and the state never clears.
  2. Fans run at 100% speed.
  3. UCSM cannot retrieve detailed part information for the PSUs and fans.
  4. Transient errors report fans as inoperable and clear within one minute.
  5. The LED state does not match the faults reported in UCSM or the actual health of the system.
  6. Thermal errors are incorrectly reported on blades and the chassis.

Fixes promised for 1.4(3q):

  1. CSCtl74710 I2C bus access improvements for 9541

PCA9541 (NXP I2C bus multiplexer) workaround to improve bus access for parts built before mid-2009. The workaround ensures that if the internal clock fails to start, the start is retried. The change is designed for, and works as expected with, both the PCA9541 and PCA9541A parts from NXP. Because of the internal clocking bug, PCA9541 parts had a high number of bus_lost events.

  2. CSCtn87821 Minor I2C driver fixes and instrumentation

The new Linux I2C driver has optimizations for synchronizing the I2C controller and the slave devices. With the older driver, a simple synchronization error could appear as an uncorrectable device error.

  3. CSCtl77244 Transient FAN inoperable transition

During a UCS (CMC) firmware upgrade, when switching to the new master/slave mode, the CMC erroneously takes information from the slave IOM and evaluates the fans as inoperable based on stale data.

  4. CSCtl43716 9541 device error. Fan Modules reported inoperable, running 100%

A bug in a software routine where a single bus_lost event followed by a successful retry results in an infinite loop. As a result, the fans are reported as inoperable and are not controlled by the CMC.

  5. ??

Removed an artificial cumulative threshold that turned the LED amber upon reaching 1000 bus_lost events. It was implemented as a monitoring mechanism to simplify identification of the PCA9541 devices, and it is no longer needed now that a proper software workaround is in place.
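Two of these fixes describe loop and counter behavior: CSCtl43716 (a successful retry after bus_lost leading to an infinite loop) and the removed 1000-event LED threshold. The toy Python model below is a sketch, not Cisco's CMC code. `MuxAccess`, `BusLost`, `make_flaky`, the three-attempt limit, and the "normal" LED state are all hypothetical. It shows the shape of a retry loop that returns as soon as an attempt succeeds and gives up after a fixed number of bus_lost events, so a single lost arbitration cannot hang the caller, plus a cumulative counter compared against the 1000-event threshold.

```python
from typing import Callable, List, Optional

BUS_LOST_LED_THRESHOLD = 1000  # cumulative count from the release-note text


class BusLost(Exception):
    """Raised by a hypothetical transfer function when bus arbitration is lost."""


class MuxAccess:
    """Toy model of reading a device behind the PCA9541 mux (not Cisco's CMC code)."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts  # assumed retry budget
        self.bus_lost_total = 0           # cumulative bus_lost events seen so far

    def read(self, transfer: Callable[[], bytes]) -> Optional[bytes]:
        """Retry after bus_lost, but always terminate.

        Returns the data as soon as one attempt succeeds, or None after
        max_attempts failures so the caller can raise a fault instead of hanging.
        """
        for _ in range(self.max_attempts):
            try:
                return transfer()  # success exits the loop immediately
            except BusLost:
                self.bus_lost_total += 1  # count the event, then retry
        return None

    def led_color(self) -> str:
        # Threshold behavior as described in the release notes. In this model every
        # bus_lost counts, even when a retry then succeeds.
        return "amber" if self.bus_lost_total >= BUS_LOST_LED_THRESHOLD else "normal"


def make_flaky(failures: int) -> Callable[[], bytes]:
    """Hypothetical transfer that loses the bus `failures` times, then succeeds."""
    remaining: List[int] = [failures]

    def transfer() -> bytes:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise BusLost()
        return b"\x10"  # placeholder sensor byte

    return transfer


if __name__ == "__main__":
    mux = MuxAccess()
    print(mux.read(make_flaky(1)))  # one bus_lost, then a successful retry
    print(mux.read(make_flaky(5)))  # exhausts the retry budget and returns None
    print(mux.bus_lost_total, mux.led_color())
```

A model that counts every bus_lost event, even ones recovered by a retry, can drive the indicator toward amber while the data path keeps working, which is one way an LED can drift away from the system's actual health.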

Since this email, we have started the update to firmware 2.0. That will be a separate blog post, because it too was pretty intense, and I will include some additional steps we took to lessen the impact. One thing is certain: don’t expect it to be impact-free….

***Disclaimer: The thoughts and views expressed on  and Chad King in no way reflect the views or thoughts of his employer or any other views of a company. These are his personal opinions which are formed on his own. Also, products improve over time and some things may be out of date. Please feel free to contact us and request an update and we will be happy to assist. Thanks!~


CISCO UCS – Benefits of VMXNET3 Driver

Well, I have been at it again, trying to learn enough about Cisco UCS to better understand what it can do. I already know there is a lot of potential and that we probably don’t use it to its full capacity.

The other day a colleague and I were talking about general slowness in cloud environments, and he mentioned that we could improve performance for all of our VMs by switching their virtual NICs from E1000 to VMXNET3. Now, I am fully aware of the benefits and features of VMXNET3, but I have to say I was very reluctant to buy into the idea that EVERY VM now gets a 10Gb link. At first, that terrified me. What if a VM all of a sudden decided to GO NUTS and completely saturate the link? That would impact other VMs, would it not? Yes, that could happen on a “RARE” occasion, but you obviously have to understand your design and how Cisco UCS works.

Now on to the other observations and misconceptions I had about VMXNET3. From what I have researched, most articles point to an increase in overall performance. Some reported that host-to-host communication improved even more than outbound traffic did. One blog post reported nearly a 300 percent increase, which is very impressive. So I can now confidently say that if you are using Cisco UCS, you should definitely consider the VMXNET3 driver. (NOTE: You cannot use VMware Fault Tolerance (FT) with VMXNET3.)

So how exactly does all this tie into Cisco UCS?
In short, it’s this link here.

“The revolutionary Cisco® UCS M81KR Virtual Interface Card (VIC) helps increase application performance and consolidation ratios, with 38 percent greater network throughput, complementing the latest increases in Cisco Unified Computing System CPU performance and memory capacity. The virtual interface card and the Cisco Unified Computing System together set a new standard for balanced performance and efficiency.”

Now, the VIC card seems pretty cool, but I found it a little disappointing that most companies will only really use something like this for a particular “use case.” It’s also curious that they don’t get into things like upstream traffic and how it would affect host-to-host communication. The other disappointing factor was that they tested this using RHEL, which I can understand, but it wasn’t really a real-world test; all they wanted to prove was that offloading network traffic to UCS gives you better performance. That said, I would still like to know what it is capable of. And they did show how much further traffic improved with both the interface card and VMXNET3.

Now down to the nitty-gritty:

1)      Limit on the total number of network interfaces for VMs

a)      A half-width blade can have only 1 VIC = 128 virtual interfaces

b)      A full-width blade can have a maximum of 2 VICs = 128-256 virtual interfaces

2)      They don’t really benchmark Windows, which matters in the scheme of things, considering MOST environments RUN Windows.

3)      They don’t really go into detail on how you would bind these NICs between UCS and the vSphere hypervisor, beyond allocating a MAC in UCS and then using VMDirectPath for the NIC. (This is probably simpler than I think.)

4)      They don’t cover host-to-host traffic, but they do cover chassis-to-chassis, and it is great to see that kind of performance. But come on, show us host-to-host!!!

5)      Scenario 3 isn’t very clear about which VM Ethernet interface was used. It says “Default enic,” so my guess is they couldn’t have used anything but VMXNET3; I’m not sure why it says that.

6)      Statistics for how CPU performance was affected per scenario

7)      Does this mean there is no need for Nexus 1000V switching, since you can use the “VIC” to set up your interfaces within UCS itself? (This would be my biggest reason: hand off to Net Eng = WIN!)

8)      Lastly, VMware vCloud Director uses templates, so could you creatively design this to work with an automated cloud solution? (I mean, heck, I would love the performance. The only thing I can think of is the VCO plug-in for UCS tied into the VCO/VCD plug-in, maybe? That’s why I say “USE-CASE.”)
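Item 1 comes down to simple multiplication. The minimal Python sketch below uses only the figures in the list (128 virtual interfaces per VIC, one VIC on the smaller blade, up to two on the larger one). The function names and the idea of summing a mix of blades are illustrative assumptions, and these are adapter maximums; the practical limit in a given deployment may be lower.

```python
from typing import Iterable, Tuple

# Figures from the list above; treated here as per-adapter maximums.
VIFS_PER_VIC = 128
MAX_VICS_PER_BLADE = {"half": 1, "full": 2}


def blade_vif_capacity(form_factor: str, vics_installed: int) -> int:
    """Maximum virtual interfaces for one blade with the given number of VICs."""
    max_vics = MAX_VICS_PER_BLADE.get(form_factor)
    if max_vics is None:
        raise ValueError(f"unknown blade form factor: {form_factor!r}")
    if not 1 <= vics_installed <= max_vics:
        raise ValueError(f"{form_factor} blade supports 1..{max_vics} VICs, got {vics_installed}")
    return vics_installed * VIFS_PER_VIC


def total_vif_capacity(blades: Iterable[Tuple[str, int]]) -> int:
    """Sum the per-blade maximums for (form_factor, vics_installed) pairs."""
    return sum(blade_vif_capacity(form, vics) for form, vics in blades)


if __name__ == "__main__":
    print(blade_vif_capacity("half", 1))
    print(blade_vif_capacity("full", 2))
    # Hypothetical mix: two smaller blades and one larger blade with two VICs.
    print(total_vif_capacity([("half", 1), ("half", 1), ("full", 2)]))
```

The range check rejects configurations the list rules out, such as a second VIC in the smaller blade, rather than silently returning a larger number.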

Obviously this is a lot of information, but I would honestly like to test this in my own environment and see how well it performs. Our cloud platform offers everything from WebLogic to Oracle, SQL, and more. Anyway, let me know your thoughts; any other information would be greatly appreciated! Yes, I know I am a noob 🙂.

