Cisco UCS Journey – When to update firmware
Don’t update if it’s not broken. If it breaks, then update it. If you are having issues with false alerts, you may want to update firmware. I saw this with 1.4j.
The issue is not with the IOM but with the chassis communication bus (the I2C bus), so the IOM is not detected and the backplane ports never come up. If you are seeing alerts related to PSUs and similar components, you may want to pay attention.
I2C is a bus that provides connectivity between different components in the chassis.
The PCA9541 is an I2C part that helps us control access to the shared devices in a chassis: the chassis serial EEPROMs, power supplies, and fan modules.
The 9541 I2C mux has known hardware/hang issues that can cause failures to access hardware components on the chassis. This can result in failures to read fan and PSU sensor data (such as fan speed and temp), triggering faults to be raised for the component (such as fan inoperable).
Some of the early PCA9541s that were used have a bug: if they are switched back and forth between IOM1 access and IOM2 access too quickly, they get stuck and do not allow any connection to the devices behind them.
A permanent fix requires upgrading the firmware to version 1.4(3q) or above.
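If you have a lot of domains to check, a small helper can tell you whether a given version string is at or above 1.4(3q). This is only a rough sketch: the function names are made up for illustration, and it assumes versions are written as major.minor(maintenance plus a single patch letter), like 1.4(3q), and that they sort in that order. It will reject shorthand such as "1.4j".

```python
import re

# Assumed format: major.minor(maintenance + one patch letter), e.g. "1.4(3q)".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z])\)$")


def parse_ucs_version(text):
    """Turn '1.4(3q)' into a sortable tuple such as (1, 4, 3, 'q')."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, maintenance, patch = match.groups()
    return (int(major), int(minor), int(maintenance), patch)


def has_i2c_fix(running, minimum="1.4(3q)"):
    """Return True if the running version is at or above the minimum."""
    return parse_ucs_version(running) >= parse_ucs_version(minimum)


if __name__ == "__main__":
    for version in ("1.4(3m)", "1.4(3q)", "2.0(1a)"):
        print(version, has_i2c_fix(version))
```

Tuples compare element by element, so the major number wins first, then minor, then maintenance, then the patch letter.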
Workaround to follow before going for the firmware upgrade:
• Reseat all the PSUs in the chassis one by one. Wait 10 minutes after inserting each unit so that it can stabilize.
• Reseat all the fan units on the back of the chassis. Wait 3 minutes before going on to the next one.
• Reseat both IO modules. Wait 20 minutes before going on to the next one.
• Verify the I2C counters for the chassis.
• (Requires downtime) Power cycle to reset all counters and clear the issues in the running version.
• (Requires downtime) Upgrade to firmware version 1.4(3q) or above (2.0 release) for a permanent fix.
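The reseat steps add up to a longer maintenance window than you might expect, so it helps to lay them out on a timeline first. The sketch below is only a planning aid. The wait times come from the steps above, but the component counts (4 PSUs, 8 fan modules, 2 IO modules) are my assumption for a fully populated chassis, and the function name is hypothetical.

```python
# Wait times (minutes) come from the workaround steps; the component counts
# are assumptions for a fully populated chassis and should be adjusted.
RESEAT_PLAN = [
    ("PSU", 4, 10),
    ("fan module", 8, 3),
    ("IO module", 2, 20),
]


def build_schedule(plan=RESEAT_PLAN):
    """Return (schedule, total_minutes) for reseating one unit at a time.

    schedule is a list of (start_minute, action) pairs.
    """
    schedule = []
    clock = 0
    for name, count, wait_minutes in plan:
        for unit in range(1, count + 1):
            action = f"reseat {name} {unit}, then wait {wait_minutes} min"
            schedule.append((clock, action))
            clock += wait_minutes
    return schedule, clock


schedule, total_minutes = build_schedule()
for start, action in schedule:
    print(f"t+{start:3d} min: {action}")
print(f"total: {total_minutes} min")
```

With the assumed counts this comes to 104 minutes of reseating and waiting, before the counter check and any power cycle or upgrade.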
Please follow the link to download the 1.4(3q) bundle:
Related issues with the firmware version in use:
Incorrect behavior of I2C bus or CMC software interpreting I2C transactions?
- Fans (8 or fewer) and PSUs (4 or fewer) can be reported as inoperable. The state never clears.
- Fans are running at 100% rotation rate.
- UCSM cannot retrieve the detailed PSU/fan part information.
- Transient errors indicating a fan is inoperable, cleared within a one-minute interval.
- LED state does not match the faults reported in UCSM or the actual health of the system.
- Incorrectly reported thermal errors on blades and chassis.
Fixes that are promised for 1.4(3q):
- CSCtl74710 I2C bus access improvements for 9541
PCA9541 (NXP I2C bus multiplexer) workaround to improve bus access for parts built prior to mid-2009. The workaround ensures that if the internal clock fails to start, the start is retried. The change is designed for, and works as expected on, both PCA9541 and PCA9541A parts from NXP. Because of the internal clocking bug, PCA9541 parts had a high number of bus_lost events.
- CSCtn87821 Minor I2C driver fixes and instrumentation
The new Linux I2C driver has optimizations for handling synchronization between the I2C controller and slave devices. With the older driver, a simple synchronization error could appear as an uncorrectable device error.
- CSCtl77244 Transient FAN inoperable transition
During a UCS (CMC) firmware upgrade and the switch to the new master/slave mode, the CMC erroneously takes information from the slave IOM and evaluates fans as inoperable based on stale data.
- CSCtl43716 9541 device error. Fan Modules reported inoperable, running 100%
A bug in a software routine where a single bus_lost event followed by a successful retry results in an infinite loop. As a result, fans are reported as inoperable and are not controlled by the CMC.
Removed an artificial cumulative threshold that turned on the amber LED upon reaching 1000 bus_lost events. This was implemented as a monitoring mechanism to simplify identification of the PCA9541 devices. It is no longer needed since a proper software workaround is implemented.
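To make that last bug easier to picture, here is a toy sketch of a read-with-retry loop that cannot get stuck after one lost event followed by a good retry. This is not the CMC code, which I have never seen. Every name here (BusLostError, read_with_retry, flaky_reader) is hypothetical, and the retry limit of 3 is an arbitrary assumption.

```python
class BusLostError(Exception):
    """Hypothetical error for a transaction that lost the bus."""


def read_with_retry(read_sensor, max_attempts=3):
    """Call read_sensor until it succeeds or the attempts run out.

    Returns (value, bus_lost_count). The loop returns as soon as a read
    succeeds, so one lost event followed by a good retry cannot spin forever.
    """
    bus_lost = 0
    for _ in range(max_attempts):
        try:
            return read_sensor(), bus_lost
        except BusLostError:
            bus_lost += 1
    raise BusLostError(f"gave up after {max_attempts} attempts")


def flaky_reader(failures, value):
    """Build a fake sensor that fails `failures` times, then returns `value`."""
    remaining = [failures]

    def read():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise BusLostError("bus_lost")
        return value

    return read


# One bus_lost event, then a successful retry.
print(read_with_retry(flaky_reader(1, 4200)))  # prints (4200, 1)
```

The two things to notice are the bounded attempt count and the immediate return on success. A fan that really is unreachable still ends in an error, but a single hiccup does not.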
Since this email, we have started the update to firmware 2.0. That will be a separate blog post, because it too was pretty intense. I will provide some additional steps that we performed to lessen the impact. One thing is for certain: don’t expect it to NOT be impacting.
***Disclaimer: The thoughts and views expressed on VirtualNoob.wordpress.com and by Chad King in no way reflect the views or thoughts of his employer or any other company. These are his personal opinions, which are formed on his own. Also, products improve over time and some things may be out of date. Please feel free to contact us and request an update and we will be happy to assist. Thanks!