Pushing Virtual Limits


The upgrade
April 3, 2008, 2:11 am
Filed under: Documentation, VM

/rant

26 hosts all told, and only five took the new VC agent on the first try. 5 out of 26 is NOT a successful upgrade. Manual intervention should NOT be required on this scale for an enterprise product on a MINOR POINT UPGRADE!

/rant

Feedback on what went wrong and what went well.

Using the advanced install method to perform the upgrade without any physical media worked very well, both on the VC and on the ESX hosts we upgraded.

SAN storage continued to work flawlessly; no issues there.

The new client was exceptionally easy for users to download and install (although did they really need to tweak the login screen?).

Things that went wrong and what fixed them:

We got the host agent installed through a variety of methods:

For the boxes going to ESX 3.5, we simply performed the upgrade and then reconnected them. That worked well, although we had hoped to get the VC to 2.5, update the agents, validate everything and THEN perform the upgrade. I am a firm believer in crossing one bridge at a time. In this case we took a risk and it paid off.

Restarting the mgmt agent did not help. Rebooting the host DID help in one case. Retrying the agent install about a dozen times worked on two machines. We tried a number of things in parallel, and because we could not reboot 10+ hosts or shut down any of their VMs, our options were sorely limited.

We only had to manually install the agent on two or three boxes. Even that would have been easier, except that the agent install was build-specific and the hosts had very different sets of patches on them. We could have prepared better by having the correct manual VC agent staged on each host, ready to go. Next time I do this I will definitely do that.

This goes to show that no matter how much testing you do in a dev environment, it is nearly impossible to simulate your production environment accurately.

We had a three-hour outage window that we exceeded by only about 45 minutes. One oddity we didn't notice until the next morning was that one host had its networking configuration adjusted. I am not convinced that was caused by the upgrade, but before the upgrade this host's networking worked perfectly, and now I have a plethora of problems with its networking configuration.
