Pushing Virtual Limits


Mea culpa – An excuse for my absence
September 21, 2008, 3:48 pm
Filed under: vmware

So I am sure most of you have seen or looked at – or maybe played with – EMC Documentum CenterStage?

I have been spending a LOT of time with it. Hopefully I will be able to go live with some of the neat things we have made happen even with the beta. When this client gets released to the Documentum public I can’t wait to hear the reaction.

Beyond that, I have added another four servers to my ESX deployment and started working with some fresh VM stuff to see how it is going to impact how we work.

Favorite new feature in VC 2.5u2 – hot cloning. We implemented the update and our users found the feature before we even told them about it! Now that is reading the market correctly.

Beyond the normal craziness, I officially start a new role within EMC on Monday. The upside – I still get to keep my virtual infrastructure. There should be lots of great new stuff, and I will be working with a group that handles ALL of EMC software. This should give me the opportunity to talk with some really smart folks about how they would tackle some of the niggling VMware issues I have found.

My plan for Monday is to get some time with our #1 NetWorker engineer and see if he can enlighten me on all the things I have been missing in my very limited NW deployment (1 VM out of 555+).

VKernel has pushed out a number of great updates to their Capacity Analyzer 2 beta – I am getting some real value add from having all my VirtualCenters on one screen.

On a side note, I hope everyone read about VMworld and ESX 4 – all I can say is I am excited by what I saw publicly discussed. PowerPath in ESX – drool.



Managing a Growing VMware Deployment in a Software Development and Testing Environment
July 20, 2008, 3:29 am
Filed under: vmware

Big title, Big problem.

I think anyone here gets the basics and probably has a VCP or some other certification to prove they know what they are doing. On a technical level I have my fair share of challenges right now (a SQL 2000 to 2005 upgrade of the VC, an ESX host that has decided two of its NICs are dead, and a variety of client issues), but those are pretty straightforward. VMware support has been challenging lately (see my earlier post on how they told me SQL 2000 was no longer a supported DB back end with VC 2.5 and that we HAD to upgrade), but with the forums and other people out there posting I don’t think anyone ever hits a truly “unique” technical problem.

What we all hit that is unique is our management structure, our IT structure, and the ever-changing requirements of the security teams. These non-technical obstacles have always proven to be the limiting factor in my deployment, and I doubt I am alone.

Let me set the stage a bit for the discussion that will follow. Right now our environment is working on at least three major new products and providing sustaining and support for at least twelve others. Our average machine profile: 1 CPU, less than 2 GB of RAM (1 GB average), under 60 GB of total hard disk space. The problem is we add or remove a dozen or more VMs a day and have 100+ users with Virtual Machine Administrator rights.

The tricky problem comes in when you take a look at 26 hosts spread across more than five business units. Now that we are fully utilizing DRS, the “I bought this host and therefore it is all mine” mentality becomes challenging. If one team has excess capacity, shouldn’t they be part of the solution rather than hoarding an easily reclaimed resource?

At EMC we have this concept of “One|EMC” to try to bring all the acquisitions together. There are good things and bad things about this policy, but I think this is an opportunity to do some real good. To that end, my management team has been very supportive of “lending” our excess capacity to other teams.

My BU owns the hardware, licensed the software, and pays for all upgrades and maintenance. There are a ton of costs associated with this effort, and we have no intention of “charging” other teams for utilizing idle assets (which is exactly what VMware excels at). What I do need to do is provide “cost visibility” to my management and the business units we work with. To do this we have purchased and are implementing VKernel’s “Chargeback Appliance.” The plan is to provide scheduled reporting at the following levels:

Deployment Total

Business Unit

Project teams within each BU

(Other reports as necessary)

The great thing is that these reports will be ready at any time, and I can give a login right to my management structure so they don’t have to ask me to generate reports for them. We will also go one step further to show just how much we save by buying big iron – we will compare the cost of a VM against the cost of an equivalent physical system. VKernel has provided a great baseline for costing out the big numbers as well as all those little things that I just assume will be there (like electricity). Metrics matter, and here they matter more than at most places. We know we have had a great thing going for the past few years, but now I finally have the tools to collect the metrics that show the big guys exactly how much money we are saving.
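For anyone who wants to picture what those roll-up levels look like, here is a minimal sketch of the math behind them in Python. The rates and the VM records are invented for illustration only; the real numbers (and the scheduling, logins, and pretty output) come from the Chargeback Appliance itself.

    # Toy chargeback roll-up, illustration only. Rates and inventory are made up;
    # the actual reports come out of VKernel's Chargeback Appliance.
    RATES = {"vcpu": 25.00, "gb_ram": 10.00, "gb_disk": 0.50}  # hypothetical $/month

    # (business unit, project, vCPUs, GB RAM, GB disk) -- roughly our average profile
    vms = [
        ("BU-A", "Product1", 1, 1, 60),
        ("BU-A", "Product2", 1, 2, 40),
        ("BU-B", "Sustaining", 1, 1, 60),
    ]

    def monthly_cost(vcpu, ram_gb, disk_gb):
        return (vcpu * RATES["vcpu"]
                + ram_gb * RATES["gb_ram"]
                + disk_gb * RATES["gb_disk"])

    deployment_total = 0.0
    by_bu, by_project = {}, {}
    for bu, project, vcpu, ram, disk in vms:
        cost = monthly_cost(vcpu, ram, disk)
        deployment_total += cost
        by_bu[bu] = by_bu.get(bu, 0.0) + cost
        by_project[(bu, project)] = by_project.get((bu, project), 0.0) + cost

    # Deployment total, then business unit, then project teams within each BU
    print("Deployment total: $%.2f" % deployment_total)
    for bu in sorted(by_bu):
        print("  %s: $%.2f" % (bu, by_bu[bu]))
        for (b, project), cost in sorted(by_project.items()):
            if b == bu:
                print("    %s: $%.2f" % (project, cost))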



VMware Management Suite Showdown – vCharterPro vs V-Kernel Capacity Bottleneck Analyzer

This review is long overdue. A quick trip around the country to meet with customers and the setup of a new Virtual Infrastructure Lab have put me behind schedule. Mea culpa.

First, I want to comment on the support I received from both Vizioncore and V-Kernel. Glen P from Vizioncore kept them in the running a lot longer than anyone else could have. Kudos to him and that team for trying to work through all the “glitches” we ran into. The whole team at V-Kernel was also very helpful and successful in diagnosing and resolving the defects I hit.

That being said, I hit major defects with both of these products. Both teams released patches or provided workarounds in short order, but V-Kernel’s ability to quickly adapt and address new problems was definitely a mark in their favor.

In the end we never got vCharterPro working 100% due to some data collection issues. After two months of working with support and their team, I had to compare their “sort of working” product against one from V-Kernel that was now doing everything it promised to do.

When comparing features I found that my deployment is not par for the course. Anyone who has read my previous posts knows that my mantra is “Hosts Don’t Matter!” All my VMs live in the Virtual Machines and Templates view in a heavily nested folder structure. Running reports against business units is a breeze if the application is aware of this folder structure.

vCharterPro had no awareness of the folder structure. It assumed a host- or cluster-based reporting model (kind of silly when you have a huge cluster running DRS and two or three departments sharing it).

V-Kernel has native folder-level awareness. It let me easily create groups for analysis based on the nested folder structure or via the traditional host/cluster method. This flexible group creation was ultimately the winning feature. Having lots of data in bad groups is useless, but if we can get the data into meaningful reports or views it adds immediate value.
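To show why that mattered so much to me, here is a rough Python sketch of the two grouping models. The folder paths and host names are invented, but the point stands: with DRS moving things around, a host-based roll-up scatters a business unit’s VMs, while a folder-based roll-up keeps them together.

    # Invented inventory: each VM carries a nested folder path and its current host.
    vms = [
        {"name": "vm01", "folder": "/BU-A/Product1/Dev",  "host": "esx03"},
        {"name": "vm02", "folder": "/BU-A/Product1/Test", "host": "esx07"},
        {"name": "vm03", "folder": "/BU-B/Sustaining",    "host": "esx03"},
    ]

    def group_by_host(inventory):
        groups = {}
        for vm in inventory:
            groups.setdefault(vm["host"], []).append(vm["name"])
        return groups

    def group_by_business_unit(inventory):
        # Business unit = top level of the nested folder structure.
        groups = {}
        for vm in inventory:
            bu = vm["folder"].strip("/").split("/")[0]
            groups.setdefault(bu, []).append(vm["name"])
        return groups

    print(group_by_host(vms))           # DRS scatters BU-A across esx03 and esx07
    print(group_by_business_unit(vms))  # folder-based groups stay meaningful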

Where vCharterPro clearly excelled was in the looks department. Both products bring back roughly the same data (though vCharterPro has a fixation on disk I/O versus actual space used), but vCharterPro presents it in a very pleasing fashion. Utilizing their parent company’s framework, they provide a seriously customizable interface that lets you tweak the dashboard view to be exactly what you want.

V-Kernel CBA has a clean look, but it is nothing to get excited about.

As I am sure is now apparent, I went with V-Kernel’s product suite. We chose it because it had intelligent grouping that worked with my environment as-is rather than requiring me to reorganize everything from scratch. It collected all the data I needed accurately and efficiently (vCharterPro wanted a 4-CPU box with 2 GB+ of RAM, versus CBA which runs on 1 CPU with 1 GB of RAM). I really appreciate that V-Kernel is going with a ready-made appliance that is easy to deploy and just as easy to upgrade.

In the near future I will start sharing some of the reports that I am running and the value add I get out of them. I am very interested to see what other people see from this data. We will also be implementing V-Kernel’s ChargeBack product within the next few weeks (pending the next release). At that point I will share some pictures of that.

Also – Check out Rob’s blog over at the V-Kernel main site to get an interesting take on a variety of challenges facing the virtualization industry. I promise it is worth at least a quick perusal.



VMware on a CX3-40
June 11, 2008, 5:31 pm
Filed under: Documentation, vmware

I have the good fortune to run VMware on a single CX3-40. Right now I have approximately 30 TB of usable disk space. Lots of space is great, but with the frequent snapshot usage and the constant resizing of disks in a development/testing/replication lab, I chose to go with smaller LUNs.

How small?

400 GB per LUN. 20 TB of allocated storage / 400 GB per LUN = ~50 LUNs!

I am going to continue with the 400 GB LUNs even as I expand out to two additional CX boxes (probably CX4-40c’s) and add another 20 TB of storage in two more locations. My concern is that my naming convention is suboptimal.

SAN_1 through SAN_30, versus my local storage naming convention of LOCAL_(first letter of machine name)_(volume number), e.g. LOCAL_Q_1.

I think I will begin naming the child locations SAN_L_1, SAN_T_1, etc. Using the letter of the site in the name of the LUN keeps the friendly names presented in the Datastores view clear. There is nothing I hate more than looking at someone’s infrastructure and finding local(1) through local(26). The Datastores view has serious value if you utilize it correctly.
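Since names are cheap to generate now and painful to rename later, here is a small Python sketch of the convention as I intend to apply it. The site letters, counts, and machine name are only examples.

    # Sketch of the datastore naming plan. Site letters and counts are examples only.
    LUN_SIZE_GB = 400

    def luns_needed(total_tb, lun_size_gb=LUN_SIZE_GB):
        # Back-of-the-envelope math from above: 20 TB / 400 GB LUNs = ~50 LUNs
        return int(total_tb * 1000 // lun_size_gb)

    def san_names(count, site_letter=None):
        # Primary site keeps SAN_1..SAN_n; child sites get SAN_<site letter>_<n>.
        if site_letter is None:
            return ["SAN_%d" % i for i in range(1, count + 1)]
        return ["SAN_%s_%d" % (site_letter, i) for i in range(1, count + 1)]

    def local_name(machine_name, volume):
        # Local storage: LOCAL_<first letter of machine name>_<volume>, e.g. LOCAL_Q_1
        return "LOCAL_%s_%d" % (machine_name[0].upper(), volume)

    print(luns_needed(20))          # 50
    print(san_names(3))             # ['SAN_1', 'SAN_2', 'SAN_3']
    print(san_names(2, "L"))        # ['SAN_L_1', 'SAN_L_2']
    print(local_name("quasar", 1))  # 'LOCAL_Q_1'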

This post over at VMEtc begins to detail one alternative plan to my design.



Case in point: why clustering (*) is not the best way to deploy eRoom (especially when it is virtualized)
June 9, 2008, 6:45 pm
Filed under: Documentation, vmware

Let’s assume the following default configuration as our existing infrastructure –

3 MSFT 2k3 clusters plus SQL back-end clusters. For this discussion the SQL back end just needs to be available; its scaling is a whole different discussion.

This gives us SIX multi-CPU boxes (two 2-CPU and four 4-CPU) where only half of them are ever doing anything. Compound that with the fact that these boxes have huge amounts of RAM (in excess of 10 GB), yet the utility of having that much RAM has been called into question. Microsoft lists the limits here.

If we assume that MSFT Cluster Services are perfect (*cough*) and simple to configure, we still have to deal with the limiting factors of eRoom. eRoom is configured to fail over as a result of ONE condition – Deadlock Detection. If someone unplugs the other box the cluster should still fail over, but eRoom itself will only ever instigate the failover when it sees that one specific problem.

If IIS dies, will eRoom fail over? No.

If erScheduler dies, will we fail over? No.

etc etc etc

When we go to an eRoom advanced configuration using multiple web servers, we take those 3 passive nodes and make them active. This gives us 6 active web servers to share the load. Using the default provisioning, that means that if one of those nodes catches on fire we should expect only about 1/6 (~17%) of services to be interrupted, compared with 1/3 (~33%) in the cluster configuration. In theory the cluster should fail over and minimize the outage without intervention – and that works pretty well.

BUT

If you are willing to take a slightly more manual approach, you can reprovision the Facilities hosted on that server to any of the other servers on the fly. Nothing is stored on the application server, so losing one merely takes that server down without directly impacting the other systems. Reprovisioning can be done without any downtime or impact to the other Facilities.
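As an aside, the outage percentages above are just even shares of the active node count (assuming Facilities are provisioned evenly); a two-line sanity check in Python:

    # Share of Facilities interrupted when one node is lost, assuming an even spread.
    def impact_of_losing_one_node(active_nodes):
        return 1.0 / active_nodes

    print("3 active cluster nodes: %.1f%%" % (100 * impact_of_losing_one_node(3)))  # 33.3%
    print("6 active web servers:   %.1f%%" % (100 * impact_of_losing_one_node(6)))  # 16.7%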

Now to explain the asterisk. If you are going to run eRoom in a cluster, here is how you should do it.

Take our original six boxes and install ESX on one of them. On that host, create the B nodes for the 3 primary clusters as VMs. Turn off the spare box to save the electricity, and run your active/passive configuration with physical hardware for the active nodes while the passive nodes idle as VMs. Physical-to-virtual clusters are the only reasonable way to do an active/passive cluster configuration.

This type of cluster isn’t any “better” than a physical/physical model, but it is cheaper to set up and maintain in the long run.



Running Documentum eRoom 7 in VMware – Notes and tricks
June 9, 2008, 1:41 pm
Filed under: Documentation, vmware

I frequently get asked how to deploy eRoom in a VMware infrastructure. Some people don’t even know that we fully support VMware (I blame this on the fact that VMware changed its logo to remove the “an EMC Company” subscript).

eRoom is a fantastic application to run in a VM. We fully support VMware ESX 2.5-3.5 (2.5 support may fall off the chart soon – see EMC Support Note ESG25111 for the most current supportability matrix).

The problem I have with this basic eRoom v7.4 design and implementation is that so much depends on load and use cases. Let’s assume 1,000 licensed users and 2 TB of data. Here is how I would design and deploy this environment:

5 total VMs

2 – Application VMs: 1 vCPU, 2 GB RAM, 20 GB HDD – no reservations on RAM or CPU – Server 2003 SP2

1 – Index server/file server: 4 vCPUs, 2 GB RAM, 2 HDDs (OS on one, file share/indexing data on the second). There are a few reasons for the multiple vCPU count. It should always be N+1, where N is the number of application servers. The primary reason for multiple vCPUs is a VMware defect in their hardware acceleration feature which can cause a problem with the indexing engine.

1 – IRM server – Same as the application server above

1 – SQL 2005 DB VM (build per MSFT spec)
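As a rough sanity check on the footprint of this plan, here is the tally in Python. The numbers come straight from the list above; anything the plan intentionally leaves open (the SQL build and the index/file-server disks) is marked as unknown rather than guessed.

    # Tally of the 5-VM eRoom plan. None marks sizes the plan leaves open.
    plan = [
        # (role, count, vCPUs each, GB RAM each, OS disk GB each)
        ("Application VM",    2, 1,    2,    20),
        ("Index/file server", 1, 4,    2,    None),  # OS and data HDDs sized separately
        ("IRM server",        1, 1,    2,    20),    # same profile as the application VMs
        ("SQL 2005 DB VM",    1, None, None, None),  # build per MSFT spec
    ]

    vcpus = sum(count * c for _, count, c, _, _ in plan if c is not None)
    ram_gb = sum(count * r for _, count, _, r, _ in plan if r is not None)
    print("Known footprint (excluding the SQL VM): %d vCPUs, %d GB RAM" % (vcpus, ram_gb))
    # -> 7 vCPUs, 8 GB RAM before the database server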

This system plan allows the greatest flexibility through the eRoom Advanced feature set. I will write about provisioning at a later date, but it remains one of the most impressive features within eRoom.

Given a choice between eRoom in a cluster and multiple eRoom servers, there should be no question: use multiple eRoom servers rather than a cluster. That deserves a post all to itself, but the short answer is that eRoom in a cluster provides an active/passive configuration, whereas eRoom Advanced with multiple web servers provides an active/active/active/etc. configuration that allows you to truly scale your installation.



Flashback – ESX 2.5 installs this weekend
May 24, 2008, 11:49 pm
Filed under: Humor, vmware

A large service provider who will not be named is still running ESX 2.5. In order to test and vet the solution we are providing them, I was asked to build up a few servers to run it through.

This was of course a comedy of errors, as the systems were cast-offs or EOL boxes ready to be put out to pasture. The first box had a CPU die on me when I powered it up for the first time after two months of no use. After that it was straightforward, if a serious throwback, and that one box was soon online with a swap file activated and a vSwitch created (I kind of miss the MUI).

The next two boxes had a series of non-compliant parts. Unsupported SCSI cards make me cry, because the installer isn’t smart enough to just stop when it says “Sorry, this is unsupported.” Instead it keeps chugging through loading drivers until it hits one that just won’t finish and waits for reality to reach out and smack you on the head. Luckily a quick power cycle always fixes that problem, and because it doesn’t make it far enough into the install to do any damage, the box comes right back up as it was before.

So I am left with one single-CPU PowerEdge and one repurposed Precision workstation (with a single NIC, no less) to provide to engineering. One can only hope they can work miracles with this class of equipment.

The biggest kicker – ESX 2.5 keys are no longer available from VMware. I emailed their sales alias per the notice on the license page, and they wrote back and said, “Sorry, we don’t give those out anymore.” I have another few emails in their inboxes requesting a different answer. I can only hope they will come through with the keys.

If/when I get the keys from those very kind folks at VMware, I will hopefully be done with my flashback to two years ago.

The guys at VMware are top notch, though, because they sent me a new T-shirt (the core customer one). I went out to their apparel site and saw this – Worst T-shirt ever?

Now if I can just convince someone to send me one of the polos from EMCWorld 2008, I will be a happy kid. All it takes to keep me happy is a free t-shirt. ( :