I found myself asking the other day how my cluster could become over-committed in SCVMM 2008 R2 with the creation of just one more virtual machine. With a four-node cluster, 53GB of memory in each node, and only a subset of that memory allocated to running virtual machines, I was sure I should be able to keep creating VMs. My first mistake was looking only at the memory allocated to virtual machines that were running and leaving out the ones in an OFF state. If the question were whether a virtual machine can start given the memory currently free, only the running virtual machines would matter, but for over-commitment all memory allocation has to be taken into account, running or not.
I then hit the internet and TechNet as usual for answers and troubleshooting steps. What I was able to determine is that cluster over-commitment is calculated based on slot size. For the sake of simplicity, let's take a 2-node cluster with a few virtual machines created. Below is our setup; all VMs are configured as highly available.
Cluster Host 1 – 24GB of RAM configured
Cluster Host 2 – 24GB of RAM configured
VM1 – Running with 6GB of RAM allocated
VM2 – Running with 4GB of RAM allocated
VM3 – Off with 4GB of RAM allocated
VM4 – Running with 2GB of RAM allocated
VM5 – Running with 2GB of RAM allocated
Given that VM1 is the HA VM with the largest configured memory at 6GB, our slot size for the entire cluster is 6GB. This also tells us that each host has a total slot count of 4: 24 (usable RAM on host) / 6 (slot size) = 4 slots.
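The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation as described in this post, not SCVMM's actual code; the dictionary and function names are made up for the example, and the numbers are the ones from the setup above.

```python
# Hypothetical inventory mirroring the example setup; memory in GB.
host_ram_gb = {"Cluster Host 1": 24, "Cluster Host 2": 24}
ha_vm_ram_gb = {"VM1": 6, "VM2": 4, "VM3": 4, "VM4": 2, "VM5": 2}


def slot_size(vm_ram):
    """Slot size is the memory of the largest HA VM, running or not."""
    return max(vm_ram.values())


def slots_per_host(host_ram, size):
    """Whole slots that fit on each host (integer division drops any remainder)."""
    return {host: ram // size for host, ram in host_ram.items()}


size = slot_size(ha_vm_ram_gb)
slots = slots_per_host(host_ram_gb, size)
print(size)   # 6
print(slots)  # {'Cluster Host 1': 4, 'Cluster Host 2': 4}
```

Note that VM3 is included even though it is off, since its allocation still counts.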
This means that at any given time, should we lose a single host, we have only four slots available to fill to keep our virtual guests online. Virtual machines 1-5 in the example use a total of 3 slots, as shown below:
VM1 has 6GB of RAM allocated = 1 6GB slot used
VM2 has 4GB of RAM allocated & VM4 has 2GB of RAM allocated = 1 6GB slot used
VM3 has 4GB of RAM allocated & VM5 has 2GB of RAM allocated = 1 6GB slot used
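The grouping above can be reproduced with a simple packing routine. The sketch below uses first-fit decreasing, which is my own choice for illustration; I have not confirmed how SCVMM groups VMs into slots internally, and the function name is hypothetical.

```python
def pack_into_slots(vm_ram_gb, slot_size_gb):
    """Group VMs into slots using first-fit decreasing (an illustrative choice)."""
    slots = []  # each slot is a list of VM names
    free = []   # remaining GB in each slot, same order as slots
    # Place the largest VMs first so the smaller ones can fill the gaps.
    for name, ram in sorted(vm_ram_gb.items(), key=lambda kv: -kv[1]):
        for i, room in enumerate(free):
            if ram <= room:
                slots[i].append(name)
                free[i] -= ram
                break
        else:
            # No existing slot has room, so open a new one.
            slots.append([name])
            free.append(slot_size_gb - ram)
    return slots


ha_vm_ram_gb = {"VM1": 6, "VM2": 4, "VM3": 4, "VM4": 2, "VM5": 2}
used_slots = pack_into_slots(ha_vm_ram_gb, slot_size_gb=6)
print(len(used_slots))  # 3
print(used_slots)       # [['VM1'], ['VM2', 'VM4'], ['VM3', 'VM5']]
```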
As stated, we have 1 slot left open that could be filled should we lose a single host, meaning we can create more virtual guests without being over-committed. The new guests could be a single 6GB VM, three 2GB VMs, a 4GB and a 2GB VM, or any combination totaling 6GB of highly available RAM that must be accounted for.
Suppose we create a new 6GB VM and add it to the HA cluster. All is well, but all four slots are now filled. The next virtual machine, whether it is a single 1GB VM or two 4GB VMs, will result in an error stating that our cluster is over-committed, and creation will be prevented.
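Putting the pieces together, here is a rough sketch of the over-commit check as I understand it from the example. It assumes the cluster must survive the loss of one host (the one with the most slots) and again uses first-fit decreasing to count slots; both are assumptions for illustration, not SCVMM's documented internals, and the function names are hypothetical.

```python
def slots_needed(vm_ram_gb, slot_size_gb):
    """Count slots used when packing VMs first-fit, largest first (assumed heuristic)."""
    free = []  # remaining GB in each open slot
    for ram in sorted(vm_ram_gb, reverse=True):
        for i, room in enumerate(free):
            if ram <= room:
                free[i] -= ram
                break
        else:
            free.append(slot_size_gb - ram)
    return len(free)


def is_overcommitted(host_ram_gb, vm_ram_gb):
    """True if the HA VMs need more slots than remain after one host fails."""
    size = max(vm_ram_gb)                       # slot size = largest HA VM
    per_host = [ram // size for ram in host_ram_gb]
    surviving = sum(per_host) - max(per_host)   # worst case: biggest host is lost
    return slots_needed(vm_ram_gb, size) > surviving


hosts = [24, 24]
vms = [6, 4, 4, 2, 2]                             # VM1-VM5, including the one that is off
print(is_overcommitted(hosts, vms))               # False: 3 of 4 slots used
print(is_overcommitted(hosts, vms + [6]))         # False: 4 of 4 slots used
print(is_overcommitted(hosts, vms + [6, 1]))      # True: the 1GB VM needs a 5th slot
print(is_overcommitted(hosts, vms + [6, 4, 4]))   # True: two 4GB VMs need 2 more slots
```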