Hyper-V

Checkpoints (snapshots) are a great feature for saving the state of a virtual machine before making changes such as applying patches or updates, or installing or configuring applications. If the changes break the system, we can easily go back to the previous state. To allow this, Hyper-V (put simply) creates another virtual disk file to store all the changes. This file grows over time, quickly or slowly depending on how much data changes on the VM. If the location where the checkpoints are stored runs out of free space, Hyper-V pauses the VM. This affects not only the VM with the growing checkpoint but potentially other VMs stored in the same location (storage volume/LUN). Hyper-V does this so that the VMs don’t start failing disk writes inside the guest OS and possibly corrupting files. To recover, we have to delete the checkpoints that are no longer needed. For this reason we have to be very careful when creating checkpoints and remember to delete them once they are no longer needed.

To avoid this problem and keep control of the checkpoints, I developed a PowerShell script that checks all the checkpoints for all the VMs and deletes those that are older than X days. The attached script currently looks for checkpoints that are 3 days old or older, but this can easily be changed. There can also be occasions where we want some checkpoints to stay, e.g. if a maintenance is taking longer and we want to keep those checkpoints for more than 3 days. To support this, the script checks a file called “CheckPointsException.txt”, which lists the names of all the VMs that are allowed to keep their checkpoints longer than 3 days. The script will not delete checkpoints for any VM listed in this file. Once the maintenance is done, we can remove the VM name from the file, and the next time the script runs it will delete the checkpoints for that VM. The script can be added to the Windows Task Scheduler to run every day, or on whatever schedule is needed. It also creates an HTML report with a list of VMs whose checkpoints are older than 3 days and their status (exceptions etc.), and emails this report to the list of recipients defined in the script. The report thus acts as a reminder of the checkpoints that are on the exceptions list, so they are not forgotten.
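The selection logic described above can be sketched roughly as follows. This is not the attached script, only a minimal illustration: the function name Get-CheckpointAction, its parameters, the host name 'HV01' and the file path are all hypothetical. The real cmdlets it relies on (Get-VM, Get-VMSnapshot, Remove-VMSnapshot from the Hyper-V module) appear only in the commented-out wiring at the bottom.

```powershell
# Hypothetical helper: decide what to do with each checkpoint.
# Input objects only need VMName, Name and CreationTime properties,
# which the objects returned by Get-VMSnapshot provide.
function Get-CheckpointAction {
    param(
        [object[]] $Checkpoint = @(),
        [string[]] $ExceptionVM = @(),   # VM names read from CheckPointsException.txt
        [int]      $MaxAgeDays = 3,
        [datetime] $Now = (Get-Date)
    )
    $cutoff = $Now.AddDays(-$MaxAgeDays)
    foreach ($cp in $Checkpoint) {
        # Checkpoints newer than the limit are left alone and not reported.
        if ($cp.CreationTime -gt $cutoff) { continue }

        # -contains is case-insensitive, so 'vm01' in the file matches 'VM01'.
        $action = if ($ExceptionVM -contains $cp.VMName) { 'Keep (exception)' } else { 'Delete' }

        [pscustomobject]@{
            VMName       = $cp.VMName
            Checkpoint   = $cp.Name
            CreationTime = $cp.CreationTime
            Action       = $action
            Snapshot     = $cp          # original object, for piping to Remove-VMSnapshot
        }
    }
}

# Example wiring (commented out: it needs the Hyper-V module and a real host).
# 'HV01' and the path are placeholders, not values from the attached script.
# $exceptions  = Get-Content 'C:\Scripts\CheckPointsException.txt'
# $checkpoints = Get-VM -ComputerName 'HV01' | Get-VMSnapshot
# $report      = Get-CheckpointAction -Checkpoint $checkpoints -ExceptionVM $exceptions
# $report | Where-Object { $_.Action -eq 'Delete' } |
#     ForEach-Object { $_.Snapshot | Remove-VMSnapshot -Confirm:$false }
# $report | Select-Object VMName, Checkpoint, CreationTime, Action |
#     ConvertTo-Html | Out-File 'C:\Scripts\CheckpointReport.html'
```

Keeping the decision in a function that takes plain objects makes it easy to try with fake data before letting it delete anything on a real host.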

You can add the Hyper-V Clusters and Standalone hosts at the top of the script. You would also need to set the paths to the reports file and CheckPointsException.txt file.

You can download the script below and rename it to .ps1. Please send me comments.

We are using Hyper-V for virtualization. We have several Hyper-V clusters running hundreds of VMs. From time to time there is a need to reboot the Hyper-V host servers, usually when we push updates to them or sometimes just to refresh them. We all know that the Windows OS likes regular reboots to refresh itself.

So, whenever we needed to reboot the host servers, we had to do them manually. The steps we had to do are:

  • “Pause” the first host/node and “Drain” roles
  • Once it is in the “Pause” state, reboot it and wait for it to come online
  • When the host is online, “Resume” it and move to the next host

This was a time-consuming task, so I decided to develop a script to automate it. The script is attached and you can download and use it if you want.

Now let me explain a bit what this script does. As you can see, I have added some output commands so we can monitor the execution and see what is being done at each step. The script takes the cluster name as an argument. You run the script using the command below:

.\RebootClusterHosts.ps1 -Cluster "ClusterName"

Replace “ClusterName” above with the name of your cluster. The script first gets a list of all the nodes in the cluster and then goes into a “for” loop to process each node. It first checks whether the node is up (one or more nodes may be down because of issues or maintenance); if the node is not up, the script skips it and moves to the next node. If the node is up, the script starts draining the roles/VMs from the node and pauses it, then waits until all the roles/VMs are drained and the node is paused. When this is complete, the script waits 5 seconds before reading the status of the node (I added a “sleep” of 5 seconds just to give it a little more time to refresh). It then checks the node’s DrainStatus and State. If DrainStatus is not “Complete” and State is not “Paused”, something went wrong. We have seen cases where not all the roles/VMs are successfully migrated to other nodes and the drain fails. In that case the script tells the user that draining was not successful and asks them to drain the node manually and press the [Enter] key when done, so that execution can continue.
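A rough sketch of the drain-and-verify step might look like this. It is not the attached script: the function names Test-NodeDrained and Suspend-NodeForReboot are hypothetical, and it assumes the FailoverClusters module reports the finished drain as 'Completed' (adjust the string if your nodes report something else). As a further assumption, the sketch is slightly stricter than the description and treats the drain as failed if either the drain status or the state is not the expected value.

```powershell
# Hypothetical helper: true only when the node reports a finished drain
# AND a paused state. Works on any object with DrainStatus and State properties.
function Test-NodeDrained {
    param([Parameter(Mandatory)] $Node)
    # Enum values are converted to strings so fake test objects work too.
    return ("$($Node.DrainStatus)" -eq 'Completed') -and ("$($Node.State)" -eq 'Paused')
}

# Hypothetical wrapper around the real FailoverClusters cmdlets.
function Suspend-NodeForReboot {
    param(
        [Parameter(Mandatory)] [string] $Cluster,
        [Parameter(Mandatory)] [string] $NodeName
    )
    Write-Host "Draining roles from $NodeName ..."
    # -Drain moves the roles off the node; -Wait blocks until the drain finishes.
    Suspend-ClusterNode -Name $NodeName -Cluster $Cluster -Drain -Wait | Out-Null

    # Give the cluster a few seconds to refresh the node status.
    Start-Sleep -Seconds 5

    $node = Get-ClusterNode -Name $NodeName -Cluster $Cluster
    if (-not (Test-NodeDrained -Node $node)) {
        Write-Warning "Draining $NodeName failed (DrainStatus=$($node.DrainStatus), State=$($node.State))."
        Read-Host 'Drain the node manually, then press [Enter] to continue' | Out-Null
    }
}

# Example loop (commented out: it needs a real cluster).
# foreach ($n in Get-ClusterNode -Cluster 'ClusterName') {
#     if ("$($n.State)" -eq 'Down') { Write-Host "Skipping $($n.Name)"; continue }
#     Suspend-NodeForReboot -Cluster 'ClusterName' -NodeName $n.Name
# }
```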

If the draining completed successfully, no user interaction is needed: the script restarts the node and waits for it to come back online. I added the switch to the Restart command that waits for PowerShell to become available on the node. I have seen that this check is not enough and that nodes sometimes take longer to become fully available after the reboot, which is why I also added a “sleep” of 15 seconds. After that, a while loop checks the status of the node every second until it becomes available. When the node is available, the script resumes (un-pauses) it. I used another while loop here, in which the script tries to resume the node and checks its status every second until the status becomes “Up”. I have run into cases where a single attempt to resume the node did not succeed, which is why I added this loop.
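The restart-and-resume step with its two polling loops could be sketched as below. Again, this is only an illustration and not the attached script: Wait-Until and Restart-AndResumeNode are hypothetical names, the attempt cap is my addition so the loops cannot spin forever, treating “available” as “State is not Down” is an assumption, and so is the -Force switch on Restart-Computer.

```powershell
# Hypothetical generic helper: run $Action, test $Condition, sleep, repeat.
# Returns the number of attempts it took; throws if the cap is reached.
function Wait-Until {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [scriptblock] $Action = {},
        [int] $DelaySeconds = 1,
        [int] $MaxAttempts = 600
    )
    for ($i = 1; $i -le $MaxAttempts; $i++) {
        & $Action | Out-Null              # discard output so only the count is returned
        if (& $Condition) { return $i }
        Start-Sleep -Seconds $DelaySeconds
    }
    throw "Condition not met after $MaxAttempts attempts."
}

# Hypothetical wrapper around the real Restart-Computer / FailoverClusters cmdlets.
function Restart-AndResumeNode {
    param(
        [Parameter(Mandatory)] [string] $Cluster,
        [Parameter(Mandatory)] [string] $NodeName
    )
    # -Wait -For PowerShell blocks until remote PowerShell answers again.
    # -Force is an assumption; drop it if logged-on users should block the reboot.
    Restart-Computer -ComputerName $NodeName -Force -Wait -For PowerShell

    # Extra grace period: the node may not be fully available yet.
    Start-Sleep -Seconds 15

    # Poll every second until the cluster no longer reports the node as Down.
    Wait-Until -Condition {
        "$((Get-ClusterNode -Name $NodeName -Cluster $Cluster).State)" -ne 'Down'
    } | Out-Null

    # Keep trying to resume until the node reports Up.
    Wait-Until -Action {
        Resume-ClusterNode -Name $NodeName -Cluster $Cluster -ErrorAction SilentlyContinue
    } -Condition {
        "$((Get-ClusterNode -Name $NodeName -Cluster $Cluster).State)" -eq 'Up'
    } | Out-Null
}
```

Pulling the retry loop into one helper means the “wait until available” and “resume until Up” steps share the same code and the same safety cap.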

When the node’s status becomes “Up” the script moves to the next node and does all that processing mentioned above on that node.

I have run this script multiple times and have tuned it as I came across issues. You are all welcome to modify it according to your needs. If you find something that can be done better, please do let me know.

This script can be used for rebooting failover cluster hosts/nodes and not only Hyper-V cluster hosts/nodes.

Thanks all!

Download the script below.

SCVMM Shows Error as below

Error (801)
VMM cannot find VM object xxxxxxxxxxxxxxxxxxxxxxxxxx.

Recommended Action
Ensure the library object is valid, and then try the operation again.

Run the following PowerShell script. It will remove the VMs with missing VHD info from the SCVMM Console. It will not delete the VMs; it will only remove them from the SCVMM DB and Console.

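Import-Module -Name "virtualmachinemanager"

# Connect to the VMM server, then collect the hosts of the cluster
Get-SCVMMServer -ComputerName "YOURVMMSERVERNAME"
$Cluster = Get-SCVMHostCluster -Name "YOURCLUSTERNAME"
$VMHosts = Get-SCVMHost -VMHostCluster $Cluster

foreach ($VMHost in $VMHosts) {
	$VMs = Get-SCVirtualMachine -VMHost $VMHost

	# Pass 1: remove VMs whose VHD info is missing
	foreach ($VM in $VMs) {
		if (Get-SCVirtualHardDisk -VM $VM) {
			# VM is fine, VHD info is fine, do nothing
			Write-Host "$($VM.Name) is Good"
		}
		else {
			# VHD info is bad, remove VM from SCVMM (DB and Console only)
			Write-Host "Removing $($VM.Name)"
			Remove-SCVirtualMachine -VM $VM -Force
		}
	}

	# Re-read the list so VMs removed in pass 1 are not processed again
	$VMs = Get-SCVirtualMachine -VMHost $VMHost

	# Pass 2: remove VMs that VMM cannot refresh
	foreach ($VM in $VMs) {
		if (Read-SCVirtualMachine -VM $VM) {
			# VM is fine, do nothing
			Write-Host "$($VM.Name) is Good"
		}
		else {
			# VM info is bad, remove VM from SCVMM (DB and Console only)
			Write-Host "Removing $($VM.Name)"
			Remove-SCVirtualMachine -VM $VM -Force
		}
	}
}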

When you right-click a virtual machine and click Properties, the VMM Console crashes. When you check the Event Logs, you see an error similar to this:

Description:
Stopped working

Problem signature:
Problem Event Name : CLR20r3
Problem Signature 01: vmmadmin.exe
Problem Signature 02: 1.0.523.0
Problem Signature 03: 4d432cdf
Problem Signature 04: System.Windows.Forms
Problem Signature 05: 2.0.0.0
Problem Signature 06: 4f682206
Problem Signature 07: 14d0
Problem Signature 08: 23
Problem Signature 09: System.ObjectDisposedException
OS Version: 6.1.7601.2.1.0.274.10
Locale ID: 1033

Try the following to fix the issue:

Open a VMM PowerShell prompt and run the following command:

Get-VM -Name "Name of the VM" | Remove-VM -Force

The VM should stay online but will be removed from the VMM. It will reappear automatically after a while or you can right-click the host and Refresh Virtual Machines.

If you are running Hyper-V Failover Cluster and managing it using SCVMM you would have noticed that when you create a new VM in SCVMM, the VM name in the Failover Cluster Manager shows as “SCVMM VMNAME Resources”.

Also, if you rename a VM in SCVMM or Hyper-V Manager, the name does not get updated in Failover Cluster Manager, and vice versa. This creates issues if you have a lot of VMs to manage and need to rename one.

To resolve this issue, I wrote a small script using PowerShell. This script can be run in two modes:

  1. Compare the VM names in Failover Cluster and SCVMM and list the VMs with mismatching names.
  2. Rename the VMs in Failover Cluster to match the SCVMM names.

There are three variables that you need to check/change before running the script:

  1. $VMMServer: This is your VMM Server FQDN
  2. $Cluster: This is your Failover Cluster FQDN
  3. $Mode: If you set it to “1” it will list the VMs with mismatched names; if you set it to “2” it will rename the VMs
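The comparison at the heart of the two modes can be sketched as follows. This is not the downloadable script, only an illustration under stated assumptions: the function name Get-VMNameMismatch and the GroupName/VmId/Name property names are hypothetical, and it assumes that cluster groups and VMM VMs are paired by the VM's ID. How the two input lists are gathered is shown only as commented-out guesses.

```powershell
# Hypothetical helper: pair cluster groups with VMM VMs by VM ID and
# return the pairs whose names differ.
function Get-VMNameMismatch {
    param(
        [object[]] $ClusterEntry = @(),   # objects with GroupName and VmId properties
        [object[]] $VmmEntry     = @()    # objects with Name and VMId properties
    )
    # Index the VMM names by ID (PowerShell hashtable literals are case-insensitive).
    $vmmById = @{}
    foreach ($vm in $VmmEntry) { $vmmById["$($vm.VMId)"] = $vm.Name }

    foreach ($entry in $ClusterEntry) {
        $vmmName = $vmmById["$($entry.VmId)"]
        # -cne is case-sensitive, so a difference in letter case is also reported.
        if ($vmmName -and ($entry.GroupName -cne $vmmName)) {
            [pscustomobject]@{ ClusterName = $entry.GroupName; VmmName = $vmmName }
        }
    }
}

# Example wiring (commented out: it needs a real cluster and VMM server, and the
# way the lists are built here is an assumption, not taken from the script).
# $vmm     = Get-SCVirtualMachine | Select-Object Name, VMId
# $cluster = Get-ClusterResource -Cluster $Cluster |
#     Where-Object { $_.ResourceType -like 'Virtual Machine' } |
#     ForEach-Object {
#         [pscustomobject]@{
#             GroupName = $_.OwnerGroup.Name
#             VmId      = ($_ | Get-ClusterParameter -Name VmID).Value
#         }
#     }
# $mismatches = Get-VMNameMismatch -ClusterEntry $cluster -VmmEntry $vmm
# if ($Mode -eq 1) { $mismatches | Format-Table }
# elseif ($Mode -eq 2) {
#     foreach ($m in $mismatches) {
#         (Get-ClusterGroup -Name $m.ClusterName -Cluster $Cluster).Name = $m.VmmName
#     }
# }
```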

If this script works for you and solves your issue, please leave a comment.

Download the script below