Explaining NUMA Spanning in Hyper-V

When we work in virtualized worlds with Microsoft Hyper-V, there are many things we have to worry about when it comes to processors. Most of these things come with acronyms that people don’t really understand but know they need, and one of these is NUMA Spanning. I’m going to try to explain it here and convey why we want to avoid NUMA Spanning where possible, and I’m going to do it all in fairly simple terms to keep the topic light. In reality, NUMA architectures may be more complex than this.

NUMA, or Non-Uniform Memory Access, is an architecture that Intel and AMD introduced into their processor and motherboard chipset designs. Intel implemented it with QuickPath Interconnect (QPI) in 2007 and AMD implemented it with HyperTransport in 2003. NUMA uses a construct of nodes in its architecture. As the name suggests, NUMA refers to system memory (RAM) and how we use it; more specifically, how we determine which memory in the system to use.

Single NUMA Node

[Image: Single NUMA Node]

In the simplest system, you have a single NUMA node. A single NUMA node is achieved either in a system with a single processor socket or by using a motherboard and processor combination that does not support the concept of NUMA. With a single NUMA node, all memory is treated as equal, and a VM running on a hypervisor on this system would use any memory available to it without preference.

Multiple NUMA Nodes

[Image: Two NUMA Nodes]

In a typical system today, with multiple processor sockets and a processor and motherboard configuration that supports NUMA, we have multiple NUMA nodes. NUMA nodes are determined by the arrangement of memory DIMMs in relation to the processor sockets on the motherboard. Take a hugely oversimplified sample system with two CPU sockets, each loaded with a single-core processor and six DIMM slots, with every slot populated with an 8GB DIMM (12 DIMMs in total). In this configuration we have two NUMA nodes, and each NUMA node contains one CPU socket and its directly connected 48GB of memory (6 × 8GB).

The reason for this relates to the memory controller within the processor and the interconnect paths on the motherboard. The Intel Xeon processor, for example, has an integrated memory controller. This memory controller is responsible for the address and resource management of the six DIMMs in the DIMM slots linked to its processor socket. When this processor accesses that memory, it takes the quickest possible path, directly between the processor and the memory, and this is referred to as local memory access.

For this processor to access memory in a DIMM slot linked to our second processor socket, it has to cross the interconnect on the motherboard and go through the memory controller on the second CPU. All of this takes mere nanoseconds, but it is additional latency that we want to avoid in order to achieve maximum system performance. We also need to remember that if we have a good virtual machine consolidation ratio on our physical host, this may be happening for multiple VMs all over the place, and that adds up to lots of nanoseconds all of the time. This is NUMA Spanning at work: the processor is breaking out of its own NUMA node to access remote memory in another NUMA node.

Considerations for NUMA Spanning and VM Sizing

NUMA Spanning has a bearing on how we should size the VMs that we deploy to our Hyper-V hosts. In my sample server configuration above, I have 48GB of memory per NUMA node, so to minimize the chances of VMs spanning NUMA nodes, we need to size our VMs with this in mind. If I deployed 23 VMs with 4GB of memory each, that would total 92GB. The 48GB of memory in the first NUMA node could be fully allocated to VM workloads, with 44GB of memory allocated to VMs in the second NUMA node, leaving 4GB of memory for the Hyper-V parent partition to operate in. None of these VMs would span NUMA nodes because 48GB/4GB is 12, which means 12 whole VMs fit in each NUMA node.

If I deployed 20 VMs, but this time with 4.5GB of memory each, this would require 90GB of memory for virtual workloads and leave 6GB for the Hyper-V parent partition. The problem here is that 48GB is not a whole multiple of 4.5GB (48/4.5 ≈ 10.67), so we have leftovers. Ten of our VMs would fit entirely into the first NUMA node (45GB, leaving 3GB spare) and nine would fit entirely within the second NUMA node alongside the parent partition (40.5GB + 6GB, leaving 1.5GB spare). Our 20th VM would be in no man’s land and would have to split its memory across both NUMA nodes, 3GB in one and 1.5GB in the other.
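A small simulation can check the arithmetic in the two examples above. This is a minimal Python sketch built on simplifying assumptions of my own, and it is not how the Hyper-V memory manager actually allocates memory. Each VM is placed whole in the first NUMA node with enough free memory. A VM that fits nowhere takes whatever is free from each node in turn. The parent partition’s memory is reserved on the last node. `NumaNode` and `place_vms` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class NumaNode:
    capacity_gb: float
    used_gb: float = 0.0

    @property
    def free_gb(self) -> float:
        return self.capacity_gb - self.used_gb


def place_vms(node_capacity_gb, node_count, parent_gb, vm_sizes_gb):
    """Simplified first-fit model (an assumption, not Hyper-V's real allocator).

    Returns the VMs that had to span nodes as
    (vm_number, [(node_index, gb_taken), ...]).
    """
    nodes = [NumaNode(node_capacity_gb) for _ in range(node_count)]
    # Assumption: the parent partition's memory is reserved on the last node.
    nodes[-1].used_gb += parent_gb

    spanned = []
    for vm_number, size in enumerate(vm_sizes_gb, start=1):
        # Prefer the first node that can hold the whole VM.
        home = next((n for n in nodes if n.free_gb >= size), None)
        if home is not None:
            home.used_gb += size
            continue

        # No single node has room, so the VM must span nodes (if it fits at all).
        if sum(n.free_gb for n in nodes) < size:
            raise MemoryError(f"VM {vm_number} ({size}GB) does not fit on the host")
        remaining = size
        parts = []
        for index, node in enumerate(nodes):
            take = min(node.free_gb, remaining)
            if take > 0:
                node.used_gb += take
                parts.append((index, take))
                remaining -= take
        spanned.append((vm_number, parts))
    return spanned


print(place_vms(48, 2, 4, [4] * 23))    # []
print(place_vms(48, 2, 6, [4.5] * 20))  # [(20, [(0, 3.0), (1, 1.5)])]
```

With 4GB VMs nothing spans. With 4.5GB VMs only the 20th VM is split, 3GB on the first node and 1.5GB on the second.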

In good design practice, we should try to size our VMs to match our NUMA architecture. With my sample server configuration of 48GB per NUMA node, we should use VM memory sizes that divide evenly into 48GB, such as 2GB, 4GB, 6GB, 8GB, 12GB, 16GB, 24GB or 48GB. Anything else carries a real risk of being NUMA spanned.
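The sizing rule above comes down to picking VM memory sizes that divide evenly into a NUMA node’s memory. This quick Python sketch lists the whole-GB candidates for a 48GB node; the function name is hypothetical. It only considers VMs of one uniform size per node, so mixing sizes on the same node can still leave awkward gaps.

```python
def numa_friendly_sizes(node_memory_gb: int) -> list[int]:
    """Whole-GB VM sizes that divide evenly into one NUMA node's memory."""
    return [size for size in range(1, node_memory_gb + 1) if node_memory_gb % size == 0]


sizes = numa_friendly_sizes(48)
print(sizes)  # [1, 2, 3, 4, 6, 8, 12, 16, 24, 48]

# How many VMs of each size fit in one 48GB node without spanning
print({size: 48 // size for size in sizes})
# {1: 48, 2: 24, 3: 16, 4: 12, 6: 8, 8: 6, 12: 4, 16: 3, 24: 2, 48: 1}
```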

Considerations for Disabling NUMA Spanning

Now that we understand what NUMA Spanning is and the potential decrease in performance it can cause, we need to look at it through a virtualization lens, as this is where its effect is greatest. The hypervisor understands the NUMA architecture of the host by detecting the hardware within it. When a VM starts and the hypervisor allocates memory for it, the hypervisor will always first try to get memory from the NUMA node of the processor being used for the virtual workload. Sometimes that may not be possible because other workloads are already using that memory.

For the most part, leaving NUMA Spanning enabled is totally fine. If you are really trying to squeeze performance from a system, a virtual SQL Server perhaps, NUMA Spanning is something we would like to have turned off. NUMA Spanning is enabled by default at the host level in both VMware and Hyper-V, but we can override this configuration both per hypervisor host and per VM.

I am not for one minute going to recommend that you disable NUMA Spanning at the host level, as this might impact your ability to run your workloads. If NUMA Spanning is disabled for the host and the host is not able to accommodate the memory demand of a VM within a single NUMA node, the power-on request for the VM will fail and you will be unable to start the machine. However, if you have some VMs with NUMA Spanning disabled and others with it enabled, your host can work like a memory-based jigsaw puzzle, fitting things in where it can.

Running SQL Servers and other performance-sensitive VMs with NUMA Spanning disabled benefits their performance. Leaving NUMA Spanning enabled on VMs that are not performance sensitive allows them to use whatever memory is available and cross NUMA nodes as required. This gives you the best combination: maximum performance for your intensive workloads and the resources required to run those that are not.
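The start-up behaviour described above can be modelled roughly as follows. This is a simplified Python sketch of my own, not Hyper-V’s actual placement logic. `try_start` (a hypothetical name) first looks for a single NUMA node with enough free memory. It splits the VM across nodes only when that VM’s NUMA Spanning setting allows it. The example host has 3GB free on node 0 and 1.5GB free on node 1.

```python
def try_start(vm_gb, node_free_gb, spanning_enabled):
    """Simulate a VM start request against per-node free memory (in GB).

    Simplified model (an assumption), not Hyper-V's real allocator.
    Returns {node_index: gb_allocated} on success, or None if the start fails.
    node_free_gb is updated in place when the VM starts.
    """
    # Always try to keep the VM inside a single NUMA node first.
    for index, free in enumerate(node_free_gb):
        if free >= vm_gb:
            node_free_gb[index] -= vm_gb
            return {index: vm_gb}

    # No single node has room: without spanning (or enough total memory) the start fails.
    if not spanning_enabled or sum(node_free_gb) < vm_gb:
        return None

    # With spanning enabled, fill the VM from whatever each node has free.
    allocation = {}
    remaining = vm_gb
    for index, free in enumerate(node_free_gb):
        take = min(free, remaining)
        if take > 0:
            node_free_gb[index] -= take
            allocation[index] = take
            remaining -= take
    return allocation


free = [3.0, 1.5]  # GB free on node 0 and node 1
print(try_start(4.5, free, spanning_enabled=False))  # None
print(try_start(4.5, free, spanning_enabled=True))   # {0: 3.0, 1: 1.5}
print(free)                                          # [0.0, 0.0]
```

The same 4.5GB VM fails to start with spanning disabled but starts split across both nodes with it enabled. This is the trade-off between guaranteed locality and being able to power on at all.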

Using VMM Hardware Profiles to Manage NUMA Spanning

[Image: VMM Hardware Profile NUMA Spanning]

Assuming we have a Hyper-V environment that is managed by Virtual Machine Manager (VMM), we can make this really easy to manage without having to bother our users or systems administrators with understanding NUMA Spanning. When we deploy VMs, we can base them on Hardware Profiles. A VMM Hardware Profile exposes the NUMA Spanning option, so we simply create multiple Hardware Profiles for our workload types. Some are for general purpose servers with NUMA Spanning enabled, while others are configured specifically for performance-sensitive workloads with the NUMA Spanning setting disabled.

The key thing to remember here is that if you have VMs already deployed in your environment, you will need to update their configuration. Hardware Profiles in VMM are not linked to the VMs we deploy, so once a VM is deployed, any changes to the Hardware Profile it was deployed from do not filter down to the VM. The other thing to note is that the NUMA Spanning configuration is only applied at VM startup and during Live or Quick Migration. If you want a VM to pick up a changed NUMA Spanning setting, you will need to either stop and start the VM or migrate it to another host in your Hyper-V Failover Cluster.

Slow WDS PXE Clients and Bad Memory

Following on from my post last week about UK Regional Settings for MDT 2013, this week I have been testing the deployment of a Lite Touch MDT image using WDS PXE over Multicast. Contrary to what you will read online about Multicast, I haven’t personally had any issues with it; Multicast has worked straight off the bat. The problems I have been encountering are actually with Unicast, in the initial phase of PXE boot: downloading the Boot SDI and the WinPE LiteTouch WIM files.

In this case, I’ve been given eight client machines to test the deployment, and we were finding that only about half of them were initiating the WinPE environment in a sensible timeframe. The other clients were taking over 30 minutes just to download the Lite Touch WinPE image, which obviously isn’t cricket, as you should be able to lay down the entire Windows OS image in not much more time than that.

All of the machines are HP 8000 desktops with matching hardware specifications and firmware revisions, so we were left wondering if the problem was the network, routing or suchlike. However, earlier this afternoon we found the issue, and I have to say it’s one of the craziest reasons I’ve seen for something not working in a long time, especially considering how software defined our worlds have become.

[Image: Hynix Memory 2GB PC3-10600U-9-10]

Yes, that is correct: the above is an image of a Hynix 2GB PC3-10600U-9-10 DIMM, and this was the cause of our problems.

The machines in question were all configured with 6144MB of RAM in the form of three 2GB DIMMs. What we didn’t notice at an early stage (and why would you, really?) was that some of the machines had exclusively HP certified Micron memory in all three slots, while our faulting machines had a combination of HP certified Micron memory and HP certified Hynix memory.

All the DIMMs were of the same unregistered type, all had the same PC3-10600 speed and all had the same 9-10 CAS latency. It’s just crazy to think that mixing Micron and Hynix memory could ruin things for us when all of the other factors, like registration, speed, latency and ranking, were matched.

Simply removing the Hynix DIMMs from the machines, leaving them with 4096MB made up of two 2GB Micron DIMMs, allowed these machines to load the Boot SDI and Lite Touch WinPE WIM files at the speed we expected and were already seeing on the other clients.

Looking at this logically, you can see why our issue was a memory problem: the Lite Touch WinPE WIM is downloaded into memory, and the hard disk is not touched at this point. Still, I cannot remember the last time I saw a simple DIMM cause so much of a problem. These days we automatically assume that hardware works and that our problems exist in software, due to the configurable nature of everything. This was certainly a lesson never to forget the simple things in computing: the basic hardware like processors, memory, motherboards and the like.

Cisco ASA 5520 Memory Upgrade

If you’re using a Cisco ASA 5505, 5510, 5520 or 5540 in your home, lab or other non-production environment and want to run ASA OS 8.3 or later, you’re probably going to be in the market for a memory upgrade. Cisco ASA memory upgrades are bonkers expensive. In a production environment you’d want to pay this to get Cisco TAC support, but chances are you won’t want to stump up this kind of money for other purposes.

The exception to this rule is if you happen to have an ASA that was either built after February 2010 or upgraded by a previous owner, but that’s neither here nor there.

The specifications from Cisco on the memory requirements for each model to run ASA OS 8.3 or later and the comparative shipping memory values can be found at http://www.cisco.com/en/US/prod/collateral/vpndevc/ps6032/ps6094/ps6120/product_bulletin_c25-586414.html.

In my case, the ASA 5520 originally shipped with 512MB of RAM, but ASA OS 8.3 or later needs 2GB. The ASA 5520’s hardware configuration varies according to its age, with some units having four DIMM slots and others only two. If you’ve got an ASA 5520 or 5540 with only one DIMM slot, then sorry, you’ve actually got an ASA 5510 that has been faked into a 5520, which was a big problem at the time (https://supportforums.cisco.com/message/3517301).

As I didn’t want to spend £300 on the memory upgrade for mine, I went searching the internet, as you’d expect of me. It transpires that Cisco used memory from Smart Modular in the ASA appliances: 184-pin PC2700 DDR-333 ECC unbuffered memory, to be exact. According to some clever people on the internet, not many memory modules other than these Smart ones will work in the ASA, as the Linux kernel on it is only coded to recognise a select few memory setups. Luckily, it appears that Infineon are one of the good guys.

Because faster memory will run at a lower speed when required, you don’t have to stick to PC2700 DDR-333, and it doesn’t seem that you need ECC memory either. From advice online, I’ve found that the following Infineon modules work great in the ASA 5520. I’ve had none of the issues commonly reported with third-party memory, where the appliance only boots successfully once in every two or three reload cycles. My ASA has booted first time, every time, and I’ve been cycling it about once an hour today to test it.

If you’ve got the luxury of four DIMM slots, go with the Infineon HYS64D64320HU-5-C. It’s a 512MB PC3200 DDR-400 DIMM, and four of them make up the 2GB requirement. If you’ve only got two DIMM slots to play with, go with the Infineon HYS64D128320HU-5-B, which is a 1GB PC3200 DDR-400 DIMM.
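The slot arithmetic above can be written down as a tiny lookup. This is a hypothetical Python helper: the part numbers, module sizes and the 2GB target come from this post, while the table and the function name are mine for illustration. It fills every slot with the matching Infineon module and checks the total reaches 2048MB. It also flags the single-slot case described earlier as a likely faked ASA 5510.

```python
REQUIRED_MB = 2048  # ASA 5520 requirement for ASA OS 8.3 or later (from this post)

# Hypothetical lookup table: DIMM slot count -> (Infineon part number, module size in MB)
MODULE_FOR_SLOTS = {
    4: ("HYS64D64320HU-5-C", 512),   # 512MB PC3200 DDR-400
    2: ("HYS64D128320HU-5-B", 1024),  # 1GB PC3200 DDR-400
}


def upgrade_plan(dimm_slots: int) -> tuple[str, int, int]:
    """Return (part number, module count, total MB) when every slot is filled."""
    if dimm_slots == 1:
        raise ValueError("One DIMM slot suggests an ASA 5510 faked into a 5520/5540")
    if dimm_slots not in MODULE_FOR_SLOTS:
        raise ValueError(f"No tested module listed for {dimm_slots} slots")
    part, size_mb = MODULE_FOR_SLOTS[dimm_slots]
    total_mb = dimm_slots * size_mb
    if total_mb < REQUIRED_MB:
        raise ValueError(f"{total_mb}MB is below the {REQUIRED_MB}MB requirement")
    return part, dimm_slots, total_mb


print(upgrade_plan(4))  # ('HYS64D64320HU-5-C', 4, 2048)
print(upgrade_plan(2))  # ('HYS64D128320HU-5-B', 2, 2048)
```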

eBay is the place to buy, in case there was any doubt over that point. Whichever of the above options you go with, these Infineon DIMMs will give you a reliable ASA platform. They let you hit your memory maximums for ASA OS 8.3 onwards for about £20 at the time of writing. Just a touch better than £300 for the official memory, right?