deduplication

Windows Server 2012 Deduplication Real World

Recently, in my post entitled Storage Spaces You’re My Favourite, I promised to show the world some real life data on how much you can actually save in storage consumption with Windows Server 2012 deduplication, and here it is.

Real-World Deduplication Savings

This screenshot was taken from a production file server cluster running Windows Server 2012 using SAN-attached fibre storage. As you can see from the image, I’ve obfuscated the names of the drives and shares but left the good stuff visible: the savings.

The data is spread across a number of volumes, some of which aren’t doing deduplication (the cluster Quorum or the DFS Namespace, for example), but those that are deduplicated are showcasing some impressive numbers. We had to keep the data split this way to maintain a few legacy elements and also to keep some of the more confidential data structures separate in the event of someone getting a bit too liberal with the ACLs on the more generic volumes.

As you can see from the screenshot though, on our main storage volume, the E: drive, we are currently saving 1.44TB, which is 41% of the total storage of that volume, along with 48% on one of the smaller volumes and 69% on another. All of these savings added together mean that our net savings are 1.96TB compared with the previous Windows Server 2003 based file server solution. We aren’t storing less data, we aren’t telling users to change how they store or compress their data, and we aren’t making any changes to the way anybody does their job. All of this is being achieved just by enabling a feature in Windows Server 2012, a feature that is available in Standard edition, I hasten to add.

If I were a financially oriented man, I’d probably want to know what this actually means in monetary terms. On a previous project, we calculated that, roughly speaking, our internal storage costs are in the region of £2.50 per gigabyte per annum. This £2.50 bakes in the cost of our underlying storage array hardware, the supporting fibre channel network, and the cost of disks and all. Based on this £2.50 per gigabyte, the 1.96TB saved by the deduplication feature in Windows Server 2012 is therefore saving us around £5,000 a year.
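
For anyone who wants to plug in their own numbers, the arithmetic behind that figure can be sketched in a few lines of Python. The 1.96TB and £2.50 values come from this post; the 1024 gigabytes per terabyte conversion and the function name are my own assumptions for illustration, and using 1000 instead only moves the result by a little over a hundred pounds.

```python
# Figures quoted in the post; the GB-per-TB conversion is an assumption.
SAVED_TB = 1.96
COST_PER_GB_PER_YEAR = 2.50  # pounds sterling
GB_PER_TB = 1024


def annual_saving(saved_tb, cost_per_gb, gb_per_tb=GB_PER_TB):
    """Return the yearly storage cost avoided, in pounds."""
    return saved_tb * gb_per_tb * cost_per_gb


saving = annual_saving(SAVED_TB, COST_PER_GB_PER_YEAR)
print(f"Approximate annual saving: GBP {saving:,.2f}")
```

Either way you do the conversion, the result rounds to the "around £5,000 a year" quoted above.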

Looking back then, it’s clear that deduplication in Windows Server 2012 really does work, not only on a feature level but also on a financial level. For us as an enterprise, moving to Windows Server 2012 for our file server solution cost us nothing aside from some time, because we are permitted to upgrade from Windows Server 2003 to Windows Server 2012 under our SA (Software Assurance) rights from our EA (Enterprise Agreement). In return, it’s going to save us £5,000 a year in storage costs.

If we didn’t keep some of this data logically separated in different volumes, we could probably see an uplift on these savings. With deduplication, less is more, and by this I mean volumes. The fewer volumes you have, the more you will save. This is because deduplication is done at a volume level, so identical data held on two different volumes is not deduplicated against each other. Minimising the number of volumes you have and consolidating your data into larger volumes will increase the return from deduplication.
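
To make the volume point a bit more concrete, here is a toy sketch in Python. It is not how Windows implements deduplication: it uses tiny fixed-size chunks and made-up volume contents purely for illustration, and every name in it is hypothetical. It simply counts how many distinct chunks have to be stored when two volumes are deduplicated separately versus as one.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, purely for illustration


def unique_chunks(files):
    """Return the set of chunk hashes needed to store the given files."""
    seen = set()
    for data in files:
        for i in range(0, len(data), CHUNK_SIZE):
            seen.add(hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest())
    return seen


# Two hypothetical volumes that happen to hold some identical data.
volume_a = [b"AAAABBBBCCCC", b"AAAADDDD"]
volume_b = [b"AAAABBBBCCCC", b"EEEE"]

# Deduplicated per volume: each volume keeps its own copy of shared chunks.
separate = len(unique_chunks(volume_a)) + len(unique_chunks(volume_b))

# Deduplicated as one consolidated volume: shared chunks are stored once.
combined = len(unique_chunks(volume_a + volume_b))

print(f"separate volumes: {separate} chunks, one volume: {combined} chunks")
```

The data that both volumes have in common is only stored once when everything lives on a single volume, which is where the extra savings come from.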

Deduplication in Windows Server 2012 Essentials

Yesterday, I posted a quasi-rant about Windows Server 2012 Essentials Storage Pools and the inability to remove a disk in a sensible non-destructive manner. At the end of that post, I alluded to the lack of the Primary Data Deduplication feature in Windows Server 2012 Essentials, which got me thinking about it more, so I went off on an internet duck hunt to find the solution.

Firstly, I found this thread (http://social.technet.microsoft.com/Forums/en-US/winserveressentials/thread/4288f259-cf87-4bd6-bf9f-babfe26b5a69) on the TechNet forums in which an MVP highlights a bug which was filed on Microsoft Connect during the beta stages over the lack of deduplication. The bug was closed by Microsoft with a status of ‘Postponed’ and a message that it was a business decision to remove the feature.

Sad but true, given that the people being targeted with Essentials are potentially the people wanting and needing it most. I guess the reason probably lies in the realms of supportability and a degree of knowledge gap in the home and small business sectors when it comes to understanding the feature.

Luckily for me, in another search, I found this article (http://forums.mydigitallife.info/archive/index.php/t-34417.html) at My Digital Life where some nefarious user has managed to extract the .cab files from a Windows Server 2012 Standard installation required to allow DISM to install the feature. While the post is targeted at Windows 8 64-bit users to use dedup on their desktop machines, the process works equally well for Windows Server 2012 Essentials, if not better as you can also use the GUI to drive the configuration.

I don’t want to be the one in breach of copyright or of Microsoft’s terms of service, so I’m not going to link to the .7z file provided on My Digital Life. You will need to download it from them, sorry.

Download the file and extract it to a location on the server. Once extracted, open an elevated command prompt, change the directory context of the prompt to your extracted .7z folder and enter the following command:
dism /Online /Add-Package /PackagePath:Microsoft-Windows-VdsInterop-Package~31bf3856ad364e35~amd64~~6.2.9200.16384.cab /PackagePath:Microsoft-Windows-VdsInterop-Package~31bf3856ad364e35~amd64~en-US~6.2.9200.16384.cab /packagepath:Microsoft-Windows-FileServer-Package~31bf3856ad364e35~amd64~~6.2.9200.16384.cab /PackagePath:Microsoft-Windows-FileServer-Package~31bf3856ad364e35~amd64~en-US~6.2.9200.16384.cab /packagepath:Microsoft-Windows-Dedup-Package~31bf3856ad364e35~amd64~~6.2.9200.16384.cab /PackagePath:Microsoft-Windows-Dedup-Package~31bf3856ad364e35~amd64~en-US~6.2.9200.16384.cab

If DISM fails or gives you any errors, then the most likely cause is that you didn’t use an elevated command prompt. The next likely cause is that you aren’t in the correct working directory so check that too.

Once all of the packages are imported okay, enter the second command:
dism /Online /Enable-Feature /FeatureName:Dedup-Core /All

No restart is required for the import of the packages or the enabling of the feature, so everything can be done online.

Once the feature is enabled, head over to Server Manager to get things started. Server Manager isn’t pinned to the server Start Screen by default, so from the Start Screen type Server Manager and it will appear in the in-line search results.

From Server Manager, select File and Storage Services from the left pane, and then select Volumes from the sub-options.

As you will see in the screenshot, I’ve already enabled dedup on the volume on this test Windows Server 2012 Essentials VM of mine and I’ve saved space because I’ve created two data folders with identical data in each folder.

To configure your volumes, right click the volume you want to set up and select the Configure Data Deduplication option. On the options screen, first, tick the box to enable the feature. Once selected, you have options for the age of files to include in deduplication and the types of file to exclude. For my usage at home, I am setting the age to 0 days, which includes all files regardless of age, and I am choosing to not exclude any file types as I want maximum savings.

The final step is at the bottom of the dialog, Set Deduplication Schedule. This allows you to configure when optimization tasks occur and whether to use background optimization during idle periods of disk access. I chose to enable both of these and I have left the default time of 0145hrs in place.

Once you click OK and then OK again on the initial dialog, you have just enabled dedup on that volume. Repeat the process for any volumes you are interested in and job done for you. After this, the server has the hard task of calculating all the savings and the process of actually creating the metadata links to physical blocks on the disk and marking the space occupied by duplicate blocks on the disk as free space. This process is very CPU and memory heavy and depending on the size of your dataset can and will take a long time to run.
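
If the idea of metadata links to blocks sounds a bit abstract, the following Python sketch shows the general principle using the same setup as my test VM: two folders holding identical data. This is a heavily simplified illustration and not the Windows implementation. The chunk size, file names and function names are all made up for the example.

```python
import hashlib

CHUNK_SIZE = 8  # illustrative only


def optimise(files):
    """Split files into chunks and store each distinct chunk once.

    Returns (store, recipes): store maps a chunk hash to its bytes,
    recipes maps a file name to its ordered list of chunk hashes.
    """
    store, recipes = {}, {}
    for name, data in files.items():
        refs = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # keep only the first copy
            refs.append(digest)
        recipes[name] = refs
    return store, recipes


def rehydrate(name, store, recipes):
    """Rebuild a file's original bytes by following its chunk references."""
    return b"".join(store[digest] for digest in recipes[name])


# Two hypothetical files with identical content in different folders.
files = {
    "folder1/report.doc": b"quarterly figures 2012!!",
    "folder2/report.doc": b"quarterly figures 2012!!",
}

store, recipes = optimise(files)
logical = sum(len(data) for data in files.values())
physical = sum(len(chunk) for chunk in store.values())
print(f"logical={logical} bytes, physical={physical} bytes")
```

Both files can still be read back in full, but the second one costs nothing beyond its list of references. Hashing every chunk of every file is also a fair hint as to why the real job is so CPU and memory hungry.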

I am just about to kick off a manual task on my live Essentials server at home, so once the results are in, I will be posting here to report my savings and also the time taken, but I’m not expecting this to come in anytime within the next day or so.

 

The Problem with Storage Spaces

As you may well have gathered from a number of my previous posts about Windows Server 2012 and Storage Pools, I was intending on using them for my home server rebuild, and I am indeed using them, however I have neglected to post anything showing the new server (although I will change that shortly).

I ran into a problem with Storage Pools today which I think quite frankly blows. I got myself a new Western Digital 3TB Red drive to try out. The plan is to replace all of my existing six 2TB Western Digital Green drives with these for a number of reasons including greater bang for buck on power consumption, increased IOPS, cooler running temperature and improved reliability.

Not wanting to keep a mixture of Green and Red drives for very long, I proceeded to remove one of the drives from the pool to replace with a Red drive. The Storage Pool refused to remove it as a Simple non-redundant Storage Space was being hosted on this drive.

Problem 1: Storage Spaces cannot be converted between Simple, Mirror or Parity. Once they are created, they are created. My only option for this was to create a new temporary Space marked as Mirror and copy the data from the Simple so that I could delete it. Once deleted, I tried a second attempt to remove the drive and I got an error that I needed to add another drive to the pool as there was insufficient capacity.

I’m sorry, what?

Problem 2: I have six 2TB drives in an uber-pool. I am currently using less than half of it, so removing the drive should be no problem. I tried this a few more times and each time I got the same error that I would need to add more capacity to the pool before I would be able to remove the drive, which I know to be cobblers.

In the end, I just pulled the disk from the server and let the Storage Pool have a cry about the missing disk. From here, I marked the disk for removal to allow Windows to think that the disk had failed and that it was never coming back. This worked, although it is time consuming as it forces all Mirror and Parity virtual disks to enter a repairing state, copying blocks to the remaining disks in the pool to keep up the protection level.

This brings me softly onto another point which is more of a beef.

Beef 1: One of the tricks of Windows Server 2012 was deduplication. Anyone familiar with Windows Server 2012 will know that Storage Pools and deduplication do work together, but in Essentials, deduplication is absent, missing, not there. The feature is completely missing from any of the Server Manager interfaces and from PowerShell, the command Get-Command -Module Dedup* returns nothing.

Why is it missing from Essentials? Essentials is the release of Windows Server 2012 targeted at SMB/SME and pro-home customers, the customers most likely to be storing a lot of data on a tight budget, so why strip out the one feature in Windows Server 2012 that they will probably be highly interested in?

I really hope that Microsoft get enough complaints from customers of Essentials to release a Feature Update to re-add the support for deduplication.

With this done,