Posts from 2015

Non-Published Office 365 Directory Sync with Azure ExpressRoute

In one of my recent sessions with a customer, they expressed an interest in protecting the communication between Office 365 and their on-premise environment so that their directory synchronization server traffic would be invisible to the outside world. This got me thinking about Azure ExpressRoute which, as we know, can provide very fast connectivity between your on-premise environments and Azure if you are using a supported MPLS network provider.

The customer in question is using Level 3 Networks as their carrier and Level 3 are on the supported carriers list for ExpressRoute on the ExpressRoute Technical Overview page at https://msdn.microsoft.com/en-us/library/azure/e224be0a-d7b2-4514-b868-86d61cee0ead#bkmk_Connection, so I looked into it a little further as this was a really interesting proposition – to have Office 365 SaaS managed productivity with Exchange, SharePoint and Lync but to have all of the sync traffic privately routed over ExpressRoute so that you weren’t passing any of that data over the public network (albeit encrypted with HTTPS SSL).

When I looked further, I found that on the ExpressRoute FAQ page at https://msdn.microsoft.com/library/azure/dn606292.aspx it explicitly defines which Azure services are accessible over an ExpressRoute connection and Azure Active Directory (AAD) is not listed nor is anything in relation to Office 365.

Unfortunately, it seems that this isn’t possible right now but it would be nice to see something added in the future to allow AAD to be accessed over ExpressRoute, letting us conceal our ADFS or AADSync traffic, as this may well answer a security question that some of the more security-conscious customers have. The other reason this would be nice is that our internal users could access their mail and SharePoint via the ExpressRoute connection, so they would get a faster experience than over the company’s internet link. Right now however, the best use case for ExpressRoute in my opinion is Azure RemoteApp, allowing you to move some or all of the Remote Desktop Services terminal server farms that you may have to Azure and offload your RemoteApp applications to the cloud.

Deployment Scenarios for Office 365 and AAD Identity

In my previous post from yesterday on understanding Office 365 and AAD federated identity types, I talked about the two methods for allowing our users to sign in to the Microsoft cloud services with our on-premises identity using either DirSync, AADSync or FIM for same sign-on or using ADFS for single sign-on. Now that we understand the products at a high-level, I want to cover off some options for deployment scenarios and specifically, how we can leverage Microsoft Azure to host these services.

Customers are increasingly trying to cut the cord with their on-premise datacenters due to the high cost of running them and are looking for cheaper alternatives to run their services. Moving our email, collaboration and communication productivity tools to Office 365 is one way that we can work towards achieving that; however, in consuming these Office 365 services, we remove the need for our on-premises Exchange, SharePoint and Lync servers but replace them with servers that are used to synchronise our identities to the cloud or provide the authentication tokens. If these servers go down it could actually have a more detrimental effect than the loss of an on-premise Lync server so we need to pay close attention.

On-Premise Deployments

DirSync, AADSync and FIM for Same Sign-On

If you have opted for a same sign-on deployment then, unless you have a specific need for FIM because you have a complex infrastructure or because you have already deployed FIM and want to leverage that existing investment, you will be deploying DirSync or AADSync. DirSync is the incumbent tool and AADSync is its replacement, although DirSync is still supported and will continue to be supported for a currently undefined period of time, so if you have already deployed DirSync then you don’t need to rush out and upgrade your servers to AADSync.

[Diagram: AADSync on-premise deployment]

None of these products (DirSync, AADSync or FIM) support clustering or high availability, which means you can deploy only a single instance at a time. FIM can be sweet-talked into working as a clustered product but the process is unsupported and documented by Damian Flynn at http://www.damianflynn.com/2012/10/11/fim-sync-server-2010r2-cluster-build/. If you want to provide high availability for these applications then you will need to look outside the operating system and look to your hypervisor, assuming you are deploying them as virtual machines.

For DirSync and AADSync, the easiest thing to do will be to have a disaster recovery plan and a build document for the installation so that if your server melts, you can deploy a new one from a VM template in short order. If you were extra paranoid, you could have a VM deployed with the DirSync or AADSync binaries on the server ready to install (but not installed) so that if you had a failure you could simply run through the installer following your build document and be up and running in about 15 to 30 minutes.

For FIM, you definitely want a backup of this server because a) it’s not highly available and b) there are complex configurations going on here, and trying to re-create this from scratch as we would likely do with DirSync and AADSync wouldn’t be fun nor quick.

ADFS for Single Sign-On

As we know from my previous article, deploying ADFS is more involved and requires more moving parts; however, luckily, all of these parts can be clustered or load balanced accordingly. The exception to this is the DirSync or AADSync server that you deploy in addition to the ADFS servers to sync the users’ identities but not the password attribute to go with them.

[Diagram: ADFS on-premise deployment]

ADFS Proxies which reside in your DMZ network segment are essentially web servers which can be load balanced. We only require Port 443 for HTTPS SSL pages to be made publicly accessible and we can use either Windows native NLB or we can use a hardware or software load balancer such as a Citrix Netscaler, KEMP Loadmaster or an A10 Thunder load balancer. As a side note of interest, Citrix and A10 both support integration into VMM so if you are using a Microsoft private cloud with Hyper-V and VMM in your on-premise network then these products can be easily integrated into your deployment approach for new servers.

ADFS servers live in your clean, corporate network and relay requests from the proxies in your DMZ to the Active Directory domain controllers to retrieve the authorization for users. The ADFS servers use a Windows Internal Database (WID) as standard, which supports farms of up to five ADFS servers; if you need more than five servers, you will need a separate SQL Server for the database which we can of course cluster.
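
As a rough illustration of the WID versus SQL decision, here is a hedged sketch of standing up the first ADFS farm node on Windows Server 2012 R2 with PowerShell; the federation service name, certificate thumbprint and service account are all placeholders and your own values will differ.

```powershell
# Sketch only: first node of an ADFS farm. With no database parameters,
# Install-AdfsFarm uses the Windows Internal Database (WID), which supports
# farms of up to five ADFS servers.
Install-AdfsFarm -FederationServiceName "sts.contoso.com" `
                 -CertificateThumbprint "0123456789ABCDEF0123456789ABCDEF01234567" `
                 -ServiceAccountCredential (Get-Credential "CONTOSO\svc-adfs")

# For farms larger than five servers, point the farm at a (clusterable) SQL Server instead:
# Install-AdfsFarm ... -SQLConnectionString "Data Source=SQL01;Integrated Security=True"
```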

Azure Deployment

DirSync, AADSync and FIM for Same Sign-On

As I mentioned earlier, companies are interested in cord cutting the on-premise datacenter, so moving things into the cloud where it is possible and makes sense helps us to achieve this and, more importantly, reduce the dependency on our internal facilities. With DirSync, AADSync and FIM deployments for same sign-on, we can really easily achieve this with Microsoft Azure.

[Diagram: AADSync deployment in Azure]

The foundation of this design is the Azure VPN, which provides site-to-site connectivity between your on-premises environment and Microsoft Azure. Once you have the Azure VPN configured and working and you have your virtual network segments configured in Microsoft Azure, we need to deploy an Active Directory Domain Controller into Azure using an IaaS VM. This will hold a replica of your Active Directory database from on-premise. We then deploy the DirSync, AADSync or FIM server on an IaaS VM in Azure.

By deploying the synchronisation VM in Azure along with the domain controller, the two servers are within the same AD Site which you will have created for Azure and will allow the synchronisation server to contact AD for the user data it needs and will allow it very fast access to the Azure Active Directory service to sync your objects to the cloud directory.

For high availability in this design, we can deploy multiple Domain Controllers into the Azure AD Site and we can use the Azure VPN service in a point-to-multi-point configuration so that two of your on-premise datacenters have an endpoint for the Azure VPN, meaning we aren’t dependent on a single site for on-premise connectivity. Remember that with this same sign-on deployment, we can only have a single synchronisation server so it is not possible to make this element highly available. When we deploy multiple Active Directory Domain Controllers into Azure, we need to make sure that these VMs go into an Availability Set together, which causes the Azure fabric to place these VMs on separate hosts and racks in the environment so that they are not affected by maintenance operations.
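
To give a flavour of what that looks like with the classic (Service Management) Azure PowerShell module of the time, here is a hedged sketch of placing two replica domain controllers into the same Availability Set; the cloud service, image, subnet and VM names are all placeholders.

```powershell
# Sketch only: two IaaS domain controllers in one Availability Set so the Azure
# fabric keeps them on separate hosts and racks during maintenance operations.
$image = "Windows-Server-2012-R2-Datacenter-Image-Name"   # placeholder gallery image name

$dc1 = New-AzureVMConfig -Name "AZ-DC01" -InstanceSize Small -ImageName $image -AvailabilitySetName "ADDS-AS" |
       Add-AzureProvisioningConfig -Windows -AdminUsername "localadmin" -Password "P@ssw0rd!" |
       Set-AzureSubnet -SubnetNames "AD-Subnet"

$dc2 = New-AzureVMConfig -Name "AZ-DC02" -InstanceSize Small -ImageName $image -AvailabilitySetName "ADDS-AS" |
       Add-AzureProvisioningConfig -Windows -AdminUsername "localadmin" -Password "P@ssw0rd!" |
       Set-AzureSubnet -SubnetNames "AD-Subnet"

New-AzureVM -ServiceName "contoso-ad" -VNetName "Azure-VNet" -VMs $dc1,$dc2
```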

ADFS for Single Sign-On

Using Azure to deploy our ADFS for single sign-on deployment really is the way to showcase the power and ease with which we can deploy a complex solution with Microsoft Azure. The building block of this design is the same as that of the DirSync and AADSync design in that we start out with configuring Azure VPN in a point-to-point or point-to-multi-point connection to our on-premise environment and we then deploy multiple Active Directory Domain Controllers which provide our authentication source.

[Diagram: ADFS deployment in Azure]

In deploying SSO with ADFS, we need to configure, deploy and test the ADFS environment before we deploy and configure the DirSync or AADSync server. I must confess to not fully understanding the reason for this but Microsoft recommend it so we do it. As you would do for an on-premise deployment, we deploy two ADFS Proxies which receive our incoming token requests and we deploy two ADFS Servers which do the back and forth between our ADFS Proxies and the AD Domain Controllers to authorize users.

As we did with our previous same sign-on design for a Microsoft Azure deployment, we need to make sure we place our IaaS VMs into Availability Sets; however, this time we have three sets: one for our Domain Controllers, one for the ADFS Proxies and one for the ADFS Servers.

With an on-premise deployment, we need to buy and configure a load-balancer to use with the multiple ADFS Proxy servers however in Azure we can simply use the Azure Load Balancer to quickly and easily achieve this.
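
As a hedged example using the classic Azure PowerShell cmdlets, adding the ADFS Proxy VMs to a shared load-balanced HTTPS endpoint could look roughly like this; the cloud service, VM and load-balanced set names are placeholders.

```powershell
# Sketch only: put both ADFS Proxy VMs behind one load-balanced 443 endpoint.
foreach ($vmName in "ADFS-PXY01","ADFS-PXY02") {
    Get-AzureVM -ServiceName "contoso-adfs" -Name $vmName |
        Add-AzureEndpoint -Name "HTTPS" -Protocol tcp -LocalPort 443 -PublicPort 443 `
                          -LBSetName "ADFS-LB" -ProbeProtocol tcp -ProbePort 443 |
        Update-AzureVM
}
```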

With the ADFS for single sign-on deployment in Microsoft Azure, if having two VMs running for each service isn’t quite enough availability for you because you are concerned about the resiliency that Microsoft provide within a single Azure datacenter, you could take the deployment to another level and use two datacenters. Using two Microsoft Azure datacenters, we split the services provided by the deployment throughout the Azure environment and use the Azure Traffic Manager to provide Geolocation awareness for users and direct them to the most appropriate ADFS Proxy for authentication. If you are deploying ADFS for single sign-on in an organisation which has users throughout the globe then this could be an ideal deployment to achieve not only your high availability requirements for IT but also to improve the user experience by sending the user to the ADFS Proxy closest to them.
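
To give a flavour of the dual-datacenter option, here is a hedged sketch of a classic Azure Traffic Manager profile using the Performance load-balancing method to send users to the closest ADFS Proxy deployment; the profile, Traffic Manager domain and cloud service names are all placeholders and the monitoring path would need tuning for your own proxies.

```powershell
# Sketch only: latency-based routing between two ADFS Proxy cloud services.
$profile = New-AzureTrafficManagerProfile -Name "contoso-adfs-tm" `
              -DomainName "contoso-adfs.trafficmanager.net" `
              -LoadBalancingMethod Performance -Ttl 30 `
              -MonitorProtocol Https -MonitorPort 443 -MonitorRelativePath "/"

$profile | Add-AzureTrafficManagerEndpoint -DomainName "contoso-adfs-euw.cloudapp.net" -Type CloudService -Status Enabled |
           Add-AzureTrafficManagerEndpoint -DomainName "contoso-adfs-use.cloudapp.net" -Type CloudService -Status Enabled |
           Set-AzureTrafficManagerProfile
```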

Cost Consideration

All of this talk of Azure deployments is very nice but we need to be aware of our costs. Deploying services on-premises has costs associated and these costs are often a lot higher than people realise because they don’t fully take into account every facet of the infrastructure. When you are costing up an on-premise VM you will likely include the cost of the SAN disk it occupies or the percentage of the virtualization host it consumes, but are you factoring in the amount of extra power that server will consume as a result of the extra CPU cycles or the fraction of load it adds to your UPS or your network switches?

Implementing a simple deployment of DirSync or AADSync will be very cheap and I would challenge any enterprise to be able to do it themselves cheaper. A single datacenter deployment of ADFS for single sign-on will cost more to implement than same sign-on due to the increased number of servers and the addition of the Azure Load Balancer however that cost will double if you extend that to a dual Azure datacenter deployment not to mention the increased configuration complexity.

To conclude, make sure you consider the cost and complexity of what you are proposing to implement and make sure that what you plan to do is really what you need to do. If all your users are based in the UK then having that all-singing, dual datacenter, multi-point Azure VPN deployment split across the UK and East Coast US with Geolocation Traffic Management probably isn’t what you really need and you will be wasting both the money to pay for it and the time and complexity to manage it.

Understanding Office 365 and AAD Federated Identity Types

Recently, I’ve undertaken a number of customer chalk and talk sessions on Office 365 to discuss with them some of the benefits they can expect to see from moving from on-premise services to Office 365 hybrid and cloud services. Amongst the myriad of topics that get covered in these sessions, one of the biggest areas for discussion and contention is identity federation from the on-premise environment to Office 365, which uses Azure Active Directory (AAD) as its identity management system.

I thought I would take this opportunity to cover off some of the high-level points of the trade-offs and differences between the ways of achieving identity federation with Office 365 and Azure Active Directory. Please remember that this isn’t an exhaustive list of things to consider but a good taster.

In some future posts, I will be covering deployment scenarios for the two methods of identity federation and also the software we need to configure and deploy in order to make it work.

What is Identity Federation

Simply put, identity federation is the means of allowing your users to log on to both on-premise services and Office 365 and Azure Active Directory authentication-based services with a single identity: the identity they know and love that currently resides in your on-premise Active Directory and which, to most people, is simply the username and password they use to log on to your internal PCs and other systems.

Without identity federation, we have a scenario where users have split personas, resulting in them having one logon for your on-premise services and another for their cloud services with Office 365 and Azure Active Directory. If you work in a highly secure, military-grade environment, perhaps this is actually what you want because you don’t want to potentially expose your internal identities to the cloud, but for 99% of people looking at Office 365, you want the simplified experience for your end-users of having just one credential to rule them all.

Single Sign-On vs. Same Sign-On

Single Sign-On and Same Sign-On are the same yet different. Single Sign-On refers to the ability to log on once, such as to a domain-joined Windows desktop PC, and then not have to re-enter credentials for any of the services you consume during that session. This is the holy grail of integration scenarios where the user experience is totally seamless and the user doesn’t need to think about anything other than which app or product they want to work with next.

Same Sign-On refers to using a single identity, provided by our identity federation solution; however, instead of the user seamlessly being logged into these systems, the user may be prompted at various stages to re-enter their credentials in order to authenticate to a web service or an application such as the Lync client or a SharePoint team site.

Both of these scenarios are achievable with Office 365 and Azure Active Directory; however, achieving one over the other increases the amount of work and effort required upfront. I will explain the technical differences in more detail further on but for now, know that Same Sign-On is the easiest to implement and requires nothing more than one server to run the synchronisation software. Single Sign-On requires more servers to be deployed and it requires some firewall reconfiguration in order to allow the services to be properly accessible.

Deciding Which Identity Type is Right for You

In this section, I will talk about some of the determining factors for deciding between same and single sign-on.

User Experience

The end-user experience is obviously high on the agenda when it comes to deciding between the two identity federation modes. Same sign-on means users can login to services with the same username and password whilst single sign-on allows users to seamlessly move between applications and services without the need to re-authenticate.

The winner here is clearly single sign-on; however, this comes with a caveat that Outlook does not behave in a truly singular fashion like the other applications do.

Due to the way that Office 365 uses the RPC over HTTPS protocol with Basic Authentication, Outlook users will still be prompted to enter their credentials even when single sign-on is deployed and properly configured. This is a common misconception and people see it as a problem with their single sign-on configuration or deployment but sadly it is just the way that Outlook behaves. The workaround to this issue is for users to select the Remember My Credentials option and their password will be cached in the Windows Credential Manager until such a time that the user changes their password.

Password Security

Password security is in my opinion, the number one reason that people consider the single sign-on deployment over same sign-on.

With a same sign-on deployment, the authentication to the Office 365 and Azure Active Directory service is performed within AAD. The synchronisation software that is run within your organisation syncs both the users and their passwords to the cloud AAD directory. When a user requests access to a service, the password entered by the user is sent to AAD for verification and authorization. All this means that there are copies of your passwords stored in AAD.

With single sign-on, your passwords are not stored in the cloud AAD directory but instead, only the usernames and a few other attributes of the users. When a user requests a login to a service, the authentication request is forwarded to servers within your organisation known as proxies which then forward the request to your internal Active Directory domain controllers. Once authorized, a token is sent back to Office 365 and Azure Active Directory to approve the login. In a nutshell, the passwords never leave your environment, only tokens approving or denying the connection.

The winner for password security is likely to be single sign-on. The idea of having passwords stored in the cloud is too much for some organisations whereas for others, the idea of the authentication tokens flying back and forth is equally bad coupled with the exposure a single sign-on solution potentially adds to your internal directory environment.

Password Changes

Users need to be able to change their passwords in accordance with your organisations security policy. Typically, in order for a user to make that change, they need to be either on-premise or connected to the on-premise environment via a VPN or such if they are working remotely. Windows Server DirectAccess helps with password changes amongst many other scenarios where the users need to be connected to the corporate network.

If you opt for a same sign-on implementation then by enabling the Password Write-Back feature in the sync application, we can enable users to change their passwords without requiring a connection back to the corporate environment. This password change can be performed via one of the Office 365 application websites such as Outlook Web App. Once the password is changed by the user, it is written to the cloud Azure Active Directory and is synchronised back to the corporate directory when the next synchronisation occurs.

If you deploy a single sign-on model then you do not have the flexibility of enabling users to perform password changes via the portals because there is no cloud storage of the password and the cloud environment is not aware of the user’s password, only that they were authenticated from your environment.

If you are looking for ways to reduce the user reliance on VPN technologies and enable them to do more of their work remotely using cloud based applications then same sign-on could provide an added benefit here. Companies which use single sign-on will need to maintain these VPN technologies even if simply to allow users to change their passwords as required. As I have mentioned previously, DirectAccess, if not already deployed, could be a real answer here as it provides always-on connectivity back to the corporate environment and does not require the user to interact with a VPN client, improving the user experience.

Access Revocation

In a scenario where you need to revoke somebody’s access very quickly, either due to a confidentiality issue or because an employee has gone rogue, single sign-on is the clear winner. Same sign-on works by synchronising the state of user objects periodically based on a scheduled job from on-premise to Azure Active Directory. If a user account is disabled on-premise then it could take some time for that change to make its way to the AAD directory and further damage could be done in the scenario as a result.

If you are using single sign-on, when a user account is disabled, that user account will no longer be able to authenticate to Office 365 and AAD federated applications as soon as the account is disabled, because those services will no longer be able to authorize that user given that the authentication request is sent directly into your environment.
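
If you are on same sign-on and need to shorten that window, a hedged sketch of forcing the change out is below; the account name and the tool paths are placeholders, and the exact command depends on whether you run DirSync or AADSync.

```powershell
# Sketch only: disable the account on-premise, then push a delta sync rather than
# waiting for the next scheduled run.
Disable-ADAccount -Identity "jbloggs"

# DirSync: the sync cmdlet ships with the tool (normally loaded via its DirSyncConfigShell console).
Start-OnlineCoexistenceSync

# AADSync: a delta sync is triggered with the client executable instead, for example:
# & "C:\Program Files\Microsoft Azure AD Sync\Bin\DirectorySyncClientCmd.exe" delta
```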

Availability and Reliability

This point is closely linked to the password security requirement, as is configuration complexity. In order for users to be able to sign-in to Office 365 and AAD linked services, there needs to be an authentication service available to process the request. For same sign-on where the passwords are stored in the cloud directory, the availability and reliability is provided by Microsoft Azure.

As we understand from the Microsoft Azure SLA page at http://azure.microsoft.com/en-gb/support/legal/sla/ Azure Active Directory Free provides no guarantee of uptime or SLA although through my personal experience and use of it, I have never seen a problem with it being available and working. Azure Active Directory Basic and Premium both offer a 99.9% enterprise SLA along with a slew of other features but at a cost.

For deployments using single sign-on, because the authentication requests are redirected to servers which are maintained by you as the customer, the availability and reliability of the authentication service is dependent on you and depends on a number of factors: we have authentication proxies, authentication servers and domain controllers, all of which need to be available for the solution to work, not to mention any firewalls, load balancers, networks, internet connections, service providers and power sources consumed by these servers.

I’ll be covering some deployment scenarios for both same and single sign-on in a future post however for now, we’ll assume all of this resides in your on-premise datacenters. If you have reliable on-premise servers and infrastructure services and you can provide a highly available solution for single sign-on then you will have no problems however if any of the components in the single sign-on server chain fail, users will be unable to authenticate to Office 365 or AAD federated applications which will cause a user experience and an IT support issue for you.

Configuration Complexity

Taking what we learn from the reliability and availability information above, it is fairly apparent that there are more moving parts and complexities to the single sign-on implementation. If you as an organisation are looking to reduce your configuration complexity because you want to unlock time from your internal IT resources, or you are looking to improve your uptimes to provide a higher level of user satisfaction, then you should consider whether the complexity of a single sign-on implementation is right for you. In theory, once the solution is deployed it should be no hassle at all but we all know that servers go bad from time to time and even when things are working right there are still things to contend with like software updates, security patching, backups and so forth.

The same sign-on implementation requires only one server to implement, requires no firewall changes to allow inbound traffic to your network (all communication is based on an outbound connection from the server) and, if you wanted to, you could even omit the backups because the installation of the software is very simple and can be recovered from scratch in little time at all.

Same sign-on is definitely the winner in the complexity category, so if you have a small or over-worked IT department then this could be an assist in the war against time spent on support issues; however, single sign-on clearly provides a richer user experience so it needs to be a balanced debate about the benefits of avoiding the configuration complexity.

Software Implementation

In this section I am going to quickly cover off the software we use for these scenarios but I am not going to go into their configuration or the deployment scenarios as I want to save that for another post rather than risk this post dragging on too long. The other reason I want to cover these here is that throughout this post I have talked of same sign-on and single sign-on but I want to translate that into software terms for later reference.

Same Sign-On using DirSync, AADSync or FIM

Same sign-on is implemented using the Directory Synchronisation (DirSync) tool, the Azure Active Directory Sync (AADSync) tool or Forefront Identity Manager with the Azure AD Connector.

The DirSync tool has been around for quite some time now and has been improved with new features from one iteration to the next. For example, initial versions of DirSync didn’t support Password Sync to the cloud environment, which meant users had different passwords for on-premise and cloud and was a reason a lot of early Office 365 adopters didn’t adopt DirSync. With more recent additions to DirSync to support Password Sync and Write Back from the AAD cloud environment, DirSync has become a lot more popular and I find that the majority of customers seeking a fast track adoption of Office 365 go for DirSync. DirSync has a limitation of only working for environments with a single Active Directory forest, which makes it a non-starter for customers with more complex internal environments.

Forefront Identity Manager (FIM) with the Azure AD Connector is DirSync on steroids or rather, DirSync is a watered down version of FIM: a pre-packaged and pre-configured version of it. Using the full FIM application instead of DirSync enables support for multi-forest environments and also for environments where identity sources include non-Active Directory environments such as LDAP or SQL based authentication sources. For customers wanting a same sign-on experience but with a complex identity solution internally, FIM was the only option. If you have FIM deployed in your environment already then you may want to take this route in order to help you sweat the asset.

AADSync is a relatively new tool and only came out of beta in September 2014. The AADSync tool is designed to replace DirSync as it provides additional features and functionality over DirSync; for example, AADSync has support for multi-forest environments (although still only Active Directory based) making it much more viable for larger, complex customer environments. It also allows customers to control which attributes and user properties are synchronised to the cloud environment, making it better for the more security conscious amongst us.
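
Whichever of the three tools you land on, a quick hedged check of what the tenant thinks is happening can be done with the MSOnline (Azure AD) PowerShell module; the properties below are the ones I would typically look at, assuming the module is installed and you have tenant administrator credentials.

```powershell
# Sketch only: confirm directory synchronisation is enabled and when it last ran.
Import-Module MSOnline
Connect-MsolService   # prompts for tenant administrator credentials

Get-MsolCompanyInformation |
    Select-Object DirectorySynchronizationEnabled, LastDirSyncTime, PasswordSynchronizationEnabled
```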

Single Sign-On using ADFS

Single sign-on is implemented using Active Directory Federation Services (ADFS). ADFS is deployed in many different configurations according to the requirements of the organisation but in its simplest form, it requires an ADFS Proxy, which is a server residing in your DMZ responsible for accepting the incoming request for authorization from Office 365 and AAD to your environment. This is passed to an ADFS Server which resides on the internal network and communicates with the Active Directory environment to perform the actual authorization.
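
Once the ADFS farm and proxies are in place, the tenant domain has to be converted from standard (managed) to federated so that Office 365 and AAD hand authentication off to your ADFS servers. A hedged sketch with the MSOnline module, using placeholder domain and server names, is below.

```powershell
# Sketch only: point the MSOnline cmdlets at the primary ADFS server, then federate the domain.
Import-Module MSOnline
Connect-MsolService

Set-MsolADFSContext -Computer "adfs01.corp.contoso.com"
Convert-MsolDomainToFederated -DomainName "contoso.com"

# Verify what was written on both sides:
Get-MsolFederationProperty -DomainName "contoso.com"
```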

There is an additional component in the mix and that is the requirement for either a DirSync or an AADSync server; however, this is deployed only to push the user objects into AAD. There is no password sync or write back of attributes occurring here; this server is there just to let AAD know what users you have, so that AAD knows whether a valid username has been entered before passing the request down to your environment for authorization.

Because of the public nature of ADFS, it requires you to have public IP addresses available, certificates for the URLs used by ADFS and also requires you to have a DMZ segment exposed to the internet.

Deployment Scenarios

In a future post which I hope won’t take me too long to make available, I will talk about some of the deployment scenarios and options for deploying both same sign-on and also single sign-on. I will also cover how the release of AADSync affects existing deployments of DirSync or FIM.

Support My Wife Without a Penny from Your Pocket

Apologies to those who come here for a techno-fix but I’ve got to drop one personal post in here from time to time 🙂

My wife who is currently a 3rd year university student studying a degree in Midwifery has the option to be able to do an elective period of voluntary placement work overseas during this, her last year of the course. She is looking to work in Gibraltar for two weeks to complete this elective work module because it gives her a taste of how they do things overseas without the complexities of a language barrier.

These elective placements are not paid work; the students are essentially working for free in exchange for getting hands-on experience and learning.

Because there are already enough charities asking for your money each month, she’s decided to take a different tack to fundraising using a site called Easy Fundraising. The way this site works is that when you go about your travels of the Internet, buying your wares from eBay, Amazon or other retailers, Easy Fundraising collect a small percentage of the value of the sale from the online stores in referral fees.

Simply visit http://www.easyfundraising.org.uk/causes/nicolagreen/ or using the bit.ly short-link I have created http://bit.ly/nickygib and make your online purchases via the Easy Fundraising site and we collect the referral bonuses. Sadly, I must confess that you do need to register on the Easy Fundraising site before you can support a cause.

International Technology Frustration

We live in a world where our communications are sent around the world in sub-second times thanks to services like Twitter, Facebook and WhatsApp. Thanks to Facebook, LinkedIn and other people hubs we are closer connected to those around us without geographic discrimination and thanks to all of this high-speed communication and information transfer, we discover news and new information faster than ever before.

Taking all of this into consideration, why is it that we are still in a world where one country takes the glut of the new technology releases without them officially seeing the streets of foreign lands, only serving to line the pockets of the lucky few who are able to import and export these technologies and sell them abroad via channels like eBay at exorbitant prices?

In the technology arena, Microsoft are one of the worst offenders for doing this. There have been a number of releases over the years, including but not limited to the Zune, Surface Pro, the Microsoft Band and the Wireless Display Adapter for Miracast screen mirroring, and none of them have been released outside of the borders of the US and Canada. Why is it that these highly sought after devices are only being sold in the US and not sold worldwide via Microsoft’s normal retail channels?

I remember when the Surface Pro first launched and I waited months to get one officially in the UK but it never came, so I ended up importing one from the US with help from a former co-worker. Back when Zune was a thing, I happened to be in the US on a long bout of work with my family in tow so I decided to buy one whilst I was there. I, for one, would snap up a pair of Wireless Display Adapters and a Microsoft Band the day that they went on sale if they did ever appear here in the UK, but I’m not holding out much hope, which leaves me with the remaining option of buying them via eBay sellers.

The Microsoft Band is in high demand right now and whilst there are a few of them on eBay UK for sale, the price is riding higher than retail and, given that the device isn’t officially available here in the UK, you don’t know how your warranty will be affected.

The Wireless Display Adapter isn’t quite so hot, largely because other competing products are available in the UK such as the Netgear PTV3000 and as a result of this, if I wanted one, I’d have to buy one from a seller on eBay US and pay whatever import and duty taxes the British government deemed appropriate and then pay whatever handling tax DHL or UPS levy on the shipment for the privilege of advancing my customs payment for me.

All this behaviour results in is a reduced consumer experience because there are devices out there that we want and the companies making them aren’t making them available to us, so middle-men fill the void, lining their own pockets with profit and driving the retail price up for consumers like you and me. I know that beaming a packet of data down an undersea fibre is obviously easier than arranging shipping and stocking of physical goods, but my point here is that with all of this technology to tell us what is happening around the world and to let us see what we could have, it’s akin to teasing a kid with a lollipop: waving it in front of their face, showing them it, videoing you licking it and playing it over and over again in their face. The kid will end up crying and wanting the lolly and you’d likely give in and let them have it after enough tantrums, so why can’t companies see the same logic?

If the trend of devices only being released into the US and not being made available in Europe and the UK (and let us not forget our friends in Australia and New Zealand) continues, then I think anything relating to those devices should be put behind IP filters to block people outside of the availability regions from seeing, hearing or reading anything about them. At least that way, we wouldn’t have the lolly being waved under our noses to tempt us without the opportunity to ever have the lolly.

Free Fitbit Flex with Windows Phone Purchases

If you’re in the market for both a new smartphone and a fitness aid this year, Windows Phone could definitely be your friend.

Microsoft UK are currently running a promotion that started on January 12th 2015 and runs until March 31st 2015. If you purchase a Microsoft Lumia 735, 830 or 930 between these dates from one of the eligible retailers (almost all UK high street and network outlets are listed) then you can claim a free Fitbit Flex fitness activity and sleep tracking device.

To find out more information about the details, visit http://www.microsoft.com/en-gb/mobile/campaign-fitbit/. If you want to skip straight to claiming your Fitbit device or want to know if your device is eligible then download the Fitbit Gift app from the Windows Phone Store at http://www.windowsphone.com/en-gb/store/app/fitbit-gift/ee34cfd1-e302-4820-a3cc-0d4e349ccf6a.

I’m a Fitbit user so I like the idea of this promotion but I equally struggle to understand it: Microsoft are now in the fitness, activity and sleep tracking business with the Microsoft Band but, as we know, this isn’t available in the UK right now. I have to question whether this promotion would instead feature the Microsoft Band if it were available here. Given that the Flex retails for £60 and the Microsoft Band is $200 in the US, I can’t imagine it would be a free promotion like they have on the Flex, but I think it would likely be a discount code for £50 off the price of a Microsoft Band.

Fingers crossed the Microsoft Band makes its way UK-side via official channels one day soon and the promotion will flip on its head. Don’t forget that all Windows Phone 8.1 devices are going to be eligible for Windows 10 upgrades once the new OS ships too.

Invalid License Key Error When Performing Windows Edition Upgrade

Last week, I decided to perform the in-place edition upgrade from Windows Server 2012 R2 Essentials to Windows Server 2012 R2 Standard on my home server as part of a multitude of things I’m working on at home right now. Following the TechNet article for the command to run and the impact and implications of doing the edition upgrade at http://technet.microsoft.com/en-us/library/jj247582 I ran the command as instructed in the article but I kept getting a license key error stating that my license key was not valid.

As my server was originally licensed under a TechNet key, I wondered if the problem could be down to different licensing channels preventing me from installing the key. On the server, I ran the command cscript slmgr.vbs /dlv to display the detailed license information and the channel was reported as Retail as I expected for a TechNet key. The key I am trying to use is an MSDN key which also should be reported as part of the Retail channel but to verify that, I downloaded the Ultimate PID Checker from http://janek2012.eu/ultimate-pid-checker/ and my Windows Server 2012 R2 Standard license key, sure enough is good, valid and just as importantly, from the Retail channel.

So my existing and new keys are from the same licensing channel and the new key checks out as being valid so what is the problem? Well it turns out, PowerShell was the problem.

Typically I launch a PowerShell prompt and then I enter cmd.exe if I need to run something which explicitly requires a Command Prompt. This makes it easy for me to jump back and forth between PowerShell and Command Prompt within a single window, hence the reason for doing it. I decided to try it differently so I opened a new Administrative Command Prompt standalone, without using PowerShell as my entry point, and the key was accepted and everything worked as planned.
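
For reference, the edition upgrade command from the TechNet article looks like the line below (the product key shown is just a placeholder); in my case it only succeeded when typed into a natively launched, elevated Command Prompt rather than a cmd.exe session spawned from within PowerShell.

```
DISM /Online /Set-Edition:ServerStandard /ProductKey:XXXXX-XXXXX-XXXXX-XXXXX-XXXXX /AcceptEula
```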

The lesson here is this: if you are entering a command into a PowerShell prompt and it’s not working, try it natively within a Command Prompt as that may just be your problem.

Tesco Hudl 2 Date and Time Repeatedly Incorrect

Since about a week ago, the kids’ Tesco Hudl 2 tablets that they got for Christmas have been consistently reporting the wrong date and time. The issue is easily spotted because any time they launch an app, open the Google Play Store or perform any action that depends on an SSL certificate, they are shown a certificate warning due to the inconsistency between the server date and time and the client date and time. Sometimes the tablet can appear just a few hours out of sync but in the main, it seems that the devices reset their date to January 1st 2015.

Yesterday, I noticed for the first time that my Hudl 2 tablet started exhibiting the same behaviour, which led me to look online to see if this is a widespread issue as I couldn’t believe that all four of our Hudl 2 tablets could show the same symptoms and problems within two weeks of each other, especially considering I bought my tablet about a month after we bought the kids theirs so they would likely be from different manufacturing batches.

Searching online, I came across a thread on the Modaco forums at http://www.modaco.com/topic/373796-misreported-time-and-other-things/ where other users are reporting the same issue and that it only seems to manifest after circa one month of using the device: an interesting observation given that I first powered up the kids tablets the week before Christmas to configure them and I got mine the week after Christmas.

Several users have tried contacting Tesco Technical Support and are advised to hard reset the devices or to exchange them in a local store, but the issue continues to return and it appears from one commenter that Tesco is now working on a firmware update to address the issue. To me, this says that the current firmware build clearly has an issue with how it tracks the time against the CPU clock.

I reached out to Tesco on Twitter today to try and find out if it’s possible to contact their support via email or Twitter as opposed to phone, as I don’t want to have to call them to add four new serial numbers to the list of affected devices that they are tracking. If I get a response, I’ll update the post here but in the meantime, if you have a Hudl 2 from Tesco and are experiencing the same date and time reset issue, it’s not you, it appears to be a known problem they are working on, but please do report it to Tesco.

The more people that report the issue, the faster Tesco are likely to work on the firmware update and get it released.


Project Home Lab: Planning for Recovery

In my last post, Server Surgery Aftermath, I talked about the issues I was having with my home server. Whilst continuing to try and identify the issues after the post, I ran into some more BSODs and I managed to collect useful crash dumps for a number of them. Reviewing the crash dumps with WinDbg from the Windows Debugging Tools, I was able to see that in every instance of the BSOD, the faulting module was network related, with the blame shared equally between Ndis.sys and NdisImPlatform.sys, which means that my previous suspicions of the LSI MegaRAID controller were out of the window.
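
For anyone wanting to do the same, a rough sketch of pulling the faulting module out of a minidump with the command-line debugger is below; the install path, symbol cache and dump file name are placeholders for whatever your system uses.

```powershell
# Sketch only: run !analyze -v against a minidump with kd.exe from the Windows Debugging Tools.
cd "C:\Program Files (x86)\Windows Kits\8.1\Debuggers\x64"
.\kd.exe -z "C:\Windows\Minidump\example.dmp" `
         -y "srv*C:\Symbols*http://msdl.microsoft.com/download/symbols" `
         -c "!analyze -v; q"
# The MODULE_NAME / IMAGE_NAME lines in the output name the faulting driver,
# which in my case pointed at Ndis.sys and NdisImPlatform.sys.
```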

Included in the trace was the name of another application which is running on the server. I’m not going to name the application in this instance but let’s just say that said application is able to burst ingress traffic as fast as my internet connection can handle it. I decided to intentionally try and make the server crash by starting up the application and generating traffic with it and sure enough within a couple of minutes the server experienced a BSOD and restarted. This started to now make sense because the Windows Service for this application is configured for Automatic Delayed start which is why in one instance after a BSOD, the server had another BSOD about 45 seconds later.

For the interim, I have disabled the services for this application and, with the information in hand, I started looking more closely into the networking arrangements. I knew that as part of the server relocation, I had switched from my dual port PCIe Intel PRO 1000/PT adapter to the on-board Intel 82576 adapters, and both of these adapter ports are configured in a single Windows Server native LBFO team using the Static Team mode, which is connected to a static LAG on my switch.
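
For completeness, a hedged sketch of how that sort of team is built with the native Windows Server 2012 R2 LBFO cmdlets follows; the team and adapter names are placeholders and the switch-side static LAG has to be configured to match.

```powershell
# Sketch only: two-port static team on the on-board adapters; the physical switch
# ports must be configured as a matching static LAG.
New-NetLbfoTeam -Name "LAN-Team" -TeamMembers "Ethernet","Ethernet 2" `
                -TeamingMode Static -LoadBalancingAlgorithm TransportPorts

Get-NetLbfoTeam -Name "LAN-Team"   # confirm the team and its members come up as Active
```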

To keep this story reasonably short, it turns out that the Windows Update provided network driver for my Intel adapters is quite old, yet the 19.5 driver set that Intel advertise as the latest available for my adapters doesn’t support Windows Server 2012 R2 and will only install on Windows Server 2012. Even booting the server into the Disable Driver Signature Enforcement mode didn’t allow the drivers to install. I quickly found that many other people have had similar issues with Intel drivers due to them blocking drivers on selected operating systems for no good reason.

I found a post at http://foxdeploy.com/2013/09/12/hacking-an-intel-network-card-to-work-on-server-2012-r2/ which really helped me understand the Intel driver and how to hack it to remove the Windows Server 2012 R2 restrictions to allow it to be installed. The changes I had to make differed slightly due to me having a different adapter model but the process remained the same.

Because my home server is considered production in my house, I can’t just go right ahead and test things on it like hacked drivers, so luckily my single hardware architecture vision came out on top because I’ve installed the hacked and updated Intel driver on the Lab Storage Server and the Hyper-V server with no ill effects. I’ve even tested putting load between the two of them over the network and there have been no issues either, so this weekend I will be taking the home server’s life in my hands and replacing the drivers and hopefully that will be the fix.

If you want to read my full story behind the Intel issue troubleshooting, there is a thread I started on the Intel Communities (with no replies I may add) but all the background detail is there at https://communities.intel.com/thread/58921?sr=stream.

Project Home Lab: Server Surgery Aftermath

So it seems that my last post about relocating the Home Server into the new chassis was written a little too soon. Over New Year, a few days after the post, I started to have some problems with the machine.

It first happened when I removed the 3TB drive from the chassis to replace it with the new 5TB drive, which caused a Storage Spaces rebuild, and all of the drives started to chatter away copying blocks around. About half-way through the rebuild, the server stopped responding to pings. I jumped on to the IPMI remote console expecting to see that I was using so much I/O on the rebuild that it had decided to stop responding on the network, but in actual fact, the screen was blank and there was nothing there. I tried to start a graceful shutdown using the IPMI but that failed so I had to reset the server.

When Windows came back up, it greeted me with the unexpected shutdown error. I checked Storage Spaces and the rebuild had resumed with no lost data or drives and eventually (there’s a lot of data there) it completed, and I thought nothing more of it all until New Year’s Day when the same thing happened again. This time, after restarting the server and realising this was no one-off event, I changed the Startup and Recovery settings in Windows to generate a Small Memory Dump (256KB), otherwise known as a Minidump, and I also disabled the automatic restart option as I wanted to try and get a chance to see the Blue Screen of Death (BSOD) if there was one.
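
The same Startup and Recovery changes can be scripted if you prefer; a minimal sketch of the registry values behind the GUI settings is below.

```powershell
# Sketch only: CrashDumpEnabled 3 = Small memory dump (256 KB),
# AutoReboot 0 = stay on the blue screen instead of restarting automatically.
$crash = "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl"
Set-ItemProperty -Path $crash -Name CrashDumpEnabled -Value 3
Set-ItemProperty -Path $crash -Name AutoReboot       -Value 0
# A restart is needed before the new settings take effect.
```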

Nothing happened on this front until yesterday. The server hung again and I restarted it but within two minutes of hanging, it did the same thing again. I decided to leave the server off for about five minutes to give it a little rest and then power it back up and since then I’ve had no issues, but I have used the time wisely to gather a lot of data and information.

I used WinDbg from the Windows Debugging Tools in the Windows SDK to read the Minidump file and the resultant fault code was WHEA Uncorrectable Error with a code of 0x124. To my sadness, this appears to be one of the most vague error messages in the whole of Windows. This code means that a hardware fault occurred which Windows could not recover from but because the CPU is the last device to be seen before the crash, it looks as if the fault is coming from the CPU. The stack trace includes four arguments for the fault code and the first argument is meant to contain the ID of the device which was seen by the CPU to have faulted but you guessed it, it doesn’t.
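
One additional place worth looking when chasing a 0x124 is the WHEA-Logger events that Windows writes to the System log; a short hedged sketch of pulling them out is below.

```powershell
# Sketch only: list recent WHEA hardware error records from the System event log.
Get-WinEvent -FilterHashtable @{ LogName = 'System'; ProviderName = 'Microsoft-Windows-WHEA-Logger' } -MaxEvents 20 |
    Select-Object TimeCreated, Id, LevelDisplayName, Message |
    Format-List
```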

So knowing that I’ve got something weird going on with my hardware, I considered the possibilities. The machine is using a new motherboard so I don’t suspect that initially. It’s using a used processor and memory from eBay, which are suspects one and two, and it’s using the LSI MegaRAID controller from my existing build. The controller is a suspect due to the fact that on each occasion the crash has occurred, there has been a relative amount of disk I/O taking place (a Storage Spaces rebuild the first time and multiple Plex Media Server streams on the other occasions).

The Basic Tests

First and foremost, I checked against Windows Update and all of my patches are in order which I already knew but wanted to verify. Next, I checked my drivers as a faulting driver could cause something bad to get to the hardware and generate the fault. All of the components in the system are using WHQL signed drivers from Microsoft which have come down through Windows Update except for the RAID Controller. I checked the LSI website and there was a newer version of the LSI MegaRAID driver for my 9280-16i4e card available as well as a new version of the MegaRAID Storage Manager application so I applied both of these requiring a restart.

I checked the Intel website for drivers for both the Intel 82576 network adapters in the server and the Intel 5500 chipset and even though the version number of the Intel drivers is higher than those from Windows Update, the release date on the Windows Update drivers is later so upon trying to install them, Windows insists that the drivers installed are the best available so I’ll leave these be and won’t try to force drivers into the system.

Next up, I realised that the on-board Supermicro SMC2008 SAS controller (an OEM embedded version of the LSI SAS2008 IR chipset) was enabled. I’m not using this controller and don’t have any SAS channels connected to it so I’ve disabled the device in Windows to stop it from loading for now, but eventually I will open the chassis and change the pin jumper to physically disable the controller.

Earlier, I mentioned that I consider the LSI controller to be a suspect. The reason for this is not reliability of any kind as the controller is amazing and frankly beyond what I need for simple RAID0 and RAID1 virtual drives but because it is a very powerful card, it requires a lot of cooling. LSI recommend a minimum of 200 Cubic Feet per Minute (CFM) of cooling on this card and with the new chassis I have, the fans are variably controlled by the CPU temperature. Because I have the L5630 low power CPU with four cores, the CPU is not busy in the slightest on this server and as a result, the fan speed stays low.

According to the IPMI sensors, the fan speed is at a constant 900 RPM with the current system and CPU temperatures. The RAID controller is intentionally installed in the PCI Express 8x slot furthest from the CPU to ensure that heat is not bled from one device into the other, but a byproduct of this is that the heat on the controller is likely not causing a fan speed increase. Using the BIOS, I have changed the fan configuration from the default setting of most efficient, which has a minimum speed of 900 RPM, to the Better Cooling option which increases the lower limit to 1350 RPM.

Lastly, I raised a support incident with LSI to confirm if there is a way to monitor the processor temperature on the controller; however, they have informed me that only the more modern dual core architecture controllers have the ability to see the processor temperature either via the WebBIOS or via the MSM application. If I have more problems going forwards, I have a USB temperature probe which I could temporarily install in the chassis, but this isn’t going to be wholly accurate. In the meantime, the support engineer at LSI has taken an LSIGET dump of all of the controller and system data and is going to report back to me if there are any problems he can see.

The Burn Tests

Because I don’t want on-going reliability problems, and I want to drive the server to crash on my own schedule and see the problems happening live so that I can try and resolve them, I decided to perform some burn tests.

Memory Testing

Memory corruption and issues with memory are a common cause of BSODs in any system. I am using ECC buffered DIMMs which can correct memory bit errors automatically, but that doesn’t mean we want them happening, so I decided to do a run of Memtest86.

[Screenshot: Memtest86 memory speed results]

I left this running for longer than the screenshot shows, but as you can see, there are no reported errors in Memtest86 so the memory looks clear of blame. What I really like about these results is that they show how incredibly fast the L1 and L2 caches are on the processor, and I’m even quite impressed with how fast the DDR3-10600R memory in the DIMMs themselves is.

CPU Testing

For this test, I used a combination of Prime95 and RealTemp to both make the CPU hurt and also to allow me to monitor the temperatures vs. the Max TDP of the processor. I left the test running for over an hour at 100% usage on all four physical cores and here are the results.

[Screenshot: RealTemp CPU temperature results]

As you can see, the highest the temperature got was 63 degrees Celsius, which is 9 degrees short of the Max TDP of the processor. When I log in to the server normally while there are multiple Plex Media Server transcoding sessions occurring, the CPU isn’t utilized as heavily as in this test, so the fact that it can run at full load with the cooling sufficient to keep it below Max TDP makes me happy. As a result of the CPU workload, the fan speed was automatically raised by the chassis. Here’s a screenshot of the IPMI sensor output for both the system temperatures and the fan speed, remembering that the normal speed is 1350 RPM after my change.

[Screenshot: IPMI fan speed sensors]

[Screenshot: IPMI temperature sensors]

To my surprise, the system temperature is actually lower under load than it is at idle. The increased airflow from the fans at the higher RPM is pushing so much air that it’s able to cool the system to two degrees below the normal idle temperature, neither of which is high by any stretch of the imagination.

Storage I/O Testing

With all of the tests thus far causing me no concern, I was worried about this one. I used ioMeter to test the storage and, because ioMeter will fill a volume with a working file to undertake the tests, I created a temporary 10GB Storage Space in the drive pool and configured the drive with the Simple resiliency level and 9 columns so that it would use all the disks in the pool to generate as much heat in the drives and on the controller as possible.
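
For reference, a hedged sketch of how such a throwaway space could be created with the Storage cmdlets is below; the pool and virtual disk names are placeholders for my own.

```powershell
# Sketch only: 10GB Simple (non-resilient, striped) space across 9 columns so every
# disk in the pool takes part in the ioMeter test, then bring it online as a volume.
New-VirtualDisk -StoragePoolFriendlyName "HomePool" -FriendlyName "IOMeter-Test" `
                -Size 10GB -ResiliencySettingName Simple -NumberOfColumns 9 -ProvisioningType Fixed

Get-VirtualDisk -FriendlyName "IOMeter-Test" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel "IOMeter-Test" -Confirm:$false
```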

I ran three tests: 4K block 50% read, 64K block 50% read and lastly 256KB block 50% read. I ran the tests for ten minutes and, visiting the garage to look at the server in the rack while this was happening, I was greeted by an interesting light show on the drive access indicator lights. After ten minutes of the sustained I/O, nothing bad happened so I decided to stop the test. Whilst I want to fix any issues, I don’t want to burn out any drives in the process.

Conclusion

At the moment, I’m really none the wiser as to the actual cause of the problem but I am still in part convinced that it is related to the RAID controller overheating. The increased baseline fan speed should hopefully help with this by increasing the CFM of airflow in the chassis to cool the controller. I’m going to be leaving the server be now until I hear from LSI with the results from their data collection. If LSI come up with something useful then great. If LSI aren’t able to come up with anything then I will probably run another set of ioMeter tests but let it run for a longer period to really try and saturate some heat into the controller.

With any luck, I won’t see the problems again and if I do, at least I’m prepared to capture the dump files and understand what’s happening.