Error when replacing node in a DirectAccess NLB cluster


Most companies running Microsoft DirectAccess for their remote access needs rely on it enough that they have taken the step of creating a DirectAccess cluster with two or more DirectAccess servers. DA clustering does not actually have anything to do with Windows Server failover clustering. Instead, it creates a fully active/active Network Load Balancing (NLB) array of servers that gives you resiliency in the event of a single DirectAccess server failure.

By using the built-in NLB feature in the Remote Access Management Console, you can run up to 8 simultaneous nodes in this active/active array, and adding or removing servers from that array is a simple matter of running through a quick wizard. As long as all of your DA servers are online and running, the wizard works without trouble.

However, if one of your DirectAccess servers goes completely offline and is unavailable on the network, the Add/Remove Servers wizard gets a little buggy. Say you have a server that has gone down, but DA is still running fine because your remaining node picks up the slack. The fix should be as simple as prepping a new replacement server, giving it a new hostname on the domain, and then using the “Add Server” function to bring this new node into the array. When you attempt to add the new server to the array, however, you receive an error message that states: “At least one available server must remain in the cluster. A connection cannot be established to any server that is not selected for removal.”

This error does not allow you to continue adding the new server node to your array. It is essentially the direct result of the DA array still containing a reference to the “dead” server. As long as the offline node continues to be shown inside the Remote Access Management Console, you will receive this error no matter how many times you try adding the new server. So the answer must be removing the dead server before adding the new one, right? Yes, that is correct, except it’s not as easy as it should be. When you run through the same Add/Remove Servers wizard and try to remove the bad server from the array, you get the exact same error message!

It seems that we are stuck, but removing the reference to the dead DA server is actually quite easy. You just have to do it directly from inside Group Policy Management. Open up gpmc.msc and browse to the GPO that contains your DirectAccess Server settings. You will see inside the Security Filtering section of the GPO that both DirectAccess server names are listed here.

Simply select the DirectAccess server that is offline (for me it was DAEXT-EAST2), and Remove it from the GPO Security Filtering. Make sure to leave the good DirectAccess server in this list!
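If you want to double-check which node is really the dead one before touching the Security Filtering list, a quick reachability test can help. The following is only a rough sketch in Python, not part of the DirectAccess tooling: the function names are made up for illustration, and probing TCP port 443 is an assumption (adjust it to whatever your servers actually answer on).

```python
import socket


def is_reachable(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def split_nodes(nodes, port=443, timeout=3.0):
    """Partition node names into (online, offline) lists, preserving order."""
    online, offline = [], []
    for name in nodes:
        if is_reachable(name, port, timeout):
            online.append(name)
        else:
            offline.append(name)
    return online, offline


# Example call (DAEXT-EAST1 is a hypothetical name for the surviving node):
# online, offline = split_nodes(["DAEXT-EAST1", "DAEXT-EAST2"])
```

Any server that lands in the offline list is a candidate for removal from the GPO, and the servers in the online list are the ones to leave alone.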

Now give your remaining DirectAccess server a few minutes to update Group Policy, or open a command prompt on that server and run gpupdate /force. When you close and re-open the Remote Access Management Console, you will see that the dead DirectAccess server has been removed from the console.
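If you would rather script the policy refresh than type it, something along these lines should work when run on the remaining DirectAccess server. This is a minimal sketch, assuming Python is installed on that server; the function name is hypothetical, and the only command it issues is the gpupdate /force mentioned above.

```python
import platform
import subprocess

# The same command you would type at a command prompt.
GPUPDATE_CMD = ["gpupdate", "/force"]


def refresh_group_policy():
    """Run gpupdate /force and return its exit code.

    Returns None without doing anything on non-Windows systems,
    where gpupdate does not exist.
    """
    if platform.system() != "Windows":
        return None
    result = subprocess.run(GPUPDATE_CMD, check=False)
    return result.returncode
```

An exit code of 0 generally indicates the refresh completed; after that, re-open the Remote Access Management Console as described above.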

Now re-run the Add/Remove Servers wizard for Network Load Balancing. This time, when you specify your new DirectAccess server, the wizard will add it to the array without any error messages.

Jordan Krause
IVO Networks