I recently discovered what I think we can officially classify as a “bug” in the Remote Access Management Console, one that you may encounter if you ever need to rebuild one of your DirectAccess servers. Let’s lay out an example based on the scenario I found myself in:
Say you are running a two-node NLB array of DirectAccess servers. This is a very common scenario, so many of you reading this will be in this boat. We’ll call the servers DA1 and DA2. Now pretend that DA2 has died. Fortunately you are running NLB, so you may not even notice for a number of days that this server has gone offline, as all of your connections continue flowing to DA1. Eventually, though, you realize that DA2 is offline and that it is totally dead. You end up needing to replace it with another box or, at minimum (this was my case), reimaging or restoring the server back to the point where Windows was not yet joined to the domain. For the sake of simplicity in the example, we’ll say you have pulled out the old DA2 and are replacing it with a new server.
When you bring the new server online, you want to make the transition as seamless as possible, and you really liked the DA2 name, so you also name this new server DA2 and join it to the domain. You notice that your new DA2 does not get the DirectAccess settings, nor is it responding inside the Remote Access Management Console. That is to be expected, because you know that Active Directory references computer objects by SID, and you assume the SID changed when the new server joined the domain. This prompts you to walk through the Load Balancing wizard inside the Remote Access console to “Add/Remove Servers” and remove the old DA2, which clears it out of the console. Then you walk through that wizard again and re-add DA2 to the array. All fixed, right? Not so fast…
When you join the new DA2 back into the NLB array, the wizard does successfully add the DA2 computer object into the Security Filtering section of the DirectAccess Servers GPO. This is appropriate behavior, and it means that your new DA2 is now getting the DirectAccess settings from the GPO. In fact, if you look closely at your IPsec tunnels, you will see that users are even connecting successfully to your new DA2! So effectively, DirectAccess is 100% live and working on DA2. However…inside the Remote Access Management Console you are not able to communicate with DA2. Under the Dashboard and under Operations Status, you see a big fat question mark where you would normally see information about the health status of DA2. No matter how long you wait or how much you fiddle with the console, it continues to show a question mark for DA2, even though technically DA2 is working just fine.
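To make the symptom easier to recognize, here is a minimal Python sketch that encodes the three observations described above as a simple check. It does not talk to DirectAccess or the console at all; the class, field names, and status labels are hypothetical, and you would fill in the values by hand from what you see in the GPO, the IPsec tunnels, and the console.

```python
from dataclasses import dataclass


@dataclass
class NodeObservation:
    """What you observed by hand for one DirectAccess node (hypothetical model)."""
    in_gpo_security_filtering: bool  # computer object listed on the DirectAccess Servers GPO
    has_active_ipsec_tunnels: bool   # users are connecting through this node
    console_status: str              # hypothetical labels: "ok", "error", "unknown"


def matches_question_mark_symptom(obs: NodeObservation) -> bool:
    """True when the node is serving users but the console cannot report on it."""
    return (
        obs.in_gpo_security_filtering
        and obs.has_active_ipsec_tunnels
        and obs.console_status == "unknown"
    )


# The rebuilt DA2 from this post: working, but shown with a question mark.
rebuilt_da2 = NodeObservation(True, True, "unknown")
print(matches_question_mark_symptom(rebuilt_da2))
```

If the node were not in the GPO or had no tunnels, you would be looking at a different problem than the one described here.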
This issue has to do with SIDs, and though I searched high and low inside the configuration files and XMLs that the Remote Access console uses, I couldn’t find DA2’s old SID referenced anywhere. It seemed like everything had cut over to the new SID for the new DA2, and yet the console was thoroughly confused.
Resolution: I wish I had a quick, easy fix that you could run on the Remote Access console to correct this behavior. What ultimately fixed it was reimaging DA2 again, bringing it back to pre-domain-join status, and then giving it a brand new hostname (DA3) that the console had never encountered before. Prior to this drastic change we had reimaged DA2 numerous times without success: every time we added it back into the console, DirectAccess would work, but the console would never talk to it. We could have left it running like this and DirectAccess would have kept working, but it screws up reporting. Down the road, when the time came to replace the IP-HTTPS SSL cert, it would certainly cause problems, because you generally use the Remote Access console to specify the new SSL cert on the DA servers, and that process would have failed.
If one of your DirectAccess servers ever dies, make sure to give the replacement DA server a brand new hostname that the Remote Access Management Console has never seen before. Do this, and you’ll never experience this strange bug/behavior.
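If you keep a list of every hostname your DirectAccess array has ever used, picking a safe name can be as simple as the Python sketch below. This is only an illustration of the naming rule, assuming a prefix-plus-number scheme like DA1, DA2, DA3; the function name and the list are hypothetical, and nothing here queries the console or Active Directory.

```python
def next_unused_hostname(prefix, seen_names):
    """Return the lowest-numbered PREFIX<n> name that is not in seen_names."""
    # Compare in upper case, since Windows hostnames are not case-sensitive.
    seen = {name.upper() for name in seen_names}
    prefix = prefix.upper()
    n = 1
    while f"{prefix}{n}" in seen:
        n += 1
    return f"{prefix}{n}"


# Every name the console has ever known, including the dead server's.
ever_used = ["DA1", "DA2"]
print(next_unused_hostname("DA", ever_used))  # DA3
```

The important part is that the dead server’s name stays on the list, so it never gets handed out again.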