Tuesday, February 16, 2010

Exchange 2010 Site Disaster Recovery on a dime! Part 1: Building the solution

Since Microsoft has made significant improvements to how Exchange handles disaster recovery of databases, many organizations have started to wonder how they can effectively protect themselves against site, datacenter and other such disasters. But not every company has the budget to implement a new infrastructure, so how can such companies still take advantage of these new technologies? The answer is in this article: I will explain how this can be accomplished with only two Exchange 2010 servers. In Part 1 we will discuss how to build the solution; then in Part 2 we will move on to discover how to activate the disaster recovery site.

Please note that this solution does not give you High Availability, but it will provide you with a solution for site and server disaster.

This solution builds and depends upon the Exchange 2010 feature called Database Availability Group (DAG). DAG is the new High Availability feature of Exchange 2010 and is the evolution of the Exchange 2007 CCR, LCR and SCR replication technologies. A DAG can be built with as few as two Exchange Mailbox servers and with as many as 16, making this a very flexible solution. The beauty of the Exchange 2010 DAG feature is that DAG members can also hold other Exchange server roles such as CAS and HUB, which is an attractive option for smaller organizations. To demonstrate the scalability of the DAG feature, I will use only two servers in my example: one in the production site and one in the Disaster Recovery site. This represents the smallest installation that can be done for DAG, but remember this is a flexible solution, so if at any point you need to scale out with multiple DAG members, the steps you would perform are nearly identical.

Building the solution.
In both the production site and the Disaster Recovery site we need a server running Windows Server Enterprise edition, since DAG relies on Microsoft Failover Clustering, which is not available in the Standard edition of Windows Server. (Remember that Exchange itself also comes in either Standard or Enterprise edition. The Standard edition can be used with up to five databases, but if you need more than five then it is necessary to use the Enterprise edition of Exchange.) Both sites also need Domain Controllers and Global Catalog Servers. The DR (Disaster Recovery) site is most likely a different site in Active Directory, which prevents users from accessing it.

Installing Exchange.
To install Exchange, you simply perform a standard Exchange installation in both sites. When you are finished you will have one Exchange server in the production site and one Exchange server in the DR site. Both servers can have all standard roles (i.e. Mailbox, HUB and CAS), but you can also install them on separate servers and have multiple roles on multiple servers.

To test that everything is functioning properly, I recommend creating a mailbox on each database that is mounted on each server, and then sending a test email from one mailbox to the other. Our configuration thus far is very basic since no clusters or DAGs have been built yet. At this point, our example consists of two Exchange servers located in different Active Directory sites.
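As a complement to sending a test email by hand, the same check can be sketched from the Exchange Management Shell with Test-Mailflow. The server names below are placeholders in the style used throughout this article, not real names.

```powershell
# Sketch: send a system test message from the production server to the DR server.
# FQDNofServerInPrimarySite and FQDNofServerInDRSite are placeholders.
Test-Mailflow -Identity FQDNofServerInPrimarySite -TargetMailboxServer FQDNofServerInDRSite

# Test the reverse direction as well.
Test-Mailflow -Identity FQDNofServerInDRSite -TargetMailboxServer FQDNofServerInPrimarySite
```

Look at the TestMailflowResult value in the output of each command to see whether the message was delivered.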

Since DAG is one of the hottest new features in Exchange 2010, many articles have been written on the subject. Hence, I will walk you through the steps of creating a DAG fairly quickly.

Creating a DAG.
In the Exchange Management Console, go to Organization Configuration, then Mailbox, and open the ‘Database Availability Groups’ tab. Right-click and select ‘New Database Availability Group.’

The Create a DAG wizard starts.


Next, enter a name for your DAG. If you have a server with a HUB role but no mailbox role, then the wizard will select the HUB server and create the witness directory for you. If you don’t have an available HUB server, then you must manually specify the ‘Witness Server’ and a ‘Witness Directory.’

At this stage I need to caution you that a permission issue might occur when creating the File Share Witness directory. This is because the directory is not created in the logged-on user's security context, but rather in that of the Exchange server computer account. The solution is to add the ‘Exchange Trusted Subsystem’ group to the local Administrators group on the witness server. Permissions also matter because creating a DAG also creates a computer account in Active Directory. Thus, you might need to delegate to the ‘Exchange Trusted Subsystem’ group the right to create and manage that computer account in Active Directory, or at least to manage a pre-staged, disabled computer account.
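The two preparations described above can be sketched as follows. This is only an outline under assumptions: CONTOSO is a hypothetical domain name, DAG1 is the DAG name used in this article, and the second part assumes the ActiveDirectory PowerShell module from Windows Server 2008 R2 is available.

```powershell
# Run on the witness server from an elevated prompt.
# CONTOSO is a hypothetical NetBIOS domain name; replace it with your own.
net localgroup Administrators "CONTOSO\Exchange Trusted Subsystem" /add

# Optional: pre-stage a disabled computer account named after the DAG.
# Assumes the ActiveDirectory module (Windows Server 2008 R2) is installed.
Import-Module ActiveDirectory
New-ADComputer -Name DAG1 -Enabled $false
```

If you pre-stage the account, you still have to grant the ‘Exchange Trusted Subsystem’ group permissions on that computer object, for example through Active Directory Users and Computers.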

Exchange Management Shell or Wizard?
If you prefer Exchange Management Shell over the Wizard, below is the command you need to create a DAG:

New-DatabaseAvailabilityGroup -Name DAG1 -WitnessDirectory C:\DAG1 -WitnessServer FQDNofaServerinPrimarySite -DatabaseAvailabilityGroupIpAddresses 192.168.15.233,192.168.25.233 -Verbose

The Exchange Management Shell is a better approach than the Wizard when you consider the following: with the Wizard you cannot set a fixed IP on your DAG. Instead, it will use DHCP to assign an IP. This is important to consider since it is recommended that you have an IP in every subnet that contains DAG members. The reasoning behind this is that when DAG moves to a different IP subnet, it needs to have a valid IP address on that IP subnet.

Adding the Verbose parameter makes the command print more information as it runs, which gives you clues in case something goes wrong.

Why is having a fixed IP for your DAG preferable to using DHCP?
Remember that a DAG is actually a failover cluster, and in order for the cluster to function its IP address must be up and running. Since not every company uses DHCP on the server subnets (some only use it on client subnets), it is often more convenient to have a fixed IP.
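If the DAG was created with the Wizard and therefore uses DHCP, the addresses can be changed afterwards from the shell. The following is a minimal sketch that reuses the example DAG name and IP addresses from the command earlier in this article; substitute your own values.

```powershell
# Assign one static IP per subnet that contains DAG members.
# DAG1 and both addresses are the example values used earlier in this article.
Set-DatabaseAvailabilityGroup -Identity DAG1 -DatabaseAvailabilityGroupIpAddresses 192.168.15.233,192.168.25.233

# Review the addresses the DAG is now configured with.
Get-DatabaseAvailabilityGroup -Identity DAG1 | Format-List Name,DatabaseAvailabilityGroupIpv4Addresses
```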

The next step is to add your Exchange mailbox servers to your DAG.


Click ‘Manage Database Availability Group Membership’ and then add the mailbox server to it.
If everything works as expected, the Failover Cluster role will be installed on the servers you added to your DAG. You can start the Failover Cluster Management tool and see that there is a cluster called DAG1 that contains your two mailbox servers. The computer account should also be enabled, and the witness directory should be shared and populated with a couple of files.

Below is the Exchange Management Shell command that you must run once for each mailbox server that you add:

Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer FQDNofMailboxServer -Verbose

Remember to allow AD replication between each step, otherwise you may not be able to join servers to your DAG.
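Once both servers have been added, the membership can also be checked from the shell instead of the Failover Cluster Management tool. A short sketch, assuming the DAG is named DAG1 as in this article:

```powershell
# -Status asks for live cluster information in addition to the AD configuration.
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name,Servers,OperationalServers,WitnessServer,WitnessDirectory
```

Both mailbox servers should be listed under Servers, and OperationalServers shows which of them are currently up in the cluster.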

You should also see that a DAG network has been created, and if you have multiple networks on your mailbox servers then there should be multiple DAG networks. Even though you can run a DAG on a single network, it is oftentimes better to have multiple NICs and networks in your servers because this gives you the ability to separate MAPI, cluster and replication traffic into different networks.
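The DAG networks can be listed and adjusted from the shell. In this sketch the network name DAG1\DAGNetwork01 is an assumption; use the names that the first command returns in your environment.

```powershell
# List the DAG networks and how each one is used.
Get-DatabaseAvailabilityGroupNetwork |
    Format-List Name,Subnets,MapiAccessEnabled,ReplicationEnabled

# Example only: stop using the MAPI network for replication traffic so that
# replication prefers a dedicated network. DAG1\DAGNetwork01 is an assumed name.
Set-DatabaseAvailabilityGroupNetwork -Identity DAG1\DAGNetwork01 -ReplicationEnabled:$false
```

Only disable replication on a network if another DAG network remains available for it.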

The next step is to add databases to your DAG members in order to enable replication. Up to this point, each server had only one database mounted but now we would like to add more to it.

Click ‘Add Mailbox Database Copy.’

Next, select which servers you want to hold a copy of the mailbox database and the ActivationPreference.

Below is the Exchange Management Shell command:

Add-MailboxDatabaseCopy -Identity 'Mailbox Database 2036433681' -MailboxServer FQDNofServerInDRSite -ActivationPreference 2

This step can potentially take a long time since the database is seeded to the DR (Disaster Recovery) site; how long it takes depends on the database size and available bandwidth.

Now we must set some parameters on the mailbox database so that it is not automatically activated.

From Exchange Management Shell (EMS) run the following command:

Suspend-MailboxDatabaseCopy -Identity 'Mailbox Database 2036433681\FQDNofServerInDRSite' -ActivationOnly -Verbose

This ensures that replication continues automatically while automatic activation of this copy is blocked.

Next, add a copy of every mailbox database to both of your servers, with the ActivationPreference set to 1 on the server in the production site; then suspend the database copy on the server in the Disaster Recovery site for activation.
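With more than a handful of databases, the two commands shown earlier can be combined in a small loop. This is a sketch rather than a finished script: the server names are placeholders, and it only handles databases that currently live on the production server.

```powershell
# Placeholders following the article's naming; replace with your own servers.
$primary = 'FQDNofServerInPrimarySite'
$dr      = 'FQDNofServerInDRSite'

# Collect the databases first and then loop over the variable; calling Exchange
# cmdlets inside a ForEach-Object pipeline can fail in a remote EMS session.
$databases = Get-MailboxDatabase -Server $primary

foreach ($db in $databases) {
    # Seed a passive copy in the DR site (this can take a long time).
    Add-MailboxDatabaseCopy -Identity $db.Name -MailboxServer $dr -ActivationPreference 2

    # Keep replication running but block automatic activation of the DR copy.
    Suspend-MailboxDatabaseCopy -Identity "$($db.Name)\$dr" -ActivationOnly -Confirm:$false
}

# Review the result for every copy on the DR server.
Get-MailboxDatabaseCopyStatus -Server $dr |
    Format-Table Name,Status,ActivationSuspended -AutoSize
```

An alternative that blocks activation for the whole DR server in one step is Set-MailboxServer with -DatabaseCopyAutoActivationPolicy Blocked. Which approach fits better depends on whether you ever want individual copies on that server to activate automatically.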

Configuring Replay Lag Time
Configuring Replay Lag time is something that you should seriously consider doing. Replay lag time is how long the passive copy waits before a transaction log is replayed into its database. The log files themselves are still replicated as fast as possible.

Below is the EMS command:
Set-MailboxDatabaseCopy -Identity 'mailbox database 1976375852\FQDNofServerInDRSite' -ReplayLagTime 0.1:0:0 -Verbose

(Please note: the format is days.hours:minutes:seconds, so 0.1:0:0 means 1 hour. In real life you should most likely set this to a higher value.)

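There is also another parameter that you might want to use: the Truncation Lag Time. This is how long the passive copy waits before deleting a log file after it has been replayed into the database.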

Below is the EMS command:
Set-MailboxDatabaseCopy -Identity 'mailbox database 1976375852\FQDNofServerInDRSite' -TruncationLagTime 0.2:0:0

(Please note: 0.2:0:0 means 2 hours. In real life you should probably set this to another value.)

How long you set the ReplayLagTime and TruncationLagTime for depends on two things: 1) How long it takes you to notice a corruption on the production site, and 2) How long it takes to replay all transaction log files if you activate the DR site. For instance, if you know you can detect a corruption in the active datacenter within 10 hours, then you should probably set the ReplayLagTime to 12 hours or so to allow for recovery of all non-corrupted data. Also consider the amount of disk space you have when setting the ReplayLagTime.
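The lag values are ordinary .NET TimeSpan strings, so you can check what a value means before applying it. The sketch below first parses two values and then applies the 12-hour example from the paragraph above; the database and server names are the placeholders used earlier.

```powershell
# Lag values use the form d.hh:mm:ss.
[TimeSpan]::Parse('0.1:0:0').TotalHours    # 1 hour, the value used in the earlier command
[TimeSpan]::Parse('0.12:0:0').TotalHours   # 12 hours

# Apply a 12-hour replay lag to the DR copy (placeholder names from this article).
Set-MailboxDatabaseCopy -Identity 'mailbox database 1976375852\FQDNofServerInDRSite' -ReplayLagTime 0.12:0:0 -Verbose
```

Keep in mind that log files which have not yet been replayed or truncated stay on disk, so longer lag times need more log volume space on the DR server.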

More information about Managing Mailbox Database Copies can be found on Technet: http://technet.microsoft.com/en-us/library/dd335158.aspx

For more information on creating a DAG, click here: http://msexchangeteam.com/archive/2009/06/14/451609.aspx

Creating the CASArray.
Now your DAG and databases should be all ready to go! Remember to monitor the replication with Get-MailboxDatabaseCopyStatus -Server FQDNofServer

CopyQueueLength and ReplayQueueLength should show small numbers if possible, preferably zero or one, but in real life you will see higher values depending on your bandwidth, server load, etc. (A copy with a ReplayLagTime configured will show a high ReplayQueueLength by design.)
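A slightly more readable form of the monitoring command is sketched below, together with the built-in replication health check. FQDNofServer is the same placeholder as above.

```powershell
# Show the most relevant replication counters for every copy on one server.
Get-MailboxDatabaseCopyStatus -Server FQDNofServer |
    Format-Table Name,Status,CopyQueueLength,ReplayQueueLength,LastInspectedLogTime -AutoSize

# Run the built-in replication and cluster health checks against the same server.
Test-ReplicationHealth -Identity FQDNofServer
```

Running both commands against each DAG member gives a quick picture of whether log shipping is keeping up.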

Why do you need a ClientAccessArray?
Technically, a ClientAccessArray is not required, but it is highly recommended: it makes the system easier to manage, and since it is only a name that you can point to another IP address, it also lets you move your client connection point.

Move client connection point?!
Yes. In Exchange 2010 the Outlook MAPI connection point has moved from the Information Store on the mailbox server to the CAS (and to the CASArray name if you have one defined).

New-ClientAccessArray -Name CASArray-HQ -Fqdn FQDNofYourDesiredEndpoint -Site ADsiteInPrimaryDatacenter

For more information on the New-ClientAccessArray, click here: http://technet.microsoft.com/en-us/library/dd351149.aspx

Now configure all your databases to use the FQDN of the CASArray-HQ object as the RpcClientAccessServer. This will ensure that Outlook connects to the CASArray FQDN instead of the actual server name.

Get-MailboxDatabase | Set-MailboxDatabase -RpcClientAccessServer FQDNofYourDesiredEndpoint
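To confirm the configuration, you can list the array and the value stored on each database. A short sketch, with no names assumed beyond what the commands return:

```powershell
# Show the CAS array and the AD site it belongs to.
Get-ClientAccessArray | Format-List Name,Fqdn,Site,Members

# Show which endpoint each mailbox database hands out to Outlook clients.
Get-MailboxDatabase | Format-Table Name,RpcClientAccessServer -AutoSize
```

Every database should report the same endpoint; a database that still shows an individual server name will keep sending Outlook to that server.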

You must also create a record in DNS for FQDNofYourDesiredEndpoint that points to the IP of your Exchange server in the primary datacenter. Set the TTL to a low value, such as 5 minutes, to make the switchover to the Disaster Recovery site go faster.
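On a Windows DNS server the record can be created with dnscmd. Everything in this sketch is hypothetical and only illustrates the shape of the command: the DNS server name, the zone contoso.com, the host name outlook and the IP address must all be replaced with your own values.

```powershell
# Create an A record with a TTL of 300 seconds (5 minutes).
# FQDNofDnsServer, contoso.com, outlook and 192.168.15.50 are hypothetical values.
dnscmd FQDNofDnsServer /RecordAdd contoso.com outlook 300 A 192.168.15.50

# Check what clients will resolve.
nslookup outlook.contoso.com
```

In this example, outlook.contoso.com would be the name used as FQDNofYourDesiredEndpoint.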

When Outlook connects, it will now connect to the ‘FQDNofYourDesiredEndpoint’ name. Also, if you look at the MAPI settings, Outlook thinks that the FQDNofYourDesiredEndpoint is the Exchange mailbox server.

Configuring Autodiscover
For Outlook to connect properly you must make sure to configure Autodiscover correctly.
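As a starting point, the Autodiscover settings for internal clients can be reviewed and set from the shell. This is a sketch under assumptions: NameOfCasServer, the autodiscover.contoso.com URL and the test mailbox address are hypothetical and must be adapted.

```powershell
# Review the current internal Autodiscover URL on every CAS.
Get-ClientAccessServer | Format-List Name,AutoDiscoverServiceInternalUri,AutoDiscoverSiteScope

# Point internal clients at the desired Autodiscover URL (hypothetical names).
Set-ClientAccessServer -Identity NameOfCasServer -AutoDiscoverServiceInternalUri 'https://autodiscover.contoso.com/Autodiscover/Autodiscover.xml'

# Verify the Autodiscover response for a test mailbox (hypothetical address).
Test-OutlookWebServices -Identity user@contoso.com
```

The host name in the URL must resolve in DNS and be covered by the certificate installed on the CAS, otherwise Outlook will show certificate warnings.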

At this point you should have two servers with the Mailbox, HUB, and CAS roles on each one, a DAG with the two servers (one in each AD site), and a CASArray located on the server in the primary AD site.

Failovers will not occur automatically because of the configurations we did on the mailbox databases. Thus, if you reboot the primary server then clients will lose connection to their mail.

I hope you have enjoyed this tutorial on Exchange Server 2010 site disaster recovery, and that you were able to follow my instructions and begin preparing your organization for the worst-case scenario: site or server disaster. Now that you know how to build the solution, in Part 2 of this piece we will move on to discussing how to activate the disaster recovery site, at which point I will explain how to back up, test and perform a switchover should your Exchange server fail.