Sunday, July 25, 2010

Error in Exchange Management Console when removing Domain Controllers

When rearranging and upgrading to a newer OS, it's very common to install a new DC and then remove the old one instead of doing an in-place upgrade. This is all fine Active Directory-wise. Exchange Server itself will dynamically detect new and removed Domain Controllers and Global Catalog servers and adjust accordingly, but the Exchange Management Console is not dynamic in the same way.

The symptom is that when you start or navigate in EMC, you get error messages indicating an error in an LDAP query or missing Domain Controllers.

You might also see error messages in the Application log indicating unavailable LDAP or DC/GC.

Why is this happening?
EMC caches Domain Controller names in the MMC temporary files and, unfortunately, it is not clever enough to query another DC or GC when the current one is not responding or has been removed.

How to solve the problem?
The solution is quite easy: clear the MMC cache.

Close all open MMC consoles. Start a new empty MMC, select File/Options from the menu, and on the Disk Cleanup tab of the Options window, press the "Delete Files" button. Then close the empty MMC.

Next time you start EMC it has no cache to read from and will dynamically select a new DC.

There is another way of doing this: close all open MMC consoles and then delete the file "C:\Users\%username%\AppData\Roaming\Microsoft\MMC\Exchange Management Console".

As you can see, this cache is per user, so unless the administrator creates a process for clearing the cache when needed, each user that runs EMC must clear the cache themselves.
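For convenience, the manual file deletion can be scripted; a minimal PowerShell sketch, assuming the default profile location:

```powershell
# Delete the per-user EMC cache file (path assumes the default profile layout;
# -ErrorAction keeps the command quiet if the cache file does not exist)
Remove-Item "$env:APPDATA\Microsoft\MMC\Exchange Management Console" -ErrorAction SilentlyContinue
```

Run it in each administrator's session, since the cache is per user.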

If you're interested, the "C:\Users\%username%\AppData\Roaming\Microsoft\MMC\Exchange Management Console" file is in XML format and can be viewed if you like, but I wouldn't advise making any changes to it.

Monday, June 21, 2010

Exchange 2007 Service Pack 3 is out the door

Microsoft has now released Exchange Server 2007 Service Pack 3.
With SP3 it is now possible to run Exchange 2007 on Windows Server 2008 R2.

Download Exchange 2007 SP3 from here
The link with what's new in Exchange seems to be incorrect, but here it is anyway: http://go.microsoft.com/fwlink/?LinkId=154404

Don't run off and upgrade your OS to Windows Server 2008 R2 because of this. An in-place OS upgrade is most likely not supported while Exchange is installed.

Happy patching

Friday, June 18, 2010

Exchange 2010 Update Rollup 4 is released

Read the complete list of fixes here Exchange 2010 UR4 – KB982639

Worth mentioning is that if you're using Microsoft Update and an Exchange DAG, updates will not be detected automatically. DAG/cluster servers should be patched manually.
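Manual installation is just a matter of running the update package from an elevated prompt; a hedged sketch (the exact .msp filename is an assumption, use the name of the file you downloaded):

```
msiexec /update Exchange2010-KB982639-x64-en.msp
```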

The installation package can be downloaded from the download link for Exchange 2010 UR4.

Tuesday, May 11, 2010

Exchange 2010 Site Disaster Recovery on a Dime! Part 3: Backup, Restore, Recovery

I began this series by explaining how to build a low cost site or datacenter disaster recovery solution with the new Database Availability Group (DAG) feature in Exchange 2010. Next, I covered the process of failing over to your other site in case of a disaster. Naturally, I hope you never experience a disaster in your Exchange environment, but if you do this article will ensure you are prepared.

In Part 3, I will walk you through the steps of performing a Backup, Restore, and finally a Recovery. While it is important to know how to do a proper backup, it is equally if not more important to be able to use it in case disaster strikes.

The Backup Process
Performing a backup of Exchange 2010 databases is not that difficult; just make sure that your backup software uses VSS technology since the traditional streaming backup API is not available in Exchange 2010.

The built-in Windows Server Backup has this capability, but it lacks many other functions that a real backup solution has. Therefore, I tend to rank Windows Server Backup as a poor man’s backup software. See my earlier post about Windows Server Backup and Exchange 2007 for more details. While the article is written for Exchange 2007, it is also applicable to Exchange 2010.
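As a hedged illustration of the built-in tool (assuming your databases and logs live on volume D: and a dedicated backup disk is E:), a VSS full backup can be started from an elevated prompt:

```
wbadmin start backup -backupTarget:E: -include:D: -vssFull -quiet
```

The -vssFull switch makes it a full backup, which is what allows Exchange to truncate transaction logs afterwards.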

Other vendors are updating their backup software to ensure compatibility with Exchange 2010, and some already have it working. One example is Microsoft's Data Protection Manager 2010.

No matter which backup software you use, the steps for doing a backup are essentially the same. The backup software communicates with VSS, which in turn communicates with the Exchange Writer that is installed during the Exchange installation. During this process, only changed blocks on disk are transferred to the backup software, which is responsible for saving and storing the data for later retrieval. By transferring only changed blocks on disk, the backup time is decreased and so is the number of bytes on the wire.

In the example we have been using for this series, we have one server running in the primary datacenter and another server in the Disaster & Recovery (DR) datacenter.
The question arises: ‘Where do I do the backup -- on one or both servers?’ The answer is ‘It depends.’ (Don’t you love this answer!)

Your options are:
1. Do the Exchange database backup on one server and back up the database copies on the other servers,
2. OR back up only one server, but which one?

For the Exchange admin who has been around Exchange awhile, the question about purging transaction log files always comes up. The beauty of the DAG design in Exchange 2010 can be seen during this process: when you back up a database in a database availability group (DAG), the corresponding transaction log files are automatically purged on all replicas of that database. The server running the database where the backup is performed tells the other servers holding a replica that a backup has been done and that it is now time to purge transaction log files. Which files are purged depends on several factors, such as the checkpoint, replay lag time and truncation lag time, so you should not expect them all to be purged by a normal full backup. With this in mind, make sure to size your transaction log LUN correctly if using replay lag time and truncation lag time.

The Restore Process
The process of restoring a database located in a DAG is pretty much the same as doing it on a mailbox database that is not a member of DAG replication. The decision you must make is whether to use the lagged copy of the database or to perform a traditional restore.

How can we take advantage of the lagged database copy?
Lagged database copies can be used for recovering from a logical corruption in a mailbox or mailbox database, or for recovering individual mails or folders within a mailbox. The recovery process is simple, but you must consider the ReplayLagTime setting carefully so that you can discover a problem in time to use the lagged database copy before the transaction log files are replayed into the lagged database.
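The lag itself is configured per database copy with Set-MailboxDatabaseCopy; a minimal sketch, where the database and server names are placeholders and the 24-hour lag is just an example:

```powershell
# Delay log replay on the DR copy by 24 hours (format is days.hours:minutes:seconds)
Set-MailboxDatabaseCopy -Identity 'Mailbox Database 01\FQDNofServerInDRSite' -ReplayLagTime 1.0:0:0
```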

Components needed for the recovery include a ‘recovery mailbox database’. The first step is to create a recovery mailbox database:

New-MailboxDatabase -Name RecoveryDB -Verbose -Recovery -EdbFilePath E:\Recovery\RecoveryDB.edb -LogFolderPath E:\Recovery -Server FQDNofServerInRecoverySite

This will create a recovery mailbox database with paths set to E:\Recovery

The next step is to get a file copy of the mailbox database you want to extract data from into the E:\Recovery folder. You could use a regular restore from your backup, but it's often faster to make a copy of the lagged database. Copy as many transaction log files as suit your purpose.

Before doing a file copy, it is best to pause the replication with the Suspend-MailboxDatabaseCopy command:

Suspend-MailboxDatabaseCopy 'database name\FQDNofServerInDRSite' -SuspendComment "Recover data from database" -Verbose

We use VSS to make a shadow copy of the database we want to extract data from. The syntax for the vssadmin.exe command-line tool is "vssadmin create shadow /For=<volume>".

As you can see, you can only do a VSS shadow copy of a full volume, meaning the volume is either a disk such as D:\ or a mountpoint. You probably have database files and transaction log files on separate disks, so you must create shadow copies of both disks.

vssadmin create shadow /For=D:
vssadmin create shadow /For=G:

Pay attention to the result you get:
Shadow Copy Volume Name:
\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2

In previous versions of Windows, you could simply copy from the strange path above to your recovery folder, but this seems to have either broken or been removed in Windows Server 2008 R2. This is how it used to look:

copy “\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\path_to_your_edb_file” E:\RecoveryDB

But I discovered another way of doing it, with explorer.exe:
Right-click on the drive you made the shadow copy of, select Properties and then the Previous Versions tab. Here you should see the newly created shadow copy. Select it and click Open. A new window opens in which you can drill down to wherever your Exchange database files are located and simply do a file copy of the .edb file and the corresponding transaction log files to the E:\Recovery folder. Which transaction log files you need to copy depends on how far forward you want to replay information into the database; simply check the time stamps on the files. Warning: in real life, this file copy will take a long time!

The Recovery Process

Now we can do a recovery of the database in the E:\Recovery folder. Start by deleting the checkpoint file ("xxx.chk"). Then use eseutil from an elevated command prompt in the E:\Recovery folder:

eseutil /r xxx /a

Where xxx is the transaction log file prefix, such as E00. The output will look something like this:

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.00
Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating RECOVERY mode...
Logfile base name: e00
Log files:
System files:
Performing soft recovery...
Restore Status (% complete)
0 10 20 30 40 50 60 70 80 90 100
----------------------------------------
..............................

This could take a long time depending on how many transaction log files must be rolled into the database file; the speed is approximately 2 log files per second.

The next step is to rename the .edb file to RecoveryDB.edb, since that was the name chosen when we created the recovery database.

If everything has gone well, we can simply mount our recoverydb:

Mount-Database RecoveryDB

To see which mailboxes are in the RecoveryDB, use:

Get-MailboxStatistics -Database RecoveryDB

To extract data from the Recovery database, use the Restore-Mailbox command:

Restore-Mailbox -RecoveryDatabase RecoveryDB -TargetFolder Recovery -Identity 'target mailbox' -RecoveryMailbox 'mailbox to get data from' -BadItemLimit 999 -Verbose

You could use several more parameters with Restore-Mailbox such as -ExcludeFolders, -SenderKeyWords, -AttachmentFilenames, -ContentKeywords, -AllContentKeywords, and many more. See the documentation on TechNet for full syntax of Restore-Mailbox.

When the Restore-Mailbox command is finished, you will see a folder structure inside the 'target mailbox' named 'Recovery' with the extracted data beneath it.

Now it’s Time to Clean Up

Now that you have managed to extract data from the lagged copy, you must start cleaning up ('But Mom!'). First, delete the shadow copy; otherwise, it will eventually fill up the shadow storage disk space.

Start by listing your current shadow copies with:

vssadmin list shadows

Look for the shadow you made before. (For example, timestamp is good to use.) Then delete the shadow copy with:

vssadmin delete shadows /Shadow=ShadowId

Where shadowID looks like {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}

When you have finished using the Recovery database, simply delete it with:

Remove-MailboxDatabase RecoveryDB

And delete the files in the E:\Recovery folder.

Don't forget to resume the replication of transaction log files with Resume-MailboxDatabaseCopy:

Resume-MailboxDatabaseCopy -Identity 'Mailbox Database 2036433681\FQDNofServerInDRSite' -ReplicationOnly

-ReplicationOnly is there to stop Active Manager from accidentally activating the database in the DR site.

What if you encounter a corrupted file and need to recover the complete server, but also want to go back a few hours in time? (And your DeLorean is all out of plutonium...)
You can always use your regular backup, but you could also use the lagged copy. Using the lagged copy in this scenario is even simpler than described above: suspend replication, delete the checkpoint file and as many transaction log files as you need to "go back in time", and then use eseutil /r to replay the transaction log files left on disk.

Next step is to do a switchover to the recovery site and server. Please see part 2 of this series for more detail: Exchange 2010 Site Disaster Recovery on a dime! Part2: Navigating the Failover process

Another approach would be to use a dial-tone database together with a recovery database. I will save this discussion for a future article.

Tuesday, April 13, 2010

Exchange 2010 Site Disaster Recovery on a dime! Part2: Navigating the Failover process

In Part1 of this series I explained how to build a low cost site or datacenter disaster recovery solution using Microsoft Exchange’s new DAG feature. In this article, I will endeavor to explain what manual steps are required to failover to your other site in the event of a disaster.

First of all, let's discuss what types of problem can occur. They range from a simple disk failure to a tornado smashing the datacenter in the primary site. In this article, I would like to address how you would manually activate your backup Exchange server if your primary server's motherboard or a disk failed. Next, I will outline the steps to take if you experience the dreaded total site failure, and I will finally conclude with how to fail back to your primary site when everything returns to normal.

OK, so how do we recover from, for example, a motherboard failure?
If you find yourself in this situation, you can be sure that your primary Exchange server will be offline and not functional. The good news is that in this situation all your other core infrastructure will be up and working, including critical items like your domain controllers and DNS servers.

The first thing you will notice is that your Outlook clients will still try to connect to the original MAPI endpoint (the RPC Client Access service located on the CAS). To quickly rectify this, simply change the A record in DNS for the ClientAccessArray to the IP of the CAS in the DR site. The Time To Live on this record should be a couple of minutes, making the change to a new IP as fast as possible. Also consider the time it takes for DNS replication/updates to propagate throughout the network.
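If you run Microsoft DNS, the record change can be scripted with dnscmd; a hedged sketch, where the DNS server, zone, host name, TTL and IP are all placeholders:

```
dnscmd dns01 /RecordDelete contoso.com casarray A /f
dnscmd dns01 /RecordAdd contoso.com casarray 300 A 10.0.2.10
```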

Next it will be time to get the databases up and running on your DR server.

First verify that all Exchange services are running on the DR server. If the services have been turned off this could cause other problems with transaction log replication.

The simplest approach is to move all active databases from the primary site and activate them in the DR site. The following commands should be run on a server in the DR site, most likely the Exchange server.

First, remove the activation block on the mailbox databases in the DR site:

Resume-MailboxDatabaseCopy 'mailbox database name\FQDNofaServerinDRSite'

Perform this step on every mailbox database you want to activate. There is a chance that the databases will mount automatically when resuming the mailbox database copies. You can verify the status by running Get-MailboxDatabaseCopyStatus on the Exchange server in the DR site:

Get-MailboxDatabaseCopyStatus -Server FQDNofaServerinDRSite | fl Name, Status, ActivationSuspended, ContentIndexState, ActiveCopy

If the databases are mounted and ActiveCopy is True, you are done with the activation, and Outlook should now be able to connect and start receiving and sending mail internally. Next, reconfigure services and applications to make Exchange reachable from the Internet with SMTP, Outlook Anywhere, OWA, ActiveSync, etc. If you have ISA or another reverse proxy server, reconfigure it to point to the server in the DR site instead of the server in the primary site. Other services that might need to be reconfigured are Autodiscover and the InternalUrl on several IIS virtual directories.

If the databases don't mount correctly, you can manually run the following command:

Move-ActiveMailboxDatabase -Server FQDNofaServerinPrimarySite -ActivateOnServer FQDNofaServerinDRSite

Depending on how Windows and Exchange managed to handle the crash, you might encounter some errors, making the activation a little more difficult. Problems range from the content index not being up to date on the DR server to not all transaction log files having been copied to the DR server. The solution is to specify some extra parameters on the Move-ActiveMailboxDatabase command.

For example, -SkipClientExperienceChecks is good to use when index is not up to date.

If you have not configured AutoDatabaseMountDial on the mailbox server, it defaults to Lossless, and there is always a chance that replication has not copied all transaction log files to the DR server. In that case you have to use -MountDialOverride with a value such as BestAvailability or GoodAvailability.
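AutoDatabaseMountDial can also be changed ahead of time on the mailbox server so the override is not needed during a crisis; a minimal sketch with a placeholder server name:

```powershell
# Allow automatic mount with up to 12 missing log files (BestAvailability)
Set-MailboxServer -Identity FQDNofaServerinDRSite -AutoDatabaseMountDial BestAvailability
```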

Other parameters that might be needed are –SkipLagChecks or –SkipHealthChecks.
You might have to use several parameters together to get databases up and running.

Move-ActiveMailboxDatabase -Server FQDNofaServerinPrimarySite -ActivateOnServer FQDNofaServerinDRSite -MountDialOverride:BestAvailability -SkipLagChecks -SkipHealthChecks -SkipClientExperienceChecks

More information about Move-ActiveMailboxDatabase can be found on TechNet: http://technet.microsoft.com/en-us/library/dd298068.aspx

When you have replaced the motherboard on the Exchange server in the primary site and replication starts flowing from the DR site to the primary site, you're good, and it's time to plan the switchover back to the primary site. This is done with the same steps as above. Plan the switchover for a time during off hours, since it will take a couple of minutes due to the necessary DNS updates, AD replication, and the time it takes to run the commands above.

Finally, you should run Suspend-MailboxDatabaseCopy again to disable automatic activation of the databases in the DR site:

Suspend-MailboxDatabaseCopy -Identity 'Mailbox Database 2036433681\FQDNofServerInDRSite' -ActivationOnly -Verbose

This last step is needed because the activation block is reset when you do a switchover between servers. Be sure to do this for every mailbox database on your servers.

If you can't get things started on the Exchange server in the primary site due to a corrupt database or transaction log files, you might have to reseed the files from the server in the DR site. Use Update-MailboxDatabaseCopy (Update-StorageGroupCopy was the Exchange 2007 cmdlet; 2010 has no storage groups), possibly with the -DeleteExistingFiles parameter.
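A hedged reseed sketch, where the database and server names are placeholders and -DeleteExistingFiles wipes the old files on the target first:

```powershell
# The copy must be suspended before it can be reseeded
Suspend-MailboxDatabaseCopy 'mailbox database name\FQDNofaServerinPrimarySite' -Confirm:$false
Update-MailboxDatabaseCopy 'mailbox database name\FQDNofaServerinPrimarySite' -DeleteExistingFiles
```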

Recovering from a disk failure is pretty much the same as above, but it only involves the databases and transaction log files located on the faulty disk.
Another cool thing is that you can even test a database switchover in production. To do this, first create a database in the primary site and make a copy in the DR site the same way all the other databases were created. Next, create a mailbox in the test database, log on, and send some test messages back and forth. Activate the test database on the DR server, edit the hosts file with the FQDN of the CAS array name and the IP of the Exchange server in the DR site, and start Outlook again. You should now be able to connect with Outlook to the DR server and use Outlook the normal way without disturbing any other users.
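The hosts-file entry for such a test might look like this (the IP and FQDN are placeholders; the file lives at %SystemRoot%\System32\drivers\etc\hosts):

```
# Point the CAS array name at the DR site for this machine only
10.0.2.10    casarray.contoso.com
```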

Recovering from a disaster in the primary site.
This is a more problematic scenario, but the steps are basically the same as above. The added complexity comes from the fact that you don't have any servers or network connectivity in the primary site, so your cluster will not have access to its quorum and, as a result, will be in a failed state.

How do you solve this problem?
First you need to get your cluster working again.
In the DR site, stop the failover cluster service if it is started, and then start it again with the forcequorum switch:

net start clussvc /forcequorum

The next step is to activate all databases on the DR server. This is done with the Move-ActiveMailboxDatabase command the same way as before.

You may also have to manually mount the databases.

With a complete site failure in the production site, you most likely need to live with the DR site for a while, which calls for more actions than just getting your Exchange server up and running. You also need to get traffic to and from the Internet flowing, both mail flow and user access to Exchange. Autodiscover is your friend for updating the configuration in Outlook, so make sure you have configured all URLs correctly.

So on the whole, there is a lot more to reconfigure than just Exchange when doing a site failover.

http://technet.microsoft.com/en-us/library/dd351049.aspx

How do you fail back to your primary site after the disaster?

We have forced quorum on our cluster, and if we restart the cluster service or reboot the server, the cluster service will fail to get quorum. This is important when servers come back online in the primary datacenter, since we don't want a forced quorum in the secondary site when the servers in the primary site start up.

If things weren't that bad and we can simply power up everything in our primary site, replication should start working again.
But you do have to do some things: reconfigure your File Share Witness, restart the cluster service on the secondary Exchange server, and basically all the steps we did to move everything to the secondary site, but now pointing everything back to the primary site. Don't rush things here; let Active Directory get to a stable state first and then slowly move things back to normal.

Depending on what state the servers are in and what happened, you may not want to start Exchange in the primary site at all, but instead remove it from the DAG, rebuild Exchange, join it to the DAG again, and so on.

As you have probably noticed, there are lots of variables, and therefore it is not an easy task to write a step-by-step guide for every situation. I recommend writing out the basic steps and your configuration information to make the transition easier when you are dealing with the stress of the situation. The best tip I can give you is to learn how things work and play with the various scenarios in a lab. The experience you gain from this will be your best friend when the unexpected happens in real life.