Wednesday, October 14, 2009

Exchange 2010: The New Archiving Feature

There is a lot of buzz surrounding the new archiving feature in Exchange Server 2010, and where there is buzz, rumors and misunderstandings are unavoidable.

When you ask Exchange administrators about archiving, most of them think of an archiving product as a tool that replaces messages and/or attachments with a shortcut, often called a stub, and moves the original item into the archive system. This is a deeply entrenched misunderstanding, so when Microsoft revealed that the Exchange 2010 archiving function does not take items out of the Exchange store, people began to exclaim, “This is not a true archiving solution!” I admit that I agreed with this outcry from the Exchange community at first, but the more I thought about it, the more I realized there are good reasons to keep items inside Exchange.

First, put yourself in the shoes of a regular user. Users often have a mailbox quota enforced, and when they get a quota warning they typically move items out of Exchange and into a PST file. This seemingly innocent move causes a very big problem. PST, as the name implies, stands for Personal Store, and PST files are meant to be stored locally on end users’ hard drives. The result? Those mail items are no longer backed up. Even if you go the unsupported route and save the PST file on a file share, the backup software often has trouble backing it up because Outlook keeps the file open. What about special tricks, such as open file agents, you might ask? Unfortunately, the backup software will still have difficulty backing up open files even with such tricks. Outlook also sets the archive bit on the PST file even when no mail in it has changed, which triggers the backup software to back up the entire file again. Since there are typically many PST files scattered across numerous file shares, this can make backups run for a very long time.

Another roadblock administrators may face when storing PST files “over the network” is that networks are not always reliable. Even when the network is working, users tend to close the lids of their laptops, which drops the network connection and can corrupt the PST file because Outlook never closed it properly. This is also the main reason why PST files are not supported on file shares. A corrupt PST file is also notorious for sending end users to the help desk, essentially forcing the administrator to restore a (hopefully) backed-up copy. PST files on a share cause other problems as well, including but not limited to slow performance while the file is open and when Outlook closes.

The risk of taking data out of Exchange and storing it in PST files is that you are moving corporate data from a safe environment, the Exchange databases, to an unsafe one. Since PST files are easily corrupted or lost, they are not a secure place to store business-critical data. By moving corporate data out of Exchange you may, in effect, be violating retention and compliance regulations, because the administrator no longer has control of the email content. And let us not forget that the corporate assets themselves are in danger of being lost.

Other issues to consider include legal discovery and the burden of searching for and restoring mail data. When data moves from Exchange into PST files, you can lose the ability to handle any of these well.

Third-party archiving solutions often solve many or all of the aforementioned problems by using the “stub” approach, and they can provide some kind of search capability.

Most vendors present the stub approach as a viable alternative, but keep in mind that it also introduces problems: archived items are removed from Exchange, so they are no longer indexed and searchable from a native client, which forces you to use the vendor’s client. That means installing and maintaining another client, which can be complex for both the end user and the administrator. Most vendors also claim that the stub approach reduces the amount of data in Exchange. That is often true, but in many cases you do not reduce the data as much as you expect, because each stub is itself an item in the Exchange database, a couple of KB in size. If you replace a 10 KB message with a 5 KB stub, you save only 5 KB. Keep this in mind if you import PST files into Exchange and then archive the imported items: every item you import will in fact increase the mailbox size by a couple of KB, even after it is archived.
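To make the stub arithmetic concrete, here is a small Python sketch. The 10 KB message and 5 KB stub come from the example above; the 2 KB stub size and the 10,000 imported items in the second calculation are assumed values standing in for “a couple of KB” and a typical PST import, and the function names are hypothetical.

```python
def stub_savings_kb(original_kb: float, stub_kb: float) -> float:
    """Space freed in the Exchange store when one item is replaced by a stub."""
    return original_kb - stub_kb


def imported_then_archived_growth_kb(item_count: int, stub_kb: float) -> float:
    """Net mailbox growth when PST items are imported and then stubbed.

    Before the import these items lived outside Exchange, so every stub
    left behind is brand-new data in the mailbox.
    """
    return item_count * stub_kb


# Example from the text: a 10 KB message replaced by a 5 KB stub.
print(stub_savings_kb(10, 5))  # 5

# Assumed values: 10,000 imported PST items, each leaving a 2 KB stub.
print(imported_then_archived_growth_kb(10_000, 2))  # 20000
```

In other words, archiving imported PST content with stubs still leaves the mailbox roughly 20 MB larger than before the import under these assumptions.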

Microsoft’s approach to archiving in Exchange 2010 is not to move items out of Exchange and store them somewhere else, but to leave them inside Exchange. There are several reasons for this. By leaving data inside Exchange, users and administrators no longer need to learn and manage another system. For the end user, the experience is the same as having a PST file connected to Outlook: users can drag and drop mail back and forth between their mailbox and the archive, which makes it incredibly simple to take their PST files and import them into the archive. Administrators benefit because they no longer have to cope with all the problems caused by PST files, and users benefit because the archive is indexed and searchable.

Some companies must also comply with regulations and policies that require searches across multiple mailboxes. This capability is also built in: users who have been delegated the permission can run these searches from the Exchange Control Panel (ECP). You can also place a mailbox on ‘Litigation Hold,’ meaning that even if a user deletes items, empties the ‘Deleted Items’ folder, and clears the dumpster, the mail is still retained in the new Exchange 2010 dumpster version 2 area. This area cannot be reached by end users, only by members of the ‘Discovery Management’ role group.

But what about the increased database size in Exchange?
The archive is technically an additional mailbox in the same database as the primary mailbox, and it shows up in Outlook the same way a PST file would. A common reaction is that the archive should not live in the same database because it belongs on cheaper storage. That is a fair point, and it carried the most weight in the Exchange 2003 days. Exchange 2007 reduced the I/O load on storage by roughly 70% compared with Exchange 2003, and the enhancements in Exchange 2010 cut the I/O footprint by about the same proportion again, making it possible to run all your Exchange databases on cheaper, lower-performing disks. This means you don’t need a costly SAN for your Exchange databases; you can use cheaper storage such as SATA disks instead.
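The two I/O reductions compound rather than add. Here is a quick Python check, assuming “about the same amount again” means a further ~70% cut relative to Exchange 2007, and using a hypothetical baseline of 1.0 unit of I/O for an Exchange 2003 workload (illustration only, not measured data):

```python
def remaining_io(baseline: float, reductions: list[float]) -> float:
    """Apply successive fractional I/O reductions to a baseline load."""
    load = baseline
    for r in reductions:
        load *= 1.0 - r  # each version keeps (1 - r) of the previous load
    return load


# Hypothetical baseline: 1.0 unit of I/O for a given Exchange 2003 workload.
e2003 = 1.0
e2007 = remaining_io(e2003, [0.70])        # ~70% less than Exchange 2003
e2010 = remaining_io(e2003, [0.70, 0.70])  # assumed ~70% less again
print(f"{e2007:.2f} {e2010:.2f}")  # 0.30 0.09
```

Under those assumptions, Exchange 2010 would need roughly 9% of the Exchange 2003 I/O for the same workload, which is why slower, cheaper disks become an option.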

What about backup and restore time?
With the increased volume in Exchange databases, you might expect backup times to increase, but that is not entirely true. The streaming backup API has been removed from Exchange 2010, leaving VSS as the only backup API. With VSS-based backup software that tracks changes at the block level, only the changed blocks on disk need to be copied, and most of the mail in the archive just sits there, so the blocks it occupies rarely change.

Sadly, restores have not improved: more data means longer restore times. But there is a simple solution for that: don’t back up. This is a very controversial thing to say, but with an Exchange 2010 Database Availability Group (DAG) you can replicate data to several mailbox servers (up to 16). If a database or disk blows up, another copy of the database is activated, and the switch will most likely go unnoticed by end users, because clients do not connect directly to mailbox servers but to Client Access Servers (CAS). You can also stretch DAG members across datacenters to survive the loss of an entire datacenter, which also gives you an offsite replica of your data. So the question to ask is: Why do you back up your Exchange data? http://anewmessagehasarrived.blogspot.com/2009/05/why-do-you-backup-exchange-databases.html
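Why mostly static archive data adds little to a block-level backup can be shown with a toy model. This sketch is not the VSS API: it simply hashes fixed-size blocks of a hypothetical database image and reports which blocks changed since the previous backup. The 4-byte block size and the sample data are assumptions chosen to keep the example readable.

```python
import hashlib

BLOCK_SIZE = 4  # Tiny block size for readability; real tools use much larger blocks.


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each fixed-size block of a (hypothetical) database image."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous: list[str], current: list[str]) -> list[int]:
    """Indices of blocks that differ from, or did not exist in, the previous backup."""
    return [
        i for i, h in enumerate(current)
        if i >= len(previous) or previous[i] != h
    ]


# Archived mail ("ARCH" blocks) sits untouched; only the newest block changes.
monday = b"ARCHARCHARCHnew1"
tuesday = b"ARCHARCHARCHnew2"
print(changed_blocks(block_hashes(monday), block_hashes(tuesday)))  # [3]
```

Only one of the four blocks would need to be copied on Tuesday, no matter how many untouched archive blocks precede it.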

Why bother creating an archive mailbox at all?
Whether to create an archive mailbox is probably something you decide based on the size of each user’s data volume. Only the primary mailbox is synchronized to Outlook in cached mode, and Outlook performance suffers when the OST file grows large. Moving data to the archive therefore keeps the OST file smaller, so Outlook performs better in cached mode. To see the archive, you must be online, connected to Exchange, and using Outlook 2010 or OWA 2010, hence the name ‘Online Archive.’

Must users manage their archive manually?
No. The administrator can create policies that either move mail from the mailbox to the archive or delete it. This is similar to what was first introduced in Exchange 2003 as recipient policies, which were often used to clear items out of the mailbox into the ‘Deleted Items’ folder or to delete them completely. In Exchange 2007 this feature was enhanced and renamed Messaging Records Management (MRM), and it appears as Managed Folders in the Exchange Management Shell (EMS) and the Exchange Management Console (EMC). Exchange 2007 also introduced ‘Organizational Folders’ that could hold policies defining how long mail must be kept and what to do when it expires.
The problem with MRM version 1 was that a policy could only be applied to a folder or a message type, not to an individual message. In Exchange 2010 the administrator can still apply policies to folders the old way, but new policies also let users apply a policy directly to individual mail items (MRM version 2, if you will). Policies are always created by the administrator; depending on how they are configured, users may then be able to apply them to individual mail items in addition to the folders they cover.
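Here is a simplified Python model of the precedence described above: a policy applied to an individual item wins over the folder’s policy, which in turn wins over a mailbox-wide default. The class, field names, and action labels are hypothetical, and real Exchange 2010 retention processing involves more than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical stand-in for an administrator-created policy."""
    name: str
    action: str    # hypothetical labels: "delete" or "move_to_archive"
    age_days: int  # age at which the action applies


def effective_policy(
    item_policy: Optional[RetentionPolicy],
    folder_policy: Optional[RetentionPolicy],
    default_policy: RetentionPolicy,
) -> RetentionPolicy:
    """Most specific policy wins: item, then folder, then mailbox default."""
    if item_policy is not None:
        return item_policy
    if folder_policy is not None:
        return folder_policy
    return default_policy


default = RetentionPolicy("Default archive", "move_to_archive", 365)
folder = RetentionPolicy("Projects folder", "delete", 5 * 365)
item = RetentionPolicy("Delete in a month", "delete", 30)

print(effective_policy(None, None, default).name)    # Default archive
print(effective_policy(None, folder, default).name)  # Projects folder
print(effective_policy(item, folder, default).name)  # Delete in a month
```

Under MRM version 1, only the first two cases would exist; the item-level case is what the new per-message policies add.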

The administrator can set different quotas on the primary mailbox and the archive; for example, a 2 GB quota on the primary mailbox and a 15 GB quota on the archive. With a couple of policies, the administrator can delete everything older than 10 years and move messages older than 1 year to the archive. There are also a few user policies that users can apply themselves, such as ‘Keep this message for 5 years,’ ‘Keep this message for 1 year,’ ‘Delete this message in a month,’ or ‘Delete this message in 5 months,’ giving them a very flexible and efficient way of managing messages in Outlook.
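The example policies above can be expressed as a small decision function. This is a hedged sketch, not how Exchange itself implements retention: the 1-year archive and 10-year delete thresholds come from the text, years are approximated as 365 days, the function name is hypothetical, and a user-applied retention period is assumed to replace the 10-year delete rule for that one message.

```python
from datetime import date, timedelta
from typing import Optional

ARCHIVE_AFTER = timedelta(days=365)      # "messages older than 1 year" (from the text)
DELETE_AFTER = timedelta(days=10 * 365)  # "everything older than 10 years" (from the text)


def retention_action(received: date, today: date,
                     personal_delete_after: Optional[timedelta] = None) -> str:
    """Return 'delete', 'move_to_archive', or 'keep' for one message.

    Assumption: a user-applied period such as 'Delete this message in a month'
    replaces the mailbox-wide 10-year delete rule for this message only.
    """
    age = today - received
    delete_after = DELETE_AFTER if personal_delete_after is None else personal_delete_after
    if age >= delete_after:  # deletion is checked first so short user periods win
        return "delete"
    if age >= ARCHIVE_AFTER:
        return "move_to_archive"
    return "keep"


today = date(2009, 10, 14)
print(retention_action(date(2009, 9, 1), today))                      # keep
print(retention_action(date(2007, 5, 1), today))                      # move_to_archive
print(retention_action(date(1998, 1, 1), today))                      # delete
print(retention_action(date(2009, 9, 1), today, timedelta(days=30)))  # delete
```

Checking the delete rule before the archive rule means a message tagged ‘Delete this message in a month’ is removed without ever being moved to the archive.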

Microsoft definitely has a good thing going with its Exchange 2010 archiving solution. For those of you not yet convinced, keep in mind that this is the first version of archiving within Exchange. The archive makes it possible to get rid of PST files and all the problems they cause. For most administrators, having data safely inside the Exchange store, managed and searchable with native Exchange tools, instead of maintaining extra software and hardware, is reason enough to look past the rumors and misconceptions surrounding this brand-new feature.