As more enterprises move their data to Microsoft 365, many of them also have to migrate their Enterprise Vault data, which is typically much larger than what is stored in the actual mailboxes.
There are two basic ways to get data out of Enterprise Vault: you can use the Veritas Enterprise Vault API (Application Programming Interface), which is how we do it at Vault Solutions, or you can reverse engineer the data. In this article, we will take a closer look at each of these methods.
Reverse Engineering Enterprise Vault Data for Restoration
Clearly, we prefer the API route, but let’s talk about reverse engineering first. I am familiar with how at least a couple of the companies that do it that way operate, and having worked with Enterprise Vault since the late 1990s, I am keenly aware that Enterprise Vault has used many different methods of storing data over the past 25 years and change.
In the very first version of Enterprise Vault, they would take the entire message and put it in a file on the Enterprise Vault server or other local storage. They would supplement the original message with additional metadata and store it in a file with a ‘.dvs’ (for Digital Vault Saveset) extension. It was Digital Vault because the product originated at Digital Equipment Corporation (where I worked for 19 years). However, they didn’t document the format or the information they put inside with the message. They didn’t even provide this information to their technology partners (such as Vault Solutions). While some companies attempt to reverse engineer the format to reconstruct the message, fidelity will be questionable for reasons explained in more detail below.
As Enterprise Vault developed, the engineers added features such as single instance storage and collections. For example, if the same message went to multiple people or was in several different mailbox archives, they might only store most of the message once. This was originally done with attachments, and later with distribution lists as well.
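To make the idea of single instance storage concrete, here is a minimal sketch in Python. It is not Enterprise Vault's actual format, which is undocumented; the class name, the use of SHA-256, and the sample messages are all assumptions made purely for illustration. It shows the general technique: each message part is stored once under a hash of its content, and a message is just a list of references to those parts.

```python
import hashlib
from typing import Dict, List


class SingleInstanceStore:
    """Toy single-instance store (hypothetical, NOT the Enterprise Vault format)."""

    def __init__(self) -> None:
        self.parts: Dict[str, bytes] = {}         # content hash -> stored bytes
        self.messages: Dict[str, List[str]] = {}  # message id -> list of content hashes

    def _put_part(self, data: bytes) -> str:
        # Identical content always hashes to the same key, so it is stored only once.
        digest = hashlib.sha256(data).hexdigest()
        self.parts.setdefault(digest, data)
        return digest

    def archive(self, message_id: str, body: bytes, attachments: List[bytes]) -> None:
        refs = [self._put_part(body)]
        refs += [self._put_part(a) for a in attachments]
        self.messages[message_id] = refs

    def restore(self, message_id: str) -> List[bytes]:
        # Raises KeyError if any referenced part is missing from the store.
        return [self.parts[ref] for ref in self.messages[message_id]]


store = SingleInstanceStore()
report = b"quarterly-report.pdf contents"
store.archive("alice-001", b"Hi Alice, report attached.", [report])
store.archive("bob-001", b"Hi Bob, report attached.", [report])

stored_parts = len(store.parts)      # two bodies plus one shared attachment
restored = store.restore("bob-001")  # body first, then the shared attachment
```

Note that restoring a message only works if every part it references is available. That is the general reason a tool that bypasses the API has to understand exactly how and where the shared parts were written.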
At KVS (where I also worked for several years, as Technical Director of North America), we partnered with EMC to build in support for their innovative (at the time) storage platform known as Centera. This was the first Content Addressed Storage (CAS) platform, and Enterprise Vault took full advantage of it by building in native support for writing directly to the Centera. We quickly enhanced that at the request of one of the largest financial institutions to package multiple messages into a single blob on the Centera.
Over time, it has become difficult, almost impossible, to know all the ins and outs of how Enterprise Vault has changed these formats over the years. So even though a vendor may have an Enterprise Vault system for testing and can figure out how to do it for that version, it doesn’t mean they will be able to understand all the permutations of how Enterprise Vault stored data over the past 25+ years.
One of our customers has been using Enterprise Vault for over 20 years. The data there today was stored by the Enterprise Vault code from 20 years ago. If you are restoring the data by reverse engineering it, you would have to know how Enterprise Vault stored that data back then to put it back together without some form of corruption or potential data loss.
Unfortunately, this method does not offer the nearly 100% success rate that is possible with the API, which takes advantage of the code in Enterprise Vault that understands every nuance of how the data was and still is stored.
The Forklift
Because some vendors attempt to reverse engineer the data, they don’t need Enterprise Vault running to do the extraction. They take advantage of that fact by moving all the Enterprise Vault data from the customer’s on-premises environment to a staging environment, usually Azure. This is often referred to as a Forklift.
If you need to move all your Enterprise Vault data, 45 Terabytes, for example, you will need a temporary holding space. This requires putting together an infrastructure in Azure, for example, which takes time and costs the customer additional money for the cloud infrastructure. Once the infrastructure is completed, all 45 TB of data and the EV databases will have to be copied to this holding space, which could take weeks, if not months, to complete. Once all the data has been moved into the new infrastructure, new databases must be created, and the data must be sorted through and put back together. Then, they can start thinking about restoring items into somebody’s mailbox or elsewhere.
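To get a feel for why the copy alone can take weeks, here is a rough back-of-the-envelope estimate in Python. The link speeds and the 70% efficiency factor are assumptions chosen for illustration, not measurements from any customer environment, and 1 TB is taken as 10^12 bytes.

```python
def transfer_days(data_tb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate days needed to copy data_tb terabytes over a link of link_mbps.

    efficiency is an assumed fraction of the nominal link speed that is
    actually achieved once protocol overhead and contention are accounted for.
    """
    total_bits = data_tb * 1e12 * 8               # decimal terabytes -> bits
    effective_bps = link_mbps * 1e6 * efficiency  # usable bits per second
    seconds = total_bits / effective_bps
    return seconds / 86_400                       # seconds per day


# Hypothetical uplink speeds for the 45 TB example
for mbps in (100, 500, 1000):
    print(f"{mbps:>5} Mbps: {transfer_days(45, mbps):.1f} days")
```

Under these assumptions, a 100 Mbps link needs roughly two months for the raw copy, and even a 1 Gbps link needs about six days, before any time spent building the infrastructure, recreating databases, or reassembling messages.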
We’re working with a customer now who chose to utilize this Forklift process with another vendor, and it was at least six months before they restored message #1. Months later, it was discovered they had missed hundreds of thousands of archived messages. When we ran the same process using the API, we restored 98% of the messages that had failed under the reverse engineering process.
Pros and Cons of Reverse Engineering
Pros:
• You do not need a working Enterprise Vault system to use this method.
Cons:
• You will likely need to ‘forklift’ all data to the cloud before a single message can be restored (because of the way Enterprise Vault typically splits messages into parts for efficient storage).
• A Proof of Concept (POC) is out of the question because of the ‘forklift’ requirement. You must commit to doing it this way without being able to evaluate the success rate first.
• Additional costs are incurred for the infrastructure required to store a full copy of the Enterprise Vault data during the migration process.
• The error rate will be much higher due to the absence of documentation of the many changes to the storage formats used by Enterprise Vault over the last 25 years. The reverse engineering method should only be used when the CIO decides that the data in Enterprise Vault is not business-critical.
Using the API
When you use the Enterprise Vault API, as we do, the process is streamlined. The Enterprise Vault Runtime API encompasses all the code that has been used over the years to store data and, therefore, knows exactly what needs to be done to put it back together. (Think Humpty Dumpty!)
When using the Enterprise Vault API, the software to restore the data is installed in the customer’s data center, and within a couple of hours messages from their Enterprise Vault environment flow directly into people’s mailboxes in Exchange Online. Within hours, not months, you know what success rate you’re going to have.
This is important if you’re thinking about doing a POC to see if this is going to work for your needs. Do you really have six months of time to spare and the additional financial resources for the Azure infrastructure, only to find out that your success rate is sub-par?
Pros and Cons of using the Enterprise Vault API
Pros:
• A POC can be installed and running in a couple of hours
• You can know from the beginning what the success rate will be
• Dramatically less infrastructure is required
• Typically, a 99.99%+ success rate
Cons:
• You must have a working Enterprise Vault environment.
What to ask
If you’re getting ready for the enormous undertaking of restoring your data from Enterprise Vault to Exchange Online, it’s best to have all the information you need to choose the right method and resources to get the job done. Here are some questions to ask potential vendors that can help you restore your data outside of Enterprise Vault.
• What is your typical success rate?
• Do you use the native API to pull the data from EV?
• Is it necessary to have a staging area to do the migration?
• If so, how long does it take to create that staging area and how much will that cost?
• Can you do a Proof of Concept for a few mailboxes?
• Is there a charge for the Proof of Concept?
• How long does it take from the project start before you are migrating message #1?
• Does the solution use Modern Auth? (Microsoft has deprecated basic authentication when connecting to Exchange Online) https://learn.microsoft.com/en-us/exchange/clients-and-mobile-in-exchange-online/deprecation-of-basic-authentication-exchange-online
If you are getting ready to migrate your Enterprise Vault data, we’re here to make it easier. Let us know at info@vault-solutions.com – we’re happy to help!