Company X is cutting the amount of Journal data it needs to migrate by 80%. How do you do that? How does a company figure out which data needs to be migrated and which doesn’t?
It starts with asking the right questions. A customer of ours asked whether we could tell them what was in their Journal Archive that wasn’t in their Mailbox Archive. After a little digging, we came to understand what they were really trying to do: figure out what they needed to migrate from Veritas Enterprise Vault to Exchange Online. And we realized that the tool for the job was the Catalog, the “Swiss Army Knife” of our Archive Accelerator suite.
When migrating (rehydrating) email from Veritas Enterprise Vault to Exchange Online, there are two types of archive that might need to be migrated, depending on how Enterprise Vault has been used:
- Mailbox Archives
- Journal Archives
In this post, I’ll discuss a way to reduce the amount of data migrated when both types of archive need to be moved.
Think twice
First, if you plan on moving your Journal Archives to Exchange Online, please think twice. We’re seeing more companies come to us that made that move and are now looking to resume using Enterprise Vault Journaling (typically with SMTP archives), because they’ve realized that the Veritas Compliance and e-Discovery options are better. To register for an excellent on-demand webinar from Veritas that covers the benefits of e-Discovery with Enterprise Vault over Microsoft 365, see https://www.veritas.com/form/webinar/deep-dive-ediscovery-that-goes-beyond-office-365.html.
If you have been doing mailbox archiving AND journaling, then much of the data in the Journal Archives is also in the Mailbox Archives. If, like many companies, you migrate all of your Mailbox Archive data to Exchange Online, consider not migrating all of the Journal Archive data as well. Instead, migrate only the difference: the items that are in the Journals but not in the Mailbox Archives. This is how our customer is reducing the amount of Journal data to migrate by a whopping 80%.
So, how can you determine what is in the Journal Archives but not in the Mailbox Archives?
The best way to uniquely identify a message in Enterprise Vault (or any other messaging system) is to use the Internet Message ID. It is meant to be globally unique: the email standard that defines it, developed decades ago (RFC 822, now superseded by RFC 5322), makes the host that generates each Message ID responsible for guaranteeing that uniqueness.*
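As a minimal illustration of what this identifier looks like in practice, the Python sketch below reads the Message-ID header from raw message text with the standard library’s email parser and turns it into a key you could compare across archives. It assumes you already have the message headers as text; retrieving them from Enterprise Vault through its API is not shown. The helper name `message_id_key` and the choice to strip whitespace and angle brackets are our own illustrative conventions, not part of any product.

```python
from email import message_from_string
from typing import Optional


def message_id_key(raw_message: str) -> Optional[str]:
    """Return a comparison key built from a message's Message-ID header, or None.

    Hypothetical helper for illustration only.
    """
    msg = message_from_string(raw_message)  # legacy (compat32) parser: header values are str
    value = msg["Message-ID"]
    if value is None:
        return None  # no Message-ID header at all
    # A folded header can carry line breaks and indentation; a valid msg-id
    # contains no whitespace, so remove all of it before comparing.
    key = "".join(value.split())
    # Drop the surrounding angle brackets (our own normalization choice).
    return key.strip("<>") or None


raw = "Message-ID: <abc123@mail.example.com>\nSubject: Quarterly report\n\nHello"
print(message_id_key(raw))
```

Case is deliberately left alone, since the part of a Message ID before the “@” is not guaranteed to be case-insensitive.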
Getting the Internet Message ID out of Enterprise Vault isn’t trivial, but it can be retrieved via the API, and it is one of the many attributes we store in the Archive Accelerator Catalog. Once the Catalog has inventoried the archives, this attribute can be used to identify the messages in the Journal Archives that are not also in the Mailbox Archives. The pseudo query to do this looks like:
Select Items From (All Journal Archives) Where Item.Recipient = 'user1@domain.com' And Item Not In (User1 Mailbox Archive)
In practice, you will likely find the same item in multiple Journal Archives, so the actual query also includes a clause to de-duplicate those copies.
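The logic of the pseudo query can be sketched in Python over in-memory records, assuming the inventory has been exported to a simple list of items. The `CatalogItem` record and the `journal_only_items` function are hypothetical names for illustration, not the Archive Accelerator Catalog’s actual schema or API. The sketch compares items by Internet Message ID and keeps a single copy when the same message turns up in several journal archives.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CatalogItem:
    """Hypothetical inventory record; not the real Catalog schema."""
    archive: str                # e.g. "Journal A" or "User1 Mailbox Archive"
    message_id: str             # Internet Message ID
    recipients: frozenset[str]  # assumed already normalized to lowercase


def journal_only_items(
    journal_items: Iterable[CatalogItem],
    mailbox_items: Iterable[CatalogItem],
    recipient: str,
) -> list[CatalogItem]:
    """Journal items addressed to `recipient` whose Message ID is not in that
    user's mailbox archive, keeping one copy per Message ID."""
    in_mailbox = {item.message_id for item in mailbox_items}
    seen: set[str] = set()
    result: list[CatalogItem] = []
    for item in journal_items:
        if recipient not in item.recipients:
            continue  # Where Item.Recipient = recipient
        if item.message_id in in_mailbox:
            continue  # And Item Not In (mailbox archive)
        if item.message_id in seen:
            continue  # de-dup: already found in another journal archive
        seen.add(item.message_id)
        result.append(item)
    return result
```

Because the mailbox-side Message IDs go into a set, each journal item costs a single hash lookup, so one pass is linear in the number of journal items. At the scale of a real archive inventory, the same logic would more likely run as a database query than in application code.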
As I mentioned earlier, Veritas Compliance and e-Discovery options are arguably better than the native Microsoft 365 offerings. We’re impressed with the direction Veritas is taking with its Information Governance products, and the new Veritas Advanced Supervision compliance offering is well worth a look.
Vault Solutions has been adding value to Enterprise Vault for over 13 years, so whether you decide to stay or migrate, we may be able to help!
*At first glance, many people assume the information you need to do this is in the Enterprise Vault database tables. To uniquely identify an item in Enterprise Vault, you need an Archive ID and a Saveset ID. The Archive ID tells you which archive the item is in, such as ‘John Smith’s Mailbox Archive’ or the ‘Journal Archive’. Let’s assume, for simplicity’s sake, that all Journal data is in a single journal archive called ‘Journal Archive’. You might think that if you have an item with a particular Saveset ID in the Journal Archive, but can’t find that same Saveset ID in John Smith’s archive, then the message isn’t in John Smith’s archive. But that test doesn’t work, because Enterprise Vault gives each copy of the message its own Saveset ID, even though they are the same message.
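To make the footnote’s point concrete, the Python sketch below models one message archived twice, once by journaling and once from John Smith’s mailbox. The `VaultItem` record, the archive IDs and the Saveset ID strings are made-up placeholders (real Saveset IDs look different); the point is only that a Saveset ID comparison reports the message as missing, while an Internet Message ID comparison finds it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VaultItem:
    """Hypothetical record for one archived copy of a message."""
    archive_id: str  # which archive holds this copy (placeholder values)
    saveset_id: str  # per-copy identifier assigned when the copy is archived
    message_id: str  # Internet Message ID from the message headers


def missing_by_saveset(item: VaultItem, mailbox: list[VaultItem]) -> bool:
    # Naive test: is this copy's Saveset ID absent from the mailbox archive?
    return item.saveset_id not in {m.saveset_id for m in mailbox}


def missing_by_message_id(item: VaultItem, mailbox: list[VaultItem]) -> bool:
    # Message-level test: is this Internet Message ID absent from the mailbox archive?
    return item.message_id not in {m.message_id for m in mailbox}


# The same message, stored twice with different Saveset IDs.
journal_copy = VaultItem("JOURNAL-ARCHIVE", "SAVESET-J-001", "<abc123@mail.example.com>")
mailbox_copy = VaultItem("JSMITH-ARCHIVE", "SAVESET-M-042", "<abc123@mail.example.com>")

print(missing_by_saveset(journal_copy, [mailbox_copy]))     # True: wrongly reported as missing
print(missing_by_message_id(journal_copy, [mailbox_copy]))  # False: correctly found
```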