Authored by Kimberly Quan, Director of Partner Relations for Nuix
Over the years, organizations have become adept at dealing with discovery and compliance requests for electronically stored information (ESI), but what happens when the data in question is years-old email stored in an archive? Ideally, you can rely on the archive’s native search tools, but what do you do when a deadline is looming and the archive search tool takes hours or days to return results? Or worse, what if the archive indexes are corrupt and the search results you waited days for are inconsistent and unreliable?
This exact dilemma has led some legal and information management teams to consider moving their data from their existing archives to next-generation or cloud archive platforms.
However, migrating data from an old email archive to a new one can be a time-consuming and frustrating task. Aging legacy archives are often filled with redundant, obsolete and trivial (ROT) information that does nothing but add to storage costs.
Worse, years of use can corrupt legacy archive indexes, making it difficult or even impossible for traditional migration tools to get the data out. Archives’ built-in extraction tools can also cause spoliation of metadata fields that are key in ediscovery and compliance matters.
Over the next five years, tens of thousands of enterprises and governments worldwide will make this move. However, the problems in the original archive will create huge headaches for these organizations during the migration process. Here are five of the most common issues we see.
Most migration technologies use the legacy archive’s application programming interface (API) to extract data. However, these APIs weren’t designed for extracting large volumes of data at once—in fact, many can only process a single item at a time in a single thread. This is why it can take months or even years to extract relatively small amounts of data.
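To see why single-threaded, item-at-a-time extraction stretches into months, a rough back-of-the-envelope calculation helps. The sketch below is illustrative only: the item count, per-item latency and worker count are hypothetical numbers, not measurements from any real archive, and it assumes throughput scales perfectly with the number of workers, which real systems rarely achieve.

```python
SECONDS_PER_DAY = 86_400


def estimate_days(items: int, seconds_per_item: float, workers: int = 1) -> float:
    """Estimate extraction time in days.

    Assumes every item takes the same time and that work divides
    evenly across workers (an idealized, hypothetical model).
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return items * seconds_per_item / workers / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical archive: 50 million items at half a second per item.
    items = 50_000_000
    print(f"{estimate_days(items, 0.5):.0f} days with a single thread")
    print(f"{estimate_days(items, 0.5, workers=16):.0f} days with 16 workers")
```

With these made-up figures, one thread needs roughly 289 days, while 16 ideal workers would need about 18. The point is the shape of the arithmetic rather than the specific numbers.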
API-based migration relies on the legacy archive’s internal index to have accurate records of which messages and attachments are stored where. Unfortunately, as we have already discussed, these indexes often become corrupted through years of use. Extracting data with a corrupt index can result in damaged or incomplete information, or even an inability to get all of the messages and attachments out. Using archives’ native data extraction tools may also lead to spoliation of vital metadata such as a file’s last modified date or BCC values.
Delayed search and discovery
Gaining access to more responsive—or just functional—search tools is often a major driver for migrating archives. However, the new platform’s search won’t return accurate results until all the legacy data has been ingested. This means it can take months or years for an organization to access the search capabilities that were the point of migrating in the first place.
Many organizations use the journaling feature of the email server to capture a copy of every email and attachment for every user. This data, accumulated over a number of years, sometimes conceals business risks, such as sensitive, private or financial data. This information is often buried deep within terabytes of old, poorly organized data. What’s more, most legacy archives lack the advanced search capabilities needed to identify these risks.
No way to leave out the junk
Traditional approaches to migration have no efficient way of distinguishing ROT from important data. Organizations have no option but to ‘pump and dump’ the entire archive into the new platform … along with all of its bloat and drag on performance. For example, during one migration project, we encountered millions of automatically generated emails with repeated subject lines such as “fax server warning” and “failure notice.”
These headaches are easy to avoid. Nuix’s Intelligent Migration technology allows information managers to index all of their data before migrating to new systems. Nuix bypasses the need to use the APIs by processing the data directly within the archive storage. Nuix’s patented parallel processing engine makes it possible to complete this task within weeks instead of months or years.
Once the data is indexed and fully searchable, organizations can make informed judgments about risk and value. They can prioritize for migration important items such as data on legal hold or belonging to executives. They can pinpoint business risks for remediation and filter out data they can defensibly leave behind in the legacy archive.
Common candidates to be left behind include data past its retention date, very large and infrequently accessed files, duplicated email messages such as company-wide memos, and trivial content containing keywords such as ‘lunch’ or ‘kitten’.
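The selection rules described above can be sketched as a simple filter over indexed messages. This is a minimal illustration, not Nuix’s API or method: the `Message` fields, the `select_for_migration` function, the seven-year retention period and the subject patterns are all hypothetical, and a real policy would be set with legal and records-management input.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical subject patterns for automatically generated junk mail.
ROT_SUBJECTS = ("fax server warning", "failure notice")


@dataclass(frozen=True)
class Message:
    subject: str
    sent: date
    content_hash: str  # assumed to be computed during indexing
    legal_hold: bool = False


def select_for_migration(messages, today, retention_days=7 * 365):
    """Return the messages worth moving to the new archive.

    Items on legal hold are always kept. Everything else is dropped if it
    is past retention, matches a junk subject, or duplicates an earlier item.
    """
    keep = []
    seen_hashes = set()
    for msg in messages:
        if msg.legal_hold:
            keep.append(msg)
            seen_hashes.add(msg.content_hash)
            continue
        expired = (today - msg.sent).days > retention_days
        junk = any(p in msg.subject.lower() for p in ROT_SUBJECTS)
        duplicate = msg.content_hash in seen_hashes
        if expired or junk or duplicate:
            continue
        seen_hashes.add(msg.content_hash)
        keep.append(msg)
    return keep
```

Checking legal hold first is a deliberate choice in this sketch: an item under hold is kept even if it would otherwise look expired or trivial.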
Use technology effectively
Having all the data searchable opens up many new possibilities. An organization can stop paying maintenance on its legacy archive and choose to move only data that has current business value for end users, for example, the last 12 months’ worth.
During migration, the parallel processing technology in the Nuix Engine can extract data from the legacy archive many times faster than API-based approaches. This technological edge, combined with the ability to choose which data to move, can reduce migration timeframes from years to weeks. What’s more, organizations need only store important data in the new archive—no more paying a premium to keep ROT.
Organizations can save money by taking legacy archives offline and moving a cleansed, relevant population of data to the new archive.
Gaining an in-depth view into the data is the perfect opportunity to work with skilled professional services providers, such as Lighthouse eDiscovery’s consultants, to assist with governance tasks such as locating and acting upon personally identifiable information (PII) or identifying and preparing data for defensible deletion. It is also possible to perform a content assessment on a small sample of a larger data set to identify the risks and opportunities it may contain.