Almost daily, I receive questions about what constitutes a defensible collection within the United States. The question usually concerns how a client can limit the amount of data being collected and processed while still meeting discovery obligations. I also often receive questions about whether in-house resources can be leveraged to avoid the cost of engaging outside collection vendors. In this article, I’ll dive into both issues and their solutions.
To begin, what is a defensible collection? The answer is both obvious and sometimes frustrating for counsel and custodians. A defensible collection is one that can be defended in court should questions arise. From a technical standpoint, that means a data collector should use the appropriate tools to collect data with its metadata intact and to capture all possible results.
This answer causes so much frustration because, in a world where data privacy concerns are at the top of everyone’s minds, custodians may not be willing to share their entire data source during a collection, or a company may want to do everything it can to protect valuable intellectual property. This can be especially true when webmail, personal files, or large amounts of IP are subject to collection. Counsel can utilize targeted collections when collecting files and folders from a custodian’s laptop or file shares by having the custodian show the data collector where relevant data is located during a custodian interview. However, when it comes to email or data sources where the potentially relevant data isn’t clearly segregated, the solution isn’t quite as simple.
Clients often ask me to use the search tools built into the front end of applications (e.g., the search function in Microsoft Outlook, webmail, or Windows) to find potentially relevant data. This approach is not defensible for several reasons:
- Front-end search indexes do not always search metadata or all data locations within a data source, potentially missing relevant data.
- Similarly, many documents that are not searchable at the source would become searchable once OCR is applied as part of standard ediscovery processing. These documents would not be returned by any front-end search and could be left out of the collection.
- There’s a risk that searching for potentially relevant data will compromise the metadata of an object.
- A client must have a prepared list of keywords ready to be used during searches. Because front-end searching is not robust, these keywords have to be simple in nature and won’t always capture all potentially relevant data. This also opens the door for the court to question the keywords and methods used in collection.
- Frequently, as ediscovery progresses, new issues are discovered that require new or revised terms to be added. If front-end searching was employed, recollection becomes necessary, potentially increasing client costs and causing undue disruptions. Similarly, fine-tuning the search terms as results are analyzed during review becomes impossible without returning to the source for recollection.
In cases where targeted collections aren’t appropriate or possible, all data should be collected and then culled within a processing tool such as Nuix, using date limiters, search terms, and/or custodian limiters. When it comes to data privacy concerns, data should be preserved and processed under strict security guidelines to ensure that sensitive data and privacy are protected. Human review begins once the population is culled and narrowed to potentially relevant data.
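The culling step described above can be illustrated with a minimal Python sketch. It is not how Nuix or any other processing tool works internally; the `Document` record, the `cull` function, and the sample data are all hypothetical, and the term matching is a simple case-insensitive substring check chosen only for illustration. The point it shows is that, once the full data set has been collected, custodian, date, and term limiters can be revised and rerun without going back to the source.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    """Hypothetical record for one processed item in the collected set."""
    doc_id: str
    custodian: str
    sent: date
    text: str


def cull(documents, custodians, start, end, terms):
    """Keep documents that match a custodian, fall within the date range,
    and contain at least one search term (case-insensitive substring)."""
    wanted = {c.lower() for c in custodians}
    lowered_terms = [t.lower() for t in terms]
    kept = []
    for doc in documents:
        if doc.custodian.lower() not in wanted:
            continue  # custodian limiter
        if not (start <= doc.sent <= end):
            continue  # date limiter
        body = doc.text.lower()
        if any(term in body for term in lowered_terms):
            kept.append(doc)  # search-term limiter
    return kept


# Illustrative data only: the full collected population.
collected = [
    Document("D1", "alice", date(2020, 3, 1), "Pricing agreement draft attached"),
    Document("D2", "alice", date(2018, 1, 5), "Pricing agreement from last cycle"),
    Document("D3", "bob", date(2020, 4, 2), "Lunch on Friday?"),
    Document("D4", "carol", date(2020, 5, 9), "Revised pricing agreement"),
]

window = (date(2020, 1, 1), date(2020, 12, 31))

# Initial terms agreed at the start of the matter.
first_pass = cull(collected, ["alice", "bob"], *window, ["pricing agreement"])

# Terms revised later: rerun against the same collected set, no recollection.
second_pass = cull(collected, ["alice", "bob"], *window,
                   ["pricing agreement", "lunch"])

print([d.doc_id for d in first_pass])
print([d.doc_id for d in second_pass])
```

Because the limiters are applied after collection, adding a term later only changes the arguments to the second call; the underlying population is untouched.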
For those rare situations where data simply cannot leave the premises unless expressly needed for litigation, custom solutions can be devised where data is collected, processed, searched, and promoted all on site. These solutions can be more costly than a traditional matter given the labor hours and custom equipment needed, but they are available.
Leveraging in-house resources to collect data or having custodians self-collect data in order to avoid the costs of engaging a vendor can be done, but it increases the legal risk in a case and can still lead to defensibility issues. To break these issues down:
- As mentioned earlier, one of the most important facets of collecting data defensibly is capturing the metadata. There are specific processes that collection vendors utilize to ensure that metadata is preserved while collecting data. They typically require robust file copy tools and forensic collection programs, and those tools require expertise and training that many IT professionals and custodians do not possess.
- Custodians do not always understand what is and isn’t potentially relevant, so when they self-collect, data is often left out when it should have been included.
- In a typical corporation, IT would be responsible for in-house collections. They are usually already managing burdensome workloads and do not always prioritize data collection needs. This leads to impacted deadlines and downstream issues.
- IT does not always use industry-standard tools for data collection. In past cases, I have received data collected by IT in abnormal file formats that required additional research and processing. This drives up processing costs, which ultimately nullifies the cost savings of having an in-house resource collect the data.
- Finally, IT professionals and custodians are not equipped to testify to the defensibility of the processes that they used to collect the data.
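To make the metadata point from the list above concrete, here is a minimal Python sketch contrasting a naive file copy with one that carries over the last-modified time and verifies content with a hash. It is an illustration under stated assumptions, not a forensic collection tool: the `collect_file` and `sha256_of` helpers and the file names are hypothetical, `shutil.copy2` preserves only what the operating system allows (creation dates and access times may still change), and real collections rely on purpose-built software and documented procedures.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mtime_matches(a, b):
    """True when two files' last-modified times agree to within one second."""
    return abs(Path(a).stat().st_mtime - Path(b).stat().st_mtime) < 1


def collect_file(source, dest_dir):
    """Copy one file with shutil.copy2 and return a simple log record."""
    source = Path(source)
    dest = Path(dest_dir) / source.name
    source_hash = sha256_of(source)
    shutil.copy2(source, dest)  # copies content plus timestamps where possible
    return {
        "source": str(source),
        "destination": str(dest),
        "sha256": source_hash,
        "hash_verified": sha256_of(dest) == source_hash,
        "mtime_preserved": mtime_matches(source, dest),
    }


with tempfile.TemporaryDirectory() as workdir:
    src_dir = Path(workdir) / "custodian_laptop"
    out_dir = Path(workdir) / "collection"
    src_dir.mkdir()
    out_dir.mkdir()

    # Create a sample file and backdate its last-modified time to 2017.
    sample = src_dir / "memo.txt"
    sample.write_text("quarterly forecast")
    os.utime(sample, (1_500_000_000, 1_500_000_000))

    # Naive copy: content only, so the copy gets a fresh modified time.
    naive_copy = out_dir / "memo_naive.txt"
    shutil.copyfile(sample, naive_copy)
    naive_mtime_preserved = mtime_matches(sample, naive_copy)

    # Metadata-aware copy with hash verification.
    record = collect_file(sample, out_dir)

print("naive copy kept modified time:", naive_mtime_preserved)
print("copy2 kept modified time:", record["mtime_preserved"])
print("hash verified:", record["hash_verified"])
```

Even in this small example the naive copy silently replaces the original modified date, which is the kind of change that is hard to explain later if the collection is challenged.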
While in-house collection and self-collection may be appropriate in some cases, I typically discourage them. They can certainly net cost savings, but too often I have seen issues crop up that lead to expensive and time-consuming fixes. My recommendation is to hire a vendor that can act as a trusted advisor during all phases of litigation, handle the collections as needed, and avoid the potential pitfalls described above.
As I wrote in a previous article, starting collections off on the right foot sets the tone for the entire project and ensures that problems don’t persist over the course of the matter. That is especially true when it comes to defensibility. When data is collected by indefensible means, the subsequent review and production of that data become indefensible, the case becomes subject to court sanctions, and, more importantly, the indefensible data can lead to the loss of the case.
While data collection is typically a routine and sometimes mundane aspect of ediscovery, the stakes are high when it comes to the defensibility of data. Ensuring that data collection is done correctly should be of utmost importance to counsel as they work through their case.
To discuss more or ask questions, feel free to reach out to me at email@example.com or find me on LinkedIn.