EDRM recently published a draft of its Processing Standards Guide, a general, software-agnostic standard for data processing in the fast-paced field of ediscovery. This document is a welcome breakthrough and it falls in line with EDRM’s many standards guides. This guide, in particular, looks at the concerns and considerations that arise when processing data, and it raises the question: what should the client’s expectations be when it comes to processing standards?
Lighthouse is responding to EDRM’s Processing Standards Guide formally and contributing our guidance towards the development of an industry standard. After looking over EDRM’s processing document and compiling our feedback, I have come up with some standards and best practices that you, as a client, should expect and keep in mind when it comes to processing.
- Virus Protection: Keep in mind the vulnerabilities of processing systems and be sure that your ediscovery provider is aware of the issues that can arise when virus protection tools are used while processing data. Your reviewers should also be aware of this issue when using any “launch or view in native application” functions of processing or review tools, especially when it comes to email attachments. Opening unknown attachments or container files may expose the system to malware and viruses.
- Container Files: To avoid surprises in processing fees, request file type analysis reports before and after processing that include both the compressed and extracted sizes of the data.
- Deduplication: This is one of the most complicated ediscovery concepts and the most sensitive to processing changes, such as time zone selection and hash calculation. Ask your ediscovery provider for the option of global or custodial deduplication and make sure to stipulate the order in which your key custodians’ data is processed.
- For example, suppose both the VP and a mid-level employee at the same company hold a copy of an email whose evidentiary importance is not yet known. If the employee’s data is processed first and global deduplication is then applied to the VP’s data, the VP’s copy of the important email will be suppressed from the VP’s document review set.
- Regardless of the order in which custodian data is processed, your ediscovery provider should supply you with an All Custodian field to allow tracking of important items that have touched various custodians.
- MD5 Hash: Deduplication also relies on MD5 hash calculations, which produce a digital thumbprint of each item. That thumbprint can be affected by the time zones in which emails were sent and in which the data is processed. Calculating hash values on header information alone can cause many items to be improperly deduplicated and make it impossible to correctly apply threading or TAR tools. Tools that compute hashes from a combination of header fields, hashed or tokenized body text with blank space minimized, and hashed or binary streams of attachments provide greater accuracy. Your ediscovery provider should be aware of how their tools calculate these hash values and should provide quality checks to confirm that deduplication is being performed with accurate MD5 hash values.
- Time Zones: Consistency with time zone selection is important, not just for discovery or ingestion, but also for any date filtering, date-based culling, exports, or when crossing platforms for review or production. Your ediscovery provider should allow you to choose the necessary time zone for your matter and apply that standard time zone across all of the processing, review and production platforms.
- NIST List: NIST updates their NSRL list of MD5 hash values on a quarterly basis. Ask your provider if they are performing all deNISTing with the most recent list.
- Embedded Images: Is your provider using an automated method to identify and suppress your corporate logos and other embedded signature objects, while leaving true embedded images and signature blocks intact, so that the number of documents for review is reduced automatically? If not, make sure to present this idea.
- Exceptions Handling: Ask your ediscovery provider to provide reports consistent with the proposed guidelines. Where password cracking, replacing items, or fixing container files can be done, it should be completed in a way that preserves the integrity of the data. Your ediscovery provider should perform a re-collection on these items. Since ediscovery and technology are constantly evolving, it is important for ediscovery providers to challenge themselves and make certain that they can handle exceptions related to different program versions or incompatibility with industry-standard tools, by providing their clients with a process that involves internally developed tools to address these items.
- Quality Control: A topic not covered in EDRM’s processing guideline, but one that should be stressed for all ediscovery processing, review and production, is quality control. Quality checks should be inserted at various steps in the process to validate processed dates and time zones and to confirm that no spoliation of data has occurred, that text encoding remains consistent, and that all standards agreed to during the meet-and-confer process are met.
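To make the deduplication, MD5 hash, and time zone points above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not EDRM’s method or any vendor’s actual algorithm: the `Email` record, the choice of fields, and the function names `email_hash` and `global_dedup` are all hypothetical. It hashes a mix of normalized header fields, body text with blank space collapsed, and attachment hashes, converts the sent time to UTC first so the time zone does not change the hash, and records an All Custodian list during global deduplication.

```python
import hashlib
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass
class Email:
    """Hypothetical, simplified email record used only for this sketch."""
    custodian: str
    sender: str
    recipients: List[str]
    subject: str
    sent: datetime  # must be timezone-aware
    body: str
    attachments: List[bytes] = field(default_factory=list)


def email_hash(msg: Email) -> str:
    """MD5 over normalized headers, whitespace-collapsed body, and attachment hashes."""
    # Normalize the sent time to UTC so the processing time zone cannot alter the hash.
    sent_utc = msg.sent.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Collapse runs of blank space so cosmetic differences do not defeat deduplication.
    body = re.sub(r"\s+", " ", msg.body).strip()
    parts = [
        msg.sender.strip().lower(),
        ";".join(sorted(r.strip().lower() for r in msg.recipients)),
        msg.subject.strip(),
        sent_utc,
        body,
    ]
    # Attachments contribute the hash of their binary stream.
    parts += [hashlib.md5(a).hexdigest() for a in msg.attachments]
    return hashlib.md5("\x1f".join(parts).encode("utf-8")).hexdigest()


def global_dedup(messages: List[Email]) -> Dict[str, dict]:
    """Keep the first copy of each item; track every custodian who held it."""
    kept: Dict[str, dict] = {}
    for msg in messages:
        h = email_hash(msg)
        if h not in kept:
            kept[h] = {"primary": msg.custodian, "all_custodians": [msg.custodian]}
        elif msg.custodian not in kept[h]["all_custodians"]:
            kept[h]["all_custodians"].append(msg.custodian)
    return kept


# The same email held by two custodians, captured in different time zones
# and with slightly different blank space in the body.
employee_copy = Email(
    custodian="employee",
    sender="VP@example.com",
    recipients=["employee@example.com"],
    subject="Q1 forecast",
    sent=datetime(2015, 3, 2, 9, 30, tzinfo=timezone(timedelta(hours=-5))),
    body="Please review the attached forecast.",
    attachments=[b"forecast-data"],
)
vp_copy = Email(
    custodian="vp",
    sender="vp@example.com",
    recipients=["employee@example.com"],
    subject="Q1 forecast",
    sent=datetime(2015, 3, 2, 14, 30, tzinfo=timezone.utc),
    body="Please review  the attached\nforecast. ",
    attachments=[b"forecast-data"],
)

# The employee's data is processed first, so the VP's copy is suppressed.
result = global_dedup([employee_copy, vp_copy])
for digest, entry in result.items():
    print(digest, entry["primary"], entry["all_custodians"])
```

In this sketch the VP’s copy is suppressed because the employee’s data was processed first, but the All Custodian list still shows that the VP held the item. Reversing the input order would make the VP the primary custodian, which is why the processing order is worth stipulating.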
How do your internal review and processing standards align with the guidelines proposed by EDRM and the suggestions above? Are there any standards that you expected to be present in the above that weren’t?
I look forward to diving into some of the proposed guideline topics in more depth. If you have questions or would like to discuss any of this further, please feel free to contact me at email@example.com.