By Debora Motyka Jones, Esq.

Published on Thu, December 17, 2020


Co-authored by Debora Motyka Jones and Tobin Dietrich

A recent conversation with a colleague on Lighthouse's Focus Discovery team resonated with me – we got to chatting about TAR protocols and the evolution of TAR, analytics, and AI. Only five years ago, people were skeptical of TAR technology, and most discussions revolved around understanding TAR and AI themselves. That has shifted to needing to understand how to evaluate the process – whether your own team's or the one behind opposing counsel's production. Although an understanding of TAR technology helps, it is not enough on its own to evaluate things like the parity of sample documents, the impact of training on another party's production data rather than your own, and the choice of seed documents. That discussion prompted me to grab one of our experts, Tobin Dietrich, to discuss the CliffsNotes version of how to evaluate a TAR protocol. It is not uncommon for lawyers to receive a technology assisted review methodology from producing counsel – especially in government matters, but in civil matters as well. In the vein of a typical law school course, this blog will teach you how to issue-spot if one of those methodologies comes across your desk. Once you've spotted the issues, bringing in the experts is the right next step.

TAR Protocols 101: Avoiding Common TAR Process Issues

Issue 1: Clear explanation of technology and process. If the party cannot name the TAR tool or algorithm they used, that is a red flag. Similarly, if they cannot clearly describe their analytics or AI process, that is a sign they do not understand what they did. Because the model was trained through that process, this lack of understanding is an indicator that the output may be flawed.

Issue 2: Document selection – how and why. In the early days of TAR, training documents were selected more or less randomly. We have since evolved to a place where people are deliberate about which documents they use for training. This is generally a positive thing, but it does require you to think about what may be over- or underrepresented in the opposing party's choice of documents. More specifically, this comes up in three ways:

  1. Number of documents used for training. A TAR system needs to learn what responsive and non-responsive documents look like, so it needs to see many examples of each category before its categorizations become reliable. Too small a sample – say, 100 or 200 documents – risks the TAR system categorizing incorrectly. Although a system can technically build a predictive model from a single document, that model will only effectively locate documents very similar to the starting document, and a typical document corpus is rarely uniform enough for that to suffice.
  2. Types of seed documents. It is important to use a variety of documents in training. The goal is for the inputs to represent the conceptual variety in the broader document corpus. Using another party's production documents, for example, can seriously mislead the system: the vocabulary, the people, and the concepts discussed are all different, which can lead to incorrect categorization. Production data can also add confusion through the presence of Bates numbers or confidentiality stamps. If the seed/training documents do not mirror the types of documents expected from the corpus, you should be suspicious.
  3. Parity of seed document samples. Although you do not need anything approaching perfect parity between responsive and non-responsive documents, training on, say, ten times as many non-responsive as responsive documents can distort the TAR model. That kind of disparity can also exacerbate either of the issues above – the number or the type of seed documents.
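A rough way to see why a lopsided training set matters: in a Bayes-style classifier – used here purely as a hypothetical stand-in for a TAR engine, with every number invented for illustration – the decision weighs the content evidence in a document against the class balance learned from the training set. A minimal sketch:

```python
def posterior_odds(likelihood_ratio, n_responsive, n_non_responsive):
    """Posterior odds that a document is responsive in a simple Bayes-style
    model: content evidence times the prior odds implied by the training
    set's class balance. (Hypothetical illustration, not any vendor's math.)"""
    prior_odds = n_responsive / n_non_responsive
    return likelihood_ratio * prior_odds

# A document whose text favors "responsive" by a factor of 7:
evidence = 7.0

# Balanced training set: posterior odds 7.0 -> classified responsive.
balanced = posterior_odds(evidence, 500, 500)

# Ten times more non-responsive seeds: posterior odds 0.7 -> pushed to
# non-responsive, even though the document's content has not changed.
skewed = posterior_odds(evidence, 100, 1000)
```

The point is not that TAR tools literally compute this ratio, but that a 10:1 skew in training examples effectively raises the evidentiary bar a document must clear before the model will call it responsive.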

Issue 3: How is performance measured? People throw around common TAR metrics like recall and precision without clarifying what they refer to. Recall is the fraction of truly responsive documents the process actually found; precision is the fraction of documents flagged as responsive that truly are. You should always be able to tell what population of documents these statistics were computed over. And don't skip precision: people often treat recall as sufficient, but precision provides important insight into the quality of the model's training as well.
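To make the two metrics concrete, here is a quick sketch using hypothetical review numbers (all counts invented for illustration):

```python
def recall(true_pos, false_neg):
    """Share of truly responsive documents the model found."""
    return true_pos / (true_pos + false_neg)

def precision(true_pos, false_pos):
    """Share of predicted-responsive documents that truly are responsive."""
    return true_pos / (true_pos + false_pos)

# Hypothetical matter: 10,000 truly responsive documents in the corpus.
# The model finds 8,500 of them but also flags 42,500 non-responsive ones.
tp, fn, fp = 8_500, 1_500, 42_500

print(recall(tp, fn))     # 0.85 -- looks healthy on its own
print(precision(tp, fp))  # ~0.167 -- only about 1 in 6 flagged docs is responsive
```

A production boasting 85% recall can still hide a poorly trained model; the low precision here would signal that the training examples did not teach the system to separate the two categories well.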

By starting with these three areas, you should be able to flag some of the more common issues in TAR processes and either avoid them or ask for them to be remedied. To discuss this topic further, please reach out.

About the Author
Debora Motyka Jones, Esq.

Senior Advisor, Market Engagement and Operations

Debora has been with Lighthouse since 2009 and has made a significant impact on the company's growth and business strategy during her tenure. With a background in litigation from practicing at law firms in both Washington, D.C. and Washington State, her expertise and deep understanding of complex ediscovery matters enabled her to create a resonating brand and architect the innovative products and services that keep Lighthouse at the forefront of the ediscovery market. She led the execution and implementation of the company's rebranding in 2012 and developed the marketing department from the ground up. In addition, she has been instrumental in spearheading the company's strategic technology partnerships, driving the formation of Lighthouse's product strategy, and guiding the evolution of Lighthouse's SmartSeries. She also instituted and continues to maintain a client advisory board to ensure strong alignment with market demands. Finally, in 2015, Debora led the company's expansion to the eastern seaboard by managing the development of the New York office and team, as well as expanding the company's existing set of services and clientele.

Prior to joining Lighthouse, Debora was a Complex Commercial Litigation Associate at Weil, Gotshal & Manges LLP in Washington, D.C. where she worked on matters such as the WorldCom and Enron bankruptcies. Her practice also included multi-million-dollar commercial and securities litigation, and internal investigations. While at Weil, Debora was recognized three times for her dedication to pro bono service. Debora also practiced as a litigation Associate at McNaul Ebel Nawrot & Helgren PLLC. Her practice included commercial, employment, and securities litigation, as well as legal malpractice defense.

Debora received a B.A. in Psychology from the University of Washington where she graduated magna cum laude. She received her law degree from The George Washington University Law School in Washington, D.C. She is admitted to practice law in New York State, the District of Columbia (inactive membership), and Washington State. Debora is Level II Pragmatic Marketing Certified. Debora is actively involved in the legal community as the former Director of Women in eDiscovery, as a mentor with Mother Attorneys Mentoring Association of Seattle, as an Advisory Board Member for the Organization of Legal Professionals, as the former Chair of the Association of Corporate Counsel (ACC)'s New to In-House Committee, and as a former board member of the Washington Women Lawyers (WWL). Debora was also recognized for her contribution to the ACC and was named 2012 WWL Board Member of the Year. Debora is a frequent speaker on eDiscovery strategy, a former instructor for the Organization of Legal Professionals, and a regular Lighthouse blog contributor.