By Rob Hellewell

Published on Thu, February 25, 2021


Big data can mean big problems in the ediscovery and compliance world, and those problems can be exponentially more complicated when personal data is involved. Sifting through terabytes of data to ensure that all personal information is identified and protected is becoming an increasingly painstaking and costly process for attorneys.


Fortunately, advances in artificial intelligence (AI) and analytics technology are changing the landscape and enabling more efficient and accurate detection of personal information within data. Recently, I was fortunate enough to gather a panel of experts to discuss how AI is enabling legal professionals in the ediscovery, information governance, and compliance arenas to identify personally identifiable information (PII) and protected health information (PHI) more quickly within large datasets. Below is a summary of our discussion, along with some helpful tips for leveraging AI to detect personal information.

Current Methods of Personal Data Identification

The legal profession has been slow to leverage technology to help identify and protect personal data, much as it has been slower to adopt AI and analytics for protecting attorney-client privileged information than to adopt machine learning for identifying matter-relevant documents. As a result, the identification of personal data remains a very manual and reactive process, in which legal professionals review documents one by one on each new matter or investigation to find personal information that must be protected from disclosure.

This process can be especially burdensome for the pharmaceutical and healthcare industries: the data generated by those organizations often contains much more personal information, and the risk of failing to protect that information may be higher due to healthcare-specific patient privacy regulations like HIPAA.

How Advances in AI Technology Can Improve Personal Data Identification

There are a few ways in which AI has advanced over the last few years that make new technology much more effective at identifying personal data:

  • Analyzing More Than Text: AI technology is now capable of analyzing more than just the simple text of a document. It can now also analyze patterns in metadata and other properties of documents, like participants, participant accounts, and domain names. This results in technology that is much more accurate and efficient at identifying data more likely to contain personal information.
  • Leveraging Past Work Product: Newer technology can now also pull in and analyze the coding applied on previous reviews without disrupting workflows in the current matter. This can add incredible efficiency, as documents previously flagged or redacted for personal information can be quickly removed from personal information identification workflows, thus reducing the need for human review. The technology can also help further reduce the amount of attorney review needed at the outset of each matter, as it can use many examples of past work product to train the algorithms (rather than training a model from scratch based on review work in the current matter).
  • Taking Context into Account: Newer technology can now also perform a more complicated analysis of text through algorithms that can better assess the context of a document. For example, advances in Natural Language Processing (NLP) and machine learning can now identify the context in which personal data is often communicated, which helps eliminate previously common false hits, such as phone numbers mistakenly flagged as Social Security numbers.
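To make the idea of context concrete, here is a minimal sketch in Python. It is not the NLP or machine learning approach described above, just a simple rule-based stand-in that shows why the words surrounding a number matter. The function name, the cue-word lists, and the 25-character window are all assumptions chosen for illustration, not part of any particular product.

```python
import re

# Nine digits in SSN-like grouping (separators optional), e.g. 123-45-6789.
# A pattern this loose also matches some nine-digit phone numbers.
SSN_CANDIDATE = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")

# Hypothetical cue words; a real system would learn context from examples.
SSN_CUES = {"ssn", "social", "security", "taxpayer"}
PHONE_CUES = {"phone", "call", "tel", "mobile", "fax", "cell"}


def classify_candidates(text, window=25):
    """Label each nine-digit candidate using the words just before it."""
    results = []
    lowered = text.lower()
    for match in SSN_CANDIDATE.finditer(text):
        # Look only at the text immediately preceding the number.
        preceding = lowered[max(0, match.start() - window):match.start()]
        words = set(re.findall(r"[a-z]+", preceding))
        ssn_score = len(words & SSN_CUES)
        phone_score = len(words & PHONE_CUES)
        if ssn_score > phone_score:
            label = "likely_ssn"
        elif phone_score > ssn_score:
            label = "likely_phone"
        else:
            # No clear signal either way: leave it for a human reviewer.
            label = "needs_review"
        results.append((match.group(), label))
    return results


for sample in (
    "Patient SSN: 123-45-6789",
    "Call me on 612 34 5678 after lunch",
    "Reference 123456789",
):
    print(classify_candidates(sample))
```

A pattern match alone would flag all three numbers. Even this crude context check separates the likely phone number from the likely Social Security number and routes the ambiguous one to human review.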

Benefits of Leveraging AI and Analytics when Detecting Sensitive Data

Arguably the biggest benefit to leveraging new AI and analytics technology to detect personal information is cost savings. The manual process of personal information identification is not only slower, but it can also be incredibly expensive. AI can significantly reduce the number of documents legal professionals would need to look through, sometimes by millions of documents. This can translate into millions of dollars in review savings because this work is often performed by legal professionals who are billed at an hourly rate.
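The arithmetic behind that claim can be sketched in a few lines of Python. Every figure below is a hypothetical input chosen for illustration (not data from the panel or from any real matter), and the function name and parameters are likewise assumptions; substitute your own review rates and billing rates.

```python
def estimated_review_savings(docs_removed, docs_per_hour, hourly_rate):
    """Rough savings from documents that no longer need human review."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    hours_saved = docs_removed / docs_per_hour
    return hours_saved * hourly_rate


# Hypothetical inputs: 2,000,000 documents removed from review,
# reviewed at 50 documents per hour and billed at $60 per hour.
savings = estimated_review_savings(2_000_000, 50, 60)
print(f"${savings:,.0f}")  # $2,400,000
```

Under these assumed inputs, removing two million documents from review saves 40,000 reviewer hours, which is how document reductions in the millions can translate into savings in the millions of dollars.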

Not only can AI utilization save money on a specific matter, but it can also be used to analyze an entire legal portfolio so that legal professionals have an accurate sense of where (and how much) personal information resides within an organization’s data. This knowledge can be invaluable when crafting burden arguments for upcoming matters, as well as to better understand the potential costs for new matters (and thus help attorneys make more strategic case decisions).

Another key benefit of leveraging AI technology is the accuracy with which this technology can now pinpoint personal data. Not only is human review much less efficient, but it can also lead to mistakes and missed information. This increases the risk for healthcare and pharmaceutical organizations in particular, which may face severe penalties for inadvertently producing PHI or PII (particularly if that information ends up in the hands of malevolent actors). Conducting quality control (QC) with the assistance of AI can greatly increase the accuracy of human review and help ensure that organizations are not inadvertently producing individuals’ personal information.

Best Practices for Utilizing AI and Analytics to Identify Personal Data

  • Prepare in Advance: AI technology should not be an afterthought. Before you are faced with a massive document production on a tight deadline, make sure you understand how AI and analytics tools work and how they can be leveraged for personal data identification. Have technology providers perform proof of concept (POC) analyses with the tools on your data and demonstrate exactly how the tools work. Performing POCs on your data is critical, as every provider’s technology demos well on generic data sets. Once you have settled on the tools you want to use within your organization, ensure your team is trained well and is ready to hit the ground running. This will also help ensure that the technology you choose fits with your internal systems and platforms.
  • Take a Global Team Approach: Prior to leveraging AI and analytics, spend some time working with the right people to define what PII and PHI you have an obligation to identify, redact, or anonymize. Not all personal information will need to be located or redacted on every matter or in every jurisdiction, but defining that scope early will help you leverage the technology for the best use cases.
  • Practice Information Governance: Make sure your organization is maintaining proper control of networks, keeping asset lists up to date, and tracking who the business and technical leads are for each type of asset. Also, make sure that document retention policies are enforced and that your organization is maintaining controls around unstructured data. In short, becoming a captain of your content and running a tight ship will make the entire process of identifying personal information much more efficient.
  • Think Outside the Box: AI and analytics tools are incredibly versatile and can be useful in a myriad of different scenarios that require protecting personal information from disclosure. From data breach remediation to compliance matters, there is no shortage of circumstances that could benefit from the efficiency and accuracy that AI can provide. When analyzing a new AI tool, bring security, IT, and legal groups to the table so they can see the benefits and possibilities for their own teams. Also, investigate your legal spend and have other teams do the same. This will give you a sense of how much money you are currently spending on identifying personal information and what areas can benefit from AI efficiency the most.

If you’re interested in learning more about how to leverage AI and analytic technology within your organization or law firm, please see my previous articles on how to build a business case for AI and win over AI naysayers within your organization.

To discuss this topic more or to learn how we can help you make an apples-to-apples comparison, feel free to reach out to me at

About the Author
Rob Hellewell

Vice President

At Lighthouse, Rob counsels the world’s top corporations and law firms in ediscovery and leveraging analytics, data science, and technology to extract critical insights from data. Rob’s expertise includes applying analytics and developing data-driven solutions to reduce risk in compliance and legal matters.

Rob’s education and professional experience combine data analytics and legal expertise. He received his M.S. in Business Analytics from New York University, where his research focused on applying text mining, metadata, and sentiment analysis to detect legal risk in unstructured data sources. He received his J.D. from Brigham Young University’s J. Reuben Clark Law School. Rob previously practiced in the Antitrust and Trade Regulation group of Skadden, Arps, Slate, Meagher & Flom, where he counseled clients in connection with complex litigation and regulatory investigations from the Federal Trade Commission, Department of Justice, Securities and Exchange Commission, and other state and federal agencies.