Document Review at a Crossroads: A Guide to Navigating the Current Options 

By Beau Holt, Esq. and Debora Motyka Jones, Esq.

 

The Electronically Stored Information (“ESI”) world has reached what will eventually be seen as an inflection point.  If you work in the e-discovery field, you have likely heard or read about Technology Assisted Review (also known as Predictive Coding, Computer Assisted Review, or Human Guided Machine Learning, collectively referred to here as “TAR”) with increasing frequency.  Document review models are beginning to shift away from traditional Linear Human Review (“LHR”) toward TAR.  This does not mean that traditional LHR is ready for the museums or that TAR is a complete replacement for all LHR.  In this article, we explore the strengths and flaws of these two distinct methods and provide a roadmap for leveraging the strengths of both.

TAR and LHR Defined

Since the inception of ESI and electronic document review, the primary methodology for identifying relevant material[1] has been application of search terms followed by one or more phases of human review of the search results.  The review phases often include:    
•initial review for relevance and identification of potentially privileged documents (commonly known as “first pass” review);
•second pass or quality control (“QC”) review to confirm the responsiveness decisions, further refine the potentially privileged document set, and apply or confirm issue code tags; and
•final privilege review involving a more extensive review to prepare privilege summaries for a privilege log.

This approach is often referred to as linear review because documents follow a linear path: search term application, initial review, QC or second pass review, and privilege review.  In the last few years, the arrival of TAR has promised a new paradigm for document review.  The armies of document reviewers are being replaced by one (and sometimes two) “expert” reviewers and a computer program.  These “expert” reviewers review a small fraction of the total document population, allowing the TAR program to learn their decision-making logic.  Once the program demonstrates an acceptable level of understanding, the TAR application applies that logic to the remaining document population.  In effect, the expert review and the TAR application combine to conduct the equivalent of a first pass linear review.
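
To make the learning step concrete, the sketch below shows the general machine-learning pattern behind TAR: a classifier is trained on the expert reviewer's coding decisions and then scores the unreviewed population.  This is purely illustrative and assumes nothing about any particular TAR platform; the scikit-learn calls, the sample documents, and the 0.5 score cutoff are our own assumptions for the example.

```python
# Purely illustrative: a generic text classifier standing in for the learning
# step inside a TAR platform.  Real TAR products use their own (often
# proprietary) models and workflows; nothing here is vendor specific.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical inputs: the small expert-reviewed sample and its relevance calls.
expert_sample_texts = ["re: pricing terms for the 2011 supply agreement ...",
                       "lunch menu for the holiday party ..."]
expert_sample_labels = [1, 0]  # 1 = relevant, 0 = not relevant
unreviewed_texts = ["draft amendment to the supply agreement ...",
                    "parking garage closure notice ..."]

# Learn the expert reviewer's decision logic from the reviewed sample.
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(expert_sample_texts)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, expert_sample_labels)

# Apply that logic to the remaining population, keeping a relevance score so
# that borderline documents can be routed to human QC review.
X_rest = vectorizer.transform(unreviewed_texts)
relevance_scores = model.predict_proba(X_rest)[:, 1]
predicted_relevant = relevance_scores >= 0.5  # assumed cutoff for the example
```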

The TAR model has begun to challenge the older LHR model in three areas:
•costs;
•added awareness of the over-inclusive and under-inclusive issues associated with search terms; and
•variation in accuracy of reviews associated with large document review teams.

TAR can reduce the cost of document review by replacing voluminous first pass LHR with a streamlined review of a small data set. TAR can achieve this while simultaneously meeting or exceeding the accuracy of LHR first pass results.

The cost argument alone is extremely compelling.  Costs for the discovery phase of litigation have continued to climb, due almost exclusively to the explosion of ESI volume.  Large cases now often involve collection of millions of documents.  These documents must be collected and processed (e.g., loaded into a processing tool, de-NIST’ed, deduped, and limited by date ranges) prior to application of search terms.  Generally speaking, the larger the volume of incoming data, the larger the volume captured by search terms for subsequent review.  It is no surprise that there is a direct positive correlation between the cost and the size of the document review, at least with LHR models.  We have personally been involved in litigation with multi-million dollar LHR costs due to the volume of documents. Corporations, small businesses, and the insurance companies that provide litigation coverage are all pushing back against the spiraling costs with greater urgency, and they are actively seeking new models. The numbers below compare traditional LHR for a medium-size case where search terms captured 250,000 documents for review against a TAR model for the same set.

LHR Model
•First pass review on all 250,000 documents captured by search terms
•2,500 hours of collective review time, assuming a fast rate of 100 documents per hour
•Cost: $125,000 - $312,500, assuming typical reviewer rates of $50 to $125 per hour.

TAR Model
•Expert TAR review of approximately 7,000-10,000 documents
◦Although every case is unique, common TAR platforms would be able to achieve statistically sound and stable results after human review of approximately 7,000-10,000 documents.
•Approximately 170 hours of TAR review time by an “expert” reviewer, assuming a review rate of 60 documents per hour.
◦The expert reviewer is typically a member of the litigation team or someone who grasps the relevancy concepts and parameters of the case.
•Cost: approximately $67,000, assuming an expert reviewer rate of $400 per hour.

The TAR review model represents a savings of roughly $58,000 to $246,000. The savings are even more impressive for cases involving larger document volumes.
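
For readers who want to adapt these numbers to their own matters, the short sketch below reproduces the arithmetic behind the comparison.  The volumes and hourly rates are the illustrative figures used above, not benchmarks; swap in your own assumptions.

```python
# Back-of-the-envelope reproduction of the comparison above; the volumes and
# hourly rates are the illustrative figures from the text, not benchmarks.
docs_for_review = 250_000

# LHR: every document captured by search terms receives a first pass human review.
lhr_rate = 100                          # documents per reviewer-hour
lhr_hours = docs_for_review / lhr_rate  # 2,500 hours
lhr_cost_low = lhr_hours * 50           # $125,000 at $50/hour
lhr_cost_high = lhr_hours * 125         # $312,500 at $125/hour

# TAR: one expert reviews a small training sample; the tool codes the rest.
tar_sample = 10_000                     # upper end of the 7,000-10,000 range
tar_rate = 60                           # documents per hour
tar_hours = tar_sample / tar_rate       # ~167 hours (rounded to ~170 in the text)
tar_cost = tar_hours * 400              # ~$67,000 at $400/hour

print(f"LHR: {lhr_hours:,.0f} hours, ${lhr_cost_low:,.0f}-${lhr_cost_high:,.0f}")
print(f"TAR: {tar_hours:,.0f} hours, ~${tar_cost:,.0f}")
print(f"Savings: ${lhr_cost_low - tar_cost:,.0f} to ${lhr_cost_high - tar_cost:,.0f}")
```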

Firmly established and frequently used statistical sampling protocols determine the accuracy of the TAR results.  The sample size required for statistically sound results grows only fractionally as the overall document population increases.  Thus, the costs of utilizing TAR do not grow proportionally with the overall document volume, whereas the costs of LHR do.  Given the dramatic cost savings, every legal department should be educating itself on how to leverage TAR.

In addition to the costs associated with linear review, the reliability of using search terms to determine the corpus of documents for linear review is also being challenged.  Studies and judicial commentary have increasingly called into question the use of search terms and subsequent initial review by scores of review personnel.[2]  Search terms are known to capture irrelevant documents due, in large part, to the flexibility and variability of human language.  For example, if you look up “table” in the dictionary, you find over half a dozen different definitions and common uses.[3]  Search terms are also under-inclusive, failing to capture all relevant documents.  Failure to capture, identify, and produce relevant documents, whether harmful or helpful to your litigation strategy, can have consequences for the overall success of your case.  The uncertain efficacy of search terms also leads to a number of complex and expensive side-processes in the LHR model, including QC procedures and sampling of documents not captured by search terms.   TAR eliminates the need to use search terms to cull the document set for review to a manageable size.[4]  Because of its statistical sampling methods, TAR can be employed across huge document populations whose review was previously considered unfeasible.  In the LHR model, you typically have to rely on search terms to reduce your data set.  TAR enables you to set aside this concern because a million-document set in TAR does not require four times the human effort of the 250,000-document set discussed above.  Instead, the increase in the volume reviewed by the “expert” reviewer would likely be in the range of 3,000 to 10,000 documents.
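
The intuition behind this scaling comes from standard sampling statistics: for a fixed confidence level and margin of error, the required validation sample is roughly constant no matter how large the collection grows.  The sketch below is a minimal illustration using a textbook sample-size formula with a finite population correction; the 95% confidence level and ±2% margin of error are our own illustrative parameters, and these figures are separate from (and smaller than) the 7,000-10,000 document expert review described above, which also includes training the TAR tool.

```python
import math

def required_sample_size(population: int, z: float = 1.96,
                         margin_of_error: float = 0.02, p: float = 0.5) -> int:
    """Standard sample-size formula with a finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)  # infinite-population sample size
    n = n0 / (1 + (n0 - 1) / population)                  # finite population correction
    return math.ceil(n)

# The required sample barely moves even as the collection quadruples in size.
for population in (250_000, 1_000_000):
    print(f"{population:>9,} documents -> sample of {required_sample_size(population):,}")
# 250,000 documents -> a sample of about 2,379; 1,000,000 documents -> about 2,396
```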

TAR has also begun delivering on the promise of decreasing the inconsistent relevancy decisions that often plague LHR models.  Inconsistency in decision making is an issue known to virtually anyone who has managed LHR.  Although we draw largely on personal experience in discussing inconsistency in document review decision making in the LHR model, there is a growing body of evidence supporting this observation.[5]  In the TAR model, a single expert reviewer conducts a review of a small percentage of the documents.  This single expert reviewer should have strong, direct familiarity with the issues, facts, and themes of the case.  This, in turn, will typically result in the reviewer making more consistent decisions than a large group of reviewers.  The increase in consistency is attributable to two primary factors.  First, human document review involves human judgment and subjective decision-making.  A single person is likely to make more consistent decisions than would a small group of people working independently, and the larger the review group, the greater the inconsistency.  Second, document review teams are rarely as familiar with the issues and facts of the case as the core litigation team.  Document review teams are often introduced to the issues and facts of the case on the same day that they start their review, well after commencement of the litigation. The core litigation team, on the other hand, has been uncovering the facts of the case since the moment it received notice of the case, which can be months before a document review begins.  These two factors are key ingredients in the accuracy improvements we see in TAR models.

Overall, TAR is an excellent alternative to LHR for first pass review.  As discussed above, TAR is successfully addressing the cost, search term efficacy, and review accuracy weaknesses of LHR models.  However, TAR is not yet a singular solution ready to replace all phases of LHR.

LHR: When and Why? 

Up to this point, you may be thinking that we really dislike linear review.  Quite the contrary: we actually believe LHR has a valuable place in eDiscovery for at least another decade, if not two.  There are phases of review, certain data sizes, and specific data types particularly well suited to human linear review.  It can be a very powerful tool when utilized in a thoughtful and deliberate manner.  As with any complex task, efficient and effective LHR requires proper preparatory work, clear decision parameters, quality personnel, and sound management and oversight.

LHR is still a good solution for QC reviews to verify TAR, finalization of privilege decisions and creation of privilege logs, numerous issue code reviews, and reviews where summaries or other human language input is required or helpful.  Until judicial and industry acceptance of TAR increases to near-universal levels, LHR will likely be utilized as a QC mechanism to verify or confirm that the machine-coding decisions made during TAR are reliable.  QC of TAR results is not firmly established as a requirement across the legal review industry or the judicial landscape.  However, the recent da Silva Moore ruling shows that judges will expect a certain level of QC review against the documents TAR identified as not relevant.[6]

Judicial and industry acceptance of TAR’s ability to identify non-relevant documents is increasing.  However, it will be some time before corporate clients and litigators feel confident enough in TAR results to forego human review of the documents identified as relevant.  Currently, human review of at least a sample of TAR results is used to confirm responsiveness decisions made by the computer.  LHR is also frequently used subsequent to TAR to assign issue codes, indicate confidentiality designations, identify or verify privilege content, apply redactions of privilege content,[7] and create a privilege log.  Traditionally, in LHR, these tasks are completed in what many refer to as second pass review.  LHR is well suited for these tasks.  Existing TAR platforms do not yet excel at the individual actions that are part of these tasks.

LHR is still a superior option in smaller matters where the additional cost of TAR outweighs the cost of LHR.  Matters without a lot of ESI still exist in many areas, including employment and small business disputes.  In keeping with the goals of proportionate discovery, a more expensive TAR model makes little sense in these cases. Even in matters with a large eDiscovery footprint, LHR is a very useful and necessary approach for certain subsets of data types.  Specifically, LHR is a superior solution for non-text based documents, including media file types (e.g., .jpg images, video files, and audio files), low/no-text files (e.g., .dwg or other AutoCAD-related file types), and structured data (e.g., numerical database files).

The changes and solutions offered by TAR and some of the advanced applications discussed above are no longer “coming”; they are here. Five to ten years ago, LHR was the dominant (and really the only) option.  Currently, LHR is still in greater use than TAR.  Five years from now, the roles will probably have switched, with TAR being in greater use than LHR as a first pass review option.  Ten to twenty years from now, we suspect that large groups of review attorneys sitting in a single room reviewing documents for months on end will seem as foreign and antiquated as large-scale hardcopy reviews seem to the most recent law school graduates.  Those of us who learn how to embrace these changes and leverage the strengths of TAR and LHR models to offset the existing weaknesses of each might very well be the leaders of what we believe is a nascent technology revolution in the legal services industry.

[1] In the context of this article, relevant material refers to documents relevant to the lawsuit and marked as such by attorneys.

[2] See, e.g., da Silva Moore v. Publicis Groupe & MSL Group, No. 11 Civ. 1279 ALC AJP, slip op. at 20 (S.D.N.Y. Feb. 24, 2012); Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251, 260, 262 (D. Md. 2008) (Grimm, M.J.); David C. Blair & M. E. Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System, 28 Comm. ACM 289 (1985).

[3] www.merriam-webster.com/dictionary/table 

[4] As a hybrid alternative, some litigation teams still use search terms prior to TAR.  In this model, TAR will exclude the irrelevant documents captured by overbroad search terms.  Then, the team can either sample the non-hits or apply TAR to the non-hits to identify documents missed by under-inclusive search terms.

[5] See, e.g., Maura Grossman & Gordon Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient than Exhaustive Manual Review, Rich. J.L. & Tech., Spring 2011, at 48, available at http://jolt.richmond.edu/v17i3/article11.pdf; Herbert L. Roitblat, Anne Kershaw & Patrick Oot, Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, 61 J. Am. Soc'y for Info. Sci. & Tech. 70, 79 (2010).

[6] da Silva Moore, at 9-12.  It should be noted that the QC standard is not perfection; as Judge Peck recently noted, “the Federal Rules do not require perfection.” Id. at 21-22 (citing Pension Comm. of Univ. of Montreal Pension Plan v. Banc of Am. Sec., 685 F. Supp. 2d 456, 461 (S.D.N.Y. 2010)).  “The Court reminded the parties that computer-assisted review ‘works better than most of the alternatives, if not all of the [present] alternatives.  So the idea is not to make this perfect, it's not going to be perfect.  The idea is to make it significantly better than the alternatives without nearly as much cost.’” da Silva Moore, at 11 (citation omitted).

[7] For example, drawing redactions to remove privileged or otherwise protected content from an otherwise relevant, non-privileged document.

About the Authors: 

 

Beau Holt, Esq.: Beau manages Lighthouse eDiscovery’s entire project management staff, both technical project managers and Lighthouse’s Hosted Solutions group, ensuring that Lighthouse remains at the forefront of best practices and workflows for eDiscovery.  As an attorney with over 12 years of eDiscovery project management experience, Beau has a deep knowledge of all aspects of complex litigation and has been a frequent guest speaker at industry events.   Before joining Lighthouse, Beau was an eDiscovery attorney for K&L Gates LLP.  Beau graduated with his B.A. in Psychology, magna cum laude, from Auburn University and received his J.D. from the University of Washington.  He is licensed to practice law in Washington State.

 

Debora Motyka Jones, Esq.: As Director of Product Strategy for Lighthouse, Debora is responsible for designing innovative products and services for Lighthouse and its portfolio of Fortune 500 companies and Am Law 100 law firms.  Her background in litigation - practicing law in both Washington, D.C. and Washington State - supports her expertise and deep understanding of complex eDiscovery matters and the tools necessary to create added value for Lighthouse clients in the fast-paced eDiscovery space.  Additionally, Debora drives thought leadership initiatives in collaboration with other key client-facing functions focused on improved results through client consultation. Debora is a frequent speaker on electronic discovery strategy, focusing in particular on effective management of the cost of discovery.
