How to Get More Miles Per Gallon Out of Your Next Document Review

How many miles per gallon can I get using Insight Predict, Catalyst’s technology assisted review platform, which is based on continuous active learning (CAL)? And how does that fuel efficiency rating compare to what I might get driving a keyword search model?

While our clients don’t always use these automotive terms, this is a key question we are often asked. How does CAL review efficiency [1] compare to the review efficiency I have gotten using keyword search? Put another way, how many non-relevant documents will I have to look at to complete my review using CAL versus the number of false hits that will likely come back from keyword searches?

This is an important question. The number of documents you have to review to get to your recall target directly correlates to the time and cost of doing that review. If you have to look at more non-relevant documents using either a CAL or keyword process, then perhaps you need to change your ride. If the miles per gallon are significantly lower with keyword search, then you are paying too much on your commute. Time to trade in for a new model.

How Efficient are Keywords?

This question might be tougher to answer than you think. While keyword searches can be effective in finding some relevant documents, the process rarely achieves a sufficient level of recall for a true comparison with a CAL process.

Anecdotally, we have asked a number of e-discovery professionals how efficient their keyword-based reviews typically are. While we aren’t claiming this was a statistical survey, the typical answer we received was ten to one. That means that the typical reviewer has to look at ten documents for every relevant one found. That is a pretty low efficiency rate in our view. If these estimates are anywhere near correct, they suggest that a CAL process like Insight Predict will be about 5 times more efficient than keyword review. Put another way, if you are basing your review on keyword search, you are reviewing too many documents.

To be sure, keyword search still has a place in e-discovery: to find relevant documents quickly, or to search for documents that you know contain specific, rarely repeated terms. But when it comes to a general review, here’s what the data shows about keyword efficiency, and why Predict is a superior method for document review.

Apples-to-Apples: The Recall-Precision Tradeoff in Keyword Searching

There is not a lot of publicly available data on the true effectiveness of keyword search, but we do know one thing for certain. There will always be a tradeoff between recall and precision (or, in this case, efficiency). As a general matter, you can only increase recall levels by sacrificing precision. With that in mind, we can take a look at the keyword search data and understand, at least qualitatively, how to compare keyword search to TAR, given that TAR typically attains much higher recall levels than keyword search.

Blair-Maron: Even though it’s over 30 years old, the Blair-Maron study remains the most comprehensive analysis of keyword effectiveness in the legal realm. Catalyst founder John Tredennick recently wrote an article examining the study in depth. He discussed the implications of the scientists’ findings that while attorneys were able to use keyword search to find a lot of relevant documents (indeed, sometimes achieving 80% precision on individual topics), they were only able to find about 20% of the total relevant ones in the larger population. The logical conclusion is that, had the attorneys tried to develop additional keywords to increase total recall from 20% to, say, 75% or 80%, they would have had to review a large number of non-relevant documents in the bargain.

Biomet: The Biomet decision, Biomet M2a Magnum Hip Implants Prods. Liab. Litigation (N.D. Ind. April 18, 2013), gives us one other publicly available data point that we might use to further evaluate the recall-precision tradeoff for keyword search. In Biomet, the producing party used keywords to reduce the review population from 19.5 million to 2.5 million documents.

The statistics in the Biomet matter are not entirely consistent, but do provide a reasonable basis for assessment. Using one set of statistics, we can estimate the recall of the keyword search to be roughly 60%, which means the parties only found 60% of the relevant documents in the entire population.

What about the precision or efficiency of their review? Based on information disclosed in the opinion, we calculate that the precision of their keyword search efforts was roughly 16% (an efficiency of 6.25:1). Using a second set of statistics from the case, the precision would only be about 9% (11:1 efficiency). Neither figure seems to account for the fact that the team likely reviewed many non-responsive family members, so we need to treat the Biomet precision values conservatively. Their actual efficiency may well have been worse than 11 to 1.
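The conversion between precision and an efficiency ratio used above is just a reciprocal. As a minimal sketch (the function names are ours, chosen for illustration, and are not part of Predict or any other product), it might look like this:

```python
def efficiency_from_precision(precision: float) -> float:
    """Return N in an N:1 efficiency ratio: documents reviewed per relevant one found."""
    if not 0 < precision <= 1:
        raise ValueError("precision must be greater than 0 and at most 1")
    return 1.0 / precision


def precision_from_efficiency(docs_per_relevant: float) -> float:
    """Return precision for an N:1 efficiency ratio (N documents per relevant one)."""
    if docs_per_relevant < 1:
        raise ValueError("an N:1 efficiency ratio cannot have N below 1")
    return 1.0 / docs_per_relevant


# The two Biomet precision estimates discussed above.
for label, precision in [("first set of statistics", 0.16),
                         ("second set of statistics", 0.09)]:
    ratio = efficiency_from_precision(precision)
    print(f"Biomet, {label}: {precision:.0%} precision is about {ratio:.2f}:1")
```

Run as written, this reproduces the 6.25:1 figure for 16% precision and roughly 11:1 for 9% precision.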

Since neither the Blair-Maron study nor the Biomet decision reflects the higher levels of recall typically seen with Predict (for example, we typically set our recall target at 80%), we need to consider the recall-precision tradeoff. Obviously, at 80% recall, the precision of keyword searching would be nowhere near the 80% level seen with the Blair-Maron study. Instead, keyword search precision at 80% recall would more likely be nearer (and probably less than) the 16% precision calculated in Biomet. From these figures, we can estimate the precision of keyword search at 80% recall to be roughly 10%, which equates to a 10:1 efficiency.

Predict: Achieving High Efficiency and High Recall

In a recent post, we examined efficiency rates across several simulated Predict projects. We found that, on average, a Predict review has the potential to reach a 1.75:1 efficiency ratio at 75% recall. This means, in the perfect world of a simulation, you would have to review 1.75 documents to find each relevant document during the course of review.

We then took a look at how those same cases played out in the real world by looking at the statistics for the actual review. On average, the efficiency of the actual review was 2.66:1. But the average recall in those cases was in the neighborhood of 90%, which is generally higher than necessary in litigation, and certainly higher than you would see with keyword search.

Looking at the statistics at the point at which those same cases achieved 80% recall (a more realistic litigation target), the efficiency improves to about 2:1. This means that you will look at two documents in a typical Predict review to find each responsive document. So if you need to find about 10% of a collection to reach your estimated recall goal, you’ll wind up looking at about 20% of the collection during the course of the review.
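To make that arithmetic concrete, here is a small sketch of the calculation. The function name and the 500,000-document collection are hypothetical, chosen only for illustration; the 10% share and the 2:1 ratio come from the paragraph above.

```python
def fraction_of_collection_reviewed(fraction_relevant_to_find: float,
                                    docs_per_relevant: float) -> float:
    """Estimate the share of a collection reviewed to reach a recall goal.

    fraction_relevant_to_find: share of the collection that must be found
        as relevant to hit the recall target (0.10 means 10%).
    docs_per_relevant: the N in an N:1 efficiency ratio.
    """
    if not 0 <= fraction_relevant_to_find <= 1:
        raise ValueError("fraction_relevant_to_find must be between 0 and 1")
    if docs_per_relevant < 1:
        raise ValueError("an N:1 efficiency ratio cannot have N below 1")
    # You can never review more than the whole collection.
    return min(1.0, fraction_relevant_to_find * docs_per_relevant)


collection_size = 500_000  # hypothetical collection, for illustration only
share = fraction_of_collection_reviewed(0.10, 2.0)
print(f"Review about {share:.0%} of the collection, "
      f"or {round(share * collection_size):,} of {collection_size:,} documents")
```

The cap at 100% matters for the less efficient case: at 10:1, finding that same 10% of the collection would mean reviewing essentially all of it.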

Your Mileage May Vary (slightly)

So, why is Predict slightly less efficient in the real world, on average, as compared to the simulation environment? Let me offer another analogy to an automobile. Every new car carries a sticker that reads something like “32 mpg highway/24 mpg city”. Are these real numbers? Of course they are, but they are actually the result of tests performed in the laboratory, where the conditions are perfect.

In real life, your actual gas mileage will likely be less, say 30 mpg on the highway and 21 in the city. You may be driving into a headwind; you may be in stop-and-go traffic; or your tires may not be perfectly inflated. All those real world factors cause your gas mileage, your “efficiency”, to go down.

The same thing happens with Predict in the real world. The requirements of discovery and real-world review workflows lead to some inescapable inefficiencies along the way. For example, reviewing family members for privilege, periodic sampling and the like all decrease the average efficiency of Predict projects. That’s why we estimate Predict efficiency at 80% recall to be roughly 2:1, rather than the 1.75:1 seen in the simulations.

Maximizing Fuel Efficiency

Which option would you choose to maximize the fuel efficiency of your next document review? While mileage may vary in driving toward 80% recall, keyword search averages out at an efficiency of about 10:1, while Predict averages out at roughly 2:1. So, unless you know for sure that you will be coasting downhill for the entire review, choosing Predict will likely make your review about 5 times more efficient.
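For readers who want to plug in their own numbers, the comparison can be sketched as follows. The class name and the count of 100,000 relevant documents are hypothetical; only the 10:1 and 2:1 ratios come from the estimates above, and your own matter may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewEstimate:
    method: str
    docs_per_relevant: float  # the N in an N:1 efficiency ratio

    def total_reviewed(self, relevant_to_find: int) -> float:
        """Documents reviewed in total to find the given number of relevant ones."""
        return relevant_to_find * self.docs_per_relevant

    def non_relevant_reviewed(self, relevant_to_find: int) -> float:
        """Non-relevant documents reviewed along the way."""
        return relevant_to_find * (self.docs_per_relevant - 1)


keyword = ReviewEstimate("Keyword search", 10.0)  # estimate from this article
predict = ReviewEstimate("Predict (CAL)", 2.0)    # estimate from this article
relevant_to_find = 100_000  # hypothetical number of relevant documents needed

for estimate in (keyword, predict):
    print(f"{estimate.method}: {estimate.total_reviewed(relevant_to_find):,.0f} reviewed, "
          f"{estimate.non_relevant_reviewed(relevant_to_find):,.0f} of them non-relevant")

ratio = keyword.total_reviewed(relevant_to_find) / predict.total_reviewed(relevant_to_find)
print(f"Keyword review volume is {ratio:.0f} times the Predict volume")
```

Note that the gap in wasted effort is wider than the headline figure: at 10:1 you read nine non-relevant documents for each relevant one, while at 2:1 you read only one.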

1. What we mean by “efficiency” is how many documents you need to review to find one relevant one. This is expressed as a ratio, such as 10:1 for keyword search versus 2:1 for a CAL-based process. And this is really just another way to look at review precision, since an efficiency of 10:1 equates to 10% precision, and an efficiency of 2:1 equates to 50% precision.

About Andrew Bye

Andrew is the director of machine learning and analytics at Catalyst, and a search and information retrieval expert. Throughout his career, Andrew has developed search practices for e-discovery, and has worked closely with clients to implement effective workflows from data delivery through statistical validation. Before joining Catalyst, Andrew was a data scientist at Recommind. He has also worked as an independent data consultant, advising legal professionals on workflow and search needs. Andrew has a bachelor’s degree in linguistics from the University of California, Berkeley and a master’s in linguistics from the University of California, Los Angeles.


About John Tredennick

A nationally known trial lawyer and longtime litigation partner at Holland & Hart, John founded Catalyst in 2000. Over the past four decades he has written or edited eight books and countless articles on legal technology topics, including two American Bar Association best sellers on using computers in litigation technology, a book (supplemented annually) on deposition techniques and several other widely-read books on legal analytics and technology. He served as Chair of the ABA’s Law Practice Section and edited its flagship magazine for six years. John’s legal and technology acumen has earned him numerous awards including being named by the American Lawyer as one of the top six “E-Discovery Trailblazers,” named to the FastCase 50 as a legal visionary and named one of the “Top 100 Global Technology Leaders” by London Citytech magazine. He has also been named the Ernst & Young Entrepreneur of the Year for Technology in the Rocky Mountain Region, and Top Technology Entrepreneur by the Colorado Software and Internet Association. John regularly speaks on legal technology to audiences across the globe. In his spare time, you will find him competing on the national equestrian show jumping circuit or playing drums and singing in a classic rock jam band.


About Thomas Gricks

Managing Director, Professional Services, Catalyst. A prominent e-discovery lawyer and one of the nation's leading authorities on the use of TAR in litigation, Tom advises corporations and law firms on best practices for applying Catalyst's TAR technology, Insight Predict, to reduce the time and cost of discovery. He has more than 25 years’ experience as a trial lawyer and in-house counsel, most recently with the law firm Schnader Harrison Segal & Lewis, where he was a partner and chair of the e-Discovery Practice Group.


About Jeremy Pickens

Jeremy Pickens is one of the world’s leading information retrieval scientists and a pioneer in the field of collaborative exploratory search, a form of information seeking in which a group of people who share a common information need actively collaborate to achieve it. Dr. Pickens has seven patents and patents pending in the field of search and information retrieval. As Chief Scientist at Catalyst, Dr. Pickens has spearheaded the development of Insight Predict. His ongoing research and development focuses on methods for continuous learning, and the variety of real world technology assisted review workflows that are only possible with this approach. Dr. Pickens earned his doctoral degree at the University of Massachusetts, Amherst, Center for Intelligent Information Retrieval. He conducted his post-doctoral work at King’s College, London. Before joining Catalyst, he spent five years as a research scientist at FX Palo Alto Lab, Inc. In addition to his Catalyst responsibilities, he continues to organize research workshops and speak at scientific conferences around the world.