An Open Look at Keyword Search vs. Predictive Analytics

Can keyword search be as or more effective than technology assisted review at finding relevant documents?

A client recently asked me this question and it is one I frequently hear from lawyers. The issue underlying the question is whether a TAR platform such as our Insight Predict is worth the fee we charge for it.

The question is a fair one and it can apply to a range of cases. The short answer, drawing on my 20-plus years of experience as a lawyer, is unequivocally, “It depends.”

Let me walk through my reasoning.

The Question

The question really is twofold: Can you get equal or better results with keyword search than with a product such as Predict? And if so, can you do it at lower cost, considering the hourly costs of developing and running the searches?

To answer these questions, we have to consider the following:

  1. How many hours will it take to develop the searches and test them, and at what cost?
  2. How many documents will you end up reviewing to complete the process (precision)?
  3. What level of recall will be obtained?

The goal in all of this is to reduce the total cost of review across your cases. If it were my case, I wouldn’t care how that got done, so long as it got done reliably.

Measuring Success

To start the inquiry, we need a reliable measure of success. Several years ago, a different client challenged Predict after we ran it in parallel with keyword search and Predict suggested the team review more documents than had hit on the keyword searches. You can read that article here.

To explain why Predict’s results made sense, I created this chart to show the four possible states of the search and Predict results.


As shown in the diagram, both approaches agreed on the documents in the top right and bottom left quadrants (the documents both methods treated as likely relevant or likely not relevant). The disagreement came in the top left and bottom right quadrants. In brief, Predict found a lot of relevant documents the keyword searches missed, and it flagged a good number of keyword hits as likely not relevant (false hits).
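The four-quadrant comparison can be tallied mechanically. Here is a minimal sketch that cross-tabulates two methods' calls over a corpus; the document IDs and counts are hypothetical, purely for illustration.

```python
def quadrants(search_hits, predict_hits, corpus_size):
    """Cross-tabulate two review methods into the four quadrants:
    both flag the doc, only one flags it, or neither does."""
    both = len(search_hits & predict_hits)          # agreement: likely relevant
    predict_only = len(predict_hits - search_hits)  # missed by keyword search
    search_only = len(search_hits - predict_hits)   # possible false keyword hits
    neither = corpus_size - both - predict_only - search_only
    return {"both": both, "predict_only": predict_only,
            "search_only": search_only, "neither": neither}

# Hypothetical example: 10-document corpus, two overlapping result sets.
search = {1, 2, 3, 4}
predict = {3, 4, 5, 6, 7}
print(quadrants(search, predict, corpus_size=10))
```

The two disagreement cells (predict_only and search_only) are exactly where the client's question gets decided: they tell you what each method would have cost or missed.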

Information retrieval scientists would couch this discussion in terms of precision and recall. The suggestion from the chart above is that keyword searches might have failed in terms of recall, at least if the goal was to reach a level of recall above 50 percent.

However, precision is equally important in calculating the total cost of review. If the method does well with recall but retrieves a lot of irrelevant documents in the process, you meet the recall goal but at the expense of increased review costs.

So Which Approach is Better?

Here is where the lawyer answer comes in: “It depends.”

While keyword search can be effective in finding relevant documents, research has shown that it can suffer from both low recall and poor precision. The oft-touted Blair and Maron study sets a framework for the discussion. In the case they studied, the lawyers believed their keyword searches had been quite effective, finding 75 percent of the relevant documents.

In fact, according to the scientists, the team had found (on average) just 20 percent of the relevant documents.[1] They were swayed by the seeming precision of the searches (seeing a lot of relevant documents) but didn’t realize they were missing a lot of other relevant documents which didn’t hit on the searches.
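The Blair and Maron lesson is that recall cannot be judged from the hits alone; you have to look at what the search did not return. One common way to do that is to sample the "null set" and project the relevance rate onto the whole. A minimal sketch, where the corpus, sample size and reviewer function are all hypothetical:

```python
import random

def estimate_recall(hits_relevant, null_set, sample_size, is_relevant, seed=0):
    """Estimate recall by sampling the documents the search did NOT hit.

    hits_relevant: relevant docs already found among the search hits
    null_set:      doc IDs the search did not return
    is_relevant:   reviewer judgment for each sampled doc (True/False)
    """
    rng = random.Random(seed)
    sample = rng.sample(null_set, sample_size)
    rate = sum(is_relevant(d) for d in sample) / sample_size
    est_missed = rate * len(null_set)   # projected relevant docs missed
    return hits_relevant / (hits_relevant + est_missed)

# Hypothetical: 500 relevant docs found by search; 10,000 docs not hit,
# of which roughly 5% are secretly relevant.
r = estimate_recall(500, list(range(10_000)), 400, lambda d: d % 20 == 0)
print(f"estimated recall: {r:.0%}")
```

Had the lawyers in the study run a check like this, the gap between believed recall (75 percent) and actual recall (about 20 percent) would have surfaced before production.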

You can read more about the study in my recent blog post: Revisiting the Blair and Maron Study: Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System.

The antidote to this problem is to use broader searches, but that comes at the cost of lower precision. That means while recall can be improved over the reported average of 20 percent, the gain would likely come at the cost of reduced precision. Specifically, the team would be required to look at a lot more irrelevant documents. Total review costs would go up accordingly.

Some litigation support teams address this by using a sophisticated process of iterating keywords over a series of searches, sampling results and then refining the keyword searches. That can certainly improve results, but how much and under what circumstances are questions that require measurement. And, in doing so, you would want to include the cost of developing, refining and running those searches as well.
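The iterate-sample-refine workflow can be sketched in a few lines. This is a toy illustration, not any team's actual process: the corpus is a dict of document texts, the reviewer is a callable, and the refinement heuristic (drop the worst-performing term when its measured precision is very low) is an assumption of mine.

```python
def iterate_keywords(corpus, terms, reviewer, rounds=3, floor=0.2):
    """Each round: run each term as a search, measure its precision
    against reviewer judgments, and drop the worst term if its
    precision falls below a floor. Purely illustrative heuristic."""
    for _ in range(rounds):
        per_term = {}
        for t in terms:
            hits = [d for d, text in corpus.items() if t in text]
            if hits:
                per_term[t] = sum(reviewer(corpus[d]) for d in hits) / len(hits)
        if per_term and min(per_term.values()) < floor:
            worst = min(per_term, key=per_term.get)
            terms = [t for t in terms if t != worst]  # refine the search set
    return terms

# Hypothetical 4-document corpus; "picnic" is a noise term and gets dropped.
corpus = {1: "merger acme", 2: "merger deal", 3: "picnic lunch", 4: "picnic fun"}
final = iterate_keywords(corpus, ["merger", "picnic"], lambda t: "merger" in t)
print(final)  # ['merger']
```

Note that even this toy loop consumes reviewer time in every round, which is exactly the measurement cost the paragraph above says must be counted.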

In contrast, most predictive analytics tools (and certainly Insight Predict) are designed algorithmically to provide the best bang for the buck across most cases. The goal is to reach a desired level of recall (e.g., 75, 80, 90 or 95 percent) while reviewing the fewest possible documents.

While I can’t explain how our proprietary algorithm works, it uses a sophisticated weighting of document features found through a continuous ranking process. The main difference here is that Predict uses tens of thousands of features with both positive and negative weighting during the process, and it refines the training literally hundreds – often thousands – of times during the review. To be sure, humans don’t build the searches (nor could they). But humans still control the process in that they identify the documents from which the searches are built.
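To make the idea of positively and negatively weighted features concrete, here is a generic sketch of continuous learning-and-ranking over bag-of-words features. This is emphatically not Catalyst's proprietary algorithm; it is a simple perceptron-style update I chose for illustration, over a toy corpus.

```python
from collections import defaultdict

def train_weights(judged, weights=None, lr=0.5, epochs=50):
    """Update per-term weights from reviewer judgments (1=relevant,
    0=not) with a simple perceptron-style rule. Terms seen in relevant
    documents drift positive; terms seen in non-relevant docs can go
    negative -- the 'positive and negative weighting' idea."""
    weights = defaultdict(float, weights or {})
    for _ in range(epochs):
        for text, label in judged:
            score = sum(weights[t] for t in text.split())
            pred = 1 if score > 0 else 0
            if pred != label:
                for t in text.split():
                    weights[t] += lr * (label - pred)
    return weights

def rank(corpus, weights):
    """Rank unreviewed documents by current model score, best first.
    In a continuous process this re-runs after every batch of judgments."""
    scored = [(sum(weights[t] for t in text.split()), d)
              for d, text in corpus.items()]
    return [d for _, d in sorted(scored, reverse=True)]

# Hypothetical judgments and unreviewed documents.
judged = [("acme merger deal", 1), ("picnic lunch plans", 0)]
w = train_weights(judged)
print(rank({10: "merger update", 11: "lunch menu"}, w))  # [10, 11]
```

The human role the paragraph describes maps onto the `judged` list: reviewers supply the labels, and the machine builds and rebuilds the "search" from them.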

Comparing the Two Approaches

So, which approach is better? My first answer is that either approach could be better in a specific case, depending on the facts and circumstances. For example, say you were asked to produce all documents with the company name “Acme” in them. In that case, keyword search would be the quickest and most effective way to go.

We recently had a chance to participate in the Total Recall track of the TREC information retrieval program, which is sponsored by the National Institute of Standards and Technology (NIST). The Total Recall track was led by Gordon Cormack and Maura Grossman, with help from many others. Its goal was to use machine learning to find relevant documents quickly, and with the least review possible.

TREC has prohibitions on releasing results so I won’t go into detail at this point. However, I can say that a number of the topics lent themselves to keyword search as readily as machine learning. For example, one topic was “affirmative action,” which is not a typical e-discovery subject.[2] We used Predict to find relevant documents but later learned through examination of the answers that a simple keyword search for “affirmative action or one world” would have returned over 90% of the relevant documents with 63% precision. Several other topics out of the 30 total also could have been handled through the right combination of keywords.

How often is that the case in the real world of litigation and regulatory investigations? If your answer is often or always, then keyword search might be the answer for you. In my experience, and that of many others, keyword searches are useful but rarely effective or efficient at finding a high percentage of relevant documents with high precision.

Ultimately, in each case you have to measure not only results, but the cost of getting those results. Our experience and the research out there suggest that predictive analytics, and particularly a continuous active learning process, will outperform keyword search, both in terms of effectiveness and cost, in most cases and is a much safer bet as a standard practice.

Keyword vs. Predict

To be clear, we believe in the power of human-generated keyword searches. In fact, we encourage clients to use them for finding seed documents (as opposed to random selection used by some vendors) and for chasing down low-hanging fruit throughout the review process. For us, the question is not whether to use keyword searches. Rather, the question is how to use them most effectively during the review. Different cases will likely suggest different approaches.

If I were in a corporate counsel’s shoes, I would be asking how we might go about determining which cases might be better handled by keyword search rather than Predict. Independent research strongly suggests that keyword search can be less than optimal in many cases, missing large numbers of relevant documents (low recall) while bringing back too many non-relevant documents (low precision). Without doubt, smart searchers running an iterative process can improve results but how much and at what cost?

I am not aware of any research suggesting that keyword search can match the results of a good predictive analytics process across the board. In particular, it is rare to hear about keyword searches getting much above 75 or 80% recall. Perhaps others’ experiences are different, but any conclusions should be backed by rigorous analysis (across a wide variety of cases) of the documents that were not returned as well as those that were.

Given that you can rarely tell in advance which case is right for a keyword-only approach, it is hard to see why a legal team wouldn’t include predictive analytics as a core strategy. At the least, we need to measure the cost of Predict against the hourly cost of developing and running a keyword process. To that we would add the results differential (recall and precision) before making a comparison.
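The comparison the last sentence calls for is straightforward to model. Here is a back-of-the-envelope cost sketch; every rate, count and fee below is a hypothetical placeholder, not an actual Catalyst or market price.

```python
def review_cost(docs_reviewed, rate_per_doc, setup_hours, hourly_rate,
                platform_fee=0.0):
    """Total cost = per-document review cost + hourly setup/search cost
    + any platform fee. All inputs are illustrative assumptions."""
    return docs_reviewed * rate_per_doc + setup_hours * hourly_rate + platform_fee

# Keyword workflow: broader searches pull more documents into review.
keyword = review_cost(docs_reviewed=60_000, rate_per_doc=1.50,
                      setup_hours=40, hourly_rate=250)
# TAR workflow: fewer documents reviewed, but a platform fee applies.
tar = review_cost(docs_reviewed=25_000, rate_per_doc=1.50,
                  setup_hours=10, hourly_rate=250, platform_fee=15_000)
print(f"keyword: ${keyword:,.0f}  tar: ${tar:,.0f}")
# keyword: $100,000  tar: $55,000
```

The "results differential" would then sit alongside these numbers: if the cheaper process also achieved lower recall, the comparison is not cost alone.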

We recognize that TAR is a new process and that some clients may be leery about paying an extra charge for Predict without being certain of the results. For that reason, I initiated an unconditional guarantee for Predict costs. Simply put, if you use Predict and are unhappy with the results for any reason in the first 90 days, just say so. We will turn it off and refund the Predict fees without question.

The bottom line is that we believe Predict with keyword search will be more effective than keyword search alone in almost every case. We believe that with such certainty that we are willing to put our proverbial money where our mouth is.


[1] It is also important to note that this 20 percent recall figure was an average across many different trials in the study. In some trials, keywords performed much better. But they performed much worse in others. This wide variance in search term performance is another reason to be cautious when comparing techniques from only a few trials.

[2] I should point out that the Total Recall track was not aimed at e-discoverists. Rather, several e-discovery focused teams participated because it was as close to e-discovery as the tracks got. It was fun to participate, but the topics were not set up to match what we are used to in the legal realm. Nor did they purport to be.


About John Tredennick

A nationally known trial lawyer and longtime litigation partner at Holland & Hart, John founded Catalyst in 2000. Over the past four decades he has written or edited eight books and countless articles on legal technology topics, including two American Bar Association best sellers on using computers in litigation technology, a book (supplemented annually) on deposition techniques and several other widely-read books on legal analytics and technology. He served as Chair of the ABA’s Law Practice Section and edited its flagship magazine for six years. John’s legal and technology acumen has earned him numerous awards, including being named by the American Lawyer as one of the top six “E-Discovery Trailblazers,” being named to the FastCase 50 as a legal visionary and being named one of the “Top 100 Global Technology Leaders” by London Citytech magazine. He has also been named the Ernst & Young Entrepreneur of the Year for Technology in the Rocky Mountain Region, and Top Technology Entrepreneur by the Colorado Software and Internet Association. John regularly speaks on legal technology to audiences across the globe. In his spare time, you will find him competing on the national equestrian show jumping circuit or playing drums and singing in a classic rock jam band.