Pioneering Cormack/Grossman Study Validates Continuous Learning, Judgmental Seeds and Review Team Training for Technology Assisted Review

This past weekend I received an advance copy of a new research paper prepared by Gordon Cormack and Maura Grossman, “Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery.” They have posted an author’s copy here.

The study attempted to answer one of the more important questions surrounding TAR methodology:

Should training documents be selected at random, or should they be selected using one or more non-random methods, such as keyword search or active learning?

Their conclusion was unequivocal:

The results show that entirely non-random training methods, in which the initial training documents are selected using a simple keyword search, and subsequent training documents are selected by active learning, require substantially and significantly less human review effort (P < 0.01) to achieve any given level of recall, than passive learning, in which the machine-learning algorithm plays no role in the selection of training documents.

Among passive-learning methods, significantly less human review effort (P < 0.01) is required when keywords are used instead of random sampling to select the initial training documents. Among active-learning methods, continuous active learning with relevance feedback yields generally superior results to simple active learning with uncertainty sampling, while avoiding the vexing issue of “stabilization” – determining when training is adequate, and therefore may stop.

The seminal paper is slated to be presented in July in Australia at the annual conference of the Special Interest Group on Information Retrieval (SIGIR), a part of the Association for Computing Machinery (ACM).

Why is This Important?

Their research replicates the findings of much of our own research and validates many of the points about TAR 2.0 which we have made in recent posts here and in an article published in Law Technology News. (Links to the LTN article and our prior posts are collected at the bottom of this post.)

Specifically, Cormack and Grossman conclude from their research that:

  1. Continuous active learning is more effective than the passive learning (one-time training) used by most TAR 1.0 systems.
  2. Judgmental seeds and review using relevance feedback are more effective than random seeds, particularly for sparse collections.
  3. Subject matter experts aren’t necessary for training; review teams and relevance feedback are just as effective for training.

Their findings open the door to a more fluid approach to TAR, one we have advocated and used for many years. Rather than have subject matter experts click endlessly through randomly selected documents, let them find as many good judgmental seeds as possible. The review team can get going right away and the team’s judgments can be continuously fed back into the system for even better ranking. Experts can QC outlying review judgments to ensure that the process is as effective as possible.

While I will summarize the paper, I urge you to read it for yourself. At eight pages, this is one of the easier-to-read academic papers I have run across. Cormack and Grossman write in clear language and their points are easy to follow (for us non-Ph.D.s). That isn’t always true of other SIGIR/academic papers.

The Research

Cormack and Grossman chose eight different review projects for their research. Four came from the 2009 TREC Legal Track Interactive Task program. Four others came from actual reviews conducted in the course of legal proceedings.

The review projects under study ranged from a low of 293,000 documents to a high of just over 1.1 million. Prevalence (richness) was generally low, which is often the case in legal reviews, ranging from 0.25% to 3.92% with a mean of 1.18%.
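A rough back-of-the-envelope calculation shows why such low prevalence matters for seed selection. The sketch below is not from the paper. It assumes a simple random sample, uses the prevalence figures quoted above, and picks a hypothetical sample size of 1,000 documents; the function names are illustrative only.

```python
def expected_relevant(sample_size: int, prevalence: float) -> float:
    """Expected number of relevant documents in a simple random sample."""
    return sample_size * prevalence


def sample_size_for(target_relevant: int, prevalence: float) -> float:
    """Expected random-sample size needed to surface target_relevant relevant documents."""
    return target_relevant / prevalence


# Prevalence figures quoted in the post: lowest, mean, and highest.
for label, p in [("lowest", 0.0025), ("mean", 0.0118), ("highest", 0.0392)]:
    print(f"{label}: {expected_relevant(1000, p):.1f} relevant per 1,000 sampled")
```

At the lowest prevalence, a reviewer clicking through 1,000 random documents would expect to see only two or three relevant ones, which gives a sense of how little a random seed set has to teach the classifier in a sparse collection.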

The goal here was to compare the effectiveness of three TAR protocols:

  1. SPL: Simple Passive Learning.
  2. SAL: Simple Active Learning.
  3. CAL: Continuous Active Learning (with Relevance Feedback).

The first two protocols are typical of TAR 1.0 training. Simple Passive Learning trains on randomly selected documents. Simple Active Learning uses judgmental seeds for the first round of training, then uses documents selected by the computer (through uncertainty sampling, meaning the documents the classifier is least sure about) to further improve the classifier.

Continuous Active Learning also starts with judgmental seeds (like SAL), but after the first ranking it keeps training on the review team’s judgments as reviewers work primarily through the highest-ranked (most likely relevant) documents. Catalyst uses a CAL-like approach in Predict, but rather than relying on relevance feedback alone, we use a balanced, dynamically selected mixture of relevance feedback documents and additional documents selected using Predict’s contextual diversity engine.

As the authors explain:

The underlying objective of CAL is to find and review as many of the responsive documents as possible, as quickly as possible. The underlying objective of SAL, on the other hand, is to induce the best classifier possible, considering the level of training effort.
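The difference between the three protocols comes down to how the next batch of training documents is chosen. The following is a minimal sketch of those selection rules, not the authors' implementation. It assumes `scores` maps each unreviewed document to the classifier's estimated probability of relevance; the function names and the sample scores are hypothetical.

```python
import random


def select_spl(scores, batch_size, rng):
    """Simple Passive Learning: a random pick; classifier scores are ignored."""
    return rng.sample(sorted(scores), batch_size)


def select_sal(scores, batch_size):
    """Simple Active Learning: uncertainty sampling, scores closest to 0.5 first."""
    return sorted(scores, key=lambda d: abs(scores[d] - 0.5))[:batch_size]


def select_cal(scores, batch_size):
    """Continuous Active Learning: relevance feedback, highest scores first."""
    return sorted(scores, key=lambda d: scores[d], reverse=True)[:batch_size]


# Hypothetical relevance scores for five unreviewed documents.
scores = {"doc_a": 0.95, "doc_b": 0.52, "doc_c": 0.48, "doc_d": 0.10, "doc_e": 0.81}
print(select_sal(scores, 2))  # the two documents the classifier is least sure about
print(select_cal(scores, 2))  # the two documents most likely to be relevant
```

The SAL rule sends reviewers to borderline documents in order to improve the classifier, while the CAL rule sends them straight to the likeliest responsive documents, which matches the two objectives the authors describe.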

For each of the eight review projects, Cormack and Grossman ran simulated reviews using each of the three protocols. They used review judgments already issued for each project as “ground truth.” They then simulated training and review in 1,000-seed increments. In a couple of cases they ran their experiments using 100-seed batches, but this proved impractical for the entire project.
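A simulated review of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' code: it swaps in a toy term-counting classifier for the paper's learning algorithm, and the documents, terms, seed choice and batch size are all hypothetical. It replays a CAL-style review against existing judgments and records recall after each batch.

```python
from collections import Counter


def train(labeled):
    """Toy classifier: a term's weight is (relevant count - non-relevant count)."""
    weights = Counter()
    for terms, is_relevant in labeled:
        for term in terms:
            weights[term] += 1 if is_relevant else -1
    return weights


def simulate_cal(docs, truth, seed_ids, batch_size):
    """Replay a CAL review against prior judgments; return (reviewed, recall) pairs."""
    total_relevant = sum(truth.values())
    reviewed = list(seed_ids)
    recall_curve = []
    while True:
        found = sum(truth[d] for d in reviewed)
        recall_curve.append((len(reviewed), found / total_relevant))
        remaining = [d for d in docs if d not in reviewed]
        if not remaining:
            return recall_curve
        # Retrain on every judgment made so far, then rank what is left.
        weights = train([(docs[d], truth[d]) for d in reviewed])
        # Relevance feedback: review the highest-scoring unreviewed documents next.
        remaining.sort(key=lambda d: sum(weights[t] for t in docs[d]), reverse=True)
        reviewed.extend(remaining[:batch_size])


# Hypothetical six-document collection with prior "ground truth" judgments.
docs = {
    1: {"merger", "price"}, 2: {"merger", "board"}, 3: {"lunch", "menu"},
    4: {"price", "board"}, 5: {"lunch", "parking"}, 6: {"parking", "menu"},
}
truth = {1: True, 2: True, 3: False, 4: True, 5: False, 6: False}
print(simulate_cal(docs, truth, seed_ids=[1, 3], batch_size=2))
```

In this toy run the two seed judgments are enough to push the remaining relevant documents to the top of the ranking, so full recall is reached after four of the six documents have been reviewed.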

(As a side note, we have done experiments in which the size of the batch is varied. Generally, the faster and tighter the iteration, the higher the recall for the exact same amount of human effort. Rather than delve further into this here, this topic deserves and will shortly receive its own separate blog post.)

The Results

Here are the key conclusions Cormack and Grossman reached:

The results show SPL to be the least effective TAR method, calling into question not only its utility, but also commonly held beliefs about TAR. The results also show that SAL, while substantially more effective than SPL, is generally less effective than CAL, and as effective as CAL only in a best-case scenario that is unlikely to be achieved in practice.

Our primary implementation of SPL, in which all training documents were randomly selected, yielded dramatically inferior results to our primary implementations of CAL and SAL, in which none of the training documents were randomly selected.

In summary, the use of a seed set selected using a simple keyword search, composed prior to the review, contributes to the effectiveness of all of the TAR protocols investigated in this study.

Perhaps more surprising is the fact that a simple keyword search, composed without prior knowledge of the collection, almost always yields a more effective seed set than random selection, whether for CAL, SAL, or SPL. Even when keyword search is used to select all training documents, the result is generally superior to that achieved when random selection is used. That said, even if passive learning is enhanced using a keyword-selected seed or training set, it is still dramatically inferior to active learning.

While active-learning protocols employing uncertainty sampling are clearly more effective than passive-learning protocols, they tend to focus the reviewer’s attention on marginal rather than legally significant documents. In addition, uncertainty sampling shares a fundamental weakness with passive learning: the need to define and detect when stabilization has occurred, so as to know when to stop training. In the legal context, this decision is fraught with risk, as premature stabilization could result in insufficient recall and undermine an attorney’s certification of having conducted a reasonable search under (U.S.) Federal Rule of Civil Procedure 26(g)(1)(B).

Their article includes several Yield/Gain charts illustrating their findings. I won’t repost them all here, but here is their first chart as an example. It shows comparative results for the three protocols for TREC Topic 201. You can easily see that Continuous Active Learning resulted in a higher level of recall after review of fewer documents, which is the key to keeping review costs in check:


No doubt some people will challenge these conclusions, but the findings cannot be ignored as we move from TAR 1.0 to the next generation.
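The comparison behind a yield/gain chart, how many documents must be reviewed to reach a given level of recall, can be made concrete with a short calculation. This is a minimal sketch using made-up document IDs and two hypothetical review orders; it does not reproduce any figures from the paper.

```python
def effort_to_recall(review_order, relevant_ids, target_recall):
    """Documents reviewed when recall first reaches target_recall, or None if never."""
    needed = target_recall * len(relevant_ids)
    found = 0
    for n_reviewed, doc_id in enumerate(review_order, start=1):
        if doc_id in relevant_ids:
            found += 1
        if found >= needed:
            return n_reviewed
    return None


# Hypothetical collection: four relevant documents (d*) among four others (x*).
relevant = {"d1", "d2", "d3", "d4"}
ranked_order = ["d1", "d2", "x1", "d3", "x2", "d4", "x3", "x4"]
random_order = ["x1", "d1", "x2", "x3", "d2", "x4", "d3", "d4"]

print(effort_to_recall(ranked_order, relevant, 0.75))  # 4
print(effort_to_recall(random_order, relevant, 0.75))  # 7
```

A protocol whose curve rises faster reaches the same recall with fewer documents reviewed, and that gap is the review cost saved.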


As the authors point out:

This study highlights an alternative approach – continuous active learning with relevance feedback – that demonstrates superior performance, while avoiding certain problems associated with uncertainty sampling and passive learning. CAL also offers the reviewer the opportunity to quickly identify legally significant documents that can guide litigation strategy, and can readily adapt when new documents are added to the collection, or new issues or interpretations of relevance arise.

From the beginning, we have argued that continuous ranking/continuous learning is more effective than the TAR 1.0 approach of a one-time cutoff. We have also argued that clicking through thousands of randomly selected seeds is less effective for training than actively finding relevant documents and using them instead. And, lastly, we have published our own research strongly suggesting that subject matter experts are not necessary for TAR training. They can be put to better use finding useful documents for training and doing QC of outlier review team judgments, continuously and on the fly, with the ability to see where the outlier pool is shifting as review continues.

It is nice to see that others agree and are providing even more research to back up these important points. TAR 2.0 is here to stay.

Further reading on Catalyst’s research and findings about TAR 2.0:

Comments on “Pioneering Cormack/Grossman Study Validates Continuous Learning, Judgmental Seeds and Review Team Training for Technology Assisted Review”

  1. Ken Chasse

    The TAR strategy is wrong. “Reading” documents by electronic means will never provide sufficient reliability or cost-saving for litigants of all income levels, i.e., 2-tier justice to benefit the rich only, is not a constitutionally acceptable answer. Instead, compare the auditor and legal researcher. Because the client is required to make its own financial records, the auditor doesn’t have to sort thousands of documents with a TAR device before doing the audit. Therefore teach the clients to apply a similar “indexing” discipline to all of their records. The information made available will be as useful as that in their financial records. Then the client’s attorney (lawyer) can search the client’s index to combine the searching and review stages, using the speed of electronic searching. And comparing legal research, there is: (1) highly indexed and summarized material to be searched; (2) by a highly trained researcher; using, (3) the speed of electronic searching. Therefore no “proportionality principle” is needed to limit the required amount of legal research to avoid “undue burden of cost.” So, solve the cost of review problem by bringing the same advantages to the client. Using the client’s index to prepare to make production will be one of the several functions of the “records management lawyer” specialist, who will be necessary in order to provide electronic records technology with an adequate legal infrastructure. It will be one of the most profitable areas of the practice of law. I’ve done such work for many years.

  2. Ken Chasse

    See also: (1) “Solving the High Cost of the ‘Review’ Stage of Electronic Discovery”;

    (2) “The Dependence of Electronic Discovery and Admissibility upon Electronic Records Management”;
    Ken Chasse, Toronto, Canada.