Your TAR Temperature is 98.6 — That’s A Pretty Hot Result

Our Summit partner, DSi, has a large financial institution client that had allegedly been defrauded by a borrower. The details aren’t important to this discussion, but assume the borrower employed a variety of creative accounting techniques to make its financial position look better than it really was. And, as is often the case, the problems were missed by the accounting and other financial professionals conducting due diligence. Indeed, there were strong factual suggestions that one or more of the professionals were in on the scam.

As the fraud came to light, litigation followed. Perhaps in retaliation, or simply to mount a counteroffensive, the defendant borrower hit the bank with lengthy document requests. After collection and best-efforts culling, our client was still left with over 2.1 million documents that might be responsive. Neither the deadlines nor the budget allowed for manual review of that volume of documents. Keyword search offered some help, but the problem remained: what to do with 2.1 million potentially responsive documents?

TAR 2.0: Continuous Active Learning

DSi loaded the documents into Insight Predict, our proprietary system for Technology Assisted Review. Predict uses an advanced form of Continuous Active Learning (“CAL”) that we developed over the past few years. The process takes advantage of Insight’s ability to rank documents in the review population on a continuous basis. As reviewers tag documents, the system takes the new judgments into account and re-ranks the remaining, unseen documents, feeding increasingly relevant documents to the reviewers. This review/train/rank cycle continues until the review is complete.
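The loop described above can be sketched in a few lines. This is a toy illustration, not Predict’s actual (proprietary) model: the `rank` scorer and `cal_review` driver below are hypothetical stand-ins that simply rank by overlap with terms seen in documents already tagged relevant.

```python
# Toy sketch of a continuous-active-learning (CAL) loop.
# All names here are hypothetical; Predict's real ranking engine is proprietary.

def rank(unreviewed, relevant_terms):
    """Score each document by word overlap with relevant documents seen so far."""
    def score(doc):
        return sum(1 for w in doc.split() if w in relevant_terms)
    return sorted(unreviewed, key=score, reverse=True)

def cal_review(documents, judge, batch_size=2):
    reviewed, found = [], []
    relevant_terms = set()
    unreviewed = list(documents)
    while unreviewed:
        # Re-rank the remaining documents using every judgment so far,
        # then feed the top of the ranking to the reviewers.
        batch = rank(unreviewed, relevant_terms)[:batch_size]
        for doc in batch:
            unreviewed.remove(doc)
            reviewed.append(doc)
            if judge(doc):                          # reviewer tags the document
                found.append(doc)
                relevant_terms.update(doc.split())  # the new judgment retrains the ranker
    return reviewed, found

# Tiny demo corpus; a "reviewer" who tags anything mentioning fraud as relevant.
docs = ["loan fraud memo", "fraud audit trail", "picnic schedule", "holiday party"]
reviewed, found = cal_review(docs, judge=lambda d: "fraud" in d)
```

Even at this toy scale the effect is visible: once the first relevant documents are tagged, similar documents rise to the top of subsequent batches.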

Our continuous learning process often starts with documents found through keyword search or any other method of locating relevant documents, such as witness interviews, key custodian review, etc. We typically feed these seeds into the system for an initial ranking to get the process started. Seeds can also be added later to aid in training the algorithm. No matter how documents are found or where they are coded, those additional judgments can be fed back into Predict as judgmental seeds/training documents.

Continuous ranking also allows us to handle rolling document collections, which occurred in this case (and are common in many others). Because we do not train against a separate control set (instead, we measure the fluctuation in the ranking across all the files), we can add newly collected documents on the fly. They are immediately ranked, and any new subjects introduced by the added collections are identified for review by our contextual diversity algorithm.

You can read about our advanced CAL protocol here. This general protocol has been validated by two leading TAR experts through peer-reviewed research for the Association for Computing Machinery, available here.

Results: 98% Recall; Only 6% Reviewed

In this case, our CAL protocol allowed the review team to complete its review after reviewing only about 136,000 documents out of the total population of approximately 2.1 million.[1] Of the documents reviewed, the team marked 23,950 as relevant. A systematic sample of just under 6,000 documents confirmed that the team had found approximately 98% of the relevant documents in the collection.

We can illustrate these results through a yield curve which is drawn from the systematic sample taken at the end of the review.

Yield Curve Showing Results of Predictive Ranking


In this yield curve, the X axis shows the number of documents actually reviewed by the team as a percentage of the total documents. The Y axis shows the percentage of relevant/responsive documents found as the review progressed. The red line shows the expected outcome of a linear review where documents would be presented randomly. The blue line shows the progress of the review team in finding relevant documents.

In total, the team reviewed approximately 6% of the document population and found 98% of the relevant files.
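For readers who want to reproduce a curve like this from their own review logs, the cumulative calculation behind the blue line is straightforward. A minimal sketch (the `yield_curve` helper is hypothetical), given the documents in the order the ranking presented them and a relevance flag for each:

```python
# Hypothetical sketch: the data behind a yield curve, built from a review log
# ordered by the system's ranking (1 = tagged relevant, 0 = not relevant).

def yield_curve(ordered_relevance):
    """Return (percent reviewed, percent of relevant found) points."""
    total_rel = sum(ordered_relevance)
    points, found = [], 0
    for i, rel in enumerate(ordered_relevance, 1):
        found += rel
        points.append((100 * i / len(ordered_relevance),   # X: % of docs reviewed
                       100 * found / total_rel))           # Y: % of relevant found
    return points

# A ranking that front-loads relevant documents, as CAL aims to do:
curve = yield_curve([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
# after reviewing 30% of the documents, 75% of the relevant ones are found
```

The red baseline in the figure is simply the diagonal: under a random (linear) ordering, reviewing X% of the documents finds about X% of the relevant ones.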

Seed Sets

TAR 1.0 products require that a senior attorney, often called a Subject Matter Expert (SME), do initial training before review can begin. Training is iterative: the SME works through a series of training rounds before the process is complete. It is also a one-time process: when training concludes, that is it. The review team jumps in to look at documents, but there is no easy mechanism to feed their judgments back to the algorithm to make it smarter. One-time training means a one-time ranking.

Predict is built using a TAR 2.0 engine that allows but does not require that an SME do initial training. In previous posts here and here, we documented our research showing that SME training is not necessarily better than reviewer training. Predict encourages the use of review teams for training and the use of senior attorneys to find relevant documents using keyword search, witness interviews and any other means at their disposal.

In this case the senior attorneys used Insight’s powerful search tools to find initial seeds for training. They also were able to use relevant documents from an earlier production as examples of positive seeds. Predict allows judgment seeds like these to be added at any stage of the process.

Prioritized Review

After the initial ranking based on keyword and tagged seed documents, the review team began reviewing batches containing a mixture of highly ranked documents (likely relevant) plus a smaller number of exploratory documents chosen by the system through a “contextual diversity” algorithm.

The purpose of the contextual diversity process is to find documents that are markedly different from the ones already reviewed. Our proprietary algorithm identifies the most diverse sets of documents, pulls a representative document from each, and presents it to the reviewer as part of the review batch. If the reviewer tags it as relevant, Predict uses this new information to promote other, similar documents for review. You can read about contextual diversity here.
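Since the contextual diversity algorithm itself is proprietary, here is a generic stand-in that captures the idea: greedily pick the unreviewed documents farthest (by Jaccard distance over word sets) from everything already reviewed. The function names are hypothetical.

```python
# Generic diversity-sampling stand-in (NOT Predict's proprietary algorithm):
# surface the unreviewed documents least similar to anything already seen.

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the documents' word sets."""
    a, b = set(a.split()), set(b.split())
    return 1 - len(a & b) / len(a | b)

def most_diverse(unreviewed, reviewed, k=1):
    """Pick the k unreviewed docs farthest from everything already reviewed."""
    def dist_to_reviewed(doc):
        return min(jaccard_distance(doc, r) for r in reviewed)
    return sorted(unreviewed, key=dist_to_reviewed, reverse=True)[:k]
```

For example, if the team has only reviewed loan-fraud documents, `most_diverse` will surface a document about an unrelated topic, giving the reviewer a chance to tag a subject the ranking has never seen.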

Through sampling, we estimated that about 1 in 100 documents in the total population was relevant to this inquiry, indicating a richness of 1%. As the CAL review progressed and the training took hold, reviewers received higher volumes of relevant documents, reflecting CAL’s objective of moving relevant documents to the top of the order. Reviewers were seeing relevance rates between 10% and 25%, and occasionally as high as 35%, which represented a large increase over what could be expected from a linear review. You can read about the impact of a 7% improvement on review rates here.

Ultimately, the review team continued the review until the percentage of relevant documents in their batches petered out. At that point we conducted a systematic sample to determine our success at finding relevant documents.


We built our yield curve based on a systematic sample of just under 6,000 documents. We then focused on 5,354 sample documents that had not been reviewed and thus came from the approximately 1.8 million documents left in the discard pile (i.e. below the cutoff). Our purpose was to confirm that we were not leaving too many relevant documents in the discard pile and to calculate recall.

Out of the 5,354 unreviewed sample documents, the attorneys found only one that they tagged as relevant. Using a binomial calculator, which can be found here, we can calculate a point estimate for richness in the discard pile, along with a confidence interval around that estimate. Here are the figures we obtained.


With a point estimate of 0.02%, we estimate that there could be 371 relevant documents in the discard pile (out of 1,852,589). Using the upper bound of the confidence interval (0.0010, or 0.10%) to calculate a worst-case scenario, we estimate that there could be as many as 1,853 relevant documents in the discard pile. Note that we used a confidence level of 95% for this calculation, which is an industry standard.
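Readers without a binomial calculator handy can reproduce these figures directly. The sketch below computes the exact (Clopper-Pearson) 95% upper bound by bisection in pure Python; for 1 relevant document in 5,354 it yields roughly 0.00104, which rounds to the 0.0010 figure used above. The helper names are our own.

```python
import math

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed in log space for stability."""
    total = 0.0
    for k in range(x + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_term)
    return total

def upper_bound(x, n, alpha=0.05):
    """Clopper-Pearson upper limit: smallest p with P(X <= x; n, p) = alpha/2."""
    lo, hi = x / n, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return hi

n, x, discard = 5354, 1, 1_852_589
p_hat = x / n               # point estimate: ~0.000187, i.e. ~0.02%
p_hi = upper_bound(x, n)    # upper bound: ~0.00104, ~0.0010 after rounding
```

Multiplying the discard-pile size by these proportions gives the document counts quoted in the text (the 1,853 figure uses the rounded 0.0010 bound).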

As noted earlier, we found 23,950 relevant documents for the review. Using our point estimate we can estimate that the team found 98% of the relevant documents (23,950 / 24,321). If we use the higher boundary of the confidence interval, we can estimate that the team found at least 93% of the relevant documents (23,950 / 25,803). Both are markedly higher than the recall values approved by the courts, which typically are closer to 75%.
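Spelled out, the recall arithmetic from the paragraph above is simply the documents found divided by the documents found plus those estimated to remain in the discard pile:

```python
# The recall arithmetic from the text, spelled out.
found = 23_950        # documents tagged relevant by the review team
missed_point = 371    # discard-pile estimate at the point estimate
missed_worst = 1_853  # discard-pile estimate at the upper CI bound

recall_point = found / (found + missed_point)  # 23,950 / 24,321 -> ~98%
recall_worst = found / (found + missed_worst)  # 23,950 / 25,803 -> ~93%
```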

TAR at 98.6? Pretty Hot

So, the team found about 98% of the relevant documents (at least 93% at the low range) after viewing only about 6% of the document population. Being able to confidently skip the review of over 1.8 million documents seems pretty hot to me. How about you?


[1] I note that this includes training and review. With Continuous Active Learning, training and review are part of the same process. Thus, there is no requirement that a separate subject matter expert review 3,000 or so documents as “training” in advance of the review.


About John Tredennick

A nationally known trial lawyer and longtime litigation partner at Holland & Hart, John founded Catalyst in 2000. Over the past four decades he has written or edited eight books and countless articles on legal technology topics, including two American Bar Association best sellers on using computers in litigation technology, a book (supplemented annually) on deposition techniques and several other widely-read books on legal analytics and technology. He served as Chair of the ABA’s Law Practice Section and edited its flagship magazine for six years. John’s legal and technology acumen has earned him numerous awards, including being named by the American Lawyer as one of the top six “E-Discovery Trailblazers,” named to the FastCase 50 as a legal visionary and named one of the “Top 100 Global Technology Leaders” by London Citytech magazine. He has also been named the Ernst & Young Entrepreneur of the Year for Technology in the Rocky Mountain Region, and Top Technology Entrepreneur by the Colorado Software and Internet Association. John regularly speaks on legal technology to audiences across the globe. In his spare time, you will find him competing on the national equestrian show jumping circuit or playing drums and singing in a classic rock jam band.