The Luck of the Irish: TAR Approved by Irish High Court

I do not know if any leprechauns appeared in this case, but the Irish High Court found the proverbial pot of gold under the TAR rainbow in Irish Bank Resolution Corp. vs. Quinn—the first decision outside the U.S. to approve the use of Technology Assisted Review for civil discovery.

The protocol at issue in the March 3, 2015, decision was TAR 1.0 (Clearwell). For that reason, some of the points addressed by the court will be immaterial for legal professionals who use the more-advanced TAR 2.0 and Continuous Active Learning (CAL). Even so, the case makes for an interesting read, both for its description of the TAR process at issue and for its ultimate outcome.

The Facts

This case grew out of the liquidation of the former Anglo Irish Bank. The plaintiffs claimed that defendants tried to wrongfully convert assets of the bank to the tune of €455 million (about $514.7 million).

In response to defendants’ document-production requests, plaintiffs ran keyword searches and identified 1.7 million potentially relevant documents. After de-duplication and culling, they were still left with 680,809 documents. Traditional linear review of these documents could take nine months and cost over €2 million, they estimated.

At that point, plaintiffs sought the court’s approval to use TAR to further reduce their review. One of their two experts—Conor Crowley, e-discovery counsel at UBS AG in Hong Kong—estimated that TAR would reduce the review population to 10% of the total. Not surprisingly, defendants opposed the request.

Training by Subject Matter Expert

Before reaching the legal issues, the court provided a lengthy rendition of the TAR 1.0 process, where senior partners (or a small team) do all of the training.

The expert group selects a first seed-set/training-set of documents, usually 25, from the data set. The expert interacts with the system by asking a yes/no question of the document against a series of controlled samples. (The plaintiffs’ proposed question in this case was: Is this document relevant to any discovery category?) The documents in the training set will include privileged documents. The system builds a knowledge model as it learns from the expert and presents further samples for review.

Normally around 25-50 iterations (repetitions) are sufficient to build the model to the point where it can predict what the expert will choose as responsive in the sample being reviewed.

Approximately 1,000 documents will be used in the training sets in this case to get to the stable point.

The court’s decision gave no explanation as to how anyone could be sure the training would be complete after 1,000 documents. TAR 1.0 training can sometimes run to 3,000 documents or more. Indeed, research suggests that optimal TAR 1.0 training can involve as many as 8,000 documents. See, Gordon V. Cormack & Maura R. Grossman, Evaluation of Machine Learning Protocols for Technology-Assisted Review in Electronic Discovery, in Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ‘14) (July 2014).
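For readers who like to see the moving parts, here is a minimal sketch of what a TAR 1.0-style training loop looks like, written in Python with scikit-learn against synthetic data. The batch size, the stabilization test and the logistic regression model are my own illustrative assumptions; this is not the Clearwell workflow described in the judgment.

```python
# Minimal sketch of a TAR 1.0-style training loop (illustrative only).
# A single "expert" labels small batches, the model is retrained after each
# batch, and training stops once accuracy on a held-out control sample
# stabilizes. Then the full collection is ranked and a cutoff applied.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic stand-in for a document collection: 20,000 "documents" with
# 50 features each and a hidden relevance rule the expert applies.
X = rng.normal(size=(20_000, 50))
true_labels = (X[:, :5].sum(axis=1) > 1.5).astype(int)  # hidden ground truth

def expert_review(indices):
    """Simulate the senior lawyer answering yes/no for each document."""
    return true_labels[indices]

# A control sample, reviewed once up front, used to measure stability.
control_idx = rng.choice(len(X), size=500, replace=False)
control_labels = expert_review(control_idx)

trained_idx, trained_labels = [], []
prev_score, model = None, None

for iteration in range(40):                  # ~25-50 iterations, per the court
    # TAR 1.0 tools typically pick the next training batch for the expert;
    # random selection stands in for that here.
    batch = rng.choice(len(X), size=25, replace=False)
    trained_idx.extend(batch)
    trained_labels.extend(expert_review(batch))

    model = LogisticRegression(max_iter=1000)
    model.fit(X[trained_idx], trained_labels)

    # "Stabilization": stop when accuracy on the control sample stops moving.
    score = model.score(X[control_idx], control_labels)
    if prev_score is not None and abs(score - prev_score) < 0.005:
        print(f"Stabilized after {iteration + 1} batches "
              f"({len(trained_idx)} training documents), "
              f"control accuracy {score:.2%}")
        break
    prev_score = score

# Rank the full collection by predicted relevance; a cutoff would be set here.
scores = model.predict_proba(X)[:, 1]
ranked = np.argsort(-scores)
```

The point to notice is that everything turns on the expert’s coding of a relatively small training set, which is exactly why the adequacy of 1,000 training documents was worth questioning.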

Control Set?

The court then said that a further control set would be used to test the prediction model. I am not aware that a TAR 1.0 system requires multiple control sets for training, so the court may have meant that a further sample would be used to test the effectiveness of the ranking after the TAR 1.0 cutoff occurred. In any event, the court went on to note that sampling is appropriate to confirm that the discards do not have too many relevant documents.

[A] percentage of the remainder documents are also reviewed to ascertain the presence of relevant documents (false negatives). That percentage is determined by the model using random sampling.

Training would continue until stabilization, the court noted, at which point the system would rank all of the documents. Plaintiffs offered to share their cutoff threshold and open it to challenge. They also offered to use all reasonable quality control measures to test the discard documents.
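The discard-pile check the court describes boils down to simple sampling arithmetic. Here is a hedged sketch of an elusion-style sample with made-up document counts; in a real case the sample size would be chosen to hit a stated confidence level and margin of error rather than my assumed 2,000.

```python
# Illustrative elusion check on the discard pile (not the protocol from the
# case): draw a random sample of the not-to-be-reviewed documents, have a
# human code it, and project the relevant documents likely left behind.
import random

random.seed(7)

discard_pile = list(range(400_000))   # hypothetical discarded documents
sample_size = 2_000                   # assumed; set from confidence level in practice
sample = random.sample(discard_pile, sample_size)

# Stand-in for human review of the sample; suppose about 1% turn out relevant.
relevant_in_sample = sum(1 for doc in sample if doc % 100 == 0)

elusion_rate = relevant_in_sample / sample_size
estimated_false_negatives = elusion_rate * len(discard_pile)
print(f"Elusion rate: {elusion_rate:.2%}; "
      f"estimated relevant documents in discards: {estimated_false_negatives:,.0f}")
```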

Reviewing the Training Documents

Under their proposed protocol, plaintiffs offered to share the training documents with defendants. This would include both relevant and non-relevant documents, as well as those subject to privilege claims or deemed to include confidential business information. Barristers would be asked to review the confidential documents, with seven days to challenge the relevance designations.

Had this been a TAR 2.0 process, this sharing of initial seed documents used to train the classifier would not have been necessary. With CAL, all of the documents marked relevant constitute training seeds and are typically produced unless withheld on privilege grounds. (For more on this, see our post, Using Continuous Active Learning to Solve the ‘Transparency’ Issue in TAR.)

As long as we trust the human reviewers’ judgments, the only issue for discussion is the number of relevant documents left in the discard pile (the documents not reviewed). This should and can be determined through sampling and has little or nothing to do with the algorithm that selects the documents actually being reviewed. (See, Measuring Recall in E-Discovery Review, Part One: A Tougher Problem Than You Might Realize and Measuring Recall in E-Discovery Review, Part Two: No Easy Answers.)
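For completeness, the arithmetic behind that sampling exercise is straightforward: estimate the relevant documents left in the discard pile and recall follows directly.

```latex
\text{recall} \;=\; \frac{R_{\text{reviewed}}}{R_{\text{reviewed}} + \widehat{R}_{\text{discarded}}}
```

Here R_reviewed is the number of relevant documents found during review and R̂_discarded is the sample-based estimate of relevant documents remaining in the discards. The catch, as the two posts above explore, is how much sampling error surrounds that estimate.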

The Silly F Measure?

To measure the success of the TAR 1.0 process, the court said that plaintiffs planned to rely on the “F-measure” (also called the F1 measure), which combines precision (the percentage of the documents presented for review that are actually relevant) and recall (the percentage of all relevant documents found through the TAR process) into a single score.
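For reference, the F1 measure is the harmonic mean of precision and recall:

```latex
F_1 \;=\; \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
```

Being a harmonic mean, it rewards balance between the two numbers rather than strength in either one alone.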

The problem, in my opinion, is that the F-measure can be misleading, as Bill Dimm and other top search scientists have pointed out. (See, Predictive Coding Performance and the Silly F1 Score.) It may suggest that the review process is optimal when recall is relatively low (50%, for example). No court would deem a production process adequate if it were based on finding only 50% of the relevant documents. Most courts have required 70% recall or higher.

I believe that recall is the important factor to consider in a TAR process, with precision of secondary importance because it saves on review costs. You can read my views on measuring recall in the two posts mentioned above. Others have also suggested that the F-measure is not a reliable indicator of success. (See, e.g., Ralph Losey’s post, Introducing “ei-Recall” – A New Gold Standard for Recall Calculations in Legal Search – Part One.)

In any event, the plaintiffs’ technical expert recommended that the F-measure be at least 80% “to guarantee a high amount of truly responsive (relevant) documents while giving a good balance to cost of manual review,” the court said. It may be that the expert was recommending reaching 80% recall rather than referring to the F-measure, which is not typically expressed as a percentage.
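If the 80% figure really were an F1 target, a little algebra shows how much weaker that is than an 80% recall target. Even with perfect precision, an F1 of 80% is consistent with recall of only about two-thirds:

```latex
0.8 \;=\; \frac{2 \cdot 1.0 \cdot R}{1.0 + R}
\quad\Longrightarrow\quad
R \;=\; \frac{0.8}{2 - 0.8} \;\approx\; 66.7\%
```

That is below the 70% recall most courts have looked for, which is one more reason to keep the spotlight on recall itself.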

Defense Objections to TAR

The court next addressed defendants’ objections to TAR. Among their main objections:

  • TAR will not capture all relevant documents. Disagreeing, the court cited Conor Crowley’s testimony that TAR is “the most efficient and effective method of identifying relevant information.”
  • TAR is not suitable for data sets of under 1 million documents. The court cited Crowley for the proposition that TAR works even with 100,000 documents. (In a TAR 2.0 process, it works with 10,000 documents or less.)
  • The training sets are not specific to the discovery categories. The court noted that initial training in this system involved finding relevant documents.
  • A cutoff will lead to the omission of some relevant documents. The court disagreed, citing research that keyword search and human review might omit even more relevant documents.
  • There will be no savings in cost and time. The court simply rejected the argument that linear review would beat a TAR review.

The Law

Turning to its legal analysis, the court started by stating that proportionality, while not specifically expressed in its rules, was relevant to the fair disposition of the case as well as cost savings.

The court also indicated that its decision was guided by U.S. Magistrate Judge Andrew Peck’s opinion in Da Silva Moore v. Publicis Groupe & MSL Grp., 287 F.R.D. 182 (S.D.N.Y. 2012) (Peck, M.J.), aff’d, 2012 WL 1446534 (S.D.N.Y. Apr. 26, 2012), and by the Sedona Conference Principles and Cooperation Proclamation. Specifically, the court quoted Da Silva Moore:

The objective of review in e-discovery is to identify as many relevant documents as possible, while reviewing as few non-relevant documents as possible. Recall is the fraction of relevant documents identified during a review; precision is the fraction of identified documents that are relevant. Thus, recall is a measure of completeness, while precision is a measure of accuracy or correctness. The goal is for the review method to result in higher recall and higher precision than another method, at a cost proportionate to the value of the case.

Relying on Da Silva Moore and several other U.S. decisions, as well as its inherent authority to regulate discovery, the court showed no hesitation in approving TAR as an appropriate process for complying with a party’s discovery obligations under Irish law.

Pursuant to the legal authorities which I have cited supra, and with particular reference to the albeit limited Irish jurisprudence on the topic, I am satisfied that, provided the process has sufficient transparency, Technology Assisted Review using predictive coding discharges a party’s discovery obligations under Order 31, rule 12.

The Protocol

The court said that it would approve the plaintiffs’ TAR protocol because the plaintiffs:

  1. Sought consent once the need became apparent.
  2. Provided an expert’s affidavit about the methodology.
  3. Submitted a copy of the search words used in the initial scoping exercise.
  4. Produced their expert for examination.
  5. Offered to consider modifying search terms.
  6. Sought court approval when the defense refused the process.

The protocol also provided for disclosure when the training reached “stabilization” and a disclosure procedure for checking the coding of the training documents.

TAR 2.0 Continuous Active Learning

Had the plaintiffs proposed a TAR 2.0 program built around CAL, the protocol would have been much simpler. Here is how it could have played out:

  1. There would be no reason to seek consent when using a TAR 2.0 protocol. There is no control set, no reason to have subject matter experts do the initial training and no special training seeds that could have an undue impact on the ranking. All of the seeds count.
  2. The party would continue review until a reasonable recall level was obtained. What is reasonable would depend on an initial threshold, typically at least 70%, but should be governed by the document results. In other words, when the yield curve bends sharply (the elbow), consider stopping the review. An example yield curve from Insight Predict illustrates such a stopping point.
  3. Recall would be proven through sampling. We recommend a systematic sample because you can use it to draw a yield curve as well as demonstrate recall.
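Here is a minimal sketch of that kind of systematic sample, drawn against a ranked list with made-up scores and richness. Because the sample is spread evenly down the ranking, the same human-coded documents support both a yield curve and a recall estimate at any proposed cutoff; nothing in the sketch is specific to Insight Predict.

```python
# Illustrative systematic sample over a ranked document list: take every Nth
# document in ranked order, have a human code the sample, then use it to trace
# a yield curve and estimate the recall achieved at any proposed cutoff.
import numpy as np

rng = np.random.default_rng(3)

n_docs = 100_000
# Pretend ranking: higher-ranked documents are more likely to be relevant.
rank_positions = np.arange(n_docs)
p_relevant = np.clip(0.9 * np.exp(-rank_positions / 15_000), 0.001, None)
true_relevance = rng.random(n_docs) < p_relevant      # hidden ground truth

# Systematic sample: every Nth document down the ranked list.
step = 200
sample_positions = rank_positions[::step]
sample_coding = true_relevance[sample_positions]      # stand-in for human review

# Yield curve data: cumulative relevant documents found going deeper in the
# ranking (each sampled hit represents roughly `step` documents overall).
cumulative_relevant = np.cumsum(sample_coding) * step

# Recall at a proposed cutoff, e.g. reviewing only the top 30,000 documents.
cutoff = 30_000
est_relevant_above = cumulative_relevant[sample_positions < cutoff][-1]
est_relevant_total = cumulative_relevant[-1]
print(f"Estimated recall at a {cutoff:,}-document cutoff: "
      f"{est_relevant_above / est_relevant_total:.1%}")
```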

That’s really all there is to it in a TAR 2.0 process. You get better results with a much simpler process. And it would have made the Irish court’s work far easier. Sometimes with TAR 2.0, I find myself looking around for the leprechauns just to see how they pulled that magic off.


About John Tredennick

A nationally known trial lawyer and longtime litigation partner at Holland & Hart, John founded Catalyst in 2000. Over the past four decades he has written or edited eight books and countless articles on legal technology topics, including two American Bar Association best sellers on using computers in litigation, a book (supplemented annually) on deposition techniques and several other widely-read books on legal analytics and technology. He served as Chair of the ABA’s Law Practice Section and edited its flagship magazine for six years. John’s legal and technology acumen has earned him numerous awards, including being named by the American Lawyer as one of the top six “E-Discovery Trailblazers,” being named to the FastCase 50 as a legal visionary and being named one of the “Top 100 Global Technology Leaders” by London Citytech magazine. He has also been named the Ernst & Young Entrepreneur of the Year for Technology in the Rocky Mountain Region, and Top Technology Entrepreneur by the Colorado Software and Internet Association. John regularly speaks on legal technology to audiences across the globe. In his spare time, you will find him competing on the national equestrian show jumping circuit or playing drums and singing in a classic rock jam band.