Ask Catalyst: How Does Insight Predict Handle ‘Bad’ Decisions By Reviewers?

[Editor’s note: This is another post in our “Ask Catalyst” series, in which we answer your questions about e-discovery search and review. To learn more and submit your own question, go here.]

We received this question:

I understand that the QC feature of Insight Predict shows outliers between human decisions and what Predict believes the result should be. But what if the parties who performed the original review that Predict is using to make judgments were making “bad” decisions? Would the system just use the bad training docs and base its decisions on those docs alone?

Similarly, what about the case where half the team is making good decisions and half the team is making bad decisions? Can Insight learn effectively when being fed disparate results on very similar documents?

Can you eliminate the judgments of reviewers who you find were making poor decisions, to keep the system from “learning” bad things and basing its judgments on those human errors?

Today’s question is answered by Mark Noel, managing director of professional services. 

Most of your questions can be addressed by describing what our head research scientist Jeremy Pickens likes to call the “Tolstoy Principle.” In Anna Karenina, Tolstoy wrote, “All happy families resemble one another; each unhappy family is unhappy in its own way.”

When using Predict to continuously rank documents, such as for responsiveness, all responsive documents tend to be alike, while the non-responsive ones tend to be non-responsive in a great many different ways. This leads to a counter-intuitive effect with training documents. If half of your reviewers are making good calls and half are making bad calls, the two don’t simply cancel each other out.

Instead, the consistently coded documents will show a stronger pattern of similarity than the poorly coded ones, and the noise from the poorly coded documents tends to get filtered out as the algorithm learns (which it does every time newly judged documents become available to it). Put another way, the algorithm can distill the pattern from the (more) properly coded documents, while the noisy documents have no pattern to distill because they tend to be much more conceptually diverse, each being non-responsive in its own way.
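
To make that intuition concrete, here is a minimal, purely illustrative sketch (not Catalyst’s code or algorithm) using scikit-learn on synthetic data. Responsive “documents” share a common signal, half of the review team codes noisily, and a classifier trained on the mixed-quality labels still ranks documents almost as well as one trained on perfectly clean labels. Every name and number here is a made-up assumption for illustration only.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

# Synthetic "documents": responsive ones share a common signal ("alike"),
# non-responsive ones are just background noise ("unhappy in their own way").
n_docs, n_features = 4000, 50
truth = rng.random(n_docs) < 0.25                 # true responsiveness
X = rng.normal(0.0, 1.0, (n_docs, n_features))
X[truth] += rng.normal(0.4, 0.05, n_features)     # shared responsive pattern

# Reviewer coding: half the "team" codes carefully, half flips 30% of its calls.
noisy_half = rng.random(n_docs) < 0.5
labels = truth.copy()
flipped = noisy_half & (rng.random(n_docs) < 0.30)
labels[flipped] = ~labels[flipped]

# Train on the mixed-quality labels and score the ranking against ground truth.
noisy_model = LogisticRegression(max_iter=1000).fit(X, labels)
print("AUC, trained on noisy labels:",
      round(roc_auc_score(truth, noisy_model.predict_proba(X)[:, 1]), 3))

# Baseline for comparison: the same model trained on perfectly clean labels.
clean_model = LogisticRegression(max_iter=1000).fit(X, truth)
print("AUC, trained on clean labels:",
      round(roc_auc_score(truth, clean_model.predict_proba(X)[:, 1]), 3))

On a run of this sketch, the two AUC figures should land close together, which is the point: the shared pattern among the consistently coded responsive documents dominates, and the scattered noise from the bad calls mostly washes out.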

Another finding from our research on actual case data is that documents where a reviewer and a more senior QC attorney disagree tend to be borderline cases. That is, the reviewer’s calls are not totally wrong: the somewhat-responsive parts of those documents are alike, while the somewhat non-responsive parts are more varied. This gives rise to a similar effect, because documents that are close-but-not-quite responsive still display a pattern that the algorithm can latch onto and distill out.

In one experiment we ran, we trained multiple Predict classifiers using only those documents that the first-pass reviewers got wrong according to the expert doing QC. Even then, the Predict classifier was able to detect the pattern and train effectively at almost the same rate as when using the expert’s judgments. It wasn’t quite as good, but it was a lot closer than most people would expect, and was way better than random.

So once the responsiveness pattern is distilled by the Predict engine, any outlier documents will stand out strongly and get caught in the QC algorithm’s net. Because of the Tolstoy Principle, the system is not especially sensitive to reviewers making poor decisions. Rather than exclude a reviewer’s decisions, we use the reporting from the QC phase to identify reviewers who are being overturned frequently in QC, so that the review manager or supervising attorney can work with them to make sure they’re on the same page as the rest of the team.
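
For a sense of how that kind of QC reporting might look in the abstract, here is a hypothetical pandas sketch, not Predict’s actual algorithm or data model. It flags documents where a reviewer’s call diverges sharply from the model’s score, then summarizes how often each reviewer is flagged so a review manager knows who to follow up with. The column names, threshold, and toy data are all assumptions.

import pandas as pd

# Assumed inputs (hypothetical): one row per reviewed document, with the
# reviewer's call (1 = responsive, 0 = not) and the model's score in [0, 1].
review = pd.DataFrame({
    "doc_id":      [101, 102, 103, 104, 105, 106],
    "reviewer":    ["A", "A", "A", "B", "B", "B"],
    "human_call":  [1, 0, 1, 0, 1, 0],
    "model_score": [0.91, 0.12, 0.88, 0.83, 0.09, 0.95],
})

# Disagreement = distance between the human call and the model's score; the
# largest values are the "outliers" that would land in a QC queue.
review["disagreement"] = (review["human_call"] - review["model_score"]).abs()
qc_queue = (review[review["disagreement"] > 0.6]
            .sort_values("disagreement", ascending=False))
print(qc_queue[["doc_id", "reviewer", "disagreement"]])

# Per-reviewer flag rate, so a supervising attorney can spot who is being
# overturned often and follow up with them.
flag_rate = (review.assign(flagged=review["disagreement"] > 0.6)
                   .groupby("reviewer")["flagged"].mean())
print(flag_rate)

In this toy data, reviewer B’s calls run opposite to the model’s scores on every document, so B’s flag rate stands out immediately, which is exactly the kind of signal that prompts a conversation rather than an automatic exclusion of that reviewer’s work.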

We could go through and pull those reviewers’ seeds later, but it usually doesn’t prove to be necessary, and it doesn’t make much of a difference in Predict’s performance unless the reviewer is really off-base and the review team is very small.