On Jan. 24, Law Technology News published John’s article, “Five Myths about Technology Assisted Review.” The article challenged several conventional assumptions about the predictive coding process and generated a lot of interest, and a bit of dyspepsia too. At the least, it got some good discussions going and perhaps nudged the status quo a bit.
One writer, Roe Frazer, took issue with our views in a blog post he wrote. Apparently, he tried to post his comments with Law Technology News but was unsuccessful. Instead, he posted his reaction on the blog of his company, Cicayda. We would have responded there but we don’t see a spot for replies on that blog either.
We love comments like these and the discussion that follows. This post offers our thoughts on the points raised by Mr. Frazer and we welcome replies right here for anyone interested in adding to the debate. TAR 1.0 is a challenging-enough topic to understand. When you start pushing the limits into TAR 2.0, it gets really interesting. In any event, you can’t move the industry forward without spirited debate. The more the merrier.
We will do our best to summarize Mr. Frazer’s comments and offer our responses.
1. Only One Bite at the Apple?
Mr. Frazer suggests we were “just a bit off target” on the nature of our criticism. He rightly points out that litigation is an iterative (“circular” he calls it) business.
When new information comes into a case through initial discovery, with TAR/PC you must go back and re-train the system. If a new claim or new party gets added, then a document previously coded one way may have a completely different meaning and level of importance in light of the way the data facts changed. This is even more so the case with testimony, new rounds of productions, non-party documents, heck even social media, or public databases. If this happens multiple times, you wind up reviewing a ton of documents to have any confidence in the system. Results are suspect at best. Cost savings are gone. Time is wasted. Attorneys, entrusted with actually litigating the case, do not and should not trust it, and thus smartly review even more documents on their own at high pay rates. I fail to see the value of “continuous learning”, or why this is better. It cannot be.
He might be missing our point here. Certainly he is correct when he says that more training is always needed when new issues arise, or when new documents are added to the collection. And there are different ways of doing that additional training, some of which are smarter than others. But that is the purview of Myth #4, so we’ll address it below. Let us, therefore, clarify that when we’re talking about “only one bite of the apple,” we’re talking about what happens when the collection is static and no new issues are added.
To give a little background, let us explain what we understand to be the current, gold-standard TAR workflow, to which we are reacting. What we see the industry in general saying is this: you get ahold of the most senior, experienced, expertise-laden attorney you can find, sit that person down in front of an active-learning TAR algorithm, and have him or her iteratively judge thousands of documents until the system “stabilizes.” Then you apply the results of that learning to your entire collection and batch out the top documents to your contract review team for final proofing. At the point you do that batching, says the industry, learning is complete, finito, over, done. Even if you trust your contract review team to judge the batched-out documents, none of those judgments are ever fed back into the system to be used for further training to improve the ranking.
Myth #1 says that it doesn’t have to be that way. What “continuous learning” means is that all judgments during the review should get fed back into the core algorithm to improve the quality with regard to any and all documents that have not yet received human attention. And the reason why it is better? Empirically, we’ve seen it to be better. We’ve done experiments in which we’ve trained an algorithm to “stability,” and then we’ve continued training even during the batched-out review phase – and seen that the total number of documents that need to be examined until a defensible threshold is hit continues to go down. Is there value in being able to save even more on review costs? We think that there is.
You can see some of the results of our testing on the benefits of continuous learning here.
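To make the distinction concrete, here is a minimal sketch of the continuous-learning loop described above. This is an illustration of the general idea, not Catalyst’s actual algorithm: the term-overlap scorer and all names are our own stand-ins, and a real system would use a proper learning model. The point is structural: every reviewer judgment updates the model before the next batch is ranked.

```python
# Illustrative sketch (not Catalyst's actual algorithm): a continuous-learning
# loop in which every reviewer judgment is fed back into the ranking model,
# rather than freezing the model once initial training "stabilizes".

from collections import Counter

def rank(docs, relevant_terms):
    """Order unreviewed docs by overlap with terms seen in relevant docs."""
    return sorted(docs, key=lambda d: -sum(relevant_terms[w] for w in d.split()))

def continuous_review(docs, judge, batch_size=2):
    """Batch out top-ranked docs; feed every judgment back into the model."""
    relevant_terms = Counter()
    unreviewed, found = list(docs), []
    while unreviewed:
        unreviewed = rank(unreviewed, relevant_terms)
        batch, unreviewed = unreviewed[:batch_size], unreviewed[batch_size:]
        for doc in batch:
            if judge(doc):                          # human reviewer's call
                found.append(doc)
                relevant_terms.update(doc.split())  # continuous learning step
    return found
```

In the TAR 1.0 workflow, the `relevant_terms.update(...)` line simply never runs once batching begins; that single difference is what Myth #1 is about.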
2. Are Subject Matter Experts Required?
We understand that this is a controversial issue and that it will take time before people become comfortable with this new approach. To quote Mr. Frazer:
To the contrary, using a subject matter expert is critical to the success of litigation – that is a big reason AmLaw 200 firms get hired. Critical thinking and strategy by a human lawyer is essential to a well-designed discovery plan. The expertise leads to better training decisions and better defensibility at the outset. I thus find your discussion of human fallibility and review teams puzzling.
Document review is mind numbing and people are inconsistent in tagging which is one of the reasons for having the expert in the first place. With a subject matter expert, you are limiting the amount of fallible humans in the process. We have seen many “review lawyers” and we have yet to find one who does not need direction by a subject matter expert. One of the marketing justifications for using TAR/PC is that human review teams are average to poor at finding relevant documents – it must be worse without a subject matter expert. I do agree with your statement that “most senior attorneys… feel they have better things to do than TAR training in any event.” With this truth, you have recognized the problem with the whole system: Spend $100k+ on a review process, eat up a large portion of the client’s litigation budget, yet the expert litigation team who they hired has not looked at a single document, while review attorneys have been “training” the system? Not relying on an expert seems to contradict your point 3, ergo.
Again, the nature of this response indicates that you are approaching this from the standard TAR workflow, which is to have your most senior expert sit for a number of days and train to stability, and then never have the machine learn anything again. To dispel the notion that this workflow is the only way in which TAR can or even should work is one reason we’re introducing these myths in the first place. What we are saying in our Myth #2 is not that you would never have senior attorneys or subject matter experts involved in any way. Of course that person should train the contract reviewers. Rather, we are saying that you can involve non-experts, non-senior attorneys in the actual training of the system and achieve results that are just as good as having *only* a senior attorney sit and train the system. And our method dramatically lowers both your total time cost and your total monetary cost in the process.
For example, imagine a workflow in which your contract reviewers, rather than your senior attorney, do all the initial training on those thousands of documents. Then, at some point later in the process, your senior attorney steps in and re-judges a small fraction of the existing training documents. He or she corrects via the assistance of a smart algorithm only the most egregious, inconsistent training outliers and then resubmits for a final ranking. We’ve tested this workflow empirically, and found that it yields results that are just as good, if not better, than the senior attorney working alone, training every single document. (See: Subject Matter Experts: What Role Should They Play in TAR 2.0 Training?)
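The outlier-correction step in that workflow can be sketched in a few lines. Again, this is a hypothetical illustration, not our production code: the idea is simply to flag the training documents whose human label disagrees most sharply with the model’s own relevance score, so the senior attorney re-judges only those rather than the whole training set.

```python
# Illustrative sketch of the outlier-correction step: surface the training
# documents whose reviewer label disagrees most with the model's score, so
# the senior attorney re-judges only a small fraction of the training set.

def training_outliers(training, score, top_n=10):
    """training: list of (doc, label) pairs, label 1 = relevant, 0 = not.
    score(doc) -> model's estimated probability of relevance."""
    disagreement = lambda item: abs(item[1] - score(item[0]))
    return sorted(training, key=disagreement, reverse=True)[:top_n]
```

A document a contract reviewer coded non-relevant, but which the model scores as highly relevant (or vice versa), sorts to the top and gets the expert’s attention first.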
Moreover, you can get through training more quickly, because now you have a team working in parallel rather than an individual working serially. Add to that the fact that your senior attorney does not always have free time the moment training needs to get done, plus the flexibility to bring that attorney in at a later point to do a tenth of the work he or she would otherwise have to do, and you have a recipe for success. That’s what this myth is about: the notion, held by much of the industry and reflected in Mr. Frazer’s response, that unless a senior attorney performs every action that in any way affects the training of the system, you have a recipe for disaster. It is not so; that is a myth.
And again, we justify this not through appeals to authority (“that is a big reason AmLaw 200 firms get hired”), but through empirical methods. We’ve tested it out extensively. But if appeals to authority are what is needed to show that the algorithms we employ are capable of successfully supporting these alternative workflows, we can do so. Our in-house senior research scientist, Jeremy Pickens, has his PhD from one of the top two information retrieval research labs in the country, and not only holds numerous patents on the topic, but has received the best paper award at the top information retrieval conference in the world (ACM SIGIR). Blah blah blah. But we’d prefer not to have to resort to appeals to authority, because empirical validation is so much more effective.
Please note also that we in no way *force* you to use non-senior attorneys during the training process. You are of course free to work however you want to work. However, should time or money be an issue, we’ve designed our system to let you work successfully, and more efficiently, without requiring senior attorneys or experts to do all of your training.
3. Must I Train on Randomly Selected Documents?
We pointed out in our article that it is a myth that TAR training can only be on random documents.
You totally glossed over bias. Every true scientific study says that judgmental sampling is fraught with bias. Research into sampling in other disciplines is that results from judgmental sampling should be accepted with extreme caution at best. It is probably even worse in litigation where “winning” is the driving force and speed is omnipresent. Yet, btw, those who advocate judgmental sampling in eDiscovery, Grossman, e.g., also advocate that the subject matter experts select the documents – this contradicts your points in 2. You make a true point about the richness of the population making it difficult to find documents, but this militates against random selection, not for it. To us this shows another reason why TAR/PC is broken. Indeed “clicking through thousands of random documents is boring” – but this begs the question. It was never fun reviewing a warehouse of banker’s documents either. But it is real darn fun when you find the one hidden document that ties everything together, and wins your case. What is boring or not fun has nothing to do with the quality of results in a civil case or criminal investigation.
We hope we have managed to clarify that Myth #2 is not actually saying that you never have to involve a senior attorney in any way, shape or form. Rather, we believe that a senior attorney doesn’t have to do every single piece of the TAR training, in all forms, at all times. Once you understand this, you quickly realize that there is no contradiction between what Maura Grossman is saying and what we are saying.
If you want to do judgmental sampling, let your senior attorney and all of his or her wisdom be employed in creating the search queries used to find interesting documents. But instead of requiring that senior person to then look at every single result of those queries, let your contract reviewers comb through them. In that manner, you involve your senior attorney where his or her skills are the most valuable and where his or her time is the most precious. It takes a lot less time to issue a few queries than it does to sit and judge thousands of documents. Are we the only vendor aware that the person who issues the searches and the person who judges the documents those searches find don’t have to be the same person? We would hope not, but perhaps we are.
Now, to the issue of bias. You’re quite right to be concerned about this, and we fault the necessary brevity of our original article in not being able to go into enough detail to satisfy your valid concerns. So we would recommend reading the following article, as it goes into much more depth about how bias is overcome when you start judgmentally, and it backs up its explanations empirically: Predictive Ranking: Technology-Assisted Review Designed for the Real World.
Imagine your TAR algorithm as a seesaw. That seesaw has to be balanced, right? So you have many in the industry saying that the only way to balance it is to randomly select documents along the length of that seesaw. In that manner, you’ll approximately have the same number of docs, at the same distance from the center, on both sides of the seesaw. And the seesaw will therefore be balanced. Judgmental sampling, on the other hand, is like plopping someone down on the far end of the seesaw. That entire side sinks down, and raises the other side high into the air, throwing off the balance. Well, in that case, the best way to balance the seesaw again is to explicitly plop down another equal weight on the exact opposite end of the seesaw, bringing the entire system to equilibrium.
What we’ve designed in the Catalyst system is an algorithm that we call “contextual diversity.” “Contextual” refers to where things have already been plopped down on that seesaw. The “diversity” means “that area of the collection that is most about the things that you know that you know nothing about,” i.e. that exact opposite end of the seesaw, rather than some arbitrary, random point. Catalyst’s contextual diversity algorithm explicitly finds and models those balancing points, and surfaces those to your human judge(s) for coding. In this manner, you can both start judgmentally *and* overcome bias. We apologize that this was not as clear in the original 5 Myths article, but we hope that this explanation helps.
We go into this subject in more detail here.
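To translate the seesaw analogy into something more concrete, here is a toy sketch of the selection idea, with our own made-up names and a deliberately crude similarity measure (Jaccard overlap on words), not Catalyst’s actual contextual diversity algorithm. Rather than picking a random document, it picks the unjudged document least similar to everything already judged: the far end of the seesaw.

```python
# Illustrative sketch (the names and similarity measure are our own, not
# Catalyst's): "contextual diversity" selects the unjudged document least
# similar to everything already judged, instead of a random document.

def jaccard(a, b):
    """Crude word-overlap similarity between two documents."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def contextual_diversity_pick(judged, unjudged):
    """Return the unjudged doc most dissimilar to all judged docs."""
    return min(unjudged,
               key=lambda d: max((jaccard(d, j) for j in judged), default=0.0))
```

If all your judged documents came from engineering custodians, a marketing document with disjoint vocabulary scores near zero similarity and gets surfaced first, which is exactly the balancing behavior the seesaw analogy describes.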
4. You Can’t Start Training until You Have All of Your Documents
One of the toughest issues in TAR systems is the requirement that you collect all of your documents before you start TAR training. This limitation stems from the use of a randomly selected control set to both guide training and provide defensibility. If you add new documents to the mix (rolling uploads), they will not be represented in the control set. Thus, even if you continue training with some of these new documents, your control set becomes invalid and you lose defensibility.
You might have missed that point in your comments:
I think this is similar to #1 in that you are not recognizing the true criticism that things change too much in litigation. While you can start training whenever you want and there are algorithms that will allow you to conduct new rounds on top of old rounds – the real problem is that you must go back and change previous coding decisions because the nature of the case has changed. To me, this is more akin to “continuous nonproductivity” than “continuous learning.”
The way in which we natively handle rolling uploads from a defensibility standpoint is to not rely on a human-judged control set. There are other intelligent metrics we use to monitor the progress of training, so we do not abandon the need for reference, or our defensibility, altogether – just the need for expensive, human-judged reference.
The way other systems have to work, in order to keep their control set valid, is to judge another statistically valid sample of documents from the newly arrived set. And in our experience, in the cases we’ve dealt with over the past five years, there have been on average around 67 separate uploads until the collection was complete. Let’s be conservative and assume you’re dealing with a third of that – say only 20 separate uploads from start to finish. As each new upload arrives, you’re judging 500 randomly selected documents just to create a control set. 500 * 20 = 10,000. And let’s suppose your senior attorney gets through 50 documents an hour. That’s 200 hours of work just to create a defensible control set, with not even a single training document yet judged. And since you’ve already stated that you need to hire an AmLaw 200 senior attorney to judge these documents, at $400/hour that would be $80,000. Our approach saves you that money right off the bat by being able to natively handle the control set/rolling upload issue. Plug in your own numbers if you don’t like these, but our guess is that it’ll still add up to a significant savings.
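The arithmetic above is simple enough to check, and it is worth plugging in your own numbers. This small script just restates the illustrative figures from the text (20 uploads, a 500-document control sample per upload, 50 documents reviewed per hour, $400/hour):

```python
# Back-of-the-envelope control-set cost from the text, using the same
# illustrative assumptions: 20 rolling uploads, a 500-document control
# sample per upload, 50 docs/hour review speed, a $400/hour attorney.

uploads, sample_per_upload = 20, 500
docs_per_hour, hourly_rate = 50, 400

control_docs = uploads * sample_per_upload   # 10,000 control-set documents
hours = control_docs / docs_per_hour         # 200 hours of review
cost = hours * hourly_rate                   # $80,000 before any training

print(f"{control_docs} docs, {hours:.0f} hours, ${cost:,.0f}")
```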
But the control set is only half of the story. The other half is the training itself. Let us distinguish, if we may, between an issue that changes and a collection that changes. If it is your issue itself (i.e., your definition of responsiveness) that changes when new documents are collected, then certainly nothing we’ve said in these Five Myths will address that problem. However, if all you are talking about is the changing expression of an unchanging issue, then we are good to go.
What do we mean by the changing expression of an unchanging issue? We mean that if you’ve collected from your engineering custodians first, and started to train the system on those documents, and then suddenly a bunch of marketing custodians arrive, that doesn’t actually change the issue that you’re looking for. What responsiveness was about before is still what responsiveness is about now. However, how that issue is expressed will change. The language that the marketers use is very different than the language that the engineers use, even if they’re talking about the same responsive “aboutness.”
This is exactly why training is a problem for the standard TAR 1.0 workflow. If you’re working in a way that requires your expert to judge all the documents up front, then if the collection grows (by adding the marketing documents to the engineering collection), that expert’s work is not really applicable to the new information and you have to go back to the drawing board, selecting another set of random documents so as to avoid bias, feed those yet again to a busy, time-pressed expert, etc. That is extremely inefficient.
What we do with our continuous learning is once again employ that “contextual diversity” algorithm that we mentioned above. Let us return to the seesaw analogy. Imagine that you’ve got your seesaw, and through the training that you’ve done it is now completely balanced. Now, a new subset of (marketing) documents appears; that is like adding a third plank to the original seesaw. Clearly what happens is that now things are unbalanced again. The two existing planks sink down to the ground, and that third plank shoots up into the air. So how do we solve for that imbalance, without wasting the effort that has gone into understanding the first two planks? Again, we use our contextual diversity algorithm to find the most effective balance point, in the most efficient, direct (aka non-random) manner possible.
Contextual diversity cares neither why nor how the training over a collection of documents is imbalanced. It simply detects the most effective points that, once pressure is applied to those points, rebalance the system. It does not matter if the seesaw started with two planks and then suddenly grew a third via rolling uploads, or if the seesaw started with three planks, and someone’s judgmental sampling only hit two of those planks. In both cases, there is imbalance, and in both cases, the algorithm explicitly models and corrects for that imbalance.
You can read more about this topic here.
5. TAR Does Not Work for Non-English Documents
Many people have now realized that, properly done, TAR can work for other languages including the challenging CJK (Chinese, Japanese and Korean) languages. As we explained in the article, TAR is a “mathematical process that ranks documents based on word frequency. It has no idea what the words mean.”
Mr. Frazer seems to agree but is pitching a different kind of system for TAR:
Words are the weapons of lawyers so why in the world would you use a technology that does not know what they mean? TAR & PC are, IMHO, roads of diversion (perhaps destruction in an outlier case) for the true litigator. They are born out of the need to reduce data, rather than know what is in a large data set. They ignore a far better system is one that empowers the subject matter experts, the true litigators, and even the review team to use their intelligence, experience, and unique skills to find what they need, quickly and efficiently, regardless of how big the data is. A system is needed to understand the words in documents, and read them as a human being would.
There are a lot of points we could make in response to this, but this post is lengthy enough as it is. So let us briefly make just two. The first is that we think Natural Language Processing techniques (which apparently Mr. Frazer’s company uses) are great. There is a lot of value there. And frankly, we think NLP techniques complement, rather than oppose, the more purely statistical techniques.
That said, our second point is simply to note that in some of the cases that we’ve dealt with here at Catalyst, we have datasets in which over 85% of the documents are computer source code. Where there is no natural language, there can be no NLP. And yet TAR still has to be able to handle those documents as well. So perhaps we should extend Myth #5 to say that it’s a myth that “TAR Does Not Work for Non-Human Language Documents.”
In writing the Five Myths of TAR, our point wasn’t to claim that Catalyst has the only way to address the practical limitations of early TAR systems. To the contrary, there are many approaches to technology-assisted review which a prospective user will want to consider, and some are more cost and time effective than others. Rather, our goal was to dispel certain myths that limit the utility of TAR and let people know that there are practical answers to early TAR limitations. Debating which of those answers works best should be the subject of many of these discussions. We enjoy the debate and try to learn from others as we go along.