I recently got a distress call from an e-discovery partner of ours with an unhappy client. “It seems like there is something wrong with your predictive ranking technology,” our partner said on the Google Hangout. “It’s proposing that the client team review too many documents–more than we got with key word searching. Our client is upset. We need to do something to explain this fast.”
In this case the client team had not used technology assisted review (TAR) before; this was their first try at the process. They wanted proof that it was worth the extra cost for the technology. Specifically, they wanted to see whether it actually cut down on review costs, like everyone claimed.
The problem was that the system didn’t seem to work–at least in their eyes. They had started the review process by running a series of key word searches, which was their normal practice. The searches hit on a total of about 11,000 documents out of a collection of just over 51,000. This suggested they had only 11,000 documents to review for the production, a bit over 20% of the total collected.
Our partner had recommended that the client try our Predictive Ranking technology as a better means to find responsive documents. Everyone’s initial expectation was that doing so would reduce the review population even further than the 11,000 documents that the key word searches hit. Somehow they got the impression that the review population might go down to more like 7,000. That would certainly justify the extra expense of this new technology.
Unfortunately, the opposite turned out to be the case. Instead of recommending that the team review 7,000 documents, or even 11,000 documents, our system suggested reviewing more than 18,500 documents.
You can imagine the client’s consternation. “You want us to pay you for these results? You just increased rather than reduced my review costs. I like it better the old way.”
It was time to go to work to figure out what happened. Fortunately, our team had analyzed the key word search results and compared them to the documents identified through Predictive Ranking. My job was to explain the difference between the two approaches. My hope was to show that the system worked well and provided a better outcome than key word search–at least if the goal was to identify and review potentially relevant documents.
To be sure, our Predictive Ranking system came up with more documents to review than did the key word searches. However, we quickly concluded that the key word searches–while finding many potentially responsive documents–missed a lot of others that should be considered as well. Here is how we reached that conclusion.
Let me start by giving you some basic information about the two processes. From there it becomes a bit easier to explain the difference in the results. You can then see how we got to our ultimate conclusion.
The client collected just over 51,000 documents for this production. As a first step, counsel created a set of key word searches and asked our partner to run them using our PowerSearch utility. As I mentioned earlier, the searches hit on about 11,000 documents.
We then started the Predictive Ranking process, which is the name we started using years ago for our TAR methods. The first step was to take an initial random sample for reference purposes and to estimate the overall richness of the population. We came up with an estimate of 22%, which would suggest that there were about 11,000 relevant documents in the collection.
Hmmm. That was pretty close to the number of documents found through the key word searches. Did they nail it this time?
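The arithmetic behind a richness estimate like this is simple enough to sketch. Below is a minimal Python illustration of how a random-sample richness figure projects onto the full collection. The sample size of 500 and the sample hit count are hypothetical stand-ins, not figures from the actual engagement; only the population size and the 22% estimate come from the story.

```python
# Sketch: projecting population richness from a random sample.
# Sample figures below are hypothetical; the population size and
# 22% richness estimate come from the matter described above.

population_size = 51_534

# Suppose reviewers tagged 110 of 500 randomly sampled documents
# as responsive -- a 22% sample richness.
sample_size = 500
sample_responsive = 110

richness = sample_responsive / sample_size            # 0.22
estimated_responsive = round(richness * population_size)

print(f"Estimated richness: {richness:.0%}")               # 22%
print(f"Estimated responsive docs: {estimated_responsive:,}")  # 11,337
```

With these hypothetical sample figures, the projection comes to roughly 11,300 documents, consistent with the “about 11,000” estimate above. (A real estimate would also carry a margin of error that depends on the sample size.)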
We next worked with the client’s legal experts to review and tag seed documents and then to undertake several rounds of system training. At the end of the process, the system suggested a review cutoff at the 36% mark. That meant that the review team would look at the top 36% of the ranked documents and ignore (after a confirmatory sample) the remaining 64%.
The resulting numbers looked like this:
- Likely responsive and need review: 18,552
- Likely non-responsive and don’t need review: 32,982
Our sampling also suggested that the top documents above the cutoff had a richness of about 50% (which meant half of these were likely responsive). The documents below the cutoff had a richness of about 7% (which meant that only 7 out of 100 were likely responsive). Seven percent seemed like a good number for the discard pile, one that most courts would accept.
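Those figures can be checked against each other with a few lines of arithmetic. The sketch below restates the numbers from the story; the “implied recall” at the end is my own derived figure, not something reported by the system, and it assumes the 50% and 7% richness estimates hold exactly.

```python
# Sanity-check the cutoff figures described above. The counts and
# richness percentages come from the story; "implied recall" is a
# derived illustration, assuming those estimates are exact.

population = 51_534
above_cutoff = 18_552     # top 36%, slated for review
below_cutoff = 32_982     # discard pile (after confirmatory sampling)

assert above_cutoff + below_cutoff == population

est_responsive_above = round(0.50 * above_cutoff)   # ~50% richness
est_responsive_below = round(0.07 * below_cutoff)   # ~7% richness

implied_recall = est_responsive_above / (est_responsive_above + est_responsive_below)

print(f"Responsive above cutoff: ~{est_responsive_above:,}")   # ~9,276
print(f"Responsive left behind:  ~{est_responsive_below:,}")   # ~2,309
print(f"Implied recall: {implied_recall:.0%}")                 # ~80%
```

On these assumptions, reviewing only the top 36% would still capture roughly 80% of the responsive documents, with the rest sitting in the 7%-rich discard pile.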
As mentioned earlier, our results caused heartburn for our partner and its client. The key word search approach seemed to require that the team review only 11,318 documents. Why pay for Predictive Ranking if it requires that the team review 7,000 additional documents? That’s about 60% higher than the key word results.
Understanding the Numbers
The answer requires that we better understand what all these figures mean. Unless we can compare apples to apples, we have no way to judge the efficacy of the two approaches. Fortunately we had an easy way to do just that.
We compared the document IDs for the files returned from the key word searches with the files returned from our Predictive Ranking process. What we found was pretty interesting. Let me show you with this simple diagram:
I created this diagram to map the comparative results of our Predictive Ranking and the key word searches. The circle represents the total documents in the population. If you add up the numbers in the four quadrants, it comes to the 51,534 files at issue.
The four quadrants represent the different states of the document population.
- The top left quadrant represents documents that our Predictive Ranking system deemed likely responsive but that were not returned by the key word searches. There were 11,285 documents in this category.
- The top right quadrant represents documents that hit under both approaches. These documents were returned by the key word searches and were also designated by our Predictive Ranking system as potentially responsive. There were 7,267 documents in this category.
- The bottom left quadrant represents documents that did not hit under either approach. Neither our Predictive Ranking system nor the key word searches deemed them likely responsive. There were 28,931 documents in this category.
- The bottom right quadrant represents documents that were returned from the key word searches but were not deemed likely responsive by our Predictive Ranking system. There were 4,051 documents in this category.
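The comparison itself is a straightforward set operation on document IDs. Here is a minimal sketch of that cross-tabulation; the IDs are made up stand-ins for the real 51,534-document collection, and only the logic mirrors the analysis.

```python
# Sketch of the quadrant comparison: cross-tabulate two sets of
# document IDs. The IDs are hypothetical; the logic is the point.

all_docs = set(range(1, 21))                    # 20 hypothetical documents
keyword_hits = {1, 2, 3, 4, 5, 6, 7, 8}         # returned by key word searches
predicted_responsive = {3, 4, 5, 6, 7, 9, 10, 11, 12}  # above the ranking cutoff

top_left = predicted_responsive - keyword_hits      # ranked responsive, no keyword hit
top_right = predicted_responsive & keyword_hits     # both approaches say responsive
bottom_left = all_docs - (predicted_responsive | keyword_hits)  # neither says responsive
bottom_right = keyword_hits - predicted_responsive  # keyword hit, ranked non-responsive

# The four quadrants partition the population exactly.
assert len(top_left) + len(top_right) + len(bottom_left) + len(bottom_right) == len(all_docs)

print(f"Ranked only:  {len(top_left)}")    # 4
print(f"Both:         {len(top_right)}")   # 5
print(f"Neither:      {len(bottom_left)}") # 8
print(f"Keyword only: {len(bottom_right)}")# 3
```

Run against the real ID lists, the four counts would be the 11,285 / 7,267 / 28,931 / 4,051 figures shown in the quadrants above.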
So, what can we say about all this? First, we can say that the team should probably review the 7,267 documents found in the top right quadrant. Both approaches tagged them as likely responsive. That does not mean that they will all be responsive but it is a good bet that a lot of them are.
Second, we can suggest that the 28,931 documents in the bottom left quadrant include few responsive documents. Neither the key word searches nor our Predictive Ranking system hit on these documents. There is still a need for confirmatory sampling but we can be pretty sure that there are not a lot of responsive documents hiding in this quadrant.
The Two Key Quadrants
That leaves us with two quadrants to consider and this is where we find the answer to our puzzle. Together, these two quadrants represent about 15,000 documents. Here is what we can say about each:
- The top left quadrant represents 11,285 documents that our Predictive Ranking system found likely responsive. The key word searches provide no information about these documents other than that the searches did not return them.
- The bottom right quadrant represents 4,051 documents that hit on counsel’s keyword searches but our Predictive Ranking system found to be likely non-responsive.
If counsel only reviewed documents that returned from the key word searches, they would be ignoring the 11,285 documents identified in the top left quadrant. Many of them had already been tagged during the Predictive Ranking training session and thus we knew that there were responsive documents in this quadrant. Our richness estimate went so far as to suggest that 50% of them were likely responsive, which meant that counsel might be missing 5,000-6,000 responsive documents using their key word approach. It quickly became evident that counsel would have to at least test additional documents in this quadrant before dismissing them as not responsive.
Conversely, our Predictive Ranking system led us to question how many of the 4,051 documents in the lower right quadrant were responsive. In fact, we knew from training that many of the documents in that quadrant were not responsive. At the least, that is what the reviewers concluded when they addressed them during the sampling.
We suggested that the client test the documents in this quadrant before engaging in review. Our suspicion was that these were false hits from the key word searches and not likely of interest. Our estimate was that they would find a richness of about 7%.
Answering the Question
By now you have already figured out how to respond to the client’s concerns. Simply put, the key word searches–while effective at finding some of the potentially responsive documents–missed a lot of others that should be reviewed. The Predictive Ranking system found many of the documents returned from the key word searches but it also found a lot of other potentially responsive documents. The total numbers were higher but there was good reason for that outcome. There were more documents that needed to be reviewed.
Put another way, search has two qualitative measures: precision and recall. Precision is a measure of the number of true hits (actually responsive documents) returned from your search compared to the total number returned. Recall is a measure of the total true hits returned from your search against the actual number of true hits in the population.
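As a rough illustration using the numbers from this matter, we can treat the Predictive Ranking designations as a stand-in for ground truth (a loose proxy, since true responsiveness is only known after review) and compute what the key word searches would score:

```python
# Precision and recall for the key word searches, using the quadrant
# counts above and treating the Predictive Ranking designations as a
# proxy for ground truth. Real scores would need reviewed labels.

keyword_true_hits = 7_267       # keyword hits also ranked likely responsive
keyword_total_hits = 11_318     # all keyword hits (7,267 + 4,051)
all_likely_responsive = 18_552  # documents above the ranking cutoff

precision = keyword_true_hits / keyword_total_hits     # true hits / returned
recall = keyword_true_hits / all_likely_responsive     # true hits / all responsive

print(f"Keyword precision (proxy): {precision:.0%}")   # ~64%
print(f"Keyword recall (proxy):    {recall:.0%}")      # ~39%
```

On this proxy measure, roughly two out of three keyword hits were likely responsive, but the searches surfaced well under half of the likely responsive population.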
In our case, the key word searches may have been good on precision (assuming that the documents in the top right quadrant were, in fact, responsive). However, they seemed to miss the boat on recall. The searches missed a lot of the other responsive documents. That is not a good thing if your opponent chooses to challenge your production in court.
My explanation proved helpful to our partner and the client team. They moved forward with their review using the documents ranked by our Predictive Ranking system. Sure enough, the key word searches had missed a lot of responsive documents, and many of the documents they returned in the lower right quadrant were false hits. The explanation and diagram helped to clear up the mystery. I thought it might be helpful to others as well as they grapple with the mysteries of technology assisted review.
There may also be a moral to this story, so to speak. Discussion of technology assisted review often focuses on its ability to reduce document populations. But review is not just a numbers game–it’s also about getting it right. It does neither lawyers nor their clients any good to cut document populations if they are cutting a large number of potentially responsive documents in the process. As my story above illustrates, fewer is not always better. Here, Predictive Ranking proved itself superior to key word searching at getting it right. That may have saved counsel some grief further down the road.