Many legal professionals continue to question whether technology-assisted review (TAR) is right for them. Perhaps you are a corporate counsel wondering whether TAR can actually reduce review costs. Or maybe you are a litigator unsure whether TAR is suitable for your case.
For anyone still uncertain about TAR, Catalyst is offering the TAR Challenge. Give us an actual case of yours in which you’ve completed a manual review, and we will run a simulation showing you how the review would have gone – and what savings you would have achieved – had you used Insight Predict, Catalyst’s award-winning TAR 2.0 platform.
Recently on this blog, I described one of these simulations. Today, I want to tell you about another.
In this case, the client asked Catalyst to simulate a TAR review of a collection that had previously been reviewed and coded in an actual litigation. The documents had been reviewed by the client’s own review team, which coded them for responsiveness (including identification of Hot documents), as well as for their relevance to 13 substantive issues.
The purpose of the simulation was to see whether and to what extent Predict could have enhanced the efficiency and effectiveness of the previous review. The simulation confirmed that using Predict would have enhanced the review in three significant ways:
- Predict would have reduced review effort by nearly 50% at the 80% recall level, thereby eliminating the need to review 259,589 documents.
- Predict would have located 90% of the Hot documents after review of only the first 25% of the collection.
- By the time the Predict review reached 80% recall, the vast majority of the documents relating to each of the 13 substantive issues had been identified.
For a detailed report of the simulation, download this PDF. What follows is a summary.
How We Conducted the Simulation
The client provided us with the full collection of the documents it had previously reviewed. The collection contained just under 700,000 documents, but after removing empty documents, documents that were not empty but had no meaningful extractable text, and documents for which there were no responsiveness judgments, there remained 521,669 documents. Of these, we knew from the prior review that 218,775 were responsive, for an overall collection richness of 41.9%.
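As a quick check on the figures above, richness is simply the responsive count divided by the size of the filtered collection. The short sketch below reproduces that arithmetic; the variable names are illustrative only, and the two counts are the ones reported in this post.

```python
# Counts reported in the post for the filtered collection.
collection_size = 521_669   # documents remaining after filtering
responsive = 218_775        # documents coded responsive in the prior review

# Richness (prevalence) is the fraction of the collection that is responsive.
richness = responsive / collection_size
print(f"Collection richness: {richness:.1%}")
```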
We did not know the precise order in which the client’s team originally reviewed the documents, so we assumed it was a traditional linear review of the full collection. A TAR review, however, does not require the entire collection to be reviewed; the number of documents that need to be reviewed typically depends upon the recall objective. For comparison purposes, we evaluated the review effort necessary to achieve 70%, 80% and 90% recall with Predict.
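To make the recall objectives concrete, the sketch below converts each target into the number of responsive documents that would have to be found, using the responsive count reported above. The function name `responsive_needed` is hypothetical, introduced here only for illustration.

```python
TOTAL_RESPONSIVE = 218_775  # responsive documents reported in the post

def responsive_needed(recall_pct: int, total: int = TOTAL_RESPONSIVE) -> int:
    """Responsive documents that must be found to reach recall_pct percent recall."""
    # Integer ceiling division avoids floating-point rounding at exact boundaries.
    return -(-recall_pct * total // 100)

for pct in (70, 80, 90):
    print(f"{pct}% recall: {responsive_needed(pct):,} responsive documents")
```

Note that these are counts of responsive documents to be found, not of documents to be reviewed; how many documents must be reviewed to find them depends on how well the ranking works.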
We then simulated a review of the client’s documents using Insight Predict’s continuous active learning (CAL) protocol. Specifically, we used Predict to continuously rank the documents by their likelihood of being responsive and to batch them in that order. For each batch, the previously tagged judgments served, in effect, as an artificial review team: we applied the prior judgments to the documents in the order in which Predict ranked them.
To begin, we selected a set of random documents to initialize training. Note that, while this approach is used for a simulation when we have no knowledge of the matter or the original starting point for the review, it is neither normal nor necessary when using Predict. Rather, we typically suggest that counsel use a keyword search or other search analytics to find just a handful of responsive documents to initiate training and begin the review/ranking process.
Once the initial seeds were selected, we applied the previous responsiveness judgments to these documents and submitted them to Predict to rank the entire collection. At this point in the simulation, those initial seeds were considered to have been seen and reviewed, and the balance of the collection was considered unseen. We then selected a batch of about 1,000 unseen documents from the top of this ranking, applied the previous judgments to these newly selected documents, and submitted all the seen documents back to the algorithm. The algorithm then re-ranked the entire collection and a new batch of unseen documents was selected.
We continued this re-ranking process until we had sufficient information to reach a reasoned conclusion on the effectiveness of Predict. This is consistent with actual document review projects, in which the review team would likely stop once it has determined that a desired percentage of responsive documents has been found (typically 80% recall in our experience).
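The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Predict's implementation: the ranker in `train` is a toy term-weighting stand-in, each document is assumed to be a set of terms, and the names `train` and `simulate_cal` and all parameters are hypothetical.

```python
import random
from collections import Counter

def train(seen_ids, docs, labels):
    """Toy stand-in ranker (an assumption, not Predict's algorithm): weight each
    term by how much more often it occurs in seen responsive documents than in
    seen non-responsive ones."""
    weights = Counter()
    for i in seen_ids:
        step = 1 if labels[i] else -1
        for term in docs[i]:
            weights[term] += step
    return weights

def simulate_cal(docs, labels, seed_size=10, batch_size=1000,
                 target_recall=0.8, seed=0):
    """Replay prior judgments in ranked order and return the simulated review order."""
    rng = random.Random(seed)
    total_responsive = sum(1 for label in labels if label)
    order = rng.sample(range(len(docs)), seed_size)  # random seeds start training
    seen = set(order)
    found = sum(1 for i in order if labels[i])
    while found < target_recall * total_responsive and len(seen) < len(docs):
        weights = train(order, docs, labels)          # retrain on all seen documents
        unseen = [i for i in range(len(docs)) if i not in seen]
        # Re-rank the unseen documents, highest score first.
        unseen.sort(key=lambda i: sum(weights[t] for t in docs[i]), reverse=True)
        batch = unseen[:batch_size]                   # take the top of the ranking
        order.extend(batch)
        seen.update(batch)
        found += sum(1 for i in batch if labels[i])   # apply the prior judgments
    return order
```

Calling `simulate_cal(docs, labels)` with a list of term sets and a parallel list of prior responsiveness judgments returns document indices in simulated review order; the length of that list is the review effort needed to reach the recall target.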
As we went through the simulation, we plotted the results on a gain curve demonstrating the order in which responsive and non-responsive documents would have been reviewed with Predict, to provide an easy comparison to a linear review. A gain curve provides a simple but effective visual means to compare the TAR results to those from a linear review because it shows the cumulative number of positive documents that would be found throughout the entire review.
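For readers who want to see how such a curve is built, the sketch below computes one from a review order and the prior judgments, and then reads off the review depth at which a recall target is reached. The function names and the tiny example data are hypothetical and serve only to illustrate the idea.

```python
from itertools import accumulate

def gain_curve(review_order, labels):
    """Cumulative number of responsive documents found after each document reviewed."""
    return list(accumulate(1 if labels[i] else 0 for i in review_order))

def depth_for_recall(curve, recall, total_responsive):
    """Smallest number of documents reviewed at which the recall target is met,
    or None if the curve never reaches it."""
    target = recall * total_responsive
    for depth, found in enumerate(curve, start=1):
        if found >= target:
            return depth
    return None

# Tiny illustrative example: ten documents, four of them responsive.
labels = [True, False, True, True, False, False, False, True, False, False]
curve = gain_curve(range(len(labels)), labels)
print(curve, depth_for_recall(curve, 0.75, total_responsive=4))
```

Plotting `curve` against review depth gives the gain curve; a steeper early rise means responsive documents are being found sooner than they would be in random order.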
Simulation #1: Responsiveness
The primary objective in this simulation was to measure Predict’s performance against the previous linear responsiveness review. As you can see from Table 1 below, the Predict review was demonstrably more efficient than the linear review at recall levels that, in our experience, are typically associated with document production in civil litigation. At 80% recall, a level that has been widely accepted in both state and federal courts, Predict was 49.8% more efficient than linear review, eliminating the need to review over 250,000 documents.
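The reported savings can be cross-checked with simple arithmetic. The sketch below assumes, as the methodology section does, that the baseline is a linear review of the full filtered collection; the figure of 259,589 documents avoided at 80% recall comes from the summary at the top of this post, and the variable names are illustrative.

```python
collection_size = 521_669   # filtered collection, reviewed in full under the linear baseline
docs_avoided = 259_589      # documents Predict would not have needed to review at 80% recall

# Documents the simulated Predict review would have covered to reach 80% recall.
docs_reviewed = collection_size - docs_avoided
# Savings expressed as a share of the full-collection baseline.
savings = docs_avoided / collection_size
print(f"Reviewed with Predict: {docs_reviewed:,}")
print(f"Reduction in review effort: {savings:.1%}")
```

Under that assumption, the two reported numbers agree: avoiding 259,589 of 521,669 documents corresponds to the 49.8% figure cited above.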
The gain curve in Figure 2 provides a more comprehensive picture of the comparison between the Predict review and the linear review. The diagonal black line represents a linear review, showing the rate at which one would expect to find responsive documents when reviewing in random order. The black line on the left represents a theoretical perfect review, in which all responsive documents are reviewed before a single non-responsive document. Finally, the blue line represents the results of our simulation, showing how many documents would have to be reviewed to achieve a responsive document count or recall (reading from the y-axis), had Predict been used. (The heavier dashed lines along the y-axis are at 70%, 80%, and 90% recall.)
Simulation #2: Hot Documents
As a corollary to the analysis of the responsiveness review, we evaluated the extent to which Predict would impact the identification and review of Hot documents. Out of a total of 218,775 responsive documents in the collection, the client’s review team had coded only 183 as Hot.
Figure 3 shows the results. The blue line represents the gain curve for the responsiveness review and the red line represents the gain curve for the Hot documents. The almost vertical rise in the Hot gain curve shows that the vast majority of Hot documents, roughly 85%, were located before even 20% of the collection was reviewed for responsiveness. And 99% of the Hot documents would have been located by the time the responsiveness recall reached the 80% level.
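A secondary curve of this kind can be derived from the same review order used for responsiveness. As a hedged sketch, the hypothetical helper below measures what fraction of a tagged subset (Hot documents here, though an issue code would work the same way) falls within the first `depth` documents reviewed; the example identifiers are made up.

```python
def recall_at_depth(review_order, tagged_ids, depth):
    """Fraction of tagged documents that appear within the first `depth` reviewed."""
    if not tagged_ids:
        return 0.0
    reviewed = set(review_order[:depth])
    return len(reviewed & set(tagged_ids)) / len(tagged_ids)

# Illustrative data only: five documents in review order, two of them tagged Hot.
review_order = [3, 1, 4, 0, 2]
hot_ids = {1, 4}
print(recall_at_depth(review_order, hot_ids, depth=2))
```

Evaluating this at the depth where the responsiveness review reaches 80% recall gives the kind of figure quoted above for Hot documents.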
This is important for two reasons. First, in the context of litigation, there is a benefit to locating the most important, or Hot, documents sooner rather than later, as they help to define strategy and shape the course of the litigation. Second, because a technology-assisted review typically concludes before the entire collection has been reviewed, there is always a possibility that some fraction of Hot documents may not be found. In this simulation that risk proved small: all but about 1% of the Hot documents would have been located by the time the review reached 80% recall.
Simulation #3: Issue Identification
As noted earlier, the client’s prior review had also coded the documents for each of 13 different substantive issues. Thus, as a second corollary to our analysis of the general responsiveness review, we evaluated the extent to which Predict might impact the identification and review of documents relating to each of these 13 issues.
The gain curves in Figure 4 show the results. By the time the responsiveness review reached a reasonable level of recall, each of the 13 substantive issues also reached a reasonable level of recall. The blue line again represents the responsiveness review, while the green lines each represent one of the 13 substantive issues. Looking at the 80% recall point on the responsiveness gain curve, you can see that more than 80% of the documents relating to all but three of the substantive issues have been located. For the other three, two are very near 80%, and the third is at slightly more than 70% – all reasonable levels of recall in the litigation context as well.
In summary, this simulation shows that, at 80% recall, Predict would have eliminated the need to review 259,589 documents from the collection, reducing review effort by nearly 50%. Using Predict would also have improved the identification of Hot documents, locating most of them before the first 25% of the collection had been reviewed, with very few remaining unseen by the time the review achieved 80% recall. At the same point, Predict would also have located the vast majority of the documents pertaining to the client’s 13 substantive issues.