Does Technology-Assisted Review Help in Reviewing Productions?

While much has been written about using technology-assisted review (TAR) to help make productions, far less has been written about using TAR to review a production you have received. (Earlier this month, Ralph Losey published a post on this subject, “Why a Receiving Party Would Want to Use Predictive Coding?”)

We recently had a chance to work with our partner DSi (formerly Document Solutions, Inc.) and its law firm client on a case that involved this very issue. In this post, I will explain the work we did and what we learned in the bargain.

Overview

Our client received about 960,000 documents from defense counsel in several recent document productions. The first production had some 750,000 documents, with the remaining documents added in subsequent uploads.

The documents were primarily in image format with associated text. There were native files for spreadsheets as well.

Because depositions were looming, the client asked whether our Predictive Ranking software, Insight Predict, might help the team in identifying Hot or Responsive documents for use in the depositions and, ultimately, in preparing for trial.

General Ranking Procedures

We started by loading the production documents into Insight Predict for analysis and ranking. By then, the team had already reviewed roughly 3,000 documents. We used these documents as initial seeds for the ranking algorithm and ranked the entire population.

We then took the top 2,000 ranked (but un-reviewed) documents and foldered them for the team to review. After that review, we used those documents as additional seeds for further ranking. We repeated this process eight times and had the team review about 16,000 top-ranked documents.
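The rank, review, reseed cycle described above can be sketched in a few lines. This is a minimal illustration only: the term-weighting model is a hypothetical toy, not the algorithm inside Insight Predict, and the names `train_weights`, `rank_unreviewed` and `iterative_review` are made up for the example. The defaults mirror the figures in this post (batches of 2,000, eight rounds).

```python
from collections import Counter
from typing import Callable, Container, Dict, List, Set

# Hypothetical toy model, NOT Insight Predict's algorithm: each term gains +1 for
# every positive (Hot/Responsive) seed it appears in and -1 for every negative seed.
def train_weights(seeds: Dict[str, bool], docs: Dict[str, Set[str]]) -> Counter:
    weights: Counter = Counter()
    for doc_id, is_positive in seeds.items():
        for term in docs[doc_id]:
            weights[term] += 1 if is_positive else -1
    return weights

def rank_unreviewed(weights: Counter, docs: Dict[str, Set[str]],
                    reviewed: Container[str]) -> List[str]:
    # Score = sum of term weights; ties broken by document id so output is deterministic.
    scored = [(sum(weights[t] for t in terms), doc_id)
              for doc_id, terms in docs.items() if doc_id not in reviewed]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

def iterative_review(docs: Dict[str, Set[str]], initial_seeds: Dict[str, bool],
                     review: Callable[[str], bool],
                     batch_size: int = 2000, rounds: int = 8) -> Dict[str, bool]:
    """Rank, send the top unreviewed batch to reviewers, add their calls as seeds, repeat."""
    seeds = dict(initial_seeds)
    for _ in range(rounds):
        weights = train_weights(seeds, docs)
        batch = rank_unreviewed(weights, docs, seeds)[:batch_size]
        if not batch:  # everything has been reviewed
            break
        for doc_id in batch:
            seeds[doc_id] = review(doc_id)  # reviewer's call becomes a new seed
    return seeds
```

Here `review` stands in for the attorney reviewers; any real TAR engine would replace the toy scoring with its own model while keeping the same loop shape.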

Our initial assignment was to provide Hot documents for the team to use in the depositions themselves. The results of that work were documented in my earlier blog post, “Does Technology-Assisted Review Work for ‘Hot’ Documents?” This post addresses a different issue: the use of TAR to review received productions.

Comparing Predict to Key Word Searches Run on Productions

During the course of our work, we learned that counsel had run a series of key word searches to identify 180,000 potentially hot or responsive documents. The review team went through them in linear fashion and tagged each of them as Hot, Responsive or some variant of Not Responsive.

Given that we had already developed document rankings based on the seed set review, this gave us an excellent chance to compare the efficacy of Predict to the key word searches counsel had used. The question we asked was:

“What would have been the benefit to the attorney reviewers if they had reviewed the production documents in the order provided by Insight Predict?”

By that we mean, would Predict have brought Responsive and Hot documents to the reviewers more quickly than a linear review of the key word search results?

To answer that question, we started with the pool of about 16,000 seeds created during our ranking process. Using only those seeds, we ranked the document collection and then analyzed the ranking against the approximately 180,000 documents retrieved by the key word searches run by counsel. We then plotted a yield curve to show how the documents would have been ranked using Predict.
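Analyzing a ranking "against" the keyword-search results amounts to taking the ranking of the whole collection, keeping only the documents the keyword searches retrieved, and reading off the reviewers' tags in that order. A minimal sketch, assuming every keyword hit already carries a review tag (as it did here); `project_ranking` is a hypothetical helper name:

```python
from typing import Dict, Iterable, List

def project_ranking(full_ranking: List[str], subset: Iterable[str],
                    judgments: Dict[str, str]) -> List[str]:
    """Restrict a ranking of the whole collection to `subset`, keeping rank order.

    Returns the reviewers' tags (e.g. "Hot", "Responsive", "Not Responsive") in the
    order the ranking would have presented them. Assumes every document in `subset`
    already has a judgment.
    """
    members = set(subset)
    return [judgments[doc_id] for doc_id in full_ranking if doc_id in members]
```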

A yield curve (similar to an ROC curve) is a standard analytics tool for visualizing the results of a ranking. The X axis shows the number of documents reviewed, with the documents ordered by their ranking, from most likely to least likely to be Hot or Responsive.

The Y axis shows the percentage of Hot or Responsive documents found as the team moves through the ranking. Each document reviewed moves the curve one step to the right; if that document is, in fact, Hot or Responsive, the curve also moves upward. If it is not, the curve moves only to the right.

The goal is to have the curve move steeply upward. That means the ranking was successful and presented a lot of the Hot or Responsive documents at the beginning. A flatter line would indicate that the ranking was not successful. A random ordering would provide a straight line going from 0 on the left to 100% of the Hot or Responsive documents at the end of the review.
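A yield curve of this kind can be computed directly from the reviewers' calls listed in rank order. The sketch below is a minimal illustration, assuming each document has a single True/False judgment (Hot or Responsive versus not); the names `yield_curve` and `random_baseline` are hypothetical and not part of Insight Predict.

```python
from typing import List, Sequence, Tuple

def yield_curve(labels: Sequence[bool]) -> List[Tuple[int, float]]:
    """(documents reviewed, fraction of relevant documents found), in rank order."""
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("no relevant documents to measure yield against")
    points = [(0, 0.0)]
    found = 0
    for reviewed, is_relevant in enumerate(labels, start=1):
        found += int(is_relevant)  # curve rises only when the document is relevant
        points.append((reviewed, found / total_relevant))
    return points

def random_baseline(n_docs: int) -> List[Tuple[int, float]]:
    """Expected yield of a random ordering: a straight line from 0% to 100%."""
    return [(k, k / n_docs) for k in range(n_docs + 1)]
```

Plotting both lists on the same axes gives the ranked curve and the diagonal random-order reference line.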

Here is how that curve looked:

[Figure: Yield curve for Hot documents]

The red curve shows the yield (ranking order) of the Hot documents in the population. This is a good result: the steep slope shows that the ranking concentrated Hot documents near the top. In contrast, the gray line shows how the review might go with a random ordering of the documents.

The blue line on the yield curve suggests that if we had used the Predict ranking to order the documents for review, the team would have found all, or substantially all, of the Hot documents after reviewing fewer than 40,000 documents (as opposed to reviewing approximately 180,000 based on the key word searches). That would have meant substantial savings in review costs.
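The size of that saving is simple arithmetic on the two review depths. A minimal sketch using only the figures above, measuring effort in documents reviewed; it assumes, for illustration, that review cost scales with document count, and `review_reduction` is a hypothetical helper:

```python
def review_reduction(baseline_docs: int, ranked_docs: int) -> float:
    """Fraction of review effort avoided by stopping at `ranked_docs` instead of `baseline_docs`."""
    return 1 - ranked_docs / baseline_docs

# Keyword-search review (about 180,000 docs) versus the ranked review (under 40,000 docs).
print(f"{review_reduction(180_000, 40_000):.1%}")
```

Because 40,000 is an upper bound ("fewer than 40,000"), the reduction would be at least about 78%.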

We did a similar analysis for Responsive documents. It came out like this:

[Figure: Yield curve for Responsive documents]

While the curve is a bit different at the higher end, it too suggests that the team would have found substantially all of the Responsive documents after reviewing the first 40,000 in the rankings.

Evaluating Future Review Performance

The team ultimately asked for input on how well Predict might work for review of the rest of the production documents. To answer that question, we analyzed how the Predict engine, using the seeds described above, fared against all known document judgments. The two graphs below show those results. The vertical Y axis shows the percentage of Responsive documents, while the horizontal X axis shows the document count.

The ranking is quite strong for the first 40% of the Responsive documents, and then begins to flatten out. We estimate that to reach 80% of the Responsive documents, the team would have to review approximately a third of the collection (331,702 documents). Of those, approximately 269,000 have not yet been reviewed.
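An estimate such as "80% of the Responsive documents within the top 331,702" (about 35% of the roughly 960,000-document collection) comes from reading the yield curve at a target recall. A minimal sketch of that lookup, assuming rank-ordered True/False judgments are available, which in practice holds only for documents that have actually been reviewed; `depth_for_recall` is a hypothetical name:

```python
from typing import Sequence

def depth_for_recall(labels: Sequence[bool], target: float) -> int:
    """Smallest number of top-ranked documents containing `target` share of relevant ones."""
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("no relevant documents in the ranking")
    if target <= 0:
        return 0
    found = 0
    for depth, is_relevant in enumerate(labels, start=1):
        found += int(is_relevant)
        if found / total_relevant >= target:  # recall reached at this depth
            return depth
    raise ValueError("target recall is not reachable")
```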

[Figure: Curve showing ranking of Hot documents]

The Responsive curve is not quite as strong but still similar in form and value.

[Figure: Curve showing ranking by relevance]

It could likely be improved by adding more Responsive seeds to the mix[1] and changing the focus accordingly.

In either case, we think Predict proved that it can be valuable in helping the team target Hot and Responsive documents and reduce review costs.

Conclusion

In discovery, legal teams often receive as many documents as they produce; in some cases they receive many more than they produce. While TAR is typically thought of as a tool to reduce review populations for producing parties, we think it can be just as valuable for receiving parties. After all, sending a request for production is one thing. Figuring out what to do with the electronic documents you receive is quite another.


[1] Our focus on this project was to find “Hot” documents and thus we did not use a lot of merely “Responsive” seeds for the rankings.


About John Tredennick

A nationally known trial lawyer and longtime litigation partner at Holland & Hart, John founded Catalyst in 2000. Over the past four decades he has written or edited eight books and countless articles on legal technology topics, including two American Bar Association best sellers on using computers in litigation, a book (supplemented annually) on deposition techniques and several other widely read books on legal analytics and technology. He served as Chair of the ABA’s Law Practice Section and edited its flagship magazine for six years. John’s legal and technology acumen has earned him numerous awards: he was named by the American Lawyer as one of the top six “E-Discovery Trailblazers,” named to the FastCase 50 as a legal visionary and named one of the “Top 100 Global Technology Leaders” by London Citytech magazine. He has also been named the Ernst & Young Entrepreneur of the Year for Technology in the Rocky Mountain Region, and Top Technology Entrepreneur by the Colorado Software and Internet Association. John regularly speaks on legal technology to audiences across the globe. In his spare time, you will find him competing on the national equestrian show jumping circuit or playing drums and singing in a classic rock jam band.