It is time to put an end to family batching, one of the most widespread document review practices in the e-discovery world and one of the worst possible workflows if you want to implement an efficient technology-assisted review (TAR) protocol. Simply put, it is nearly impossible for family batching to be as efficient as document-level coding in all but the most unusual of situations.
We set out to evaluate this relationship with real-world data and found document-level coding to be nearly 25 percent more efficient than family batching, even if you review and produce all of the members of responsive families.
Based on this obvious benefit, we developed a streamlined two-phase workflow to (1) take optimal advantage of a document-level TAR review, and (2) immediately pass only the responsive families to a full-fledged family review for production. Properly performed, this approach provides an opportunity to realize even greater savings in the review and production process.
There is just no reason to batch documents as families for review anymore. Family-batched reviews are less efficient and lead to the review of many, many unnecessary non-responsive documents. And, with Catalyst Insight and Predict, family members are readily available if they are needed to provide context in a document-level review. Compared to family batching, a properly planned and executed document-level TAR review will save time and money, with minimal, if any, inconvenience.
Family Batching is an Unnecessary Review Relic
Just what is family batching, and what led to the widespread adoption of family batching in the e-discovery realm? The answer lies in the longstanding tradition of requesting and producing documents attached to one another together as a set, and the shortcomings of early e-discovery tools.
For as long as I have been practicing law, document production requests required respondents to produce attached documents together, maintaining the attachment relationship as well as possible. In the paper days, the (often full page…) definition of “document” almost always included a provision that specifically defined a document to include all the attachments. And, as if that wasn’t enough, the instructions reiterated the obligation to produce attached documents together.
Those definitions and instructions continued as we moved into the e-discovery realm. So it became the norm to review and produce even electronic documents together with all of their attachments, or family members.
Early e-discovery tools further perpetuated this approach, and spawned the practice of batching documents for review by family. Most of the early review tools did not have sophisticated analytics, and certainly did not have TAR capabilities. The best way to make the review more efficient was to present related electronic documents – i.e., document families – together. Batching documents to reviewers by family made it easier for lawyers to make production decisions about multiple documents at one time, which led to increases in review speed and efficiency.
There is really no need to continue batching documents by family, whether for efficiency or for production. Modern e-discovery tools include advanced analytics, such as TAR, as well as sophisticated batching techniques, such as email threading and near duplicates, that optimize review efficiency far beyond what could ever be achieved through family batching. And emerging jurisprudence has called into question the obligation to produce non-responsive family members merely because they are attached to a responsive document.
Perhaps the most superficially persuasive argument I have heard in support of family batching – that family review is necessary to provide context for responsiveness decisions – is really a red herring, especially as it relates to a TAR review. TAR tools operate on the features (typically words) of an individual document, not on the family relationships. For that reason, a document that says, “please see attached,” will likely never be responsive for the purposes of a TAR review regardless of how important the attachment may be. And, when it is important to see the family members to understand the true meaning of a document, superior TAR tools make those family members readily accessible without forcing family batching.
Family Batching is Simply Less Efficient for a TAR Review
It has always seemed obvious to me that family batching would necessarily be less efficient than document-level review in a TAR review. Every TAR tool on the market will suggest both responsive and non-responsive documents for review while the algorithm is being trained. In a document-level review, you only have to review the responsive and non-responsive documents suggested by the tool for training. When families are batched together, however, you also have to review all of the family members of wholly non-responsive families, which are documents you would not otherwise need to review. Reviewing those non-responsive family members necessarily makes the TAR review less efficient.
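To make the difference concrete, here is a minimal sketch of how the two batching modes change the number of documents a reviewer sees. The collection, the family identifiers, and the `docs_reviewed` helper are all hypothetical illustrations; they are not drawn from Predict or from the case data.

```python
def docs_reviewed(suggested, family_of, families, batch_by_family):
    """Count the distinct documents a reviewer sees for a list of tool suggestions."""
    seen = set()
    for doc in suggested:
        if batch_by_family:
            # Family batching pulls in every member of the suggested document's family.
            seen.update(families[family_of[doc]])
        else:
            # Document-level review presents only the suggested document itself.
            seen.add(doc)
    return len(seen)


# Hypothetical collection: three families of different sizes.
families = {
    "F1": ["email1", "att1a", "att1b"],
    "F2": ["email2", "att2a"],
    "F3": ["email3"],
}
family_of = {doc: fam for fam, docs in families.items() for doc in docs}

# Hypothetical documents suggested by the TAR tool during training.
suggested = ["att1a", "email2", "email3"]

doc_level = docs_reviewed(suggested, family_of, families, batch_by_family=False)
family_batched = docs_reviewed(suggested, family_of, families, batch_by_family=True)
print(doc_level, family_batched)
```

In this toy collection the same three suggestions cost three reviewed documents at the document level and six when batched by family. The extra three are family members the tool never asked anyone to look at.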
We conducted a post-hoc simulation of a family batched TAR review project to see just what the impact of family batching might be. In that case, the initial TAR review was conducted on a family batched basis and we used the final coding judgments to simulate both a family batched review and a document-level review from the same starting point.
The case we looked at had roughly 250,000 documents, with nearly 30,000 responsive documents (or a richness of about 12 percent). In the family batching simulation, it was necessary to review about 70,000 documents to achieve 80 percent recall. This obviously included every responsive and non-responsive document prioritized for review by Predict, as well as the family members of each of those documents. In the simulated document-level review, the same level of recall was achieved after reviewing only 36,000 documents – nearly a 50 percent improvement in efficiency. The gain curves for the family batched (red line) and document-level (blue line) reviews are shown below, and you can read more about the simulation here.
Review efficiency increases even if you consider it necessary to review and produce all of the family members of the responsive document set (again, because you are not reviewing non-responsive families). The chart below shows gain curves for both the family-batched (red) and document-level (blue) reviews. This simulation, however, incorporated the review of the family members of responsive families at every 10 percent recall increment. Those documents appear as the “branches” extending from the document-level gain curve.
In the document-level simulation, 80 percent recall was again achieved upon the review of roughly 36,000 documents. But there were about 17,000 family members of responsive documents that would also need to be reviewed, putting the total review at roughly 53,000 documents – which still represents a 24 percent increase in review efficiency over a family batched review. In addition, reviewing the family members would increase recall to 85 percent (meaning that a continuous, contemporaneous review of family members would achieve 80 percent recall at something less than 53,000 documents).
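The arithmetic behind these percentages can be checked with a short calculation. The sketch below uses only the rounded figures reported above; the `reduction` helper and the variable names are illustrative assumptions, and the results are approximate because the inputs are rounded.

```python
# Rounded figures reported for the simulation.
collection_size = 250_000
responsive_docs = 30_000
family_batched_total = 70_000       # documents reviewed to reach 80 percent recall
doc_level_total = 36_000            # documents reviewed to reach 80 percent recall
responsive_family_members = 17_000  # additional family members of responsive documents


def reduction(baseline, alternative):
    """Fraction of the baseline review volume avoided by the alternative."""
    return (baseline - alternative) / baseline


richness = responsive_docs / collection_size
doc_only_saving = reduction(family_batched_total, doc_level_total)
with_families_total = doc_level_total + responsive_family_members
with_families_saving = reduction(family_batched_total, with_families_total)

print(f"richness: {richness:.0%}")
print(f"document-level only: {doc_only_saving:.1%} fewer documents")
print(f"with responsive families: {with_families_total:,} documents, "
      f"{with_families_saving:.1%} fewer")
```

On these rounded inputs, the document-level review avoids a little under half of the family-batched volume, and still avoids roughly a quarter of it once the family members of responsive documents are added back in.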
Our simulation confirmed what I have always believed – that a document-level TAR review will typically be more efficient than a family batched review, because you will not be forced to review the family members of non-responsive families.
A More Efficient TAR Review
Based on my personal experience with an effective document-level TAR review, and given these simulation results, we developed a streamlined two-phase Predict review workflow that optimizes efficiency.
The first phase in the Predict review is a document-level review solely for the purpose of finding responsive documents. In fact, we often utilize a document coding panel that includes only the responsiveness field. The objective is to simply decide whether the document under review is responsive or not, based on the four corners of the document. As discussed above, family members are readily available through the coding panel if they are needed to provide meaning to the words in the document under review.
The reason I like this first phase approach is efficiency. When a reviewer is making a single decision, and that decision boils down to whether the document under review relates to the case or not, review rates can skyrocket. I have personally seen document review rates in excess of 250 documents per hour when a reviewer is focused solely on deciding whether documents are responsive or not. This is roughly five times the average rate for a typical multi-issue family review, which averages somewhere in the neighborhood of 45 to 60 documents per hour.
Once a document has been found to be responsive, that document and its family members are all passed to a second phase, traditional, production review. The entire responsive family can be reviewed for responsiveness, privilege, substantive issues, and the like. This phase of review proceeds at the typical multi-issue family review rate.
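The hand-off between the two phases can be sketched as follows. This is only an illustration of the routing logic described above, with hypothetical document and family identifiers; it does not represent how Predict or Insight implements the workflow.

```python
def phase_two_queue(coding, family_of, families):
    """Return every member of each family that has at least one responsive document.

    coding maps each document reviewed in phase one to True (responsive)
    or False (non-responsive).
    """
    responsive_families = {
        family_of[doc] for doc, is_responsive in coding.items() if is_responsive
    }
    queue = []
    for fam in sorted(responsive_families):
        queue.extend(families[fam])
    return queue


# Hypothetical collection and phase-one coding decisions.
families = {
    "F1": ["email1", "att1a", "att1b"],
    "F2": ["email2", "att2a"],
    "F3": ["email3"],
}
family_of = {doc: fam for fam, docs in families.items() for doc in docs}
coding = {"att1a": True, "email2": False, "email3": False}

queue = phase_two_queue(coding, family_of, families)
print(queue)
```

Only family F1 reaches the production review here. The unreviewed attachment in the non-responsive family F2 is never presented to a reviewer in either phase.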
The benefit of this approach is that you are quickly identifying and restricting your review to the true review set, i.e., only those documents that need to be reviewed for production. While you will certainly review non-responsive documents in the course of the TAR review, you will never review the family members of wholly non-responsive families. As our case simulation illustrates, this makes the review as a whole much more efficient.
The argument has been made that this two-phase approach cannot be more efficient than a family batched review because you will necessarily review the first responsive document in a family twice – once during the first phase responsiveness review, and again during the second phase family review. While there may be a combination of factors (e.g., review rates, family sizes, richness, etc.) where that is the case, given the review rates that can be achieved it would certainly not be the norm.
For example, assume the first phase review located 250 responsive documents. Based on industry studies, a continuous active learning tool like Predict might lead to the review of 250 non-responsive documents as well. The first phase review of all 500 documents would take two hours (at 250 docs per hour), and the second phase review of the 250 responsive documents would take another five hours (at 50 docs per hour), for a total of seven hours using the two-phase Predict review workflow. A typical comprehensive review of those same 500 documents would take 10 hours in a traditional production review.
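The same comparison can be written as a small calculation so that other review rates and document counts can be tried. The function names are hypothetical, the inputs are the figures from the example above, and, like the example, the sketch leaves out any additional family members reviewed in the second phase.

```python
def two_phase_hours(responsive, non_responsive, phase_one_rate, phase_two_rate):
    """Hours for a responsiveness-only pass plus a production review of responsive documents."""
    phase_one = (responsive + non_responsive) / phase_one_rate
    phase_two = responsive / phase_two_rate
    return phase_one + phase_two


def traditional_hours(total_docs, rate):
    """Hours for a single comprehensive review of every document."""
    return total_docs / rate


two_phase = two_phase_hours(
    responsive=250, non_responsive=250, phase_one_rate=250, phase_two_rate=50
)
traditional = traditional_hours(total_docs=500, rate=50)
print(two_phase, traditional)
```

With these inputs the two-phase workflow takes seven hours against ten for the traditional review, even though each responsive document is looked at twice. Lowering the first phase rate or raising richness narrows the gap, which is the combination of factors the objection depends on.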
The critical point of this two-phase review is that it gets away from family batching. Beyond that, it simply minimizes the total effort required to review the responsive families.
Just Say No to Family Batching in a TAR Review
The reason to avoid family batching in a TAR review is self-evident: you simply do not have to review the family members of wholly non-responsive families. Fewer documents to review means less time and greater efficiency. We have confirmed the increase in efficiency through simulation, whether you review only the responsive documents or you review their families as well. Either way, a document-level TAR review is more efficient than a family-batched TAR review. And, using innovative workflows, you can further optimize a document-level TAR review. But whatever you do, “just say no” when it comes to family batching.
See In re Takata Airbag Products Liability Litigation, No. 1:15-md-2599 (S.D. Fla. Mar. 1, 2016); In re Zoloft Prod. Liab. Litig., MDL No. 2342, 2013 WL 8445354, *4-5 (E.D. Pa. Oct. 31, 2013), adopted without objection, 2013 WL 8445280 (Nov. 19, 2013).