Ask Catalyst: How Did the F.B.I. Review All Those Emails So Fast?

[Editor’s note: This is another post in our “Ask Catalyst” series, in which we answer your questions about e-discovery search and review. To learn more and submit your own question, go here.]

We received this question:

On Sunday, F.B.I. Director James Comey announced that his agency had completed its review of 650,000 emails it had found just eight days earlier. How could the F.B.I. review 650,000 emails in just a week?

Today’s question is answered by Mark Noel, managing director of professional services.


It is important to recognize that any sort of document review is a funnel. Just because the F.B.I. collected 650,000 emails doesn’t mean that investigators are going to put eyeballs on 650,000 emails. After applying a series of culling techniques, the emails that needed human attention could easily have numbered in the thousands rather than the hundreds of thousands.

Consider your own inbox. How many emails come from mailing lists? How many are receipts? How many are automated notifications from social media or monitoring software? Those can usually be culled immediately with a few simple searches.
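To make that concrete, here is a minimal sketch of what "a few simple searches" might look like in code. The patterns, field names and sample messages are hypothetical and chosen only for illustration. A real review would tune its filters to the collection at hand, and nothing here describes the F.B.I.'s actual tooling.

```python
import re

# Hypothetical patterns for machine-generated mail (assumptions, not a standard list).
AUTOMATED_SENDER = re.compile(
    r"(no-?reply|notifications?|newsletter|receipts?)@", re.IGNORECASE
)
AUTOMATED_SUBJECT = re.compile(
    r"\b(receipt|unsubscribe|newsletter|order confirmation)\b", re.IGNORECASE
)


def is_automated(email: dict) -> bool:
    """Return True if the sender or subject looks machine-generated."""
    return bool(
        AUTOMATED_SENDER.search(email.get("from", ""))
        or AUTOMATED_SUBJECT.search(email.get("subject", ""))
    )


def cull_automated(emails: list) -> list:
    """Keep only the emails that do not match the automated-mail patterns."""
    return [e for e in emails if not is_automated(e)]


# Made-up sample messages for illustration only.
sample = [
    {"from": "no-reply@store.example", "subject": "Your receipt"},
    {"from": "newsletter@industry.example", "subject": "Weekly digest"},
    {"from": "colleague@firm.example", "subject": "Draft agreement"},
]
remaining = cull_automated(sample)
print(len(remaining))  # 1: only the message from a colleague survives the cull
```

Filters this simple will miss some automated mail and can occasionally catch a real message, which is why they are usually one stage in the funnel rather than the whole of it.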

On top of that, email systems create a lot of duplicates. Every message in your inbox is also sitting in someone else’s outbox. It’s likely that a large majority of emails to or from Hillary Clinton were already collected from Clinton’s servers and thus didn’t need to be reviewed again.
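One common way to spot duplicates is to compute a fingerprint (a hash) of each message and compare it against the fingerprints of messages already reviewed. The sketch below assumes each email is a simple dictionary with "from", "subject" and "body" fields. The choice of fields to hash and the helper names are illustrative assumptions, not a description of any particular e-discovery product.

```python
import hashlib


def fingerprint(email: dict) -> str:
    """Hash the fields assumed to identify a message's content."""
    parts = [
        email.get("from", "").strip().lower(),
        email.get("subject", "").strip(),
        email.get("body", "").strip(),
    ]
    # A separator byte keeps adjacent fields from running together.
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


def new_to_review(collected: list, already_reviewed: list) -> list:
    """Return collected emails that match neither a reviewed email nor each other."""
    seen = {fingerprint(e) for e in already_reviewed}
    fresh = []
    for email in collected:
        fp = fingerprint(email)
        if fp not in seen:
            seen.add(fp)
            fresh.append(email)
    return fresh


# Made-up data: one message was reviewed earlier, one appears twice in the new set.
already_reviewed = [
    {"from": "Sender@Example.org", "subject": "Schedule", "body": "See attached."},
]
collected = [
    {"from": "sender@example.org", "subject": "Schedule", "body": "See attached."},
    {"from": "aide@example.org", "subject": "Call notes", "body": "Summary below."},
    {"from": "aide@example.org", "subject": "Call notes", "body": "Summary below."},
]
fresh = new_to_review(collected, already_reviewed)
print(len(fresh))  # 1: three collected messages shrink to a single new one
```

Exact hashing only catches identical copies. Messages that differ by a signature block or a forwarded header need near-duplicate techniques, which this sketch does not attempt.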

A report today in The New York Times confirms exactly this point. The F.B.I. was able to exclude most of the 650,000 emails as irrelevant. Of those that remained, a large proportion were duplicates of emails the F.B.I. had previously reviewed. The remaining documents that required human review numbered in the thousands rather than the hundreds of thousands.

The fact of the matter is that quick turnarounds such as this have become routine in e-discovery. An antitrust request might easily have 20 million documents to review and 30 days to comply. That’s close to 670,000 documents a day, compared with the roughly 81,000 documents a day that the F.B.I. managed here. And remember, most of those documents are culled because they are duplicates or can be positively identified as irrelevant. Typically, only a few percent will end up needing actual human review.
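For anyone who wants to check the arithmetic, the short sketch below works out the per-day rates from the figures above. The 3 percent review share is a hypothetical stand-in for "a few percent," used only to show the scale of what is left for human reviewers.

```python
def docs_per_day(total_documents: int, days: int) -> float:
    """Average number of documents that must be processed each day."""
    return total_documents / days


def docs_needing_review(total_documents: int, percent: int) -> int:
    """Documents left for human review if only `percent` percent survive culling."""
    return total_documents * percent // 100


antitrust_rate = docs_per_day(20_000_000, 30)
fbi_rate = docs_per_day(650_000, 8)

print(round(antitrust_rate))  # 666667 documents a day for the antitrust example
print(round(fbi_rate))        # 81250 documents a day for the F.B.I. review

# Hypothetical: if 3 percent of the antitrust collection needs human eyes.
print(docs_needing_review(20_000_000, 3))  # 600000
```

By this measure, the antitrust example runs at roughly eight times the daily pace of the F.B.I. review.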

Practically speaking, whether you are talking about F.B.I. investigators or e-discovery attorneys, it makes no sense to put human eyes on every email that is collected. It simply isn’t necessary. You don’t waste reviewers’ time by having them review iTunes receipts or industry newsletters. You don’t have them re-review identical copies of documents they’ve already seen. You cull all that stuff out before the actual review.

The New York Times article noted that a top Trump adviser called it “IMPOSSIBLE” that the F.B.I. could have “thoroughly reviewed 650,000 emails in 8 days.” This comment reflects a broad misunderstanding: the assumption that an agent or attorney looks at every single email that is collected. That brute-force approach has not been practical for some time now. We’ve developed a lot of techniques to cull collected documents down to only those that actually need human review.

Also, people still tend to believe that putting human eyes on documents is the most accurate way to review. But human reviewers routinely make mistakes, and they even disagree with their own earlier judgments on the same documents about one time in four. The combinations of techniques now used in e-discovery can often match full human review for accuracy, while taking a lot less time.

Coincidentally, Law360 just published an article I coauthored with two of my Catalyst colleagues, Thomas C. Gricks III and Bayu Hardi, on the use of litigation analytics. The article discusses many of the tools that litigators use to cull a document collection and get to the heart of what it contains. If you’re interested in learning more, you can read that article here: The Role of Litigation Analytics in Your Case.