E-Discovery, My How You’ve Grown!

I got an email the other day from Justin, our business development rep in Chicago. He was writing about a case we had been helping with for maybe a year now. It was an interesting matter but nothing out of the ordinary. I believe they have about 150,000 documents on the site.

Justin wrote to tell me that our client would be adding data to the site, also nothing out of the ordinary. But then he surprised me. “They expect to be adding another 28 million pages to the site,” he reported. That did get my attention.

“Did you mean 28,000 pages?” I wrote back, mostly just to kid him. “Perhaps that was a typo,” I continued.

“Nope,” he answered. “That was not a typo. Our partner says to expect another 28 million pages on the site.” I had to laugh when I thought about that volume. We have come a long way in this industry when 28 million pages isn’t all that unusual.

When Discovery Had No ‘E’ in Front

We started Catalyst in 2000, more than 10 years ago. At the time, there was no “e” in front of discovery. Digital mostly meant scanned images of the paper originals. Even the email that existed was printed out to paper and then scanned. I remember former partners of mine who would get a CD in production. The first thing they asked for was to print it out. Often we would stamp the pages and scan them back into the system.

I smile when I think about those days. Back then a “big case” had 30,000 documents in it. (Of course, some were bigger than that, but plenty were smaller as well.) When we were getting started, I took pride in the fact that we had a dedicated storage device called a Net Appliance Filer. It was an industrial-strength set of hard drives striped in a RAID 5 configuration. I confess to bragging about it because it really seemed special to me.

Most important–and this is what makes me smile–the device had a whole 600 gigabytes of usable space on it. We thought we might never need another one. Ever. Indeed, I used to look forward to the day when we had a million pages on our system. That seemed like a huge reach to me. The thought of reaching the million-document mark seemed even more formidable. You need a lot of cases at 30,000 documents each to reach a million. Thirty-three seemed like a lot of cases to me back then.

A Million Documents is Now Routine

Fast forward to 2011, when you rarely see the word discovery without an “e” in front of it. We buy SANs these days rather than network-attached storage, and they hold a lot more data. I believe our standard buying unit is now 48 terabytes, which, as most know, equals 48,000 of those gigabytes we originally thought about. We now have a number of these devices and just bought another.

Volumes have risen incredibly. Several years ago, we automated the loading process just to keep up with the demand. Clients and partners send us data directly via a specialized FTP system for automated processing and loading. These days, loading a million documents in a day is not unusual. Indeed, just checking as I write this sitting on a porch on a sparkling afternoon in Nashville, Tenn., there are 162 separate automated loading tickets in our system. It just boggles my mind to think of that much data.

Taking it further, it is not just the number of documents being sent to us that has grown dramatically, but also the size of cases. Four or five years ago, our average case size was about 16 gigabytes of documents. If you figured 5,000-10,000 documents per gigabyte (a number some people throw around but we believe is high), that would come to between 80,000 and 160,000 documents per case. That seemed big enough to me.

Today, after removing a number of really large outlier cases, our average has jumped to about 120 gigabytes a case. Using the same base figures as before, that suggests that cases have grown substantially, to somewhere between 600,000 and 1.2 million documents per case. That is just amazing when you think about it.
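For anyone who wants to check that arithmetic, here is a minimal Python sketch. It assumes only the rule of thumb quoted above (5,000 to 10,000 documents per gigabyte, which we believe runs high); the function and constant names are made up for illustration and are not part of any Catalyst system.

```python
# Rule-of-thumb range quoted in the text: 5,000 to 10,000 documents per gigabyte.
# These are rough industry figures, not measured values.
DOCS_PER_GB_LOW = 5_000
DOCS_PER_GB_HIGH = 10_000


def estimate_documents(case_size_gb):
    """Return (low, high) document-count estimates for a case of the given size in GB."""
    return case_size_gb * DOCS_PER_GB_LOW, case_size_gb * DOCS_PER_GB_HIGH


# Compare the average case size from four or five years ago with today's average.
for label, size_gb in [("earlier average", 16), ("current average", 120)]:
    low, high = estimate_documents(size_gb)
    print(f"{label}: {size_gb} GB -> {low:,} to {high:,} documents")
```

Run as written, the sketch gives 80,000 to 160,000 documents for a 16-gigabyte case and 600,000 to 1,200,000 for a 120-gigabyte case. The real counts depend heavily on the file types involved.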

And speaking of outliers, we have clients with 20 terabytes of litigation documents on our system (and one prospect talking about 60 terabytes of case data). I have wondered how many total documents that represents, but I am not even going to pull out my calculator to figure that one out. We see cases with as many as 8 million documents in them under review. It just boggles the mind.

So, that is what I was thinking about when I got Justin’s email. My point isn’t to say that Catalyst is big or that our numbers are anything special. Rather, like many of you, I remember what litigation was like in the 1990s, a time when we thought paper discovery was getting out of hand. In those days, we moved from red-well folders to filing cabinets to war rooms and even warehouses. But we never contemplated having even 100,000 documents to review, let alone millions. It just didn’t happen, at least not in my practice.

The volume of digital data is growing at an explosive rate, as we all know. Some claim the world is creating more than 988 exabytes of data a year in new content. That comes to 988 billion gigabytes. (It goes exabyte to petabyte to terabyte to gigabyte, each step a factor of 1,000.)
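The unit conversion is easy to verify with a short Python sketch. It assumes decimal units, where each step up the ladder is a factor of 1,000; the helper name is hypothetical and used only for illustration.

```python
# Decimal storage units, smallest to largest; each step is a factor of 1,000.
UNITS = ["gigabyte", "terabyte", "petabyte", "exabyte"]


def to_gigabytes(value, unit):
    """Convert a value in the given decimal unit to gigabytes."""
    steps = UNITS.index(unit)  # number of factor-of-1,000 steps above a gigabyte
    return value * 1000 ** steps


print(f"988 exabytes = {to_gigabytes(988, 'exabyte'):,} gigabytes")
print(f"48 terabytes = {to_gigabytes(48, 'terabyte'):,} gigabytes")
```

The first line prints 988,000,000,000 gigabytes, which is the 988 billion figure. The second confirms that a 48-terabyte storage unit holds 48,000 gigabytes.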

The great bulk of this data consists of video and audio files but written data keeps expanding too. And a lot of that stuff will be discoverable, which is what drives this industry.

The e-discovery market is all grown up now, with trade shows, industry magazines, experts and analysts. My how you’ve grown! My how we all have grown!


About John Tredennick

A nationally known trial lawyer and longtime litigation partner at Holland & Hart, John founded Catalyst in 2000. Over the past four decades he has written or edited eight books and countless articles on legal technology topics, including two American Bar Association best sellers on using computers in litigation technology, a book (supplemented annually) on deposition techniques and several other widely-read books on legal analytics and technology. He served as Chair of the ABA’s Law Practice Section and edited its flagship magazine for six years. John’s legal and technology acumen has earned him numerous awards, including being named by the American Lawyer as one of the top six “E-Discovery Trailblazers,” named to the FastCase 50 as a legal visionary and named one of the “Top 100 Global Technology Leaders” by London Citytech magazine. He has also been named the Ernst & Young Entrepreneur of the Year for Technology in the Rocky Mountain Region, and Top Technology Entrepreneur by the Colorado Software and Internet Association. John regularly speaks on legal technology to audiences across the globe. In his spare time, you will find him competing on the national equestrian show jumping circuit or playing drums and singing in a classic rock jam band.