How Many Documents in a Gigabyte? Our Latest Analysis Shows A Shifting Pattern

Since 2011, I have been sampling our document repository and reporting on file sizes in my “How Many Docs in a Gigabyte” series of posts here. I started writing about the subject because we were seeing a large discrepancy between the number of files per gigabyte we stored and the number our industry colleagues considered standard. Indeed, in 2011, I reported that we were finding far fewer documents per GB (2,500) than the generally accepted industry norm, which ranged from 5,000 to 15,000.

The article and its successors quickly became the most-read articles on our blog. Scientists and practitioners alike were interested in our findings. Even an author of the famed Rand study on e-discovery expenditures told me that he and his team had read my articles while doing their research.

Earlier Reports

In the last installment of this series, published in June 2016, I reviewed the averages we had generated over the years:

  • 2011: 2,500.
  • 2014: 4,500.
  • 2014: 3,000.
  • 2015: 4,400.
  • 2016: 3,415.

In that 2016 report, and consistent with the figures above, I put the average number of documents in a gigabyte at 3,500. Based on gut feeling and experience, I suggested a range of 3,000 to 4,000 documents per gigabyte.
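The 3,500 figure squares with a simple unweighted mean of the five averages listed above. This sketch averages the yearly figures without weighting them by sample size, so treat it as a rough check rather than a new estimate.

```python
from statistics import mean

# Averages from the earlier posts, in documents per gigabyte, in the order listed above.
earlier_averages = [2_500, 4_500, 3_000, 4_400, 3_415]

# Unweighted mean of the yearly averages (not weighted by how many files each sample held).
overall = mean(earlier_averages)
print(f"Mean of earlier averages: {overall:,.0f} docs/GB")
```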

2017 Report: The Number Keeps Dropping

I began my 2017 analysis using the same methodology as before. This time, I looked at daily reports from our repository covering January 2014 through April 9, 2017, a period of just over three years. Over that period, the number of records under study grew from about 23 million in the earliest samples to over 173 million in the later ones, more than a seven-fold increase in the sample size.

Here were the results, in documents per gigabyte:

  • Average: 3,810
  • Minimum: 2,782
  • Maximum: 5,318
  • Median: 3,927
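The post does not spell out how each daily figure is derived. A plausible reading is that each daily report gives the number of records stored and the total storage they occupy, and the daily figure is records divided by gigabytes. The minimal sketch below assumes exactly that, with invented snapshot values and a binary gigabyte (both assumptions), and computes the same four summary statistics.

```python
from statistics import mean, median

BYTES_PER_GB = 1024 ** 3  # assumption: the post does not say binary or decimal GB

# Hypothetical daily snapshots: (records stored, total bytes stored).
# Invented for illustration only; not the repository's actual figures.
snapshots = [
    (23_000_000, 4_600 * BYTES_PER_GB),
    (60_000_000, 15_000 * BYTES_PER_GB),
    (120_000_000, 40_000 * BYTES_PER_GB),
    (173_000_000, 50_000 * BYTES_PER_GB),
]

def docs_per_gb(records: int, total_bytes: int) -> float:
    """Documents per gigabyte for a single snapshot."""
    return records / (total_bytes / BYTES_PER_GB)

daily = [docs_per_gb(r, b) for r, b in snapshots]

summary = {
    "Average": mean(daily),
    "Minimum": min(daily),
    "Maximum": max(daily),
    "Median": median(daily),
}
for label, value in summary.items():
    print(f"{label:<8} {value:,.0f}")
```

Summary statistics like these ignore the order in which the days occurred, so a steady decline and random scatter around the same values would produce identical numbers.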

These figures are consistent with my earlier results. They support my 3,500-document estimate, or perhaps suggest that the number should be somewhat higher, closer to 3,900.

But here’s the rub. By looking only at averages over time, I had missed a clear trend: the numbers have been going down steadily. This suggests that the relevant number is closer to the minimum, and that what we should be talking about is the downward trend rather than past averages.

Here is a chart showing how the daily figures changed over time.


I would say this tells an interesting story. We see a steady drop, albeit with some variation, in the number of documents in a gigabyte. In early 2014, the daily figures exceeded 5,000 documents per gigabyte, peaking at 5,318. By February 2017, they had dropped as low as 2,782, although we see a small uptick in more recent measurements.
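One way to put a number on a downward trend, rather than relying on averages, is to fit a least-squares line to the daily figures and read off its slope. The sketch uses made-up readings, not the chart's data, purely to show the calculation.

```python
# Hypothetical readings: (days since first sample, documents per GB).
# Invented for illustration; these are not the values plotted in the chart.
readings = [(0, 5_200), (100, 4_900), (200, 4_500), (300, 4_300), (400, 3_900)]

def ols_slope(points: list[tuple[float, float]]) -> float:
    """Ordinary least-squares slope of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var

slope = ols_slope(readings)  # change in documents per GB, per day
print(f"Trend: {slope:+.1f} docs/GB per day, about {slope * 365:+,.0f} per year")
```

A negative slope confirms a decline even when individual days bounce around, which is exactly the information a single average throws away.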

This represents a drop of almost half, which corresponds to a near doubling of average file size. The trend seems likely to continue, at least if our repository is representative of other document populations.
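The "almost half" and "doubling" figures follow directly from the minimum and maximum daily readings, since the average size of a document is one gigabyte divided by the number of documents in it. This sketch checks the arithmetic; the conversion to kilobytes assumes binary units, which the post does not specify.

```python
# Daily extremes reported in the post (documents per gigabyte).
peak = 5_318  # early 2014
low = 2_782   # February 2017

# Fractional drop from the peak to the low.
drop = (peak - low) / peak

# Average file size is the reciprocal of docs per GB, so sizes scale by peak / low.
size_ratio = peak / low

# Average size per document in KB, assuming 1 GB = 1024 * 1024 KB.
kb_per_gb = 1024 * 1024
avg_kb_then = kb_per_gb / peak
avg_kb_now = kb_per_gb / low

print(f"Drop in docs/GB: {drop:.1%}")
print(f"Average file size grew {size_ratio:.2f}x: {avg_kb_then:,.0f} KB -> {avg_kb_now:,.0f} KB")
```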

What Do We Make of This?

From our research, the pattern is clear: the number of documents per gigabyte is dropping and dropping fast. The reason seems obvious to me. As users add more rich content—such as graphs, charts, pictures and videos—to files, they get larger. I can’t think of any other reason for the decline.

Will the trend continue? I am betting that it will. Studies have shown the power of visual communications to both inform and persuade. Technology makes it easier and easier to add such content. What we can do, we will do, and this is no exception.

How many documents in a gigabyte? In 2017, my thinking is that the number is closer to 2,800 than 3,900. In a couple more years, I bet it will drop further. It may be time for our industry to consider per-record pricing, so that clients don’t have to keep paying more as file sizes grow.
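The pricing point can be illustrated with simple arithmetic. Under per-gigabyte billing, the effective cost of a document is the per-GB rate divided by documents per gigabyte, so as the count falls the cost per document rises. The $10-per-GB rate in this sketch is hypothetical, chosen only for illustration; the two densities are the 3,900 and 2,800 figures discussed above.

```python
RATE_PER_GB = 10.00  # hypothetical per-GB price in dollars, for illustration only

def cost_per_1000_docs(docs_per_gb: float, rate_per_gb: float = RATE_PER_GB) -> float:
    """Effective cost of 1,000 documents when billing is per gigabyte."""
    return rate_per_gb / docs_per_gb * 1000

for label, dpg in [("3,900 docs/GB", 3_900), ("2,800 docs/GB", 2_800)]:
    print(f"{label}: ${cost_per_1000_docs(dpg):.2f} per 1,000 docs")
```

Going from 3,900 to 2,800 documents per gigabyte raises the effective per-document cost by about 39 percent at any fixed per-GB rate (3,900 / 2,800 ≈ 1.39). Under per-record pricing, the cost per 1,000 documents would not depend on file size at all.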



About John Tredennick

A nationally known trial lawyer and longtime litigation partner at Holland & Hart, John founded Catalyst in 2000. Over the past four decades, he has written or edited eight books and countless articles on legal technology topics, including two American Bar Association best sellers on using computers in litigation, a book (supplemented annually) on deposition techniques and several other widely read books on legal analytics and technology. He served as Chair of the ABA’s Law Practice Section and edited its flagship magazine for six years. John’s legal and technology acumen has earned him numerous awards, including being named by the American Lawyer as one of the top six “E-Discovery Trailblazers,” named to the FastCase 50 as a legal visionary and named one of the “Top 100 Global Technology Leaders” by London Citytech magazine. He has also been named the Ernst & Young Entrepreneur of the Year for Technology in the Rocky Mountain Region and Top Technology Entrepreneur by the Colorado Software and Internet Association. John regularly speaks on legal technology to audiences across the globe. In his spare time, you will find him competing on the national equestrian show jumping circuit or playing drums and singing in a classic rock jam band.