Author Archives: Andrew Bye


About Andrew Bye

Andrew is the director of machine learning and analytics at Catalyst, and a search and information retrieval expert. Throughout his career, Andrew has developed search practices for e-discovery, and has worked closely with clients to implement effective workflows from data delivery through statistical validation. Before joining Catalyst, Andrew was a data scientist at Recommind. He has also worked as an independent data consultant, advising legal professionals on workflow and search needs. Andrew has a bachelor’s degree in linguistics from the University of California, Berkeley and a master’s in linguistics from the University of California, Los Angeles.

Breaking Up the Family: Reviewing on a Document Level Is More Efficient

Lawyers have been reviewing document families as a collective unit since well before the advent of technology-assisted review (TAR).

They typically look at every document in the family to decide whether the family as a whole (or any part of it) is responsive and needs to be produced, or withheld in its entirety as privileged. Most lawyers believe that is the most efficient way to conduct a review.

Predict Proves Effective Even With High Richness Collection

Finds 94% of the Relevant Documents Despite Review Criteria Changes

Our client, a major oil and gas company, was hit with a federal investigation into alleged price fixing. The claim was that several of the drilling companies had conspired through various pricing signals to keep interest owner fees from rising with the market. The regulators believed they would find the evidence in the documents.

The request to produce was broad, even for this three-letter agency. Our client would have to review over 2 million documents. And the deadline to respond was short, just four months to get the job done.

How to Get More Miles Per Gallon Out of Your Next Document Review

How many miles per gallon can I get using Insight Predict, Catalyst’s technology assisted review platform, which is based on continuous active learning (CAL)? And how does that fuel efficiency rating compare to what I might get driving a keyword search model?

While our clients don’t always use these automotive terms, this is a key question we are often asked. How does CAL review efficiency compare to the review efficiency I have gotten using keyword search? Put another way, how many non-relevant documents will I have to look at to complete my review using CAL versus the number of false hits that will likely come back from keyword searches?
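One way to make the "miles per gallon" comparison concrete is to treat review efficiency as the fraction of reviewed documents that turn out to be relevant. The sketch below illustrates the arithmetic only; the document counts are hypothetical and are not taken from the article.

```python
def review_efficiency(relevant_found, total_reviewed):
    """Fraction of reviewed documents that were relevant (the review's precision)."""
    return relevant_found / total_reviewed

# Hypothetical numbers for illustration: suppose keyword search returns
# 100,000 hits containing 20,000 relevant documents, while a CAL queue
# surfaces the same 20,000 relevant documents in 40,000 reviewed.
keyword_eff = review_efficiency(20_000, 100_000)  # 0.2 -> 4 false hits per relevant doc
cal_eff = review_efficiency(20_000, 40_000)       # 0.5 -> 1 false hit per relevant doc
print(keyword_eff, cal_eff)
```

Under these assumed counts, the CAL review would reach the same set of relevant documents while looking at less than half as many documents overall.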

Review Efficiency Using Insight Predict

An Initial Case Study

Much of the discussion around Technology Assisted Review (TAR) focuses on “recall,” which is the percentage of the relevant documents found in the review process. Recall is important because lawyers have a duty to take reasonable (and proportionate) steps to produce responsive documents. Indeed, Rule 26(g) of the Federal Rules of Civil Procedure effectively requires that an attorney certify, after reasonable inquiry, that discovery responses and any associated production are reasonable and proportionate under the totality of the circumstances.

In that regard, achieving a recall rate of less than 50% does not seem reasonable, nor is it often likely to be proportionate. Current TAR decisions suggest that reaching 75% recall is likely reasonable, especially given the potential cost to find additional relevant documents. Higher recall rates, 80% or higher, would seem reasonable in almost every case.
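The recall figures above follow directly from the definition: recall is the share of all relevant documents in the collection that the review actually found. A minimal sketch, using made-up counts for illustration:

```python
def recall(relevant_found, total_relevant):
    """Share of all relevant documents in the collection that the review found."""
    return relevant_found / total_relevant

# Hypothetical collection: 10,000 relevant documents exist in total.
# Finding 7,500 of them yields 0.75 -- the ~75% level the discussion
# above describes as likely reasonable.
print(recall(7_500, 10_000))
```

In practice the total number of relevant documents is unknown and must itself be estimated by statistical sampling, which is why recall claims are usually reported with a margin of error.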

How Good is That Keyword Search? Maybe Not As Good As You Think

Despite advances in machine learning over the past half-decade, many lawyers still use keyword search as their primary tool to find relevant documents. Most e-discovery protocols are built around reaching agreement on keywords, but few require testing to see whether the keywords are missing large numbers of relevant documents. Rather, many seem to believe that if they frame the keywords broadly enough they will find most of the relevant documents, even if the team is forced to review a lot of irrelevant ones.