Using Litigation Analytics to Find Critical Information in Your Case

As data volumes grow in litigation, analytics become increasingly important tools for litigators. Analytics can help lawyers make sense of electronic information and reveal the stories hidden among the bits and bytes. But how well do you really understand analytics and what they can do for your case?

In a recent webinar, Litigation Analytics: How to Find Information Critical to Your Case, three experts in the use of analytics in litigation demonstrated core types of this technology and explained the different ways they can help you identify the core issues in your case more quickly and efficiently.

(Watch a complete recording of the webinar here.)

“The way I like to think about it is that analytics are any way you have to organize, visualize and search your data to find specific traits or patterns,” speaker Mark Noel, a former litigator and managing director, Professional Services, at Catalyst, told the webinar audience.

Noel was joined on the panel by Thomas C. Gricks III, a prominent e-discovery trial lawyer who recently joined Catalyst as managing director, Professional Services, and Bayu Hardi, senior consultant, Professional Services, at Catalyst and an expert in the use of analytics.

Over the course of the one-hour presentation, the panel covered the following forms of analytics:

  • Email threading.
  • Identification of near duplicates.
  • Timeline analysis.
  • Communications and social network analysis.
  • Fuzzy and advanced search.
  • Concept analysis.
  • Document clustering.
  • Technology assisted review (TAR).

What these tools have in common, Noel explained, is that they let us take a big mass of data and slice it along different dimensions. This allows us to see different patterns in the data and different ways it is organized so we can drill in and find what is important to the case.

Often, analytics are used for exploring a set of documents, rather than for putting documents into buckets like we do in a review for production, Noel said. “More commonly what we’re doing is we’re exploring a document set. We’re trying to see if there is anything unusual in there, if there are stories the documents can tell that we don’t know about yet. Often, it’s more like looking for needles in a haystack when we’re not sure what the needles might look like.”

Email Threading

One of the most familiar forms of analytics is email threading, the panelists agreed. The reason to thread emails is that grouping them by topic and in order boosts review speed and accuracy. When reviewers can follow a topic through from one document to the next, it is much easier for them to be efficient and consistent.

Gricks cautioned, however, that lawyers must be alert to how the threading is being performed and what the implications may be. For example, one common method is to thread emails by subject line. But a recipient might simply use the reply button from an earlier email to raise a completely different topic.

“You need to realize there are different techniques and it is not necessarily so that every email in that set is actually related to the issue you’re trying to follow,” Gricks said.
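To make that caution concrete, here is a minimal Python sketch of two common threading techniques: grouping by normalized subject line and following reply headers. The messages and the field names (id, subject, in_reply_to) are made up for illustration, and nothing here is meant to represent how Catalyst or any particular review platform threads email.

```python
import re
from collections import defaultdict

# Hypothetical messages; the field names are assumptions for this sketch.
MESSAGES = [
    {"id": "m1", "subject": "Q3 pricing", "in_reply_to": None},
    {"id": "m2", "subject": "RE: Q3 pricing", "in_reply_to": "m1"},
    # Reply button used to raise an unrelated topic; subject left unchanged.
    {"id": "m3", "subject": "RE: Q3 pricing", "in_reply_to": "m1"},
    # Reply where the sender rewrote the subject line.
    {"id": "m4", "subject": "Lunch Friday?", "in_reply_to": "m2"},
]

# Matches one or more leading "RE:", "FW:" or "FWD:" prefixes.
PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply/forward prefixes and lowercase the subject."""
    return PREFIX.sub("", subject).strip().lower()


def thread_by_subject(messages):
    """Group message ids by normalized subject line."""
    threads = defaultdict(list)
    for msg in messages:
        threads[normalize_subject(msg["subject"])].append(msg["id"])
    return dict(threads)


def thread_by_reply_header(messages):
    """Group message ids under the root message of their reply chain."""
    by_id = {m["id"]: m for m in messages}

    def root_of(msg):
        seen = set()  # guards against malformed, circular reply chains
        while msg["in_reply_to"] in by_id and msg["id"] not in seen:
            seen.add(msg["id"])
            msg = by_id[msg["in_reply_to"]]
        return msg["id"]

    threads = defaultdict(list)
    for msg in messages:
        threads[root_of(msg)].append(msg["id"])
    return dict(threads)
```

The two techniques disagree about m4, and both place m3 in the pricing thread even though it raises a different topic. Neither grouping, on its own, shows that every message in a thread relates to the issue you are following.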

Analyzing Timelines, Communications and Concepts

Another useful form of analytics is timeline analysis. Using Catalyst Insight, Hardi showed how emails can be graphically represented along a timeline to show spikes and gaps in communications. Because these spikes may indicate activity of interest, a lawyer using Insight can easily drill down to see a more granular view of the timeline and also filter it by sender, custodian or other facets, Hardi said.

Gricks added that analyzing gaps in a timeline can help you understand whether they occurred naturally or indicate potentially missing documents, whether withheld intentionally or omitted by oversight.
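For readers who want to see the idea behind a timeline view, the following Python sketch counts emails per day and flags spikes and gaps. The dates, the spike factor and the minimum gap length are all invented for illustration; this is a rough approximation of the concept, not how Insight computes its timeline.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean

# Hypothetical sent dates, one entry per email (made-up data).
SENT = (
    [date(2015, 3, 1), date(2015, 3, 2)]
    + [date(2015, 3, 3)] * 6
    + [date(2015, 3, 4), date(2015, 3, 9)]
)


def daily_counts(sent_dates):
    """Count emails per calendar day, including days with no email."""
    counts = Counter(sent_dates)
    day, last = min(sent_dates), max(sent_dates)
    series = {}
    while day <= last:
        series[day] = counts.get(day, 0)
        day += timedelta(days=1)
    return series


def find_spikes(series, factor=2.0):
    """Days whose volume exceeds `factor` times the daily average."""
    average = mean(series.values())
    return [day for day, n in series.items() if n > factor * average]


def find_gaps(series, min_days=3):
    """Runs of at least `min_days` consecutive days with no email."""
    gaps, run = [], []
    for day, n in series.items():
        if n == 0:
            run.append(day)
        else:
            if len(run) >= min_days:
                gaps.append((run[0], run[-1]))
            run = []
    return gaps
```

A flagged gap is only a prompt for further questions. As Gricks noted, it still takes analysis to decide whether the silence was natural or points to missing documents.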

Hardi also demonstrated Insight’s ability to analyze communications visually. Email communications are depicted as interconnected bubbles, with the relative size of the bubbles indicating the frequency of communications and the lines connecting them indicating who the communications were between. Here again, Insight allows a lawyer to drill down to specific emails and to filter by specific facets, such as a single email address or domain.

“As you see things that are interesting, you can follow those trails,” Noel added.
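The data behind a bubble view like this can be approximated with simple counts. The Python sketch below, using made-up addresses, tallies how many messages each address took part in (a stand-in for bubble size), how often each pair communicated (a stand-in for the connecting lines) and filters by domain. It is an illustration of the concept only, not Insight's implementation.

```python
from collections import Counter

# Hypothetical (sender, recipients) pairs; the addresses are invented.
EMAILS = [
    ("alice@acme.example", ["bob@acme.example"]),
    ("bob@acme.example", ["alice@acme.example"]),
    ("alice@acme.example", ["bob@acme.example", "carol@vendor.example"]),
    ("carol@vendor.example", ["alice@acme.example"]),
]


def volume_by_address(emails):
    """Number of messages each address sent or received."""
    volume = Counter()
    for sender, recipients in emails:
        for address in {sender, *recipients}:
            volume[address] += 1
    return volume


def pair_counts(emails):
    """Number of messages exchanged between each pair, ignoring direction."""
    pairs = Counter()
    for sender, recipients in emails:
        for recipient in recipients:
            if recipient != sender:
                pairs[tuple(sorted((sender, recipient)))] += 1
    return pairs


def filter_by_domain(emails, domain):
    """Keep only messages with at least one participant at `domain`."""
    suffix = "@" + domain
    return [
        (sender, recipients)
        for sender, recipients in emails
        if any(a.endswith(suffix) for a in (sender, *recipients))
    ]
```

Filtering first and then recounting is the code equivalent of drilling down: calling pair_counts(filter_by_domain(EMAILS, "vendor.example")) narrows the picture to traffic involving the outside vendor.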

Another form of analytics the panel demonstrated is concept analysis, which is a way to find common concepts and themes based on the content of the documents. The geek term for this is “latent semantic indexing.”

One way this can help you is by finding documents that relate to a concept you are interested in but that describe it in unfamiliar words, Noel said. You might, for example, be interested in documents pertaining to large motor vehicles that carry cargo. You know these as trucks, but you may be unaware that in the UK they are called lorries. Concept analysis can help you find all the relevant documents, no matter the exact words they contain.
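The truck and lorry example can be sketched in a few lines of Python. One caveat up front: this is not latent semantic indexing, which relies on a singular value decomposition of a term-document matrix. It is a much simpler stand-in that treats two words as related when they keep the same company across documents. The sample documents and the stopword list are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Made-up documents; "truck" and "lorry" never appear together.
DOCS = [
    "the truck carried cargo to the warehouse",
    "the lorry carried cargo to the depot",
    "driver parked the truck at the depot",
    "driver parked the lorry at the warehouse",
    "the board approved the quarterly budget",
    "the budget meeting ran late",
]
STOPWORDS = {"the", "to", "at"}  # tiny illustrative list


def tokenize(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]


def context_vectors(docs):
    """For each term, count the other terms sharing a document with it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        terms = set(tokenize(doc))
        for term in terms:
            for other in terms - {term}:
                vectors[term][other] += 1
    return vectors


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def related_terms(term, vectors, top_n=1):
    """Terms whose contexts most resemble the contexts of `term`."""
    target = vectors.get(term, Counter())
    scores = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != term
    ]
    return [other for _, other in sorted(scores, reverse=True)[:top_n]]


def plain_search(term, docs):
    return [i for i, doc in enumerate(docs) if term in tokenize(doc)]


def expanded_search(term, docs, top_n=1):
    """Search for `term` plus its closest related terms."""
    wanted = {term, *related_terms(term, context_vectors(docs), top_n)}
    return [i for i, doc in enumerate(docs) if wanted & set(tokenize(doc))]
```

On this toy collection, a plain search for "truck" returns only the two documents that use the word, while the expanded search also returns the two "lorry" documents, because both words appear alongside the same cargo, driver and depot vocabulary.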

Clustering vs. TAR

Still another common form of analytics – but one the panelists agreed they are less and less likely to use – is clustering. Simply put, clustering is a way of organizing similar documents together.

“A clustering tool is passive machine learning,” Noel explained. “It will look at all the documents and figure out how to lump them together in clusters all on its own, without input from human reviewers. That can be useful for certain tasks, such as to get an overview of a group of documents quickly.”

But Noel and Gricks both said that they have largely abandoned the use of passive clustering in favor of active-learning TAR technologies that use human input to help define the clusters that are important to a case.

“We’ve made the decision not to use background clustering without input from people,” Gricks said. “We use predictive coding to define the clusters we want to look at.”
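For readers unfamiliar with the passive clustering the panelists describe, here is a deliberately simple Python sketch: each document joins the first cluster whose founding document shares enough words with it, with no reviewer input at all. The documents, the word-overlap measure and the threshold are assumptions chosen for illustration; commercial clustering tools use more sophisticated methods.

```python
# Hypothetical documents keyed by id (made-up text).
DOCUMENTS = {
    "a": "quarterly budget forecast draft",
    "b": "quarterly budget forecast final",
    "c": "warehouse shipping schedule",
    "d": "updated warehouse shipping schedule",
}


def words(text):
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(documents, threshold=0.3):
    """Greedy single-pass clustering against each cluster's first member."""
    clusters = []
    for doc_id, text in documents.items():
        for group in clusters:
            founder = documents[group[0]]
            if jaccard(words(text), words(founder)) >= threshold:
                group.append(doc_id)
                break
        else:
            # No existing cluster is similar enough; start a new one.
            clusters.append([doc_id])
    return clusters
```

Notice that the grouping depends only on the text and the threshold. Nothing tells the tool which groups matter to the case, which is the limitation that led the panelists toward approaches driven by reviewer input.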

Your goal as a litigator is to find and identify the best documents as soon as you can, Gricks explained. To that end, he uses the active-learning capabilities of Insight Predict to help him quickly focus on the best documents.

He does this by setting up projects within Predict related to the types of information he needs. One project, for example, may be related to damages. Insight then continuously ranks the documents and pulls the best ones for that project to the top, so the lawyer does not have to wait for them to surface. The lawyer can then choose to look at only the top documents using these various analytics tools to see, for example, what communications are being sent and what they involve.

Gricks offered a tip for further speeding the process: Create a synthetic seed document.

Seed documents – the documents used to train a TAR algorithm – are typically actual documents from the collection that someone has reviewed and tagged as relevant or not. But when you know the concepts that are relevant, Gricks said, you can create your own seed document packed with all the concepts you want and force the tool to bring those documents to the top.

This is a way to use TAR to start to get to the better information sooner, so you can start to understand the case and start to understand what you need to do to continue discovery with the other side, he explained.
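The synthetic seed idea can be illustrated with a toy ranking in Python. The sketch below scores each document by how closely its words match a hand-written seed and sorts the collection accordingly. The documents and the seed text are invented, and plain word-count similarity is an assumption made for simplicity; it is not the algorithm Insight Predict uses, and it leaves out the continuous retraining on reviewer decisions that active learning involves.

```python
import math
from collections import Counter

# Hypothetical collection (made-up text).
DOCUMENTS = {
    "doc1": "lost profits estimate for the damages claim",
    "doc2": "lunch menu for friday",
    "doc3": "revised damages model and lost revenue figures",
    "doc4": "parking garage closed this weekend",
}

# A synthetic seed: not a real document, just the concepts of interest.
SEED = "damages lost profits lost revenue estimate"


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_by_seed(seed_text, documents):
    """Document ids ordered from most to least similar to the seed."""
    seed = bag_of_words(seed_text)
    scored = [
        (cosine(seed, bag_of_words(text)), doc_id)
        for doc_id, text in documents.items()
    ]
    # Highest score first; ties broken by document id for stable output.
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

Calling rank_by_seed(SEED, DOCUMENTS) puts the two damages-related documents ahead of the lunch menu and the parking notice, which is the effect Gricks describes: the concepts you already know about pull the likely documents to the top early.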

A concern with this approach would be bias – that by telling the system what to find you may not find other potentially relevant information. Catalyst’s system overcomes this through its use of a contextual diversity algorithm. It continuously scans the document collection and grabs representative documents from groups and clusters that you don’t know anything about yet and brings them to your attention. That reduces the chance that relevant material gets overlooked.
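The intuition behind surfacing unfamiliar material can be shown with a short Python sketch: among the unreviewed documents, pick the one that looks least like anything a reviewer has already seen. This is only an illustration of the general idea under simplifying assumptions (word overlap as the similarity measure, invented sample text); it is not Catalyst's contextual diversity algorithm, whose details the webinar did not cover.

```python
# Hypothetical reviewed and unreviewed documents (made-up text).
REVIEWED = [
    "lost profits estimate for damages",
    "revised damages model",
]
UNREVIEWED = {
    "u1": "updated damages estimate",
    "u2": "shipping delays at the port",
    "u3": "lost profits summary",
}


def words(text):
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def familiarity(text, reviewed):
    """Similarity to the closest document a reviewer has already seen."""
    return max((jaccard(words(text), words(r)) for r in reviewed), default=0.0)


def most_unfamiliar(reviewed, unreviewed):
    """Id of the unreviewed document least like anything reviewed so far."""
    return min(
        unreviewed,
        key=lambda doc_id: (familiarity(unreviewed[doc_id], reviewed), doc_id),
    )
```

Here the shipping document is selected because it shares no vocabulary with the damages material already reviewed. Mixing a few such picks into a ranked review is one way to counter the bias a hand-built seed might introduce.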

Using the tools in combination is really the key here, Noel said. The different analytics tools find different patterns for you, because they cut the data in different directions.

“You want to use combinations of these,” Noel explained. “And the combinations are powerful because, the more tools you have in your toolkit, the more combinations you have that you can put together. So you want to have a number of these that you’re comfortable with, and then when you’re using them, you want to do that thoughtful iteration.”



About Bob Ambrogi

Bob is known internationally for his expertise in the Internet and legal technology. He held the top editorial positions at the two leading national U.S. legal newspapers, the National Law Journal and Lawyers USA. A long-time advisor to Catalyst, Bob now divides his time between law practice and media consulting. He writes two blogs, LawSites and MediaLaw, co-authors Legal Blog Watch, and co-hosts the weekly legal-affairs podcast Lawyer2Lawyer. A 1980 graduate of Boston College Law School, Bob is a life member of the Massachusetts Bar Foundation and an active member of the Massachusetts Bar Association, which honored him in 1994 with its President's Award.