Author Archives: Jeremy Pickens

About Jeremy Pickens

Jeremy Pickens is one of the world’s leading information retrieval scientists and a pioneer in the field of collaborative exploratory search, a form of information seeking in which a group of people who share a common information need actively collaborate to satisfy it. Dr. Pickens holds seven patents and patents pending in the field of search and information retrieval. As senior applied research scientist at Catalyst, Dr. Pickens has spearheaded the development of Insight Predict. His ongoing research and development focuses on methods for continuous learning and the variety of real-world technology assisted review workflows that are only possible with this approach. Dr. Pickens earned his doctoral degree at the University of Massachusetts, Amherst, Center for Intelligent Information Retrieval. He conducted his post-doctoral work at King’s College London. Before joining Catalyst, he spent five years as a research scientist at FX Palo Alto Lab, Inc. In addition to his Catalyst responsibilities, he continues to organize research workshops and speak at scientific conferences around the world.

Comparing the Effectiveness of TAR 1.0 to TAR 2.0: A Second Simulation Experiment

In a recent blog post, we reported on a technology-assisted review simulation that we conducted to compare the effectiveness and efficiency of a family-based review versus an individual-document review. That post was one of a series here reporting on simulations conducted as part of our TAR Challenge – an invitation to any corporation or law firm to compare its results in an actual litigation against the results that would have been achieved using Catalyst’s advanced TAR 2.0 technology, Insight Predict.

As we explained in that recent blog post, the simulation used actual documents that were previously reviewed in an active litigation. Based on those documents, we conducted two distinct experiments. The first was the family vs. non-family test. In this blog post, we discuss the second experiment, testing a TAR 1.0 review against a TAR 2.0 review. Continue reading
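
For readers who want a feel for what such a simulation looks like mechanically, here is a minimal sketch in Python on synthetic data, with assumed parameters (corpus size, richness, seed and batch sizes): TAR 1.0 trains once on a seed set and then ranks, while TAR 2.0 keeps retraining as judgments come in. It illustrates the concept only, not the protocol or data used in our experiment.

```python
# Sketch only: one-time training (TAR 1.0) vs. continuous learning (TAR 2.0)
# on synthetic data. All numbers are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=5000, n_features=50, weights=[0.9],
                           random_state=0)  # ~10% "relevant" documents

def recall_at(reviewed, y):
    return y[reviewed].sum() / y.sum()

# TAR 1.0: train once on a 500-document seed set, then review top-ranked docs.
seed = rng.choice(len(y), 500, replace=False)
model = LogisticRegression(max_iter=1000).fit(X[seed], y[seed])
order = np.argsort(-model.predict_proba(X)[:, 1])
print("TAR 1.0 recall after 1,500 docs:", recall_at(order[:1500], y))

# TAR 2.0: retrain after every 100-document batch, feeding judgments back in.
reviewed = list(seed)
for _ in range(10):
    model = LogisticRegression(max_iter=1000).fit(X[reviewed], y[reviewed])
    scores = model.predict_proba(X)[:, 1]
    scores[reviewed] = -np.inf            # don't re-review judged documents
    reviewed.extend(np.argsort(-scores)[:100])
print("TAR 2.0 recall after", len(reviewed), "docs:",
      recall_at(np.array(reviewed), y))
```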

Comparing Family-Level Review Against Individual-Document Review: A Simulation Experiment

In two recent posts, we’ve reported on simulations of technology assisted review conducted as part of our TAR Challenge—an opportunity for any corporation or law firm to compare its results in an actual, manual review against the results it would have achieved using Catalyst’s advanced TAR 2.0 technology, Insight Predict.

Today, we are taking a slightly different tack. We are again conducting a simulation using actual documents that were previously reviewed in an active litigation. However, this time, we are … Continue reading
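
As a rough illustration of what family-level review changes in the accounting, here is a minimal sketch with a toy corpus (invented document and family IDs, not the study's data): judging one ranked document pulls its entire family into the review set, which changes the effort needed to reach the same recall.

```python
# Sketch only: how family-level review changes review effort on a toy corpus.
from collections import defaultdict

# doc_id -> (family_id, is_relevant); toy stand-in for real litigation data.
docs = {0: ("A", True), 1: ("A", False), 2: ("B", False),
        3: ("B", True), 4: ("C", False), 5: ("C", False)}
families = defaultdict(list)
for doc_id, (fam, _) in docs.items():
    families[fam].append(doc_id)

ranking = [0, 3, 2, 5, 1, 4]          # e.g., scores from a TAR model

def cost_to_full_recall(ranking, family_level):
    reviewed, found = set(), 0
    total = sum(rel for _, rel in docs.values())
    for doc_id in ranking:
        # Family review pulls the whole family in; individual review does not.
        batch = families[docs[doc_id][0]] if family_level else [doc_id]
        for d in batch:
            if d not in reviewed:
                reviewed.add(d)
                found += docs[d][1]
        if found == total:
            return len(reviewed)
    return len(reviewed)

print("individual review effort:", cost_to_full_recall(ranking, False))
print("family-level review effort:", cost_to_full_recall(ranking, True))
```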

Deep Learning in E-Discovery: Moving Past the Hype

Deep learning. The term seems to be ubiquitous these days, everywhere from self-driving cars and speech transcription to victories in the game of Go and cancer diagnosis. If we measure things by press coverage, deep learning seems poised to make every other form of machine learning obsolete.

Recently, Catalyst’s founder and CEO John Tredennick interviewed Catalyst’s chief scientist, Dr. Jeremy Pickens (who we at Catalyst call Dr. J), about how deep learning works and how it might be applied in the legal arena.

JT: Good afternoon Dr. J. I have been reading about deep learning and would like to know more about how it works and what it might offer the legal profession. Continue reading
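
As background for the interview, here is a minimal sketch of the core idea: a "deep" model stacks nonlinear layers that learn their own features from raw term counts. The toy documents and labels are invented, and nothing here reflects Catalyst's or any vendor's production models.

```python
# Sketch only: a tiny multi-layer network classifying toy "documents".
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

train_docs = ["merger agreement draft", "privileged legal memo",
              "fantasy football picks", "lunch menu attached"]
labels = [1, 1, 0, 0]                      # 1 = responsive, 0 = not

vec = CountVectorizer()
X = vec.fit_transform(train_docs)

# Two hidden layers; each layer re-represents the output of the layer below.
net = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000,
                    random_state=0).fit(X, labels)
print(net.predict(vec.transform(["draft of the merger memo"])))
```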

Catalyst Research: Family-Based Review and Expert Training — Experimental Simulations, Real Data

ABSTRACT

In this research we answer two main questions: (1) What is the efficiency of a TAR 2.0 family-level document review versus a TAR 2.0 individual document review, and (2) How useful is expert-only (aka TAR 1.0 with expert) training, relative to TAR 2.0’s ability to conflate training and review using non-expert judgments [2]? Continue reading
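
To make the second question concrete, here is a minimal simulation sketch on synthetic data, assuming (purely for illustration) a 20% error rate for non-expert reviewers: it compares a model trained on a few hundred clean "expert" labels against one trained on many more noisy labels. The numbers are assumptions, not this paper's results.

```python
# Sketch only: few clean expert labels vs. many noisy non-expert labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=20000, n_features=40, random_state=1)
rng = np.random.RandomState(1)
test = slice(10000, 20000)                 # held-out half for evaluation

# Expert: 500 correct labels.  Non-expert: 5,000 labels, 20% flipped at random.
expert_idx = rng.choice(10000, 500, replace=False)
crowd_idx = rng.choice(10000, 5000, replace=False)
noisy = y[crowd_idx] ^ (rng.rand(5000) < 0.2)

for name, idx, labels in [("expert", expert_idx, y[expert_idx]),
                          ("non-expert", crowd_idx, noisy)]:
    m = LogisticRegression(max_iter=1000).fit(X[idx], labels)
    print(name, "accuracy:", m.score(X[test], y[test]))
```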

Catalyst’s Report from TREC 2016: ‘We Don’t Need No Stinkin Training’

One of the bigger, and still enduring, debates among Technology Assisted Review experts revolves around the method and amount of training you need to get optimal[1] results from your TAR algorithm. Over the years, experts have prescribed a variety of approaches, including:

  1. Random Only: Have a subject matter expert (SME), typically a senior lawyer, review and judge several thousand randomly selected documents.
  2. Active Learning: Have the SME review several thousand marginally relevant documents chosen by the computer to assist in the training (a sketch of this approach appears after this list).
  3. Mixed TAR 1.0 Approach: Have the SME review and judge a mix of randomly selected documents, some found through keyword search and others selected by the algorithm to help it find the boundary between relevant and non-relevant documents.

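As a concrete illustration of approach 2, here is a minimal uncertainty-sampling sketch on synthetic data: each round, the model asks the SME to judge the documents it is least sure about. Real TAR engines differ in the details; every number below is an assumption.

```python
# Sketch only: active learning by uncertainty sampling on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=30, random_state=2)
rng = np.random.RandomState(2)
judged = list(rng.choice(5000, 50, replace=False))   # small random seed set

for round_ in range(5):
    model = LogisticRegression(max_iter=1000).fit(X[judged], y[judged])
    p = model.predict_proba(X)[:, 1]
    p[judged] = 1.0                       # exclude already-judged documents
    uncertainty = np.abs(p - 0.5)         # 0 = maximally uncertain
    judged.extend(np.argsort(uncertainty)[:100])  # SME judges these next
print("documents judged:", len(judged))
```
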
Continue reading

Ask Catalyst: Does Insight Predict Use Metadata In Ranking Documents?

[Editor’s note: This is another post in our “Ask Catalyst” series, in which we answer your questions about e-discovery search and review. To learn more and submit your own question, go here.]
We received this question:

In ranking documents, does Insight Predict use metadata information or is the ranking based solely on the document text?

Today’s question is answered by Dr. Jeremy Pickens, chief (data) scientist.

Insight Predict, Catalyst’s unique, second-generation technology assisted review engine, does use metadata. However, there are dozens if not hundreds of different types of metadata that could be extracted from various kinds of documents, and some metadata fields have proven more fruitful than others. Continue reading
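
To illustrate one common way such a design can work (a sketch of the general technique, not necessarily Predict's internals, with invented field names), you can vectorize the text and the metadata separately, concatenate the features, and train a single ranker over both:

```python
# Sketch only: combining text and metadata features in one ranking model.
from scipy.sparse import hstack
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["quarterly revenue forecast", "company picnic signup"]
meta = [{"custodian": "cfo", "filetype": "xlsx"},     # invented fields
        {"custodian": "hr", "filetype": "msg"}]
labels = [1, 0]

tv, dv = TfidfVectorizer(), DictVectorizer()
X = hstack([tv.fit_transform(texts), dv.fit_transform(meta)])
model = LogisticRegression().fit(X, labels)

query = hstack([tv.transform(["revenue projections"]),
                dv.transform([{"custodian": "cfo", "filetype": "xlsx"}])])
print(model.predict_proba(query)[:, 1])   # relevance score using both signals
```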

Ask Catalyst: How Does Insight Predict Handle Synonyms?

[Editor’s note: This is another post in our “Ask Catalyst” series, in which we answer your questions about e-discovery search and review. To learn more and submit your own question, go here.]

We received this question:

How does Insight Predict handle synonyms? For example, assume document 1 has “car” in it and not “automobile” and document 2 has “automobile” and not “car.” If Predict gets the thumbs up on document 1, it doesn’t necessarily rank document 2 high, correct? It doesn’t know that the words are the same concept, right?

Today’s question is answered by Dr. Jeremy Pickens, senior applied research scientist.

As the Catalyst scientist who developed Insight Predict, I made the conscious, explicit choice not to build synonyms into the process. I’ll tell you why: Continuous learning obviates the issue. Continue reading
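
Here is a minimal sketch of that behavior with a toy bag-of-words model (toy documents and a generic classifier, not Predict itself): a thumbs-up on the "car" document does not, by itself, lift the "automobile" documents, but once a reviewer judges any "automobile" document relevant, retraining lifts the rest.

```python
# Sketch only: continuous retraining picks up the synonym without a thesaurus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["red car", "blue automobile", "automobile auction", "soup recipe"]
vec = CountVectorizer()
X = vec.fit_transform(docs)

# Round 1: only doc 0 ("car") judged relevant, doc 3 judged non-relevant.
# The "automobile" docs stay near 0.5 because the model has never seen the term.
m = LogisticRegression().fit(X[[0, 3]], [1, 0])
print("round 1 scores:", m.predict_proba(X)[:, 1].round(2))

# Round 2: a reviewer also marks doc 1 ("automobile") relevant; retrain.
# Now doc 2 rises too, because "automobile" itself carries positive weight.
m = LogisticRegression().fit(X[[0, 3, 1]], [1, 0, 1])
print("round 2 scores:", m.predict_proba(X)[:, 1].round(2))
```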

Ask Catalyst: What Is ‘Supervised Machine Learning’?

[Editor’s note: This is another post in our “Ask Catalyst” series, in which we answer your questions about e-discovery search and review. To learn more and submit your own question, go here.]

We received this question:

What is supervised machine learning?

Today’s question is answered by Dr. Jeremy Pickens, senior applied research scientist. Continue reading
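
In the spirit of the question, here is a minimal sketch of the general idea (toy documents and labels, not Predict): the "supervision" is the set of labels a human supplies; the model learns a mapping from labeled examples and applies it to unseen documents.

```python
# Sketch only: supervised learning = learn from labeled examples, then predict.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train = ["price fixing meeting notes", "antitrust counsel memo",
         "birthday cake in kitchen", "parking garage closed"]
y = ["responsive", "responsive", "not", "not"]   # the human "supervision"

vec = CountVectorizer()
model = MultinomialNB().fit(vec.fit_transform(train), y)

# Apply the learned mapping to a document the model has never seen.
print(model.predict(vec.transform(["memo about the pricing meeting"])))
```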

Ask Catalyst: What Are The Thresholds for Using Technology Assisted Review?

[Editor’s note: This is another post in our “Ask Catalyst” series, in which we answer your questions about e-discovery search and review. To learn more and submit your own question, go here.]

We received this question:

What are the thresholds (in numbers of docs) at which your company will recommend the use of predictive coding? Would this be case dependent or just a percentage of documents (e.g. 100 out of 1,000 documents giving us 10%)?

Today’s question is answered by Dr. Jeremy Pickens, senior applied research scientist. Continue reading
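
To see why a threshold exists at all, here is a back-of-the-envelope sketch in which every number (richness, overhead, ranking efficiency) is an assumption for illustration, not Catalyst guidance: TAR pays off once the documents it lets you skip outweigh the fixed overhead of setting up and validating the process.

```python
# Sketch only: a break-even comparison with assumed, illustrative numbers.
def linear_review_cost(n_docs, cost_per_doc=1.0):
    return n_docs * cost_per_doc

def tar_review_cost(n_docs, richness, target_recall=0.80,
                    rank_efficiency=2.0, overhead_docs=2000, cost_per_doc=1.0):
    # Assume target recall is reached after reviewing roughly
    # rank_efficiency * (relevant docs needed), plus a fixed overhead for
    # training, sampling and validation.
    relevant_needed = n_docs * richness * target_recall
    return (rank_efficiency * relevant_needed + overhead_docs) * cost_per_doc

for n in (1_000, 10_000, 100_000):
    print(n, "docs -> linear:", linear_review_cost(n),
          " TAR:", round(tar_review_cost(n, richness=0.10)))
# At 1,000 docs the overhead dominates; by 10,000 docs TAR is already cheaper.
```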

Does Your TAR System Have My Favorite Feature? A Primer on Holistic Thinking

I have noticed that in certain popular document-based systems in the e-discovery marketplace, there is a capability that often gets touted. Although I am a research scientist at Catalyst, I have been on enough sales calls with my fellow Catalyst team members to have heard numerous users of document-based systems ask whether we have the capability to automatically remove common headers and footers from email. There are document-based systems that showcase this capability as a feature that is good to have, so clients often include it in the checklist of capabilities they’re seeking.

This leads me to ask: Why?

For the longest time, this request confused me. It was a capability that many declared they needed simply because they had seen it elsewhere. That leads me to discuss the topic of holistic thinking when it comes to one’s technology assisted review (TAR) algorithms and processes. Continue reading
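
For concreteness, here is a minimal sketch of the capability itself, using a naive line-frequency heuristic (my own illustration, not any vendor's implementation): lines that repeat across many messages are treated as boilerplate and stripped.

```python
# Sketch only: strip lines that recur across most emails as likely boilerplate.
from collections import Counter

emails = [
    "CONFIDENTIAL - ACME CORP\nBudget approved for Q3.\nSent from my phone",
    "CONFIDENTIAL - ACME CORP\nLunch at noon?\nSent from my phone",
    "CONFIDENTIAL - ACME CORP\nPlease review the merger draft.\nSent from my phone",
]

line_counts = Counter(line for msg in emails for line in msg.splitlines())

def strip_boilerplate(msg, min_fraction=0.5):
    # Drop any line appearing in at least min_fraction of the messages.
    threshold = len(emails) * min_fraction
    return "\n".join(line for line in msg.splitlines()
                     if line_counts[line] < threshold)

print(strip_boilerplate(emails[0]))   # -> "Budget approved for Q3."
```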