Insight Predict is different because it is specifically designed to handle big data. Most technology-assisted review (TAR) products are built to run on specific hardware and use Microsoft SQL. Insight Predict is built to run at a data center and uses NoSQL graph databases that are engineered for big data.

Predict's big-data architecture gives it capabilities that appliance-based systems cannot match, delivering more targeted results in even less time. Here are 10 key differences between Predict and standard TAR systems:

Predict Ranks All the Documents Every Time

Most TAR systems start by choosing 500 or so documents to use as a reference set. The training is done against the reference set in the hopes that it will be representative of the larger population. They do this because the hardware would take too long to work against all the documents, which could number in the millions.

Predict's engine can rank large numbers of documents quickly (e.g., 10 minutes for 700,000 documents). This gives you a more accurate picture of the entire document population and better results when you rank them for review.

Better ranking means fewer documents to review, which saves time and money.


Predict Is Built To Handle Rolling Document Uploads

Most discovery comes in waves, which can be a problem when you use a TAR 1.0 product. Most do their training against a 500-document reference set. In doing so, they assume that the reference set was drawn randomly from the entire document set.

What happens when more documents arrive later? The reference set is no longer valid, and thus your work can no longer be defended. Your only choice is to start the training again and hope there are no more documents.

Insight Predict does not use a reference set. Rather, it analyzes all the documents for each ranking. If you add new documents to the collection, the team simply keeps reviewing as the system integrates the new documents into the mix. No retraining or lost time.

Predict Supports Continuous Active Learning

Traditional TAR 1.0 systems are keyed into a single ranking after training is "finished." This is a good first step and will reduce review populations. But it doesn't have to be the end of the process.

With Insight Predict, the learning process continues throughout the review, with reviewer judgments being fed back continuously to train the system. This continues to improve the yield curve, which means fewer documents to review than you originally estimated.

Unlike most systems, Predict is architected with a big data engine that ranks all of the documents all of the time. The ranking process continues as you feed reviewer judgments back to the system. In turn, the ranking algorithm keeps improving, which means the review continues to get better and better. A smarter system means fewer documents to review, faster and at less cost.
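The feedback loop described above can be sketched in a few lines of Python. This is a toy illustration only, not Predict's actual algorithm: the simple term-weight scoring, the function names (`score`, `continuous_review`) and the sample documents are all assumptions made for the example.

```python
from collections import Counter

def score(doc_text, weights):
    """Sum the learned weights of the distinct terms in a document."""
    return sum(weights[t] for t in set(doc_text.lower().split()))

def continuous_review(docs, is_relevant, seed_ids, batch_size=2):
    """Toy continuous active learning loop (hypothetical, for illustration).

    docs: dict mapping doc_id -> text
    is_relevant: callable standing in for the human reviewer's judgment
    seed_ids: doc ids judged before the first ranking
    Returns the order in which documents were reviewed.
    """
    weights = Counter()   # term -> running weight; unseen terms count as 0
    judged = {}
    order = []

    def learn(doc_id):
        # Record the reviewer's call and feed it straight back into the weights.
        label = is_relevant(doc_id)
        judged[doc_id] = label
        order.append(doc_id)
        for term in set(docs[doc_id].lower().split()):
            weights[term] += 1 if label else -1

    for doc_id in seed_ids:
        learn(doc_id)

    while len(judged) < len(docs):
        # Re-rank every unjudged document with the current weights.
        remaining = [d for d in docs if d not in judged]
        remaining.sort(key=lambda d: score(docs[d], weights), reverse=True)
        for doc_id in remaining[:batch_size]:
            learn(doc_id)
    return order

# Made-up sample collection: three relevant documents, two not.
docs = {
    "a": "merger pricing memo",
    "b": "lunch menu friday",
    "c": "pricing agreement draft",
    "d": "holiday party menu",
    "e": "merger agreement signed",
}
relevant = {"a", "c", "e"}
order = continuous_review(docs, lambda d: d in relevant, seed_ids=["a", "b"])
print(order)  # ['a', 'b', 'c', 'e', 'd']
```

After the two seeds are judged, the relevant documents "c" and "e" rise to the top of the next ranking and the irrelevant "d" is reviewed last, which is the effect the continuous approach is after.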

Predict Lets Your Review Team Work Alongside Your Experts

TAR 1.0 systems are built around the notion that a single expert must do all of the training. This means the review team cannot start until the expert finishes that work. Often that training involves clicking through thousands of randomly selected documents.

With Predict, your expert and your review team work in tandem. Use your trial team to find as many good documents as possible, using keyword search, witness interviews, etc. Feed them into the system as initial seeds and then let the review team get going. Their ongoing judgments are automatically fed back to the system for continuous active learning. Experts can then use our QC module to review and correct the team’s work.

This lets the review start right away (a key requirement in many cases). This way, the expert is not a bottleneck for the process and bills fewer high-dollar hours doing basic review.

Using Experts for QC Improves Results

Our tests show that this approach of using the expert to QC the review team works as well as or better than the standard expert-led approach. Here is one sample from our testing where you can see that this approach achieved the best results:

[Figure: Best Approach Yield Curve]

This approach also saves the client on review costs. The following example, based on 5,000 documents, illustrates the magnitude of savings this approach can achieve:

In the second example, we had 20 reviewers tag all 5,000 of the documents. The expert then QC'd the 500 documents that the system thought needed a second look. For the larger reviews more typical in a TAR process, the savings in both time and money are significant.

Predict Can Use As Many Manual Seeds As You Can Find

Most TAR systems eschew manual seeds (selected by a human rather than a random process). This is based on the belief that letting the lawyers find and add key documents will bias the system. The system can do a better job, the theory goes, by letting it find key documents through random selection.

Lawyers with experience in the process will tell you how frustrating it can be to sit and click through hundreds or thousands of random documents in the hope that they will find one that is relevant. "Why do I have to keep clicking?" they ask. "Why can't I use my intellect and training to get the process started?"

We take the opposite approach. Find as many key documents as you can and add them to the system as manual seeds. Hundreds, thousands or even tens of thousands—the more seeds you add, the better the yield curve.

Because Predict continuously ranks documents, your yield curve continues to improve as you add back the documents reviewed by your team. Whether you add back 10,000 judged seeds or even 100,000, you will see the curve improve in most cases.

While some TAR companies will tell you that they fear the expert will bias the system, their real concern is that their systems can't handle more than a limited number of manual seeds.

Protecting Against Expert Bias

To combat any possibility of bias using Predict, we recommend two steps:

  1. Users should run automated samples during the course of training and review. These will automatically include random and diversity samples in the mix.

    A contextual diversity sample presents documents that are different from anything already seen and judged. It is designed to show you documents you might otherwise have missed.

  2. In order to create a yield curve, you must run a systematic random sample. This presents you with documents drawn from all parts of the rankings and helps ensure that you have reviewed all kinds of potential seeds.

    After you create a yield curve, and assuming you don't want a final cutoff of your documents, you can add the systematic samples as additional seeds.

    [Figure: Yield Curve]

    This will typically have the benefit of improving the ranking curve further. Here is an example of such a case. The more manual seeds you add, the fewer documents you ultimately need to review.
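The systematic random sample mentioned in step 2 can be sketched as follows. This is a minimal illustration of the general technique (every k-th item from a ranked list, starting at a random offset), not a description of how Predict implements it; the function name `systematic_sample` and its parameters are hypothetical.

```python
import random

def systematic_sample(ranked_ids, sample_size, rng=None):
    """Pick evenly spaced documents from a ranked list, starting at a random offset.

    ranked_ids: document ids in ranked order, best first
    sample_size: number of documents to draw
    rng: optional random.Random instance, so a draw can be reproduced
    """
    if not 0 < sample_size <= len(ranked_ids):
        raise ValueError("sample_size must be between 1 and the number of documents")
    rng = rng or random.Random()
    step = len(ranked_ids) / sample_size
    start = rng.random() * step          # random offset within the first interval
    last = len(ranked_ids) - 1
    return [ranked_ids[min(int(start + i * step), last)] for i in range(sample_size)]

# Example: 10 documents drawn from a ranking of 1,000.
ranking = list(range(1000))
sample = systematic_sample(ranking, 10, random.Random(0))
print(sample)
```

Because one document is taken from each equal-sized slice of the ranking, the sample covers the top, middle and bottom of the list rather than clustering in one region.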

Predict Is Integrated with Catalyst Insight

Most TAR systems are standalones that operate separately from the search and review platform. As a result, you have to export files from the processing and review platform, export the metadata and then load it all into your TAR system, where the expert then reviews the data to train the system. When training is completed, the data and tagging must be exported to the review platform before review can begin.

Catalyst Insight is one of the few fully integrated e-discovery platforms. You can process, load, search, analyze, review and produce within one system. This allows you to work more quickly with fewer mistakes.

Insight Predict is an integrated module within Insight for performing Predictive Ranking. Search within Insight to find manual seeds and take advantage of dynamic updates to the review population during your Predictive Ranking process. Your expert can work alongside the review team to tag documents, both within and outside of Predict. The same forms can be used and the system can take advantage of any tags added by any other user at any other time.

Our integrated system speeds review, saves costs and reduces mistakes.

Predict Is Specially Designed To Handle CJK And Other Challenging Languages

Many e-discovery professionals wrongly believe that TAR will not work for Asian-language discovery. We have proven that TAR can work for complex languages such as Chinese and Japanese so long as the document text is properly tokenized.
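To illustrate what tokenizing means here: Chinese and Japanese text has no spaces between words, so a ranking engine needs some other way to split it into terms. One common, simple technique is to break each run of CJK characters into overlapping two-character tokens (bigrams). The sketch below assumes that technique; it is not a description of the tokenizer Predict uses, and the helper names `is_cjk` and `tokenize` are hypothetical.

```python
from itertools import groupby

def is_cjk(ch):
    """Rough check for common CJK ideographs, hiragana and katakana."""
    code = ord(ch)
    return (0x4E00 <= code <= 0x9FFF       # CJK Unified Ideographs
            or 0x3040 <= code <= 0x309F    # Hiragana
            or 0x30A0 <= code <= 0x30FF)   # Katakana

def tokenize(text):
    """Keep whitespace-delimited words whole; split CJK runs into overlapping bigrams."""
    tokens = []
    for chunk in text.split():
        # Split each chunk into alternating CJK and non-CJK runs.
        for cjk, group in groupby(chunk, key=is_cjk):
            run = "".join(group)
            if not cjk:
                tokens.append(run.lower())
            elif len(run) == 1:
                tokens.append(run)
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens

tokens = tokenize("契約書を確認 ASAP")
print(tokens)  # ['契約', '約書', '書を', 'を確', '確認', 'asap']
```

Production systems often use dictionary-based or statistical word segmentation instead of plain bigrams; the point is only that the text must be split into meaningful units before it can be ranked.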

We demonstrated this in a pilot project involving a key Japanese custodian, where we estimated that about 45 percent of the documents were relevant. After tokenizing the text and running it through our Predictive Ranking process, we were able to show that we found 95 percent of the relevant documents in the first 48 percent of the ranking. That left about 5 percent of the relevant documents remaining in the last 52 percent of the population.
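To put those percentages in concrete terms, the short calculation below applies them to a hypothetical population of 100,000 documents. The population size is an assumption chosen for easy arithmetic (the size of the pilot collection is not stated), and the linear-review comparison assumes that reviewing in random order finds relevant documents in proportion to the share reviewed.

```python
def review_savings(population, prevalence, recall, depth):
    """Work out review volumes for a ranked review versus a linear one.

    population: total number of documents (hypothetical)
    prevalence: fraction of documents that are relevant
    recall: fraction of relevant documents found
    depth: fraction of the ranking reviewed to reach that recall
    """
    relevant_total = round(population * prevalence)
    reviewed = round(population * depth)
    relevant_found = round(relevant_total * recall)
    # In random order, reaching the same recall means reviewing about that
    # same fraction of the whole population.
    linear_reviewed = round(population * recall)
    return {
        "relevant_total": relevant_total,
        "reviewed": reviewed,
        "relevant_found": relevant_found,
        "precision": relevant_found / reviewed,
        "linear_reviewed": linear_reviewed,
        "documents_saved": linear_reviewed - reviewed,
    }

result = review_savings(population=100_000, prevalence=0.45, recall=0.95, depth=0.48)
print(result)
```

Under these assumptions, the team reviews 48,000 documents and finds 42,750 of the 45,000 relevant ones, where a linear review would need about 95,000 documents to find the same number.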

Because review of CJK documents typically requires high-cost translators and multilingual reviewers, it can be particularly expensive. Using Predictive Ranking to prioritize CJK documents for review can result in substantial savings.

Predict Is the Most Flexible System On The Market

Many standard TAR systems lock you into a single approach to the process. For example, some systems require that an expert review documents in batches of 40 before the ranking. Once the ranking is completed, you are expected to create a cutoff and then move to a static review.

Insight Predict is one of the most flexible systems on the market. You can start the seeding process just about any way you like—use an initial expert or team of experts if you prefer or simply start your review. You can add manual seeds at any point in the process, not just at the start. You can rank at any time and feed top ranked documents to the reviewer any time you like.

While you can choose to follow a traditional TAR process using Predict, you are never locked in to a mandated approach. This flexibility allows you to adapt Predict to a variety of processes and to pioneer new ways to take advantage of its raw power.

Use Predict any way you like—fit the process to your needs.

Predict Can Rank Documents by Custodian For Upcoming Depositions

For many TAR systems, ranking the documents is an all-or-nothing proposition. If you need to focus on a particular witness for an upcoming deposition, you may have to create separate TAR systems for each witness and may even have to pay additional fees for each.

With Predict, you can create separate rankings for each of your key witnesses without building a new graph database and without additional charges. This can be valuable to deposition teams that must prepare for each witness separately. Seeing a ranked list of hot documents for each witness can help make the depositions more effective and efficient.
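As a rough illustration of what a per-witness ranked list looks like, the sketch below partitions one scored collection by custodian and sorts each group by score. How Predict actually builds its separate rankings is not described here, so treat this as one simple reading; the function name `rank_by_custodian`, the tuple layout and the sample data are all hypothetical.

```python
from collections import defaultdict

def rank_by_custodian(docs, top_n=None):
    """Group scored documents by custodian and sort each group, highest score first.

    docs: iterable of (doc_id, custodian, score) tuples
    top_n: optional cap on the number of documents returned per custodian
    """
    groups = defaultdict(list)
    for doc_id, custodian, score in docs:
        groups[custodian].append((score, doc_id))
    return {
        custodian: [doc_id for _, doc_id in
                    sorted(items, key=lambda pair: pair[0], reverse=True)][:top_n]
        for custodian, items in groups.items()
    }

# Made-up scores for two witnesses.
scored = [
    ("DOC-1", "Smith", 0.91),
    ("DOC-2", "Jones", 0.40),
    ("DOC-3", "Smith", 0.15),
    ("DOC-4", "Jones", 0.88),
    ("DOC-5", "Smith", 0.67),
]
by_witness = rank_by_custodian(scored, top_n=2)
print(by_witness)  # {'Smith': ['DOC-1', 'DOC-5'], 'Jones': ['DOC-4', 'DOC-2']}
```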

Predict Can Quickly Find Hot Documents in Productions from Opposing Parties

TAR is generally thought of as useful only for documents to be reviewed before a production. However, Insight Predict can be just as effective in finding relevant documents from opposing party productions as it is for your own files.

Simply follow the same processes you would with your own documents. Search for good seeds, pull random samples or use automated samples to get started. Or simply start your review.

Whichever approach you choose, Predict can help you find important documents more quickly than the alternative—slow linear review of the entire production. Just start working down the rankings until you have found the documents you need to examine your next witness.

These are just some of the ways that Insight Predict differs from traditional TAR systems. They help explain why our unique, integrated Predictive Ranking system is turning heads across the country and around the world.

About Catalyst

Catalyst designs, builds and hosts the world’s fastest and most powerful document repositories for large-scale discovery and regulatory compliance. We back our technology with a highly skilled Professional Services team and a global partner network to ensure the best e-discovery experience possible.

