Author Archives: Bayu Hardi

Catalyst’s Report from TREC 2016: ‘We Don’t Need No Stinkin Training’

One of the biggest and most enduring debates among Technology Assisted Review (TAR) experts revolves around the method and amount of training you need to get optimal[1] results from your TAR algorithm. Over the years, experts have prescribed a variety of approaches, including:

  1. Random Only: Have a subject matter expert (SME), typically a senior lawyer, review and judge several thousand randomly selected documents.
  2. Active Learning: Have the SME review several thousand marginally relevant documents chosen by the computer to assist in the training.
  3. Mixed TAR 1.0 Approach: Have the SME review and judge a mix of documents: some selected at random, some found through keyword search, and others selected by the algorithm to help it find the boundary between relevant and non-relevant documents.
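
To make the difference between these three approaches concrete, here is a rough Python sketch of how each one might pick documents for the SME to review. Everything in it is an assumption made for illustration: the function names, the document IDs, the relevance scores between 0 and 1, and the 0.5 boundary are hypothetical and do not come from any particular TAR product.

```python
import random


def random_only(doc_ids, k, seed=0):
    """Approach 1: a simple random sample of k documents."""
    rng = random.Random(seed)
    return rng.sample(list(doc_ids), k)


def active_learning(scores, k, boundary=0.5):
    """Approach 2: the k documents whose score sits closest to the
    (assumed) boundary between relevant and non-relevant."""
    return sorted(scores, key=lambda d: abs(scores[d] - boundary))[:k]


def mixed(doc_ids, scores, keyword_hits, k_each, seed=0):
    """Approach 3: combine random, keyword, and boundary picks,
    dropping duplicates while keeping the order of selection."""
    candidates = (
        random_only(doc_ids, k_each, seed)
        + sorted(keyword_hits)[:k_each]
        + active_learning(scores, k_each)
    )
    picks = []
    for d in candidates:
        if d not in picks:
            picks.append(d)
    return picks


# Hypothetical scores from an earlier ranking pass (assumed data).
scores = {"d1": 0.95, "d2": 0.52, "d3": 0.10, "d4": 0.47, "d5": 0.80}
keyword_hits = {"d1", "d3"}

print(random_only(scores, 3))
print(active_learning(scores, 2))
print(mixed(list(scores), scores, keyword_hits, 2))
```

In this toy data, the active learning pick favors d2 and d4 because their scores are nearest the boundary, while the mixed approach simply pools all three sources into one training batch.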
