The first post in this series discussed current developments in Artificial Intelligence (AI) generally and its application to law. In this post, we take a closer look at the AI tools and companies at work in legal research and ediscovery.
Artificial intelligence is hard at work in the law — for example, in legal research and ediscovery — though often there is no “AI Inside” label on the box.
Lexis and Westlaw have applied natural language processing (NLP) techniques to legal research for 10-plus years. No doubt Bloomberg BNA does as well. After all, the core NLP algorithms were all published in academic journals long ago and are readily available. The hard (very hard) work is practical implementation against good data at scale. Legal research innovators like Fastcase and RavelLaw have done that hard work, and added visualizations to improve the utility of results.
Recently, ROSS Intelligence has been applying IBM Watson’s Q&A technology to legal research on bankruptcy topics, after winning a finalist spot in an IBM Cognitive Computing Competition among 10 universities. After building and training the data set, ROSS invites users to evaluate search results and feeds those evaluations back to the engine to continue tuning it. That feedback loop is the essence of machine learning: recommendation engines at Netflix and Amazon work the same way, as do Google’s feedback loops based on what we do with the search results we’re shown. No date for commercial release of ROSS has been announced.
Last October, Thomson Reuters, publisher of Westlaw (and incidentally, the Legal Executive Institute blog), announced a collaboration with IBM to use Watson across TR’s information businesses. Although nothing was said publicly about TR’s specific plans for Watson, one could speculate that the vast trove of legal content in Westlaw and the army of subject matter experts in the company could together do impressive things to improve legal research. Watson needs big data and training, at least initially by people: TR has both.
On February 1, at a private “innovation summit,” TR teased the legal industry with hints that Watson Esq. will ride into town with a beta service for financial services regulation by the end of this year. Jean O’Grady’s commentary is, as usual, acute.
Take note of the timeline: even a company with the immense resources of content and expertise of Thomson Reuters, even in partnership with IBM, needs more than a year to get to beta with an AI legal research product. Why? Because neither AI nor Watson is magic. It takes time, human expertise, and painstaking effort to assemble useful data sets, analyze the content, train the algorithms and test the results. The broader the targeted topic, the greater the effort. For perspective, the IBM Jeopardy team’s account of their work is excellent.
Technology-assisted review (TAR, or predictive coding) applies natural language processing and machine learning techniques to the gigantic data sets of ediscovery. Recommind, Equivio (now part of Microsoft), Content Analyst and other vendors develop or license these tools. TAR has been proven to be faster, better, cheaper and much more consistent than human-powered review (let’s have another initialism: HPR). See, for example, the paper by University of Waterloo’s Gordon V. Cormack and Wachtell, Lipton, Rosen & Katz’s Maura R. Grossman, Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery. (The story of Grossman and Cormack’s work was well told recently by Susan Beck in The American Lawyer.)
Yes, it is assisted review, in two senses. First, the technology needs to be assisted; it needs to be trained by senior lawyers very knowledgeable about the case. Second, the lawyers are assisted by the technology, and the careful statistical thinking that must be done to use it wisely. Thus, lawyers are not replaced, though they will be fewer in number.
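To make that two-way assistance concrete, here is a toy sketch in Python of the basic loop: lawyers code a small seed set, a model learns from those decisions, and the unreviewed documents are ranked so the likeliest relevant ones are read first. Everything in it is an assumption for illustration. The naive Bayes scoring is a simple stand-in rather than the method any vendor named above actually uses, and the function names and sample documents are invented.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a document into lowercase words (deliberately crude)."""
    return text.lower().split()


def train(labeled):
    """Learn word statistics from (text, is_relevant) pairs coded by lawyers."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in labeled:
        doc_counts[is_relevant] += 1
        word_counts[is_relevant].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def relevance_score(model, text):
    """Log-odds that a document is relevant; higher means more likely relevant."""
    word_counts, doc_counts, vocab = model
    # Prior log-odds from the seed set, with add-one smoothing.
    score = math.log((doc_counts[True] + 1) / (doc_counts[False] + 1))
    totals = {c: sum(word_counts[c].values()) for c in (True, False)}
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        p_rel = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_not = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        score += math.log(p_rel / p_not)
    return score


def rank_for_review(model, unreviewed):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(unreviewed, key=lambda t: relevance_score(model, t), reverse=True)


# Hypothetical seed set coded by a senior lawyer on the case.
seed = [
    ("merger pricing agreement with competitor", True),
    ("pricing call with competitor on merger terms", True),
    ("lunch order for friday", False),
    ("holiday party schedule friday", False),
]
model = train(seed)
queue = rank_for_review(model, ["friday lunch schedule", "competitor pricing terms"])
print(queue)
```

In a real matter the loop repeats: lawyers code the top of the queue, those decisions are added to the training set, and the model is retrained.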
Done right, TAR is both powerful and reliable. Doing it right isn’t easy. One needs to understand the principles, and even some of the statistical mathematics, especially when appearing in court to argue that the outcomes are defensible and consistent with the standards of Federal Rule of Civil Procedure 26 and comparable rules in other courts.
A good place to start the journey to understanding is TAR for Smart People, a book by John Tredennick, one of the pioneers of ediscovery (and legal technology generally). TAR for Smart People is a superb guide to a critical and often misunderstood topic. The book is clear but technically deep, founded on fact, balanced and engaging. Who knew statistical sampling could be fun?
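For a small taste of that sampling, here is a sketch in Python of one common check: draw a random sample from the documents the tool set aside as not relevant, have a lawyer review the sample, and estimate what fraction of the discard pile is actually relevant (often called elusion). The sketch assumes a simple random sample and uses the Wilson score interval, which is one standard choice and not necessarily the method used in the book or required by any court. The function name and the sample figures are invented for illustration.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score confidence interval for a proportion.

    hits: relevant documents found in the sample
    n:    sample size
    z:    normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0 <= hits <= n:
        raise ValueError("hits must be between 0 and n")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical numbers: 1,500 documents sampled from the discard pile,
# 15 of which the reviewing lawyer judged relevant.
low, high = wilson_interval(hits=15, n=1500)
print(f"Estimated elusion: {15 / 1500:.2%} (95% interval {low:.2%} to {high:.2%})")
```

The interval matters as much as the point estimate: a small sample can produce a reassuring number with a range too wide to defend.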
In scale and impact on costs, TAR is the success story of machine learning in the law. It would be even bigger but for the slow pace of adoption by both lawyers and their clients.
In the next and final installment in this series of blog posts on AI, we’ll look at tools and companies working on issues of compliance, contract analysis, case prediction and document automation.