The Future of Predictive Coding - Rise of the Evidentiary Expert?
Reprinted with permission.
What’s the value of a word?
Ponder these mind-boggling statistics, courtesy of the ABA 2012 Litigation Section Annual Conference:
Some companies estimate that for every 340,000 pages of information preserved for litigation, only 1 is actually used. In addition, discovery comprises approximately 50% of the cost of litigation.
Like a dog chasing its own tail, technology has been forced to generate new solutions to deal with the escalating costs and burdens associated with legal review of massive amounts of electronically stored information.
Welcome to computer-assisted document coding and review, sometimes better known in the legal industry as predictive coding. Thanks in part to three recent cases on predictive coding (Da Silva Moore, Kleen Products, LLC, and Global Aerospace Inc.), this relatively novel technique is now garnering recognition and, in one seminal case, judicial approval.
In the ground-breaking case of Da Silva Moore v. Publicis Groupe, Case No. 11-cv-01279 (S.D.N.Y. April 26, 2012), the U.S. District Court for the Southern District of New York became the first court to officially approve the use of predictive coding as an acceptable way of reviewing electronically stored documents in certain cases.
What is Predictive Coding?
Although definitions can differ, what is commonly referred to as predictive coding – perhaps more appropriately called computer-assisted document coding and review – is a human-driven, human-supervised technique of utilizing computer technology to review, analyze, and process large sets of electronically stored data for relevance, privilege, priority, issue-relation, or other thematic patterns.
According to a report submitted at the ABA 2012 Litigation Section Annual Conference, predictive coding involves the development of decision-making criteria that are based on a training set and then applied to a larger body of data for the purpose of making predictions. At the heart of predictive coding lies the concept of “supervised learning,” defined as “an algorithm that learns from human decisions and then has the ability to apply those decisions to new data.”
Da Silva Moore: A Revealing Look
Although the parties agreed to use predictive coding in Da Silva Moore, they disagreed about its implementation. A closer look at the case reveals how the process plays out in litigation and raises practical questions, including whether proper implementation will require the use of experts.
The stipulation specifies that the defendants first identify a small number of documents, called the “initial seed set,” which is representative of the categories to be reviewed and coded. The seed sets are then each applied to the relevant category, which begins the “software training process” – meaning the software uses the seed sets to prioritize and identify similar documents within the larger body of stored documents.
Defendants then review and code a “judgmental sample” of the “software-suggested” documents, ensuring they are properly categorized, and “calibrate” the process by recoding the documents. In this manner, the software is trained from the new coding, and the entire process is repeated in what is called an iterative training process.
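The iterative training process described above can be sketched in a few lines of Python. This is only a toy illustration, not the software or protocol used in Da Silva Moore: the word-counting scorer, the function names (train, score, iterative_review, reviewer), and the sample documents are all assumptions made up for the example, and commercial tools rely on far more sophisticated algorithms.

```python
from collections import Counter


def train(coded_docs):
    """Learn per-word weights from human-coded (text, is_relevant) pairs."""
    weights = Counter()
    for text, is_relevant in coded_docs:
        for word in set(text.lower().split()):
            weights[word] += 1 if is_relevant else -1
    return weights


def score(weights, text):
    """Score a document by summing the weights of its distinct words."""
    return sum(weights[word] for word in set(text.lower().split()))


def iterative_review(seed_set, corpus, human_code, rounds=3, sample_size=2):
    """Repeat: train, rank uncoded documents, have a human code the top few."""
    coded = list(seed_set)
    remaining = list(corpus)
    for _ in range(rounds):
        if not remaining:
            break
        weights = train(coded)
        # "Software-suggested" documents: highest-scoring uncoded ones first.
        remaining.sort(key=lambda doc: score(weights, doc), reverse=True)
        sample, remaining = remaining[:sample_size], remaining[sample_size:]
        # The human decision is what the next round of training learns from.
        coded.extend((doc, human_code(doc)) for doc in sample)
    return train(coded), coded


# Hypothetical data: a two-document seed set and a four-document corpus.
seed_set = [("pay discrimination complaint", True), ("lunch menu friday", False)]
corpus = [
    "promotion denied pay gap",
    "friday lunch order",
    "complaint about promotion",
    "menu for team lunch",
]


def reviewer(doc):
    """Stand-in for the human reviewer's relevance call (an assumption)."""
    return any(word in doc.split() for word in ("pay", "promotion", "complaint"))


weights, coded = iterative_review(seed_set, corpus, reviewer)
```

Even in this toy version, every weight traces back to a human coding decision, which is why a miscoded seed document carries forward into each later round.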
The Da Silva Moore plaintiffs objected to the stipulation’s protocols, arguing, inter alia, that the predictive coding methodology utilized by the defendants lacked generally accepted standards of reliability, and violated Federal Rule of Civil Procedure 26 and Federal Rule of Evidence
Judge Carter disagreed, however, pointing out that the protocol contains standards for measuring reliability, requires active participation from plaintiffs at various stages of the process, and provides for judicial review in the event of a dispute prior to final production.
Specifically, Judge Carter concluded that plaintiffs’ arguments challenging the reliability of the method were premature, stating,
“It is difficult to ascertain that the predictive software is less reliable than the traditional keyword search. Experts were present during the February 8 conference and Judge Peck heard from these experts. The lack of a formal evidentiary hearing at the conference is a minor issue because if the method appears unreliable as the litigation continues and the parties continue to dispute its effectiveness, the Magistrate Judge may then conduct an evidentiary hearing.”
Da Silva Moore paves the way for the use of predictive coding as a defensible method of discovery in appropriate cases, but perhaps foreshadows potentially thorny pretrial issues for its future. For example, how will parties agree upon which custodians will be searched? How will issue tags used in the coding process be determined? How many documents must be reviewed in proportion to the overall corpus in order to ensure a statistically reliable number of representative and responsive documents? How many iterative rounds are necessary?
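To give a feel for the sample-size question, here is a rough sketch of the textbook formula for estimating a proportion (for instance, the share of responsive documents in a corpus) at a chosen confidence level and margin of error, with a finite-population correction. It is not the method prescribed by the court or by the Da Silva Moore protocol, the function name and default values are assumptions for illustration, and it says nothing about how many responsive documents a review actually finds.

```python
import math
from statistics import NormalDist


def sample_size(confidence=0.95, margin=0.02, proportion=0.5, population=None):
    """Documents to sample so a proportion estimate falls within +/- margin."""
    # Two-sided z-value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # proportion=0.5 is the most conservative (largest-sample) assumption.
    n = z ** 2 * proportion * (1 - proportion) / margin ** 2
    if population is not None:
        # Finite-population correction for a corpus of known size.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


# Hypothetical corpora of one million and ten million documents.
one_million = sample_size(population=1_000_000)
ten_million = sample_size(population=10_000_000)
```

Under these assumptions the answer is roughly 2,400 documents for either corpus. The required sample depends mostly on the confidence level and margin of error the parties can agree on, and hardly at all on the size of the corpus.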
The Rise of the Evidentiary Expert …
Which raises the question: Will recent court acceptance of predictive coding result in the rise of an “evidentiary expert”?
The question arises because some critics of predictive coding claim that even small errors can balloon into massive deficiencies. Arguably, one small mistake, especially early in the initial coding process, can turn into big problems and potentially result in false positives or missed documents.
As one commentator has noted, “An expert in the case must carefully train the program in order for it to be able to identify the correct documents with accuracy equal to that of a human reviewer; even a tiny mistake in the algorithm can turn into huge deficiencies in quality. It is better to use a less advanced tool very well than to place an extremely complex tool in the hands of someone who doesn’t know how to use it.”
Whether this requires the expertise of a linguist to code and determine the initial sample set used in computer training, then correct and recode during iterative review; a statistician who calculates error rates and can defend the process as relevant and the methodologies as reliable; or a technology expert who determines the number of iterative rounds required to stabilize training of the software, the technique requires a legal team, at least one member of which will likely need to be an expert.
The Bottom Line
Although it is the first case to officially approve the use of predictive coding, Da Silva Moore is not an e-discovery panacea that opens the floodgates for using predictive coding in every case. If one digs a little deeper, there are many questions yet unanswered about the future of predictive coding, as well as valuable lessons to be learned from Da Silva Moore, which we’ll discuss in a future post.
Meanwhile, do you think predictive coding will continue to gain acceptance in the courts, resulting in increased demand for certain kinds of specialized evidentiary experts?
This article was originally published in BullsEye, an expert witness and litigation news blog published by IMS ExpertServices. IMS ExpertServices is a full service expert witness and litigation consultant search firm, focused exclusively on providing best-of-class experts to attorneys. We are proud to be the choice of nearly all of the AmLaw Top 100.