Reprinted with permission.
The Future of
Predictive Coding (Part II) – Caveats Revealed
By Maggie Tamburro
What’s the value of a word?
Ponder these mind-boggling statistics, courtesy of
the ABA 2012 Litigation Section Annual Conference:
Some companies estimate that for every 340,000 pages of
information preserved for litigation, only one is actually used. In addition, discovery comprises approximately 50%
of the cost of litigation.
Like a dog chasing its own tail, technology has been forced to generate new
solutions to deal with the escalating costs and burdens associated with legal review of massive amounts of
electronically stored information.
Welcome to computer-assisted document coding and review, better known in the legal industry as
predictive coding. Thanks in part to three cases that have recently emerged on predictive coding (Da Silva Moore,
Kleen Products, LLC, and Global Aerospace Inc.), this relatively novel technique is now garnering recognition.
In the ground-breaking case of Da Silva Moore v. Publicis Groupe, Case No. 11-cv-01279 (S.D.N.Y. April 26,
2012), the U.S. District Court for the Southern District of New York became the first court to officially approve
the use of predictive coding as an acceptable way of reviewing electronically stored documents in certain cases.
What is Predictive Coding?
Although definitions can differ, what is commonly referred to as predictive coding – perhaps more appropriately called
computer-assisted document coding and review – is a human-driven, human-supervised technique of utilizing computer
technology to review, analyze, and process large sets of electronically stored data for relevance, privilege,
priority, issue-relation, or other thematic patterns.
According to a
report submitted at the ABA 2012 Litigation Section Annual Conference, predictive coding involves the
development of decision-making criteria that are based upon a training set and then applied to a larger body of
data for the purpose of making predictions. At the heart of predictive coding lies the concept of "supervised
learning," defined as "an algorithm that learns from human decisions and then has the ability to apply those
decisions to new data."
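To make the "supervised learning" idea concrete, consider this purely illustrative Python sketch: a toy word-frequency classifier that learns from a handful of human-coded documents and applies those decisions to a new one. The documents, labels, and scoring rule are all invented for illustration; real predictive coding software uses far more sophisticated (and proprietary) algorithms.

```python
from collections import Counter

# Hypothetical hand-coded documents: the human decisions the
# algorithm "learns" from. Texts and labels are invented.
training = [
    ("quarterly earnings report attached", "relevant"),
    ("merger agreement draft for review", "relevant"),
    ("lunch menu for friday", "not_relevant"),
    ("office party photos", "not_relevant"),
]

def train(labeled_docs):
    """Count how often each word appears under each label."""
    model = {}
    for text, label in labeled_docs:
        model.setdefault(label, Counter()).update(text.lower().split())
    return model

def predict(model, text):
    """Score each label by overlap with its word counts; pick the best."""
    words = text.lower().split()
    def score(label):
        counts = model[label]
        total = sum(counts.values())
        return sum(counts[w] / total for w in words)
    return max(model, key=score)

model = train(training)
print(predict(model, "draft merger agreement"))  # prints "relevant"
```

The key point mirrors the definition above: the algorithm encodes human coding decisions (the training labels) and extends them to documents no human has yet reviewed.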
Da Silva Moore: A Revealing Look
Although the parties
agreed to use predictive coding in Da Silva Moore, they disagreed about its implementation. A closer look at the
case reveals how the process plays out in litigation and raises practical questions, including whether proper
implementation will require the use of experts.
Central to the opinion issued by Magistrate Judge Peck
approving predictive coding (which was later adopted by Judge Carter on April 26) is a stipulation submitted by
the parties which details the protocols governing the defendant’s e-discovery production.
The stipulation specifies that the defendants first identify a small number of documents,
called the "initial seed set," which is representative of the categories to be reviewed and coded. The seed sets
are then each applied to the relevant category, which begins the "software training process" – meaning the
software uses the seed sets to prioritize and identify similar documents within the larger body of stored
data. Defendants then review and code a "judgmental sample" of the "software-suggested" documents,
ensuring they are properly categorized, and "calibrate" the process by recoding the documents. In this manner, the
software is trained from the new coding, and the entire process is repeated in what is called an iterative
process.
The Da Silva Moore plaintiffs objected to the stipulation’s
protocols, arguing, inter alia, that the predictive coding methodology utilized by the defendants lacked generally
accepted standards of reliability, and violated Federal Rule of Civil Procedure 26 and Federal Rule of Evidence 702.
Judge Carter’s Opinion
Judge Carter disagreed, however, pointing out that the protocol
contains standards for measuring reliability, requires active participation from plaintiffs at various stages of
the process, and provides for judicial review in the event of a dispute prior to final production.
Specifically, Judge Carter concluded that plaintiffs’ arguments challenging the reliability of the method were
premature: "It is difficult to ascertain that the predictive software is less reliable than the
traditional keyword search. Experts were present during the February 8 conference and Judge Peck heard from these
experts. The lack of a formal evidentiary hearing at the conference is a minor issue because if the method appears
unreliable as the litigation continues and the parties continue to dispute its effectiveness, the Magistrate Judge
may then conduct an evidentiary hearing."
Da Silva Moore paves the way for the use of predictive coding as
a defensible method of discovery in appropriate cases, but perhaps foreshadows potentially thorny pretrial issues
for its future. For example, how will parties agree upon which custodians will be searched? How will issue tags
used in the coding process be determined? How many documents must be reviewed in proportion to the overall corpus
in order to ensure a statistically reliable number of representative and responsive documents? How many iterative
rounds are necessary?
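On the sample-size question, one common statistical answer (offered here as an illustration, not as anything prescribed by the court) is the standard formula for estimating a proportion, such as the fraction of responsive documents, to within a chosen margin of error:

```python
import math

def sample_size(z=1.96, margin=0.05, p=0.5):
    """Documents to sample so the estimated proportion of responsive
    documents is within +/- margin at the confidence level implied by z
    (z = 1.96 for 95%). p = 0.5 is the conservative worst case."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# About 385 randomly sampled documents bound the estimate to
# +/- 5 points at 95% confidence, largely regardless of corpus size.
print(sample_size())              # prints 385
print(sample_size(margin=0.02))   # prints 2401
```

A notable consequence: for large corpora the required sample depends on the desired precision, not on the total number of documents, which is part of why sampling-based validation is attractive in massive reviews.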
The Rise of the Evidentiary Expert …
This raises the question:
Will recent court acceptance of predictive coding result in the rise of an "evidentiary expert"? The
question arises because some who have criticized the use of predictive coding claim that even small errors can
balloon into massive deficiencies. Arguably, one small mistake, especially early in the initial coding process, can
turn into big problems, potentially resulting in false positives or in responsive documents being missed altogether.
As one commentator has noted, "An
expert in the case must carefully train the program in order for it to be able to identify the correct documents
with accuracy equal to that of a human reviewer; even a tiny mistake in the algorithm can turn into huge
deficiencies in quality. It is better to use a less advanced tool very well than to place an extremely complex
tool in the hands of someone who doesn’t know how to use it."
Whether this requires the expertise of a
linguist to code and determine the initial sample set used in computer training, then correct and recode during
iterative review; a statistician who calculates error rates and can defend the process as relevant and its
methodologies as reliable; or a technology expert who determines the number of iterative rounds required to
stabilize training of the software, the technique requires a legal team – at least one member of which will likely
need to be an expert.
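The "error rates" such a statistician might calculate and defend are commonly precision (how many software-flagged documents are truly responsive) and recall (how many responsive documents the software actually found), measured against a human-reviewed sample. A minimal sketch, with invented counts:

```python
def error_rates(true_pos, false_pos, false_neg):
    """Precision and recall from a human-reviewed sample.
    true_pos:  flagged by software AND confirmed responsive
    false_pos: flagged by software but NOT responsive
    false_neg: responsive but missed by the software"""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Illustrative counts from a hypothetical reviewed sample:
p, r = error_rates(true_pos=90, false_pos=10, false_neg=30)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.90 recall=0.75
```

Low precision means wasted review of false positives; low recall means responsive documents missed altogether, which is exactly the failure mode critics of predictive coding worry about.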
The Bottom Line
Although the first to officially approve the use of predictive coding, the
Da Silva Moore case is not an e-discovery panacea that opens the flood gates for using predictive coding in every
case. If one digs a little deeper, there are many questions yet unanswered about the future of predictive coding,
as well as valuable lessons to be learned from Da Silva Moore, which we’ll discuss in a future post.
Meanwhile, do you think predictive coding will continue to gain acceptance in the courts, resulting in increased
demand for certain kinds of specialized evidentiary experts?
This article was originally published in
BullsEye, an expert witness and litigation
news blog published by IMS ExpertServices. IMS ExpertServices is a full service
expert witness and litigation consultant search
firm, focused exclusively on providing best-of-class experts to attorneys. We are proud to be the choice of nearly
all of the AmLaw Top 100.