October 01, 2008

Seeing the Future of Search in E-Discovery

By Kelly R. Young
Legal Tech Newsletter
September 26, 2008


The practice of "search" as part of electronic discovery is evolving before our eyes. Suddenly, what was once deemed industry standard is insufficient. Keyword search, the legal profession's preferred method for sifting through large collections of electronically stored information to find relevant or privileged information, had been widely accepted by courts and the legal community because its effectiveness was assumed and unchallenged.

Until now.

THE END OF KEYWORD SEARCHING?

That effectiveness, as well as the ability of attorneys to conduct such searches, has been increasingly subject to judicial scrutiny. These issues were brought to the fore in three cases decided earlier this year: two by Magistrate Judge John Facciola of the D.C. District Court (U.S. v. O'Keefe, 537 F. Supp. 2d 14 (D.D.C. 2008), and Equity Analytics v. Lundin, No. 1:2007cv2033 (D.D.C. March 7, 2008)) and one by Magistrate Judge Paul Grimm of the District of Maryland (Victor Stanley, Inc. v. Creative Pipe, Inc.).

Taken together, these decisions foreshadow how discovery of ESI will be conducted in the future. Of immediate value for legal technologists, the opinions provide critical guidance as to how electronic discovery ought to be conducted in order to successfully defend against challenges should they arise.

VICTOR STANLEY V. CREATIVE PIPE

Grimm's decision in Victor Stanley is the key opinion to date. There, the defendants in a copyright infringement suit claimed attorney-client privilege over documents that they had inadvertently produced to the plaintiff after applying a search protocol to identify which material was not privileged and, therefore, should be produced. Grimm provides explicit guidance as to how a party should review ESI for production in order to avoid waiver of privilege and, by extension, to successfully defend against challenges to a party's use of various search and retrieval methodologies.

The key factor for the Victor Stanley court in determining waiver of privilege was the reasonableness of the precautions taken to prevent inadvertent disclosure of privileged material. Grimm reasoned that in order for the court to determine whether the defendants' search method was reasonably designed to prevent inadvertent disclosure, the defendants would have had to articulate:

which keywords were chosen and how they were used to search the document population;
the rationale for selecting those keywords;
the qualifications of the individuals selecting such keywords to design effective searches; and
whether and to what degree the results of the search were measured for reliability and quality.

Because the defendants in Victor Stanley failed to provide information on any of these four factors, the court held that there was an insufficient basis to conclude that the search protocol was reasonable, and thus that the defendants had waived their privilege with respect to the produced documents. In other words, without evidence of the search's methodology, the effort could not be deemed reasonable; and without a reasonable effort to maintain privilege, the production could not be protected by privilege.

U.S. V. O'KEEFE

Grimm's Victor Stanley opinion builds on two prior decisions (O'Keefe and Equity Analytics) that had been recently handed down by his counterpart in the D.C. District Court, Judge Facciola.

In O'Keefe, Facciola dismissed a defendant's objection to the adequacy of keywords used by the prosecution. Noting the complexity of search in the identification and production of ESI, he concluded that it is a practice best left to experts:

Whether search terms or "keywords" will yield the information sought is a complicated question involving the interplay, at least, of the sciences of computer technology, statistics and linguistics. ... Given this complexity, for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.

Grimm in Victor Stanley echoes that same concern and observes that the proper selection of keywords involves "technical, if not scientific knowledge" and that "while it is universally acknowledged that keyword searches are useful tools for search and retrieval of ESI, all keyword searches are not created equal."

SEARCH IS A SCIENCE

What these cases bring to the fore is the fundamental nature of search in e-discovery: It is a science that transcends the methods that we have all comfortably, if clumsily, employed to date.

At its core, search is not fundamentally a technological effort, nor is it a legal question. The science of search is fundamentally a problem of information retrieval. As such, a problem of search must be solved with information retrieval tools, and that requires the application of multiple expert competencies, including not only law and information technology, but also linguistics, statistics, computer science and subject matter expertise. Neither lawyers nor IT personnel -- whether CIOs, system administrators, information architects or dedicated litigation support staff -- have all of these competencies, nor should they.

Along with everyone else, those of us in the legal community long ago fell under the spell first of Westlaw and Lexis -- and then Google -- and came to think that keywords were magic bullets when it came to working with large volumes of ESI. That may explain why it is so difficult for the legal profession to give up the keyword search approach. Keyword searches, combined with Boolean logic, can be effective tools when conducting certain kinds of searches, such as looking for all articles by a particular author with "Exxon" in the title, or simply one review of a newly opened Thai restaurant.

KEYWORDS CAN BE ELUSIVE

But with e-discovery, search becomes an entirely different challenge. An attorney can know a case inside out but still not know which e-discovery search strategy is best, much less which keywords in which combinations are likely to provide him with what he needs. In fact, numerous academic studies offer evidence that attorneys conducting keyword searches do quite poorly even under ideal circumstances. The best-known study, by Blair & Maron, is now more than two decades old. In that seminal research, attorneys constructing Boolean search queries were instructed to search a document collection until they were satisfied that they had retrieved 75 percent of all material relevant to the information requests. Even with a relatively low performance target and no time constraint, the subjects (lead defense attorneys and paralegals) on average retrieved only 20 percent of the relevant documents. More disturbingly, they were confident that they had performed more than three times as well as they actually had performed.
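
The gap described in such studies is usually expressed as recall, the share of all relevant documents that a search actually retrieves. The following minimal Python sketch uses an invented six-document collection and hypothetical helper names (matches, recall_and_precision) to show how a plausible single keyword can miss relevant documents that use different wording. It is an illustration only and does not reproduce the Blair & Maron data.

```python
# Hypothetical toy collection: (document text, relevant per a reviewer's judgment).
# The texts and relevance labels are invented for illustration only.
DOCUMENTS = [
    ("The accident on the platform was reported to the safety board", True),
    ("Notes on the unfortunate incident near the platform", True),
    ("Summary of the event of last Tuesday and its aftermath", True),
    ("Follow-up on the situation our engineers flagged", True),
    ("Lunch menu for the quarterly offsite", False),
    ("Car accident claim filed by an unrelated vendor", False),
]


def matches(text, all_of=(), any_of=()):
    """Boolean match: every term in all_of AND at least one term in any_of."""
    words = set(text.lower().split())
    has_all = all(term in words for term in all_of)
    has_any = not any_of or any(term in words for term in any_of)
    return has_all and has_any


def recall_and_precision(documents, **query):
    """Return (recall, precision) of a Boolean query against labeled documents."""
    retrieved = [relevant for text, relevant in documents if matches(text, **query)]
    relevant_total = sum(relevant for _, relevant in documents)
    hits = sum(retrieved)
    recall = hits / relevant_total if relevant_total else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# A single "obvious" keyword finds one of the four relevant documents.
recall, precision = recall_and_precision(DOCUMENTS, any_of=("accident",))
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Broadening the query to any_of=("accident", "incident", "event", "situation") retrieves all four relevant documents in this toy set, but only because the person writing the query already knows which words the authors happened to use. In a real collection nobody has that knowledge in advance, which is why recall has to be measured rather than assumed.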

It is not that lawyers do not have a strong grasp of language. Rather, it is that language is elusive and complex. The notion that this work is best suited for search experts should not surprise any of us, given the obvious challenges of searching gigabytes of data created in varied business contexts by multiple sources that communicate in unique professional dialects.

In the specific context of ESI discovery, the mandate for legal and IT departments to collaborate still holds true. But while it is necessary, it may no longer be sufficient. Search in e-discovery is an information retrieval discipline that involves expert competencies in addition to law and IT.

THE ROLE OF EXPERTS

Does this mean organizations need to hire linguists, statisticians and a cadre of other experts who practice information retrieval for a living? No, but those experts should not be ignored. Although Judge Facciola may be of the opinion that information retrieval experts are necessary, Judge Grimm suggests that lawyers may be qualified to design search protocols if they can demonstrate that they employed appropriate quality assurance and measurement, for example, by sampling their search results. The standard that the Victor Stanley court sets forth, however, is hardly lenient:

Selection of the appropriate search and information retrieval technique requires careful advance planning by persons qualified to design effective search methodology. The implementation of the methodology selected should be tested for quality assurance; and the party selecting the methodology must be prepared to explain the rationale for the method chosen to the court, demonstrate that it is appropriate for the task, and show that it was properly implemented.

In some, but not all, cases, this standard might require the producing party to engage an expert to design, sample and validate the search methodology. As is always the case under the Federal Rules of Civil Procedure, the need for experts is a function of both the expertise within the litigation team as well as the costs and benefits of such expertise for the case at hand.
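
One common way to "sample and validate" a search is to draw a random sample from the documents the search did not return, have a reviewer judge each sampled document, and estimate how much relevant material was left behind. The Python sketch below assumes that approach; it is not a procedure prescribed by Victor Stanley. The function names, the is_relevant callback (a stand-in for a human reviewer's judgment), the sample size and the simulated data are all hypothetical, and the interval uses the standard Wilson score formula at roughly 95 percent confidence.

```python
import math
import random


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def estimate_missed(unretrieved_ids, is_relevant, sample_size, seed=0):
    """Sample documents the search did NOT return; estimate how many are relevant.

    Returns the number of relevant documents found in the sample and a
    (low, high) range for the count of relevant documents in the whole
    unretrieved set.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(unretrieved_ids, sample_size)
    found = sum(1 for doc_id in sample if is_relevant(doc_id))
    low, high = wilson_interval(found, sample_size)
    total = len(unretrieved_ids)
    return found, (low * total, high * total)


# Simulated example: 10,000 unretrieved documents, 5% of which are relevant.
unretrieved = list(range(10_000))
found, (low, high) = estimate_missed(
    unretrieved, is_relevant=lambda doc_id: doc_id % 20 == 0, sample_size=400
)
print(f"{found} of 400 sampled documents were relevant")
print(f"estimated relevant documents missed: {low:.0f} to {high:.0f}")
```

Recording the sample size, the seed and the reviewer's calls gives a party something concrete to show a court when asked how the results of a search were measured for reliability.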

Reasonableness still reigns, but Victor Stanley makes clear that litigants must do their homework before deciding what is reasonable. In particular, they need to understand the pros and cons of various search methods, and they must be able to explain why they chose one method over another. In essence, they ought to make a reasoned determination of whether their case would benefit from outside expertise. And regardless of the method or technology, lawyers must be prepared to demonstrate that they have exercised reasonableness in selecting any particular approach.

LOOK TO SEDONA, TREC FOR GUIDANCE

So if parties need to thoughtfully choose their particular search methodologies, how should they do so?

Judge Grimm specifically points parties and courts to two resources: The Sedona Conference's Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery and the federal government's Text REtrieval Conference ("TREC") Legal Track initiative. Both provide guidance in determining the reliability and reasonableness of any search methodology underlying the production of ESI.

Looking to the future, Judge Grimm states that "there is room for optimism that as search and information retrieval methodologies are studied and tested, this will result in identifying those that are most effective and least expensive to employ for a variety of ESI discovery tasks."

He goes on to discuss the TREC Legal Track specifically and states:

The goal of the project is to create industry best practices for use in electronic discovery. This project can be expected to identify both cost effective and reliable search and information retrieval methodologies and best practice recommendations, which, if adhered to, certainly would support an argument that the party employing them performed a reasonable ESI search, whether for privilege review or other purposes.

With the support of scholars, industry and government, the TREC Legal Track may very well emerge as the standard by which various vendors and search methodologies can be independently evaluated. In any event, it is a significant initiative that is likely to attract the attention of the bar and bench.

CONCLUSION

How quickly other courts follow Victor Stanley, and to what degree litigation technologists embrace the lessons of TREC and the Sedona Conference, remains to be seen. But today's judicial and industry developments mark the beginning of a fundamental change in the way we conduct e-discovery search. We are learning that the application of human legal and technical expertise, combined with search and retrieval methodologies that are grounded in sound scientific principles, offers all parties the possibility of achieving more accurate, cost-effective and defensible results. This will require litigants to change their approach to discovery in significant ways, but such changes will bring improvements on all fronts.

Kelly R. Young is an attorney in Washington, D.C., and a practice director at H5, a company that provides specialized information retrieval and document analysis services for law firms and corporate legal departments. He can be reached at kyoung@h5.com.
