THE JOY OF BOOLEAN (Thoughts on crafting effective search syntax)

It may not seem obvious, but crafting search syntax is often a difficult part of the search equation. It’s reasonable to assume that most search tools on the market today use Boolean logic to define the search parameters. AND, OR, and NOT are easy enough to grasp, but more advanced techniques may not be. When combining ANDs and ORs, it may be necessary to group search terms in parentheses so that the parts of the query are evaluated in the correct order. Quotation marks may be needed to search for multi-word phrases. Wildcards, proximity searching, field-specific searching (as opposed to searching an entire document), and regular expression searches all add complexity (and confusion).
As an IT professional (assuming that’s who you are), you can’t take it for granted that your legal staff will understand this, at least not initially. If lawyers or paralegals hand over a list of search terms and you enter that list verbatim into a search tool, you are probably not going to get the desired results.

As an example, one of our customers – a pharmaceutical company – was recently involved in a product liability case. They were asked to provide all documents belonging to certain individuals that contained any of a series of keywords. They pointed the search tool at the relevant mail files and used something similar to the following search syntax (exact names and keywords have been changed):

  • “Robert Sable” OR “Wanda Andrews” OR “Abby Miller” OR “Terry Connors” AND asbestos OR cancer OR mesothelioma OR pharmaceutical OR hospital OR chemotherapy

The logic seems simple enough: a series of names, separated by “OR,” which are then, in turn, separated with an “AND” from a series of keywords (themselves separated by “OR”). However, the returned result set was nearly as large as the combined mail files (!) and was virtually unusable. Furthermore, spot checks of the result set found that most of the returned documents seemed totally irrelevant to the search.

What was the problem? The query contained no parentheses to dictate the order in which the AND and OR operators were evaluated, so the “AND” was evaluated before the “OR.” What was returned was essentially the logical equivalent of:

  • “Robert Sable” OR “Wanda Andrews” OR “Abby Miller” OR (“Terry Connors” AND asbestos) OR cancer OR mesothelioma OR pharmaceutical OR hospital OR chemotherapy

(“Terry Connors” AND asbestos) became a single search criterion. Every other name or keyword became its own stand-alone criterion, so a document containing any one of them was returned.
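The same effect can be reproduced in a few lines of Python, whose `and` operator also binds more tightly than `or`. This is only a sketch: the `has` helper and the sample messages are made up for illustration, and a simple substring test stands in for whatever matching a real search tool performs.

```python
def has(text, term):
    """Case-insensitive substring check standing in for a keyword hit."""
    return term.lower() in text.lower()

def ungrouped(text):
    # Same shape as the original query: no parentheses around either list.
    # Python evaluates `and` before `or`, so only the "Terry Connors" and
    # "asbestos" tests are joined; every other test stands alone.
    return (has(text, "Robert Sable") or has(text, "Wanda Andrews")
            or has(text, "Abby Miller")
            or has(text, "Terry Connors") and has(text, "asbestos")
            or has(text, "cancer") or has(text, "mesothelioma")
            or has(text, "pharmaceutical") or has(text, "hospital")
            or has(text, "chemotherapy"))

# Hypothetical messages, invented for this illustration.
samples = [
    "Lunch on Friday? Regards, Robert Sable",       # a name only
    "The hospital parking garage closes at 9pm",    # a keyword only
    "Terry Connors forwarded the asbestos report",  # name and keyword
    "Quarterly budget attached",                    # neither
]

for text in samples:
    print(ungrouped(text), "-", text)
```

The first two messages match even though neither pairs a name with a keyword, which is how a result set can swell to nearly the size of the mail files.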

Actually, two mistakes were made. First of all, parentheses should have been used to completely separate the names from the search terms:

  • (“Robert Sable” OR “Wanda Andrews” OR “Abby Miller” OR “Terry Connors”) AND (asbestos OR cancer OR mesothelioma OR pharmaceutical OR hospital OR chemotherapy)
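When the names and keywords arrive as plain lists from legal staff, the grouped query can be assembled programmatically rather than typed by hand. The sketch below is one way to do that; the function names are hypothetical, and the quoting and operator conventions are assumptions, since the exact syntax varies from one search tool to another.

```python
def quote(term):
    """Wrap multi-word phrases in double quotes; leave single words bare."""
    return f'"{term}"' if " " in term else term

def or_group(terms):
    """Join terms with OR and wrap the whole group in parentheses."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"

def build_query(names, keywords):
    """Require at least one name AND at least one keyword."""
    return or_group(names) + " AND " + or_group(keywords)

names = ["Robert Sable", "Wanda Andrews", "Abby Miller", "Terry Connors"]
keywords = ["asbestos", "cancer", "mesothelioma",
            "pharmaceutical", "hospital", "chemotherapy"]

query = build_query(names, keywords)
print(query)
```

Because each list is always wrapped in its own parentheses, the result does not depend on which operator the search tool happens to evaluate first.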

Second, searching for the users’ names within their own mail files was redundant anyway, since the names alone would probably have matched nearly EVERY document in the databases. This is exactly what occurred: the initial search returned virtually every document from 3 of the 4 users.

In short, fundamentals are essential. Understand how the search tool works. It’s also a good practice to run a sample search against a subset of the actual data to be sure the search is returning reasonable results.
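A sample run of this kind can be as simple as measuring what fraction of a random subset the search matches and flagging a suspiciously high rate. The following is a minimal sketch under stated assumptions: `hit_rate` and `check_sample` are hypothetical helpers, documents are plain strings, and the 50% threshold is an arbitrary placeholder rather than a recommended value.

```python
import random

def hit_rate(documents, matches):
    """Fraction of documents for which matches(doc) is true."""
    if not documents:
        return 0.0
    return sum(1 for doc in documents if matches(doc)) / len(documents)

def check_sample(documents, matches, sample_size=100, max_rate=0.5, seed=0):
    """Run the search on a random subset and report whether the hit rate
    stays at or below max_rate (an assumed, arbitrary threshold)."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    sample = rng.sample(documents, min(sample_size, len(documents)))
    rate = hit_rate(sample, matches)
    return rate, rate <= max_rate

# Hypothetical mail file: nine messages mention a keyword, one does not.
docs = ["hospital visit notes"] * 9 + ["quarterly budget attached"]
rate, looks_reasonable = check_sample(
    docs, lambda doc: "hospital" in doc, sample_size=10)
print(f"hit rate {rate:.0%}, reasonable: {looks_reasonable}")
```

A search that matches most of the sample is a cue to review the syntax before running it against the full data set.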

The keys to effective e-Discovery management are planning ahead and communication. Where litigation is concerned, it is critical to be able to find relevant data efficiently and properly, in a manner that’s defensible in the eyes of the court. Know ahead of time where the legally relevant electronic data is stored. Keep policies, plans, and data maps up-to-date. If there isn’t an electronic discovery procedure in place, make it a priority. And above all, make sure there is cooperative and effective communication between the relevant parties within the organization.
