Is eDiscovery Really the Problem?
Or is it the foundation of our data management strategy, or the lack thereof?
By Linda Sharp
Did you know that the Leaning Tower of Pisa began to lean during the construction process? The architect and engineer had unknowingly built the tower on sand. Yet, the builder continued adding floors, making one side taller than the other, in an attempt to remedy the problem. He didn’t realize that his attempts were futile, as with each floor, the taller side was adding more weight, thus causing the tower to lean even further. He had spent too much money and had too much at stake to turn back. Perhaps we have the same problem in the world of eDiscovery.
Let’s roll up our sleeves and understand the real problem
Recently, you have probably read many articles that claim to provide insight into the benefits of “predictive coding”, “automated document classification” or other forms of technology-assisted review, depending on the service provider that is sponsoring the article. In reality, the factor driving the costs of eDiscovery is the manner in which information is managed and stored.
Let’s dive in! There are largely two types of data stores:
1) Structured Data - this consists largely of databases that store HR information, insurance policies, health care information, accounting data and many other data types. These data sets contain only legitimate business information: the copy of your medical records, the paragraphs that were assembled to create your insurance policy, or the information needed to populate and export your quarterly report. Because the information is maintained in a fielded environment, it is easy to search, and a search returns an accurate and timely result.
2) Unstructured Data - this set of data is largely non-fielded and spans varying data types: social media, Blackberry devices, Bloomberg information, emails, Word, PPT, Excel, and the list goes on. It has no consistent structure from one application to the next; thus the name “unstructured data”. It is this group of data that provides most of the information now contained in what the industry is calling “Big Data”.
Since structured data can be easily obtained by querying the database, this article targets the obstacles to collecting information from unstructured data. This is where the issues with electronically stored information (ESI) largely arise. We’re talking mostly about those emails and working files that you are trying to get your hands on in some meaningful way, so that the information can be reviewed and ultimately produced.
Too much data
We have all read about the staggering amounts of information that are being created every day and how those numbers are growing exponentially. We need to understand why it is such a problem to get to this information. Let’s take a look at email, since it is both the most commonly sought group of information and the largest volume of data sought. I suggest that you take a look at your own mailbox. How much of the information in your mailbox has absolutely nothing to do with the business of the organization? Historically, you might expect over 50% to be non-corporate communications. Additionally, corporations send out business communications regarding the day-to-day operations of the organization: those reminders of your annual enrollment for insurance, the holiday party, etc. Further, most individuals do not work on the same project for their entire career at any given organization. Thus, of the roughly 50% that represents corporate communications, what percentage is actually at issue in your matter? Depending on the individual, it could be as little as 1-5% of their total data volume. Starting to see the problem?
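To put rough numbers on that funnel, here is a minimal sketch of the arithmetic. The 50% and 1-5% shares come from the estimates above; the mailbox size of 100,000 messages and the function name `mailbox_funnel` are purely hypothetical, chosen for illustration.

```python
def mailbox_funnel(total_messages, business_pct=50, at_issue_pct_range=(1, 5)):
    """Estimate how much of a mailbox is business-related and at issue.

    business_pct and at_issue_pct_range are percentages of the TOTAL
    mailbox, following the rough estimates in the text above.
    """
    low_pct, high_pct = at_issue_pct_range
    return {
        "total": total_messages,
        "business": total_messages * business_pct // 100,
        "at_issue_low": total_messages * low_pct // 100,
        "at_issue_high": total_messages * high_pct // 100,
    }


# Hypothetical custodian with 100,000 messages in the mailbox.
funnel = mailbox_funnel(100_000)
print(funnel)
```

Under these assumptions, somewhere between 1,000 and 5,000 of the 100,000 messages are actually at issue, yet a full collection would send every one of them through processing and review.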
Another issue arises in determining whether you have the correct custodians to begin with. As a general rule, the majority of matters brought are filed many years after the event actually occurred. Frequently, the designated custodians have changed positions or are no longer with the organization. It is common for outside counsel to coordinate with individuals that may have had nothing to do with the underlying events. Whether it is a construction defect case, a patent infringement matter, or a contract dispute, you have to get to the right people’s data. As part of the overall process of identifying custodians, counsel may be conducting interviews. Unfortunately, these interviews are often held with individuals that are already overwhelmed with their daily work load and have little time to assist. Yet still, the interviews may not include all of the appropriate individuals. Additional difficulties arise as custodians are attempting to remember events that occurred years earlier, to the extent that they can. Hmmm! This sounds like a recipe for disaster.
Garbage in, Garbage out!
The traditional method for handling ESI is: 1) ID the custodians as best you can, 2) do a full data collection or a selected data pull of their electronic information, 3) maybe do some sampling or early case assessment (ECA) on those custodians’ data, 4) possibly grab additional information based on the findings from the sample or ECA process (thus repeating steps 2-4), 5) review data (as an iterative process), 6) export the production set. Sound familiar? No wonder we have a problem with the costs associated with ESI. How do you know if you have the right custodians? Did you get the right data to start with? What key words should you use? This cumbersome process is compounded as attorneys attempt to identify key words before they are able to start looking at the data. I can’t think of an instance where the saying, “Garbage In, Garbage Out,” holds truer than in the world of ESI. Yet, clients, lawyers and judges struggle to understand why ESI is so expensive and are trying to find the right solution to fix the problem.
Let’s throw technology at it!
Today’s blogs, legal periodicals, articles and, most recently, even case opinions are written around the benefits of various types of technology-assisted document review. Why? Because clients are tired of paying the costs associated with document review. I can’t say that I blame them. We all know the statistics showing that the costs of review can be as high as 10x the costs of processing and hosting combined. Thus, something has to be done. But throwing more technology at a flawed process isn’t going to fix the problem. If you’re trying to build a block wall, you’ll need certain things, but most importantly, a viable foundation. If the foundation that you are starting with, much like that of the Leaning Tower of Pisa, has issues, it doesn’t matter how you stack the blocks; there is going to be a problem. Unfortunately, in the world of ESI, these blemishes may be like the Wizard in the “Wizard of Oz”. You may not know that you have a problem until you pull back the curtain. This could be when Plaintiffs challenge your methodology, when you print off your privilege logs, or when Plaintiffs come across custodians whom you have not identified and whose data you have not collected or reviewed.
What is the problem?
To fix the problem with the costs of ESI, you have to understand the underlying problem. Much like the builders of The Tower of Pisa, we find ourselves continuing along, utilizing the same process, hoping that this strategy will resolve the problem. The reality is it isn’t going to go away until we resolve the underlying issues. For corporations, it is the manner in which information is maintained.
Paper days. In the “old days”, information was placed in labeled folders, which ultimately went into red welds, etc. This information, once the employee decided it was no longer useful, was shipped to storage with an assigned retention period. As the period arrived, absent a legal hold, the information was summarily destroyed. Keep in mind that the folders only contained information that was relevant to the subject of the file.
Today. Let’s contrast the paper days with today. In the “paper days”, our business records didn’t contain those “honey pick up milk” communications, family photographs, etc., as they do today in the electronic world. We complain about the costs associated with ESI, its review and production. However, the largest volume of data either is a non-business record or has absolutely nothing to do with the matter at hand. Yet, we can’t seem to figure out why the costs of document review are so high. Thus, the new buzz: technology-assisted review. This new buzz attempts to reduce the costs associated with document review. The reality is that we need to reduce or eliminate the volume of information that does not have a useful business purpose to begin with. Additionally, we need to reduce the number of instances (copies) of the information so that we know what we have, where it came from, and which custodians have ownership. Lastly, we need to reduce the number of times that this information resides external to the organization. These simple steps, if implemented, would significantly reduce the volume of information that would potentially have to be collected, processed and “reviewed”.
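As a rough illustration of what reducing the number of copies can look like in practice, the sketch below groups collected items by a hash of their content and records which custodians hold each one. It is a simplified example, not a description of any particular product: the custodian names, the sample messages and the function name `group_by_content` are hypothetical, and exact hashing only catches byte-for-byte identical copies, not near-duplicates.

```python
import hashlib
from collections import defaultdict


def group_by_content(items):
    """Group (custodian, content_bytes) pairs by a SHA-256 content hash.

    Returns a dict mapping each unique content digest to the sorted
    list of custodians who hold a copy of that content.
    """
    holders = defaultdict(set)
    for custodian, content in items:
        digest = hashlib.sha256(content).hexdigest()
        holders[digest].add(custodian)
    return {digest: sorted(names) for digest, names in holders.items()}


# Hypothetical collection: two custodians hold the same budget draft.
sample = [
    ("alice", b"Q3 budget draft"),
    ("bob", b"Q3 budget draft"),
    ("bob", b"honey pick up milk"),
]
unique = group_by_content(sample)
print(len(sample), "collected items,", len(unique), "unique")
```

Keeping the list of custodians alongside each unique item preserves the answer to "who had this?" while the content itself only needs to be processed and reviewed once.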
Under the traditional model, we compound the problem of managing corporate information by “collecting” data on a case-by-case basis and sending it to various law firms who, in turn, send it to outside vendors. How many times is the same information, or portions thereof, sent to different providers for different matters? There may be cases where you are compelled to provide assurance that certain information is no longer available. Can you comfortably make such a representation? Was an image of a hard drive or copies of backup devices provided to a service provider or counsel? Does the information exist outside of your environment and, thus, outside of your records retention policies and practices? What processes and protocols do you have in place to ensure that the providers you are working with are not “storing” your information after the conclusion of the matter? What about their backup devices? Keep in mind that data may be responsive to another matter should it raise its ugly head in the future. Since the information is available through one of your prior agents, you may be required to produce that information, or at a minimum, they may be subject to a third-party subpoena.
Bottom line, who has your data?
How do we resolve the problem?
The largest problem that we face today is the sheer volume of non-business information, and of business records that no longer have a useful business purpose, being stored and maintained in our IT infrastructures, only to be collected, processed and reviewed when litigation arises. We need a solution that allows you to maintain your viable corporate information, yet eliminate data that has no useful corporate purpose. This solution needs to ensure that corporate records retention policies and legal requirements for electronic data mirror those for paper documents. Then, when a matter does arise, you aren’t spending precious resources grabbing, processing and reviewing outdated or non-business records. Imagine how much more successful (and less expensive) a technology-assisted document review process could be if it were only working with viable data. What a novel idea...
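To make the paper-style rule concrete, here is a minimal sketch of a retention check applied to an electronic record: once the assigned retention period has passed, and absent a legal hold, the record is eligible for disposal. The seven-year period, the dates and the function names are hypothetical; real retention schedules are set by your records and legal teams.

```python
from datetime import date


def add_years(start, years):
    """Return the same calendar date `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def is_disposable(created, retention_years, on_legal_hold, today):
    """A record may be disposed of only if its retention period has
    expired and it is not subject to a legal hold."""
    if on_legal_hold:
        return False
    return today >= add_years(created, retention_years)


# Hypothetical record created in 2010 with a seven-year retention period.
check_date = date(2020, 1, 1)
print(is_disposable(date(2010, 1, 15), 7, False, check_date))
print(is_disposable(date(2010, 1, 15), 7, True, check_date))
```

The point of the sketch is the order of the checks: the legal hold always wins, and only then does the retention clock decide, which is the same logic the file room applied to boxes in storage.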
What if compliance, records, IT and legal could actually collaborate on a holistic approach to resolve the issues surrounding enterprise data management? This strategy would provide the organization with a total information governance solution. It would reduce the costs of the IT infrastructure, eliminate information in keeping with records and legal policies, as well as provide a streamlined process for handling information.
This technology is available today and has been for a number of years. Unfortunately, many companies are resistant to change, aren’t looking at data holistically, or treat data and business needs as silos rather than resolving the problem. Companies are slowly recognizing that there truly is a better solution.
Let’s fix the problem…
About the Author
Linda Sharp, Esq., MBA is the Associate General Counsel for ZLTechnologies. She is a member of the Los Angeles Chapter of Women in e-Discovery and a well-known author and speaker.