Gravity, physics, and the mysteries of dark data

Cosmologists and astrophysicists have long hypothesized about the existence of “dark matter,” a mysterious, invisible substance inferred from the gravitational behavior of the large-scale structure of the universe. While unseen by telescopes, dark matter may account for most of the total mass of the cosmos, and its presence explains such esoteric observations as the anomalous rotation velocities of galaxies within clusters and the rate of the universe’s overall expansion. In other words, we can’t see it, and most people don’t think about it, but it’s important stuff.

Like dark matter, “dark data” is unseen and thus rarely thought about. Dark data is information that resides in your environment and doesn’t serve any known purpose to your organization. It may exist on file shares, in the cloud, on users’ hard drives, or even on USB sticks or other mobile devices. It undoubtedly served a legitimate purpose at some point (and may even still have some untapped current value), but it’s largely out of sight, out of mind. Dark data is not monitored, not controlled, and like a giant black hole sucking in cosmic debris, it’s growing.

This mystery legacy data is often maintained as an afterthought, something that might be needed down the road. The prevailing attitude may be, “Why throw it away?” Dark data, however, is more than just a storage problem or a potential strain on IT resources. While it may be a hidden asset waiting to be discovered for some legitimate business need, it’s more likely a tremendous liability. After all, this isn’t random data. It’s information about your business practices, your business partners, your employees, even your customers. Below are some risks associated with dark data.

Risk #1
There are a few reasons to be concerned. First, if your security is breached and this data is stolen, what sort of malicious purposes could it be put to? Recent thefts of customer credit and debit card information from Target, TJ Maxx, DSW, and other companies are prime examples, as is the hack of confidential email from Sony.

Risk #2
Perhaps of lesser concern is the possibility of a competitor stealing data to gain access to trade secrets, or even to hidden opportunities lurking in your information stores.

Risk #3
The most likely risk, however, is what may be uncovered in the eDiscovery process should you face litigation. Beyond any potentially incriminating or embarrassing information, the presence of confidential information of any sort (patient records, financial data, or client-matter information) may expose you to substantial financial or legal penalties. In this age of HIPAA (Health Insurance Portability and Accountability Act) and SOX (Sarbanes-Oxley), companies are under tremendous pressure to comply with regulations or suffer the consequences.

So, what can be done to protect yourself and your legacy data? One method is to use encryption to control who has the authority to access information. Another is to create and enforce retention policies to ensure the regular, periodic disposal of data that has outlasted its usefulness. Courts have already recognized that keeping data around forever is simply not physically or financially feasible. An automated process that destroys data is the simplest route, typically with an option that warns users beforehand and provides an opportunity to preserve data that they wish to keep. The key to that latter option is that users should know that what they wish to keep has business relevance. It’s not “dark.”
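
As a rough illustration of the automated disposal process described above, here is a minimal Python sketch. It only decides what should happen to each file and deletes nothing itself. The names FileRecord and plan_retention, the preserved and warned_on fields, and the seven-year and 30-day defaults are all assumptions made for this example, not features of any particular product or a recommended retention period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical record for one file tracked by a retention policy."""
    path: str
    last_modified: datetime
    preserved: bool = False               # a user flagged it as business-relevant
    warned_on: Optional[datetime] = None  # when the owner was last warned, if ever


def plan_retention(records, now,
                   retention=timedelta(days=365 * 7),  # assumed policy length
                   warning=timedelta(days=30)):        # assumed notice period
    """Sort records into 'keep', 'warn', and 'dispose' buckets."""
    plan = {"keep": [], "warn": [], "dispose": []}
    for rec in records:
        expires = rec.last_modified + retention
        if rec.preserved or now < expires - warning:
            # Flagged by a user, or not yet close to the end of retention.
            plan["keep"].append(rec.path)
        elif rec.warned_on is None:
            # Near or past expiry, but the owner has not been warned yet.
            plan["warn"].append(rec.path)
        elif now >= expires and now - rec.warned_on >= warning:
            # Expired, and the owner has had the full notice period.
            plan["dispose"].append(rec.path)
        else:
            # Warned, but the notice period is still running.
            plan["keep"].append(rec.path)
    return plan
```

In this sketch nothing lands in the “dispose” bucket until its owner has been warned and the notice period has passed, which mirrors the opportunity to preserve data described above. A real system would also need to send the warnings, record them, and log every disposal so the process can be defended later.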

The optimal information governance solution would be to employ a method to inventory or audit data and analyze its content. It is “optimal” because it lets you assess whether the dark data could actually be mined for useful information. You can then weigh which data’s potential value outweighs its risk, and is thus worth keeping, against which legacy data carries more potential for jeopardy than potential value to the organization. Such a solution may incorporate capabilities for data retention and eDiscovery, but, more importantly, it answers some basic questions:

  • What is the actual content of the data?
  • When was this data created?
  • Where does the data reside?
  • Who is able to access it?
  • Why does it need to be kept or maintained at all?
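
A first pass at the what, when, where, and who questions can be sketched with nothing but the Python standard library. This is an illustrative assumption rather than a full governance tool: the function name inventory and the field names are invented for the example, the “what” is only a guess based on the file extension (not a real content analysis), the “when” is the last-modified time because creation time is not reliably available on every platform, and the “who” is just the file’s permission string.

```python
import mimetypes
import os
import stat
from datetime import datetime, timezone


def inventory(root):
    """Yield one dict per file under root, covering what/when/where/who."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                # Unreadable or vanished file; a real audit would log this.
                continue
            guessed_type, _encoding = mimetypes.guess_type(name)
            yield {
                "what": guessed_type or "unknown",   # extension-based guess only
                "size_bytes": info.st_size,
                "when": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
                "where": os.path.abspath(path),
                "who": stat.filemode(info.st_mode),  # e.g. permission bits
                "why": None,                         # must be answered by a person
            }
```

The “why” field is deliberately left empty. No script can say whether a file still has business relevance, so that answer has to come from the people who own the data.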

Dark data is not a problem that will fix itself. It’s an issue that needs to be addressed, or the risk will only increase. Only when the questions above have been properly addressed can rational decisions be made regarding what data should be retained, what data should be repurposed, and what data can be defensibly eliminated.

 

To learn more about the dark data in your organization, contact us today!
