If left unmanaged, your data can become overwhelming, making it difficult to procure information you need when you need it. While software (including Sherpa’s) is designed to address archiving, e-discovery, compliance, etc., the overarching goal is most always the same: to make managing and maintaining data a feasible task. In this post, you’ll see two types of data you’re accustomed to working with, paying close attention to the differences between structured and unstructured data.
What is Structured Data?
Before getting into unstructured data, you need to have an understanding for its structured counterpart. Structured data (as explained succinctly in Big Data Republic’s video) is information, usually text files, displayed in titled columns and rows which can easily be ordered and processed by data mining tools. This could be visualized as a perfectly organized filing cabinet where everything is identified, labeled and easy to access. Most organizations are likely to be familiar with this form of data and already using it effectively, so let’s move on to the hotter question.
What is Unstructured Data?
Believe it or not, your database of structured information doesn’t even contain half of the information available for your use! Seth Grimes, a leading industry analyst on the confluence of structured and unstructured data sources, published an article that stated, “80% of business-relevant information originates in unstructured form, primarily text.” This may seem like an outlandish percentage, but don’t jump to conclusions too fast. We’re just getting started.
Now that you have a grasp on structured data, it will be much easier to understand what unstructured data is. Unstructured data, usually binary data that is proprietary, is that which has no identifiable internal structure. It can be visualized as a level 5 hoarder’s living room; it’s a massive unorganized conglomerate of various objects that are worthless until identified and stored in an organized fashion. Once this organization process has taken place (through the use of specialized software), the items can then be searched through and categorized (to an extent) for obtaining insights. While data mining tools might not be equipped to parse information in email messages (however organized it may be), you may have very good reason to collect and categorize data from this source. This illustrates the importance and plausible breadth of unstructured data.
Email Has Structure, Right?
The term “unstructured” has faced major scrutiny for several reasons. One argument is that although some form of structure is not formally identified, it can still be implied and therefore should not be labeled as “unstructured.” The counter-point states that if data has some form of structure but is not helpful to the processing task at hand, it may still be characterized as “unstructured.” So, while email messages may contain information that does have some implied structure, we can label the information as “unstructured” because normal data mining tools aren’t equipped to parse it. Alas, both sides of the argument persist.
Unstructured Data Types
Unstructured data is raw and unorganized and organizations store it all. Ideally, all of this information would be converted into structured data however, this would be costly and time consuming. Also, not all types of unstructured data can easily be converted into a structured model. For example, an email holds information such as the time sent, subject, and sender (all uniform fields), but the content of the message is not so easily broken down and categorized. This can introduce some compatibility issues with the structure of a relational database system.
In case you’re still not quite sure what we mean, here is a limited list of types of unstructured data:
- Word Processing Files
- PDF files
- Digital Images
- Social Media Posts
Looking at the list, you may be wondering what these files have in common. The files listed above can be stored and managed without the format of the file being understood by the system. This allows them to be stored in an unstructured fashion because the contents of the files are unorganized.
The big data industry is growing but the problem of unstructured data going unused has been identified by organizations. Better yet, technologies and services are being developed in reaction. Darin Stewart of InformationWeek said in a recent article about big data, “The age of information overload is slowly drawing to a close. Enterprises are finally getting comfortable with managing massive amounts of data, content and information. The pace of information creation continues to accelerate, but the ability of infrastructure and information management to keep pace is coming within sight. Big Data is now considered a blessing rather than a curse.”