Policies for your organization’s IG strategy. Classification, we don't need no stinking classification.

Just because you’ve created retention policies for email and files, it in no way means you are done. Actually, the fun is about to begin.

First and foremost, you are never actually done with information governance (IG). The process lives as long as you have data, and new data is growing at an exponential rate. Since I’m a comparison/metaphor kind of guy, think about information governance vs. losing weight. Yes, you can achieve your goal of losing 20 pounds, but once you do, you aren’t done. You then have to maintain that weight. The world’s top athletes say it is harder to stay number one than it was to get there. Why? Because they know that you are never done after you have achieved your goal. The same is true for IG. Once you think you have your arms around your data, new data, both known and unknown, has already been created. This means that you (and the rules you have created) must adapt as needed.

A second challenge in managing messages and files involves classification. Though the term classification sounds intuitive, it may not be clear to everyone, so let’s define what classification means relative to your data. Classification is assigning ‘types’ to your data, either via machine-assisted means or user interaction. I alluded to this process in my last blog post about files. I believe that classification is a very important piece of the information governance puzzle. The example I used previously was that of a .PDF file: that file could be a contract document or a set of instructions for toy assembly. Although both files have the same file format, I think we can agree they would be classified differently. The same holds true for email. Is a message about lunch as important as a discussion regarding the risks of a deficient product? Probably not. This is where classification enters the fray and allows you to categorize data of the same type (e.g. email that is personal versus business).

So let’s talk about how the data gets classified. As stated above, there are two ways to classify:

  1. Manually, where a classification is performed by a person, or
  2. Machine-assisted (automated), where a classification is assigned based upon intelligence provided to the classification software

Let’s start with an example of manually classifying email.

When a person classifies a message, they must take into account several criteria (sender, keywords, context, etc.) to assess what category the message falls into. Sherpa Software has customers that do this in a variety of ways. Here are a couple of examples:

One customer requires their mailbox owners to classify every email they send or receive. The classification types are based upon the department that the mailbox owner works in, so the classification categories for IT differ from those in legal, allowing a classification process specific to each business unit. Each classification type assigned by the mailbox owner is tied to a retention schedule that determines disposition.
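To make that first approach concrete, here is a minimal sketch of department-specific categories tied to a retention schedule. The department names, category names, and retention periods are all made up for illustration; they are not that customer's actual schedule.

```python
# Hypothetical schedule: department -> category -> retention in years.
# Every name and number here is an illustrative assumption.
SCHEDULES = {
    "IT": {"change-request": 3, "system-notice": 1},
    "Legal": {"contract": 7, "correspondence": 5},
}


def retention_years(department: str, category: str) -> int:
    """Return the retention period for a category chosen by a mailbox owner.

    A category is only valid inside its own department's list, which keeps
    the classification choices specific to each business unit.
    """
    try:
        return SCHEDULES[department][category]
    except KeyError:
        raise ValueError(
            f"{category!r} is not a valid category for {department!r}"
        ) from None


print(retention_years("Legal", "contract"))
```

Because an IT mailbox owner cannot pick a Legal category, a wrong choice fails loudly instead of silently landing on the wrong schedule.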

Another customer deploys a stock set of retention folders within the mailbox, then requires mailbox owners to classify messages by placing them into the appropriate retention folder. The ‘punishment’ for not classifying a message is that a shorter retention period (e.g. 90 days) is assigned to all non-classified email.
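The retention-folder approach boils down to a lookup with a short default. Here is a rough sketch, assuming hypothetical folder names and retention periods; only the 90-day fallback comes from the example above.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical retention folders and periods (in days), for illustration only.
FOLDER_RETENTION_DAYS = {
    "Contracts": 7 * 365,
    "Projects": 3 * 365,
    "Personal": 365,
}

# The short default applied to anything the mailbox owner did not file.
UNCLASSIFIED_RETENTION_DAYS = 90


def disposition_date(received: date, folder: Optional[str]) -> date:
    """Return the date a message becomes eligible for disposition.

    Messages outside a known retention folder get the short default.
    """
    days = FOLDER_RETENTION_DAYS.get(folder, UNCLASSIFIED_RETENTION_DAYS)
    return received + timedelta(days=days)


print(disposition_date(date(2024, 1, 1), "Contracts"))
print(disposition_date(date(2024, 1, 1), None))
```

The design choice worth noticing is that doing nothing has a consequence: an unfiled message simply expires sooner, which nudges mailbox owners to classify what they want to keep.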

In the machine-assisted paradigm, software automatically examines and classifies content. For this process to work properly, the software must be able to both learn and apply the proper classification. How the software learns varies widely. There are ‘static’ and ‘dynamic’ types of classification software.

Static classification software uses the configuration you provide and assigns classifications based solely on that information. For instance, a classification could be assigned based upon content (e.g. keywords) within an email. Static classification models do not deal well with exceptions, though, which makes dynamic classification much more appealing. In dynamic classification, the software auto-learns based upon data that it has previously analyzed. This could well be the ‘golden ticket’ of the classifying options. If software can discern what classification to assign based upon content, then the classifications are likely to be much more accurate. Some software can even recognize synonyms for words within the data. For instance, it can learn that different breeds of dogs have much in common and use this information to ‘intuitively’ assign classifications.
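Here is a toy sketch of the difference. The keyword rules, labels, and sample messages are my own illustrative assumptions, and the ‘dynamic’ half is just a simple word-overlap scorer standing in for real learning software. It does not understand synonyms; it only shows how learning from previously analyzed messages can catch what a fixed keyword list misses.

```python
import re
from collections import Counter, defaultdict


def tokens(text):
    """Lowercase a message and split it into simple word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Static: hypothetical keyword rules configured up front.
RULES = {
    "product-risk": {"defect", "recall", "hazard"},
    "personal": {"lunch", "birthday"},
}


def classify_static(text):
    """Assign the first label whose keywords appear in the message."""
    words = set(tokens(text))
    for label, keywords in RULES.items():
        if words & keywords:
            return label
    return "unclassified"


class LearnedClassifier:
    """Toy 'dynamic' classifier: scores labels by words seen in past examples."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)

    def learn(self, text, label):
        # Remember which words appeared in messages carrying this label.
        self.word_counts[label].update(tokens(text))

    def classify(self, text):
        words = tokens(text)
        scores = {
            label: sum(counts[w] for w in words)
            for label, counts in self.word_counts.items()
        }
        if not scores or max(scores.values()) == 0:
            return "unclassified"
        return max(scores, key=scores.get)


learned = LearnedClassifier()
learned.learn("The product is deficient and may be a hazard", "product-risk")
learned.learn("Lunch on Friday?", "personal")

message = "Customers report a deficient unit"
print(classify_static(message))
print(learned.classify(message))
```

The static rules never listed the word ‘deficient’, so they miss the message, while the learned scorer picked the word up from an earlier example. Real dynamic classification products are far more sophisticated than this, but the exception-handling gap is the same idea.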

If you are ready to undertake a classification project, a series of big decisions lies in front of you. The first is, should you classify your data? If you do decide to move forward, which classification method is best for your organization? If you choose the machine-assisted route, should you select static or dynamic? You’ve probably guessed by now that I highly recommend dynamic classification software. It has the most upside and is well worth the investment. Though the initial expense can be higher than some of the other options, I believe the productivity gain is invaluable. It can help achieve a very high accuracy level, and it does not rely on people to make subjective decisions about what classification to assign.

Classification is important and needs to be carefully considered, regardless of the method you choose. If you do decide to start classifying your data, allow me to quote the Grail Knight in “Indiana Jones and the Last Crusade”: “You have chosen, [Pause] wisely.”

For more information, go to www.SherpaSoftware.com or call 1.800.255.5155
