In a previous blog article, I talked about what to consider when creating retention policies for email and managing messages. Now, it’s time to talk about the elephant in the room: files. Many email administrators think that messages would win the prize for existing in the most locations, and it is true that email can be stored in many different places and even in different formats. However, I firmly believe email doesn’t hold a candle to ‘loose’ files. For your reference, when I say ‘loose’ file, I mean any stand-alone file that is not stored within a specific database. Good examples are files with extensions of .txt, .pdf, .doc, etc.

If you started making a list of all places where these files could possibly be stored, you might break into a cold sweat. The very thought of files either being lost or falling into the wrong hands could wreak havoc on an organization. Add to this the legal implications of files not being managed by a retention policy and there is a great cause for concern.

Though eDiscovery is prevalent in almost every company, most of the focus is still on messages, particularly the centralized messages on your mail servers. The thought of performing eDiscovery on loose files is quite often dismissed, usually because there is no proper inventory of files. If you don’t know where files are stored, how could you possibly search and/or manage them?

This leads to my first consideration regarding files: getting your arms around where they are stored. This may seem like an impossible task, but you need to start somewhere and at some point, so why not now? Think of it like raking leaves in the fall. Though the wind is blowing and the leaves are falling, you still put rake in hand and gather the leaves together. You can’t possibly get all of the leaves into a pile, because they are still falling and the wind isn’t helping. But you collect the vast majority and dispose of them. Files are the same. While the weather is your opposition when raking, for files it will be fellow employees within your company. I’m not saying that your coworkers are intentionally doing something to undermine your file management. However, even the best intentions can cause problems. I know that I have placed files in a ‘safe’ place, only to forget where I put them, and there they stay far past their time of usefulness.

Another helpful metaphor is to compare computer storage to the closets in a house. No one has a closet that is empty. If you have a closet, you fill it with stuff. The same is true for disk storage. If there’s room to store data, data will be stored. Users will not delete files until they have used all of the available storage given to them. And in today’s world, it’s becoming harder and harder to use all storage that is available. So what does that mean? It means that deleting files is a rare event, so they continue to accumulate endlessly. This is why file management is necessary. You must get an understanding of what is stored where and why.

So where do you start? Most files are going to be stored in one of three locations:

  1. file servers
  2. laptops/desktops
  3. any file-sharing software (e.g. SharePoint, Connections, etc.)

Creating a file inventory
Now that you have a place to start, what next? I advise creating an inventory of your files. You need to get an understanding of what types of files are stored in these three locations, who the custodian (owner) of each file is, and how old the files are. Though this could be done manually, I recommend using a technology solution so that you can not only perform your initial inventory more efficiently, but also repeat it in perpetuity. This will provide you with a system to constantly inventory your files on a set schedule. This will most likely not be something that happens overnight. The distributed data that is on user machines can very much complicate this issue, but it is imperative the inventory be performed for each user. This is not a sprint; it is a marathon.
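
To make the idea of an automated inventory a little more concrete, here is a minimal sketch in Python. It is not a product recommendation, only an illustration under a few assumptions: the function name inventory_files and the record fields are hypothetical, the last-modified time stands in for a file's age, and the numeric owner id is only meaningful on Unix-like systems (on Windows it is typically 0, so a real tool would look up the owner another way).

```python
import os
import time
from pathlib import Path


def inventory_files(root):
    """Walk `root` and return one record per file found.

    Hypothetical record layout: path, extension, owner id,
    size in bytes, and age in days since last modification.
    """
    now = time.time()
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                # Unreadable or vanished file: skip it rather than abort the scan.
                continue
            records.append({
                "path": str(path),
                "extension": path.suffix.lower(),   # ".PDF" and ".pdf" count as one type
                "owner_id": info.st_uid,            # assumption: Unix-style owner id
                "size_bytes": info.st_size,
                "age_days": (now - info.st_mtime) / 86400,
            })
    return records
```

Run on a schedule against each file server share or user machine, a scan like this would give you the three things the inventory needs: file type, custodian, and age.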

Once you have this inventory created, it’s time to work with legal, risk management, etc. and get an understanding of what retentions can be assigned to the various file types that you have found. Keep in mind that not all files with the same extension contain the same type of data. For instance, a .pdf file could be a contract or a set of instructions for the assembly of a toy for an employee’s child. Obviously, inspecting every file manually could take years. But having an idea of the file types will at least provide a foundation for how to handle the files.

Secure all files
Another method is to model how the government identifies counterfeit bills. Government agents memorize the attributes of authentic bills, and that allows them to easily identify those that are counterfeit. If you are able to confidently secure all files that you know are important, then you can start making informed decisions on what is left. And if you know you are retaining the necessary files and a retention policy is set for each file type, then the deletion of the ‘aged’, nonessential files will greatly reduce the amount of data that needs to be reviewed.
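
As a rough sketch of that idea, the Python below sorts inventoried files into three buckets: files you have secured, files past the retention for their type, and files of an unknown type that a person should look at. Everything here is an assumption for illustration: the retention periods are made-up numbers, not legal guidance, and the record layout (path, extension, age_days) and the names RETENTION_DAYS and classify are hypothetical.

```python
# Hypothetical retention periods per file type, in days. Real values
# would come from legal and risk management, not from this sketch.
RETENTION_DAYS = {
    ".pdf": 7 * 365,
    ".doc": 3 * 365,
    ".txt": 365,
}


def classify(record, protected_paths=frozenset()):
    """Return "keep", "expired", or "review" for one inventory record.

    `record` is assumed to be a dict with "path", "extension",
    and "age_days" keys.
    """
    if record["path"] in protected_paths:
        # Files already secured as important are never candidates for deletion.
        return "keep"
    limit = RETENTION_DAYS.get(record["extension"])
    if limit is None:
        # No policy for this file type yet: leave the decision to a person.
        return "review"
    return "expired" if record["age_days"] > limit else "keep"
```

Note that nothing is deleted here. The output is only a list of candidates, which keeps the final decision with the people who own the retention policy.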

Create retention policies
Now let’s look at this another way. Some companies are required to maintain emails for a specific number of years (e.g. 7 or 10). If I received an email that had an attachment and I detach that file, how long should that file be kept? Should it comply with the retention policy for the message that contained it or should it follow the retention for the file type? These are questions that must be asked, and the answers must be enforced with an automated policy. The possible legal/financial implications of these files are worth these discussions.
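
To show what enforcing an answer automatically might look like, here is a small Python sketch of one possible rule: keep a detached file until the later of the two expiry dates. That rule is purely an assumption chosen for the example, not a recommendation, and your legal team may well decide otherwise. The function name and parameters are hypothetical.

```python
from datetime import date, timedelta


def detached_file_expiry(message_received, message_retention_days,
                         file_saved, file_retention_days):
    """Return the date a detached attachment may be deleted.

    Assumed rule: the file is kept until BOTH the message retention
    and the file-type retention have run out, so the later date wins.
    """
    message_expiry = message_received + timedelta(days=message_retention_days)
    file_expiry = file_saved + timedelta(days=file_retention_days)
    return max(message_expiry, file_expiry)
```

Swapping max for min would encode the opposite answer, which is exactly why the question has to be settled by the business before it is written into a policy.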

Though I am not advocating exactly how you manage your files, I do want to pose some questions that need to be asked within every organization.

  1. What file types are business-related?
  2. Should retention be assigned to files? If so, what types?
  3. Are there file types that should be retained indefinitely?
  4. Are there owners whose files should never be deleted?
  5. What locations should first be managed? File servers? Laptops/Desktops?
  6. Are there files relevant to litigation or eDiscovery that must be retained?

These questions should initiate interesting conversations within your company.

In summary, I strongly urge you to get an understanding of the files that are stored in your environment and give files the proper attention and management that they deserve. As always, you must have the proper people from groups such as legal and risk management communicating on what action should be taken on files and when. Keep in mind that email is not the only data source that contains proprietary information.

I will leave you with this question:

Which is worse, to delete a file that should have been kept or to keep a file that should have been deleted?

It is up to your company to decide which methodology is best for you.

For questions or comments, please email