Bring the Light to Dark Data

by Cuong Le | Nov 11, 2019

Ask business executives how important data is to enterprise success, and they agree that it is crucial. According to The Economist, the world’s most valuable resource is no longer oil, but data.

Data can be a powerful resource to harness, but what really happens with data?

In reality, the deep dark secret, hidden in the bowels of your company, is that data is largely mismanaged. Data and storage are growing at 81% a year, and 85% of that growth is unstructured data. To keep up, the IT infrastructure team has effectively been acquiring storage bins (containers for all of the company’s data), and the lines of business and others throw all of their data into them. All of this data is managed as important data, with high SLAs for backup, availability, and performance. This puts significant stress on your infrastructure.

I recently went to the Gartner IT Symposium, where we talked about how 50% of structured data and only 1% of unstructured data is actually used! To make matters worse, no one even knows what the unstructured data contains. Regardless of your company’s policies, people may be storing non-business information like the local taco shop menu or their iTunes library or, even worse, confidential or personally identifiable information in shared repositories. This creates a significant amount of risk.

In shared storage environments, we have seen anywhere from 60% to 85% of unstructured data turn out to be dark: files that belong to people who are no longer with the company, and data that no one has accessed for years. This is extremely costly, because maintaining a high SLA for backup, availability, and readiness can cost $3,000 to $6,000 per TB per year in TCO. A contributing factor is that your backup and operational procedures could mean you make 17,000 copies of the same file over a 5-year period. Moving this dark data onto cloud/object storage can take the TCO down to $200 per TB per year. This both improves your bottom line and reduces risk in your environment.
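
To make those numbers concrete, here is a rough back-of-the-envelope sketch in Python. The per-TB costs come from the figures above; the 100 TB environment and the function name `annual_tco` are hypothetical, chosen only for illustration, so plug in your own capacity and rates.

```python
def annual_tco(total_tb, dark_fraction, primary_cost_per_tb, archive_cost_per_tb=200):
    """Return (cost_before, cost_after) in dollars per year.

    cost_before: everything stays on high-SLA primary storage.
    cost_after:  the dark share moves to cloud/object storage.
    """
    dark_tb = total_tb * dark_fraction
    before = total_tb * primary_cost_per_tb
    after = (total_tb - dark_tb) * primary_cost_per_tb + dark_tb * archive_cost_per_tb
    return before, after


# Hypothetical 100 TB share, using the low end of both ranges quoted above:
# 60% dark data and $3,000 per TB per year on primary storage.
before, after = annual_tco(total_tb=100, dark_fraction=0.60, primary_cost_per_tb=3000)
print(f"before: ${before:,.0f}  after: ${after:,.0f}  saved: ${before - after:,.0f}")
```

With these low-end assumptions, the yearly bill drops from $300,000 to $132,000. The gap widens as the dark share or the primary storage cost moves toward the high end of the ranges.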

To manage this, you should:

Know what you have – Do a file metadata discovery to understand what you have. Identify the dark data, especially dark data with high-risk information.

Categorize your data – Tag where the data originated, which business it belongs to, its application context, its data value, and its risk.

Automate data management – Create data location optimization policies that manage your data automatically. Quarantine high-risk dark data, archive dark data, and place critical data on the highest-value storage platforms.

Turn your dark data into an asset – Leverage the tagging and context in your file metadata to create insights about your data. Build libraries of categorized data that you can mine and use for augmented analysis.
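
The first three steps can be sketched in a few lines of Python. This is a minimal illustration, not a product: the names `FileRecord`, `discover`, `categorize`, and `choose_action`, the two-year idle threshold, and the filename markers used to guess at risk are all assumptions made for the example. It also treats a file as dark if either its owner has left or it has sat idle, which is one possible reading of "dark". Real discovery tools inspect far more than filenames, and access times can be unreliable on file systems mounted with options such as `noatime`.

```python
import os
from dataclasses import dataclass

SECONDS_PER_YEAR = 365 * 24 * 3600


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    last_used: float  # epoch seconds: newest of access time and modify time
    owner_uid: int    # numeric owner id (meaningful on Unix-like systems)


def discover(root):
    """Step 1: walk a directory tree and collect file metadata only."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            yield FileRecord(path, st.st_size, max(st.st_atime, st.st_mtime), st.st_uid)


def categorize(record, now, active_uids, dark_after_years=2,
               risk_markers=("ssn", "passport", "payroll")):
    """Step 2: tag a file. The threshold and markers are illustrative assumptions."""
    idle_years = (now - record.last_used) / SECONDS_PER_YEAR
    name = os.path.basename(record.path).lower()
    return {
        # Assumption: dark means idle for too long OR owned by someone who has left.
        "dark": idle_years >= dark_after_years or record.owner_uid not in active_uids,
        # Crude filename check standing in for real content inspection.
        "high_risk": any(marker in name for marker in risk_markers),
    }


def choose_action(tags):
    """Step 3: map tags to a placement policy."""
    if tags["dark"] and tags["high_risk"]:
        return "quarantine"
    if tags["dark"]:
        return "archive"
    return "keep"
```

To try it, loop over `discover(root)` for a share you care about, pass each record to `categorize` along with the current time and the set of user ids that are still active, and total up `size_bytes` per action returned by `choose_action`. Those totals, grouped by tag, are the start of the fourth step.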


Bring light to your dark data with the 4 steps mentioned above!