How to Take Charge of Your Dark Data Problem [Guide]

Apr 18, 2018

You hear it everywhere: your data’s valuable and you need to leverage it. But what about the data you don’t really use or know anything about–your dark data?

To summarize Gartner, Inc., dark data is the information you collect, process, and store in the course of business, but don’t use for anything else. And IBM estimates that dark data will account for 93% of all data generated by 2020.

Dark data can span many different files, file types, shares, exports, and storage resources, but most of it fits into a few different categories:   

  • It’s valuable – Analyzing valuable dark data (and its metadata) could give you business insights you don’t currently have.
  • It’s putting you at risk – You might have sensitive data on inadequate storage. How many files contain PHI or SSNs and are stored unencrypted?
  • It’s junk – These files were useful once, but now they’re redundant, obsolete, or trivial–what the industry sometimes calls “ROT.” And now they’re just costing you valuable storage space.

Each of these categories needs action from you, whether it’s to harness insights, reduce risk, or lower storage costs.


How to Find Dark Data

To address dark data in your unstructured files, you first need to know which files are where.


Scripts

Writing scripts is relatively easy to coordinate for one or two shares, if time-consuming. But at the enterprise level, you might struggle to schedule those scripts via cron on Linux or batch jobs on Windows while trying to run hundreds of jobs simultaneously. After that, you’d need to load the outputs into some kind of database to get the information into a usable format.
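To illustrate the DIY approach described above (a generic sketch, not StorageX functionality; the mount point and output file below are hypothetical), a minimal Python scan that records per-file metadata to a CSV might look like:

```python
import csv
import os
import time

def _day(timestamp):
    """Format a Unix timestamp as YYYY-MM-DD."""
    return time.strftime("%Y-%m-%d", time.localtime(timestamp))

def scan_share(root, out_csv):
    """Walk a share and record basic metadata for every file it contains."""
    with open(out_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "size_bytes", "last_accessed", "last_modified"])
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                writer.writerow([path, st.st_size,
                                 _day(st.st_atime), _day(st.st_mtime)])

# Hypothetical mount point and output file:
scan_share("/mnt/share1", "share1_metadata.csv")
```

This works for a share or two, but it is exactly the approach whose scaling problems the paragraph above describes: you would still need one job per share, a scheduler, and a database to make the CSVs queryable.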

Metadata Viewers

You can deploy metadata viewers by directory or by file. They’re useful for spot-checking, but aren’t really scalable for getting a full view of what’s on your filesystem. And the hours add up quickly when you’re scripting, defining a process around gathering and reporting, and–if you’re archiving–writing extra programs.  

StorageX Discovery

StorageX file discovery is scalable and comprehensive. It can analyze a single share, export, storage resource, or your entire environment–empowering you to know which files you have, see where they are, and view their metadata.

Taking Action on Dark Data through Metadata & Custom Tagging

Knowing which files you have is a great start, but how do you take action? The answer lies in your file metadata.

Example 1: Acting On Valuable Data

Imagine the metadata on one of your file subsets tells you that many users from a single department regularly access a particular file type. Even though these files are hot, you might learn they’re sitting on a lower-performing tier.

Action: Move those files to a higher-performing tier. If your business applications need access to these files, deploy a StorageX RESTful API integration.

Example 2: Acting On Risky Data

On a different filesystem scan, you might identify a subset of files created by a customer-facing department–maybe the intake staff at your healthcare facilities or the advisory staff at your financial institution. Based on the metadata of who created the files, you know they probably contain sensitive PHI or PII.

Action: Move these files to your encrypted storage resource or, if the files are also cold, to an encrypted archive.
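As a rough illustration of how content spot-checking can complement the metadata approach (a hypothetical helper, not a StorageX feature, and far cruder than a real DLP scanner), you could flag files containing SSN-like strings:

```python
import os
import re

# Simplistic pattern for SSN-like strings (nnn-nn-nnnn). A real DLP tool
# uses far more robust detection; this is only an illustration.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def flag_risky_files(root):
    """Return paths of readable text files containing SSN-like strings."""
    flagged = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="ignore") as f:
                    if SSN_PATTERN.search(f.read()):
                        flagged.append(path)
            except OSError:
                continue  # unreadable file; skip it
    return flagged
```

Anything this flags on unencrypted storage is a candidate for the move described in the Action above.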

Example 3: Acting on Junk Data

On another filesystem scan, say you identify 25 terabytes of files with a last-accessed date of more than a year ago. The metadata tells you nobody’s using them, nobody needs them, and wherever you have them is probably more expensive than archiving them.

Action: Write the cold files to an Object archive, typically for less than $0.10 / GB-Month TCO.
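Identifying those cold files by last-accessed time can be sketched in a few lines (a hypothetical helper, not the StorageX implementation; note that filesystems mounted with noatime or relatime may not keep accurate access times):

```python
import os
import time

ONE_YEAR = 365 * 24 * 3600  # seconds

def find_cold_files(root, now=None):
    """Yield (path, size_bytes) for files not accessed in over a year."""
    now = now or time.time()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if now - st.st_atime > ONE_YEAR:
                yield path, st.st_size
```

Summing the sizes this yields tells you how many terabytes are candidates for the object archive.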

In each of these scenarios, StorageX empowers you to custom-tag the file subset you created. This makes it easier to recall your files if you need them, and the custom tags persist into your Object archive.

The Cost Savings of Tackling your Dark Data

In each of the scenarios above, you’re identifying data and putting it somewhere else–whether that somewhere else is new NAS or an object archive. And those new storage locations probably offer substantial savings over your legacy NAS.  

Imagine that in tackling your dark data, you found 20 TB you can move to new NAS and 30 TB you can archive to object storage.

Right now, 50 TB on that legacy NAS is probably costing you around $0.50 per GB-month TCO–totaling $300K per year. By contrast, let’s say that new NAS costs in the vicinity of $0.15 per GB-month TCO, and object storage might be around $0.03 per GB-month TCO.

So if you move the 20 TB to new NAS and 30 TB to object storage, those run rates total just $46,800 per year, saving you about $250K annually. How much leverage would that kind of savings give you the next time you ask for a raise?

Here’s that breakdown again:

  • Legacy NAS: 50 TB at $0.50 per GB-month TCO = $300,000 per year
  • New NAS: 20 TB at $0.15 per GB-month TCO = $36,000 per year
  • Object archive: 30 TB at $0.03 per GB-month TCO = $10,800 per year
  • Annual savings: $253,200
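The arithmetic above is easy to verify (assuming decimal terabytes, i.e. 1 TB = 1,000 GB, which is what the figures in this post imply):

```python
GB_PER_TB = 1000  # decimal terabytes, matching the figures above

def annual_cost(tb, dollars_per_gb_month):
    """Yearly run rate for a capacity at a given per-GB-month TCO."""
    return tb * GB_PER_TB * dollars_per_gb_month * 12

legacy = annual_cost(50, 0.50)        # legacy NAS, all 50 TB
new_nas = annual_cost(20, 0.15)       # 20 TB moved to new NAS
object_store = annual_cost(30, 0.03)  # 30 TB archived to object
savings = legacy - (new_nas + object_store)
```

Plug in your own capacities and TCO figures to size the savings in your environment.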

Planning for Data Growth with Policy-Based Movement

The flood of files doesn’t stop once you’ve handled old dark or unstructured data. You also need a way to handle new data as it’s generated.

For each of the scenarios above, the StorageX file movement policies you implemented can act upon new files as they’re created. You can also automatically custom tag new files with each new scan of your storage resources, and those tags persist into your Object archive.

For example, you might create a policy to immediately reassign or quarantine the data of employees who leave the company. Or you might write finance department spreadsheets to S3 if nobody’s accessed them in the last 12 months. How many similar possibilities exist in your environment?
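The second policy condition above–finance spreadsheets untouched for 12 months–might be sketched like this (a hypothetical helper, not the StorageX policy engine; the file pattern and threshold are assumptions you would tune):

```python
import fnmatch
import os
import time

TWELVE_MONTHS = 365 * 24 * 3600  # seconds

def needs_archiving(path, pattern="*.xlsx", now=None):
    """Illustrative policy check: a spreadsheet not accessed in 12 months."""
    now = now or time.time()
    if not fnmatch.fnmatch(os.path.basename(path), pattern):
        return False  # not the file type this policy targets
    st = os.stat(path)
    return now - st.st_atime > TWELVE_MONTHS
```

A scheduled scan would run a check like this over new files and hand the matches to a movement policy.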

However you manage dark data, we think you should always maintain ownership of your files. That means no stubbing, sharding, symlinks, gateways, proprietary namespaces, file virtualization, or vendor lock-in.

Give us a call today and take charge of your dark data and file management.  

Contact Data Dynamics