Bring the Light to Dark Data

by | Sep 17, 2021 | Blog

According to “The Economist”, when you ask business executives how important data is for enterprise success, they agree that data is so crucial that the world’s most valuable resource is no longer oil, but data.

Data can be a powerful resource to harness, but what really happens with data?

Data is largely mismanaged, and this is a dark secret lurking deep within your company’s internal workings. Data and storage are growing at an annual rate of 81 percent, with unstructured data accounting for 85 percent of that expansion. To keep up with the company’s expansion, the IT infrastructure team has been collecting storage bins (containers for all the company’s data), into which all the data is being tossed by the lines of business and others. All this data is treated as critical information, with service-level agreements (SLAs) for backup, availability, and performance in place.

This puts significant stress on your infrastructure. In the age of big data, companies take in all types of data (structured or unstructured, raw, real-time) created in massive amounts in a matter of seconds. Although dark data is a subset of big data, it accounts for most of the big data that businesses acquire each year. Companies rarely analyze or handle dark data, for a variety of reasons, but this does not diminish its importance in terms of business value. There are two perspectives on the significance of dark data. According to one viewpoint, unanalyzed data contains hidden, valuable insights and represents a missed opportunity. According to the other, unanalyzed data that is not handled properly can lead to a slew of issues, including compliance and security problems.

It was noted during the Gartner IT Symposium that only 50% of structured data and just 1% of unstructured data is actually used!

To make matters worse, no one knows what the unstructured data even contains. Regardless of your company’s policies, people may be storing non-business information such as the local taco shop menu or their iTunes library or, even worse, confidential or personally identifiable information in shared repositories. This creates a significant amount of risk.

Between 60 and 85 percent of unstructured data in shared storage setups is dark.

These are files that belong to people no longer with the company, and data that no one has accessed for years. Keeping them is extremely costly, because maintaining a high SLA for backup, availability, and readiness can cost $3,000 to $6,000 per TB per year in total cost of ownership (TCO). One driver of that TCO is that your backup and operational procedures may create 17,000 copies of a file over five years. Moving this dark data to cloud or object storage can bring the TCO down to $200 per TB per year. This significantly improves your bottom line and reduces risk in your environment.
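
The cost figures above can be turned into a rough back-of-the-envelope estimate. The sketch below is only illustrative: the function name and the 100 TB estate are hypothetical, and it assumes the per-TB costs and dark-data percentages quoted in this article apply uniformly.

```python
def annual_savings(total_tb, dark_fraction, primary_cost_per_tb,
                   archive_cost_per_tb=200.0):
    """Estimate yearly TCO saved by moving the dark share to object storage."""
    dark_tb = total_tb * dark_fraction
    return dark_tb * (primary_cost_per_tb - archive_cost_per_tb)


# Hypothetical 100 TB estate, using the low and high ends of the ranges above.
low = annual_savings(100, 0.60, 3_000)
high = annual_savings(100, 0.85, 6_000)
print(f"Estimated savings: ${low:,.0f} to ${high:,.0f} per year")
```

For this hypothetical estate the estimate works out to roughly $168,000 to $493,000 per year. Real savings depend on retrieval charges, migration effort, and how much of the data is truly dark.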

How to manage dark data effectively

To effectively manage dark data, review the following steps:

  1. Know what data is being collected and what is already available: To bring light to dark data, the foremost best practice is to reveal what data is collected and already available to potential users within the organization.
  2. Sort your dark data according to your goal: To cope with a large amount of unstructured data, consider which issues need to be addressed in your business operations or contact center environment. With so much data to filter through, it is difficult to make progress without clear goals, so begin your analysis with those objectives in mind.
  3. Data discovery and classification: Data discovery is a method of gaining total visibility into an organization’s overall data environment by running a process over a huge amount of unstructured data. Using data analytics tools, data pattern algorithms, or queries, you can identify important data. The next stage in dark data management is classifying enterprise data with a data categorization engine. Classification lets businesses determine the value of a piece of data and the line of business it belongs to, along with where the data might be helpful, its security requirements, and its risk. This step helps determine what exists within your dark data.
  4. Timeline to keep dark data: Companies must determine whether, and for how long, they should retain data. This is critical to avoid potentially significant spending on collecting and retaining data that isn’t being used and won’t be used in the future and, more critically, to ensure that the data is properly handled and secured.
  5. Storing dark data: The most difficult aspect of working with dark data is often simply gaining access to it, as it is frequently held in isolated repositories near the point of collection. It may also be stored in difficult-to-query systems and formats with limited analytical capabilities.
  6. Effective data use: The next step is to ensure that the collected data can be used effectively. There are two main approaches: (1) investing in tools able to query data where it is stored, and (2) moving the data to centralized platforms. Use tools that allow for data discovery, analysis, and visualization across multiple platforms and locations, so you won’t have to store the same information multiple times and it will be more visible. In addition, to reduce the number of data stores that must be tracked and managed, use storage platforms that aggregate and store data that would otherwise be inaccessible. Create data location optimization policies that manage your data automatically. Create quarantines for high-risk dark data, archive dark data, and target critical data to the highest value storage platforms.
  7. Turn your dark data into an asset: Leverage the tags and context in all your file metadata to create insights about your data. Create libraries of categorized data that you can mine and use for augmented analysis.
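
To make the classification step in the list above more concrete, here is a minimal sketch of pattern-based tagging. It is not how any particular categorization engine works: the two regular expressions, the tag names, and the `classify` and `risk_level` helpers are all hypothetical, and real detection of personally identifiable information needs far more robust rules.

```python
import re

# Hypothetical patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify(text):
    """Return the sorted tags whose pattern appears somewhere in the text."""
    return sorted(tag for tag, pattern in PATTERNS.items() if pattern.search(text))


def risk_level(tags):
    """Map tags to a coarse label: any sensitive tag means high risk."""
    return "high" if tags else "low"


sample = "Contact jane.doe@example.com, SSN 123-45-6789"
tags = classify(sample)
print(tags, risk_level(tags))
```

Tags produced this way could feed the quarantine and archive decisions described in step 6, with high-risk files routed to a quarantine location.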

Data Dynamics will help you unlock the value of dark data

For effective data archiving, we recommend Data Dynamics’ three-stage approach: discover, analyze, and archive.

Discover:

We use StorageX’s Analytics module at this point to scan chosen shares or exports and log all file metadata information. StorageX can report the platform operating system version, the filesystem type (e.g., on NetApp, a flexible volume), and the array hostname, among other things, in addition to usual metadata fields like owner, creation time, access time, whether the file is compressed or encrypted, and so on. StorageX also allows you to apply custom tags to the data scans during this step.
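
As a rough illustration of what a metadata scan collects, the sketch below walks a directory with the Python standard library and records a few of the fields mentioned above. It does not use StorageX or reflect its internals; the `scan` function, the record layout, and the `demo-share` tag are assumptions made for this example.

```python
import os
import tempfile
from datetime import datetime, timezone


def scan(root, tags=None):
    """Walk root and return one metadata record per file found."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            records.append({
                "path": path,
                "size_bytes": st.st_size,
                "owner_uid": st.st_uid,
                "accessed": datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
                "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
                "tags": list(tags or []),  # custom tags applied at scan time
            })
    return records


# Demo on a throwaway directory so the example is safe to run anywhere.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "menu.txt"), "w") as fh:
        fh.write("tacos")
    records = scan(root, tags=["demo-share"])
    print(records[0]["size_bytes"], records[0]["tags"])
```

Access times are worth treating with care: many filesystems are mounted with options that limit how often they are updated, so a scan like this may understate how recently a file was read.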

Analyze:

Once the data has been discovered, StorageX can be used to further identify and analyze it. The metadata fields collected and the custom tags applied in the Discover stage can be queried with StorageX. These queries can reveal how many files haven’t been accessed in more than three years and/or have no known owner. Customers frequently discover that 50 to 70 percent or more of their scanned data matches a query like this, and in some situations more than 90 percent!
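
The kind of query described above can be sketched in a few lines. This is not StorageX query syntax; the record fields (`accessed`, `owner`), the sample data, and the `dark_share` helper are hypothetical, and three years is approximated as 3 × 365 days.

```python
from datetime import datetime, timedelta, timezone


def dark_share(records, now, max_idle_days=3 * 365, require_both=False):
    """Fraction of records that are stale and/or have no known owner."""
    if not records:
        return 0.0
    combine = all if require_both else any  # "and" versus "or" semantics
    hits = 0
    for rec in records:
        stale = now - rec["accessed"] > timedelta(days=max_idle_days)
        orphaned = rec["owner"] is None
        if combine((stale, orphaned)):
            hits += 1
    return hits / len(records)


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


now = utc(2021, 9, 17)
records = [
    {"path": "a.doc", "accessed": utc(2015, 1, 1), "owner": "alice"},  # stale
    {"path": "b.xls", "accessed": utc(2021, 6, 1), "owner": None},     # orphaned
    {"path": "c.pdf", "accessed": utc(2016, 3, 1), "owner": None},     # both
    {"path": "d.txt", "accessed": utc(2021, 8, 1), "owner": "bob"},    # neither
]
print(dark_share(records, now), dark_share(records, now, require_both=True))
```

The `require_both` flag mirrors the "and/or" in the query: the looser form counts a file if either condition holds, while the stricter form counts only files that are both stale and ownerless.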

Archive:

After you’ve located and examined the dark data, you can use StorageX to act on it. While StorageX can move data to lower-cost storage tiers (e.g., from flash drives to magnetic disks), the most cost-effective option for the customer is to migrate infrequently accessed data to an object store. StorageX natively supports the main on-premises and cloud-based object store providers. Using an easy-to-use interface, customers can take the results of their data analysis and archive the matching data to an object store immediately or on a schedule.

Even after you’ve dealt with old dark or unstructured data, the flood of files continues, so you’ll also need a mechanism to deal with fresh data as it comes in. StorageX file movement policies can act on new files as they’re produced. With each fresh scan of your storage resources, you can also automatically apply custom tags to new files, and those tags will be saved in your object archive.

For example, you might create a policy to immediately reassign or quarantine the data of employees who leave the company. Or you might write finance department spreadsheets to S3 if nobody’s accessed them in the last 12 months. How many similar possibilities exist in your environment?
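
The two example policies above could be expressed as simple ordered rules. The sketch below is one hypothetical way to model them, not the StorageX policy format: the `FileInfo` fields, the action names, and the list of spreadsheet extensions are assumptions, and 12 months is approximated as 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta

SPREADSHEET_EXTENSIONS = (".xlsx", ".xls", ".csv")  # assumed for illustration


@dataclass
class FileInfo:
    path: str
    department: str
    owner_active: bool   # False once the owner has left the company
    last_access: date


def decide(f, today):
    """Return the first matching action; rule order encodes priority."""
    if not f.owner_active:
        return "quarantine"
    is_spreadsheet = f.path.lower().endswith(SPREADSHEET_EXTENSIONS)
    idle = today - f.last_access > timedelta(days=365)
    if f.department == "finance" and is_spreadsheet and idle:
        return "archive-to-object-store"
    return "keep"


today = date(2021, 9, 17)
files = [
    FileInfo("/shares/fin/q1_2019.xlsx", "finance", True, date(2019, 4, 1)),
    FileInfo("/shares/eng/notes.txt", "engineering", False, date(2021, 9, 1)),
    FileInfo("/shares/fin/q3_2021.xlsx", "finance", True, date(2021, 9, 10)),
]
for f in files:
    print(f.path, "->", decide(f, today))
```

Putting the departed-owner rule first means that data is quarantined even when it would otherwise qualify for archiving, which keeps potentially sensitive files out of long-term storage until someone reviews them.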

Final Thoughts

Data breaches have gained a lot of attention in recent years as businesses become more reliant on digital data, cloud computing, and remote working. As a result, compliance and regulations have emerged as a necessity for ensuring information security. The Data Dynamics Compliance Suite utilizes both Insight AnalytiX and ControlX, as a part of the Unified Unstructured Data Management Platform, to provide intelligent identification of data sets that contain files with content fields that are associated with regulatory requirements.

Just as dark matter is part of the physical universe, dark data is part of an organization’s universe of assets, and it cannot be ignored. Give us a call today and take charge of your dark data and file management.