Do you have data? It sounds like a trivial question, but it’s not. As a person, you may have personal finance documents, resumes, pictures, music, etc. Where is it all stored? Maybe on your phone, your computer or, gulp, in the cloud. Managing your own personal data can be tedious and time-consuming. And that’s for a measly few terabytes of data.
What happens if you run a multinational corporation? What is your data? Where is your data? Who owns it? How do you categorize it? Companies of this size will usually have dedicated data centers or colocation facilities (colos), and may also use cloud storage to preserve and manage that data. The size of the data can range from hundreds of terabytes to hundreds of petabytes, and these companies typically employ hundreds of people to manage, move, replicate, protect and back it up. The conversation around data of this size usually involves the words terabytes and petabytes, the cryptic name of a data center and the name of a product vendor you may or may not have heard of. Wouldn’t it be great if you knew how much data was in images, music or Word documents? What about the amount of data at each site? In each state or province? In each city? For each department, application or project? On each array or server?
Unfortunately, data today is defined by the server that houses it. A data manager may say “I have 100TB on NetApp Array X in the SWDC”. In that sentence, you never heard what kind of data it is, which team or department owns it, how old it is or even how it is accessed. That’s because the data manager doesn’t know, and they don’t know because they don’t have a product that can tell them. They may be able to run a script to determine the number of files that were created or accessed on a certain date. But try doing that globally. Traditional file systems can tell us a lot about their data, but they do not allow for custom metadata fields, which would help further define and classify that data.
Enter the CUSTOM TAG. Custom tagging allows you to categorize and classify each file in a way that helps to tell the story of the data: what it is and why it’s there. “What is this CUSTOM TAGGING,” you say? I’m glad you asked. Custom tagging is a feature that is available with Data Dynamics StorageX Analytics. StorageX Analytics can walk your unstructured file systems, ingest metadata and store it in a database, making that metadata available for custom reporting and for Archival Policies. Now you’re able to produce reports telling you who owns the data, how large it is, and when it was created, modified or accessed.
While StorageX is walking the file system, it can apply custom tags to the metadata as it is stored in the database. The custom tags are then available to be used as datasets. If the files in the dataset are slated for archive, StorageX will take those custom tags (along with all the metadata that has been collected) and persist them to the S3-compliant object storage or Azure Blob storage that is the destination of your archive policy.
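To make the idea of walking a file system and tagging its metadata concrete, here is a minimal Python sketch. It is not StorageX code and does not use any StorageX API; the function name `scan_tree`, the record fields and the tag names are all assumptions chosen for illustration. It only shows the general pattern: visit every file, collect the metadata the file system already has, and attach the same custom tags to each record.

```python
import os
from datetime import datetime, timezone


def scan_tree(root, tags):
    """Walk `root` and return one metadata record per file, each carrying `tags`.

    Hypothetical helper for illustration only; not part of any product API.
    """
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            records.append({
                "path": path,
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": st.st_size,
                "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
                "accessed": datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
                "owner_uid": st.st_uid,  # numeric owner id; always 0 on Windows
                "tags": dict(tags),  # copy so records do not share one dict
            })
    return records
```

Calling something like `scan_tree("/mnt/share", {"Site": "Secaucus, NJ"})` (the path is made up) would return a list of records that could then be loaded into whatever database or report you prefer.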
With all of this said, it seems important to develop some sort of tagging scheme. After all, the tags themselves can be used as datasets. In fact, all your scans may incorporate similar tags and values. Using the tags and values as datasets can help you aggregate all your scans together in a single query. Wow! Sounds very powerful. But hold on. It works better if you lay out the tagging scheme ahead of time. Here is an example that I’ve used.
Large Financial Company X
- North America
  - Secaucus, NJ
  - Houston, TX
  - Denver, CO
- London, UK
- Hong Kong
- HR departments
- Multiple rooms or floors in each datacenter.
- Uses multiple hardware/server vendors
- Accesses data via both SMB and NFS protocols, i.e., Windows shares and NFS exports.
- Rigid change control process requiring approval for each data interaction.
- Each department is running multiple projects.
| Project | Billboard Campaign | Department granular |
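One way to lay out a tagging scheme ahead of time is to write the agreed tag names down once and check every scan's tags against them before the scan runs. The Python sketch below is only an illustration of that idea and is not a StorageX feature. The tag names `Region`, `Site` and `Department` are assumptions (the article itself only names Companywide, Subnet, Application and Project), and the sample values are combined from the example above purely for demonstration.

```python
# Hypothetical scheme: the agreed tag names every scan must carry.
REQUIRED_TAGS = ("Companywide", "Region", "Site", "Department", "Project")


def validate_tags(tags, required=REQUIRED_TAGS):
    """Return a list of problems; an empty list means the tags fit the scheme."""
    problems = []
    for key in required:
        value = tags.get(key, "")
        if not value.strip():
            problems.append(f"missing value for tag '{key}'")
    for key in tags:
        if key not in required:
            problems.append(f"unexpected tag '{key}'")
    return problems


# Illustrative tag set for one scan (values are examples, not real data).
scan_tags = {
    "Companywide": "Large Financial Company X",
    "Region": "North America",
    "Site": "Secaucus, NJ",
    "Department": "HR",
    "Project": "Billboard Campaign",
}
print(validate_tags(scan_tags))  # prints []
```

Catching a misspelled or missing tag at this stage is cheaper than discovering later that one scan cannot be joined with the others.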
Using StorageX to scan data in a company as large as this would inevitably require multiple scans. If all scans are tagged in a consistent manner, there will be tag values that are common to multiple scans. With a StorageX metadata analysis, each of these tags can be used to produce a dataset from multiple discovery scans. The resulting dataset will contain metadata from files across those scans, allowing the user to query data from the entire company down to just a single share, export or mount point. Depending on the tag that is used, StorageX can present a high-level view of the entire organization, or a more granular view down to the datacenter, the department, the application and then into the file metadata itself. Talk about powerful!
Using the Companywide tag as your dataset will show all data within the entire company! Using the Subnet tag will show all data on a single subnet. And using the Application tag will produce reports for all data belonging to an application.
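The effect of using a tag as a dataset can be sketched in a few lines of Python. This is not how StorageX implements its queries; it is a simplified illustration that assumes each file record is a dictionary with `size_bytes` and `tags` keys, and the function names, paths and sizes below are all made up.

```python
from collections import defaultdict


def dataset(records, tag, value):
    """Select every record, from any scan, whose `tag` equals `value`."""
    return [r for r in records if r["tags"].get(tag) == value]


def bytes_by_tag(records, tag):
    """Total size in bytes per value of `tag`; untagged records fall under None."""
    totals = defaultdict(int)
    for r in records:
        totals[r["tags"].get(tag)] += r["size_bytes"]
    return dict(totals)


# Two illustrative scans that share the Companywide tag (sizes are invented).
company = "Large Financial Company X"
scan_nj = [
    {"path": "/nj/a.docx", "size_bytes": 100,
     "tags": {"Companywide": company, "Site": "Secaucus, NJ"}},
    {"path": "/nj/b.jpg", "size_bytes": 250,
     "tags": {"Companywide": company, "Site": "Secaucus, NJ"}},
]
scan_tx = [
    {"path": "/tx/c.mp3", "size_bytes": 400,
     "tags": {"Companywide": company, "Site": "Houston, TX"}},
]
all_records = scan_nj + scan_tx

print(len(dataset(all_records, "Companywide", company)))  # prints 3
print(bytes_by_tag(all_records, "Site"))
# prints {'Secaucus, NJ': 350, 'Houston, TX': 400}
```

The broad tag pulls in every record from both scans, while the narrower tag splits the same records by site. That is the reason consistent tag names and values across scans matter.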
The tagging scheme and the order you use are up to you. The value and insights that you obtain are priceless.