In this opening episode, Don Mounce interviews Piyush Mehta, the Dean of Data and CEO of Data Dynamics, on how enterprises can tackle dark data.
You hear it everywhere: your data’s valuable and you need to leverage it. But what about the data you don’t really use or know anything about–your dark data?
To paraphrase Gartner Inc., dark data is the information you collect, process, and store in the course of business but don’t use for anything else. And IBM estimates that dark data will account for 93% of all data generated by 2020.
Dark data can span many different files, file types, shares, exports, and storage resources, but most of it fits into a few different categories:
- It’s valuable – Analyzing valuable dark data (and its metadata) could give you business insights you don’t currently have.
- It’s putting you at risk – You might have sensitive data on inadequate storage. How many files contain PHI or SSNs and are stored unencrypted?
- It’s junk – These files were useful once, but now they’re redundant, obsolete, or trivial (what the industry sometimes calls “ROT”), and they’re just costing you valuable storage space.
Each of these categories needs action from you, whether it’s to harness insights, reduce risk, or lower storage costs.
How to Find Dark Data
To address dark data in your unstructured files, you first need to know which files are where.
Writing scripts is relatively easy to coordinate for one or two shares, if time-consuming. But at the enterprise level, you might struggle to schedule them with cron on Linux or batch jobs on Windows while trying to run hundreds of jobs simultaneously. After that, you’d need to load the outputs into some kind of database to get the information into a usable format.
You can deploy metadata viewers by directory or by file. They’re useful for spot-checking, but aren’t really scalable for getting a full view of what’s on your filesystem. And the hours add up quickly when you’re scripting, defining a process around gathering and reporting, and–if you’re archiving–writing extra programs.
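For a sense of what the scripted approach involves, here is a minimal Python sketch of a single-share inventory. It assumes the share is mounted locally (the mount path in the usage note is hypothetical), walks it, writes each file’s path, extension, size, owner UID, and modification and access times into a SQLite table, and then summarizes by extension. It handles one share serially; scheduling hundreds of these, retrying failures, and reconciling the results is the coordination burden described above.

```python
import os
import sqlite3
from pathlib import Path


def scan_share(root: str, db_path: str) -> int:
    """Walk a mounted share and record basic metadata for each file in SQLite.

    Returns the number of files recorded.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS files (
                   path TEXT PRIMARY KEY,
                   extension TEXT,
                   size_bytes INTEGER,
                   owner_uid INTEGER,
                   mtime REAL,
                   atime REAL
               )"""
        )
        count = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                try:
                    st = os.stat(full_path, follow_symlinks=False)
                except OSError:
                    continue  # vanished or unreadable; a real tool would log this
                conn.execute(
                    "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?, ?)",
                    (full_path, Path(name).suffix.lower(), st.st_size,
                     st.st_uid, st.st_mtime, st.st_atime),
                )
                count += 1
        conn.commit()
        return count
    finally:
        conn.close()


def summarize_by_extension(db_path: str):
    """Return (extension, file_count, total_bytes) rows, largest total first."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT extension, COUNT(*), SUM(size_bytes) FROM files "
            "GROUP BY extension ORDER BY SUM(size_bytes) DESC"
        ).fetchall()
    finally:
        conn.close()
```

Usage would look like `scan_share("/mnt/finance_share", "inventory.db")` followed by `summarize_by_extension("inventory.db")`. Keep the database file outside the tree being scanned so it doesn’t inventory itself.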
StorageX file discovery is scalable and comprehensive. It can analyze a single share, export, storage resource, or your entire environment–empowering you to know which files you have, see where they are, and view their metadata.
Taking Action on Dark Data through Metadata & Custom Tagging
Knowing which files you have is a great start, but how do you take action? The answer lies in your file metadata.
Example 1: Acting On Valuable Data
Imagine the metadata on one of your file subsets tells you that many users from a single department regularly access a particular file type. Despite these files being hot, you might also learn they’re located on a lower-level tier.
Action: Move those files to a higher-performing tier. If your business applications need access to these files, deploy a StorageX RESTful API integration.
Example 2: Acting On Risky Data
On a different filesystem scan, you might identify a subset of files created by a customer-facing department–maybe the intake staff at your healthcare facilities or the advisory staff at your financial institution. Based on the metadata of who created the files, you know they probably contain sensitive PHI or PII data.
Action: Move these files to your encrypted storage resource or, if the files are also cold, to an encrypted archive.
Example 3: Acting on Junk Data
On another filesystem scan, say you identify 25 terabytes of files with a last-accessed date of more than a year ago. The metadata tells you nobody’s using them, nobody needs them, and wherever you have them is probably more expensive than archiving them.
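A minimal Python sketch of the check in this example, assuming a locally mounted filesystem: walk a tree, keep files whose last-access time is more than a year old, and total their size. Treat access times as approximate. Many Linux mounts use `relatime` or `noatime`, and Windows can disable last-access updates, so atime may lag real usage.

```python
import os
import time


def find_cold_files(root, min_idle_days=365, now=None):
    """Return (sorted paths, total_bytes) for files not accessed in min_idle_days days."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86_400  # 86,400 seconds per day
    cold_paths, total_bytes = [], 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # skip files we can't stat
            if st.st_atime < cutoff:
                cold_paths.append(path)
                total_bytes += st.st_size
    return sorted(cold_paths), total_bytes
```

For example, `paths, size = find_cold_files("/mnt/projects")` (a hypothetical mount) followed by `print(f"{size / 1e12:.2f} TB cold")` gives the kind of figure the scenario above starts from.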
Action: Write the cold files to an Object archive, typically for less than $0.10 / GB-Month TCO.
In each of these scenarios, StorageX empowers you to custom-tag the file subset you created. This makes it easier to recall your files if you need them, and the custom tags persist into your Object archive.
The Cost Savings of Tackling your Dark Data
In each of the scenarios above, you’re identifying data and putting it somewhere else–whether that somewhere else is new NAS or an object archive. And those new storage locations probably offer substantial savings over your legacy NAS.
Imagine that in tackling your dark data, you found 20 TB you can move to new NAS and 30 TB you can archive to object storage.
Right now, those 50 TB on that legacy NAS are probably costing you around $0.50 per GB-month TCO, totaling $300K per year. By contrast, let’s say that new NAS costs in the vicinity of $0.15 per GB-month TCO, and object storage might be around $0.03 per GB-month TCO.
So if you move the 20 TB to new NAS and the 30 TB to object storage, those run rates total just $46,800 per year, saving you roughly $253K per year. How much leverage would that kind of savings give you the next time you ask for a raise?
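Here’s that breakdown again (using 1 TB = 1,000 GB):
- Legacy NAS today: 50 TB × $0.50 per GB-month × 12 months = $300,000 per year
- New NAS: 20 TB × $0.15 per GB-month × 12 months = $36,000 per year
- Object storage: 30 TB × $0.03 per GB-month × 12 months = $10,800 per year
- New run rate: $36,000 + $10,800 = $46,800 per year
- Annual savings: $300,000 − $46,800 = $253,200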
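To rerun this comparison with your own capacities and rates, a small calculator helps. This is a minimal Python sketch that assumes decimal units (1 TB = 1,000 GB) and a single flat per-GB-month TCO rate per tier, as in the example; real TCO is rarely that flat.

```python
GB_PER_TB = 1_000  # decimal units, matching the example figures
MONTHS_PER_YEAR = 12


def annual_cost(tb: float, usd_per_gb_month: float) -> float:
    """Annual TCO in dollars for `tb` terabytes at a flat per-GB-month rate."""
    return round(tb * GB_PER_TB * usd_per_gb_month * MONTHS_PER_YEAR, 2)


def savings(current, proposed):
    """Each argument is a list of (tb, usd_per_gb_month) tiers.

    Returns (current annual cost, proposed annual cost, annual savings).
    """
    now = sum(annual_cost(tb, rate) for tb, rate in current)
    later = sum(annual_cost(tb, rate) for tb, rate in proposed)
    return now, later, round(now - later, 2)


if __name__ == "__main__":
    current, proposed, saved = savings(
        current=[(50, 0.50)],               # legacy NAS
        proposed=[(20, 0.15), (30, 0.03)],  # new NAS + object archive
    )
    print(f"current ${current:,.0f}/yr, proposed ${proposed:,.0f}/yr, saved ${saved:,.0f}/yr")
```

Swapping in your own tiers is just a matter of editing the two lists; adding a third tier (say, a cold object class) is one more tuple.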
Planning for Data Growth with Policy-Based Movement
The flood of files doesn’t stop once you’ve handled old dark or unstructured data. You also need a way to handle new data as it’s generated.
For each of the scenarios above, the StorageX file movement policies you implemented can act upon new files as they’re created. You can also automatically custom tag new files with each new scan of your storage resources, and those tags persist into your Object archive.
For example, you might create a policy to immediately reassign or quarantine the data of employees who leave the company. Or you might write finance department spreadsheets to S3 if nobody’s accessed them in the last 12 months. How many similar possibilities exist in your environment?
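StorageX policies are configured in its own portal, and the format isn’t shown here. To make the idea concrete, here is a hypothetical Python sketch of the finance example as a rule evaluated against scan records: match a path prefix and spreadsheet extensions, require 365 days without access, and emit planned moves. The share path, bucket, and class names are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class FileRecord:
    path: str
    last_accessed: datetime


@dataclass(frozen=True)
class ArchivePolicy:
    """Hypothetical rule: move matching files that have been idle for idle_days."""
    name: str
    path_prefix: str
    extensions: Tuple[str, ...]  # lowercase, e.g. (".xlsx", ".xls")
    idle_days: int
    target: str

    def matches(self, rec: FileRecord, now: datetime) -> bool:
        return (
            rec.path.startswith(self.path_prefix)
            and rec.path.lower().endswith(self.extensions)
            and now - rec.last_accessed >= timedelta(days=self.idle_days)
        )


def plan_moves(policy: ArchivePolicy, records: List[FileRecord], now: datetime):
    """Return (source_path, target) pairs; rerun after every scan to catch new files."""
    return [(r.path, policy.target) for r in records if policy.matches(r, now)]
```

Keeping the rule as data rather than code is what lets the same policy be re-evaluated automatically against each new scan.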
However you manage dark data, we think you should always maintain ownership of your files. That means no stubbing, sharding, symlinks, gateways, proprietary namespaces, file virtualization, or vendor lock-in.
Give us a call today and take charge of your dark data and file management.
“I can’t wait to do another tech refresh!” – said no storage admin ever
Storage technology refreshes offer all kinds of business benefits:
- Expanded capacity with less physical space
- Lower storage costs
- Lower risk of machine failure
- The chance to introduce a proactive data management strategy
But tech refreshes can also be complex, risky, and time-consuming. Here are four tips to make sure your technology refresh goes smoothly:
1. Know Which Files You Have Where
To move old data to the new storage, you’ve got to know which files you have on the end-of-life machine(s). Smaller organizations with only one or two shares might write a script to discover these files, but scripts are difficult to scale for enterprises with TBs or PBs to manage—same for metadata viewers.
A faster alternative is to scan your storage resources with StorageX to see what you have. And while in the Management Portal, you can also analyze those files for business insights. All those orphaned files that nobody’s accessed in years? Get them off critical storage and to an S3 archive for pennies on the dollar.
2. Cut Time Spent Supporting Both Old and New Storage
For some organizations, it’s the incredible savings of new storage over legacy systems that prompts a storage tech refresh. But some companies claim their storage costs actually increased after buying new storage. How?
Once you buy the new machine, you’re supporting two systems until you retire the old one. So the faster you transition, the sooner you realize the storage savings you were sold on.
Saving time is a powerful benefit of the StorageX platform. Thanks to its large-scale parallelism, it’s saved 80 years of employee time over traditional tools.
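How StorageX parallelizes its work isn’t described here. As a generic illustration of why parallelism matters for file moves, which are mostly I/O-bound, here is a minimal Python sketch that copies a directory tree with a thread pool. It is not how StorageX works, and it omits much of what a real migration needs, such as ownership and ACL transfer, retries, checksum verification, and cutover.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor, as_completed


def copy_one(src_root, dst_root, rel_path):
    """Copy one file, creating parent directories at the destination as needed."""
    src = os.path.join(src_root, rel_path)
    dst = os.path.join(dst_root, rel_path)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    # copy2 keeps timestamps and permission bits; it does NOT copy ownership or ACLs.
    shutil.copy2(src, dst)
    return rel_path


def parallel_copy(src_root, dst_root, workers=16):
    """Copy every file under src_root to dst_root using a pool of worker threads.

    Returns (sorted list of copied relative paths, list of (path, error) failures).
    """
    rel_paths = [
        os.path.relpath(os.path.join(dirpath, name), src_root)
        for dirpath, _dirnames, filenames in os.walk(src_root)
        for name in filenames
    ]
    copied, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(copy_one, src_root, dst_root, p): p for p in rel_paths}
        for future in as_completed(futures):
            try:
                copied.append(future.result())
            except OSError as exc:  # record the failure and keep going
                failed.append((futures[future], str(exc)))
    return sorted(copied), failed
```

Threads work here because each worker spends most of its time waiting on storage I/O; the right `workers` value depends on what the source and destination can sustain.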
3. Know the Extra Expense of “Free” Tools
Imagine I asked you to dig a 1’ x 1’ hole with a shovel that I gave you for free. It might not take you very long–maybe a few seconds.
Now imagine the hole needs to be 10’ x 10’…dug with the same shovel. Could you physically do it? Probably. Is your free shovel the most efficient way? Not really. You could hire ten people, but their labor expense will probably offset any money you saved by providing them with free shovels.
Now imagine you have to dig one hundred 10’ x 10’ holes. How do you coordinate? Who does what? Is there quality control?
We think digging a giant hole with a shovel is like using rsync, robocopy, and other “free tools” in your storage tech refresh. You can do it…it’s just unnecessarily time-consuming. It also requires specialized employees’ time—making it pricey—and extensive manual input—making it risky.
But faster options are so much more expensive that they could never be worth it, right? Actually, the break-even point for StorageX compared to “free” tools typically occurs 15–25% of the way through the project timeline. Centralized control, logging, and management let your tech refresh finish faster, with less labor and less risk.
4. Shed Light on Your Dark Data
A storage tech refresh is also a great chance to deal with your dark data, which might be taking up expensive storage resources. Found a bunch of files whose owner SID no longer resolves to an account name? Move them to an S3 archive instead of onto the new platform, and poof! You just reduced the amount of data you need to move. You can also add tags to that data to help identify it later.
StorageX lets you scan shares and exports for dark data and unstructured data, and it can automatically enumerate new shares and exports as they’re created. By comparison, most free tools give little information on shares or exports, and none on unstructured data.
With these tips in hand, you’re sure to have a smoother tech refresh.
“Data, not oil, is your new most valuable resource.”
That’s the mantra behind a slew of articles published over the past year about the emerging digital enterprise. And the comparison makes sense: your data is the untapped secret weapon to gain competitive advantage, the commodity that’s so valuable it’s earned a place at the table in many leading C-suites.
Despite its prevalence, the metaphor leaves one important question unanswered: how do we get there? How do we make the best use of our data?
1. Don’t Lock Up Your Data
Putting your data in a traditional archive is like putting your brand-new sports car into mini-storage, paying the storage boss every time you want to visit, and never being able to store the car elsewhere…forever. (Spoiler alert: this is a win-win for the storage boss.)
Traditional archives lock your data into their proprietary system and charge lucrative license fees for you to access it. That puts you at a data-disadvantage:
-You’re rarely going to use it
-You’re getting little benefit from it
-You probably don’t even know what’s all in there
-All of this is costing you money—more so as time goes on
Does this sound like the strategy of a digital enterprise?
2. Get Data Where You Need It, When You Need It
Bust that data sports car out of the mini-storage archive and tear up the Pacific Coast Highway…or Pikes Peak…or Ol’ Route 66. Wherever. And do it whenever you want.
How? By analyzing your legacy NAS files to see what you’ve got and moving them to cloud-based object storage. You can keep your data as native objects, without wrappers, sharding, deduplication, lock-in, or access fees.
With services like Amazon Athena, which runs SQL queries directly against data stored in S3, you can query old files to learn which data you have and extract value from it, all while maintaining ownership of your files.
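Athena runs standard SQL over S3 data that is in a structured or semi-structured format (CSV, JSON, Parquet, and similar) and described by a table in the data catalog; it doesn’t read arbitrary documents directly. As a minimal sketch, assuming you’ve already defined such a table (the database, table, and bucket names in the usage note are hypothetical), the boto3 Athena client can start a query, poll until it finishes, and page through the results:

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Run `sql` on Amazon Athena and return all result rows as lists of strings.

    `client` is expected to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    query_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} finished in state {state}")

    # Page through results; for SELECT queries the first row holds column names.
    rows, token = [], None
    while True:
        kwargs = {"QueryExecutionId": query_id}
        if token:
            kwargs["NextToken"] = token
        page = client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            rows.append([col.get("VarCharValue") for col in row["Data"]])
        token = page.get("NextToken")
        if not token:
            return rows
```

A call might look like `run_athena_query(boto3.client("athena"), "SELECT department, COUNT(*) FROM file_inventory GROUP BY department", database="archive_catalog", output_location="s3://example-athena-results/")`. Passing the client in, rather than creating it inside, also makes the function easy to test with a stub.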
A recent World Economic Forum blog stated, “Unlike oil, the value of data doesn’t grow by merely accumulating more. It is the insights generated through analytics and combinations of different data sets that generate the real value.” We couldn’t agree more, and it’s this value that we think StorageX, our award-winning file management platform, helps you generate.
Your files wanted us to share this letter…
To My dearest IT Storage Professional,
Where art thou? I write to you this Valentine’s Day from the depths of the company file shares. On behalf of all the aging and orphaned files hidden away, never to see the light of day, I must ask: why have you abandoned us?
When we were “young” and active files, we worked daily with staff crafting marketing plans, reports, spreadsheets, and presentations that helped launch the company. Now that years have passed and our creators have been promoted, changed roles, or left the company, you don’t see our value anymore.
Or are you just upset because of how expensive we are to store, year after year?
I implore you to consider the valuable information we contain:
-Business challenges we faced and how we addressed them.
-Market and financial statistics we collected and how they compare today.
-Business strategies we deployed and which ones were successful.
In the spirit of Valentine’s Day, let’s reignite the relationship by finding that passion that brought us so close…I’ve seen StorageX Analytics help many others reconcile and rekindle—we should do the same.
Together we can energize the company with deep insight into past company practices by:
-Performing complex searches to identify files by age, type, and owner.
-Adding custom tags to organize files by project or department.
-Moving files to S3-compatible Object storage for safekeeping.
Might we schedule a product demonstration of StorageX to learn more?
I would so look forward to working together again and helping move us toward a digital enterprise!
Your Unstructured Files
With the increasing number of files and growth of your data, wouldn’t it be nice to add your own metadata? Something that you control and adds information about the file?
The answer is custom tagging.
What is Custom Tagging?
Custom tagging allows you to quickly and easily create custom information about a file or document, and then apply it to a dataset or the object being created.
For example, you may wish to associate files with a certain project or department. Custom tags let you type in a keyword and quickly get a list of thousands of matching documents. With custom tagging, research time shrinks from days, or sometimes months, to a matter of seconds.
The Data Dynamics StorageX Management Portal enables you to scan and gather information about your unstructured files, using tags to categorize the results based on your business needs. By combining tags and scans, you can assess and evaluate your files by any combination of those business categories.
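Fast keyword-to-files lookups like this typically rest on an inverted index that maps each tag to the set of files carrying it, so a search touches only matching entries instead of rescanning every file. Here is a minimal in-memory Python sketch; the class and tag names are hypothetical and this is not StorageX’s implementation.

```python
from collections import defaultdict


class TagIndex:
    """In-memory inverted index from tag to the set of file paths carrying it."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, path, tags):
        """Record that `path` carries each tag in `tags` (case-insensitive)."""
        for tag in tags:
            self._by_tag[tag.lower()].add(path)

    def search(self, *tags):
        """Return sorted paths carrying ALL of the given tags (case-insensitive)."""
        if not tags:
            return []
        # .get avoids creating empty entries in the defaultdict for unknown tags.
        sets = [self._by_tag.get(t.lower(), set()) for t in tags]
        return sorted(set.intersection(*sets))
```

Searching on several tags at once is a set intersection, which is how a query like “Finance files for a given project” narrows thousands of documents to a short list.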
In turn, this information enables you to create policies that convert files and documents to S3 object files for archival. StorageX stores your custom tag(s) in the S3 object when it does the file-to-object conversion. No matter where the object resides, the custom tag metadata is available to help with your search and discovery.
The Business Value of Custom Tagging
It’s OK to admit it: you may not know exactly what company data and files you’ve got. Custom tagging lets you classify your data using details like where it is, how it’s being used, and who’s accessing it.
Activate Your Data with Business Analytics
Knowing what you have is a great start, but the real value of custom tagging is how it enables your business analytics. After you move data into S3, tags make it easier to search for files in the data lake, where they remain in their native format.
For example, one of your lines of business might have files on critical storage assets and not even know it—creating an unnecessary expense. In this case, business analytics could tell you which files, where they are, and to which line of business they belong. From there, you could recommend which files might be better to move into a less-costly archive. And if you can move them in only one third of the time and without tying up your expensive engineers, you’ll look like a rock star.
How to Apply Custom Tags
The StorageX Management Portal enables a simple workflow:
In the Discover phase, you create and apply custom tags associated with the dataset you are building. These tags are associated with every file in that dataset.
Next, use scan names or custom tags to define which datasets you want to evaluate. Your tags might represent a country, data center, organization, or legal status. You can gather information based on these tags and then apply queries to quickly create a list of files that are candidates for archiving. The output is an Analysis Set.
Finally, during Archive, you use the previously created Analysis Set. It contains all the tags associated with each file’s dataset, which you established in the Discover phase.
For example, an Analysis Set might gather files whose common tag is Finance, even though individual files also carry other tags, such as a country or data center. All of those tags, not just Finance, follow the files into the object archive.
StorageX automatically creates a tag with the date and time of the archive. You might wish to add others, like:
-Legal or Confidential Status
-Governing Business Unit
-Deletion Date or Policy
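Exactly how StorageX writes tags into an object isn’t specified here. One standard S3 mechanism is user-defined object metadata (sent as `x-amz-meta-*` headers), which anyone with read access can retrieve with a plain HEAD request, with no archiving software involved. Here is a minimal sketch using boto3’s `put_object` and `head_object`, assuming tags are stored as metadata; the bucket, key, and tag names are hypothetical, and S3 object tagging is an alternative mechanism. Note that S3 limits user-defined metadata to 2 KB per object, so keep tags short.

```python
from datetime import datetime, timezone


def archive_with_tags(s3_client, local_path, bucket, key, tags):
    """Upload a file to S3 with custom tags stored as user-defined object metadata.

    Adds an 'archived-at' UTC timestamp, similar to the automatic date/time tag
    described above. Returns the metadata dict that was written.
    """
    # Metadata keys travel as HTTP headers (x-amz-meta-<key>), so keep them simple.
    metadata = {k.strip().lower().replace(" ", "-"): str(v) for k, v in tags.items()}
    metadata["archived-at"] = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(local_path, "rb") as body:
        s3_client.put_object(Bucket=bucket, Key=key, Body=body, Metadata=metadata)
    return metadata


def read_tags(s3_client, bucket, key):
    """Read the tags back with a plain HEAD request; no archiving software needed."""
    return s3_client.head_object(Bucket=bucket, Key=key)["Metadata"]
```

With a real client (`boto3.client("s3")`), a call such as `archive_with_tags(s3, "q1.xlsx", "example-archive", "finance/q1.xlsx", {"Business Unit": "Finance", "Legal Status": "Confidential"})` writes the object and its tags in one request.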
These archive tags give clarity on the data archived, and a way to recall that data based on searches that mean something to your business. While other solutions may want to hide your data, StorageX allows you to access it—even if our software is no longer present. This empowers your business to:
-Find the information you want quickly
-Access archived information when and where you want
-Respond confidently to discovery or investigation requests
-Meet business and regulatory information retention requirements
-Make informed decisions about which information to keep while deleting the irrelevant
-Simplify migrations by reducing your storage footprint beforehand
-Target what’s most meaningful to your organization by classifying all archived content
At Data Dynamics, our mission is to help you control your data and manage your future. Custom tagging allows you to always have access to S3 Object custom metadata, without risking vendor lock-in or vendor obsolescence—a vital requirement for file archives that may span many years.
If you would like to learn how StorageX and custom tagging can benefit your file archival strategy, contact us to find out about our resource saving solutions.
If your data management vendor closes, will you be locked out of your data?
Recently, a major cloud solution for secondary storage, backup, archival, and disaster recovery shut down, likely creating anxiety and uncertainty for its clients. This anxiety is probably justified: many file management solutions take ownership of your data—effectively holding you hostage to their offering.
Data lock-out—including physical gateway appliances, filesystem software virtualization, proprietary filesystem namespaces, and individual file stubbing or sharding—poses several problems for customers:
– No direct access to your files.
– Increased cost, risk, and complexity due to vendor lock-in.
– Loss of enterprise-level scalability capabilities.
At Data Dynamics, we believe that you should always maintain direct control of your data, whether you’re with us or not.
Data Dynamics’ StorageX solution offers:
– Direct Access to Your Files, Your Way (SMB, NFS, S3).
– No gateway, file virtualization, proprietary namespace, stubbing, sharding, or agents.
– Scalability to enterprise levels.
– Automated source mapping and provisioning at destination.
– File transform and file to S3 object conversion.
– A solution based on policy engines and industry standards.
P. Mehta, CEO
The past 6 years have seen a ton of start-ups in the infrastructure management space. There has been a flood of new companies created to address the massive challenge of managing the plumbing and underlying compute and storage that facilitate the growth of the Internet of Things, public and private cloud, and artificial intelligence. Angels, venture capital, and private equity have given entrepreneurs an easy source of capital. What a great time to be a start-up… or is it? I liken the current start-up space to a gambler (in this case, an investor) debating whether to bet more chips on another hand, double down on the current one, or walk away from the table.
In 2017, internet users neared 4 billion, Gartner predicted 8 billion connected things would be in use, and AI became more prevalent and started to make an impact on our daily lives. We truly are in an information technology renaissance, with the world becoming virtually smaller, new innovations shaping our daily routines, and the workforce transforming to meet the needs of a digital economy. Every day we see new technologies make front-page news, from online shopping to self-driving cars to cryptocurrency to robotics and AI. Each of these next-generation innovations requires core and edge computing capability, tons of storage to keep data that can be mined and used, and networks through which the information can flow across the globe. The underlying infrastructure is evolving with new innovations from legacy vendors and start-ups to meet the needs of the market.
This exciting era of technology has led to crowdfunding, angels, super angels, venture capitalists for different stages of growth, and private equity all pumping money into new ideas and companies. From raising thousands to raising billions, staying private and raising as much capital as required has been the mantra of most start-ups, avoiding the scrutiny of the public markets and all that comes with it. Venture and private equity funds have raised tens, if not hundreds, of billions of dollars to invest in the next Amazon or Alibaba! Nobody wants to miss the party, and everyone wants ‘in’ on the 10-20-50x return that awaits an exit! This is like sitting at a blackjack table where everyone around you is winning: the enthusiasm keeps building, and players keep increasing their bets, doubling down because there are no signs of a losing hand. Investors see their other investments, or their peers, earning multifold returns on a unicorn exit, and the exuberance continues in stride.
Unfortunately, most start-ups never think about profitability and focus solely on customer acquisition, top-line growth, or worse yet, numbers of users or clicks without any direct correlation to financial metrics. Customer acquisition and top-line growth are essential components of a company’s growth curve, but at some stage there has to be a path to profitability. Start-ups today don’t worry about profit; they are more focused on raising the next round of funding, and then the next and the next, and before long the company has raised tens if not hundreds of millions without earning a dollar. What amazes me is that investors keep investing round after round despite knowing that throwing good money after bad doesn’t make sense. The challenge is that once they are ‘in’, they have to keep investing, because they need to show their limited partners (LPs) that their investments are continuing to progress. Keeping the blackjack analogy in mind, picture the same table full of exuberance when a couple of the players lose a hand or two. The gambling mindset says the loss is a fluke: it won’t happen to me, or it definitely won’t happen twice in a row. If the gambler keeps playing, and maybe even raises the stake, a win will yield rewards.
What you will see in 2018 is that a large majority of start-ups will close or be sold for pennies on the dollar. The reason is not that the technologies are bad; it is that the companies are not profitable, are not within sight of being profitable, and new investment capital is drying up. There are several reasons investors are unwilling or unable to invest further. First, investors have a time horizon that may be coming due. Most funds have a ‘life’ for each fund raised, typically 10 years from inception, so a fund must exit its investments before that time horizon runs out. Second, going public requires delivering on numbers. The public markets reward companies that meet or exceed forecasts and punish just as harshly those that don’t. Financials do matter, and the public market is clear that you must show profitability, or a path to it, to keep the support of a strong share price. There are exceptions, but even those exceptions ride a crazy roller coaster with their share price. The other option is a private exit: an M&A deal with a strategic buyer. The challenge here is that most large companies are extremely smart and have fairly mature M&A processes, not to mention activist investors monitoring every major spend. They are not going to pay multiples if they know the company is going to run out of money and is on its last breath. In addition, they will not take on a transaction unless it is strategic and additive to their earnings, or has only a minimal short-term negative impact on earnings. Going back to my gambling analogy: the gambler has a flight to catch and needs to leave the table soon, so he or she must decide whether to bet more chips and double down, take a new hand, or simply walk away. My feeling is that many in 2018 will either take ‘even money’ or take the loss and walk away!
To my fellow entrepreneurs: we are the dealers of each hand, and making the gambler win is in our best interest. Focus on profit, and the saying ‘the house never loses’ will come true. Best of luck in 2018!
When I first heard of ‘The Cloud,’ I thought it was just marketing jargon used by technology companies to create a false new market.
In reality, The Cloud, in its various forms, is re-defining how we access, utilize, and manage software, hardware, and IT services.
With the exception of structured data, many companies don’t know which files live in their Windows shares or NFS exports. Over time, data has moved from department to department and project to project. It’s been created, left unused, and orphaned when users leave the company or corporate restructurings take place.
Managing Orphaned Data with StorageX 8.0
StorageX 8.0 introduces our File Analytics web portal. The web portal displays a dashboard representing the results of data scans and subsequent analysis. Each data scan interrogates a specified share, export, or multiple shares and exports. The scan tags and compiles the file metadata into the file analytics database. Once the metadata is in the database, we can query the tags and metadata to narrow down the scope of data that is of concern.
File metadata can be used to help a company determine the use, ownership, file type, file size, creation date, access date, last modify date, and many other criteria that can be used to make decisions about where the data should be stored. For purposes of this discussion, we will focus on ownership—specifically, unowned or orphaned data.
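On SMB shares, an orphaned file typically shows up as an owner SID that no longer resolves to an account. For NFS exports, a minimal POSIX analog is to walk a tree and flag files whose owning UID has no entry in the system’s account database, using Python’s `pwd` module (Unix only), which queries the system name service. The lookup function is a parameter so you could swap in a different directory check; that hook is a hypothetical design choice for this sketch, not a StorageX feature.

```python
import os
import pwd  # POSIX only; not available on Windows


def uid_exists(uid: int) -> bool:
    """True if the UID resolves to an account via the system name service."""
    try:
        pwd.getpwuid(uid)
        return True
    except KeyError:
        return False


def find_orphaned_files(root, is_known_uid=uid_exists):
    """Yield (path, uid) for files whose owning UID no longer maps to an account."""
    cache = {}  # look each UID up only once
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                uid = os.stat(path, follow_symlinks=False).st_uid
            except OSError:
                continue  # skip files we can't stat
            if uid not in cache:
                cache[uid] = is_known_uid(uid)
            if not cache[uid]:
                yield path, uid
```

Caching per UID matters on large shares, where millions of files usually belong to a few thousand owners and each name-service lookup may hit the network.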
Want to learn more about StorageX?