12 New Year's resolutions for your data

Your company was once at the forefront of the computing revolution. You deployed the latest mainframes, then minis, then microcomputers. You joined the PC revolution and bought Sparcs during the dot-com era. You bought DB2 to replace some of what you were doing with IMS. Maybe you bought Oracle or SQL Server later. You deployed MPP and started looking at cubes.

Then you jumped on the next big wave and put a lot of your data on the intranet and internet. You deployed VMware to prevent server sprawl, only to discover VM sprawl. When Microsoft came a-knocking, you deployed SharePoint. You even moved from Siebel to Salesforce to hop into SaaS.

Now you have data coming out of your ears and spilling all over the place. Your mainframe is a delicate flower on which nothing can be installed without a six-month study. The rest of your data is all on the SAN. That works out because you have a “great relationship with the EMC/Dell federation” (where you basically pay them whatever they want and they give you the “EMC treatment”). However, the SAN does you no good for finding actual information due to the effects of VM and application sprawl on your data organization.

Now the millennials want to deploy MongoDB because it’s “webscale.” The Hadoop vendor is knocking and wants to build a data lake, which is supposed to magically produce insights by using cheaper storage … and produce yet another storage technology to worry about.

Time to stop the madness! This is the year you wrangle your data and make it work for your organization instead of your organization working for its data. How do you get your data straight? Start with these 12 New Year’s resolutions:

1. Catalog where the data is

You need to know what you have. Whether or not this takes the form of a complicated data mapping and management system isn’t as important as the actual concerted effort to find it.

2. Map data use

Your data is in use by existing applications, and there’s an overall flow throughout the organization. Whether you track this “data lineage” and “data dependency” via software or sweat, you need to know why you’re keeping this stuff, as well as who’s using it and why. What is the data? What is the source system for each piece of data? What is it used for?
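Even a very small catalog record can answer the three questions above. The following is a minimal sketch, not a recommendation of any particular tool: the `DatasetEntry` class, its field names, and the sample datasets are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetEntry:
    """One row in a hypothetical data catalog (all names are illustrative)."""
    name: str
    source_system: str   # where the data originates
    description: str     # what the data is
    consumers: List[str] = field(default_factory=list)  # who uses it, and for what


def unused_datasets(catalog):
    """Return the names of datasets with no recorded consumer."""
    return [entry.name for entry in catalog if not entry.consumers]


catalog = [
    DatasetEntry("customer_master", "CRM", "Customer names and contacts",
                 ["billing", "marketing reports"]),
    DatasetEntry("legacy_orders_1998", "mainframe", "Archived order history"),
]

# Datasets nobody claims to use are the first candidates for "why are we keeping this?"
print(unused_datasets(catalog))
```

Whether the record lives in a spreadsheet, a wiki, or a commercial catalog matters less than having the source system and the consumers written down somewhere.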

3. Understand how data is created

Remember the solid fuel booster at NASA that had a 1-in-300-year failure rate? Remember that the number was pretty much pulled out of the air? Most of the data was on paper and passed around. How is your data created? How are the numbers derived? This is probably an ongoing effort, as there are new sources of data every day, but it’s worthwhile to prevent your organization’s own avoidable and repeated disasters.

4. Understand how data flows through the organization

Knowing how data is used is critical, but you also need to understand how it got there and any transformation it underwent. You need a map of your organization’s data circulatory system, the big form of the good old data flow diagram. This will not only let you find “black holes” (where inputs are used but no results happen) and “miracles” (where a series of insufficient inputs can’t possibly produce the expected result), but also where redundant flows and transformations exist. Many organizations have lots of copies of the same stuff produced by very similar processes that differ by technology stack alone. It’s just data—we don’t have to pledge allegiance to the latest platform in our ETL process.
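Once the flows are written down as producer/consumer pairs, the simplest of these checks can be automated. The sketch below assumes a hand-maintained list of flows with made-up system names. It flags black holes (inputs but no outputs) and only the crudest case of a miracle: a process that produces output with no recorded inputs at all.

```python
from collections import defaultdict

# Hypothetical flows as (producer, consumer) pairs; every name here is invented.
flows = [
    ("crm", "nightly_etl"),
    ("nightly_etl", "warehouse"),
    ("warehouse", "sales_dashboard"),
    ("crm", "shadow_export"),       # receives data, produces nothing
    ("forecast_job", "warehouse"),  # produces data from no recorded input
]
sources = {"crm"}            # systems where data legitimately originates
sinks = {"sales_dashboard"}  # end uses where data legitimately stops


def audit(flows, sources, sinks):
    """Return (black_holes, no_input_producers) for a list of flows."""
    inputs, outputs = defaultdict(set), defaultdict(set)
    for producer, consumer in flows:
        outputs[producer].add(consumer)
        inputs[consumer].add(producer)
    nodes = set(inputs) | set(outputs)
    black_holes = sorted(
        n for n in nodes if inputs[n] and not outputs[n] and n not in sinks
    )
    no_input_producers = sorted(
        n for n in nodes if outputs[n] and not inputs[n] and n not in sources
    )
    return black_holes, no_input_producers


black_holes, no_input_producers = audit(flows, sources, sinks)
print("Black holes:", black_holes)
print("Outputs with no recorded inputs:", no_input_producers)
```

Judging whether a set of inputs is merely insufficient still takes a person who knows the process. The script only narrows down where to look.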

5. Automate manual data processing

At various times I’ve tried to sneak a post past my editor entitled something like “Ban Microsoft Excel!” (I think I may have worked that into a post or two.) I’m being partly tongue-in-cheek, but people who routinely monkey with the numbers manually should be replaced by absolutely no one.

I recently watched the movie “Hidden Figures,” and among other details, it depicted the quick pace at which people were replaced by machines (the smarter folk learned how to operate the machines). In truth, we stagnated somewhere along the way, and a large number of people push bits around in email and Excel. You don’t have to get rid of those people, but the latency of fingers on the keyboard is awful. If you map your data, from where it originates to where it flows, you should be able to identify these manual data-munging processes.
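Many of those manual steps amount to "open the export, add up a column by category, paste the totals somewhere." Here is a minimal sketch of scripting one such step, assuming a CSV export with invented `region` and `amount` columns. The in-memory file stands in for whatever actually gets emailed around.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a weekly export file; the column names and values are made up.
weekly_export = io.StringIO(
    "region,amount\n"
    "east,100.50\n"
    "west,200.00\n"
    "east,49.50\n"
)


def totals_by_region(csv_file):
    """Sum the amount column per region, as someone might do by hand in a spreadsheet."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["region"]] += float(row["amount"])
    return dict(totals)


result = totals_by_region(weekly_export)
print(result)
```

For real money, `decimal.Decimal` would be a safer choice than `float`. The point is only that a recurring manual step can usually be captured in a few lines and scheduled.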

6. Find a business process you can automate with machine learning

Machine learning is not magic. You are not going to buy software, turn it loose on your network, and get insights out of the box. However, right now someone in your organization is finding patterns by matching sets of data together and doing an “analysis” that can be done by the next wave of computing. Understand the basics (patterns and grouping, aka clustering, are the easiest examples), and try to find at least one place where it can be introduced to your advantage. It isn’t the data revolution, but it’s a good way to start looking forward again.
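To make "grouping" concrete, here is a bare-bones sketch of k-means clustering on a single column of numbers, written in plain Python so nothing needs installing. The order totals and the starting centroids are invented. A real project would more likely use an established library and more than one feature.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid (plain k-means in one dimension)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical order totals with two obvious groups: small and large orders.
order_totals = [12, 15, 14, 210, 190, 205, 13]
centroids, clusters = kmeans_1d(order_totals, centroids=[0, 100])
print(clusters)
```

Nobody told the code which orders were "small" or "large." It found the two groups from the numbers, which is the same kind of pattern-matching a person may be doing by eye today.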

7. Make everything searchable using natural language and voice

My post-millennial son and my Gen-X girlfriend share one major trait: They click the microphone button more often than I do. I use voice on my phone in the car, but almost never otherwise. I learned to type at a young age, and I compose pretty accurate search queries because I practically grew up with computers.

But the future is not communicating with computers on their terms. Training everyone to do that has produced mixed results, so we are probably at the apex of computer literacy and are on our way down. Making your data accessible by natural language search isn’t simply nice to have—it’s essential for the future. It’s also time to start looking into voice if you aren’t there yet. (Disclaimer: I work for Lucidworks, a search technology company with products in this area.)

8. Make everything web-accessible

Big, fat desktop software is generally hated. The maintenance is painful, and sooner or later you need to do something somewhere else on some other machine. Get out of the desktop business! If it isn’t web-based, you don’t want it. Ironically, this is sort of a PC counterrevolution. We went from mainframes and dumb terminals to installing everything everywhere to web browsers and web servers—but the latest trip is worth taking.

9. Make everything accessible via mobile

By any stretch of the numbers, desktop computing is dying. I mean, we still have laptops, but the time we spend on them versus other computing devices is in decline. You can look at sales or searches or whatever numbers you like, but they all point in this direction. Originally you developed an “everything mobile” initiative because the executive got an iPad and wanted to use it on an airplane, and everything looked like crap in the iPad edition of Safari. Then it was the salespeople. Now it’s everyone. If it can’t happen on mobile, then it probably isn’t happening as often as it should, or when and where it should.

10. Make it highly available and distributable

I’m not a big fan of the Oracle theory of computing (stuff everything into your RDBMS and it will be fine, now cut the check, you sheep). Sooner or later outages are going to eat the organization’s confidence. New York City got hit by a hurricane, remember?

It’s time to make your data architecture resilient. That isn’t an old client-server model where you buy GoldenGate or the latest Oracle replication product from a company it recently acquired, then hope for the best. That millennial may be right: you may need a fancy, newfangled database designed for the cloud and distributed computing era. Your reason may not even be to scale but that you want to stay up, handle change better, and have a more affordable offsite replica. The technology has matured. It’s time to take a look.

11. Consolidate

Ultimately the tree of systems and data at many organizations is too complicated and unwieldy to be efficient, accurate, and verifiable. It’s probably time to start chopping at the mistakes of yesteryear. This is often a hard business case to make, but the numbers are there, whether they show how often a system goes down, how many people it takes to maintain it, or that you can’t recruit talent to maintain it. Sometimes if it isn’t broke, you still knock it down because it’s eating you alive.

12. Make it visual

People like charts—lots of charts and pretty lines.

This can be the year you drive your organization forward and prove that IT is more than a cost center. It can be the year you build a new legacy. What else are you hoping to get done with data this year? Hit me up on Twitter.

Source: InfoWorld Big Data