6 hidden bottlenecks in cloud data migration

Moving terabytes or even petabytes of data to the cloud is a daunting task. But it is important to look beyond the number of bytes. You probably know that your applications are going to behave differently when accessed in the cloud, that cost structures will be different (hopefully better), and that it will take time to move all that data.

Because my company, Data Expedition, is in the business of high-performance data transfer, customers come to us when they expect network speed to be a problem. But in the process of helping companies overcome that problem, we have seen many other factors that threaten to derail cloud migrations if left overlooked.

Collecting, organizing, formatting, and validating your data can present much bigger challenges than moving it. Here are some common factors to consider in the planning stages of a cloud migration, so you can avoid time-consuming and expensive problems later.

Cloud migration bottleneck #1: Data storage

The most common mistake we see in cloud migrations is pushing data into cloud storage without considering how that data will be used. The typical thought process is, “I want to put my documents and databases in the cloud and object storage is cheap, so I’ll put my document and database files there.” But files, objects, and databases behave very differently. Putting your bytes into the wrong one can cripple your cloud plans.

Files are organized by a hierarchy of paths, a directory tree. Each file can be quickly accessed, with minimal latency (time to first byte) and high speed (bits per second once the data begins flowing). Individual files can be easily moved, renamed, and changed down to the byte level. You can have many small files, a small number of large files, or any mix of sizes and data types. Traditional applications can access files in the cloud just like they would on premises, without any special cloud awareness.

All of these advantages make file-based storage the most expensive option, and storing files in the cloud has a few other disadvantages. To achieve high performance, most cloud-based file systems (like Amazon EBS) can be accessed by only one cloud-based virtual machine at a time, which means all applications needing that data must run on a single cloud VM. Serving multiple VMs (as Azure Files does) requires fronting the storage with a NAS (network attached storage) protocol like SMB, which can severely limit performance. File systems are fast, flexible, and legacy compatible, but they are expensive, useful only to applications running in the cloud, and do not scale well.

Objects are not files. Remember that, because it is easy to forget. Objects live in a flat namespace, like one giant directory. Latency is high, sometimes hundreds or thousands of milliseconds, and throughput is low, often topping out around 150 megabits per second unless clever tricks are used. Much about accessing objects comes down to such tricks: multipart upload, byte range access, and key name optimization. Objects can be read by many cloud-native and web-based applications at once, from both within and outside the cloud, but traditional applications require performance-crippling workarounds. Most interfaces for accessing object storage make objects look like files: key names are filtered by prefix to look like folders, custom metadata is attached to objects to appear like file metadata, and some systems like FUSE cache objects on a VM file system to allow access by traditional applications. But such workarounds are brittle and sap performance. Object storage is cheap, scalable, and cloud native, but it is also slow and difficult to access.
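To make the "folders are just filtered key names" point concrete, here is a minimal sketch in Python. It models a flat namespace as a plain list of keys and emulates a folder listing by prefix and delimiter. The function name `list_folder` and the sample keys are illustrative assumptions, not any vendor's API; real object stores do this filtering on the service side.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Emulate a folder listing over a flat list of object keys."""
    files, subfolders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter only *looks* like a subfolder.
            subfolders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        elif rest:
            files.append(key)
    return sorted(files), sorted(subfolders)


# Hypothetical keys in one flat namespace.
keys = [
    "reports/2017/q1.pdf",
    "reports/2017/q2.pdf",
    "reports/summary.txt",
    "readme.txt",
]

files, folders = list_folder(keys, prefix="reports/")
root_files, root_folders = list_folder(keys)
print(files, folders)
```

Note that nothing named "reports/" actually exists in this model. "Renaming a folder" would mean rewriting every key that shares the prefix, which is one reason file-style workarounds on object storage tend to be slow.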

Databases have their own complex structure, and they are accessed by query languages such as SQL. Traditional databases may be backed by file storage, but they require a live database process to serve queries. This can be lifted into the cloud by copying the database files and applications onto a VM, or by migrating the data into a cloud-hosted database service. But copying a database file into object storage is only useful as an offline backup. Databases scale well as part of a cloud-hosted service, but it is critical to ensure that the applications and processes that depend on the database are fully compatible and cloud-native. Database storage is highly specialized and application-specific.

Balancing the apparent cost savings of object storage against the functionality of files and databases requires careful consideration of exactly what functionality is required. For example, if you want to store and distribute many thousands of small files, archive them into a ZIP file and store that as a single object instead of trying to store each individual file as a separate object. Incorrect storage choices can lead to complex dependencies that are difficult and expensive to change later.
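As a rough illustration of the ZIP suggestion above, the following Python sketch packs many small files into a single archive that could then be stored as one object. The file names and contents are made up for the example, and the archive is built in memory only to keep the sketch self-contained; a real migration would more likely write to staging storage.

```python
import io
import zipfile


def pack_small_files(files):
    """Pack a mapping of {name: bytes} into one in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Hypothetical data set: a thousand tiny text files.
small_files = {
    f"docs/note_{i:04d}.txt": f"note {i}\n".encode() for i in range(1000)
}
archive_bytes = pack_small_files(small_files)

# Reading the archive back confirms that every member survived packing.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    names = archive.namelist()
    sample = archive.read("docs/note_0007.txt")
print(len(names), len(archive_bytes))
```

The trade-off is that a consumer must fetch and unpack the whole archive to reach one member, so this fits distribution and archival better than random access.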

Cloud migration bottleneck #2: Data preparation

Moving data to the cloud is not as simple as copying bytes into the designated storage type. A lot of preparation needs to happen before anything is copied, and that time requires careful budgeting. Proof-of-concept projects often ignore this step, which can lead to costly overruns later.

Filtering out unnecessary data can save a lot of time and storage costs. For example, a data set may contain backups, earlier versions, or scratch files that do not need to be part of the cloud workflow. Perhaps the most important part of filtering is prioritizing which data needs to be moved first. Data that is being actively used will not tolerate being out of sync by the weeks, months, or years it takes to complete the entire migration process. The key here is to come up with an automated means of selecting which data is to be sent and when, then keep careful records of everything that is and is not done.
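One possible shape for that automated selection is sketched below in Python. It walks a directory tree, skips files matching exclusion patterns, orders the rest so the most recently modified data goes first, and records what was skipped. The patterns, the "newest first" priority rule, and the sample file names are all assumptions for illustration; real priorities depend on the workflow.

```python
import fnmatch
import os
import tempfile
import time

SKIP_PATTERNS = ["*.bak", "*.tmp", "*~"]  # hypothetical exclusion rules


def build_manifest(root, skip_patterns=SKIP_PATTERNS):
    """Return (selected, skipped); selected is ordered newest-modified first."""
    selected, skipped = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if any(fnmatch.fnmatch(name, pattern) for pattern in skip_patterns):
                skipped.append(rel)  # keep a record of what is NOT sent
                continue
            info = os.stat(path)
            selected.append(
                {"path": rel, "bytes": info.st_size, "mtime": info.st_mtime}
            )
    selected.sort(key=lambda entry: entry["mtime"], reverse=True)
    return selected, sorted(skipped)


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    now = time.time()
    for name, age_days in [("active.db", 1), ("archive_2015.db", 900), ("scratch.tmp", 2)]:
        path = os.path.join(root, name)
        with open(path, "wb") as handle:
            handle.write(b"x" * 10)
        # Backdate the file so the ordering is deterministic.
        os.utime(path, (now - age_days * 86400,) * 2)
    selected, skipped = build_manifest(root)
    order = [entry["path"] for entry in selected]
print(order, skipped)
```

Writing both lists to durable storage gives the careful record of what was and was not done.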

Different cloud workflows may require the data to be in a different format or organization than on-premises applications. For example, a legal workflow might require translating thousands of small Word or PDF documents and packing them in ZIP files, a media workflow might involve transcoding and metadata packing, and a bioinformatics workflow might require picking and staging terabytes of genomics data. Such reformatting can be an intensely manual and time-consuming process. It may require a lot of experimentation, a lot of temporary storage, and a lot of exception handling. Sometimes it is tempting to defer any reformatting to the cloud environment, but remember that this does not solve the problem, it just shifts it to an environment where every resource you use has a price.

Part of the storage and formatting questions may involve decisions about compression and archiving. For example, it makes sense to ZIP millions of small text files before sending them to the cloud, but not a handful of multi-gigabyte media files. Archiving and compressing data makes it easier to transfer and store the data, but consider the time and storage space it takes to pack and unpack those archives at either end.

Cloud migration bottleneck #3: Information validation

Integrity checking is the single most important step, and also the easiest to get wrong. Often it is assumed that corruption will occur during the data transport, whether that is by physical media or network transfer, and can be caught by performing checksums before and after. Checksums are a vital part of the process, but it is actually the preparation and importing of the data where you are most likely to suffer loss or corruption.
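For the checksum part of the process, a minimal Python sketch might look like the following. It hashes files in fixed-size chunks so that large files never need to fit in memory, then compares digests before and after a copy. SHA-256 and the 1 MiB chunk size are choices made for this example, not requirements from the text.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so memory use stays constant."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "source.bin")
    copy = os.path.join(workdir, "copy.bin")
    with open(source, "wb") as handle:
        handle.write(os.urandom(3 * 1024 * 1024 + 17))

    shutil.copyfile(source, copy)  # stand-in for the real transfer
    before = sha256_of_file(source)
    after = sha256_of_file(copy)

    # Flip a single byte to show that the digest catches corruption.
    with open(copy, "r+b") as handle:
        handle.seek(100)
        original = handle.read(1)
        handle.seek(100)
        handle.write(bytes([original[0] ^ 0xFF]))
    corrupted = sha256_of_file(copy)

print(before == after, corrupted == before)
```

A matching digest only shows that the bytes are identical. It says nothing about whether the data was exported or reformatted correctly in the first place, which is where the larger risk lies.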

When data is shifting formats and applications, meaning and functionality can be lost even when the bytes are the same. A simple incompatibility between software versions can render petabytes of “correct” data useless. Coming up with a scalable process to verify that your data is both correct and useable can be a daunting task. At worst, it may devolve into a labor-intensive and imprecise manual process of “it looks okay to me.” But even that is better than no validation at all. The most important thing is to ensure that you will be able to recognize problems before the legacy systems are decommissioned!

Cloud migration bottleneck #4: Transfer marshaling

When lifting a single system to the cloud, it is relatively easy to just copy the prepared data onto physical media or push it across the Internet. But this process can be difficult to scale, especially for physical media. What seems “simple” in a proof-of-concept can balloon to “nightmare” when many and varied systems come into play.

A media device, such as an AWS Snowball, must be connected to each machine. That could mean physically walking the device around one or more data centers, juggling connectors, updating drivers, and installing software. Connecting over the local network saves the physical movement, but software setup can still be challenging and copy speed may drop to well below what could be achieved with a direct Internet upload. Transferring the data directly from each machine over the Internet saves many steps, especially if the data is cloud-ready.

If data preparation involves copying, exporting, reformatting, or archiving, local storage can become a bottleneck. It may be necessary to set up dedicated storage to stage the prepared data. This has the advantage of allowing many systems to perform preparation in parallel, and reduces the contact points for shippable media and data transfer software to just one system.

Cloud migration bottleneck #5: Data transfer

When comparing network transfer to media shipment, it is easy to focus on just the shipping time. For example, an 80 terabyte AWS Snowball device might be sent by next-day courier, achieving an apparent data rate of more than eight gigabits per second. But this ignores the time it takes to acquire the device, configure and load it, prepare it for return, and allow the cloud vendor to copy the data off on the back-end. Customers of ours who do this regularly report that four-week turnaround times (from device ordering to data available in the cloud) are common. That brings the actual data transfer rate of shipping the device down to just 300 megabits per second, much less if the device is not completely filled.
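The arithmetic behind those figures can be checked with a few lines of Python. This sketch assumes binary terabytes (2^40 bytes) for the 80 terabyte device; with decimal terabytes the overnight figure comes out closer to 7.4 gigabits per second. The half-full case is a hypothetical added to show the effect of an underfilled device.

```python
def effective_mbps(capacity_bytes, elapsed_days):
    """Average data rate in megabits per second over the whole elapsed time."""
    seconds = elapsed_days * 24 * 60 * 60
    return capacity_bytes * 8 / seconds / 1e6


SNOWBALL_BYTES = 80 * 2**40  # assumption: 80 TB counted in binary units

overnight = effective_mbps(SNOWBALL_BYTES, 1)        # courier time only
four_weeks = effective_mbps(SNOWBALL_BYTES, 28)      # order-to-available turnaround
half_full = effective_mbps(SNOWBALL_BYTES / 2, 28)   # hypothetical half-filled device

print(round(overnight), round(four_weeks), round(half_full))
```

Under these assumptions the overnight rate is a little over 8,000 megabits per second, the four-week rate is about 290, and a half-filled device drops to roughly 145.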

Network transfer speeds likewise depend on a number of factors, foremost being the local uplink. You can’t send data faster than the physical bit rate, though careful data preparation can reduce the amount of data you need to send. Legacy protocols, including those that cloud vendors use by default for object storage, have difficulty with speed and reliability across long-distance Internet paths, which can make achieving that bit rate difficult. I could write many articles about the challenges involved here, but this is one you do not have to solve yourself. Data Expedition is one of a few companies that specialize in ensuring that the path is fully utilized regardless of how far away your data is from its cloud destination. For example, a one-gigabit Internet connection with acceleration software like CloudDat yields 900 megabits per second, three times the net throughput of an AWS Snowball.

The biggest difference between physical shipment and network transfer is also one of the most commonly overlooked during proof-of-concept. With physical shipment, the first byte you load onto the device must wait until the last byte is loaded before you can ship. This means that if it takes weeks to load the device, then some of your data will be weeks out of date by the time it arrives in the cloud. Even when data sets reach the petabyte levels where physical shipment may be faster overall, the ability to keep priority data current during the migration process may still favor network transfer for key assets. Careful planning during the filtering and prioritization phase of data preparation is essential, and may allow for a hybrid approach.

Getting the data into a cloud provider may not be the end of the data transfer step. If it needs to be replicated to multiple regions or providers, plan carefully how it will get there. Upload over the Internet is free, while AWS, for example, charges up to two cents per gigabyte for interregional data transfer and nine cents per gigabyte for transfer to other cloud vendors. Both methods will face bandwidth limitations that could benefit from transport acceleration such as CloudDat.
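A quick back-of-the-envelope calculation shows how those per-gigabyte charges add up. The rates below are the "up to" figures quoted above, and the 100 terabyte data set is a hypothetical; actual pricing varies by region and changes over time.

```python
# Rates quoted in the text, in cents per gigabyte ("up to" figures).
INTERREGION_CENTS_PER_GB = 2
OTHER_VENDOR_CENTS_PER_GB = 9


def replication_cost_dollars(gigabytes, cents_per_gb, copies=1):
    """Cost of sending `gigabytes` to `copies` additional destinations."""
    return gigabytes * cents_per_gb * copies / 100


data_gb = 100_000  # hypothetical 100 TB data set, decimal units

two_regions = replication_cost_dollars(data_gb, INTERREGION_CENTS_PER_GB, copies=2)
one_other_cloud = replication_cost_dollars(data_gb, OTHER_VENDOR_CENTS_PER_GB)
print(two_regions, one_other_cloud)
```

At these rates, replicating to two more regions would cost up to $4,000 and copying once to another vendor up to $9,000, before any repeated syncs.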

Cloud migration bottleneck #6: Cloud scaling

Once data arrives at its destination in the cloud, the migration process is only half finished. Checksums come first: Make sure that the bytes that arrived match those that were sent. This can be trickier than you may realize. File storage uses layers of caches that can hide corruption of data that was just uploaded. Such corruption is rare, but until you’ve cleared all of the caches and re-read the files, you can’t be sure of any checksums. Rebooting the instance or unmounting the storage does a tolerable job of clearing caches.

Validating object storage checksums requires that each object be read out into an instance for calculation. Contrary to popular belief, object “E-tags” are not useful as checksums. Objects uploaded using multipart techniques in particular can only be validated by reading them back out.
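To see why a multipart tag cannot stand in for a whole-object checksum, consider the scheme commonly observed on S3-style stores: an MD5 of the concatenated per-part MD5 digests, followed by the part count. Treat that scheme as an assumption for this sketch rather than a documented guarantee. The Python below reproduces it locally, with no cloud API involved.

```python
import hashlib


def plain_md5(data):
    """Ordinary MD5 of the whole payload."""
    return hashlib.md5(data).hexdigest()


def multipart_style_etag(data, part_size):
    """MD5 of the concatenated per-part MD5 digests, plus '-<part count>'.

    Assumption: this mirrors commonly observed multipart behavior; it is
    not a contract any vendor promises.
    """
    part_digests = [
        hashlib.md5(data[offset:offset + part_size]).digest()
        for offset in range(0, len(data), part_size)
    ]
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


payload = bytes(range(256)) * 4096  # 1 MiB of sample data

etag_400k = multipart_style_etag(payload, part_size=400 * 1024)
etag_512k = multipart_style_etag(payload, part_size=512 * 1024)
whole = plain_md5(payload)
print(whole, etag_400k, etag_512k)
```

The same bytes yield a different tag for every part size, and none of them equals the plain MD5. Unless you know exactly how an upload was split, reading the object back and hashing it is the dependable check.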

Once the transferred data has been verified, it may need further extraction and reformatting and distribution before your cloud-based applications and services can make use of it. This is pretty much the opposite of the preparation and marshaling that occurred on premises.

The final step of scaling out the data is to verify that it is both correct and useful. This is the other side of the information validation planning discussed above and is the only way to know whether you are truly done.

Cloud migration is more about processes than data. Even seemingly simple tasks like file distribution can require complex migration steps to ensure that the resulting cloud infrastructure matches the desired workflow. Much of the hype surrounding cloud, from cost savings to scalability, is justifiable. But careful planning and anticipation of difficulties is essential to determining what tools and methods are necessary to realize those returns.

Seth Noble is the creator of the patented Multipurpose Transaction Protocol (MTP) technology and a top data transport expert. He is founder and president of Data Expedition, with a dual BS-MS degree from Caltech, and a doctorate in computer science from the University of Oklahoma for work developing MTP.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Source: InfoWorld Big Data

IDG Contributor Network: 3 requirements of modern archive for massive unstructured data

Perhaps the least understood component of secondary storage strategy, archive has become a necessity for modern digital enterprises with petabytes of data and billions of files.

So, what exactly is archive, and why is it so important?

Archiving data involves moving data that is no longer frequently accessed off primary systems for long-term retention.

The most apparent benefit of archiving data is to save precious space on expensive primary NAS or to retain data for regulatory compliance, but archiving can reap long-term benefits for your business as well. For example, archiving the results of scientific experiments that would be costly to replicate can be extremely valuable later for future studies.

In addition, a strong archive tier can cost-effectively protect and enable usage of the huge data sets needed for enhanced analytics, machine learning, and artificial intelligence workflows.

Legacy archive fails for massive unstructured data

However, legacy archive infrastructure wasn’t built to meet the requirements of massive unstructured data, resulting in three key failures of legacy archive solutions.

First, the scale of data has changed greatly, from terabytes to petabytes and quickly growing. Legacy archive can’t move high volumes of data quickly enough and can’t scale with today’s exploding data sets.

Second, the way organizations use data has also changed. It’s no longer adequate to simply throw data into a vault and keep it safe; organizations need to use their archived data as digital assets become integral to business. As more organizations employ cloud computing and machine learning/AI applications using their huge repositories of data, legacy archive falls short in enabling usage of archived data.

Third, traditional data management must become increasingly automated and delivered as-a-Service to relieve management overhead on enterprise IT and reduce total cost of ownership as data explodes beyond petabytes.

Modern archive must overcome these failures of legacy solutions and meet the following requirements.

1. Ingest petabytes of data

Because today’s digital enterprises are generating and using petabytes of data and billions of files, a modern archive solution must have the capacity to ingest enormous amounts of data.

Legacy software uses single-threaded protocols to move data, which was necessary to write to tape and worked for terabyte-scale data but fails for today’s petabyte-scale data.

Modern archive needs highly parallel and latency-aware data movement to efficiently move data from where it lives to where it’s needed, without impacting performance. The ability to automatically surface archive-ready data and set policies to snapshot, move, verify, and re-export data can reduce administrator effort and streamline data management.
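As a loose illustration of parallel movement with verification, the Python sketch below copies files with a pool of worker threads and checks each copy by digest. It is a local stand-in, not a description of any archive product: the function names, the worker count, and the use of `shutil.copyfile` in place of a network transfer are all assumptions.

```python
import hashlib
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def file_digest(path):
    """SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_dir, dest_dir, name):
    """Copy one file, then confirm the destination matches the source."""
    source = os.path.join(source_dir, name)
    dest = os.path.join(dest_dir, name)
    shutil.copyfile(source, dest)
    return name, file_digest(source) == file_digest(dest)


def parallel_copy(source_dir, dest_dir, workers=8):
    """Run many copy-and-verify tasks at once; return {name: verified}."""
    names = sorted(os.listdir(source_dir))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda name: copy_and_verify(source_dir, dest_dir, name), names
        )
        return dict(results)


with tempfile.TemporaryDirectory() as source_dir, tempfile.TemporaryDirectory() as dest_dir:
    for index in range(20):
        with open(os.path.join(source_dir, f"file_{index:02d}.bin"), "wb") as handle:
            handle.write(os.urandom(64 * 1024))
    results = parallel_copy(source_dir, dest_dir)
print(len(results), all(results.values()))
```

Threads help most when each transfer spends its time waiting on latency rather than CPU, which is the situation the paragraph above describes.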

In addition, modern archive must be able to scale with exponentially growing data. Unlike legacy archive, which necessitates silos as data grows large, a scale-out archive tier keeps data within the same system for simpler management.

2. API-driven, cloud-native architecture

An API-driven archive solution can plug into customer applications, ensuring that the data can be used. Legacy software wasn’t designed with this kind of automation, making it difficult to use the data after it’s been archived.

Modern archive that’s cloud-native can much more easily plug into customer applications and enable usage. My company’s product, Igneous Hybrid Storage Cloud, is built with event-driven computing, applying the cloud-native concept of having interoperability at every step. Event-driven computing models tie compute to actions on data and are functionally API-driven, adding agility to the software. Building in compatibility with any application is simply a matter of exposing existing APIs to customer-facing applications.

This ensures that data can get used by customer applications. This capability is especially useful in the growing fields of machine learning and AI, where massive repositories of data are needed for compute. The more data, the better—which not only requires a scale-out archive tier, but one that enables that data to be computed.

An example of a machine learning/AI workflow used by Igneous customers involves using Igneous Hybrid Storage Cloud as the archive tier for petabytes of unstructured file data and moving smaller subsets of data to a “hot edge” primary tier from which the data can be processed and computed.

3. As-a-Service delivery

Many of the digital enterprises and organizations with enormous amounts of unstructured file data don’t necessarily have the IT resources or budget to match, let alone the capacity to keep pace with the growing IT requirements of their exponentially growing data.

To keep management overhead reasonable and cost-effective, many organizations are turning to as-a-service solutions. With as-a-service platforms, software is remotely monitored, updated, and troubleshot, so that organizations can focus on their business, not IT.

Modern archive solutions that are delivered as-a-service can help organizations save on total cost of ownership (TCO) when taking into account the amount of time it frees up for IT administrators to focus on other tasks—like planning long-term data management and archiving strategy.

This article is published as part of the IDG Contributor Network.

Source: InfoWorld Big Data

IDG Contributor Network: Why data democratization is crucial to your business

In the Information Age, the power of data has been mostly kept in the hands of a few data analysts with the skills and understanding necessary to properly organize, crunch, and interpret the data for their organization. This approach was born out of necessity—most employees were not trained how to effectively use the growing flood of data.

But things have changed with the emergence of technologies capable of making data shareable and interpretable for people who are not data analysts. Data democratization allows data to pass safely from the hands of a few analysts into the hands of the masses in a company.

Data democratization is a game-changer

Data democratization will catapult companies to new heights of performance, if done right. Indeed, the utopian vision of data democratization is hard to refuse.

“Data democratization means that everybody has access to data and there are no gatekeepers that create a bottleneck at the gateway to the data. The goal is to have anybody use data at any time to make decisions with no barriers to access or understanding,” says Bernard Marr, bestselling author of Big Data in Practice.

The ability to instantly access and understand data will translate into faster decision-making, and that will translate into more agile teams. Those teams will have a competitive advantage over slower data-stingy businesses.

But Marr believes it’s about more than just being able to take instant action. “When you allow data access to any tier of your company, it empowers individuals at all levels of ownership and responsibility to use the data in their decision making,” he says. If the current situation encourages team members to go around data to get things done on time, data democratization creates team members that are more data-driven.

When anomalies occur, good or bad, and the right people are proactively informed, those people can dig into and understand them.

Ultimately, for marketers striving to create the ultimate customer experience, data democratization is a must. The question on their minds should not be if data democratization is coming, but how they can create it in their organization as quickly and efficiently as possible.

Laying the foundation for data democratization

Businesses that wish to benefit from data democratization will have to create it intentionally. This means an organizational investment must be made in terms of budget, software, and training. 

In the world of data democratization, breaking down information silos is the first step toward user empowerment. This cannot be done without customizable analytics tools capable of desegregating and connecting previously siloed data, making it manageable from a single place.

Ideally, the tools will filter the data and visualizations shared with each individual—whether they are an executive, a director, or a designer—according to each person’s role. Marketing managers, for instance, will need data that allows them to analyze customer segments leading up to a new campaign. CMOs, on the other hand, will need data that allows them to analyze marketing ROI as they build next year’s budgets.

Those tools must help employees visualize their data. It is important that consumers of the data can access data points in a visual form they are comfortable with. These visualizations must align with the organization’s KPIs: the metrics, goals, targets, and objectives, set from the top down, that enable data-driven decisions.

With the right tools in place, team training becomes the next essential step. Because data democratization depends on the concept of self-service analytics, every team member must be trained up to a minimum level of comfort with the tools, concepts, and processes involved to participate.

Last, you cannot have a democracy without checks and balances. The final step to sharing data across your organization is data governance. Mismanagement or misinterpretation of data is a real concern. Therefore, a center of excellence is recommended to keep the use of data on the straight and narrow. This center of excellence should have a goal of driving adoption of data usage, which it makes possible by owning data accuracy, curation, sharing, and training. These teams are often most successful when they have budget, a cross-section of skill sets, and executive approval.

When executed this way, sharing data can allow every player on your team to realize the value of that data. Fortunately, we don’t have to wait for the future to see what marketing teams can accomplish when this powerful resource is available to them.

The future of data democratization is now

For a sterling example of data democratization in action, you need look no further than the Royal Bank of Scotland, a client of my company Adobe Systems. The bank’s digital marketing leaders invited representatives from multiple parts of its business—including its call center, human resources, and legal department—to help optimize parts of the customer experience. Working off the same data, these nonmarketers could bring fresh insights to the marketing process and revolutionize the bank’s customer experience.

“Raising visibility from our digital marketing platform and data-driven strategies was vital to the shift,” says the bank’s head of analytics, Giles Richardson. “We had to have concrete, measurable insights and ways for our cross-functional teams to act on them to propel RBS into its next chapter.”

For the Royal Bank of Scotland and other businesses interested in making the move toward data democratization, the journey is not measured in reaching a single destination. It has to be viewed as an ongoing process.

“Expect that data democratization is an evolution where each individual small win, when nontechnical users gain insight because of accessing the data, adds up to ultimately prove the merits of data democratization,” says Marr.

Data democratization is the future of managing big data and realizing its value. Businesses armed with the right tools and understanding are succeeding today because they are arming all their employees with the knowledge necessary to make smart decisions and provide better customer experiences.

This article is published as part of the IDG Contributor Network.

Source: InfoWorld Big Data

Google Cloud tutorial: Get started with Google Cloud

When people think of the word Google, they think about search and the immense computational infrastructure that converts your words into a list of websites that probably have exactly what you’re looking to find. It took Google years to hire the engineers, design the custom computers, and create the huge collection of hardware that answers web queries. Now it can be yours with just a few keystrokes and clicks. 

Google rents out much of that expertise and infrastructure to other web companies. If you want to build a clever website or service, Google is ready to charge you to run it on its vast collection of machines. All you need to do is start filling out some web forms and soon you’ll have a big collection of servers ready to scale and handle your chores.

For a quick guide to getting started, and to navigating the many choices along the way, just follow me.  

Step 1: Set up your account

This is the easy part. If you’ve got a Google account, you’re ready to go. You can log into cloud.google.com and head right to your Console and Dashboard. There won’t be much to see here when you begin, but soon you’ll start to see details about what your vast computing empire is doing. That is, the load on any server instances you’ve created, the data flowing through the network, and the usage of APIs. You can assure yourself that everything is running smoothly with a glance.

Source: InfoWorld Big Data

IDG Contributor Network: Data governance 2.0

At Bristol-Myers Squibb, I have the privilege of working for a company singularly focused on our mission to discover, develop, and deliver innovative medicines that help patients prevail over serious diseases. Accurate, high quality, and trustworthy data is central to our work in R&D, manufacturing, sales and marketing, and corporate functions. In IT, we strive to make sure the right data is available to the right audience at the right time with the right quality and controls to advance our company’s mission. With the digital data and analytic transformation that is pervasive across the health care industry, as an IT and a data professional, there has never been a more exciting time than now to transform how we manage, protect, and consume data to help patients prevail over serious diseases.      

The digital data and analytic transformation is not unique to health care. Everywhere you turn, in industry after industry, the focus is on digital and analytic transformation, with companies in a race to become the digital enterprise powered by machine learning and AI. This transformation thirsts for trusted, good-quality data. Yet the one common theme in my conversations with IT, analytics, and business leaders across industries is persistent dissatisfaction with the state of data in the modern enterprise. There is no disagreement on the aspiration of treating data as an asset and a fuel for the modern enterprise. Yet almost all enterprises suffer from the weight of legacy data infrastructure, dysfunctional data stewardship, and a poor rate of return on organizational investments in data management.

So what is my solution?

I believe it is data governance 2.0, a pragmatic, relentless, self-sustaining data governance aided by machine-assisted data stewardship. I define data governance 2.0 as the combination of people, process, and technology that precisely articulates the data domains and assets that are critical to the enterprise (high risk and/or high value), defines the baseline of where the enterprise is today in managing the data (data ownership, data quality, data readiness), defines the target state of where the organization needs to be, orchestrates pragmatic ownership and asset management processes that fit efficiently into the organizational structure and culture, relentlessly monitors utilization and value, and course corrects without dogma when needed. This data governance 2.0 should use algorithmic automations, machine learning, and AI to reduce the organizational burden and bureaucracy so that human involvement in data governance shifts from mundane data stewardship tasks to qualitative action directed by the “machine.”

I recognize data governance is not new and perhaps it is the most overused phrase in the annals of data management. But there is no getting around the fact that unless the modern enterprise establishes a bedrock of good data governance, the edifices of digital and analytics transformation will erode and dissipate like the statue of Ozymandias looming over the rubble in the desert. The time has come to leverage the analytic and AI advancements of today to reboot data governance and elevate it to the same level of importance as good financial governance.

So what are the key considerations for establishing this data governance 2.0 foundation? I plan to explore this through conversations with CDOs, industry leaders, peers, and practitioners. 

As the first in that series I had a chance to discuss the topic of data governance with Jason Fishbain, the chief data officer (CDO) at the University of Wisconsin at Madison. Fishbain is a passionate proponent of pragmatic data governance striving to build a strong data foundation at the University of Wisconsin at Madison. Synthesized below are the key takeaways from that discussion.

Data governance tied to strategic business objectives

Successful data governance efforts must be linked to business objectives. In his efforts at University of Wisconsin-Madison, Fishbain tied the need for good data governance closely to the educational analytic needs required to achieve the university’s strategic business objectives ranging from student recruitment to graduation goals. This enabled him to create rapid buy-in among the academic leaders on the need to own and manage the data effectively at the source. According to Fishbain, a chief data officer must be a strategic leader and a data evangelist deftly navigating the organizational structure to consistently and relentlessly advance the data governance goals. He focuses on informal and formal outreach to academic leaders to understand their priorities and see how he can enable them with the right quality data.

It is all about the outcome, be flexible on the governance model

Organizations often get bogged down in debates on data ownership and data governance models. In our discussion on how to define the right data governance models, Fishbain pointed out that in his experience, effective organizations eschew fidelity to organizational models in favor of a pragmatic selection of a model or models that best enable the outcome. Effective CDOs show the willingness to lead or to serve, be a COE or be a data stewardship unit all in the cause of enterprise data outcomes. If a certain department has the necessary skills, resources and the willingness to manage the data, then he believes the CDO should support them with standards, best practices and tools. If other areas lack this expertise, then the CDO should offer data management as a service.   

Data on the state of data is key

According to Fishbain, a CDO must define and publish a few relevant metrics on the “state of the data” to organizational leaders, tying them as much as possible to the attainment of strategic business objectives. As Fishbain puts it, if the CDO does not have data on the state of the data, how can the CDO shine a spotlight and mobilize organizational action? As with any good business metric or KPI, these “state of the data” metrics must be limited, focused, and action-oriented.

Business process changes and technology shifts are opportunities for a data strategy refresh

Fishbain astutely observed that new capability deployments, be it a new CRM or ERP implementation or a redesign of a business process, are perfect opportunities for a CDO to advocate for a relook at the associated data strategy. Is the data ownership clear? What are the data quality objectives? What are the data consumption aspirations? These are all questions to ask at the launch of a new business process or technology initiative to call attention to the fact that in today’s enterprise, a lack of focus on data strategy is a surefire way to undercut the expected ROI of most capability investments.

In conclusion, there are pragmatic approaches to data governance that will allow the practitioners of its art and science, the CDOs and data leaders, to steer the enterprise toward a future where effective data governance is the default rather than the exception.

This article is published as part of the IDG Contributor Network.

Source: InfoWorld Big Data