12 New Year's resolutions for your data

Your company was once at the forefront of the computing revolution. You deployed the latest mainframes, then minis, then microcomputers. You joined the PC revolution and bought Sparcs during the dot-com era. You bought DB2 to replace some of what you were doing with IMS. Maybe you bought Oracle or SQL Server later. You deployed MPP and started looking at cubes.

Then you jumped on the next big wave and put a lot of your data on the intranet and internet. You deployed VMware to prevent server sprawl, only to discover VM sprawl. When Microsoft came a-knocking, you deployed SharePoint. You even moved from Siebel to Salesforce to hop into SaaS.

Now you have data coming out of your ears and spilling all over the place. Your mainframe is a delicate flower on which nothing can be installed without a six-month study. The rest of your data is all on the SAN. That works out because you have a “great relationship with the EMC/Dell federation” (where you basically pay them whatever they want and they give you the “EMC treatment”). However, the SAN does you no good for finding actual information due to the effects of VM and application sprawl on your data organization.

Now the millennials want to deploy MongoDB because it’s “webscale.” The Hadoop vendor is knocking and wants to build a data lake, which is supposed to magically produce insights by using cheaper storage … and produce yet another storage technology to worry about.

Time to stop the madness! This is the year you wrangle your data and make it work for your organization instead of your organization working for its data. How do you get your data straight? Start with these 12 New Year’s resolutions:

1. Catalog where the data is

You need to know what you have. Whether or not this takes the form of a complicated data mapping and management system isn’t as important as the actual concerted effort to find it.

2. Map data use

Your data is in use by existing applications, and there’s an overall flow throughout the organization. Whether you track this “data lineage” and “data dependency” via software or sweat, you need to know why you’re keeping this stuff, as well as who’s using it and why. What is the data? What is the source system for each piece of data? What is it used for?

3. Understand how data is created

Remember the solid fuel booster at NASA that had a 1-in-300-year failure rate? Remember that the number was pretty much pulled out of the air? Most of the data was on paper and passed around. How is your data created? How are the numbers derived? This is probably an ongoing effort, as there are new sources of data every day, but it’s worthwhile to prevent your organization’s own avoidable and repeated disasters.

4. Understand how data flows through the organization

Knowing how data is used is critical, but you also need to understand how it got there and any transformation it underwent. You need a map of your organization’s data circulatory system, the big form of the good old data flow diagram. This will not only let you find “black holes” (where inputs are used but no results happen) and “miracles” (where a series of insufficient inputs can’t possibly produce the expected result), but also where redundant flows and transformations exist. Many organizations have lots of copies of the same stuff produced by very similar processes that differ by technology stack alone. It’s just data—we don’t have to pledge allegiance to the latest platform in our ETL process.

5. Automate manual data processing

At various times I’ve tried to sneak a post past my editor entitled something like “Ban Microsoft Excel!” (I think I may have worked that into a post or two.) I’m being partly tongue-in-cheek, but people who routinely monkey with the numbers manually should be replaced by absolutely no one.

I recently watched the movie “Hidden Figures,” and among other details, it depicted the quick pace at which people were replaced by machines (the smarter folk learned how to operate the machines). In truth, we stagnated somewhere along the way, and a large number of people still push bits around in email and Excel. You don’t have to get rid of those people, but the latency of fingers on the keyboard is awful. If you map your data (where it originates and where it flows), you should be able to identify these manual data-munging processes.
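If you want a picture of what that automation can look like, here is a minimal Python sketch that replaces a hand-built Excel roll-up with a script. It assumes pandas (plus openpyxl for .xlsx files), and the file, sheet, and column names are hypothetical placeholders, not anything from your environment.

```python
# A minimal sketch of automating a hand-built Excel roll-up with a script.
# The file, sheet, and column names are hypothetical placeholders.
import pandas as pd  # reading .xlsx files also requires the openpyxl package

def monthly_rollup(path):
    """Read a raw export and produce the summary someone used to assemble by hand."""
    raw = pd.read_excel(path, sheet_name="raw_export")
    raw["amount"] = pd.to_numeric(raw["amount"], errors="coerce")  # tolerate messy cells
    return (
        raw.dropna(subset=["amount"])
           .groupby(["region", "product"], as_index=False)["amount"]
           .sum()
    )

if __name__ == "__main__":
    # Run on a schedule (cron, Task Scheduler) instead of waiting on fingers and keyboards.
    monthly_rollup("sales_export.xlsx").to_csv("monthly_summary.csv", index=False)
```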

6. Find a business process you can automate with machine learning

Machine learning is not magic. You are not going to buy software, turn it loose on your network, and get insights out of the box. However, right now someone in your organization is finding patterns by matching sets of data together and doing an “analysis” that can be done by the next wave of computing. Understand the basics (patterns and grouping, aka clustering, are the easiest examples), and try to find at least one place it can be introduced to advantage. It isn’t the data revolution, but it’s a good way to start looking forward again.
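For example, a minimal clustering sketch in Python (scikit-learn assumed; the features and numbers are invented for illustration) shows how little code a first grouping experiment requires:

```python
# A minimal sketch of clustering (grouping) with scikit-learn; the features and
# values below are made-up illustrations, not a real customer data set.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical rows = customers; columns = [monthly_spend, support_tickets, logins_per_week]
X = np.array([
    [120.0, 1, 14],
    [ 15.0, 7,  2],
    [110.0, 0, 20],
    [ 20.0, 9,  1],
])

X_scaled = StandardScaler().fit_transform(X)   # put features on a comparable scale
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

print(kmeans.labels_)           # which group each row falls into
print(kmeans.cluster_centers_)  # the "pattern" each group represents (in scaled units)
```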

7. Make everything searchable using natural language and voice

My post-millennial son and my Gen-X girlfriend share one major trait: They click the microphone button more often than I do. I use voice on my phone in the car, but almost never otherwise. I learned to type at a young age, and I compose pretty accurate search queries because I practically grew up with computers.

But the future is not communicating with computers on their terms. Training everyone to do that has produced mixed results, so we are probably at the apex of computer literacy and are on our way down. Making your data accessible by natural language search isn’t simply nice to have—it’s essential for the future. It’s also time to start looking into voice if you aren’t there yet. (Disclaimer: I work for Lucidworks, a search technology company with products in this area.)

8. Make everything web-accessible

Big, fat desktop software is generally hated. The maintenance is painful, and sooner or later you need to do something somewhere else on some other machine. Get out of the desktop business! If it isn’t web-based, you don’t want it. Ironically, this is sort of a PC counterrevolution. We went from mainframes and dumb terminals to installing everything everywhere to web browsers and web servers—but the latest trip is worth taking.

9. Make everything accessible via mobile

By any stretch of the numbers, desktop computing is dying. I mean, we still have laptops, but the time we spend on them versus other computing devices is in decline. You can look at sales or searches or whatever numbers you like, but they all point in this direction. Originally you developed an “everything mobile” initiative because the executive got an iPad and wanted to use it on an airplane, and everything looked like crap in the iPad edition of Safari. Then it was the salespeople. Now it’s everyone. If it can’t happen on mobile, it probably isn’t happening as often as it should, or when and where it should.

10. Make it highly available and distributable

I’m not a big fan of the Oracle theory of computing (stuff everything into your RDBMS and it will be fine, now cut the check, you sheep). Sooner or later outages are going to eat the organization’s confidence. New York City got hit by a hurricane, remember?

It’s time to make your data architecture resilient. That isn’t an old client-server model where you buy GoldenGate or the latest Oracle replication product from a company it recently acquired, then hope for the best. That millennial may be right—you may need a fancy, newfangled database designed for the cloud and distributed computing era. Your reason may not even be scale; you may simply want to stay up, handle change better, and have a more affordable offsite replica. The technology has matured. It’s time to take a look.

11. Consolidate

Ultimately the tree of systems and data at many organizations is too complicated and unwieldy to be efficient, accurate, and verifiable. It’s probably time to start chopping at the mistakes of yesteryear. This is often a hard business case to make, but the numbers are there, whether they show how often it goes down, how many people are tied up maintaining it, or that you can’t recruit the talent to maintain it. Sometimes even if it isn’t broke, you still knock it down because it’s eating you alive.

12. Make it visual

People like charts—lots of charts and pretty lines.

This can be the year you drive your organization forward and prove that IT is more than a cost center. It can be the year you build a new legacy. What else are you hoping to get done with data this year? Hit me up on Twitter.

Source: InfoWorld Big Data

Apache Beam unifies batch and streaming for big data

Apache Beam, a unified programming model for both batch and streaming data, has graduated from the Apache Incubator to become a top-level Apache project.

Aside from becoming another full-fledged widget in the ever-expanding Apache tool belt of big-data processing software, Beam addresses ease of use and dev-friendly abstraction, rather than just offering raw speed or a wider array of included processing algorithms.

Beam us up!

Beam provides a single programming model for creating batch and stream processing jobs (the name is a hybrid of “batch” and “stream”), and it offers a layer of abstraction for dispatching to various engines used to run said jobs. The project originated at Google, where it’s currently a service called GCD (Google Cloud Dataflow). Beam uses the same API as GCD, and it can use GCD as an execution engine, along with Apache Spark, Apache Flink (a stream processing engine with a highly memory-efficient design), and now Apache Apex (another stream engine for working closely with Hadoop deployments).

The Beam model involves five components: the pipeline (the pathway for data through the program); the “PCollections,” or data streams themselves; the transforms, for processing data; the sources and sinks, where data’s fetched and eventually sent; and the “runners,” or components that allow the whole thing to be executed on a given engine.
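To make those pieces concrete, here is a minimal word-count-style sketch using Beam’s Python SDK. The input and output paths are placeholders, the local DirectRunner is assumed by default, and the same code can in principle be dispatched to Spark, Flink, Apex, or Cloud Dataflow by supplying a different runner; treat it as an illustration of the model rather than production code.

```python
# A minimal sketch of the Beam model: a pipeline reads from a source, applies
# transforms to PCollections, and writes to a sink; the runner decides where it executes.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Default is the local DirectRunner; pass e.g. --runner=FlinkRunner to target another engine.
options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("input.txt")               # source
        | "Split" >> beam.FlatMap(lambda line: line.split())        # transform
        | "Count" >> beam.combiners.Count.PerElement()              # transform (grouping)
        | "Format" >> beam.Map(lambda kv: "{}: {}".format(kv[0], kv[1]))  # transform
        | "Write" >> beam.io.WriteToText("word_counts")             # sink
    )
```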

Apache says it separated concerns in this fashion so that Beam can “easily and intuitively express data processing pipelines for everything from simple batch-based data ingestion to complex event-time-based stream processing.” This is in line with how tools like Apache Spark have been reworked to support stream and batch processing within the same product and with similar programming models. In theory, it’s one less concept for a prospective developer to wrap her head around, but that presumes Beam is used entirely in lieu of Spark or other frameworks, when it’s more likely that it’ll be used — at least at first — to augment them.

Hands off

One possible drawback to Beam’s approach is that while the layers of abstraction in the product make operations easier, they also put the developer at a distance from the underlying layers. A good case in point is Beam’s current level of integration with Apache Spark; the Spark runner doesn’t yet use Spark’s more recent DataFrames system, and thus may not take advantage of the optimizations those can provide. But this isn’t a conceptual flaw, it’s an issue with the implementation, which can be addressed in time.

The big payoff of using Beam, as noted by Ian Pointer in his discussion of Beam in early 2016, is that it makes migrations between processing systems less of a headache. Likewise, Apache says that Beam “cleanly [separates] the user’s processing logic from details of the underlying engine.”

Separation of concerns and ease of migration will be good to have if the ongoing rivalry between the various big data processing engines continues. Granted, Apache Spark has emerged as one of the undisputed champs of the field and has become a de facto standard choice. But there’s always room for improvement, or an entirely new streaming or processing paradigm. Beam is less about offering a specific alternative than about providing developers and data wranglers with more breadth of choice among them.

Source: InfoWorld Big Data

Beeks Financial Cloud Joins Equinix Cloud Exchange

Equinix, Inc. has announced that global financial cloud infrastructure provider Beeks Financial Cloud has deployed on Equinix’s Cloud Exchange as it continues to expand its business globally.

Beeks Financial Cloud leverages Cloud Exchange and Platform Equinix™ to connect its customers to global cloud services and networks via a secure, private, and low-latency interconnection model. By joining the Equinix Cloud Exchange, Beeks Financial Cloud can instantly connect to multiple cloud service providers (CSPs) in 21 markets, build a more secure application environment, and reduce the total cost of private network connectivity to CSPs for its customers.

“Beeks Financial Cloud has continued to grow rapidly on Equinix’s interconnection platform, with Hong Kong being our eighth addition. Data centers underpin our business and we are confident that Equinix’s Cloud Exchange will enable the speed, resilience and reduced latency our customers have come to expect from our company. Equinix’s global footprint of interconnected data centers has allowed our business to really thrive,” said Gordon McArthur, CEO, Beeks Financial Cloud.

Today, banks, brokers, forex companies, and professional traders are increasingly relying on high-speed, secure and low-latency connections for more efficient business transactions, as demand for data centers and colocation services in the cloud, enterprise and financial services sector continues to grow. According to a July 2016 report by Gartner – Colocation-Based Interconnection Will Serve as the ‘Glue’ for Advanced Digital Business Applications – digital business is “enabled and enhanced through high-speed, secure, low-latency communication among enterprise assets, cloud resources, and an ecosystem of service providers and peers. Architects and IT leaders must consider carrier-neutral data center interconnection as a digital business enabler.”

Beeks Financial Cloud, a UK-based company, first deployed in an Equinix London data center four years ago on a single server rack; it now has approximately 80 interconnections within Equinix across eight data centers situated in financial business hubs around the world. These direct connections provide increased performance and security between Beeks and its customers and partners across its digital supply chain. Beeks was the first provider in the world to use cross connects to ensure a retail trader customer had a direct connection to their broker.

Beeks’ new deployment in Equinix’s Cloud Exchange provides the necessary digital infrastructure and access to a mature financial services business ecosystem to connect with major financial services providers in key markets around the globe via the cloud. Equinix’s global data centers are home to 1,000+ financial services companies and the world’s largest multi-asset class electronic trading ecosystem— interconnected execution venues and trading platforms, market data vendors, service providers, and buy-side and sell-side firms.

Equinix’s Cloud Exchange offers software-defined direct connections to multiple CSPs including Amazon Web Services (AWS), Google Cloud Platform, Microsoft Azure ExpressRoute and Office 365, IBM Softlayer, Oracle Cloud, and others. This has allowed Beeks to scale up rapidly while securely connecting to multiple cloud providers.

Beeks Financial Cloud has continued to expand its business on Equinix’s global interconnection platform of 146 International Business Exchanges™ (IBX®) in 40 markets across the globe. Beeks is currently deployed in Equinix’s International Business Exchanges™ (IBX®) in London, New York, Frankfurt, Tokyo, Chicago, and most recently, Hong Kong.

The move to Equinix’s Cloud Exchange is expected to help Beeks Financial Cloud save approximately £1M over the next three years, while enabling it to meet the needs of a global customer base that thrives and grows through forex trading.

London is a key player in the global digital economy, with the fifth largest GDP by metropolitan area in the world. Equinix’s flagship London data center based in Slough (LD6) is one of the fastest-growing in the UK and has been established as a hub for businesses to interconnect in a secure colocation environment.

 

Source: CloudStrategyMag

Cloud Technology Partners Achieves AWS IoT Competency

Cloud Technology Partners (CTP) has announced that it has achieved the AWS IoT Competency designation from Amazon Web Services (AWS). CTP is one of a select number of AWS Consulting Partners to earn this competency, highlighting the value of its offerings that help clients build IoT solutions for a variety of use cases such as intelligent factories, smart cities, autonomous vehicles, precision agriculture, and personalized health care.

Achieving the AWS IoT Competency differentiates CTP as an AWS Partner Network (APN) member that has proven success delivering IoT solutions seamlessly on AWS. To receive the designation, APN Partners must demonstrate expertise in the AWS platform and undergo an assessment of the security, performance, and reliability of their solutions.

“CTP is proud to have been named a charter launch partner for the AWS IoT Competency,” said Scott Udell, vice president of IoT Solutions at Cloud Technology Partners. “Our team is dedicated to helping clients leverage the power of IoT and the agility of the AWS platform to achieve their business goals.”

AWS enables scalable, flexible, and cost-effective solutions for customers ranging from startups to global enterprises. To support the integration and deployment of these solutions, AWS established the IoT Partner Competency Program to help customers identify Consulting and Technology APN Partners with broad industry experience.

CTP recently completed an IoT engagement for RailPod, the leading manufacturer of railroad maintenance drones. CTP helped RailPod build a highly scalable IoT solution capable of ingesting massive quantities of real-time and batched data to ensure safer railroads.

“Cloud Technology Partners helped us build an enterprise-class IoT solution on AWS that enables RailPod to be a global leader in infrastructure information production to ensure safer railroads across the global railroad market,” said Brendan English, Founder and CEO of RailPod.

CTP’s IoT Practice and Digital Innovation teams are helping clients combine the power of the cloud with real-time knowledge gleaned from sensor data: saving millions in preventative maintenance on locomotives and railways, improving crop yields while saving money with intelligent irrigation, connecting doctors and patients through medical devices, and avoiding accidents with autonomous vehicles.

CTP is a Premier AWS Consulting Partner and has achieved a number of competencies with AWS. In addition to the AWS IoT Competency, CTP holds the AWS Migration Competency and the AWS DevOps Competency, and it is a member of the AWS Next-Generation Managed Services Program.

Source: CloudStrategyMag

Microsoft’s R tools bring data science to the masses

One of Microsoft’s more interesting recent acquisitions was Revolution Analytics, a company that built tools for working with big data problems using the open source statistical programming language R. Mixing an open source model with commercial tools, Revolution Analytics offered a range of tools supporting academic and personal use, alongside software that took advantage of massive amounts of data, including Hadoop. Under Microsoft’s stewardship, the now-renamed R Server has become a bridge between on-premises and cloud data.

Two years on, Microsoft has announced a set of major updates to its R tools. The R programming language has become an important part of its data strategy, with support in Azure and SQL Server—and, more important, in its Azure Machine Learning service, where it can be used to preprocess data before delivering it to a machine learning pipeline. It’s also one of Microsoft’s key cross-platform server products, with versions for both Red Hat Linux and Suse Linux.

R is everywhere in Microsoft’s ecosystem

Outside of Microsoft, the open source R has become a key tool for data science, with a lot of support in academic environments. (It currently ranks fifth in terms of all languages, according to the IEEE.) You don’t need to be a statistical expert to get started with R, because the Comprehensive R Archive Network (CRAN, a public library of R applications) now has more than 9,000 statistical modules and algorithms you can use with your data.

Microsoft’s vision for R is one that crosses the boundaries between desktop, on-premises servers, and the cloud. Locally, there’s a free R development client, as well as R support in Microsoft’s (paid) flagship Visual Studio development environment. On-premises, R Server runs on Windows and Linux, as well as inside SQL Server, giving you access to statistical analysis tools alongside your data. Local big data services based on Hadoop and Spark are also supported, while on Azure you can run R Server alongside Microsoft’s HDInsight services.

R is a tool for data scientists. Although the R language is relatively simple, you need a deep knowledge of statistical analytics to get the most from it. It’s been a long while since I took college-level statistics classes, so I found getting started with R complex because many of the underlying concepts require graduate-level understanding of complex statistical functions. The question isn’t so much whether you can write R code—it’s whether you can understand the results you’re getting.

That’s probably the biggest issue facing any organization that wants to work with big data: getting the skills needed to produce the analysis you want and, more important, to interpret the results you get. R certainly helps here, with built-in graphing tools that help you visualize key statistical measures.

Working with Microsoft R Server

The free Microsoft R Open can help your analytics team get up to speed with R before investing in any of the server products. It’s also a useful tool for quickly trying out new analytical algorithms and exploring the questions you want answered using your data. That approach works well as part of an overall analytics lifecycle, starting with data preparation, moving on to model development, and finally turning the model into tools that can be built into your business applications.

One interesting role for R is alongside GPU-based machine-learning tools. Here, R is employed to help train models before they’re used at scale. Microsoft is bundling its own machine learning algorithms with the latest R Server release, so you can test a model before uploading it to either a local big data instance or to the cloud. During a recent press event, Microsoft demonstrated this approach with astronomy images, training a machine-learning-based classifier on a local server with a library of galaxies before running the resulting model on cloud-hosted GPUs.

R is an extremely portable language, designed to work over discrete samples of data. That makes it very scalable and ideal for data-parallel problems. The same R model can be run on multiple servers, so it’s simple to quickly process large amounts of data. All you need to do is parcel out your data appropriately, then deliver it to your various R Server instances. Similarly, the same code can run on different implementations, so a model built and tested against local data sources can be deployed inside a SQL Server database and run against a Hadoop data lake.

R makes operational data models easy

Thus, R is very easy to operationalize. Your data science team can work on building the model you need, while your developers write applications and build infrastructures that can take advantage of their code. Once it’s ready, the model can be quickly deployed, and it can even be swapped out for improved models in the future without affecting the rest of the application. In the same manner, the same model can be used in different applications, working with the same data.

With a common model, your internal dashboards can show you the same answers as customer- and consumer-facing code. You can then use data to respond proactively—for example, providing delay and rebooking information to airline passengers when a model predicts weather delays. That model can be refined as you get more data, reducing the risks of false positives and false negatives.

Building R support into SQL Server makes a lot of sense. As Microsoft’s database platform becomes a bridge between on-premises data and the cloud, as well as between your systems of record and big data tools, having fine-grained analytics tools in your database is a no-brainer. A simple utility takes your R models and turns them into stored procedures, ready for use inside your SQL applications. Database developers can work with data analytics teams to implement those models, and they don’t need to learn any new skills to build them into their applications.

Microsoft is aware that not every enterprise needs or has the budget to employ data scientists. If you’re dealing with common analytics problems, like trying to predict customer churn or detecting fraud in an online store, you have the option of working with a range of predefined templates for SQL Server’s R Services that contain ready-to-use models. Available from Microsoft’s MSDN, they’re fully customizable in any R-compatible IDE, and you can deploy them with a PowerShell script.

Source: InfoWorld Big Data

Tech luminaries team up on $27M AI ethics fund

Artificial intelligence technology is becoming an increasingly large part of our daily lives. While those developments have led to cool new features, they’ve also presented a host of potential problems, like automation displacing human jobs, and algorithms providing biased results.

Now, a team of philanthropists and tech luminaries have put together a fund that’s aimed at bringing more humanity into the AI development process. It’s called the Ethics and Governance of Artificial Intelligence Fund, and it will focus on advancing AI in the public interest.

A fund such as this one is important as issues arise during AI development. The IEEE highlighted a host of potential issues with artificial intelligence systems in a recent report, and the fund appears aimed at backing solutions to several of those problems.

Its areas of focus include research into the best way to communicate the complexity of AI technology, how to design ethical intelligent systems, and ensuring that a range of constituencies is represented in the development of these new AI technologies.

The fund was kicked off with help from Omidyar Network, the investment firm created by eBay founder Pierre Omidyar; the John S. and James L. Knight Foundation; LinkedIn founder Reid Hoffman; The William and Flora Hewlett Foundation; and Jim Pallotta, founder of the Raptor Group.

“As a technologist, I’m impressed by the incredible speed at which artificial intelligence technologies are developing,” Omidyar said in a press release. “As a philanthropist and humanitarian, I’m eager to ensure that ethical considerations and the human impacts of these technologies are not overlooked.”

Hoffman, a former executive at PayPal, has shown quite the interest in developing AI in the public interest and has also provided backing to OpenAI, a research organization aimed at helping create AI that is as safe as possible.

The fund will work with educational institutions, including the Berkman Klein Center for Internet and Society at Harvard University and the MIT Media Lab. The fund has US $27 million to spend at this point, and more investors are expected to join in.

Source: InfoWorld Big Data

SolarWinds Recognized As Market Leader In Network Management Software

SolarWinds has announced that the company has been recognized as the global market share leader in Network Management Software by industry analyst firm International Data Corporation (IDC) in its latest Worldwide Semi-Annual Software Tracker. The tracker measures total market size and vendor shares based on each vendor’s software revenue, including license, maintenance, and subscription revenue.

“SolarWinds was founded on the premise that IT professionals desire IT management software that is more powerful, yet simpler to buy and much easier to use,” said Kevin B. Thompson, president and chief executive officer, SolarWinds. “IDC’s recognition of SolarWinds’ market share leadership validates that core value proposition inherent in all of our solutions, while also underscoring the incredible adoption rate we continue to see among customers in organizations of all sizes, in all parts of the world.”

According to the IDC Worldwide Semi-Annual Software Tracker 1H 2016 release, SolarWinds® leads the network management software market with more than a 20% share of total market revenue for the first half of 2016. Strong demand for its Network Performance Monitor and Network Traffic Analyzer products fueled 14.2% year-over-year revenue growth during the same period.

Source: CloudStrategyMag

'Transfer learning' jump-starts new AI projects

No statistical algorithm can be the master of all machine learning application domains. That’s because the domain knowledge encoded in that algorithm is specific to the analytical challenge for which it was constructed. If you try to apply that same algorithm to a data source that differs in some way, large or small, from the original domain’s training data, its predictive power may fall flat.

That said, a new application domain may have so much in common with prior applications that data scientists can’t be blamed for trying to reuse hard-won knowledge from prior models. This is a well-established but fast-evolving frontier of data science known as “transfer learning” (it also goes by other names, such as knowledge transfer, inductive transfer, and meta-learning).

Transfer learning refers to reuse of some or all of the training data, feature representations, neural-node layering, weights, training method, loss function, learning rate, and other properties of a prior model.
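As a rough illustration of that reuse, the following sketch freezes a pretrained network’s layers and weights and retrains only a new output layer. It assumes PyTorch and torchvision; the ResNet-18 model and the five-class target task are hypothetical choices for the example, not part of any specific project described here.

```python
# A minimal sketch of transfer learning: reuse a prior model's layers and weights,
# then train only a new output layer for the target task. PyTorch/torchvision assumed;
# the pretrained ResNet-18 and the five-class target task are illustrative choices.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)      # weights learned on the source domain (ImageNet)

for param in model.parameters():              # freeze the transferred feature representations
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 5)  # new head for the (hypothetical) 5-class target task

# Only the new layer's parameters are updated; everything else carries over from the prior model.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
# ...then train on the smaller target-domain data set as usual...
```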

Transfer learning is a supplement to, not a replacement for, other learning techniques that form the backbone of most data science practices. Typically, a data scientist relies on transfer learning to tap into statistical knowledge that was gained on prior projects through supervised, semi-supervised, unsupervised, or reinforcement learning.

For data scientists, there are several practical uses of transfer learning.

Modeling productivity acceleration

If data scientists can reuse prior work without the need to revise it extensively, transfer-learning techniques can greatly boost their productivity and accelerate time to insight on new modeling projects. In fact, many projects in machine learning and deep learning address solution domains for which there is ample prior work that can be reused to kick-start development and training of fresh neural networks.

It is also useful if there are close parallels or affinities between the source and target domains. For example, a natural-language processing algorithm that was built to classify English-language technical documents in one scientific discipline should, in theory, be readily adaptable to classifying Spanish-language documents in a related field. Likewise, deep learning knowledge that was gained from training a robot to navigate through a maze may also be partially applicable to helping it learn to make its way through a dynamic obstacle course.

Training-data stopgap

If a new application domain lacks sufficient amounts of labeled training data of high quality, transfer learning can help data scientists to craft machine learning models that leverage relevant training data from prior modeling projects. As noted in this excellent research paper, transfer learning is an essential capability to address machine learning projects in which prior training data can become easily outdated. This problem of training-data obsolescence often happens in dynamic problem domains, such as trying to gauge social sentiment or track patterns in sensor data.

An example, cited in the paper, is the difficulty of training the machine-learning models that drive Wi-Fi indoor localization, considering that the key data—signal strength—behind these models may vary widely over the time periods and devices used to collect the data. Transfer learning is also critical to the success of IoT deep learning applications that generate complex machine-generated information of such staggering volume, velocity, and variety that one would never be able to find enough expert human beings to label enough of it to kick-start training of new models.

Risk mitigation

If the underlying conditions of the phenomenon modeled have radically changed, thereby rendering prior training data sets or feature models inapplicable, transfer learning can help data scientists leverage useful subsets of training data and feature models from related domains. As discussed in this recent Harvard Business Review article, the data scientists who got the 2016 U.S. presidential election dead wrong could have benefited from statistical knowledge gained in postmortem studies of failed predictions from the U.K. Brexit fiasco.

Transfer learning can help data scientists mitigate the risks of machine-learning-driven predictions in any problem domain susceptible to highly improbable events. For example, cross-fertilization of statistical knowledge from meteorological models may be useful in predicting “perfect storms” of congestion in traffic management. Likewise, historical data on “black swans” in economics, such as stock-market crashes and severe depressions, may be useful in predicting catastrophic developments in politics and epidemiology.

Transfer learning isn’t only a productivity tool to assist data scientists with their next modeling challenge. It also stands at the forefront of the data science community’s efforts to invent “master learning algorithms” that automatically gain and apply fresh contextual knowledge through deep neural networks and other forms of AI.

Clearly, humanity is nowhere close to fashioning such a “superintelligence” — and some people, fearing a robot apocalypse or similar dystopia, hope we never do. But it’s not far-fetched to predict that, as data scientists encode more of the world’s practical knowledge in statistical models, these AI nuggets will be composed into machine intelligence of staggering sophistication.

Transfer learning will become a membrane through which this statistical knowledge infuses everything in our world.

Source: InfoWorld Big Data

Report: Enterprises Prefer Microsoft Azure, SMBs Favor Google Cloud Platform

A new survey by Clutch found that enterprises strongly prefer Microsoft Azure, while small- to medium-sized businesses (SMBs) gravitate toward Google Cloud Platform. The survey was conducted in order to gain more knowledge on the “Big Three” cloud providers: Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.

Nearly 40% of Azure users surveyed identified as enterprises. In comparison, only 25% identified as SMBs and 22% identified as startups/sole proprietorships. Conversely, 41% of GCP users surveyed identified as SMBs.

The trends among enterprises and SMBs reflect the strengths of each platform. “It goes back to the trust and familiarity issues,” said Nicholas Martin, principal applications development consultant at Cardinal Solutions, an IT solutions provider. “Windows Server and other Microsoft technologies are prevalent in the enterprise world. Azure provides the consistency required by developers and IT staff to tightly integrate with the tools that Microsoft-leaning organizations are familiar with.”

Meanwhile, Dave Hickman, vice president of global delivery at Menlo Technologies, an IT services company, said that “small businesses tend to lean more on pricing than security or toolsets.” Thus, GCP’s lower pricing can be more palatable for an SMB.

Clutch’s survey also investigated the primary reasons respondents selected one of the three providers. The largest percentage of users (21%) named “better selection of tools/features” as their top reason, while “familiarity with brand” and “stronger security” nearly tied for second place.

Experts emphasized how users will choose a provider based on the selection of tools or features it offers. “Infrastructure-as-a-service will reside mainly on AWS, cloud services will be on Microsoft’s side, while Google will dominate analytics,” said Brian Dearman, solutions architect at Mindsight, an IT infrastructure consulting firm. “Even though every platform offers each type of service, people will want the best.”

The survey included 85 AWS users, 86 GCP users and 76 Microsoft Azure users. While these totals do not reflect each platform’s market share, the nearly even number of respondents using each provider allowed Clutch to analyze opinions and behaviors more equally.

Based on the survey findings, Clutch recommends that companies consider the following:

  • If your business is an enterprise, requires Windows integration, or seeks a strong PaaS (platform-as-a-service) provider, consider Microsoft Azure.
  • For heavy emphasis on analytics or if you are an SMB with a limited budget, look into GCP.
  • If service longevity, IaaS (infrastructure-as-a-service) offerings, and a wide selection of tools are important to you, AWS may be your best option.

Source: CloudStrategyMag

Busted! 5 myths of digital transformation

“Digital” is the new “cloud.” Once upon a time, these words meant something. Now they mean whatever a speaker wants them to mean — especially if, internally or externally, they’re trying to sell you something. Not surprisingly, this level of ambiguity has created a fertile environment for mythical thinking.

Behind all the blather and marketing mayhem, digital this and digital that can provide serious opportunities for companies whose leaders can see through the haziness.

And it creates serious challenges for CIOs — challenges that clear-eyed CIOs can prepare for and overcome, but that will bulldoze the unwary ones who see it as more of the same old same-old.

With this in mind, here are five common myths you’ve probably encountered when reading about digital transformation, along with the nonmythical issues and opportunities behind them.

Source: InfoWorld Big Data