Data is eating the software that is eating the world

No one doubts that software engineering shapes every last facet of our 21st century existence. Given his vested interest in companies whose fortunes were built on software engineering, it was no surprise when Marc Andreessen declared that “software is eating the world.”

But what does that actually mean, and, just as important, does it still apply, if it ever did? These questions came to me recently when I reread Andreessen’s op-ed piece and noticed that he equated “software” with “programming.” Just as significant, he equated “eating” with industry takeovers by “Silicon Valley-style entrepreneurial technology companies” and then rattled through the usual honor roll of Amazon, Netflix, Apple, Google, and the like. What they, and others cited by Andreessen, have in common is that they built global-scale business models on the backs of programmers who bang out the code that drives web, mobile, social, cloud, and other 24/7 online channels.

Since the piece was published in the Wall Street Journal in 2011, we’ve had more than a half-decade to see whether Andreessen’s epic statement of Silicon Valley triumphalism proved either prescient or, perhaps, merely self-serving and misguided. I’d say it comes down more on the prescient end of the spectrum, due to the fact that most (but not all) of the success stories he cited have continued their momentum in growth, profitability, acquisitions, innovation, and so forth. People from programming backgrounds – such as Mark Zuckerberg – are indeed the multibillionaire rockstars of this new business era. In this way, Andreessen has so far been spared the fate of Tom Peters, who saw many of the exemplars he cited in his 1982 bestseller “In Search of Excellence” go on to be deconstructed by business rivals or blindsided by trends they didn’t see coming.

Rise of the learning machines

However, it has become clear to everyone, especially the old-school disruptors cited by Andreessen, that “software,” as it’s normally understood, is not the secret to future success. Going forward, the agent of disruption will be the data-driven ML (machine learning) algorithms that power AI. In this new era, more of the logic that powers intelligent applications won’t be explicitly programmed. The days of predominantly declarative, deterministic, and rules-based application development are fast drawing to a close. Instead, the probabilistic logic at the heart of chatbots, recommendation engines, self-driving vehicles, and other AI-powered applications is being harvested directly from source data.

The “next best action” logic permeating our lives is evolving inside of applications through continuous inference from data originating in the Internet of Things and other production applications. Consequently, there will be a diminishing need for programmers, in the traditional sense of people who build hard-and-fast application logic. In their place, the demand for a new breed of developer – the data scientist – will continue to grow. This term refers to the wide range of specialists who craft, train, and manage the regression models, neural networks, support vector machines, unsupervised learning models, and other ML algorithms upon which AI-centric apps depend.

To compound the marginalization of programmers in this new era, we’re likely to see more ML-driven code generation along the lines that I discussed in this recent post. Amazon, Google, Facebook, Microsoft, and other software-based powerhouses have made huge investments in data science, hoping to buoy their fortunes in the post-programming era. They all have amassed growing sets of training data from their ongoing operations. For these reasons, the “Silicon Valley-style” monoliths are confident that they have the resources needed to build, tune, and optimize increasingly innovative AI/ML-based algorithms for every conceivable application.

However, any strategic advantages that these giants gain from these AI/ML assets may be short-lived. Just as data-driven approaches are eroding the foundations of traditional programming, they’re also beginning to nibble at the edges of what highly skilled data scientists do for a living. These trends are even starting to chip away at the economies of scale available to large software companies with deep pockets.

AI and the Goliaths

We’re moving into an era in which anyone can tap into cloud-based resources to cheaply automate the development, deployment, and optimization of innovative AI/ML apps. In a “snake eating its own tail” phenomenon, ML-driven approaches will increasingly automate the creation and optimization of ML models, per my discussion here. And, from what we’re seeing in research initiatives such as Stanford’s Snorkel project, ML will also play a growing role in automating the acquisition and labeling of ML training data. What that means is that, in addition to abundant open-source algorithms, models, code, and data, the next-generation developer will also be able to generate ersatz but good-enough labeled training data on the fly to tune new apps for their intended purposes.

As the availability of low-cost generative training data grows, the established software companies’ massive data lakes, in which their developers maintain petabytes of authentic from-the-source training data, may become more of an overhead burden than a strategic asset. Likewise, the need to manage the complex data-preparation logic for use of this source data may become a bottleneck that impedes the ability of developers to rapidly build, train, and deploy new AI apps.

When any developer can routinely make AI apps just as accurate as Google’s or Facebook’s – but with far less expertise, budget, and training data than the big shots – a new era will have dawned. When we reach that tipping point, the next generation of data-science-powered disruptors will start to eat away at yesteryear’s software startups.

Source: InfoWorld Big Data