The AI overlords have already won

AI and its many subsets, including machine learning and bots, have been incredibly hyped of late, claiming to revolutionize the way humans interact with machines. InfoWorld, for example, has reviewed the machine learning APIs offered by the major clouds. Everyone wonders who will be the big winner in this new world.

Bad news for those who like drama: The war may already be over. If AI is only as good as its algorithms — and more importantly, the data fed to them — who can hope to compete with Amazon, Apple, Facebook, Google, and Microsoft, all of which continually feast on the data we happily give them every day?

All your bots are belong to us

Former Evernote CEO and current venture capitalist Phil Libin has suggested that bots today are on par with browsers 20 years ago: controlled by basic command lines through minimalist interfaces. (“Alexa, what is the weather today?”) Bots, however, promise to be far richer than browsers, with fewer limits on how we inject data into the systems and better ways to pull data-rich experiences back out — that is, if we can train them with enough quality data.

This isn’t a problem for the fortunate few: Amazon, Apple, Facebook, Google, Microsoft, and a handful of others are swimming in data. In exchange for free services like email or Siri, we gladly give mountains of data to these companies. In so doing, we may be unwittingly building out competitive differentiation for them that could last for a long, long time.

Who, for example, can hope to compete with Google’s sense of location, given its mapping service, which relies on heavily structured data that we feed it every time we ask for directions? Or how about Facebook, which understands unstructured interactions between people better than anyone else?

All trends point to this getting worse (or better, depending on how much you trust concentrations of power). Take your smartphone. Originally we exulted in the sea of apps available in the various app stores, unlocking a cornucopia of services. A few years into the app revolution, however, the vast majority of the apps that consume up to 90 percent of our mobile hours are owned by a handful of companies, predominantly Facebook and Google. All that data we generate on our devices? Owned by very few companies.

On the back end, these same companies dominate, making up the “megacloud” elite. Tim O’Reilly first pointed out this trend, arguing that megaclouds like Facebook and Microsoft would be difficult to beat because of the economies of scale that make them stronger even as they grow bigger. While he was talking about infrastructure, the same principle applies to data. The rich get richer.

In the case of AI bot interfaces, the data-rich may end up as the only ones capable of delivering experiences that consumers find credible and useful.

CompuServe 3.0

If this seems bleak, it’s because it is. It’s hard to see how any upstart challenger can hope to wrest control of consumer data from megaclouds with their processing power, data science smarts, and treasure troves of user info. The one ray of light, perhaps, is if someone can introduce a superior “curation layer.”

For example, today I might conversationally ask Apple’s Siri or Amazon’s Alexa to point out nearby sushi restaurants. Both are able to tap into a places-of-interest database and spit out an acceptable response. However, what if I really want not merely nearby sushi restaurants, but nearby sushi restaurants recommended by someone whose food preferences I trust?

Facebook appears to be in pole position to use its knowledge of my human interactions to give the best answer, but it actually doesn’t. Just because I’m friends with someone on Facebook doesn’t mean I care about their preferred restaurants. I almost certainly will never have expressed my belief in digital text that their taste in food is terrible. (I don’t want to be rude, after all.) Thus, the field is open to figure out which sources I do trust, then curate accordingly.

This is partly a matter of data, but ultimately it’s a matter of superior algorithms coupled with better interpretation of signals that inform those algorithms. Yes, Google or Facebook might be first to develop such algorithms and interpret the signals, but in the area of data curation there’s still room for hope that new entrants can win.
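What might such a curation layer look like? A minimal sketch follows, assuming a hypothetical per-source trust score (all names, scores, and the threshold below are illustrative, not real data or any vendor's API): filter a list of restaurant recommendations down to sources whose taste you actually trust, then rank by that trust.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    restaurant: str
    source: str  # the person who recommended it

# Hypothetical trust scores, presumably learned from past signals
# (which recommendations you acted on, rated well, and so on).
trust = {"alice": 0.9, "bob": 0.2, "carol": 0.7}

def curate(recs, trust, threshold=0.5):
    """Keep only recommendations from trusted sources,
    ranked by trust score, highest first."""
    kept = [r for r in recs if trust.get(r.source, 0.0) >= threshold]
    return sorted(kept, key=lambda r: trust[r.source], reverse=True)

recs = [
    Recommendation("Sushi Ko", "alice"),
    Recommendation("Roll House", "bob"),
    Recommendation("Nigiri Bar", "carol"),
]

for r in curate(recs, trust):
    print(r.restaurant, "via", r.source)
```

The hard part, of course, is not this filter but the trust scores themselves, which is exactly where superior signal interpretation would differentiate a new entrant.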

Otherwise, all our data belongs to these megaclouds for the next 10 years, as it has for the last 10 years. And they’re using it to get smarter all the time.

Source: InfoWorld Big Data

Big data security is a big mess

Given the pace at which big data software is released, coupled with the sheer volume of data under management, the big data market is ripe for massive security breaches. It’s only a matter of time.

In fact, as a Gartner survey last year uncovered, very few companies have taken security seriously for essential infrastructure like Hadoop. At that time, a mere 2 percent of respondents cited Hadoop security as a significant concern, causing Gartner analyst Merv Adrian to exclaim, “The nearly non-existent response to the security issue is shocking.”

CIOs, in other words, may be willing to close their eyes and pray for big data security, but until they make it a priority, such “prayers” are vain.

What, me worry?

For years enterprises have taken a somewhat blasé approach to security in big data infrastructure such as Hadoop, despite the size of big data leading to “origins [that] are not consistently monitored and tracked.” In early 2014, Adrian, noting a lack of interest in Hadoop security, queried, “Can it be that people believe Hadoop is secure? Because it certainly is not. At every layer of the stack, vulnerabilities exist, and at the level of the data itself there are numerous concerns.”
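Hadoop's own defaults illustrate the point: out of the box, authentication is set to “simple,” which trusts whatever username the client claims. Securing a cluster requires explicitly switching core-site.xml to Kerberos, a step many deployments skip (the fragment below shows only the two core properties; a full setup also needs a working Kerberos KDC and per-service principals):

```xml
<!-- core-site.xml: the default value of hadoop.security.authentication
     is "simple", i.e., no real authentication at all. -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```

That the secure path is opt-in, and operationally painful, goes a long way toward explaining the survey numbers above.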

The Apache Foundation's incredible rise

The Apache Software Foundation recently released its 28-page annual report for its 2015-2016 year, but here’s the TL;DR in one word: amazing.

What started as a simple HTTP server supported by a handful of developers in 1995 has become an army of 3,425 ASF committers and 5,922 Apache code contributors building 291 top-level projects.

Of course, during this same time, open source in general has grown exponentially. But the ASF has seen particularly impressive growth as it propels big data forward with dozens of popular projects, along with dev tools and more general fare. The reason, as board member Jim Jagielski explained in an interview, is the ASF’s emphasis on neutral, community-focused development.

Not bad for an organization that costs less than $1 million to run each year — especially compared to other open source foundations that put the needs of corporate interests above those of the developer community.

Businesses harbor big data desires, but lack know-how

Big data has never been bigger, nor more of a crapshoot. At least, that’s the sense one gets from a new survey revealing that 76 percent of all enterprises are looking to maintain or increase their investments in big data over the next few years. This despite a mere 23.5 percent admitting to having a clear big data strategy.

That wouldn’t be so bad if things were getting better, but they’re not. Three years ago 64 percent of enterprises told Gartner that they were hopped up on the big data opportunity. But then, as now, the vast majority of big data acolytes didn’t have a clue as to how to get value from their data.

Despite our best attempts to capture signal from all the big data noise, in other words, we’re mostly flying blind.

Bigger and bigger!

The consultancy DNV GL Business Assurance, in partnership with research institute GFK Eurisko, polled 1,189 enterprises across the globe to better understand their big data plans. A majority of these companies — 52 percent — see big data as a big opportunity. That number climbs to 70 percent among large companies (over 1,000 employees) and reaches 96 percent among those the report’s authors categorize as Leaders.

Dear Silicon Valley: Stop saying stupid stuff

“Disruption” isn’t the same as “stupid,” but they sometimes sound similar. At least, they do when uttered by a certain strain of Silicon Valley entrepreneur.

This thought struck me while listening to a Valley exec at an enterprise software conference. He stumbled through PowerPoint (“How do you people use this app? I’m a Keynote guy”), agonized over how he could “possibly get used to Exchange after running his startup on Gmail” (his company had recently been acquired by a large software vendor), and generally made it clear that he had no idea how real companies work.

He lives in a bubble that has drones delivering tacos to those not already subsisting on Soylent. He wants to change enterprise computing, but he clearly has no appreciation for the challenges facing enterprises mired in decades of technical debt.

He is, in other words, either the worst or best person to change the world. (My vote: worst.)

HBase: The database big data left behind

A few years ago, HBase looked set to become one of the dominant databases in big data. As the primary database pairing for Hadoop, HBase saw adoption skyrocket, but that adoption has since plateaued, especially compared to NoSQL peers MongoDB, Cassandra, and Redis, as measured by general database popularity.

The question is why.

That is, why has HBase failed to match the popularity of Hadoop, given its pole position with the popular big data platform?

The answer today may be the same offered here on InfoWorld in 2014: It’s too hard. Though I and others expected HBase to rival MongoDB and Cassandra, its narrow utility and inherent complexity have hobbled its popularity and allowed other databases to claim the big data crown.