Who took the 'no' out of NoSQL?

November 21, 2016 by Matt Asay Posted in Industry Insights & News

For years we’ve seen the database market split between the traditional relational database and new-school NoSQL databases. According to Gartner, however, these two worlds are heading toward further consolidation. As Gartner analyst Nick Heudecker opines, “Each week brings more SQL into the NoSQL market subsegment. The NoSQL term is less and less useful as a categorization.”

Yet that promised “consolidation” may not be all that Gartner predicts. If anything, we may be seeing NoSQL databases—rich in flexibility, horizontal scalability, and high performance—don enough of the RDBMS’s SQL clothing to ultimately displace the incumbents. But the “NoSQL vendor” most likely to dominate over the long term may surprise you.

NoSQL: Wrong name, right idea

“NoSQL” has always been somewhat of a misnomer, both because it purports to exclude SQL and because it lumps together very different databases under a common framework. A graph database like Neo4j, for example, is completely different from a columnar database like Cassandra.

What they share, however, is a three-fold focus, as Kelly Stirman, CMO at a stealth analytics startup and former MongoDB executive, told me in an interview. In his words, “NoSQL introduced three key innovations that the market has embraced and that the traditional vendors are working to add: 1) flexible data model, 2) distributed architecture (critical for cloud), and 3) flexible consistency models (critical for performance).”

Each element was critical to enabling modern, increasingly cloud-based applications, and each has presented traditional RDBMSes with a host of problems. Yes, most RDBMSes have implemented good enough but not great flexible data models. Yes, they’re also attempting flexible consistency models, with varying levels of (non)success. And, yes, they’re all trying to embrace a distributed architecture and finding it a brutally tough slog.

Even so, these attempts by the RDBMSes to become more NoSQL-like have led, in the words of DataStax chief evangelist Patrick McFadin in a conversation, to a “great convergence” that ultimately yields “multimodel” databases. Importantly, McFadin continued, this same convergence is taking place among the NoSQL databases as they add various components of the RDBMS in an attempt to hit massive mainstream adoption.

But make no mistake, such convergence is not without its problems.

Convergence interrupted

As Rohit Jain, CTO at Esgyn, describes it:

It is difficult enough for a query engine to support single operational, BI, or analytical workloads (as evidenced by the fact that there are different proprietary platforms supporting each). But for a query engine to serve all those workloads means it must support a wider variety of requirements than has been possible in the past. So, we are traversing new ground, one that is full of obstacles.

This inability to have one data model rule them all afflicts the RDBMS more than NoSQL, Mat Keep, director of product and market analysis at MongoDB, told me: “Relational databases have been trying to keep up with the times as well. But most of the changes they’ve made have been stopgaps–adding new data types rather than addressing the core inflexibility of the relational data model, for example.”

Meanwhile, he notes, “Our customers have a desire to stop managing many special snowflakes and converge on a single, integrated platform that provides all the new capabilities they want with the reliability and full features that they need.” DataStax has been doing the same with Cassandra, as both companies expand their NoSQL footprints with support for the likes of graph databases, but also going deeper on SQL with connectors that allow SQL queries to be translated into a language that document and columnar databases can understand.
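To make the idea of a SQL-to-NoSQL connector concrete, here is a toy sketch of translating a very small subset of SQL into a MongoDB-style filter and projection. It is not how any vendor's connector actually works; the translate function, the regular expression, and the sample table and column names are all hypothetical, and only a single WHERE comparison is supported. The operator names ($eq, $gt, and so on) are the standard MongoDB query operators.

```python
import re

# Map SQL comparison operators to MongoDB-style query operators.
OPERATORS = {"=": "$eq", "!=": "$ne", ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte"}

# Hypothetical grammar: SELECT cols FROM table [WHERE field op value]
PATTERN = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<field>\w+)\s*(?P<op>>=|<=|!=|=|>|<)\s*(?P<value>.+?))?\s*;?\s*$",
    re.IGNORECASE,
)


def parse_value(text):
    """Turn a SQL literal into a Python value (quoted string, int, or float)."""
    text = text.strip()
    if len(text) >= 2 and text.startswith("'") and text.endswith("'"):
        return text[1:-1]
    try:
        return int(text)
    except ValueError:
        return float(text)


def translate(sql):
    """Translate a tiny SQL subset into a document-store style query description."""
    match = PATTERN.match(sql)
    if match is None:
        raise ValueError("unsupported query: " + sql)
    cols = [c.strip() for c in match.group("cols").split(",")]
    # An empty projection means "return every field", mirroring SELECT *.
    projection = {} if cols == ["*"] else {c: 1 for c in cols}
    query_filter = {}
    if match.group("field"):
        operator = OPERATORS[match.group("op")]
        query_filter = {match.group("field"): {operator: parse_value(match.group("value"))}}
    return {"collection": match.group("table"), "filter": query_filter, "projection": projection}


result = translate("SELECT name, city FROM users WHERE age >= 30")
```

A real connector also has to handle joins, aggregation, nested documents, and type mapping, which is where most of the difficulty lies.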

None of these efforts really speaks to NoSQL’s long-term advantage over the venerable RDBMS. Everybody wants to speak SQL because that’s where the primary body of skills resides, given decades of enterprise build-up around SQL queries. But the biggest benefit of NoSQL, and the one that RDBMSes have failed to master, according to Stirman, is its distributed architecture.

Jared Rosoff, chief technologist of Cloud Native Apps at VMware, underlines this point: “Even if all the databases converged on SQL as query language, the NoSQL crowd benefits from a fundamentally distributed architecture that is hard for legacy engines to replace.” He continues, “How long is it going to [take to] get MySQL or Postgres or Oracle or SQL Server to support a 100-node distributed cluster?”

Though both the RDBMS and NoSQL camps have their challenges with convergence, “It’s way easier for the NoSQL crowd to become more SQL-like than it is for the SQL crowd to become more distributed” and “a fully SQL compliant database that doesn’t scale that well” will be inferior to “a fully distributed database that supports only some subset of SQL.”

In short, SQL is very useful but replaceable. Distributed computing in our big data world, quite frankly, is not.

Winner take some

In this world of imperfect convergence, NoSQL seems to have the winning hand. But which NoSQL vendor will ultimately dominate?

Early momentum goes to MongoDB and DataStax-fueled Cassandra, but Stirman suggests a different winner entirely:

What the market really wants is an open source database that is easy to use and flexible like MongoDB, scales like Cassandra, is battle hardened like Oracle, all without changing their security and tooling. MongoDB is best positioned to deliver this, but AWS is most likely to capture the market long term.

Yes, AWS, the same company that most threatens to own the Hadoop market, not to mention enterprise infrastructure generally. Amazon, the dominant force in the public cloud, is best positioned to capitalize on the enterprise shift toward the cloud and the distributed applications that live there. Database convergence, in sum, may ultimately be Bezos’ game to lose.

Source: InfoWorld Big Data

AeroVironment's Quantix drone is all about the data

November 18, 2016 by Magdalena Petrova Posted in Industry Insights & News

In the age of technology, businesses are all chasing efficiency. That’s exactly what AeroVironment promises to deliver with its new Quantix drone.

The technology, a combination of a drone and cloud-based analysis service, can be useful for farmers, says Steve Gitlin, vice president of corporate strategy at AeroVironment.

“In many cases, farmers rely on themselves or their people to walk the fields, and if they’re managing large fields in excess of 100 acres or so, then it’s very difficult to walk the entire field in any given unit of time. So they have to rely on their deep experience and sampling.”

Equipped with RGB and multispectral cameras, Quantix is capable of covering 400 acres of land during a single flight, all the while collecting high-resolution images. The data can be instantly analyzed on the included tablet, which is also used to launch and land the drone with the click of a button.

For a deeper analysis, customers can log into AeroVironment’s cloud service called Decision Support System (DSS), which is compatible with many of the company’s other unmanned systems.

Quantix takes off and lands vertically, making it easy to operate, but transitions to horizontal flight in the air, which gives it a longer range. In the United States, the Federal Aviation Administration still requires that drones fly in operators’ line of sight, but if the regulations are loosened, Quantix could be useful for pipeline, road, and power line inspections, Gitlin says, because the drone can cover 40 linear miles in less than an hour.

Quantix will be available in the spring of 2017. A price has not yet been announced.

Source: InfoWorld Big Data

10 things you need to worry about in 2017

November 17, 2016 by Andrew C Oliver Posted in Industry Insights & News

Each year, including last year, I’ve supplied you with “areas of concern”—that is, stuff that might not go well for you or our comrades in the coming 12 months. I’m happy to oblige once again this year with 10 items that may go bump in the night.

Hadoop distributions

Big data, analytics, and machine learning are alive and well, and they’ll eventually transform business in most of the ways they’ve promised. But the big, fat Hadoop distribution is probably toast.

This isn’t to say everyone involved is in trouble, but we’re looking at more of an à la carte situation, or at least a buffet, where you don’t have to swallow the whole elephant. Burned by projects that never completed or met their promise in previous years, companies will be more reluctant to bite off the whole dish and instead look at what they’re trying to do and actually need at the infrastructure level. Technology companies that can adapt to this reality will make even more money.

Hadoop vendors

Three major Hadoop vendors along with big “do everything companies” (especially the Big Blue one) are in this game. We already saw Pivotal essentially exit. It’s hard to see the market continue to support three Hadoop vendors. See the above item to figure out who I’m betting on.

Oracle

Oracle likes to buy companies. It helps make up for the fact that the core Oracle database is old and clunky, and Oracle doesn’t make anything new or great. If it buys something you use, expect the price to go up. Oracle loves the long tail, particularly entrenched, hard-to-remove, older technology. Once it’s in the company’s clutches, you get that famed Oracle technical support, too.

Databricks

Something will change at Databricks, the cloud company built around Spark, the open source distributed computing framework that has essentially supplanted Hadoop. While Spark is great, the Databricks business model isn’t as compelling, and it seems easily disrupted by one of the big three cloud vendors. The company is run by academics, and it needs hard-knuckled business types to sort out its affairs. I hope the change won’t be too disruptive to Spark’s development—and can be accomplished without hurt feelings, so we don’t lose progress.

Deregulation

Now that we have the Trumpocalypse to look forward to, you can expect “deregulation” of everything, from unlimited poison in your groundwater to the death of Net neutrality. Lest you think that will boost the tech economy, note that software vendors make big money selling compliance solutions, fewer of which will be necessary. Also, the Affordable Care Act (Obamacare) and electronic medical/health records have been a boon for tech. Some of Obamacare may remain, but very likely the digital transformation of health will be scaled way back.

Clinton’s plans had their own problems, but regardless of where you stand politically, the Trump presidency will hit us where it hurts—especially after California secedes. (Or will there be six Californias?)

Game consoles

How is this related to enterprise software? Well, the game industry is a good chunk of the tech sector, and some giants depend on console games as blockbusters. Game consoles are specialized computers with very specific programming models and guaranteed upgrades. Everyone is doing “pro” versions to get shorter-term revenue grabs—instead of waiting, say, seven years to sell new consoles—which comes at the cost of a stable platform that game developers can depend on.

Meanwhile, mobile games are huge, Steam keeps rising, and people are playing computer games again. I suspect this will start to depress the console business. Game developers will struggle with how many platforms they need to keep up with, and some giants will stumble.

Yet another hacking scandal

Once again, tech, government, and business will fail to learn the lesson that security can’t be bought and deployed like a product. They will persist in hiring the cheapest developers they can find, flail at project management, and suffer nonexistent or hapless QA. If a program runs, then it has stmt.execute("select something from whatever where bla =" + sql_injection_opportunity) throughout the code. That’s in business—government is at least 20 years behind. Sure, we’re giving Putin a big hug, but don’t expect him to stop hacking us.
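The concatenation pattern quoted above can be contrasted with a parameterized query. What follows is a minimal sketch in Python using the standard-library sqlite3 module rather than the JDBC-style call in the text. The table and column names (whatever, something, bla) echo the article's placeholder query; the sample rows and the helper function names are invented for illustration.

```python
import sqlite3

# In-memory database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE whatever (something TEXT, bla TEXT)")
conn.executemany(
    "INSERT INTO whatever (something, bla) VALUES (?, ?)",
    [("public", "a"), ("secret", "b")],
)


def lookup_unsafe(connection, bla):
    """Vulnerable: user input is spliced directly into the SQL text."""
    query = "SELECT something FROM whatever WHERE bla = '" + bla + "'"
    return [row[0] for row in connection.execute(query)]


def lookup_safe(connection, bla):
    """Safer: the value is passed as a bound parameter, never parsed as SQL."""
    query = "SELECT something FROM whatever WHERE bla = ?"
    return [row[0] for row in connection.execute(query, (bla,))]


# A classic injection payload that turns the WHERE clause into a tautology.
payload = "a' OR '1'='1"
unsafe_rows = lookup_unsafe(conn, payload)  # every row leaks
safe_rows = lookup_safe(conn, payload)      # payload is treated as a plain string
```

With the concatenated query the payload rewrites the WHERE clause and returns every row; with the bound parameter it is compared as an ordinary string and matches nothing.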

The economy

It seems like the Great Recession was just yesterday, but we’re due for another. At the same time, we don’t have a lot of big, new enterprise tech to brag about. I’m not saying it’s time to climb in the lifeboat, but you might want to make sure you have a safety net in case we’re hit with another downturn. My guess is it will be smaller than the dot-bomb collapse, so don’t fret too much.

Telco-cable mergers

With Google dialing back Google Fiber and an impending AT&T-Time Warner merger, our overpriced connections to the internet are unlikely to get cheaper—and speed increases will probably be less frequent.

Your math skills

Thanks to machine learning, it will be harder to command a six-figure developer salary without a mathematical background. As companies figure out what machine learning is and what it can do, before paying a premium for talent, they’ll start to require that developers understand probability, linear algebra, multivariable calculus, and all that junk. For garden-variety programming, they’ll continue to accelerate their plan to buy talent in “low-cost countries.”

Now let’s crank it to 11: As you may have heard, we’ve elected a narcissistic agent of the white supremacist (now rebranded “alt-right”) movement who doesn’t even know how to use a computer, and we’ve put him in charge of the nukes. This is going to be a disaster for everyone, of course, but for tech in particular if we all survive. But hey, next week I’ll try looking on the bright side.

Source: InfoWorld Big Data

Hadoop, we hardly knew ye

November 16, 2016 by Matt Asay Posted in Industry Insights & News

It wasn’t long ago that Hadoop was destined to be the Next Big Thing, driving the big data movement into every enterprise. Now there are clear signs that we’ve reached “peak Hadoop,” as Ovum analyst Tony Baer styles it. But the clearest indicator of all may simply be that “Hadoop” doesn’t actually have any Hadoop left in it.

Or, as InfoWorld’s Andrew Oliver says it, “The biggest thing you need to know about Hadoop is that it isn’t Hadoop anymore.”

Nowhere is this more true than in newfangled cloud workloads, which eschew Hadoop for fancier options like Spark. Indeed, as with so much else in enterprise IT, the cloud killed Hadoop. Or perhaps Hadoop, by moving too fast, killed Hadoop. Let me explain.

Is Hadoop and the cloud a thing of the past?

The fall of Hadoop has not been total, to be sure. As Baer notes, Hadoop’s “data management capabilities are not yet being matched by Spark or other fit-for-purpose big data cloud services.” Furthermore, as Oliver describes, “Even when you’re not using Hadoop because you’re focused on in-memory, real-time analytics with Spark, you still may end up using pieces of Hadoop here and there.”

By and large, however, Hadoop is looking decidedly retro in these cloudy days. Even the Hadoop vendors seem to have moved on. Sure, Cloudera still tells the world that Cloudera Enterprise is “powered by Apache Hadoop.” But if you look at the components of its cloud architecture, it’s not Hadoop all the way down. IBM, for its part, still runs Hadoop under the hood of its BigInsights product line, but if you use its sexier new Watson Data Platform, Hadoop is missing in action.

The reason? Cloud, of course.

As such, Baer is spot on to argue, “The fact that IBM is creating a cloud-based big data collaboration hub is not necessarily a question of Spark vs. Hadoop, but cloud vs. Hadoop.” Hadoop still has brand relevance as a marketing buzzword that signifies “big data,” but its component parts (HDFS, MapReduce, and YARN) are largely cast aside for newer and speedier cloud-friendly alternatives as applications increasingly inhabit the cloud.

Change is constant, but should it be?

Which is exactly as it should be, argues Hadoop creator Doug Cutting. Though Cutting has pooh-poohed the notion that Hadoop has been replaced by Spark or has lost its relevance, he also recognizes the strength that comes from software evolution. Commenting on someone’s observation that Cloudera’s cloud stack no longer has any Hadoop components in it, Cutting tweeted: “Proof that an open source platform evolves and improves more rapidly. Entire stack replacement in a decade! Wonderful to see.”

It’s easy to overlook what a powerful statement this is. If Cutting were a typical enterprise software vendor, not only would he not embrace the implicit accusation that his Hadoop baby is ugly (requiring replacement), but also he’d do everything possible to lock customers into his product. Software vendors get away with selling Soviet-era technology all the time, even as the market sweeps past them. Customers locked into long-term contracts simply can’t or don’t want to move as quickly as the market does.

For an open source project like Hadoop, however, there is no inhibition to evolution. In fact, the opposite is true: Sometimes the biggest problem with open source is that it moves far too quickly for the market to digest.

We’ve seen this to some extent with Hadoop, ironically. A year and a half ago, Gartner called out Hadoop adoption as “fairly anemic,” despite its outsized media attention. Other big data infrastructure quickly marched past it, including Spark, MongoDB, Cassandra, Kafka, and more.

Yet there’s a concern buried in this technological progress. One of the causes of Hadoop’s market adoption anemia has been its complexity. Hadoop skills have always fallen well short of Hadoop demand. Such complexity is arguably exacerbated by the fast-paced evolution of the big data stack. Yes, some of the component parts (like Spark) are easier to use, but not if they must be combined with an ever-changing assortment of other component parts.

In this way, we might have been better off with a longer shelf life for Hadoop, as we’ve had with Linux. Yes, in Linux the modules are constantly changing. But there’s a system-level fidelity that has enabled a “Linux admin” to actually mean something over decades, whereas keeping up with the various big data projects is much more difficult. In short, rapid Hadoop evolution is both testament to its flexibility and cause for concern.

Source: InfoWorld Big Data

Review: Spark lights up machine learning

November 16, 2016 by Martin Heller Posted in Industry Insights & News

As I wrote in March of this year, the Databricks service is an excellent product for data scientists. It has a full assortment of ingestion, feature selection, model building, and evaluation functions, plus great integration with data sources and excellent scalability. The Databricks service provides a superset of Spark as a cloud service. Databricks the company was founded by the original developer of Spark, Matei Zaharia, and others from U.C. Berkeley’s AMPLab. Meanwhile, Databricks continues to be a major contributor to the Apache Spark project.

In this review, I’ll discuss Spark ML, the open source machine learning library for Spark. To be more accurate, Spark ML is the newer of two machine learning libraries for Spark. As of Spark 1.6, the DataFrame-based API in the Spark ML package was recommended over the RDD-based API in the Spark MLlib package for most functionality, but was incomplete. Now, as of Spark 2.0, Spark ML is primary and complete and Spark MLlib is in maintenance mode.

Source: InfoWorld Big Data

How IBM's Watson will change cybersecurity

November 15, 2016 by Fahmida Y Rashid Posted in Industry Insights & News

IBM captured our imaginations when it unveiled Watson, the artificial intelligence computer capable of playing—and winning—the “Jeopardy” game show. Since then, Big Blue has been introducing Watson’s analytics and learning capabilities across various industries, including health care and information security.

Cognitive security technology such as Watson for Cybersecurity can change how information security professionals defend against attacks by helping them digest vast amounts of data. IBM Security is currently in the middle of a year-long research project working with eight universities to help train Watson to tackle cybercrime. Watson has to learn the “language of cybersecurity” to understand what a threat is, what it does, and what indicators are related.

“Generally we learn by examples,” says Nasir Memon, professor of computer science and engineering at NYU Tandon School of Engineering. We get an algorithm and examples, and we learn when we are able to look at a problem and recognize it as similar to other incidents.

Information security is no stranger to machine learning. Many next-generation security defenses already incorporate machine learning, big data, and natural language processing. What’s different with cognitive computing is the fact that it can blend human-generated security knowledge with more traditional security data. Consider how much security knowledge passes through the human brain and comes out in the form of research documents, industry publications, analyst reports, and blogs.

Someone saw or read something and thought it was important enough to write a blog post or a paper about it, says Jeb Linton, the chief security architect of IBM Watson. Cognitive systems can recognize the rich contextual significance of that piece of knowledge and apply traditional machine-generated data to help analysts get a better understanding of what they are seeing.

“It’s about learning how to take human expertise [in the form of blog posts, articles] mostly in the form of language, and to use it as training data for machine learning algorithms,” Linton says.

Technology innovation has to actually address the challenges security professionals are currently facing, or it remains on the fringes as a cool but not practical option. Cognitive security has the potential to reduce incident response times, optimize accuracy of alerts, and stay current with threat research.

“We need to make sure these technologies are actually solving the problems that security professionals are facing, both today and in the future,” wrote Diana Kelley on IBM’s Security Intelligence.

According to recent statistics from the IBM Institute for Business Value, 40 percent of security professionals believe cognitive security will improve detection and incident response decision-making capabilities, and 37 percent believe cognitive security solutions will significantly improve incident response time. Another 36 percent of respondents think cognitive security will provide increased confidence to discriminate between innocuous events and true incidents. If security analysts were able to stay current on threats and increase accuracy of alerts, they could also reduce response time.

More than half (57 percent) of security leaders believed that cognitive security solutions can significantly slow the efforts of cybercriminals.

These are high expectations for Watson for Cybersecurity, and IBM is working with eight different universities to feed up to 15,000 new documents into Watson every month, including threat intelligence reports, cybercrime strategies, threat databases, and materials from its own X-Force research library. In an accompanying video, IBM’s Linton and NYU’s Memon talk about how machines learn and what the future of cognitive security technology looks like.

It’s easy to dismiss cognitive technology and its promises of dramatically changing how information security professionals defend themselves from attackers as more buzzwords. But interest from other fields is growing: Cognitive computing is slated to become a $47 billion industry by 2020, according to recent figures from IDC. While cognitive security is still in early stages, information security professionals see how the technology will help analysts make better and faster decisions using vast amounts of data.

Source: InfoWorld Big Data

Deep learning is already altering your reality

November 14, 2016 by James Kobielus Posted in Industry Insights & News

We now experience life through an algorithmic lens. Whether we realize it or not, machine learning algorithms shape how we behave, engage, interact, and transact with each other and with the world around us.

Deep learning is the next advance in machine learning. While machine learning has traditionally been applied to textual data, deep learning goes beyond that to find meaningful patterns within streaming media and other complex content types, including video, voice, music, images, and sensor data.

Deep learning enables your smartphone’s voice-activated virtual assistant to understand spoken intentions. It drives the computer vision, face recognition, voice recognition, and natural language processing features that we now take for granted on many mobile, cloud, and other online apps. And it enables computers—such as the growing legions of robots, drones, and self-driving vehicles—to recognize and respond intelligently and contextually to the environment patterns that any sentient creature instinctively adapts to from the moment it’s born.

But those analytic applications only scratch the surface of deep learning’s world-altering potential. The technology is far more than analytics that see deeply into environmental patterns. Increasingly, it’s also being used to mint, make, and design fresh patterns from scratch. As I discussed in this recent post, deep learning is driving the application logic being used to create new video, audio, image, text, and other objects. Check out this recent Medium article for a nice visual narrative of how deep learning is radically refabricating every aspect of human experience.

These are what I’ve referred to as the “constructive” applications of the technology, which involve using it to craft new patterns in new artifacts rather than simply introspecting historical data for pre-existing patterns. It’s also being used to revise, restore, and annotate found content and even physical objects so that they can be more useful for downstream uses.

You can’t help but be amazed by all this until you stop to think how it’s fundamentally altering the notion of “authenticity.” The purpose of deep learning’s analytic side is to identify the authentic patterns in real data. But if its constructive applications can fabricate experiences, cultural artifacts, the historical record, and even our bodies with astonishing verisimilitude, what is the practical difference between reality and illusion? At what point are we at risk of losing our awareness of the pre-algorithmic sources that should serve as the bedrock of all experience?

This is not a metaphysical meditation. Deep learning has advanced to the point where:

You can autocorrect images by generating and superimposing onto the original any visual elements that were missing, obscure, or misleading.
You can transform any rough doodle into an impressive drawing that seems to have been created by expert human artists who were depicting real-world models.
You can take hand-drawn sketches of human faces and algorithmically transform them into photorealistic images.
You can transform any low-resolution original image into a natural-looking high-resolution version.
You can instruct a computer to render any image so that it appears it was composed by a specific human artist in a specific style.
You can organically conjure from any image any patterns, figures, and other details that were not present in the source.
You can automatically generate captions, annotations, and other narratives from images and other source content so that it appears they were composed by authentic eyewitnesses or subject matter experts.
You can render any computer-generated voice into one that truly sounds like it was naturally produced in a human vocal tract.
You can rely on a computer to compose music that feels like it expresses some authentic feeling deep in the soul of an actual human musician.
You can fabricate highly functional physical objects, such as prosthetic limbs and organic molecules, from scratch through 3D printing, CRISPR, and other new technologies.

Clearly, the power to construct is also the power to reconstruct, and that’s tantamount to having the power to fabricate and misdirect. Though we needn’t sensationalize this, deep learning’s reconstructive potential can prove problematic in cognitive applications, given the potential for algorithmic biases to cloud decision support. If those algorithmic reconstructions skew environmental data too far from bedrock reality, the risks may be considerable for deep learning applications such as self-driving cars and prosthetic limbs upon which people’s very lives depend.

Though there’s no stopping the advance of deep learning into every aspect of our lives, we can in fact bring greater transparency into how those algorithms achieve their practical magic. As I discussed in this post, we should be instrumenting deep learning applications to facilitate identification of the specific algorithmic path (such as the end-to-end graph of source information, transformations, statistical models, metadata, and so on) that was used to construct a specific artifact or take a particular action in a particular circumstance.

Just as important, every seemingly realistic but algorithmically generated artifact that we encounter should have that fact flagged in some salient way so that we can take that into account as we’re interacting with it. Just as some people wish to know if they’re consuming genetically modified organisms, many might take interest in whether they’re engaging with algorithmically modified objects.

If we’re living in an algorithmic bubble, we should at the very least know how it’s bending and coloring whatever rays of light we’re able to glimpse through it.

Source: InfoWorld Big Data

Get started with TensorFlow

November 14, 2016 by Martin Heller Posted in Industry Insights & News

Machine learning couldn’t be hotter, with several heavy hitters offering platforms aimed at seasoned data scientists and newcomers interested in working with neural networks. Among the more popular options is TensorFlow, a machine learning library that Google open-sourced a year ago.

In my recent review of TensorFlow, I described the library and discussed its advantages, but only had about 300 words to devote to how to begin using Google’s “secret sauce” for machine learning. That isn’t enough to get you started.

In this article, I’ll give you a very quick gloss on machine learning, introduce you to the basics of TensorFlow, and walk you through a few TensorFlow models in the area of image classification. Then I’ll point you to additional resources for learning and using TensorFlow.

Source: InfoWorld Big Data

The best brains: AI systems that predicted Trump's win

November 11, 2016 by Caroline Craig Posted in Industry Insights & News

The shock of Donald Trump’s upset victory has begun to wear off. Now the search for answers begins. In particular: How in this age of big data collection and data-crunching analytics could so many polls, economic election models, and surveys, even those by top Republican pollsters, have been so wrong going into election day?

Some got it right—Geda, the mystic monkey from China, and Felix, a Russian polar bear, for starters. A survey of Halloween presidential candidate masks also predicted a Trump presidency, as did “The Simpsons” back in 2000. And there are a lot of Democratic strategists wishing they’d given more credence this past summer to Michael Moore’s analysis of the political landscape, especially in the Rust Belt.

Looking for signs of intelligence

For those who like their predictions brewed with a dash more data, an artificial intelligence system developed by Indian startup Genic.ai successfully predicted not only the Democratic and Republican primaries, but each presidential election since 2004. To come up with its predictions, the MogIA system uses 20 million data points from online platforms such as Google, YouTube, and Twitter to gauge voter engagement.

MogIA found that Trump’s online engagement was topping the numbers Barack Obama posted during the 2008 election by a margin of 25 percent, which is impressive even after factoring in the greater participation in social media today.

Sanjiv Rai, founder of Genic.ai, admits there are limitations to the data—MogIA can’t always analyze whether a post is positive or negative. Nonetheless, it has been right in predicting that the candidate with the most engagement online wins.

“If you look at the primaries, in the primaries, there were immense amounts of negative conversations that happen with regard to Trump. However, when these conversations started picking up pace, in the final days, it meant a huge game opening for Trump and he won the primaries with a good margin,” Rai told CNBC.

Artificial intelligence has advantages over more traditional data analysis programs. “While most algorithms suffer from programmers/developer’s biases, MogIA aims at learning from her environment, developing her own rules at the policy layer, and developing expert systems without discarding any data,” Rai said. His system could also be improved by more granular data, he told CNBC, for instance if Google gave MogIA access to the unique internet addresses assigned to each digital device.

“If someone was searching for a YouTube video on how to vote, then looked for a video on how to vote for Trump, this could give the AI a good idea of the voter’s intention,” CNBC wrote. Given the amount of data available online, using social media to predict election results is likely to become increasingly popular.

Still not convinced and wanting to blame James Comey for Clinton’s loss? MogIA predicted a Trump victory before the FBI announced it was examining new Clinton emails.

Answer me this

There are also less data-intensive ways of making accurate predictions. American University professor Allan Lichtman doesn’t rely on social media, poll results, or demographics to predict elections, but he has an even better track record than MogIA: Lichtman has correctly predicted every presidential election since 1984.

Using earthquake prediction methods that gauge stability vs. upheaval, Lichtman says he developed a set of 13 true/false statements that predict elections based on the performance of the party currently in the White House.

“There’s a real theory behind this. And the theory is presidential elections don’t work the way we think they do,” Lichtman told CBS News. “They’re not decided by the turns of the campaigns, the speeches, the debates, the fundraising. Rather, presidential elections are fundamentally referenda on the performance of the party holding the White House. If that performance is good enough, they get four more years. If it’s not, they’re turned out and the challenging party wins.”

Lichtman says his 13 keys (explained in more depth by the Washington Post) are a historically based system founded on the study of every presidential election from 1860 to 1980. His keys are simply ways of “mathematically and specifically” measuring the incumbent party’s performance based on the following factors:

Party mandate
Contest
Incumbency
Third party
Short-term economy
Long-term economy
Policy change
Social unrest
Scandal
Foreign/military success
Foreign/military failure
Incumbent charisma
Challenger charisma

If six or more of his statements are false, Lichtman says, the incumbent party loses the presidency.
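The counting rule is simple enough to sketch in a few lines of Python. This is only an illustration of the tally described above, not Lichtman’s own tooling: the function name, the dictionary layout, and the sample answers are hypothetical, and the sketch assumes the threshold means six or more false statements.

```python
# The 13 keys as listed above. Each answer is True when the statement
# behind that key holds for the party currently in the White House.
KEYS = [
    "Party mandate", "Contest", "Incumbency", "Third party",
    "Short-term economy", "Long-term economy", "Policy change",
    "Social unrest", "Scandal", "Foreign/military success",
    "Foreign/military failure", "Incumbent charisma", "Challenger charisma",
]

# Assumption: "six" is read as "six or more" false statements.
FALSE_KEYS_TO_LOSE = 6


def incumbent_party_loses(answers):
    """Return True if enough keys are false for the incumbent party to lose."""
    missing = [key for key in KEYS if key not in answers]
    if missing:
        raise ValueError(f"missing answers for: {missing}")
    false_count = sum(1 for key in KEYS if not answers[key])
    return false_count >= FALSE_KEYS_TO_LOSE


# Hypothetical answers for illustration only, not Lichtman's actual calls:
# the first six keys are marked false, the remaining seven true.
example = {key: True for key in KEYS}
for key in KEYS[:6]:
    example[key] = False

print(incumbent_party_loses(example))
```

The hard part of the method is deciding whether each statement is true or false, which is a matter of judgment; the sketch covers only the final count.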

“Donald Trump’s severe and unprecedented problems bragging about sexual assault and then having 10 or more women coming out and saying, ‘Yes, that’s exactly what you did’—this is without precedent,” Lichtman pointed out in an interview with the Washington Post. “But it didn’t change a key. By the narrowest of possible margins, the keys still point to a Trump victory.”

Here’s predicting that MogIA and Lichtman will be closely watched in the next election—in addition to Geda and Felix, of course.

Source: InfoWorld Big Data

Could Google or Facebook decide an election?

November 10, 2016 by Andrew C Oliver Posted in Industry Insights & News

At this writing, it’s Wednesday morning after the U.S. election. None of my friends is sober, probably including my editor.

I originally had a different article scheduled, one that assumed I’d been wrong all along, because that’s what everyone said. The first article in which I mentioned President Trump was posted on Sept. 10, 2015, and covered data analytics in the marijuana industry. Shockingly, both Trump and marijuana won big.

I thought I was being funny. Part of the reason I was sure “President Trump” was a joke was that Facebook kept nagging me to go vote. First, it wanted me to vote early; eventually it wanted me to vote on Election Day. It wasn’t only Facebook—my Android phone kept nagging me to vote. (You’d think it would have noticed that I’d already voted or at least hung out at one of the polling places it offered to find for me, but whatever.)

This made me think. With the ubiquity of Google and Facebook, could they eventually decide elections? Politics are regional. In my state, North Carolina, if you turn out votes in the center of the state it goes Democratic. If you turn out votes in the east and west, it goes Republican. Political operatives have geographically targeted voters in this manner for years, but they have to pay to get in your face. Google and Facebook are already there.

What if instead of telling everyone to vote, they were to target voters by region? Let’s say Google and Facebook support a fictitious party we’ll call Fuchsia. In districts that swing heavily Fuchsia, they push notifications saying “go vote.” In districts that go for the other guys, they simply don’t send vote notifications and ads and instead provide scant information on polling station locations. That alone could swing some areas.

Targeted notifications could have an even more dramatic effect in districts that could go either way. Google and Facebook collect tons of psychometric data; Facebook even got caught doing it. Facebook and Google know not only what you “like” but also what you hate and what you fear. Existing political operations know this too, but Google and Facebook have it at a much, much more granular level.

To go a step further, what if Facebook manipulated your feed to increase your fear level if fear is the main reason you vote? What if your personalized Google News focused on your candidate’s positives or negatives, depending on whether Google wanted you to go to the polls or stay home? In fact, by applying search technology to current events and the news, they could even surface articles on other topics that mention in passing either your candidate or the candidate you fear.

The point I’m trying to make is that the same technology used to manipulate you into buying stuff can be used to manipulate how or if you vote. We’re still a little way from this, but not far. Even a small amount of targeting could turn a close vote in a key state.

Source: InfoWorld Big Data

