IBM sets up a machine learning pipeline for z/OS

If you’re intrigued by IBM’s Watson AI as a service, but reluctant to trust IBM with your data, Big Blue has a compromise. It’s packaging Watson’s core machine learning technology as an end-to-end solution available behind your firewall.

Now the bad news: It’ll only be available to z System / z/OS mainframe users … for now.

From start to finish

IBM Machine Learning for z/OS isn’t a single machine learning framework. It’s a collection of popular frameworks — in particular Apache SparkML, TensorFlow, and H2O — packaged with bindings to common languages used in the trade (Python, Java, Scala), and with support for “any transactional data type.” IBM is pushing it as a pipeline for building, managing, and running machine learning models through visual tools for each step of the process and RESTful APIs for deployment and management.

There’s a real need for this kind of convenience. Even as the number of frameworks for machine learning mushrooms, developers still have to perform a lot of heavy labor to create end-to-end production pipelines for training and working with models. This is why Baidu outfitted its PaddlePaddle deep learning framework with support for Kubernetes; in time the arrangement could serve as the underpinning for a complete solution that would cover every phase of machine learning.

Other components in IBM Machine Learning fit into this overall picture. The Cognitive Automation for Data Scientists element “assists data scientists in choosing the right algorithm for the data by scoring their data against the available algorithms and providing the best match for their needs,” checking metrics like performance and fitness to task for a given algorithm and workload.

Another function “schedule[s] continuous re-evaluations on new data to monitor model accuracy over time and be alerted when performance deteriorates.” Models trained on data, rather than algorithms themselves, are truly crucial in any machine learning deployment, so IBM’s wise to provide such utilities.
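The continuous re-evaluation idea is straightforward to sketch. The snippet below is a hypothetical illustration, not IBM's actual API: score a model against fresh labeled batches on a schedule and raise an alert whenever accuracy falls too far below its baseline. All names (`monitor`, `tolerance`, the toy model) are invented for the example.

```python
# Hypothetical sketch of scheduled model re-evaluation: score fresh labeled
# batches and flag any batch whose accuracy drops too far below baseline.
# Names here are illustrative, not IBM Machine Learning's actual interface.

def accuracy(model, batch):
    """Fraction of (features, label) examples the model labels correctly."""
    correct = sum(1 for features, label in batch if model(features) == label)
    return correct / len(batch)

def monitor(model, batches, baseline, tolerance=0.05):
    """Return (batch_index, accuracy) alerts for degraded batches."""
    alerts = []
    for i, batch in enumerate(batches):
        acc = accuracy(model, batch)
        if acc < baseline - tolerance:
            alerts.append((i, acc))
    return alerts

# Toy model: predicts positive when the single feature exceeds 0.
model = lambda x: x > 0

fresh = [(1, True), (2, True), (-1, False)]      # model still fits the data
drifted = [(1, False), (2, False), (-1, False)]  # labels have shifted

print(monitor(model, [fresh, drifted], baseline=1.0))
```

The second batch triggers an alert because its accuracy (one in three) falls below the 0.95 cutoff, which is the moment a production team would retrain.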

z/OS for starters; Watson it ain’t

The decision to limit the offering to z System machines for now makes the most sense as part of a general IBM strategy where machine learning advances are paired directly with branded hardware offerings. IBM’s PowerAI system also pairs custom IBM hardware — in this case, the Power8 processor — with commodity Nvidia GPUs to train models at high speed. In theory, PowerAI devices could run side by side with a mix of other, more mainstream hardware as part of an overall machine learning hardware array.

The z/OS incarnation of IBM Machine Learning is aimed at an even higher and narrower market: existing z/OS customers with tons of on-prem data. Rather than ask those (paying) customers to connect to something outside of their firewalls, IBM offers them first crack at tooling to help them get more from the data. The wording of IBM’s announcement — “initially make [IBM Machine Learning] available [on z/OS]” — implies that other targets are possible later on.

It’s also premature to read this as “IBM Watson behind the firewall,” since Watson’s appeal isn’t the algorithms themselves or the workflow IBM’s put together for them, but rather the volumes of pretrained data assembled by IBM, packaged into models and deployed through APIs. Those will remain exactly where IBM can monetize them best: behind its own firewall of IBM Watson as a service.

Source: InfoWorld Big Data

HPE acquires security startup Niara to boost its ClearPass portfolio

Hewlett Packard Enterprise has acquired Niara, a startup that uses machine learning and big data analytics on enterprise packet streams and log streams to detect and protect customers from advanced cyberattacks that have penetrated perimeter defenses.

The financial terms of the deal were not disclosed.

Operating in the User and Entity Behavior Analytics (UEBA) market, Niara’s technology automatically establishes baseline characteristics for all users and devices across the enterprise, then looks for anomalous, inconsistent activity that may indicate a security threat, Keerti Melkote, senior vice president and general manager of HPE Aruba and cofounder of Aruba Networks, wrote in a blog post on Wednesday.
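The baseline-then-deviation pattern at the heart of UEBA can be sketched in a few lines. This is an illustrative toy, not Niara's actual algorithm: build a per-user baseline from historical activity counts, then flag observations that sit far outside it.

```python
# Illustrative UEBA-style sketch (not Niara's actual method): establish a
# statistical baseline for a user's activity, then flag observations that
# deviate by more than a few standard deviations.
from statistics import mean, stdev

def baseline(history):
    """Mean and standard deviation of a user's historical activity counts."""
    return mean(history), stdev(history)

def is_anomalous(observation, history, threshold=3.0):
    """True when the observation sits more than `threshold` sigmas from baseline."""
    mu, sigma = baseline(history)
    return abs(observation - mu) > threshold * sigma

logins_per_day = [4, 5, 6, 5, 4, 6, 5]   # a typical week for one user
print(is_anomalous(5, logins_per_day))    # in line with the baseline
print(is_anomalous(40, logins_per_day))   # sudden spike worth investigating
```

Real systems baseline far richer signals (packet streams, log entries, access patterns) and learn thresholds rather than hard-coding them, but the shape of the computation is the same.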

The time taken to investigate individual security incidents has been reduced from up to 25 hours using manual processes to less than a minute by using machine learning, Melkote added. 

Hewlett-Packard acquired wireless networking company Aruba Networks in May 2015, ahead of its corporate split into HPE, an enterprise-focused business, and HP, a business focused on PCs and printers.

The strategy now is to integrate Niara’s behavioral analytics technology with Aruba’s ClearPass Policy Manager, a role- and device-based network access control platform, to offer customers advanced threat detection and prevention across wired and wireless networks and internet of things (IoT) devices, Melkote wrote.

For Niara CEO Sriram Ramachandran, Vice President of Engineering Prasad Palkar, and several other engineers, the deal is a homecoming: they were part of the team that developed the core technologies in the ArubaOS operating system.

Niara technology addresses the need to monitor a device after it is on the internal network, following authentication by a network access control platform like ClearPass. Niara claims that it detects compromised users, systems or devices by aggregating and putting into context even subtle changes in typical IT access and usage.

Most networks today allow traffic to flow freely between source and destination once devices are on the network; internal controls such as access control lists protect some types of traffic, while the rest passes unexamined, Melkote wrote.

“More importantly, none of this traffic is analyzed to detect advanced attacks that have penetrated perimeter security systems and actively seek out weaknesses to exploit on the interior network,” she added.

Source: InfoWorld Big Data

New big data tools for machine learning spring from home of Spark and Mesos

If the University of California, Berkeley’s AMPLab doesn’t ring bells, perhaps some of its projects will: Spark and Mesos.

AMPLab was planned all along as a five-year computer science research initiative, and it closed down as of last November after running its course. But a new lab is opening in its wake: RISELab, another five-year project at UC Berkeley with major financial backing and the stated goal of “focus[ing] intensely for five years on systems that provide Real-time Intelligence with Secure Execution [RISE].”

AMPLab was created with “a vision of understanding how machines and people could come together to process or to address problems in data — to use data to train rich models, to clean data, and to scale these things,” said Joseph E. Gonzalez, Assistant Professor in the Department of Electrical Engineering and Computer Science at UC Berkeley.

RISELab’s web page describes the group’s mission as “a proactive step to move beyond big data analytics into a more immersive world,” where “sensors are everywhere, AI is real, and the world is programmable.” One example cited: Managing the data infrastructure around “small, autonomous aerial vehicles,” whether unmanned drones or flying cars, where the data has to be processed securely at high speed.

Other big challenges Gonzalez singled out include security, though not the conventional focus on access controls. Rather, the lab is exploring concepts like “homomorphic” encryption, where computations can be performed on encrypted data without first decrypting it. “How can we make predictions on data in the cloud,” said Gonzalez, “without the cloud understanding what it is it’s making predictions about?”
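A small, well-known example makes the homomorphic idea concrete. Unpadded (“textbook”) RSA is multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product of the plaintexts, so a server can combine values it cannot read. The tiny key below is the standard textbook example and offers no real security; fully homomorphic schemes, which support arbitrary computation, are far more elaborate.

```python
# Toy illustration of homomorphic encryption using unpadded ("textbook")
# RSA, which is multiplicatively homomorphic: E(a) * E(b) mod n == E(a * b).
# The tiny key below is for demonstration only and offers no security.

n, e, d = 3233, 17, 2753   # n = 61 * 53; classic textbook RSA parameters

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 6
product_of_ciphertexts = (encrypt(a) * encrypt(b)) % n

# The "server" multiplied ciphertexts without ever seeing 7 or 6,
# yet decryption recovers the product of the plaintexts.
print(decrypt(product_of_ciphertexts))  # 42
```

This property holds because (a^e · b^e)^d ≡ (ab)^(ed) ≡ ab (mod n), which is exactly the kind of compute-without-seeing capability Gonzalez describes, albeit limited here to multiplication.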

Though the lab is in its early days, a few projects have already started to emerge:

Clipper

Machine learning involves two basic kinds of work: Creating models from which predictions can be derived and serving up those predictions from the models. Clipper focuses on the second task and is described as a “general-purpose low-latency prediction serving system” that takes predictions from machine learning frameworks and serves them up with minimal latency.

Clipper has three aims that ought to draw the attention of anyone working with machine learning: One, it accelerates serving up predictions from a trained model. Two, it provides an abstraction layer across multiple machine learning frameworks, so a developer only has to program to a single API. Three, Clipper’s design makes it possible to respond dynamically to how individual models respond to requests — for instance, to allow a given model that works better for a particular class of problem to receive priority. Right now there’s no explicit mechanism for this, but it is a future possibility.
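The second aim, a single API across heterogeneous frameworks, can be sketched with an adapter pattern. The names below (`ModelAdapter`, `PredictionRouter`) are invented for illustration and are not Clipper's actual interfaces; they show the shape of the abstraction, not the system itself.

```python
# Hedged sketch of Clipper's abstraction idea (invented names, not
# Clipper's API): adapters give framework-specific models one predict()
# surface, and a router dispatches requests to a model by name.

class ModelAdapter:
    """Wraps any framework-specific model behind a uniform interface."""
    def __init__(self, name, predict_fn):
        self.name = name
        self._predict = predict_fn

    def predict(self, features):
        return self._predict(features)

class PredictionRouter:
    """Dispatches prediction requests to a registered model."""
    def __init__(self):
        self._models = {}

    def register(self, adapter):
        self._models[adapter.name] = adapter

    def predict(self, model_name, features):
        return self._models[model_name].predict(features)

router = PredictionRouter()
# Two "frameworks" with different native call styles, unified by adapters.
router.register(ModelAdapter("threshold", lambda x: x[0] > 0.5))
router.register(ModelAdapter("sum", lambda x: sum(x)))

print(router.predict("threshold", [0.9]))  # True
print(router.predict("sum", [1, 2, 3]))    # 6
```

Clipper's third aim would extend the router so that dispatch decisions depend on how each model has been performing, rather than on an explicit name.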

Opaque

It seems fitting that a RISELab project would complement work done by AMPLab, and one does: Opaque works with Apache Spark SQL to enable “very strong security for DataFrames.” It uses Intel SGX processor extensions to allow DataFrames to be marked as encrypted and have all their operations performed within an “SGX enclave,” where data is encrypted in place using the AES algorithm and is only visible to the application using it via hardware-level protection.

Gonzalez says this delivers the benefits of homomorphic encryption without the performance cost. The performance hit for using SGX is around 50 percent, but the fastest current implementations of homomorphic algorithms run 20,000 times slower. On the other hand, SGX-enabled processors are not yet offered in the cloud, although Gonzalez said this is slated to happen “in the near future.” The biggest stumbling block, though, may be the implementation, since in order for this to work, “you have to trust Intel,” as Gonzalez pointed out.

Ground

Ground is a context management system for data lakes. It provides a mechanism, implemented as a RESTful service in Java, that “enables users to reason about what data they have, where that data is flowing to and from, who is using the data, when the data changed, and why and how the data is changing.”

Gonzalez noted that data aggregation has moved away from strict, data-warehouse-style governance and toward “very open and flexible data lakes,” but that makes it “hard to track how the data came to be.” In some ways, he pointed out, knowing who changed a given set of data and how it was changed can be more important than the data itself. Ground provides a common API and meta model for tracking such information, and it works with many data repositories. (The Git version control system, for instance, is one of the supported data formats in the early alpha version of the project.)
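The who/what/when/why record-keeping Ground exposes as a RESTful Java service can be sketched as a simple event log. Everything below is an illustrative toy with invented names, not Ground's actual data model.

```python
# Minimal in-memory sketch of the lineage idea Ground implements as a
# RESTful service: record who changed which dataset, when, and why, so
# "how did this data come to be?" has an answer. Names are illustrative.
from datetime import datetime, timezone

class LineageStore:
    def __init__(self):
        self._events = []

    def record(self, dataset, actor, action, reason):
        self._events.append({
            "dataset": dataset,
            "actor": actor,
            "action": action,
            "reason": reason,
            "at": datetime.now(timezone.utc),
        })

    def history(self, dataset):
        """All recorded changes to one dataset, oldest first."""
        return [e for e in self._events if e["dataset"] == dataset]

store = LineageStore()
store.record("clicks", "etl-job", "ingest", "nightly load")
store.record("clicks", "alice", "filter", "drop bot traffic")
store.record("users", "bob", "ingest", "initial load")

print([e["action"] for e in store.history("clicks")])  # ['ingest', 'filter']
```

A production system layers a REST API, persistent storage, and version graphs on top of this, but the core question it answers is the same: what happened to this dataset, in what order, and at whose hands.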

Gonzalez admitted that defining RISELab’s goals can be tricky, but he noted that “at its core is this transition from how we build advanced analytics models, how we analyze data, to how we use that insight to make decisions — connecting the products of Spark to the world, the products of large-scale analytics.”

Source: InfoWorld Big Data

Review: The best frameworks for machine learning and deep learning

Over the past year I’ve reviewed half a dozen open source machine learning and/or deep learning frameworks: Caffe, Microsoft Cognitive Toolkit (aka CNTK 2), MXNet, Scikit-learn, Spark MLlib, and TensorFlow. If I had cast my net even wider, I might well have covered a few other popular frameworks, including Theano (a 10-year-old Python deep learning and machine learning framework), Keras (a deep learning front end for Theano and TensorFlow), and DeepLearning4j (deep learning software for Java and Scala on Hadoop and Spark). If you’re interested in working with machine learning and neural networks, you’ve never had a richer array of options.  

There’s a difference between a machine learning framework and a deep learning framework. Essentially, a machine learning framework covers a variety of learning methods for classification, regression, clustering, anomaly detection, and data preparation, and it may or may not include neural network methods. A deep learning or deep neural network (DNN) framework covers a variety of neural network topologies with many hidden layers. These layers comprise a multistep process of pattern recognition. The more layers in the network, the more complex the features that can be extracted for clustering and classification.
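The "many hidden layers" distinction can be made concrete with a tiny forward pass: each layer applies a weighted transformation and a nonlinearity, and stacking layers composes progressively more complex features. The weights below are arbitrary; this is a bare illustration of layer stacking, not any framework's API.

```python
# Tiny forward pass illustrating the deep-network idea: each dense layer
# transforms its input through weights and a sigmoid nonlinearity, and
# stacking layers composes features. Weights here are arbitrary examples.
import math

def layer(inputs, weights, biases):
    """One dense layer: weighted sums followed by a sigmoid."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(1 / (1 + math.exp(-z)))
    return outputs

def forward(x, layers):
    """Run the input through the stack of layers in order."""
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

network = [
    ([[0.5, -0.2], [0.3, 0.8]], [0.0, 0.1]),  # hidden layer: 2 -> 2
    ([[1.0, -1.0]], [0.0]),                   # output layer: 2 -> 1
]
print(forward([1.0, 2.0], network))
```

A deep learning framework automates exactly this structure at scale, adding trainable weights, many more layer types, and hardware acceleration; a general machine learning framework may offer it as just one method among many.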

Source: InfoWorld Big Data