Dimension Data Launches Managed Cloud Service For Microsoft

Dimension Data has announced the availability of its Managed Cloud Services for Microsoft. The new offering provides organizations with a cloud-based managed service for Microsoft Exchange, SharePoint, Skype for Business, and Office 365, whether deployed in the public cloud, on premises, in a private cloud, or as a hybrid model. Combined with its planning and deployment services, Dimension Data is now able to provide its clients with a complete end-to-end managed service, with the added benefit of meeting client-specific security and compliance requirements.

Based on statistics presented during Microsoft’s 2016 Q3 earnings call, there were over 70 million commercial Microsoft Office 365 monthly active users. In addition, Microsoft Office 365 is one of the fastest-growing areas of technology in the enterprise sector today.

According to Tony Walt, group executive of Dimension Data’s end-user computing business, for these productivity solutions to be effective in an enterprise environment, they need to be managed and supported. However, managing the administrative complexity, while at the same time extracting the full value from each of the applications, is a challenge for CIOs and enterprises today.

“With the feature-rich cloud productivity suite of applications increasingly becoming the foundation for organizations transitioning to a digital business, Dimension Data’s Managed Cloud Services for Microsoft and the software-as-a-service Cloud Control management platform is a game changer. It is revolutionizing the automation and management of enterprise Microsoft messaging and collaboration. Managed Cloud Services for Microsoft provides the services and technology you need to plan, deploy and manage your Microsoft messaging and collaboration suite, ensuring that the complexity inherent in an integrated environment is seamlessly addressed, while also guaranteeing you have the flexibility to grow and adapt to the migration to cloud,” says Walt.

Managed Cloud Services for Microsoft automates and manages enterprise Microsoft messaging and collaboration applications. Cloud Control™, Dimension Data’s administrative platform and portal, enables help desk representatives to perform tasks that would normally require escalation to second- or third-level administrators, reducing costs and improving customer satisfaction.

Managed Cloud Services for Microsoft builds on Dimension Data’s expertise with Microsoft’s productivity solutions: the company has deployed more than one million seats of Office 365 globally, over 1.5 million seats of on-premises Exchange, and another two million cloud-based Exchange seats. Dimension Data has also completed more than 400 SharePoint projects and over 500 Skype for Business projects globally.

“We have combined more than 25 years of expertise in delivering Microsoft solutions and managed services with our own intellectual property and management tools,” said Phil Aldrich, director of end-user computing, Dimension Data. “Managed Cloud Services for Microsoft provides our clients with the best of both worlds, delivering enterprise performance and management for their Microsoft workloads with the flexibility and scalability of cloud.”

The service is being rolled out globally to address the needs of Dimension Data’s worldwide client base.

Source: CloudStrategyMag

Report: Nearly One-Third Of Consumers Aren’t Aware They Use The Cloud

There is confusion over what qualifies as ‘the cloud,’ according to a new consumer survey. Despite indicating that they use at least one of several popular cloud-based applications, such as Google Drive, Dropbox, or Microsoft OneDrive, over 30% of consumers subsequently responded that they do not use or access information in the cloud. Clutch designed the survey to gauge consumers’ knowledge and habits regarding cloud usage.

When it comes to understanding which applications are part of the cloud, experts say that the confusion is understandable.

“From private cloud, managed private cloud, to in-house and public cloud, there are many different technologies which can be referred to as cloud, but are very general,” said Alexander Martin-Bale, director of Cloud and Data Platforms at adaware, an anti-spyware and anti-virus software program. “The reality is that knowing exactly when you’re using it, even for a technical professional, is not always simple.”

Over half of the respondents (55%) say they are “very” or “somewhat” confident in their cloud knowledge. However, 22% of respondents who consider themselves very confident in their cloud knowledge did not know or were unsure whether they use the cloud.

Lucas Roh, CEO at Bigstep, says this is likely due to the overuse of the cloud as a buzzword. “It boils down to the fact that the word ‘cloud’ has been used everywhere in the press,” he said. “People have heard about it, and think that they conceptually know how it works, even if they don’t… They’re only thinking in terms of the application being used, not the actual technology behind those applications.”

When it comes to the security of the cloud, 42% of respondents believe the responsibility falls equally on the cloud provider and user.

Chris Steffen, technical director at Cryptzone, says there has been a shift in how people view the security of the cloud. “I think the dynamic is changing along with the information security paradigm,” said Steffen. “People are realizing — ‘Hey, maybe I do need to change my password every five years,’ or something similar. You can’t expect everything to be secure forever.”

Based on the findings, Clutch recommends that consumers seek more education on cloud computing, as well as implement simple additional security measures, such as two-factor authentication.

These and other security measures are increasingly important as the cloud becomes more ubiquitous. “The cloud is not going anywhere. If anything, it’s going to become more and more an integral part of the stuff that we do every single day, whether we know that we’re using it or not,” said Steffen.

Clutch surveyed 1,001 respondents across the United States. All respondents indicated that they use at least one of the following applications: iCloud, Google Drive, Dropbox, Box, Microsoft OneDrive, iDrive, and Amazon Cloud Drive.

Source: CloudStrategyMag

5 Python libraries to lighten your machine learning load

Machine learning is exciting, but the work is complex and difficult. It typically involves a lot of manual lifting — assembling workflows and pipelines, setting up data sources, and shunting back and forth between on-prem and cloud-deployed resources.

The more tools you have in your belt to ease that job, the better. Thankfully, Python is a giant tool belt of a language that’s widely used in big data and machine learning. Here are five Python libraries that help relieve the heavy lifting for those trades.

PyWren

A simple package with a powerful premise, PyWren lets you run Python-based scientific computing workloads as multiple instances of AWS Lambda functions. A profile of the project at The New Stack describes PyWren using AWS Lambda as a giant parallel processing system, tackling projects that can be sliced and diced into little tasks that don’t need a lot of memory or storage to run.

One downside is that Lambda functions can’t run for more than 300 seconds. But if you have a job that takes only a few minutes to complete and needs to run thousands of times across a data set, PyWren may be a good option for parallelizing that work in the cloud at a scale unavailable on user hardware.
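
To give a sense of the programming model, here is a minimal sketch of the pattern PyWren’s documentation describes: map an ordinary Python function over a list of inputs, with each call dispatched to its own Lambda instance. The simulation function is invented for illustration, and the snippet assumes PyWren and AWS credentials are already configured.

```python
import pywren

def simulate(seed):
    # runs inside a single AWS Lambda invocation
    import random
    random.seed(seed)
    return sum(random.random() for _ in range(1000000)) / 1000000.0

pwex = pywren.default_executor()           # uses your configured AWS credentials
futures = pwex.map(simulate, range(1000))  # one Lambda invocation per seed
results = pywren.get_all_results(futures)  # blocks until every invocation returns
```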

Tfdeploy

Google’s TensorFlow framework is taking off big-time now that it’s at a full 1.0 release. One common question about it: How can I make use of the models I train in TensorFlow without using TensorFlow itself?

Tfdeploy is a partial answer to that question. It exports a trained TensorFlow model to “a simple NumPy-based callable,” meaning the model can be used in Python with Tfdeploy and the NumPy math-and-stats library as the only dependencies. Most of the operations you can perform in TensorFlow can also be performed in Tfdeploy, and you can extend the behaviors of the library by way of standard Python metaphors (such as overloading a class).
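
A rough sketch of that workflow, loosely following the project’s README (the tensor names and shapes here are placeholders): export the trained graph once from a TensorFlow session, then evaluate it anywhere with only NumPy and Tfdeploy installed.

```python
# Export side (TensorFlow still required at this step):
#   model = td.Model()
#   model.add(y, sess)        # y is the graph's output tensor, named "output"
#   model.save("model.pkl")

# Inference side: no TensorFlow, just NumPy and tfdeploy.
import numpy as np
import tfdeploy as td

model = td.Model("model.pkl")
inp, outp = model.get("input", "output")   # fetch tensors by the names used in the graph
batch = np.random.rand(10, 784)            # placeholder input shape
predictions = outp.eval({inp: batch})
```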

Now the bad news: Tfdeploy doesn’t support GPU acceleration, if only because NumPy doesn’t do that. Tfdeploy’s creator suggests using the gNumPy project as a possible replacement.

Luigi

Writing batch jobs is generally only one part of processing heaps of data; you also have to string all the jobs together into something resembling a workflow or a pipeline. Luigi, created by Spotify and named for the other plucky plumber made famous by Nintendo, was built to “address all the plumbing typically associated with long-running batch processes.”

With Luigi, a developer can take several different unrelated data processing tasks — “a Hive query, a Hadoop job in Java, a Spark job in Scala, dumping a table from a database” — and create a workflow that runs them, end to end. The entire description of a job and its dependencies is written as Python code, not as XML config files or another data format, so it can be integrated into other Python-centric projects.
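
As a flavor of how that looks in practice, here is a minimal two-task sketch; the task names, file paths, and toy data are invented for illustration.

```python
import luigi

class DumpOrders(luigi.Task):
    """Stand-in extraction step: writes raw records to a local file."""
    date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget("data/orders_{}.csv".format(self.date))

    def run(self):
        with self.output().open("w") as out:
            out.write("order_id,amount\n1,9.99\n2,24.50\n")

class SummarizeOrders(luigi.Task):
    """Depends on DumpOrders; Luigi runs the upstream task first if its output is missing."""
    date = luigi.DateParameter()

    def requires(self):
        return DumpOrders(self.date)

    def output(self):
        return luigi.LocalTarget("data/summary_{}.txt".format(self.date))

    def run(self):
        with self.input().open("r") as rows, self.output().open("w") as out:
            total = sum(float(r.split(",")[1]) for r in rows if not r.startswith("order_id"))
            out.write("total: {}\n".format(total))

if __name__ == "__main__":
    # e.g. python pipeline.py SummarizeOrders --date 2017-02-01 --local-scheduler
    luigi.run()
```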

Kubelib

If you’re adopting Kubernetes as an orchestration system for machine learning jobs, the last thing you want is for the mere act of using Kubernetes to create more problems than it solves. Kubelib provides a set of Pythonic interfaces to Kubernetes, originally to aid with Jenkins scripting. But it can be used without Jenkins as well, and it can do everything exposed through the kubectl CLI or the Kubernetes API.

PyTorch

Let’s not forget about this recent and high-profile addition to the Python world, an implementation of the Torch machine learning framework. PyTorch doesn’t merely port Torch to Python; it adds many other conveniences, such as GPU acceleration and a library that allows multiprocessing to be done with shared memory (for partitioning jobs across multiple cores). Best of all, it can provide GPU-powered replacements for some of the unaccelerated functions in NumPy.
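
A short sketch of that NumPy-like feel and the optional GPU path (the array sizes are arbitrary):

```python
import numpy as np
import torch

# tensors behave much like NumPy arrays
a = torch.rand(1000, 1000)
b = torch.rand(1000, 1000)
c = torch.mm(a, b)                            # matrix multiply on the CPU

# the same computation on a GPU, if one is present
if torch.cuda.is_available():
    c = torch.mm(a.cuda(), b.cuda()).cpu()    # compute on the GPU, copy the result back

# round-tripping with NumPy
arr = c.numpy()                               # NumPy view of the tensor's data
back = torch.from_numpy(arr)                  # and back again
```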

Source: InfoWorld Big Data

IDG Contributor Network: Bringing embedded analytics into the 21st century

Software development has changed pretty radically over the last decade. Waterfall is out, Agile is in. Slow release cycles are out, continuous deployment is in. Developers avoid scaling up and scale out instead. Proprietary integration protocols have (mostly) given way to open standards.

At the same time, exposing analytics to customers in your application has gone from a rare, premium offering to a requirement. Static reports and SOAP APIs that deliver XML files just don’t cut it anymore.

And yet, the way that most embedded analytics systems are designed is basically the same as it was 10 years ago: inflexible, hard to scale, lacking modern version control, and reliant on specialized, expensive hardware.

Build or Buy?

It’s no wonder that today’s developers often choose to build embedded analytics systems in-house. Developers love a good challenge, so when faced with the choice between an outdated, off-the-shelf solution and building for themselves, they’re going to get to work.

But expectations for analytics have increased, and so even building out the basic functionality that customers demand can sidetrack engineers (whose time isn’t cheap) for months. This is to say nothing of the engineer-hours required to maintain a homegrown system down the line. I simply don’t believe that building it yourself is the right solution unless analytics is your core product.

So what do you do?

Honestly, I’m not sure. Given the market opportunity, I think it’s inevitable that more and more vendors will move into the space and offer modern solutions. And so I thought I’d humbly lay out 10 questions embedded analytics buyers should ask about the solutions they’re evaluating.

  1. How does the solution scale as data volumes grow? Does it fall down or require summarization when dealing with big data?
  2. How does the tool scale to large customer bases? Is supporting 1,000 customers different than supporting 10?
  3. Do I need to maintain specialized ETLs and data ingestion flows for each customer? What if I want to change the ETL behavior? How hard is that?
  4. What’s the most granular level that customers can drill to?
  5. Do I have to pay to keep duplicated data in a proprietary analytics engine? If so, how much latency does that introduce? How do things stay in sync?
  6. Can I make changes to the content and data model myself or is the system a black box where every change requires support or paid professional services?
  7. Does it use modern, open frameworks and standards like HTML5, JavaScript, iframes, HTTPS, and RESTful APIs?
  8. Does the platform offer version control? If so, which parts of the platform (data, data model, content, etc.) are covered by version control?
  9. How customizable is the front-end? Can fonts, color palettes, language, timezones, logos, and caching behavior all be changed? Can customization be done on a customer-by-customer basis or is it one template for all customers?
  10. How much training is required for admins and developers? And how intuitive is the end-user interface?

No vendor that I know of has the “right” answer to all these questions (yet), but they should be taking these issues seriously and working toward these goals.

If they’re not, you can bet your engineers are going to start talking about how they could build something better in a week. HINT: They actually can’t, but good luck winning that fight 😉

This article is published as part of the IDG Contributor Network.

Source: InfoWorld Big Data

6 reasons stores can't give you real-time offers (yet)

Like most hardcore people, in the car I roll with my windows down and my radio cranked up to 11—tuned to 91.5, my local NPR station, where Terry Gross recently interviewed Joseph Turow, author of “The Aisles Have Eyes.” Turow reports that retailers are using data gathered from apps on your phone and other information to change prices on the fly.

Having worked in this field for a while, I can tell you that, yes, they’re gathering any data they can get. But the kind of direct manipulation Turow claims, where the price changes on the shelf before your eyes, isn’t yet happening on a wide scale. (Full disclosure: I’m employed by LucidWorks, which offers personalized/targeted search and machine-learning-assisted search as features in products we sell.)

Why not? I can think of a number of reasons.

1. Technology changes behavior slowly

Printers used to be a big deal. There were font and typesetting wars (TrueType, PostScript, and so on), and people printed out pages simply to read comfortably. After all, screen resolutions were low and interfaces were clunky; scanners were cumbersome and email was unreliable. Yet even after these obstacles were overcome, the old ways stuck around. There are still paper books (I mailed all of mine to people in prison), and the government still makes me print things and even get them notarized sometimes.

Obviously, change happens: I now tend to use Uber even if a cab is waiting, and I don’t bother to check the price difference, regardless of surge status. Also, today I buy all my jeans from Amazon—yet still use plastic cards for payment. The clickstream data collected on me is mainly used for email marketing and ad targeting, as opposed to real-time sales targeting.

2. Only some people can be influenced

For years I put zero thought into my hand soap purchase because my partner bought it. Then I split with my partner and became a soap buyer again. I did some research and found a soap that didn’t smell bad, didn’t have too many harsh chemicals, and paid lip service to the environment. Now, to get me to even try something else you’d probably have to give it to me for free. I’m probably not somebody a soap company wants to bother with. I’m not easily influenced.

I’m more easily influenced in other areas—such as cycling and fitness stuff—but those tend to be more expensive, occasional purchases. To reach me, the technique needs to be different from pure retailing.

3. High cost for marginal benefit

Much personalization technology, such as the analytics behind real-time discounts, is still expensive to deploy. Basic techniques such as using my interests or previously clicked links to improve the likelihood of my making a purchase are probably “effective enough” for most online retailers.

As for brick and mortar, I have too many apps on my phone already, so getting me to download yours will require a heavy incentive. I also tend to buy only one item because I forgot to buy it online—then I leave—so the cost to overcome my behavioral inertia and influence me will be high.

4. Pay to play

Business interests limit the effectiveness of analytics in influencing consumers, mainly in the form of slotting fees charged to suppliers who want preferential product placement in the aisles.

Meanwhile, Target makes money no matter what soap I buy there. Unless incentivized, it’s not going to care which brand I choose. Effective targeting may require external data (like my past credit card purchases at other retailers) and getting that data may be expensive. The marketplace for data beyond credit card purchases is still relatively immature and fragmented.

5. Personalization is difficult at scale

For effective personalization, you must collect or buy data on everything I do everywhere and store it. You need to run algorithms against that data to model my behavior. You need to identify different means of influencing me. Some of this is best done for a large group (as in the case of product placement), but doing it for individuals requires lots of experimentation and tuning—and it needs to be done fast.

Plus, it needs to be done right. If you bug me too much, I’m totally disabling or uninstalling your app (or other means of contacting me). You need to make our relationship bidirectional. See yourself as my concierge, someone who finds me what I need and anticipates those needs rather than someone trying to sell me something. That gets you better data and stops you from getting on my nerves. (For the last time, Amazon, I’ve already purchased an Instant Pot, and it will be years before I buy another pressure cooker. Stop following me around the internet with that trash!)

6. Machine learning needs to mature

Machine learning is merely math; much of it isn’t even new. But applying it to large amounts of behavioral data—where you have to decide which algorithm to use, which optimizations to apply to that algorithm, and which behavioral data you need in order to apply it—is pretty new. Most retailers are used to buying out-of-the-box solutions. Beyond (ahem) search, some of these barely exist yet, so you’re stuck rolling your own. Hiring the right expertise is expensive and fraught with error.

Retail reality

To influence a specific, individual consumer who walks into a physical store, the cost is high and the effectiveness is low. That’s why most brick-and-mortar businesses tend to use advanced data—such as how much time people spend in which part of the store and what products influenced that decision—at a more statistical level to make systemic changes and affect ad and product placement.

Online retailers have a greater opportunity to influence people at a personal level, but most of that opportunity is in ad placement, feature improvements, and (ahem) search optimization. As for physical stores, eventually, you may well see a price drop before your eyes as some massive cloud determines the tipping point for you to buy on impulse. But don’t expect it to happen anytime soon.

Source: InfoWorld Big Data

IBM sets up a machine learning pipeline for z/OS

If you’re intrigued by IBM’s Watson AI as a service, but reluctant to trust IBM with your data, Big Blue has a compromise. It’s packaging Watson’s core machine learning technology as an end-to-end solution available behind your firewall.

Now the bad news: It’ll only be available to z System / z/OS mainframe users … for now.

From start to finish

IBM Machine Learning for z/OS isn’t a single machine learning framework. It’s a collection of popular frameworks — in particular Apache SparkML, TensorFlow, and H2O — packaged with bindings to common languages used in the trade (Python, Java, Scala), and with support for “any transactional data type.” IBM is pushing it as a pipeline for building, managing, and running machine learning models through visual tools for each step of the process and RESTful APIs for deployment and management.

There’s a real need for this kind of convenience. Even as the number of frameworks for machine learning mushrooms, developers still have to perform a lot of heavy labor to create end-to-end production pipelines for training and working with models. This is why Baidu outfitted its PaddlePaddle deep learning framework with support for Kubernetes; in time the arrangement could serve as the underpinning for a complete solution that would cover every phase of machine learning.

Other components in IBM Machine Learning fit into this overall picture. The Cognitive Automation for Data Scientists element “assists data scientists in choosing the right algorithm for the data by scoring their data against the available algorithms and providing the best match for their needs,” checking metrics like performance and fitness to task for a given algorithm and workload.

Another function “schedule[s] continuous re-evaluations on new data to monitor model accuracy over time and be alerted when performance deteriorates.” Models trained on data, rather than algorithms themselves, are truly crucial in any machine learning deployment, so IBM’s wise to provide such utilities.

z/OS for starters; Watson it ain’t

The decision to limit the offering to z System machines for now makes the most sense as part of a general IBM strategy where machine learning advances are paired directly with branded hardware offerings. IBM’s PowerAI system also pairs custom IBM hardware — in this case, the Power8 processor — with commodity Nvidia GPUs to train models at high speed. In theory, PowerAI devices could run side by side with a mix of other, more mainstream hardware as part of an overall machine learning hardware array.

The z/OS incarnation of IBM Machine Learning is aimed at an even higher and narrower market: existing z/OS customers with tons of on-prem data. Rather than ask those (paying) customers to connect to something outside of their firewalls, IBM offers them first crack at tooling to help them get more from the data. The wording of IBM’s announcement — “initially make [IBM Machine Learning] available [on z/OS]” — implies that other targets are possible later on.

It’s also premature to read this as “IBM Watson behind the firewall,” since Watson’s appeal isn’t the algorithms themselves or the workflow IBM’s put together for them, but rather the volumes of pretrained data assembled by IBM, packaged into models and deployed through APIs. Those will remain exactly where IBM can monetize them best: behind its own firewall of IBM Watson as a service.

Source: InfoWorld Big Data

HPE acquires security startup Niara to boost its ClearPass portfolio

Hewlett Packard Enterprise has acquired Niara, a startup that uses machine learning and big data analytics on enterprise packet streams and log streams to detect and protect customers from advanced cyberattacks that have penetrated perimeter defenses.

The financial terms of the deal were not disclosed.

Operating in the User and Entity Behavior Analytics (UEBA) market, Niara’s technology starts by automatically establishing baseline characteristics for all users and devices across the enterprise and then looking for anomalous, inconsistent activities that may indicate a security threat, Keerti Melkote, senior vice president and general manager of HPE Aruba and cofounder of Aruba Networks, wrote in a blog post on Wednesday.
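
As a generic illustration of that baseline-then-flag approach (not Niara’s actual algorithm or data), the idea can be reduced to something like the following, where a user’s past activity defines “normal” and large departures from it are flagged:

```python
import numpy as np

# toy per-user activity history, e.g. daily login counts (invented data)
history = {"alice": [3, 4, 2, 5, 3, 4], "bob": [40, 38, 45, 42, 39, 41]}

def is_anomalous(user, todays_count, threshold=3.0):
    past = np.array(history[user], dtype=float)
    mean, std = past.mean(), past.std() or 1.0   # guard against a zero baseline spread
    return abs(todays_count - mean) / std > threshold

print(is_anomalous("alice", 30))  # True: far outside alice's usual range
print(is_anomalous("bob", 43))    # False: consistent with bob's baseline
```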

The time taken to investigate individual security incidents has been reduced from up to 25 hours using manual processes to less than a minute by using machine learning, Melkote added. 

Hewlett Packard acquired wireless networking company Aruba Networks in May 2015, ahead of its corporate split into HPE, an enterprise-focused business, and HP, a business focused on PCs and printers.

The strategy now is to integrate Niara’s behavioral analytics technology with Aruba’s ClearPass Policy Manager, a role- and device-based network access control platform, to offer customers advanced threat detection and prevention for network security across wired environments, wireless environments, and internet of things (IoT) devices, Melkote wrote.

For Niara CEO Sriram Ramachandran, Vice President for Engineering Prasad Palkar, and several other engineers, it is a homecoming: they were part of the team that developed the core technologies in the ArubaOS operating system.

Niara technology addresses the need to monitor a device after it is on the internal network, following authentication by a network access control platform like ClearPass. Niara claims that it detects compromised users, systems or devices by aggregating and putting into context even subtle changes in typical IT access and usage.

Most networks today allow the traffic to flow freely between source and destination once devices are on the network, with internal controls, such as Access Control Lists, used to protect some types of traffic, while others flow freely, Melkote wrote.

“More importantly, none of this traffic is analyzed to detect advanced attacks that have penetrated perimeter security systems and actively seek out weaknesses to exploit on the interior network,” she added.

Source: InfoWorld Big Data

New big data tools for machine learning spring from home of Spark and Mesos

If the University of California, Berkeley’s AMPLab doesn’t ring bells, perhaps some of its projects will: Spark and Mesos.

AMPLab was planned all along as a five-year computer science research initiative, and it closed down as of last November after running its course. But a new lab is opening in its wake: RISELab, another five-year project at UC Berkeley with major financial backing and the stated goal of “focus[ing] intensely for five years on systems that provide Real-time Intelligence with Secure Execution [RISE].”

AMPLab was created with “a vision of understanding how machines and people could come together to process or to address problems in data — to use data to train rich models, to clean data, and to scale these things,” said Joseph E. Gonzalez, Assistant Professor in the Department of Electrical Engineering and Computer Science at UC Berkeley.

RISELab’s web page describes the group’s mission as “a proactive step to move beyond big data analytics into a more immersive world,” where “sensors are everywhere, AI is real, and the world is programmable.” One example cited: Managing the data infrastructure around “small, autonomous aerial vehicles,” whether unmanned drones or flying cars, where the data has to be processed securely at high speed.

Other big challenges Gonzalez singled out include security, but not the conventional focus on access controls. Rather, it involves concepts like “homomorphic” encryption, where encrypted data can be worked on without first having to decrypt it. “How can we make predictions on data in the cloud,” said Gonzalez, “without the cloud understanding what it is it’s making predictions about?”
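
To make the idea concrete, here is a toy example using the third-party python-paillier package (`pip install phe`), which implements a partially homomorphic scheme: a party holding only ciphertexts can add encrypted values and scale them by plaintext constants, while only the key holder can decrypt the result. This is just an illustration of the concept, not what RISELab is building.

```python
from phe import paillier  # assumes the python-paillier package is installed

public_key, private_key = paillier.generate_paillier_keypair()

# the "cloud" only ever sees these ciphertexts
enc_a = public_key.encrypt(15)
enc_b = public_key.encrypt(27)

# arithmetic happens directly on the encrypted values
enc_sum = enc_a + enc_b      # encrypted 15 + 27
enc_scaled = enc_a * 3       # encrypted 15 * 3

# only the key holder can read the results
print(private_key.decrypt(enc_sum))     # 42
print(private_key.decrypt(enc_scaled))  # 45
```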

Though the lab is in its early days, a few projects have already started to emerge:

Clipper

Machine learning involves two basic kinds of work: Creating models from which predictions can be derived and serving up those predictions from the models. Clipper focuses on the second task and is described as a “general-purpose low-latency prediction serving system” that takes predictions from machine learning frameworks and serves them up with minimal latency.

Clipper has three aims that ought to draw the attention of anyone working with machine learning: One, it accelerates serving up predictions from a trained model. Two, it provides an abstraction layer across multiple machine learning frameworks, so a developer only has to program to a single API. Three, Clipper’s design makes it possible to respond dynamically to how individual models respond to requests — for instance, to allow a given model that works better for a particular class of problem to receive priority. Right now there’s no explicit mechanism for this, but it is a future possibility.
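
In practice the single API is an HTTP one: application code posts a feature vector to a serving endpoint and gets a prediction back, regardless of which framework produced the model. The snippet below is purely a hypothetical illustration of that pattern; the URL, application name, and payload shape are invented rather than taken from Clipper’s documentation.

```python
import json
import requests

# hypothetical prediction-serving endpoint; URL and payload format are illustrative
SERVE_URL = "http://localhost:1337/example-app/predict"

def predict(feature_vector):
    resp = requests.post(SERVE_URL,
                         headers={"Content-Type": "application/json"},
                         data=json.dumps({"input": feature_vector}))
    resp.raise_for_status()
    return resp.json()

print(predict([1.2, 3.4, 5.6]))
```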

Opaque

It seems fitting that a RISELab project would complement work done by AMPLab, and one does: Opaque works with Apache Spark SQL to enable “very strong security for DataFrames.” It uses Intel SGX processor extensions to allow DataFrames to be marked as encrypted and have all their operations performed within an “SGX enclave,” where data is encrypted in place using the AES algorithm and is only visible to the application using it via hardware-level protection.

Gonzalez says this delivers the benefits of homomorphic encryption without the performance cost. The performance hit for using SGX is around 50 percent, but the fastest current implementations of homomorphic algorithms run 20,000 times slower. On the other hand, SGX-enabled processors are not yet offered in the cloud, although Gonzalez said this is slated to happen “in the near future.” The biggest stumbling block, though, may be the implementation, since in order for this to work, “you have to trust Intel,” as Gonzalez pointed out.

Ground

Ground is a context management system for data lakes. It provides a mechanism, implemented as a RESTful service in Java, that “enables users to reason about what data they have, where that data is flowing to and from, who is using the data, when the data changed, and why and how the data is changing.”

Gonzalez noted that data aggregation has moved away from strict, data-warehouse-style governance and toward “very open and flexible data lakes,” but that makes it “hard to track how the data came to be.” In some ways, he pointed out, knowing who changed a given set of data and how it was changed can be more important than the data itself. Ground provides a common API and meta model for tracking such information, and it works with many data repositories. (The Git version control system, for instance, is one of the supported data formats in the early alpha version of the project.)

Gonzalez admitted that defining RISELab’s goals can be tricky, but he noted that “at its core is this transition from how we build advanced analytics models, how we analyze data, to how we use that insight to make decisions — connecting the products of Spark to the world, the products of large-scale analytics.”

Source: InfoWorld Big Data

Review: The best frameworks for machine learning and deep learning

Over the past year I’ve reviewed half a dozen open source machine learning and/or deep learning frameworks: Caffe, Microsoft Cognitive Toolkit (aka CNTK 2), MXNet, Scikit-learn, Spark MLlib, and TensorFlow. If I had cast my net even wider, I might well have covered a few other popular frameworks, including Theano (a 10-year-old Python deep learning and machine learning framework), Keras (a deep learning front end for Theano and TensorFlow), and DeepLearning4j (deep learning software for Java and Scala on Hadoop and Spark). If you’re interested in working with machine learning and neural networks, you’ve never had a richer array of options.  

There’s a difference between a machine learning framework and a deep learning framework. Essentially, a machine learning framework covers a variety of learning methods for classification, regression, clustering, anomaly detection, and data preparation, and it may or may not include neural network methods. A deep learning or deep neural network (DNN) framework covers a variety of neural network topologies with many hidden layers. These layers comprise a multistep process of pattern recognition. The more layers in the network, the more complex the features that can be extracted for clustering and classification.
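
To make the distinction concrete, here is a small scikit-learn sketch on invented data: a classic linear classifier next to a neural network with a couple of hidden layers. A deep learning framework takes the second idea much further, with many more layers, custom topologies, and GPU acceleration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# toy data, purely for illustration
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# classic machine learning: a linear classifier
clf = LogisticRegression()
print("logistic regression:", clf.fit(X_train, y_train).score(X_test, y_test))

# a neural network with two hidden layers; deep learning frameworks scale
# this idea up to many layers and much larger models
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
print("small neural net:", net.fit(X_train, y_train).score(X_test, y_test))
```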

Source: InfoWorld Big Data

SAP adds new enterprise information management

SAP yesterday renewed its enterprise information management (EIM) portfolio with a series of updates aimed at helping organizations better manage, govern and strategically use and control their data assets.

“By effectively managing enterprise data to deliver trusted, complete and relevant information, organizations can ensure data is always actionable to gain business insight and drive innovation,” says Philip On, vice president of Product Marketing at SAP.

The additions to the EIM portfolio are intended to provide customers with enhanced support and connectivity for big data sources, improved data stewardship and metadata management capabilities and a pay-as-you-go cloud data quality service, he adds.

The updates to the EIM portfolio include the following features:

  • SAP Data Services. Providing extended support and connectivity for integrating and loading large and diverse data types, SAP Data Services includes a data extraction capability for fast data transfer from Google BigQuery to data processing systems like Hadoop, SAP HANA Vora, SAP IQ, SAP HANA, and other cloud storage. Other enhancements include optimizing data extraction from a Hive table using Spark and new connectivity support for Amazon Redshift and Apache Cassandra.
  • SAP Information Steward. The latest version helps speed the resolution of data issues with better usability, policy, and workflow processes. You can immediately view and share data quality scorecards across devices without having to log into the application. You can also more easily access information policies while viewing rules, scorecards, metadata, and terms to immediately verify compliance. New information policy web services allow policies outside of the application to be viewed anywhere, such as on corporate portals. Finally, new and enhanced metadata management capabilities give data stewards and IT users a way to quickly search metadata and conduct more meaningful metadata discovery.
  • SAP Agile Data Preparation. To improve collaboration capabilities between business users and data stewards, SAP Agile Data Preparation focuses on the bridge between agile business data mash-ups and central corporate governance. It allows you to share, export and import rules between different worksheets or between different data domains. The rules are shared through a central and managed repository as well as through the capability to import or export the rules using flat files. New data remediation capabilities were added allowing you to change the values of a given cell by just double clicking it, add a new column and populate with relevant data values, or add or remove records in a single action.
  • SAP HANA smart data integration and smart data quality. The latest release of the SAP HANA platform features new performance and connectivity functionality to deliver faster, more robust real-time replication, bulk/batch data movement, data virtualization and data quality through one common user interface.
  • SAP Data Quality Management microservices. This new cloud-based offering is available as a beta on SAP HANA Cloud Platform, developer edition. It’s a pay-as-you-go cloud-based service that ensures clean data by providing data validation and enrichment for addresses and geocodes within any application or environment.

“As organizations are moving to the cloud and digital business, the data foundation is so important,” On says. “It’s not just having the data, but having the right data. We want to give them a suite of solutions that truly allow them to deliver information excellence from the beginning to the end.”

On says SAP Data Quality Management microservices will be available later in the first quarter. The other offerings are all immediately available.

This story, “SAP adds new enterprise information management,” was originally published by CIO.

Source: InfoWorld Big Data