Review: Amazon QuickSight covers the BI basics

When I reviewed self-service exploratory business intelligence (BI) products in 2015, I covered the strengths and weaknesses of Tableau 9.0, Qlik Sense 2.0, and Microsoft Power BI. As I pointed out at the time, these three products offer a range of data access, discovery, and visualization capabilities at a range of prices, with Tableau the most capable and expensive, Qlik Sense in the middle, and Power BI the least capable but a very good value.

A new entry, Amazon QuickSight, runs entirely in the AWS cloud, has good access to Amazon data sources and fair access to other data sources, and offers basic analysis and data manipulation at a basic price. Of the three products I reviewed in 2015, QuickSight most closely resembles Power BI, only without the dependence on a desktop product to create data sets—or the level of analysis power provided by the Power BI Desktop/Service combination.

Source: InfoWorld Big Data

Review: The best frameworks for machine learning and deep learning

Over the past year I’ve reviewed half a dozen open source machine learning and/or deep learning frameworks: Caffe, Microsoft Cognitive Toolkit (aka CNTK 2), MXNet, Scikit-learn, Spark MLlib, and TensorFlow. If I had cast my net even wider, I might well have covered a few other popular frameworks, including Theano (a 10-year-old Python deep learning and machine learning framework), Keras (a deep learning front end for Theano and TensorFlow), and DeepLearning4j (deep learning software for Java and Scala on Hadoop and Spark). If you’re interested in working with machine learning and neural networks, you’ve never had a richer array of options.  

There’s a difference between a machine learning framework and a deep learning framework. Essentially, a machine learning framework covers a variety of learning methods for classification, regression, clustering, anomaly detection, and data preparation, and it may or may not include neural network methods. A deep learning or deep neural network (DNN) framework covers a variety of neural network topologies with many hidden layers. These layers comprise a multistep process of pattern recognition. The more layers in the network, the more complex the features that can be extracted for clustering and classification.
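To make the idea of "many hidden layers" concrete, here is a minimal sketch in plain Python of a forward pass through a small deep network. It is not taken from any of the frameworks named above; the function names (`make_layer`, `build_network`, `forward`), the layer sizes, and the tanh activation are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random


def make_layer(n_inputs, n_outputs, rng):
    """Return (weights, biases) for one fully connected layer (hypothetical helper)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases


def build_network(layer_sizes, seed=0):
    """Build one layer for each consecutive pair of sizes, e.g. [4, 8, 3] gives 2 layers."""
    rng = random.Random(seed)
    return [make_layer(n_in, n_out, rng)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


def forward(layers, inputs):
    """Pass the inputs through every layer in turn, applying tanh after each one."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


# Assumed shape: 4 inputs, three hidden layers of 8 units each, 3 outputs.
network = build_network([4, 8, 8, 8, 3])
outputs = forward(network, [0.5, -0.2, 0.1, 0.9])
print(len(network), len(outputs))
```

Each pass through the loop in `forward` is one step of the multistep process: the output of one layer becomes the input of the next. A real framework would add training, GPU execution, and far more efficient tensor math on top of this basic structure.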

Source: InfoWorld Big Data

Review: Scikit-learn shines for simpler machine learning

Scikits are Python-based scientific toolboxes built around SciPy, the Python library for scientific computing. Scikit-learn is an open source project focused on machine learning: classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. It’s a fairly conservative project that takes care to avoid scope creep and to avoid jumping on unproven algorithms, for reasons of maintainability and limited developer resources. On the other hand, it has quite a nice selection of solid algorithms, and it uses Cython (the Python-to-C compiler) for functions that need to be fast, such as inner loops.

Among the areas Scikit-learn does not cover are deep learning, reinforcement learning, graphical models, and sequence prediction. It is defined as being in and for Python, so it doesn’t have APIs for other languages. Scikit-learn doesn’t support PyPy, the fast just-in-time compiling Python implementation, because its dependencies NumPy and SciPy don’t fully support PyPy.

Source: InfoWorld Big Data

Review: Caffe deep learning conquers image classification

Like superheroes, deep learning packages usually have origin stories. Yangqing Jia created the Caffe project while earning his doctorate at U.C. Berkeley. The project continues as open source under the auspices of the Berkeley Vision and Learning Center (BVLC), with community contributions. The BVLC is now part of the broader Berkeley Artificial Intelligence Research (BAIR) Lab. Similarly, the scope of Caffe has been expanded beyond vision to include nonvisual deep learning problems, although the published models for Caffe are still overwhelmingly related to images and video.

Caffe is a deep learning framework made with expression, speed, and modularity in mind. Among the framework’s strengths are that models and optimization are defined by configuration without hard-coding, and that a single flag switches between CPU and GPU, so you can train on a GPU machine and then deploy to commodity clusters or mobile devices.

Source: InfoWorld Big Data

MXNet review: Amazon's scalable deep learning

Deep learning, which is basically neural network machine learning with multiple hidden layers, is all the rage—both for problems that justify the complexity and high computational cost of deep learning, such as image recognition and natural language parsing, and for problems that might be better served by careful data preparation and simple algorithms, such as forecasting the next quarter’s sales. If you actually need deep learning, there are many packages that could serve your needs: Google TensorFlow, Microsoft Cognitive Toolkit, Caffe, Theano, Torch, and MXNet, for starters.

I confess that I had never heard of MXNet (pronounced “mix-net”) before Amazon CTO Werner Vogels noted it in his blog. There he announced that in addition to supporting all of the deep learning packages I mentioned above, Amazon decided to contribute significantly to one in particular, MXNet, which it selected as its deep learning framework of choice. Vogels went on to explain why: MXNet combines the ability to scale to multiple GPUs (across multiple hosts) with good programmability and good portability.

Source: InfoWorld Big Data

Get started with Azure Machine Learning

Machine learning is fast becoming the go-to predictive paradigm for data scientists and developers alike. Of the many tools available for tapping neural networks, Microsoft’s Azure ML Studio offers a quick learning curve that won’t take deep data or coding chops to get up and running.

Microsoft Azure Machine Learning Studio is a cloud service for performing value prediction (regression), anomaly detection, structure discovery (clustering), and category prediction (classification). While my previous tutorial for TensorFlow revealed how Google’s open source machine learning and deep neural network library requires you to roll up your sleeves a bit before digging in, Azure ML Studio’s graphical, modular approach will have you testing machine learning models quickly, as you will see below.

Let’s get started.

Source: InfoWorld Big Data

Review: Spark lights up machine learning

As I wrote in March of this year, the Databricks service is an excellent product for data scientists. It has a full assortment of ingestion, feature selection, model building, and evaluation functions, plus great integration with data sources and excellent scalability. The Databricks service provides a superset of Spark as a cloud service. Databricks the company was founded by the original developer of Spark, Matei Zaharia, and others from U.C. Berkeley’s AMPLab. Meanwhile, Databricks continues to be a major contributor to the Apache Spark project.

In this review, I’ll discuss Spark ML, the open source machine learning library for Spark. To be more accurate, Spark ML is the newer of two machine learning libraries for Spark. As of Spark 1.6, the DataFrame-based API in the Spark ML package was recommended over the RDD-based API in the Spark MLlib package for most functionality, but was incomplete. Now, as of Spark 2.0, Spark ML is primary and complete and Spark MLlib is in maintenance mode.

Source: InfoWorld Big Data

Get started with TensorFlow

Machine learning couldn’t be hotter, with several heavy hitters offering platforms aimed at seasoned data scientists and newcomers interested in working with neural networks. Among the more popular options is TensorFlow, a machine learning library that Google open-sourced a year ago.

In my recent review of TensorFlow, I described the library and discussed its advantages, but only had about 300 words to devote to how to begin using Google’s “secret sauce” for machine learning. That isn’t enough to get you started.

In this article, I’ll give you a very quick gloss on machine learning, introduce you to the basics of TensorFlow, and walk you through a few TensorFlow models in the area of image classification. Then I’ll point you to additional resources for learning and using TensorFlow.

Source: InfoWorld Big Data

Review: Microsoft takes on TensorFlow

Like Google, Microsoft has been differentiating its products by adding machine learning features. In the case of Cortana, those features are speech recognition and language parsing. In the case of Bing, speech recognition and language parsing are joined by image recognition. Google’s underlying machine learning technology is TensorFlow. Microsoft’s is the Cognitive Toolkit. 

Both TensorFlow and Cognitive Toolkit have been released to open source. Both are complex frameworks that implement many neural network and deep learning algorithms. Both present challenges to developers new to the area. Cognitive Toolkit has recently become easier to install and deploy than it was, thanks to an automatic installation script. Cognitive Toolkit may be a little easier to use than TensorFlow right now, but that is balanced by TensorFlow’s wider applicability.

Source: InfoWorld Big Data

Review: TensorFlow shines a light on deep learning

What makes Google Google? Arguably it is machine intelligence, along with a vast sea of data to apply it to. While you may never have as much data to process as Google does, you can use the very same machine learning and neural network library as Google. That library, TensorFlow, was developed by the Google Brain team over the past several years and released to open source in November 2015.

TensorFlow does computation using data flow graphs. Google uses TensorFlow internally for many of its products, both in its datacenters and on mobile devices. For example, the Translate, Maps, and Google apps all use TensorFlow-based neural networks running on our smartphones. And TensorFlow underpins the applied machine learning APIs for Google Cloud Natural Language, Speech, Translate, and Vision.
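The phrase "computation using data flow graphs" can be illustrated with a toy example. The sketch below is plain Python and deliberately does not use TensorFlow's API; the `Node` class and the `constant`, `placeholder`, `add`, `multiply`, and `run` helpers are hypothetical names invented for this illustration. The point it tries to show is that the graph is described first and only evaluated later, when inputs are fed in.

```python
class Node:
    """One operation in a toy data flow graph; its inputs are other nodes."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op                  # callable that computes this node, or None
        self.inputs = tuple(inputs)   # upstream nodes this node depends on
        self.name = name              # only used by placeholders


def constant(value):
    return Node(lambda: value)


def placeholder(name):
    # A placeholder has no op; its value is supplied when the graph is run.
    return Node(None, name=name)


def add(a, b):
    return Node(lambda x, y: x + y, (a, b))


def multiply(a, b):
    return Node(lambda x, y: x * y, (a, b))


def run(node, feed):
    """Evaluate a node by first evaluating every node it depends on."""
    if node.op is None:
        return feed[node.name]
    args = [run(dep, feed) for dep in node.inputs]
    return node.op(*args)


# Building the graph performs no arithmetic; it only records y = 3x + 1.
x = placeholder("x")
y = add(multiply(constant(3.0), x), constant(1.0))

# The computation happens only when the graph is run with a value for x.
print(run(y, {"x": 2.0}))  # prints 7.0
```

Separating the description of a computation from its execution is what lets a graph-based library decide where and how to run each operation, for example in a datacenter or on a phone.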

Source: InfoWorld Big Data