What is data mining? How analytics uncovers insights

Organizations today are gathering ever-growing volumes of information from all kinds of sources, including websites, enterprise applications, social media, mobile devices, and increasingly the internet of things (IoT).

The big question is: How can you derive real business value from this information? That’s where data mining can contribute in a big way. Data mining is the automated process of sorting through huge data sets to identify trends and patterns and establish relationships, to solve business problems or generate new opportunities through the analysis of the data.

It’s not just a matter of looking at data to see what has happened in the past to be able to act intelligently in the present. Data mining tools and techniques let you predict what’s going to happen in the future and act accordingly to take advantage of coming trends.

The term “data mining” is used quite broadly in the IT industry. It is often applied to a variety of large-scale data-processing activities such as collecting, extracting, warehousing, and analyzing data. It can also encompass decision-support applications and technologies such as artificial intelligence, machine learning, and business intelligence.

Data mining is used in many areas of business and research, including product development, sales and marketing, genetics, and cybernetics—to name a few. If it’s used in the right ways, data mining combined with predictive analytics can give you a big advantage over competitors that are not using these tools.

Deriving business value from data mining

The real value of data mining comes from being able to unearth hidden gems in the form of patterns and relationships in data, which can be used to make predictions that can have a significant impact on businesses.

For example, if a company determines that a particular marketing campaign resulted in extremely high sales of a particular model of a product in certain parts of the country but not in others, it can refocus the campaign in the future to get the maximum returns.

The benefits of the technology can vary depending on the type of business and its goals. For example, sales and marketing managers in retail might mine customer information to improve conversion rates in different ways than their counterparts in the airline or financial services industries.

Regardless of the industry, data mining that’s applied to sales patterns and client behavior in the past can be used to create models that predict future sales and behavior.
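
As a rough illustration of turning past sales into a predictive model, the sketch below fits a straight-line trend to a handful of monthly figures and extrapolates one month ahead. The sales numbers and the names fit_line and predict_units are made up for this example; real projects would typically use richer models and far more data.

```python
# Hypothetical monthly unit sales for the past six months (made-up numbers).
months = [1, 2, 3, 4, 5, 6]
units_sold = [100, 110, 120, 130, 140, 150]


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(months, units_sold)


def predict_units(month):
    """Extrapolate the fitted trend to a future month."""
    return slope * month + intercept


forecast = predict_units(7)
print(f"Forecast for month 7: {forecast:.0f} units")
```

A straight line is the simplest possible model of past behavior; it is used here only because it fits in a few lines, not because it suits every sales pattern.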

There’s also the potential for data mining to help eliminate activities that can harm businesses. For example, you can use data mining to enhance product safety, or detect fraudulent activity in insurance and financial services transactions.

The applications of data mining

Data mining can be put to work in virtually every industry.

  • Retailers can deploy data mining to better identify which products people are likely to purchase based on their past buying habits, or which goods are likely to sell at certain times of the year. This can help merchandisers plan inventories and store layouts.
  • Banks and other financial services providers can mine data related to their clients’ accounts, transactions, and channel preferences to better meet their needs. They can also gather and then analyze data from their websites and social media interactions to help increase the loyalty of existing customers and attract new ones.
  • Manufacturing companies can use data mining to look for patterns in the production process, so they can precisely identify bottlenecks and flawed methods and find ways to increase efficiencies. They can also apply knowledge from data mining to the design of products, and make tweaks based on feedback from customer experiences.
  • Educational institutions can benefit from data mining, for example by analyzing data sets to predict the future learning behaviors and performance of students, and then using this knowledge to make improvements in teaching methods or curricula.
  • Health care providers can mine and analyze data to determine better ways of delivering care to patients and cutting costs. With the help of data mining, they can predict how many patients they will need to care for and what type of services those patients will need. In the life sciences, mining can be used to glean insights from massive biological data, to help develop new medicines and other treatments.
  • In multiple industries, including health care and retail, you can use data mining to detect fraud and other abuses—much more quickly than with traditional methods for identifying such activities.
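
To make the fraud-detection idea concrete, here is a minimal sketch that flags transaction amounts sitting unusually far from the average. The amounts, the threshold of 2 standard deviations, and the name flag_outliers are all assumptions for illustration; production fraud systems rely on many more signals than a single amount.

```python
from statistics import mean, pstdev

# Hypothetical transaction amounts for one account (made-up numbers).
amounts = [20, 25, 22, 19, 24, 21, 23, 500]


def flag_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # All values identical, so nothing stands out.
    return [v for v in values if abs(v - center) / spread > threshold]


flagged = flag_outliers(amounts)
print("Transactions to review:", flagged)
```

A flagged value is only a candidate for review, not proof of fraud; the threshold controls how many candidates an analyst has to look at.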

The key components of data mining

The process of data mining includes several distinct components that address different needs:

  • Preprocessing. Before you can apply data mining algorithms, you need to build a target data set. One common source for data is a data mart or warehouse. You need to perform preprocessing to be able to analyze the data sets.
  • Data cleansing and preparation. The target data set must be cleaned and otherwise prepared, to remove “noise,” address missing values, filter outlying data points (for anomaly detection) to remove errors or do further exploration, create segmentation rules, and perform other functions related to data preparation.
  • Association rule learning (also known as market basket analysis). These tools search for relationships among variables in a data set, such as determining which products in a store are often purchased together.
  • Clustering. This technique is used to discover groups of records in a data set that are in some way similar to each other, without relying on structures already known to exist in the data.
  • Classification. Tools that perform classification generalize known structures to apply to new data points, such as when an email application tries to classify a message as legitimate mail or spam.
  • Regression. This data mining technique is used to predict a range of numeric values, such as sales, housing values, temperatures, or prices, when given a particular data set.
  • Summarization. This technique provides a compact representation of a data set, including visualization and report generation.
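
Of the components above, association rule learning is perhaps the easiest to sketch in a few lines. The example below counts how often pairs of products appear together in a set of shopping baskets and derives two standard measures, support and confidence. The baskets are invented for illustration, and the brute-force pair counting is a simplification; dedicated tools use more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (made-up data).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Count every unordered pair of items that share a basket.
    pair_counts.update(frozenset(pair) for pair in combinations(sorted(basket), 2))


def support(a, b):
    """Fraction of all baskets that contain both a and b."""
    return pair_counts[frozenset((a, b))] / len(baskets)


def confidence(a, b):
    """Of the baskets that contain a, the fraction that also contain b."""
    return pair_counts[frozenset((a, b))] / item_counts[a]


print("support(bread, butter):", support("bread", "butter"))
print("confidence(bread -> butter):", confidence("bread", "butter"))
print("confidence(butter -> bread):", confidence("butter", "bread"))
```

In this toy data, everyone who buys butter also buys bread, but not the other way around, which is the kind of asymmetric relationship a merchandiser might act on.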

Dozens of vendors provide data mining software tools, some offering proprietary software and others delivering products via open source efforts.

Among the key vendors that offer proprietary data-mining software applications are Angoss, Clarabridge, IBM, Microsoft, Open Text, Oracle, RapidMiner, SAS Institute, and SAP.

Organizations that provide open source data mining software and applications include Carrot2, Knime, Massive Online Analysis, ML-Flex, Orange, UIMA, and Weka.

The risks and challenges of data mining

Data mining comes with its share of risks and challenges. As with any technology that involves the use of potentially sensitive or personally identifiable information, security and privacy are among the biggest concerns.

At a fundamental level, the data being mined needs to be complete, accurate, and reliable; after all, you’re using it to make significant business decisions and often to interact with the public, regulators, investors, and business partners. Modern forms of data also require new kinds of technologies, such as for bringing together data sets from a variety of distributed computing environments (aka big data integration) and for more complex data, such as images and video, temporal data, and spatial data.

Getting the right data and then pulling it together so it can be mined isn’t the end of the challenge for IT. The cloud, storage, and network systems need to enable high performance of the data mining tools. And the resulting information from the data mining needs to be presented clearly to the wide range of users expected to act on and interpret it. You’ll need people with skills in data science and related areas.

From a privacy standpoint, the idea of mining information that relates to how people behave, what they buy, what websites they visit, and so on can set off concerns about companies gathering too much information. That affects not just your technological implementation but your business strategy and risk profile.

Beyond the ethics of tracking individuals so thoroughly, there are also legal requirements about how data can be gathered, identified to a person, and shared. The United States’ Health Insurance Portability and Accountability Act (HIPAA) and the European Union’s General Data Protection Regulation (GDPR) are among the best known.

In data mining, the initial act of preparation itself, such as aggregating and then rationalizing data, can disclose information or patterns that might compromise the confidentiality of the data. Thus, it’s possible to inadvertently run afoul of ethical concerns or legal requirements.

Data mining also requires data protection every step of the way, to make sure data is not stolen, altered, or accessed secretly. Security tools include encryption, access controls, and network security mechanisms.

Data mining is a key differentiator

Despite these challenges, data mining has become a vital component of the IT strategies at many organizations that seek to gain value from all the information they’re gathering or can access. This drive will no doubt accelerate with ongoing advancements in predictive analytics, artificial intelligence, machine learning, and other related technologies.

Source: InfoWorld Big Data

How to avoid big data analytics failures

Big data and analytics initiatives can be game-changing, giving you insights to help blow past the competition, generate new revenue sources, and better serve customers.

Big data and analytics initiatives can also be colossal failures, resulting in lots of wasted money and time—not to mention the loss of talented technology professionals who become fed up at frustrating management blunders.

How can you avoid big data failures? Some of the best practices are the obvious ones from a basic business management standpoint: be sure to have executive buy-in from the most senior levels of the company, ensure adequate funding for all the technology investments that will be needed, and bring in the needed expertise or have good training in place. If you don’t address these basics first, nothing else really matters.

But assuming that you have done the basics, what separates success from failure in big data analytics is how you deal with its technical issues and challenges. Here’s what you can do to stay on the success side of the equation.

Source: InfoWorld Big Data

Machine learning proves its worth to business

Machine learning couldn’t be hotter. A type of artificial intelligence that enables computers to learn to perform tasks and make predictions without explicit programming, machine learning has caught fire among the hip tech set, but remains a somewhat futuristic concept for most enterprises. But thanks to technological advances and emerging frameworks, machine learning may soon hit the mainstream.

Consulting firm Deloitte expects to see a big increase in the use and adoption of machine learning in the coming year. This is in large part because the technology is becoming much more pervasive. The firm’s latest research shows that worldwide more than 300 million smartphones, or more than one-fifth of units sold in 2017, will have machine learning capabilities on board.

“New chip technology in the form of central processing units, graphics processing units, or dedicated field-programmable gate arrays will be able to provide neural network processing at prices, sizes, and power consumption that fit smartphones,” says Stuart Johnston, leader of the technology, media, and telecommunications practice at Deloitte.

“This hardware added to machine learning software will enable native programs designed to mimic aspects of the human brain’s structure and function, and will be applied to areas such as indoor navigation, image classification, augmented reality, speech recognition, and language translation,” Johnston says. “What that means from a day-to-day user perspective is that complicated tasks will be easier, will be more personalized, faster, and have greater privacy.”

Companies in various industries are already using or experimenting with machine learning technologies. Here is a look at how three companies are tapping machine learning to great business effect.

Pinning hopes on data-rich images

Social media site Pinterest began dabbling with machine learning in 2014, when it started investing heavily in computer vision technology and created a small team of engineers focused on reinventing the ways people find images.

Less than a year later the company launched “visual search,” a new tool that does not require text queries to search for information. “For the first time, visual search gave people a way to get results even when they can’t find the right words to describe what they’re looking for,” says Mohammad Shahangian, head of data science at Pinterest.

Visual search is powered by deep learning, a version of machine learning that taps into deeper neural networks, and allows Pinterest to automatically detect objects, colors, and patterns in any pin’s image and recommend related objects. There are more than 200 million visual searches on Pinterest every month, in addition to 2 billion text searches, Shahangian says.

In the summer of 2016, visual search evolved as Pinterest introduced object detection, which finds all the objects in a pin’s image in real time and provides related results.

“Today, visual search has become one of our most-used features, with hundreds of millions of visual searches every month, and billions of objects detected,” Shahangian says. “Now, we’re introducing three new products on top of our visual discovery infrastructure.”

Pinterest has one of the largest collections of data-rich images on the internet. “We use machine learning to constantly rank and scale 75 billion dynamic objects, from buyable pins to video, and show the right pin to the right person at the best time,” Shahangian says. “Our core focus is helping people discover compelling content, such as products to buy, recipes to make, and projects to try, and machine learning helps us provide a more personalized experience.”

As Pinterest expands its international audience, it’s vital that its service be personalized for people regardless of where they live, what language they speak, or what their interests are, Shahangian says. “Using machine-learned models, we’ve increased the number of localized pins for countries outside the U.S. by 250 percent over the past year,” he says. “Now each of the more than 150 million people who visit Pinterest monthly see pins most relevant to their country and language.”

In addition, machine learning predicts the relevance of a promoted pin on the site as well as its performance, helping improve the user experience with promoted ideas from businesses.

“We recently added deep learning to our recommendations candidate pipeline to make related pins even more relevant,” Shahangian says. “Pinterest engineers have developed a scalable system that evolves with our product and people’s interests, so we can surface the most relevant recommendations. By applying this new deep learning model, early tests show an increase in engagement with related pins by 5 percent globally.”

Pinterest is constantly developing technologies with the latest in machine learning “to build a visual discovery engine, including making advancements in object detection and scaling an ever-growing corpus of data and the world’s data-rich set of images, to people around the world,” Shahangian says.

Building high-dimensional models

Another company using machine learning, software provider Adobe Systems, has for years worked with supervised and unsupervised machine learning, as well as statistical models, to help run its business, according to Anandan Padmanabhan, vice president of Adobe Research.

With the transition of Adobe’s business to a cloud-based subscription offering, there were two fundamental drivers that resulted in a need for large-scale machine learning within the company: online channels becoming the primary source for acquiring customers, and the need for driving product engagement and retention at scale across millions of customers. In addition, the data captured on customer engagement with a particular product are far more detailed through machine learning.

“Adobe captures this event-level longitudinal data across product usage, marketing, and customer support to build various types of predictive models,” Padmanabhan says. These include paid conversion and retention models, customer retention models, automated feature extraction and segmentation, upsell and cross-sell models, and optimal allocation and segment-based forecasting models.

The tools the company has used for its machine learning efforts include Python Scikit-learn, Spark ML, SAS, and proprietary in-house methods.

Machine learning methods have helped the company build individual-level, high-dimensional models, Padmanabhan says. “Previously, Adobe leveraged statistical tools for building more aggregated models that would ignore individual-level heterogeneity altogether,” he says.

Among the key benefits of machine learning for Adobe is a greater understanding of the marginal impact of paid media, which has resulted in the improved allocation of media touchpoints across various selling channels; and the ability to understand individual customer propensities and lifecycle stages, which helps drive marketing campaigns.

The company has also seen improved customer engagement through a better understanding of how individual products are used and through responses to marketing campaigns, which has led to more customized products and customer support experiences. That, in turn, has helped with customer retention.

In addition, Adobe has seen improvements in enterprise sales and territory planning, which drive higher sales efficiencies; and the development of a consistent way of defining and analyzing key performance indicators across the business, which has allowed the company to evaluate all campaigns in a common framework.

Given the success so far, the company is looking for other options to take advantage of machine learning. “There is a strong push within Adobe to leverage machine learning in managing all aspects of the customer experience,” Padmanabhan says.

Managing risk for customers

At LexisNexis Risk Solutions (LNRS), a provider of financial risk management services, machine learning helps customers protect against identity theft, money laundering, benefit scams, health care fraud, bad debt, and other risks.

LNRS began using machine learning several years ago to analyze and extract information from extremely large and heterogeneous data pools, to create graphs and make predictions about events, says Flavio Villanustre, vice president of technology architecture and product at LNRS.

The company uses mostly homegrown machine learning tools based on HPCC Systems, an open source, massively parallel-processing computing platform for big data processing and analytics.

The platform “gives us advantages when dealing with complex models and needing scalability to apply to very large and diverse data sets,” Villanustre says. On top of the HPCC platform, LNRS designed its own domain-specific abstractions in the form of domain-specific languages such as Scalable Automated Linking Technology, a sophisticated record linkage tool, and Knowledge Engineering Language, which combines graph analysis with machine learning capabilities.

Prior to machine learning, modeling through algorithms required people to understand the particular problem domain, extract facts from the existing data, and write large, “heuristics based” programs that used conditional rules to model different possible outcomes from the incoming data, Villanustre says. “These earlier systems required experts to sift through data to understand reality and describe it through conditional statements that a computer could understand,” he says. “This was very tedious, hard work, and better left to computers.”

Machine learning changed that by letting computers extract those facts and represent reality through statistical equations-based models instead, Villanustre says. “This saves countless hours of domain experts’ time and allows them to work with data sets that humans would struggle to deal with otherwise,” he says. “The resulting computer programs are more compact, easier to implement, and more efficient.”

LNRS uses machine learning to describe complete networks of organizations and individuals to identify fraud rings. It also uses the technology to assess and make predictions on credit and insurance risk, identify fraud in health-care-related transactions, and help capture criminals.

“Machine learning is at the core of everything that we do,” Villanustre says. And the company is looking into the latest iterations of the technology. Some of the recent developments around deep belief networks — generative graphical models composed of multiple layers of latent variables with connections between the layers — and deep learning are proving to be promising fields of applications, he says.

“It is always important for us to validate these new methodologies with the laws and regulations of the respective countries in which we work to ensure that they can be used in ways that maximize the benefit to individuals and society,” Villanustre says.

Machine learning in the mainstream

The adoption of machine learning is likely to be diverse and across a range of industries, including retail, automotive, financial services, and health care, says Johnston of Deloitte.

In some cases, it will help transform the way companies interact with customers, Johnston says. For example, in the retail industry, machine learning could completely reshape the retail customer experience. The improved ability to use facial recognition as a customer identification tool is being applied in new ways by companies such as Amazon at its Amazon Go stores or through its Alexa platform.

“Amazon Go removes the need for checkouts through the use of computer vision, sensor fusion, and deep or machine learning, and I expect many shopping centers and retailers to start exploring similar options this year,” Johnston says.

The fact that common devices such as smartphones will be equipped with machine learning capabilities means the technology will no longer be limited to theoretical or highly selective applications.

“Examples of emerging smartphone technologies powered by machine learning include things like programs that determine users’ moods and emotions through pressure sensors, programs that make health and life predictions using health data and programs that detect surrounding objects,” Johnston says.

Outside of smartphones, we will also see machine learning emerge in drones, tablets, cars, virtual or augmented reality devices, medical tools, and a range of IoT devices, making it available to industries that use those products, Johnston says.

Source: InfoWorld Big Data