7 big data tools to ditch in 2017

We’ve been on this big data adventure for a while now. Not everything is shiny and new anymore; in fact, some technologies may be holding you back. Remember, this is the fastest-moving area of enterprise tech, so much so that some software acts as little more than a placeholder until better bits arrive.

Those upgrades, or outright replacements, can make the difference between a successful big data initiative and one you’ll be living down for the next few years. Here are some elements of the stack you should start thinking about replacing:

1. MapReduce. MapReduce is slow. It’s rarely the best way to attack a problem. There are better execution models to choose from; the most common is the DAG (directed acyclic graph) model, of which MapReduce can be considered a special case. If you’ve written a bunch of custom MapReduce jobs, the performance difference compared to Spark is worth the cost and trouble of switching.
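
The performance gap is easy to see in miniature: classic MapReduce materializes and shuffles intermediate results between every stage, while a DAG engine like Spark can fuse stages in memory. Here is a toy, pure-Python sketch of the MapReduce word-count pattern — not Hadoop code, just the shape of the three phases:

```python
from collections import defaultdict

def map_phase(lines):
    # Emit (word, 1) pairs -- one record per word occurrence.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Group values by key. On a real cluster this is the disk- and
    # network-heavy step that makes chained MapReduce jobs slow.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big tools", "big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 3, 'data': 2, 'tools': 1}
```

Every multi-step job pays that shuffle/materialize tax between stages; a DAG engine plans the whole pipeline at once and skips the intermediate round trips where it can.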

2. Storm. I’m not saying Spark will eat the streaming world (although it might), but with technologies like Apex and Flink available, there are better, lower-latency alternatives to Storm than Spark alone. Besides, you should evaluate your actual latency tolerance and ask whether the bugs lurking in your lower-level, more complicated code are worth a few saved milliseconds. Storm also doesn’t have the backing it could, with Hortonworks as its only real sponsor, and with Hortonworks facing increasing market pressure, Storm is unlikely to get more attention.

3. Pig. Pig kind of blows. You can do anything it does with Spark or other technologies. At first Pig seems like a nice “PL/SQL for big data,” but you quickly find out it’s a little bizarre.

4. Java. No, not the JVM, but the language. The syntax is clunky for big data jobs, and newer constructs like lambda expressions have been bolted onto the side in a somewhat awkward manner. The big data world has largely moved to Scala and Python (the latter when you can afford the performance hit and need Python libraries, or are simply overrun with Python developers). Of course, you can use R for stats, at least until you rewrite it in Python because R doesn’t have all the fun scale features.

5. Tez. This is another Hortonworks pet project. It’s a DAG implementation, but unlike Spark, Tez is described by one of its developers as like writing in “assembly language.” At the moment, with a Hortonworks distribution, you’ll end up using Tez behind Hive and other tools — but you can already use Spark as the engine in other distributions. Tez has always been kind of buggy anyhow. Again, this is one vendor’s project and doesn’t have the industry or community support of other technologies. It doesn’t have any runaway advantages over other solutions. This is an engine I’d look to consolidate out.

6. Oozie. I’ve long hated on Oozie. It isn’t much of a workflow engine or much of a scheduler — yet it’s both and neither at the same time! It is, however, a collection of bugs for a piece of software that shouldn’t be that hard to write. Between StreamSets, the various DAG implementations, and other tools, you should have better ways to do most of what Oozie does.

7. Flume. Between StreamSets, Kafka, and other solutions, you probably already have an alternative to Flume. Its last release, dated May 20, 2015, is looking rusty, and the project’s year-on-year activity tells the same story. Hearts and minds have left; it’s probably time to move on.

Maybe by 2018 …

What’s left? Some technology is showing its age, but complete viable alternatives have not arrived yet. Think ahead about replacing these:

1. Hive. This is overly snarky, but Hive may be the least performant distributed database on the planet. If we as an industry hadn’t spent 40 years deciding RDBMSes were the greatest thing since sliced bread, would we really have created this monster?

2. HDFS. Writing a system-level service in Java is not the greatest of ideas, and Java’s memory management makes pushing massive amounts of bytes around a bit slow. The HDFS NameNode design, which funnels all filesystem metadata through a single service, constitutes a bottleneck. Various vendors have workarounds to make this better, but honestly, nicer things are available. There are other distributed filesystems: MapR-FS is a pretty well-designed one, and there’s also Gluster and a slew of others.

Your gripes here

With an eye to the future, it’s time to cull the herd of technologies that looked promising but have grown either obsolete or rusty. This is my list. What else should I add?

Source: InfoWorld Big Data

Review: TensorFlow shines a light on deep learning

What makes Google Google? Arguably it is machine intelligence, along with a vast sea of data to apply it to. While you may never have as much data to process as Google does, you can use the very same machine learning and neural network library as Google. That library, TensorFlow, was developed by the Google Brain team over the past several years and released to open source in November 2015.

TensorFlow does computation using data flow graphs. Google uses TensorFlow internally for many of its products, both in its datacenters and on mobile devices. For example, the Translate, Maps, and Google apps all use TensorFlow-based neural networks running on your smartphone. And TensorFlow underpins the applied machine learning APIs for Google Cloud Natural Language, Speech, Translate, and Vision.
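The data flow graph idea itself is simple enough to sketch without TensorFlow installed: operations are nodes, values flow along edges, and evaluation walks the graph from the leaves up. The following is a minimal, hypothetical pure-Python illustration of the concept — the `Node` and `constant` names are invented here and are not TensorFlow’s actual API:

```python
class Node:
    """A node in a tiny data flow graph: an operation plus its inputs."""
    def __init__(self, op, *inputs):
        self.op = op          # callable computing this node's value
        self.inputs = inputs  # upstream Node objects (the graph's edges)

    def evaluate(self):
        # Recursively evaluate upstream nodes, then apply this node's op.
        return self.op(*(n.evaluate() for n in self.inputs))

def constant(value):
    # A leaf node that simply yields a fixed value.
    return Node(lambda: value)

# Build the graph for (2 + 3) * 4, then run it.
a, b, c = constant(2.0), constant(3.0), constant(4.0)
graph = Node(lambda x, y: x * y, Node(lambda x, y: x + y, a, b), c)
print(graph.evaluate())  # 20.0
```

Representing the computation as a graph rather than as imperative code is what lets an engine like TensorFlow optimize, parallelize, and distribute the work — and differentiate through it for training.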

Source: InfoWorld Big Data

OneNeck® IT Solutions Moves Luck Stone Corporation From Public To Private Cloud

OneNeck® IT Solutions has announced that Luck Stone Corporation contracted for conversion of their IT infrastructure. OneNeck is moving Luck Stone’s Microsoft Dynamics AX environment from public cloud onto OneNeck’s private hosted cloud known as ReliaCloud®. OneNeck will continue providing Luck Stone with enterprise application, network, security and infrastructure management, along with database and operating system administration.

“During the last five years, OneNeck has proven they have the ability, expertise and experience to manage our Microsoft Dynamics AX environment,” said Donald Jones, VP of IT at Luck Stone. “We began noticing performance issues within our public cloud environment and realized we needed a more scalable, agile and secure solution. In talking with OneNeck about their Infrastructure as a Service solution, AX on ReliaCloud, we realized it could enhance our performance and security. At the same time, moving to ReliaCloud would allow us to take advantage of a more cost-effective solution. Adding ReliaCloud to the portfolio and moving to a private cloud environment just made sense.”

Luck Stone is one of the nation’s largest family-owned and operated producers of crushed stone, sand and gravel. Based in Richmond, Va., Luck Stone prides itself on being a dependable, responsive partner who continually strives to innovate and deliver consistent, quality material customers can count on for project and business success.

By moving their Microsoft Dynamics AX environment onto ReliaCloud (an enterprise-level hosted, private cloud that delivers the power and flexibility of a public cloud solution), Luck Stone will:

  • Avoid unscheduled downtime and unplanned maintenance.
  • Optimize their IT cost structure between capital and operating costs.
  • Control their Microsoft Dynamics AX license base, while still maintaining full-platform certification and supportability.
  • Leverage OneNeck’s depth of Microsoft application expertise.
  • Have access to flexible resource pools to deploy (and re-deploy) as their IT environment changes.
  • Continually meet security and compliance requirements.
  • Securely connect with a variety of access points.
  • Achieve advanced disaster recovery capabilities using resources in multiple data centers owned and operated by OneNeck.

“We’re proud to continue our partnership with Luck Stone and to help them move to an environment that enhances the performance of their IT and delivers greater security,” says Terry Swanson, Senior VP of Sales and Marketing at OneNeck. “We appreciate their business and attribute it to the dedication and experience of our employees. Having Luck Stone expand their contract to include ReliaCloud is a huge testament to the commitment of our entire team.”

Source: CloudStrategyMag

OffsiteDataSync Ranked Among Top 100 Cloud Services Providers

OffsiteDataSync ranks among the world’s Top 100 cloud services providers (CSPs), according to Penton’s sixth-annual Talkin’ Cloud 100 report.

Based on data from Talkin’ Cloud’s online survey, conducted from June through August 2016, the Talkin’ Cloud 100 list recognizes top cloud services providers (CSPs), weighing annual cloud services revenue growth along with input from Penton Technology’s channel editors.

“OffsiteDataSync is honored to be included among the 2016 TC100,” said Matthew Chesterton, CEO, OffsiteDataSync. “Our steady rise in the rankings to 22nd is a testament to our depth of experience, commitment to continuous improvement, and strong partner relationships.”

“On behalf of Penton and Talkin’ Cloud, I would like to congratulate OffsiteDataSync for its recognition as a Talkin’ Cloud 100 honoree,” said Nicole Henderson, editor in chief, Talkin’ Cloud. “Cloud services providers on the Talkin’ Cloud 100 set themselves apart through innovative cloud offerings and new support models, demonstrating a deep understanding of their customers’ needs and future cloud opportunities.”

Source: CloudStrategyMag

Salesforce will buy Krux to expand behavioral tracking capabilities

Salesforce.com has agreed to buy user data management platform Krux Digital, potentially allowing businesses to process even more data in their CRM systems.

Krux describes its business as “capturing, unifying, and activating data signatures across every device and every channel, in real time.”

Essentially, it performs the tracking underlying behavioral advertising, handling 200 billion “data collection events” on three billion browsers and devices (desktop, mobile, tablet and set-top) each month.

With that staggering volume of data, “Krux will extend the Salesforce Marketing Cloud’s audience segmentation and targeting capabilities to power consumer marketing with even more precision, at scale,” Krux CEO and co-founder Tom Chavez wrote on the company blog.

The acquisition will also allow joint customers of Salesforce and Krux to feed “billions of new signals” to Salesforce Einstein, a suite of AI-based tools for building predictive models, Chavez said.

Unveiled two weeks ago, Salesforce Einstein will include functions such as predictive lead scoring and recommended case classification. Some functions will be available for free, while others will be charged for based on data volume and user numbers.

Krux is part of the Salesforce ecosystem, but also works with other vendors including Oracle, Google’s DoubleClick, Criteo and a host of other advertising networks. According to Chavez, it won’t be cutting those ties following the acquisition. “Openness remains a guiding principle,” he said. “We expect to continue supporting our thriving partner ecosystem and integrating with a wide variety of platforms.”

Businesses already using Krux to track their customers include media companies BBC, HBO, NBCUniversal, and DailyMotion; publishers The Guardian and Financial Times; and food and drink companies ABInBev, Mondelez International, Kelloggs, and Keurig.

Salesforce will pay around $340 million in cash and a similar amount in shares for Krux, according to a filing it made with the SEC Tuesday. It expects to close the deal by the end of January.

Later Tuesday, Salesforce will open its Dreamforce customer and partner conference in San Francisco. Krux is one of the exhibitors.

Source: InfoWorld Big Data