AWS Takes Down Hundreds of Sites in Massive S3 Outage

Availability issues with the US-EAST-1 region of AWS’ S3 storage service caused downtime or slow performance for many websites on Tuesday.

Affected sites include Airbnb, Business Insider, Chef, Docker, Expedia, Heroku, Mailchimp, News Corp, Pantheon, Pinterest, Slack, and Trello, as well as parts of AWS’ own site and, ironically, Down Detector, VentureBeat reports.

AWS acknowledged the issues before 7:30 a.m. Pacific, saying it was investigating. Shortly after 10:30 a.m. Pacific, the company updated the statement on its status page.

“We’re continuing to work to remediate the availability issues for Amazon S3 in US-EAST-1. AWS services and customer applications depending on S3 will continue to experience high error rates as we are actively working to remediate the errors in Amazon S3,” the AWS service health dashboard said.

An hour later, AWS updated the message: “We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue.”

AWS suffered a service disruption lasting over five hours in 2015, Google App Engine was down for nearly two hours in August, and problems at Telia Carrier affected many popular sites and services in June of last year.

Source: TheWHIR

Report: Hackers Take Less than 6 Hours on Average to Compromise Targets

Most hackers can compromise a target in less than six hours, according to a survey of hackers and penetration testers released Tuesday by security awareness training firm KnowBe4.

The Black Report was compiled from 70 surveys taken at Black Hat USA and Defcon, and shows that phishing is the preferred method for 40 percent of hackers. A further 43 percent said they sometimes use social engineering, while only 16 percent do not use social engineering at all. Forty percent sometimes use vulnerability scanners, 60 percent use open-source tools, and just over 20 percent use custom tools for hacking.

A majority of those surveyed (53 percent) said they sometimes encounter systems they are unable to crack, while 9 percent say they never do, and 22 percent said they “rarely” encounter such targets. KnowBe4 chief hacking officer Kevin Mitnick performs penetration testing through a separate company (Mitnick Security), with a 100 percent success rate. Mitnick will present the keynote address at the upcoming HostingCon Global 2017 in Los Angeles.

Once they have gained access to a system, one in three penetration testers said their presence was never detected, and only 2 percent say they are detected more than half of the time. Exfiltrating data after a compromise takes less than 2 hours for 20 percent of respondents, and two to six hours for 29 percent, while 20 percent take longer than 12 hours.

See also: Pentagon Hires Hackers to Target Sensitive Internal Systems

When asked about effective protection against breaches, endpoint protection was named by 36 percent of those surveyed, while 29 percent identified intrusion detection and prevention systems. Only 2 percent consider anti-virus software an obstacle when attacking networks.

One-quarter of those surveyed said their advice to corporate boards would be to recognize that it is inevitable that they will be hacked, it is only a question of when it will happen. Roughly the same number urged boards to consider the return on investment in security, while 10 percent said boards should realize that detection capability is much more important than deflection capability.

KnowBe4 also commissioned a study from Forrester on the Total Economic Impact of breaches to put numbers to the potential return on investment (ROI) of security spending. The study is available from the KnowBe4 website.

See also: Data Breaches Hit Record in 2016 as DNC, Wendy’s Co. Hacked

Source: TheWHIR

Unlike big data, IoT may live up to the hype

Big data has long promised more than it delivers, at least for most enterprises. While a shift to cloud pledges to help, big data deployments are still more discussed than realized, with Gartner insisting that only 14 percent of enterprises have gotten Hadoop off the ground.

Will the other darling of the chattering class, IoT (internet of things), meet the same fate? In fact, IoT might deliver, according to new data from Talend compiled in conjunction with O’Reilly. Dubbing 2016 “the year IoT ‘grew up,’” the report declares 2017 the year that “IoT starts to become essential to modern business.”

How and where IoT gets real, however, may surprise you.

The new hyped kid on the block

IoT has been proclaimed the $11 trillion savior of the global economy, which has translated into IoT becoming even bigger than big data, at least in terms of general interest. This Google Trends chart shows IoT surpassing big data in search instances around the middle of last year:

If we get more specific on “big data” and instead use Apache Hadoop, Apache Spark, or MongoDB, all hugely popular big data technologies, the crossover is even more pronounced. IoT has arrived (without its security intact, but why quibble?). Indeed, as the Talend report avers, “[W]hile the buzz around big data is louder, the actual adoption of big data in industry isn’t much larger than the adoption of IoT.”

That’s right: IoT is newer, yet sees nearly as much adoption as big data. In fact, IoT, as the source for incredible amounts of data, could actually be what makes big data real. The question is where.

Betting on boring

The answer to that question, according to the Talend report, which trawled through more than 300TB of live data to glean its insights, is not where the analysts keep insisting:

We found that IoT spending today is for use cases that are much different than those predicted by McKinsey, Gartner, and others. For example, the greatest value/consumer surplus predicted by McKinsey was in factories around predictive maintenance and inventory management, followed by healthcare and smart city–related use cases like public safety and monitoring. While these use cases may be the top producers of surplus in 2025, we do not see much spend on those use cases today. In contrast, home energy and security is low on the McKinsey list, but that’s where the market is today, in addition to defense and retail.

It’s not that the analysts are wrong when they pick out details like industrial automation as incredibly ripe for IoT disruption, so long as we don’t assume “ripe” means “developed to the point of readiness for harvesting or eating.” Given the complexity of introducing significant changes into something like factory automation, such industries most definitely are not “ripe” for IoT. The potential is huge, but so are the pitfalls holding back change.

Home energy and security, by contrast, are relatively straightforward. Or, as the report continues, areas like health care are in desperate need of disruption, but the likes of online patient monitoring “seems 100 times more complex than simple home monitoring or personalized displays for in-store customers.”

Hence, home energy (9 percent) and security (25 percent) accounted for the biggest chunk of IoT deployments in 2016, with defense (14 percent) and retail (11 percent) also significant. Health care? A mere 4 percent.

Given that regulation and complexity are inimical to real-world IoT adoption, it’s perhaps not surprising that unlike big data, which is mostly a big company phenomenon, IoT shows “more continuous adoption … across large and small companies.” As such, IoT deployments are more evenly spread across geographies, rather than following big data’s concentration on the coasts.

In sum, IoT could well end up being a truly democratizing trend, a “bottom-up” approach to innovation.

Source: InfoWorld Big Data

Webair Partners With Data Storage Corporation

Webair has announced a partnership with Data Storage Corporation to enhance its high availability (HA) Disaster Recovery and overall support capabilities for IBM Power Systems (iSeries, AS/400, AIX) environments. The demand for Webair’s Disaster Recovery-as-a-Service (DRaaS) solution has grown exponentially over recent years, and the addition of IBM Power Systems support positions it for even further expansion.

Many companies require both x86 and IBM Power Systems platforms to run mission-critical applications, making disaster recovery critical to these environments. The partnership between Webair and Data Storage Corporation provides x86 and IBM Power Systems users with:

  • Recovery point objectives (RPO) and recovery time objectives (RTO) of one hour, including continuous replication, network automation and orchestration
  • Seventy-two hours of monthly recovery site usage before incurring additional fees
  • Fully managed quarterly recovery site testing with attestation report
  • Per-IP failover, public BGP failover, DNS failover, L2 stretch, and VPN(s)
  • One week of checkpoints
  • Recovery site network architecture customization to enable customer infrastructure integration
  • The ability to replicate data to any Webair DR location, including U.S. East and West Coasts, Canada, Europe, and Asia. 

Through this new partnership, Webair now also offers customers fully managed IBM Power Systems solutions and services backed by a premier IBM Managed Services Provider.

“Webair’s strategic partnership with Data Storage Corporation broadens our Disaster Recovery support capabilities,” explains Michael Christopher Orza, CEO of Webair. “While most providers only offer a limited range of services encompassing specific operating systems and workloads, this partnership delivers a true DRaaS solution that is customized to mirror customers’ specific production environments and supports both state-of-the-art and legacy platforms.”

Webair customers can also take advantage of ancillary services available at its data centers as part of their larger DRaaS solutions, including public and private cloud infrastructure, snapshot-based storage replication, colocation, authoritative DNS, third-party cloud connectivity, Backups-as-a-Service, network connectivity, and Seeding. These services are fully managed and can be tied directly into DRaaS infrastructure via private and secure cross-connects.

“I am excited about the Data Storage Corporation / Webair partnership and the opportunities it will provide to both companies as our services are in high demand across all markets and industries with tremendous growth forecasted,” says Hal Schwartz, president of DSC. “Because of this partnership, the combined teams can now provide first-class management and technical support, allowing them to deliver and fully manage cloud, hybrid cloud and cloud backup solutions with the highest confidence and service levels.”


Source: CloudStrategyMag

Dimension Data Launches Managed Cloud Service For Microsoft

Dimension Data has announced the availability of its Managed Cloud Services for Microsoft. The new offering provides organizations with a cloud-based managed service for Microsoft Exchange, SharePoint, Skype for Business, and Office 365, whether deployed in the public cloud, on premises, in a private cloud, or as a hybrid model. Combined with its planning and deployment services, Dimension Data is now able to provide its clients with a complete end-to-end managed service, with the added benefit of meeting client-specific security and compliance requirements.

Based on statistics presented during Microsoft’s 2016 Q3 earnings call, there were over 70 million commercial Microsoft Office 365 monthly active users. In addition, Microsoft Office 365 is one of the fastest-growing areas of technology in the enterprise sector today.

According to Tony Walt, group executive of Dimension Data’s end-user computing business, for these productivity solutions to be effective in an enterprise environment, they need to be managed and supported. However, managing the administrative complexity, while at the same time extracting the full value from each of the applications, is a challenge for CIOs and enterprises today.

“With the feature-rich cloud productivity suite of applications increasingly becoming the foundation for organizations transitioning to a digital business, Dimension Data’s Managed Cloud Services for Microsoft and the software-as-a-service Cloud Control management platform is a game changer. It is revolutionizing the automation and management of enterprise Microsoft messaging and collaboration.  Managed Cloud Services for Microsoft provides the services and technology you need to plan, deploy and manage your Microsoft messaging and collaboration suite, ensuring that the complexity inherent in an integrated environment is seamlessly addressed, while also guaranteeing you have the flexibility to grow and adapt to the migration to cloud,” says Walt.

Managed Cloud Services for Microsoft automates the management of enterprise Microsoft messaging and collaboration applications. Cloud Control™, Dimension Data’s administrative platform and portal, enables help desk representatives to resolve issues that would normally require escalation to second- or third-level administrators, reducing costs and ensuring customer satisfaction.

Managed Cloud Services for Microsoft builds on Dimension Data’s expertise with Microsoft’s productivity solutions, with the company having deployed more than one million seats of Office 365 globally, over 1.5 million seats of on-premises Exchange, and another two million cloud-based Exchange seats. Dimension Data has also completed more than 400 SharePoint projects and over 500 Skype for Business projects globally.

“We have combined more than 25 years of expertise in delivering Microsoft solutions and managed services with our own intellectual property and management tools,” said Phil Aldrich, director of end-user computing, Dimension Data. “Managed Cloud Services for Microsoft provides our clients with the best of both worlds, delivering enterprise performance and management for their Microsoft workloads with the flexibility and scalability of cloud.”

The service is being rolled out globally to address the needs of Dimension Data’s worldwide client base.

Source: CloudStrategyMag

Report: Nearly One-Third Of Consumers Aren’t Aware They Use The Cloud

There is confusion over what qualifies as ‘the cloud,’ according to a new consumer survey. Despite indicating that they use at least one of several popular cloud-based applications, such as Google Drive, Dropbox, or Microsoft OneDrive, over 30% of consumers subsequently responded that they do not use or access information in the cloud. Clutch designed the survey to gauge consumers’ knowledge and habits regarding cloud usage.

When it comes to understanding which applications are part of the cloud, experts say that the confusion is understandable.

“From private cloud, managed private cloud, to in-house and public cloud, there are many different technologies which can be referred to as cloud, but are very general,” said Alexander Martin-Bale, director of Cloud and Data Platforms at adaware, an anti-spyware and anti-virus software program. “The reality is that knowing exactly when you’re using it, even for a technical professional, is not always simple.”

Over half of the respondents (55%) say they are “very” or “somewhat” confident in their cloud knowledge. However, 22% of respondents who consider themselves very confident in their cloud knowledge did not know or were unsure whether they use the cloud.

Lucas Roh, CEO at Bigstep, says this is likely due to the overuse of the cloud as a buzzword. “It boils down to the fact that the word ‘cloud’ has been used everywhere in the press,” he said. “People have heard about it, and think that they conceptually know how it works, even if they don’t… They’re only thinking in terms of the application being used, not the actual technology behind those applications.”

When it comes to the security of the cloud, 42% of respondents believe the responsibility falls equally on the cloud provider and user.

Chris Steffen, technical director at Cryptzone, says there has been a shift in how people view the security of the cloud. “I think the dynamic is changing along with the information security paradigm,” said Steffen. “People are realizing ­— ‘Hey, maybe I do need to change my password every five years,’ or something similar. You can’t expect everything to be secure forever.”

Based on the findings, Clutch recommends that consumers seek more education on cloud computing, as well as implement simple additional security measures, such as two-factor authentication.
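
The two-factor recommendation is less exotic than it sounds. Most consumer authenticator apps implement TOTP (RFC 6238), which combines a shared secret with the current 30-second time window. A minimal sketch in Python, using only the standard library:

```python
# A minimal sketch of TOTP (RFC 6238), the scheme behind most consumer
# two-factor authenticator apps: a shared secret plus the current 30-second
# time window yields a short one-time code.
import hashlib
import hmac
import struct

def totp(secret: bytes, unix_time: int, digits: int = 6, step: int = 30) -> str:
    counter = unix_time // step                   # current 30-second window
    msg = struct.pack(">Q", counter)              # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                    # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

A server and a phone that share the secret compute the same code independently, which is why the scheme works offline; the code above reproduces the published RFC 6238 test vectors.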

These and other security measures are increasingly important as the cloud becomes more ubiquitous. “The cloud is not going anywhere. If anything, it’s going to become more and more an integral part of the stuff that we do every single day, whether we know that we’re using it or not,” said Steffen.

Clutch surveyed 1,001 respondents across the United States. All respondents indicated that they use at least one of the following applications: iCloud, Google Drive, Dropbox, Box, Microsoft OneDrive, iDrive, and Amazon Cloud Drive.


Source: CloudStrategyMag

5 Python libraries to lighten your machine learning load

Machine learning is exciting, but the work is complex and difficult. It typically involves a lot of manual lifting — assembling workflows and pipelines, setting up data sources, and shunting back and forth between on-prem and cloud-deployed resources.

The more tools you have in your belt to ease that job, the better. Thankfully, Python is a giant tool belt of a language that’s widely used in big data and machine learning. Here are five Python libraries that help relieve that heavy lifting.


PyWren

A simple package with a powerful premise, PyWren lets you run Python-based scientific computing workloads as multiple instances of AWS Lambda functions. A profile of the project at The New Stack describes PyWren using AWS Lambda as a giant parallel processing system, tackling projects that can be sliced and diced into little tasks that don’t need a lot of memory or storage to run.

One downside is that Lambda functions can’t run for more than 300 seconds. But if you have a job that takes only a few minutes to complete and needs to run thousands of times across a data set, PyWren may be a good option to parallelize that work in the cloud at a scale unavailable on user hardware.
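
The pattern is easy to sketch. The snippet below mirrors the map-style call PyWren exposes (`pywren.default_executor().map`), but uses `concurrent.futures` as a local stand-in so it runs without AWS credentials; `simulate` is a made-up placeholder for a real task:

```python
# Map one short, stateless task over many inputs in parallel -- the shape of
# a PyWren workload. ThreadPoolExecutor stands in for PyWren's Lambda-backed
# executor so the sketch runs locally.
from concurrent.futures import ThreadPoolExecutor

def simulate(seed):
    """Placeholder for a short, memory-light scientific task."""
    x = seed
    for _ in range(1000):
        x = (x * 1103515245 + 12345) % (2 ** 31)  # toy linear congruential step
    return x % 100

def run_all(seeds, workers=8):
    # With PyWren this would be roughly: futures = pwex.map(simulate, seeds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, seeds))

results = run_all(range(1000))
```

Because each task is independent and tiny, swapping the local pool for a Lambda-backed executor changes the scale of the fan-out without changing the code's structure.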


Tfdeploy

Google’s TensorFlow framework is taking off big-time now that it’s at a full 1.0 release. One common question about it: How can I make use of the models I train in TensorFlow without using TensorFlow itself?

Tfdeploy is a partial answer to that question. It exports a trained TensorFlow model to “a simple NumPy-based callable,” meaning the model can be used in Python with Tfdeploy and the NumPy math-and-stats library as the only dependencies. Most of the operations you can perform in TensorFlow can also be performed in Tfdeploy, and you can extend the behaviors of the library by way of standard Python metaphors (such as overloading a class).

Now the bad news: Tfdeploy doesn’t support GPU acceleration, if only because NumPy doesn’t do that. Tfdeploy’s creator suggests using the gNumPy project as a possible replacement.
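
To make the end state concrete, here is a sketch of what a model reduced to a NumPy-only callable amounts to. The single-layer weights below are invented for illustration; Tfdeploy derives the real equivalent automatically from a trained TensorFlow graph.

```python
# What an "exported to NumPy" model boils down to: stored weights plus a
# forward pass, with NumPy as the only dependency.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

class ExportedModel:
    """A NumPy-only stand-in for a trained single-layer classifier."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def __call__(self, x):
        # Forward pass: affine transform followed by softmax.
        return softmax(x @ self.weights + self.bias)

model = ExportedModel(weights=np.eye(3), bias=np.zeros(3))
probs = model(np.array([[2.0, 0.0, 0.0]]))
```

Serving predictions from something this small is the appeal: no TensorFlow runtime on the deployment box, just NumPy.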


Luigi

Writing batch jobs is generally only one part of processing heaps of data; you also have to string all the jobs together into something resembling a workflow or a pipeline. Luigi, created by Spotify and named for the other plucky plumber made famous by Nintendo, was built to “address all the plumbing typically associated with long-running batch processes.”

With Luigi, a developer can take several different unrelated data processing tasks — “a Hive query, a Hadoop job in Java, a Spark job in Scala, dumping a table from a database” — and create a workflow that runs them, end to end. The entire description of a job and its dependencies are created as Python modules, not as XML config files or another data format, so it can be integrated into other Python-centric projects.
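
The shape of such a pipeline can be sketched in plain Python. Real Luigi tasks subclass `luigi.Task` and define `requires()`, `run()`, and `output()`; this toy stand-in mirrors only the dependency-walking idea, with made-up task names:

```python
# Toy illustration of Luigi's core idea: each task declares what it requires,
# and a runner walks the dependency graph so the pipeline executes end to end,
# skipping steps that have already completed.
done = []  # records which steps have run, in order

class Task:
    def requires(self):
        return []  # no dependencies by default
    def run(self):
        raise NotImplementedError

class Extract(Task):
    def run(self):
        done.append("extract")

class Transform(Task):
    def requires(self):
        return [Extract()]
    def run(self):
        done.append("transform")

class Load(Task):
    def requires(self):
        return [Transform()]
    def run(self):
        done.append("load")

def build(task):
    # Run dependencies first, then the task itself (skip already-run steps).
    for dep in task.requires():
        build(dep)
    if type(task).__name__.lower() not in done:
        task.run()

build(Load())
```

Asking for the final task is enough to run the whole chain, which is exactly how you invoke a Luigi workflow: request the end product and let the scheduler work backward.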


Kubelib

If you’re adopting Kubernetes as an orchestration system for machine learning jobs, the last thing you want is for the mere act of using Kubernetes to create more problems than it solves. Kubelib provides a set of Pythonic interfaces to Kubernetes, originally to aid with Jenkins scripting. But it can be used without Jenkins as well, and it can do everything exposed through the kubectl CLI or the Kubernetes API.


PyTorch

Let’s not forget about this recent and high-profile addition to the Python world, an implementation of the Torch machine learning framework. PyTorch doesn’t just port Torch to Python; it adds many other conveniences, such as GPU acceleration and a library that allows multiprocessing to be done with shared memory (for partitioning jobs across multiple cores). Best of all, it can provide GPU-powered replacements for some of the unaccelerated functions in NumPy.
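
For example, a matrix multiply written against PyTorch’s NumPy-like tensor API needs only a `device` argument to run on a GPU when one is present; this sketch falls back to the CPU otherwise:

```python
# PyTorch's NumPy-like API with optional GPU placement: the tensor operations
# mirror NumPy, and .to(device)/device= is the only change needed to move the
# same computation onto a GPU when one is available.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.arange(6, dtype=torch.float32, device=device).reshape(2, 3)
b = torch.ones(3, 2, device=device)

product = (a @ b).cpu()        # matrix multiply, then back to host memory
row_sums = a.sum(dim=1).cpu()  # reduction along an axis, as in NumPy
```

The same script runs unmodified on a laptop and on a GPU box, which is what makes PyTorch attractive as a drop-in accelerator for NumPy-style code.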

Source: InfoWorld Big Data

IDG Contributor Network: Bringing embedded analytics into the 21st century

Software development has changed pretty radically over the last decade. Waterfall is out, Agile is in. Slow release cycles are out, continuous deployment is in. Developers avoid scaling up and scale out instead. Proprietary integration protocols have (mostly) given way to open standards.

At the same time, exposing analytics to customers in your application has gone from a rare, premium offering to a requirement. Static reports and SOAP APIs that deliver XML files just don’t cut it anymore.

And yet, the way most embedded analytics systems are designed is basically the same as it was 10 years ago: inflexible, hard to scale, lacking modern version control, and reliant on specialized, expensive hardware.

Build or Buy?

It’s no wonder that today’s developers often choose to build embedded analytics systems in-house. Developers love a good challenge, so when faced with the choice between an outdated, off-the-shelf solution and building for themselves, they’re going to get to work.

But expectations for analytics have increased, and so even building out the basic functionality that customers demand can sidetrack engineers (whose time isn’t cheap) for months. This is to say nothing of the engineer-hours required to maintain a homegrown system down the line. I simply don’t believe that building it yourself is the right solution unless analytics is your core product.

So what do you do?

Honestly, I’m not sure. Given the market opportunity, I think it’s inevitable that more and more vendors will move into the space and offer modern solutions. And so I thought I’d humbly lay out 10 questions embedded analytics buyers should ask about the solutions they’re evaluating.

  1. How does the solution scale as data volumes grow? Does it fall down or require summarization when dealing with big data?
  2. How does the tool scale to large customer bases? Is supporting 1,000 customers different than supporting 10?
  3. Do I need to maintain specialized ETLs and data ingestion flows for each customer? What if I want to change the ETL behavior? How hard is that?
  4. What’s the most granular level that customers can drill to?
  5. Do I have to pay to keep duplicated data in a proprietary analytics engine? If so, how much latency does that introduce? How do things stay in sync?
  6. Can I make changes to the content and data model myself or is the system a black box where every change requires support or paid professional services?
  7. Does it use modern, open frameworks and standards like HTML5, JavaScript, iframes, HTTPS, and RESTful APIs?
  8. Does the platform offer version control? If so, which parts of the platform (data, data model, content, etc.) are covered by version control?
  9. How customizable is the front-end? Can fonts, color palettes, language, timezones, logos, and caching behavior all be changed? Can customization be done on a customer-by-customer basis or is it one template for all customers?
  10. How much training is required for admins and developers? And how intuitive is the end-user interface?

No vendor that I know of has the “right” answer to all these questions (yet), but they should be taking these issues seriously and working toward these goals.

If they’re not, you can bet your engineers are going to start talking about how they could build something better in a week. HINT: They actually can’t, but good luck winning that fight 😉

This article is published as part of the IDG Contributor Network.

Source: InfoWorld Big Data

6 reasons stores can't give you real-time offers (yet)

Like most hardcore people, in the car I roll with my windows down and my radio cranked up to 11—tuned to 91.5, my local NPR station, where Terry Gross recently interviewed Joseph Turow, author of “The Aisles Have Eyes.” Turow reports that retailers are using data gathered from apps on your phone and other information to change prices on the fly.

Having worked in this field for a while, I can tell you that, yes, they’re gathering any data they can get. But the kind of direct manipulation Turow claims, where the price changes on the shelf before your eyes, isn’t yet happening on a wide scale. (Full disclosure: I’m employed by LucidWorks, which offers personalized/targeted search and machine-learning-assisted search as features in products we sell.)

Why not? I can think of a number of reasons.

1. Technology changes behavior slowly

Printers used to be a big deal. There were font and typesetting wars (TrueType, PostScript, and so on), and people printed out pages simply to read comfortably. After all, screen resolutions were low and interfaces were clunky; scanners were cumbersome and email was unreliable. Yet even after these obstacles were overcome, the old ways stuck around. There are still paper books (I mailed all of mine to people in prison), and the government still makes me print things and even get them notarized sometimes.

Obviously, change happens: I now tend to use Uber even if a cab is waiting, and I don’t bother to check the price difference, regardless of surge status. Also, today I buy all my jeans from Amazon—yet still use plastic cards for payment. The clickstream data collected on me is mainly used for email marketing and ad targeting, as opposed to real-time sales targeting.

2. Only some people can be influenced

For years I put zero thought into my hand soap purchase because my partner bought it. Then I split with my partner and became a soap buyer again. I did some research and found a soap that didn’t smell bad, didn’t have too many harsh chemicals, and paid lip service to the environment. Now, to get me to even try something else you’d probably have to give it to me for free. I’m probably not somebody a soap company wants to bother with. I’m not easily influenced.

I’m more easily influenced in other areas—such as cycling and fitness stuff—but those tend to be more expensive, occasional purchases. To reach me the technique needs to be different than pure retailing.

3. High cost for marginal benefit

Much personalization technology, such as the analytics behind real-time discounts, is still expensive to deploy. Basic techniques such as using my interests or previously clicked links to improve the likelihood of my making a purchase are probably “effective enough” for most online retailers.

As for brick and mortar, I have too many apps on my phone already, so getting me to download yours will require a heavy incentive. I also tend to buy only one item because I forgot to buy it online—then I leave—so the cost to overcome my behavioral inertia and influence me will be high.

4. Pay to play

Business interests limit the effectiveness of analytics in influencing consumers, mainly in the form of slotting fees charged to suppliers who want preferential product placement in the aisles.

Meanwhile, Target makes money no matter what soap I buy there. Unless incentivized, it’s not going to care which brand I choose. Effective targeting may require external data (like my past credit card purchases at other retailers) and getting that data may be expensive. The marketplace for data beyond credit card purchases is still relatively immature and fragmented.

5. Personalization is difficult at scale

For effective personalization, you must collect or buy data on everything I do everywhere and store it. You need to run algorithms against that data to model my behavior. You need to identify different means of influencing me. Some of this is best done for a large group (as in the case of product placement), but doing it for individuals requires lots of experimentation and tuning—and it needs to be done fast.

Plus, it needs to be done right. If you bug me too much, I’m totally disabling or uninstalling your app (or other means of contacting me). You need to make our relationship bidirectional. See yourself as my concierge, someone who finds me what I need and anticipates those needs rather than someone trying to sell me something. That gets you better data and stops you from getting on my nerves. (For the last time, Amazon, I’ve already purchased an Instant Pot, and it will be years before I buy another pressure cooker. Stop following me around the internet with that trash!)

6. Machine learning needs to mature

Machine learning is merely math; much of it isn’t even new. But applying it to large amounts of behavioral data—where you have to decide which algorithm to use, which optimizations to apply to that algorithm, and which behavioral data you need in order to apply it—is pretty new. Most retailers are used to buying out-of-the-box solutions. Beyond (ahem) search, some of these barely exist yet, so you’re stuck rolling your own. Hiring the right expertise is expensive and fraught with error.

Retail reality

To influence a specific, individual consumer who walks into a physical store, the cost is high and the effectiveness is low. That’s why most brick-and-mortar businesses tend to use advanced data—such as how much time people spend in which part of the store and what products influenced that decision—at a more statistical level to make systemic changes and affect ad and product placement.

Online retailers have a greater opportunity to influence people at a personal level, but most of that opportunity is in ad placement, feature improvements, and (ahem) search optimization. As for physical stores, eventually, you may well see a price drop before your eyes as some massive cloud determines the tipping point for you to buy on impulse. But don’t expect it to happen anytime soon.

Source: InfoWorld Big Data