
Post-Hadoop Data and Analytics Head to the Cloud

Data and analytics in the cloud era may favor platforms that are more flexible than Hadoop. But that doesn’t mean there’s no future for the early big data technology.

Has Hadoop gone the way of 8-track tapes and Betamax? The technology that engendered so much excitement and optimism about the potential of big data has, at the very least, hit a speed bump as the two remaining independent providers — Cloudera and MapR — are each facing their own crisis.

MapR’s problems are existential. The company won a reprieve from imminent shutdown last week, saying that it has signed a letter of intent with a potential buyer. That buyer is now performing due diligence, and MapR faces a July 3 deadline for shutting down if the deal doesn’t go through.


Cloudera suffered a couple of disappointing quarters and announced its CEO is stepping down, news that was not well received.

What happened to Hadoop?

“Hadoop’s biggest problem is that it was built to be a giant single source of data,” Hyoun Park, founder and CEO of research firm Amalgam Insights, told InformationWeek in an interview. But it’s challenging to use Hadoop across multiple data centers or multiple clouds. “The assumption with Hadoop is that you have it, and it holds everything you own. That’s a problem in today’s world where you have hundreds of apps.”

Today’s more modern setup has data coming in from hundreds of sources, according to Park, who noted that Looker and Tableau are both adept at handling that kind of data. Both companies were acquired in the last few weeks.

Ali Ghodsi, co-founder and CEO of platform-as-a-service company Databricks, said Hadoop is not meant for the cloud, because it is not elastic in the same way the cloud is elastic. Databricks was founded as a PaaS distribution of the big data streaming technology Spark but has since evolved to also include many other big data technologies. Ghodsi said that going forward, Hadoop will be more of a niche solution.

“Hadoop is dead in the cloud for sure,” he told InformationWeek. Like mainframes, Hadoop will remain in place where it makes sense. “There are still IBM mainframes around 50 years later. But they are not something you buy and invest in these days.”

Ghodsi said that the cloud offers cheaper and more reliable storage options than are available in the Hadoop Distributed File System (HDFS). He also believes that the old Red Hat open source business model of offering on-premises software and selling support for it is headed towards extinction.

“The modern open source model is managed open source software in the cloud,” he said, describing the kind of service that is offered. The model has been a successful one for Databricks so far. While the company is still venture-backed and privately held, the CEO said that Databricks just saw its biggest quarter ever, beating its internal number.

Cloud-based Hadoop?

But what about Hadoop? Could it operate in the cloud like that? Gartner analyst Adam Ronthal said that while there are some native Hadoop options available in public clouds like AWS, they may not be the best solution for many applications.

“There’s a fair bit of complexity that goes into managing a Hadoop cluster,” he told InformationWeek. Non-Hadoop-based cloud solutions may look simpler and easier to organizations that are evaluating data and analytics solutions. But that doesn’t mean there’s not a place for Hadoop in the future.

Ronthal said that Hadoop is experiencing a “market correction” rather than an existential crisis. There are use cases that Hadoop is really good at, he said. But a few years back, Hadoop was the rock star technology that was the solution to every problem.

“The promises out there 3, 4, or 5 years ago were that Hadoop was going to change the world and redefine how we did data management,” he said. “That statement overpromised and underdelivered. What we are really seeing now is recognition of workloads that Hadoop is really good at, like the data science exploration workloads.”