Today, at the Strata+Hadoop World conference, we are pleased to announce the first Microsoft Azure managed service to be available on Linux, HDInsight. In addition to this, we are announcing updates to HDInsight, like the general availability of Apache Storm, that continue to make Hadoop simpler and easier to use.
Microsoft is committed to openness, underscored by recent announcements: 20% of Azure Virtual Machines run on Linux, “Microsoft loves Linux”, the open sourcing of .NET Core, our contributions to Apache Hadoop, and support for Docker containers. This approach to openness made it a natural decision to give customers the choice of running their big data Hadoop workloads on either Windows or Linux. With HDInsight growing 20 percent every month and being one of the fastest growing services in Azure, we had a steady stream of requests to support Linux so our customers can take advantage of the ease-of-use and productivity benefits of Hadoop-as-a-service. Responding to this demand, we now let customers choose Windows or Linux operating systems when they deploy Hadoop in Azure. Both options are first-class citizens, offering simple deployment, an SLA, and technical support for the entire stack, from Hadoop to the operating system.
We are excited about what this means for customers who might already be familiar with Hadoop on Linux on-premises. Customers can now augment their current deployments with the cloud and leverage all of their existing skillsets and tooling (documentation, samples, and templates) to do so. Within a hybrid environment, customers can mix the control and flexibility of on-premises deployments with the elasticity and redundancy of the cloud. In conjunction with Hortonworks HDP 2.2’s out-of-the-box Falcon connectors to Azure, HDInsight on Linux can make hybrid scenarios an easy transition.
Being committed to openness at Microsoft means we will continue to collaborate with others in the industry and to be open in how we listen to our customers. That is what makes us able to say “Microsoft ♥ Linux” and “Microsoft ♥ Hadoop on Linux”!
Additionally, we are announcing a series of updates for the Azure HDInsight service that continue to make Hadoop simpler and easier to use. These include making Storm generally available, which gives customers a simple way to deploy real-time capabilities on Hadoop in a few clicks; Visual Studio IDE integration that empowers developers to write Storm topologies in .NET or Java; support for Hadoop 2.6; new Virtual Machine sizes; cluster scaling; and built-in integration with Azure DocumentDB.
Last October we announced the public preview of Apache Storm clusters on HDInsight. Storm is an open source project in the Hadoop ecosystem that gives users access to a stream analytics platform that can reliably process millions of events in real time. It is ideal for solving real-time challenges like fraud detection, click stream analysis, financial alerts, telemetry from connected sensors & devices (IoT), and social analytics.
Today, we are announcing the general availability of Storm as a deployment option. Customers can now benefit from a fast and easy way to deploy real-time analytics in just a few clicks and within minutes. You will also have Microsoft’s 99.9% service level agreement for uptime and elastic scale powered by the Azure cloud. Any amount of data can be ingested through Event Hubs or Apache Kafka and processed through Storm.
Part of making big data more accessible includes enabling developers to be productive in the environments they know best. This is why we are also making Storm available for both .NET and Java, along with the ability to develop, deploy, and debug real-time applications for Storm directly in Visual Studio. Developers can even mix in spouts written in other languages such as Java, meaning you can leverage the vast universe of existing spouts and bolts as part of your topology.
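For readers new to the spout and bolt vocabulary, here is a minimal conceptual sketch in plain Python. It does not use the Storm API or SCP.NET, and it runs in a single process rather than on a cluster; the function names (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) are hypothetical and exist only for illustration. The idea it shows is that a spout is the source of a stream of tuples, and each bolt consumes a stream and emits a new one.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    """Spout (illustrative): the source of the stream, emitting one sentence per tuple."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Bolt (illustrative): consumes sentences and emits one lowercase word per tuple."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Bolt (illustrative): keeps a running count per word and emits each updated count."""
    counts: Counter = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology(sentences: Iterable[str]) -> Dict[str, int]:
    """Wire spout -> split bolt -> count bolt and return the latest count per word."""
    latest: Dict[str, int] = {}
    for word, count in count_bolt(split_bolt(sentence_spout(sentences))):
        latest[word] = count
    return latest


if __name__ == "__main__":
    print(run_topology(["storm on hdinsight", "Storm on Azure"]))
```

In a real Storm topology the spout would typically read from a source such as Event Hubs or Kafka, and the bolts would run in parallel across the cluster; this sketch only mirrors the shape of the dataflow.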
In conjunction with Hortonworks’ availability of HDP 2.2, Microsoft is also releasing the next version of HDInsight built on Hadoop 2.6, Hive and Pig 0.14, HBase 0.98.4, and more. Built together with Hortonworks and the open source community, this version of HDInsight includes work done on Stinger.next to speed up Hadoop queries with the goal of achieving sub-second response times. The first phase of Stinger.next is now in HDInsight running Hive 0.14. Pig can now process data in ORC files and can leverage Tez as an execution engine.
To better support customers who are running increasingly large big data workloads in Azure, we’re making HDInsight available on a greater number of Virtual Machine types and sizes. HDInsight can now utilize A2 to A7 sizes built for general purposes, D-Series nodes that feature solid-state drives (SSDs) and 60-percent faster processors, and A8 and A9 sizes that have InfiniBand support for fast networking. HBase for HDInsight customers can benefit from the higher memory of the D-Series to increase performance. Storm for HDInsight customers can also benefit from higher memory for loading larger reference data and faster CPUs for higher throughput. Pricing details can be found here.
HDInsight is also delivering the general availability of a highly requested feature, “Cluster Scaling” in Azure HDInsight. With this feature, you will be able to easily change the number of nodes in a running HDInsight cluster without having to delete and recreate the cluster. Initially, only Hadoop query and Storm clusters will have this ability, with HBase to follow shortly after. This feature is available in the Azure Management Portal today.
Finally, we are also integrating Microsoft’s first-party NoSQL Azure service, DocumentDB, with HDInsight through a Hadoop connector. This will allow DocumentDB to be either an input source for a Hadoop query or a destination for the output of Hive, Pig, and MapReduce jobs. DocumentDB is Microsoft’s fully managed, scalable NoSQL document database service designed from the ground up to support JSON and JavaScript directly inside the database engine. With HBase and DocumentDB, HDInsight will have various NoSQL technologies it can integrate with as part of a broader solution.
Exciting times are ahead for Azure HDInsight! Stay tuned on this blog for more details in the days to come. Also, read the Official Microsoft Blog and the resources below:
HDInsight on Linux
- Channel 9 Video: HDInsight on Linux
- Introduction to HDInsight, which gives recommendations on when to use Windows or Linux
- Getting started using HDInsight on Linux
- Tips for HDInsight on Linux
- Develop Python MapReduce jobs for Azure HDInsight on Linux
- Use SSH keys with Linux-based Hadoop on HDInsight
Storm for HDInsight
- Channel 9 Video: General Availability of Storm
- Channel 9 Video: Introduction to Storm
- Overview of Storm in Azure HDInsight
- Getting started with Storm for Azure HDInsight
- Deploy and monitor Storm topologies
- Develop C# topologies for Storm using Visual Studio
- Develop Java-based topologies for Storm
- Determine Twitter trending topics with Storm
- Analyze real-time sensor data using Storm and HBase
- Develop streaming data processing applications with SCP.NET and C# on Storm
HDInsight on Hadoop 2.6
Cluster Scaling
DocumentDB
General information on HDInsight