Today we are pleased to announce a refresh of Apache Spark support on Azure HDInsight clusters. Spark is available on HDInsight through a custom script action, and today we are updating that script action to install Spark 1.2, the latest version. The previous script action installed Spark 1.0. This update also adds Spark SQL support to the package.

The Spark 1.2 script action requires HDInsight cluster version 3.2, the latest version. Older HDInsight clusters get Spark 1.0 when customized with the Spark script action.

Follow these steps to create a Spark cluster using the Azure Portal:

Choose New HDInsight Hadoop cluster (other cluster types are also supported) using the custom create option.
Select version 3.2 of the cluster.
Complete the remaining wizard steps to specify the cluster name, its storage account, and other configuration.
In the last step of the configuration wizard, add the Spark script action: https://hdiconfigactions.blob.core.windows.net/sparkconfigactionv03/spark-installer-v03.ps1.

Click the check mark to create the cluster. When the operation completes, your HDInsight cluster will have Spark 1.2 installed on it.

Running Spark SQL queries in Spark Shell

The new version of the Spark package includes Spark SQL. Spark SQL lets you use Spark to run relational queries expressed in SQL, HiveQL, or Scala. With this functionality, you can run Hive queries in the Spark shell.

Open a Remote Desktop connection to the cluster. For instructions, see Connect to HDInsight clusters using RDP.
Open the Hadoop Command Line using the desktop shortcut, and navigate to the location where Spark is installed, C:\apps\dist\spark-1.2.0.
Run the following command to start the Spark shell:

.\bin\spark-shell --master yarn

At the Scala prompt, create the Hive context. This is required to run Hive queries through Spark.

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
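
Before running a query, it can help to confirm that the new context can reach the Hive metastore. The following is a minimal sketch, not part of the original walkthrough. It assumes it is typed into the same Spark shell session, so that hiveContext is the value defined in the previous step, and that the cluster still has its default hivesampletable.

```scala
// Assumes the Spark shell session from the steps above: `hiveContext`
// is the HiveContext created in the previous step.

// List the tables Hive knows about; hivesampletable should be among them.
hiveContext.sql("SHOW TABLES").collect().foreach(println)

// Print the column names and types of the sample table.
hiveContext.sql("DESCRIBE hivesampletable").collect().foreach(println)
```

If the sample table does not appear in the listing, queries against it will fail, so this is worth checking first.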

Run a Hive query and print the output to the console. The query retrieves data from a sample Hive table that exists on every HDInsight cluster. It selects devices of a specific make and limits the number of records retrieved to 20. The triple quotes are Scala syntax that allows embedded double quotes in the string without escaping.
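
As a brief aside on the triple-quote syntax, here is a small sketch in plain Scala that needs no Spark and can be pasted into any Scala REPL, including the Spark shell. The value names escaped and tripleQuoted are illustrative only.

```scala
// Ordinary string literal: every embedded double quote must be escaped.
val escaped = "SELECT * FROM hivesampletable WHERE devicemake LIKE \"HTC%\" LIMIT 20"

// Triple-quoted literal: embedded double quotes are written as-is.
val tripleQuoted = """SELECT * FROM hivesampletable WHERE devicemake LIKE "HTC%" LIMIT 20"""

// Both literals produce the same string, so this prints true.
println(escaped == tripleQuoted)
```

The triple-quoted form is easier to read when the query text itself contains quotes. With that in mind, the full command to run in the Spark shell is: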

hiveContext.sql("""SELECT * FROM hivesampletable WHERE devicemake LIKE "HTC%" LIMIT 20""").collect().foreach(println)

The matching rows, at most 20, are printed to the console.

You can also install R, Solr, and Giraph using script actions, or create your own script actions. For more information on installing and using Spark, see the documentation below:

How to install Spark 1.2 on Azure HDInsight clusters

Running Spark SQL queries in Spark Shell
