Introduction
Running R on Spark brings the advantages of distributed in-memory computing and access to data stored in HDFS. In this context, it is convenient to have RStudio served directly from the R-server. In this post I will show a few ways to install RStudio on the edge node of an HDInsight Spark cluster.
Prerequisites:
- Azure HDInsight Premium cluster.
- The HDInsight cluster must be configured to run Spark when it is created (the only alternative is Hadoop).
- An R-server running on the cluster's edge node.
Method 1: Manual installation
This method is described by Microsoft.
I will not repeat what they have already written on that page, but in short the steps consist of using ssh to log in to the edge node and running an installation script there.
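As a rough sketch of what this looks like once you are logged in to the edge node, the snippet below downloads the installer and runs it. I am assuming that the installer is the same InstallRStudio.sh script used in the next method and that it needs root privileges. The function names and the DRY_RUN switch are my own additions for illustration, so check Microsoft's instructions for the authoritative steps.

```shell
# Installer location (the same URI that is used for the script action in Method 2).
INSTALL_SCRIPT_URI="http://mrsactionscripts.blob.core.windows.net/rstudio-server-community-v01/InstallRStudio.sh"

# Helper: print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run this on the edge node after logging in with ssh.
# Assumption: the script has to be executed with root privileges.
install_rstudio_on_edge_node() {
  run wget -O InstallRStudio.sh "$INSTALL_SCRIPT_URI" &&
  run chmod +x InstallRStudio.sh &&
  run sudo ./InstallRStudio.sh
}
```

Setting DRY_RUN=1 before calling install_rstudio_on_edge_node only prints the three commands, which is a cheap way to see what would happen before touching the node.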
Method 2: Use Script Actions from the portal
Another way, which does not require connecting by ssh, is to use Script Actions from the Azure portal.
From the Azure portal, navigate to the HDInsight cluster, and click “Script Actions”. You should already see the R-server in the script action history.
- Click on “Submit New”.
- Give the action a name, e.g., RStudio.
- Fill in the “Bash script URI” field with http://mrsactionscripts.blob.core.windows.net/rstudio-server-community-v01/InstallRStudio.sh
- Make sure that only “Edge nodes” is ticked.
- Click “Create”.
Give it a short while, then you are done!
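If you would rather script this than click through the portal, the same three inputs (a name, the script URI and the node type) can in principle be handed to the Azure CLI. The sketch below assumes that the az hdinsight script-action execute command is available in your CLI version. The function name, the resource group argument and the AZ override variable are placeholders that I introduced, not something taken from the portal steps above.

```shell
INSTALL_SCRIPT_URI="http://mrsactionscripts.blob.core.windows.net/rstudio-server-community-v01/InstallRStudio.sh"

# Submit the RStudio installer as a script action on the edge node only.
#   $1 = cluster name, $2 = resource group (both placeholders)
# AZ can be pointed at another command for testing; it defaults to the Azure CLI.
submit_rstudio_action() {
  ${AZ:-az} hdinsight script-action execute \
    --cluster-name "$1" \
    --resource-group "$2" \
    --name RStudio \
    --script-uri "$INSTALL_SCRIPT_URI" \
    --roles edgenode
}

# Usage (assumes you have already logged in to Azure from the CLI):
#   submit_rstudio_action <CLUSTER NAME> <RESOURCE GROUP>
```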
Method 3: Use Script Actions from a template
A fully automatic way to install RStudio is to include the installation as a Script Action in the template used to deploy the cluster.
Look for the entry that configures the R-server in the array of resources. What you want to edit is the installScriptActions section. Add the same link to the RStudio installation script that we used in the previous method as the uri. Provide the other parameters as in the code extract below, or modify them to fit your purpose.
"resources": [{
    "name": "[concat(parameters('clusterName'),'/R-Server')]",
    "type": "Microsoft.HDInsight/clusters/applications",
    "apiVersion": "[variables('clusterApiVersion')]",
    "dependsOn": ["[concat('Microsoft.HDInsight/clusters/',parameters('clusterName'))]"],
    "properties": {
        "marketPlaceIdentifier": "Microsoft.RServerForHDInsight.8.0.3",
        "computeProfile": {
            "roles": [{
                "name": "edgenode",
                "targetInstanceCount": 1,
                "hardwareProfile": {
                    "vmSize": "Standard_D4_v2"
                }
            }]
        },
        "installScriptActions": [
            {
                "name": "[concat('rstudio-install-v0','-', uniquestring(variables('applicationName')))]",
                "uri": "http://mrsactionscripts.blob.core.windows.net/rstudio-server-community-v01/InstallRStudio.sh",
                "roles": ["edgenode"]
            }
        ],
        "uninstallScriptActions": [],
        "httpsEndpoints": [],
        "applicationType": "RServer"
    }
},
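Before deploying, it can be worth checking that the template file really references the installer, since a missing script action only shows up after the cluster has been created. Here is a minimal sketch using grep. The function name and the file name azuredeploy.json are hypothetical, and the check only looks for the URI, not for the roles setting.

```shell
INSTALL_SCRIPT_URI="http://mrsactionscripts.blob.core.windows.net/rstudio-server-community-v01/InstallRStudio.sh"

# Succeed only if the template file given as $1 mentions the installer URI.
template_references_installer() {
  grep -qF "$INSTALL_SCRIPT_URI" "$1"
}

# Usage, with a hypothetical file name:
#   template_references_installer azuredeploy.json && echo "RStudio script action found"
```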
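To reach RStudio once it is installed, open an SSH tunnel that forwards port 8787 of the edge node to your local machine:
ssh -L localhost:8787:localhost:8787 <USER NAME>@r-server.<CLUSTER NAME>-ssh.azurehdinsight.net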
Open a web browser and enter http://localhost:8787/. You will be prompted to enter the SSH username and password to connect to the cluster.
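If you open the tunnel often, it can be wrapped in a small shell function. This is only a convenience sketch: the function names and the SSH override variable are my own, and the host name pattern is simply copied from the ssh command above.

```shell
# Print the SSH host name of the edge node for the cluster named in $1.
edge_node_host() {
  printf 'r-server.%s-ssh.azurehdinsight.net\n' "$1"
}

# Forward local port 8787 to port 8787 on the edge node.
#   $1 = SSH user name, $2 = cluster name
# SSH can be pointed at another command for testing; it defaults to ssh.
open_rstudio_tunnel() {
  ${SSH:-ssh} -L localhost:8787:localhost:8787 "$1@$(edge_node_host "$2")"
}

# Usage:
#   open_rstudio_tunnel <USER NAME> <CLUSTER NAME>
```

Keep the session open while you work; the page at http://localhost:8787/ is only reachable for as long as the tunnel is up.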
You are now ready to run some analysis!