Installing R-studio on Azure HDInsight

Introduction

Running R on Spark brings the advantages of distributed in-memory computing and access to data stored in HDFS. In this context, it is convenient to have R-studio hosted on the R-server. In this post I will show three ways to install R-studio on the edge node of an HDInsight Spark cluster.

Prerequisites:

  • An Azure HDInsight Premium cluster.
  • The cluster should be configured to run Spark when it is created (the only alternative is Hadoop).
  • An R-server running on the cluster's edge node.

Method 1: Manual installation

This method is described by Microsoft.

I will not repeat what they have already written on that page, but in short the steps consist of using ssh to log in to the edge node and running an installation script there.
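As a rough illustration of those steps, here is a minimal sketch. It assumes the installation script is the one referenced in the template under Method 3, that it takes no arguments, and that the edge node host name follows the same pattern as the tunnel command at the end of this post. The function names are my own and not part of any Microsoft tooling.

```shell
# Sketch only: user name and cluster name are placeholders supplied by the caller.

# Script location, taken from the template shown in Method 3.
RSTUDIO_SCRIPT_URI="http://mrsactionscripts.blob.core.windows.net/rstudio-server-community-v01/InstallRStudio.sh"

# Build the SSH host name of the edge node for a given cluster name.
edge_node_host() {
    printf 'r-server.%s-ssh.azurehdinsight.net\n' "$1"
}

# Log in to the edge node, download the installation script, and run it as root.
# $1 = SSH user name, $2 = cluster name.
# Assumption: the script needs no arguments and sudo is available on the edge node.
# The -t flag allocates a terminal so that sudo can prompt for a password.
install_rstudio_manually() {
    ssh -t "$1@$(edge_node_host "$2")" \
        "wget -O InstallRStudio.sh ${RSTUDIO_SCRIPT_URI} && chmod 755 InstallRStudio.sh && sudo ./InstallRStudio.sh"
}
```

Calling `install_rstudio_manually <USER NAME> <CLUSTER NAME>` would then run the whole sequence in one go.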

Method 2: Use Script Actions from the portal

Another way, which does not require connecting by ssh, is to use Script Actions from the Azure portal.

From the Azure portal, navigate to the HDInsight cluster and click “Script Actions”. You should already see the R-server in the script action history. Submit a new script action that points to the R-studio installation script and runs on the edge node.

Give it a short while, then you are done!
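If you prefer a terminal over the portal, a script action can also be submitted with the Azure CLI. The sketch below is not part of the original walkthrough: it assumes the Azure CLI is installed and logged in, that the script URI is the one used in the template under Method 3, and that the edge node accepts the role name "edgenode" as it does in that template. The function name and the action name "rstudio-install" are hypothetical.

```shell
# Submit the R-studio installation script as a script action on the edge node.
# $1 = resource group, $2 = cluster name (both placeholders).
# Assumption: the Azure CLI (az) is installed and you are logged in.
run_rstudio_script_action() {
    az hdinsight script-action execute \
        --resource-group "$1" \
        --cluster-name "$2" \
        --name "rstudio-install" \
        --script-uri "http://mrsactionscripts.blob.core.windows.net/rstudio-server-community-v01/InstallRStudio.sh" \
        --roles edgenode
}
```

I have not verified this against an R-server edge node, so treat it as a starting point rather than a tested recipe.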

Method 3: Use Script Actions from a template

A fully automatic way to install R-studio is to include the installation as a Script Action in a cluster deployment template.

In the template, look for the entry that configures the R-server in the array of resources. What you want to edit is the installScriptActions section. As the uri, add the same link to the R-studio installation script that we used in the previous method. Provide the other parameters as in the code extract below, or modify them to fit your purpose.

    "resources": [{
        "name": "[concat(parameters('clusterName'),'/R-Server')]",
        "type": "Microsoft.HDInsight/clusters/applications",
        "apiVersion": "[variables('clusterApiVersion')]",
        "dependsOn": ["[concat('Microsoft.HDInsight/clusters/',parameters('clusterName'))]"],
        "properties": {
            "marketPlaceIdentifier": "Microsoft.RServerForHDInsight.8.0.3",
            "computeProfile": {
                "roles": [{
                    "name": "edgenode",
                    "targetInstanceCount": 1,
                    "hardwareProfile": {
                        "vmSize": "Standard_D4_v2"
                    }
                }]
            },
            "installScriptActions": [
                {
                    "name": "[concat('rstudio-install-v0','-',uniquestring(variables('applicationName')))]",
                    "uri": "http://mrsactionscripts.blob.core.windows.net/rstudio-server-community-v01/InstallRStudio.sh",
                    "roles": ["edgenode"]
                }
            ],
            "uninstallScriptActions": [],
            "httpsEndpoints": [],
            "applicationType": "RServer"
        }
    },

This way, every time you use this template to deploy a cluster, R-studio will be set up for you. No further action is required.
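For completeness, deploying such a template from the command line could look like the sketch below. It assumes the Azure CLI is installed and logged in, that the resource group already exists, and that the template is saved locally. The function name and file name are hypothetical. Only the clusterName parameter appears in the extract above, so any other parameters your template defines would have to be supplied as well.

```shell
# Deploy the cluster template, which installs R-studio through its script action.
# $1 = resource group, $2 = path to the template file, $3 = cluster name.
# Assumption: the template defines further parameters (for example credentials);
# pass them with additional key=value pairs or a parameters file.
deploy_cluster_template() {
    az deployment group create \
        --resource-group "$1" \
        --template-file "$2" \
        --parameters clusterName="$3"
}
```

For example, `deploy_cluster_template <RESOURCE GROUP> template.json <CLUSTER NAME>`.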

What is next

To use R-studio, we need to reach it through a browser. From your workstation, open an SSH tunnel for port 8787 to the edge node.

ssh -L localhost:8787:localhost:8787 <USER NAME>@r-server.<CLUSTER NAME>-ssh.azurehdinsight.net

Open a web browser and enter http://localhost:8787/. You will be prompted to enter the SSH username and password to connect to the cluster.
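If the page does not load, it can help to check from a second terminal whether anything answers on the local end of the tunnel. The following is a small sketch using curl. The function name is my own, and it only tells you that something responds on port 8787, not that the login will succeed.

```shell
# Report whether the local end of the tunnel answers HTTP requests.
# curl prints 000 as the status code when it cannot connect at all.
check_rstudio_tunnel() {
    code="$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:8787/" || true)"
    if [ "${code}" = "000" ]; then
        echo "No response on port 8787; is the tunnel open?"
        return 1
    fi
    echo "R-studio answered with HTTP status ${code}"
}
```

Run `check_rstudio_tunnel` while the ssh session from the previous step is still open.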

You are now ready to run some analysis!
