Connecting to a 3.0.1+ Hortonworks Sandbox
As of this writing, the latest available version of the Hortonworks Data Platform (HDP) on the Hortonworks Sandbox VM is 3.0.1, and this guide was created for that version.
Start and configure the Sandbox VM
- Download the Hortonworks Sandbox VM for VirtualBox from the Download website. 
- Import the OVA packaged VM to your virtualization environment (VirtualBox is covered in this guide).
- Start the VM. After powering it on, you have to select the first option from the boot menu, then wait for the boot to complete. 
- Log in to the VM. You can do this by switching to the login console (Alt+F5), or even better via SSH on localhost port 2122. It is important to note that there are two exposed SSH ports on the VM: one belongs to the VM itself (2122), while the other (2222) belongs to a Docker container running inside the VM. The username is `root` and the password is `hadoop` for both.
- Edit `/sandbox/proxy/generate-proxy-deploy-script.sh` to include the following ports in the `tcpPortsHDP` array: 8025, 8030, 8050, 10020, 50010. Open it with `vi /sandbox/proxy/generate-proxy-deploy-script.sh`.
- Find the `tcpPortsHDP` variable and, leaving the other values in place, add the following entries to the hashtable assignment: `[8025]=8025 [8030]=8030 [8050]=8050 [10020]=10020 [50010]=50010`
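If you prefer to script this edit instead of using `vi`, a minimal sketch is below. It assumes GNU sed and that the `tcpPortsHDP` hashtable is declared on a single line ending in `)`, which may differ in your sandbox version; it is demonstrated here on a temporary mock copy rather than the real script.

```shell
# Demonstrate the edit on a temporary mock file; on the real VM, point FILE at
# /sandbox/proxy/generate-proxy-deploy-script.sh instead (the sample array
# contents below are placeholders, not the script's actual values).
FILE=$(mktemp)
cat > "$FILE" <<'EOF'
tcpPortsHDP=([2122]=22 [8080]=8080 [8088]=8088)
EOF

# Append the five extra port mappings just before the closing parenthesis,
# leaving the existing entries in place (requires GNU sed for -i).
sed -i 's/^\(tcpPortsHDP=(.*\))$/\1 [8025]=8025 [8030]=8030 [8050]=8050 [10020]=10020 [50010]=50010)/' "$FILE"

grep tcpPortsHDP "$FILE"
```

After running it against the real script, the `grep` should show the original entries followed by the five new ones.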
 
- Run the edited script via `/sandbox/proxy/generate-proxy-deploy-script.sh`. This will re-create the `/sandbox/proxy/proxy-deploy.sh` script along with config files in `/sandbox/proxy/conf.d` and `/sandbox/proxy/conf.stream.d`, thus exposing the additional ports added to the `tcpPortsHDP` hashtable in the previous step.
- Run the `/sandbox/proxy/proxy-deploy.sh` script. Running the `docker ps` command will then show an instance named sandbox-proxy and the ports it has exposed. The values inserted into the `tcpPortsHDP` hashtable should appear in the output, looking like `0.0.0.0:10020->10020/tcp`.
- These changes only made sure that the referenced ports of the Docker container are accessible on the respective ports of the VM. Since the network adapter of the VM is attached to NAT, these ports are not accessible from your local machine. To make them available you have to add the port forwarding rules listed below to the VM. In VirtualBox you can find these settings under Machine / Settings / Network / Adapter 1 / Advanced / Port Forwarding.

| Name | Protocol | Host IP | Host Port | Guest IP | Guest Port |
|------|----------|---------|-----------|----------|------------|
| resourcetracker | TCP | 127.0.0.1 | 8025 | | 8025 |
| resourcescheduler | TCP | 127.0.0.1 | 8030 | | 8030 |
| resourcemanager | TCP | 127.0.0.1 | 8050 | | 8050 |
| jobhistory | TCP | 127.0.0.1 | 10020 | | 10020 |
| datanode | TCP | 127.0.0.1 | 50010 | | 50010 |
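The same port forwarding rules can also be added from the host's command line instead of the VirtualBox GUI. The sketch below only builds and prints the `VBoxManage modifyvm --natpf1` commands; the VM name is an assumption (check yours with `VBoxManage list vms`), and you would run the printed commands with the VM powered off to actually apply the rules.

```shell
# Build the VBoxManage commands for the five forwarding rules. This script
# only prints them; it does not touch the VM. VM_NAME is an assumption.
VM_NAME="Hortonworks Sandbox HDP 3.0.1"
CMDS=$(while read -r name port; do
  printf 'VBoxManage modifyvm "%s" --natpf1 "%s,tcp,127.0.0.1,%s,,%s"\n' \
    "$VM_NAME" "$name" "$port" "$port"
done <<'EOF'
resourcetracker 8025
resourcescheduler 8030
resourcemanager 8050
jobhistory 10020
datanode 50010
EOF
)
echo "$CMDS"
```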
- Edit your local `hosts` file (on your host operating system, not inside the VM) and add `sandbox.hortonworks.com` and `sandbox-hdp.hortonworks.com` to your localhost entry. At the end it should look something like this: `127.0.0.1 localhost sandbox.hortonworks.com sandbox-hdp.hortonworks.com`
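The hosts-file change can be applied non-interactively as well. The sketch below works on a temporary copy so it is safe to run anywhere; to apply it for real, point `HOSTS` at the actual file (`/etc/hosts` on Linux/macOS, `C:\Windows\System32\drivers\etc\hosts` on Windows) and run with root/administrator rights. It assumes GNU sed and a `127.0.0.1` line already present.

```shell
# Demonstrate on a temporary copy of a minimal hosts file.
HOSTS=$(mktemp)
echo '127.0.0.1 localhost' > "$HOSTS"

# Append the two sandbox hostnames to the existing localhost entry
# (requires GNU sed for -i).
sed -i '/^127\.0\.0\.1[[:space:]]/ s/$/ sandbox.hortonworks.com sandbox-hdp.hortonworks.com/' "$HOSTS"

cat "$HOSTS"
```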
- Reset Ambari access. Use an SSH client to log in to localhost as root, this time using port 2222! (For example, on OS X or Linux, use the command `ssh root@localhost -p 2222`, password: `hadoop`.) At first login you have to set a new root password; do so and remember it.
- Run `ambari-admin-password-reset` as the root user.
- Provide a new admin password for Ambari.
- Run `ambari-agent restart`.
 
- Open the Ambari website: `http://sandbox.hortonworks.com:8080`. Log in with `admin` and the password you chose in the previous step.
- Navigate to the YARN / Configs / Memory configuration page.
- Edit the Memory Node Setting to at least 7 GB and click Override.
- You will be prompted to create a new "YARN Configuration Group"; enter a new name.
- On the "Save Configuration Group" dialog, click the Manage Hosts button.
- On the "Manage YARN Configuration Groups" page, take the node in the "Default" group and add it to the group created in the "YARN Configuration Group" naming step.
- A "Warning" dialog will open requesting notes; click the Save button.
- A "Dependent Configurations" dialog will open, with Ambari offering to modify some related properties automatically. If so, untick `tez.runtime.io.sort.mb` to keep its original value, then click the Ok button.
- Ambari may open a "Configurations" page with further suggestions. Review them as appropriate, but since that is out of the scope of this document, just click Proceed Anyway.
 
 
- Navigate to the Hive / Configs / Advanced configuration page.
- In the Custom hiveserver2-site section, the `hive.security.authorization.sqlstd.confwhitelist.append` property needs to be added via Add Property... and set to the following value (it must not contain whitespace): `radoop\.operation\.id|mapred\.job\.name|hive\.warehouse\.subdir\.inherit\.perms|hive\.exec\.max\.dynamic\.partitions|hive\.exec\.max\.dynamic\.partitions\.pernode|spark\.app\.name|hive\.remove\.orderby\.in\.subquery`
- Save the configuration and restart all affected services. More details on `hive.security.authorization.sqlstd.confwhitelist.append` can be found in the Hadoop Security / Configuring Apache Hive SQL Standard-based authorization section.
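For reference, the Custom hiveserver2-site entry ends up as a property in the generated hiveserver2-site.xml roughly like the fragment sketched below (not a full file; Ambari writes it for you, so this is only for checking the result — the value must stay on one line with no whitespace):

```xml
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>radoop\.operation\.id|mapred\.job\.name|hive\.warehouse\.subdir\.inherit\.perms|hive\.exec\.max\.dynamic\.partitions|hive\.exec\.max\.dynamic\.partitions\.pernode|spark\.app\.name|hive\.remove\.orderby\.in\.subquery</value>
</property>
```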
 
Set up the connection in RapidMiner Studio
- Click on the New Connection button and choose the Import from Cluster Manager option to create the connection directly from the configuration retrieved from Ambari.
- On the Import Connection from Cluster Manager dialog, enter:
- Cluster Manager URL: `http://sandbox-hdp.hortonworks.com:8080`
- Username: admin
- Password: the password chosen in the Reset Ambari access step.
 
- Click Import Configuration 
- The Hadoop Configuration Import dialog will open. If the import succeeded, click the Next button and the Connection Settings dialog will open.
- If the import failed, click the Back button and review the steps above and the logs to resolve the issue(s).
 
- The Connection Settings dialog opens when the Next button is clicked in the step above.
- The Connection Name can stay at its default or be changed.
- On the Global tab, Hadoop Version should be Hortonworks HDP 3.x.
- Set the Hadoop username to `hadoop`.
 
- On the Hadoop tab, NameNode Address should be sandbox-hdp.hortonworks.com
- NameNode Port should be 8020
- Resource Manager Address should be sandbox-hdp.hortonworks.com
- Resource Manager Port should be 8050
- JobHistory Server Address should be sandbox-hdp.hortonworks.com
- JobHistory Server Port should be 10020
- In Advanced Hadoop Parameters, add the following parameters:

| Key | Value |
|-----|-------|
| `dfs.client.use.datanode.hostname` | `true` |
| `mapreduce.map.java.opts` | `-Xmx256m` |

  (The `dfs.client.use.datanode.hostname` parameter is not required when using the Import Hadoop Configuration Files option.)
 
- On the Spark tab, for Spark Version select Spark 2.3 (HDP).
- Check Use default Spark path
 
- On the Hive tab, Hive Version should be HiveServer3 (Hive 3 or newer).
- Hive High Availability should be checked
- ZooKeeper Quorum should be sandbox-hdp.hortonworks.com:2181
- ZooKeeper Namespace should be hiveserver2
- Database Name should be default
- JDBC URL Postfix should be empty
- Username should be hive
- Password should be empty
- UDFs are installed manually and Use custom database for UDFs should both be unchecked
- Hive on Spark/Tez container reuse should be checked
 
- Click the OK button; the Connection Settings dialog will close.
- You can test the connection created above: on the Manage Radoop Connections page, select the connection and click the Quick Test and Full Test... buttons.
If errors occur during testing, confirm that the necessary components are started correctly at http://localhost:8080/#/main/hosts/sandbox-hdp.hortonworks.com/summary.
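Before re-running the Radoop tests, a quick reachability probe of the forwarded ports can narrow the problem down. The sketch below uses bash's `/dev/tcp` and GNU coreutils `timeout`; a CLOSED result means the VM, the proxy container, or the port-forwarding rule for that port is not up. The port list matches this guide (8080 = Ambari, 8020 = NameNode, plus the ports added above).

```shell
# Probe each forwarded port on localhost and report OPEN/CLOSED.
# Assumes bash (for /dev/tcp) and GNU coreutils timeout are available.
OUT=$(for port in 8020 8025 8030 8050 8080 10020 50010; do
  if timeout 2 bash -c "echo > /dev/tcp/127.0.0.1/$port" 2>/dev/null; then
    echo "port $port: OPEN"
  else
    echo "port $port: CLOSED"
  fi
done)
echo "$OUT"
```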
