For the last month or so, I've been working on a couple of projects that have required me to move files in and out of HDFS. It's pretty straightforward once you get the appropriate tools working, but it can be a bit counterintuitive to get started (at least it was when I was learning it). Here's how you get started:

Install your tools

In this tutorial, we are working with Cloudera 5.5.1, using an Ubuntu (Trusty Tahr) instance to connect to it. First, we need to add Cloudera's repo to apt:

$ wget <URL of Cloudera's cdh5-repository package>
$ sudo dpkg -i cdh5-repository_1.0_all.deb

Since I use both the hdfs command and FUSE, I just install the FUSE package, which pulls in both tools.

One prerequisite that apt fails to install is Java. If you try running the hdfs command without it, you'll get the following error:

Error: JAVA_HOME is not set and could not be found.

Let's put Java on there:

$ sudo apt-get install openjdk-7-jre

One little quirk about working with the Hadoop command-line tools is that you need local config files; you can't just provide the URL to your namenode and connect. (One exception to this rule is a Go-based library/client written by Colin Marc called, drumroll please…, HDFS.) In Cloudera, you can get the config through the CDH Manager UI. Once you download this zip file, put its contents into a subfolder of /etc/hadoop as follows:

$ sudo unzip hdfs-clientconfig.zip -d /etc/hadoop
  inflating: /etc/hadoop/hadoop-conf/hdfs-site.xml
  inflating: /etc/hadoop/hadoop-conf/core-site.xml
  inflating: /etc/hadoop/hadoop-conf/topology.map
  inflating: /etc/hadoop/hadoop-conf/topology.py
  inflating: /etc/hadoop/hadoop-conf/log4j.properties
  inflating: /etc/hadoop/hadoop-conf/ssl-client.xml
  inflating: /etc/hadoop/hadoop-conf/hadoop-env.sh

$ sudo mv /etc/hadoop/hadoop-conf/* /etc/hadoop/

For the HDFS tools to use your configuration, the HADOOP_CONF_DIR environment variable needs to be set. This can simply be added to your favorite shell profile config:

export HADOOP_CONF_DIR="/etc/hadoop/"

Now that you have your configuration in the right place, make sure you can actually resolve the names it uses. For this to happen in Cloudera, ensure that one of your Consul DNS servers is listed in /etc/resolv.conf before your externally resolving DNS server (there is a quick resolution check sketched at the end of this post):

nameserver 10.10.10.250  <- this would be consul
nameserver 10.10.0.2     <- this is your default DNS server

To make sure your configuration works, let's use the hdfs command to list our top-level directories:

$ hdfs dfs -ls /
drwxrwxrwx   - hdfs   supergroup  0 00:12 /tmp
drwxr-xr-x   - mapred supergroup  0 00:07 /user

From here, you can simply read the help for the hdfs command. NOTE: if something is wrong, you will either get errors, or the command will simply return the results of ls in your current working directory.

Try FUSE

For the next level, let's try mounting HDFS as a usable filesystem. To do this, first create a mountpoint:

$ sudo mkdir -p /hdfs

If you set up everything correctly for the hdfs command as above, you should be able to mount and use your HDFS filesystem like this (see the fstab sketch at the end of this post if you want the mount to persist):

$ sudo hadoop-fuse-dfs dfs://<namenode>:8020 /hdfs
$ ls -lh /hdfs
drwxr-xr-x 10 99     99 4.0K May  3 23:23 hbase
drwxrwxrwx  6 hdfs   99 4.0K May 24 00:12 tmp
drwxr-xr-x 11 mapred 99 4.0K May  6 00:07 user

A Note About Permissions (Security by Obscurity!)

HDFS permissions, by default, are very liberal. As you browse the tree structure, you may notice that you do not have access to get to certain files:

$ ls /hdfs/org/some/restricted/folder
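Why "security by obscurity"? Without Kerberos enabled, HDFS trusts whatever username the client reports, so those mode bits are advisory at best. A minimal sketch, assuming a non-Kerberized cluster like the one above (the restricted path is the same illustrative one):

$ hdfs dfs -ls /org/some/restricted/folder                       # denied as your own user
$ HADOOP_USER_NAME=hdfs hdfs dfs -ls /org/some/restricted/folder # "become" the superuser

HADOOP_USER_NAME simply overrides the username sent with each request, and hdfs is usually the superuser on CDH clusters, which is why locking the cluster down properly means Kerberos, not mode bits.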
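Here is the quick resolution check mentioned in the DNS section, useful before blaming the Hadoop tools when hdfs dfs hangs or errors. This is a sketch; namenode.service.consul stands in for whatever hostname your core-site.xml actually references:

$ grep -A1 fs.defaultFS /etc/hadoop/core-site.xml    # find the namenode hostname
$ getent hosts namenode.service.consul               # resolves using /etc/resolv.conf ordering
$ dig +short namenode.service.consul @10.10.10.250   # ask the consul server directly

If getent fails but the dig query against consul succeeds, your nameserver ordering in /etc/resolv.conf is the problem.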
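If you want the FUSE mount from the Try FUSE section to come back after a reboot, Cloudera documents an /etc/fstab syntax for hadoop-fuse-dfs along these lines (check the docs for your exact release; <namenode> is a placeholder):

# /etc/fstab (single line):
hadoop-fuse-dfs#dfs://<namenode>:8020 /hdfs fuse allow_other,usetrash,rw 2 0

$ sudo mount /hdfs   # mount it now, without rebooting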
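Finally, since the whole point of this exercise was moving files in and out of HDFS, here are the basic commands. The paths (and the user alice) are made up for illustration:

$ hdfs dfs -mkdir -p /user/alice/incoming           # create a working directory
$ hdfs dfs -put report.csv /user/alice/incoming/    # local -> HDFS
$ hdfs dfs -get /user/alice/incoming/report.csv .   # HDFS -> local
$ cp notes.txt /hdfs/user/alice/                    # or just use the FUSE mount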