Step 1: prepare a machine (this step may seem a bit redundant). Mine runs Ubuntu 11.10.
Step 2: download a Hadoop release; I used hadoop-0.21.0.
Step 3: install the Java runtime:
- sudo apt-get install openjdk-6-jdk
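A quick check that the JDK is installed and on the PATH (the exact version string will vary from machine to machine):
- $ java -version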
Next, create a dedicated hadoop group and user:
- groupadd hadoop
- useradd -g hadoop hadoop
- passwd hadoop
Set up passwordless SSH login for the hadoop user (Hadoop starts its daemons over ssh):
- # su - hadoop
- $ ssh-keygen -t rsa -P ""
- $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
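It is worth checking right away that key-based login works; the first connection asks you to accept the host key, after that there should be no password prompt:
- $ ssh localhost
- $ exit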
Unpack the Hadoop tarball under /opt and hand the tree over to the hadoop user:
- # cd /opt
- # tar xzf hadoop-0.21.0.tar.gz
- # ln -s hadoop-0.21.0 hadoop
- # chown -R hadoop:hadoop hadoop
Set the environment variables for the hadoop user:
- # su - hadoop
- $ vim ~/.bashrc
- Add the following lines:
- export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk
- export HADOOP_HOME=/opt/hadoop
- export PATH=$PATH:$HADOOP_HOME/bin
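Reload the file and make sure the hadoop command is picked up; the version printed should match the release you unpacked:
- $ source ~/.bashrc
- $ hadoop version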
Open /opt/hadoop/conf/hadoop-env.sh and set the JAVA_HOME environment variable:
- export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk
Configure the value of hadoop.tmp.dir, which will be used as the storage directory for the metadata image files; more than one directory can be configured, separated by commas.
- vi /opt/hadoop/conf/core-site.xml
- Add:
- <!-- In: conf/core-site.xml -->
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/app/hadoop/tmp</value>
- <description>A base for other temporary directories.</description>
- </property>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://localhost:54310</value>
- <description>The name of the default file system. A URI whose
- scheme and authority determine the FileSystem implementation. The
- uri's scheme determines the config property (fs.SCHEME.impl) naming
- the FileSystem implementation class. The uri's authority is used to
- determine the host, port, etc. for a filesystem.</description>
- </property>
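Because hadoop.tmp.dir points at /app/hadoop/tmp, that directory must exist and be writable by the hadoop user before the namenode is formatted, otherwise the format step is likely to fail with a permission error. A minimal sketch, run as root:
- # mkdir -p /app/hadoop/tmp
- # chown -R hadoop:hadoop /app/hadoop/tmp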
Then set the JobTracker address in conf/mapred-site.xml:
- <!-- In: conf/mapred-site.xml -->
- <property>
- <name>mapred.job.tracker</name>
- <value>localhost:54311</value>
- <description>The host and port that the MapReduce job tracker runs
- at. If "local", then jobs are run in-process as a single map
- and reduce task.
- </description>
- </property>
And set the HDFS replication factor in conf/hdfs-site.xml (1 is enough for a single-node setup):
- <!-- In: conf/hdfs-site.xml -->
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- <description>Default block replication.
- The actual number of replications can be specified when the file is created.
- The default is used if replication is not specified in create time.
- </description>
- </property>
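Note that the three snippets above are not complete files: each <property> block has to sit inside the top-level <configuration> element of its file, roughly like this (core-site.xml shown as the example):
- <?xml version="1.0"?>
- <configuration>
-   <property>
-     <name>hadoop.tmp.dir</name>
-     <value>/app/hadoop/tmp</value>
-   </property>
-   <!-- further <property> entries for this file go here -->
- </configuration>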
Format the HDFS namenode:
- $ /opt/hadoop/bin/hadoop namenode -format
Start all the Hadoop daemons:
- $ /opt/hadoop/bin/start-all.sh
Check with jps that the daemons are running:
- $ jps
- 12989 SecondaryNameNode
- 12167 DataNode
- 13109 JobTracker
- 11350 NameNode
- 15822 Jps
- 13922 TaskTracker
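If any daemon is missing from the list, check the logs under /opt/hadoop/logs. You can also ask HDFS for a status report to confirm the DataNode has registered (on 0.21 this may print a deprecation notice pointing at the newer hdfs command, but the report is the same):
- $ /opt/hadoop/bin/hadoop dfsadmin -report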
To stop everything:
- $ /opt/hadoop/bin/stop-all.sh
Now test the installation with the WordCount example that ships with Hadoop.
Step 1: prepare the data:
Save a few text files into the /tmp/test directory.
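Any plain text will do; the file names and contents here are only an example:
- $ mkdir -p /tmp/test
- $ echo "hello hadoop hello world" > /tmp/test/file1.txt
- $ echo "hadoop word count test" > /tmp/test/file2.txt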
Step 2: copy the data into HDFS:
- $ /opt/hadoop/bin/hadoop dfs -copyFromLocal /tmp/test /usr/hadoop/test
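You can confirm the upload by listing the target directory on the HDFS side:
- $ /opt/hadoop/bin/hadoop dfs -ls /usr/hadoop/test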
Step 3: run the WordCount example:
- $ cd /opt/hadoop/
- $ bin/hadoop jar hadoop*examples*.jar wordcount /usr/hadoop/test /usr/hadoop/test-output
This command reads every file in the HDFS directory /usr/hadoop/test, processes them, and writes the result to /usr/hadoop/test-output. List the files in the HDFS directory /usr/hadoop/test-output with:
- $ /opt/hadoop/bin/hadoop dfs -ls /usr/hadoop/test-output
View the word counts with:
- $ /opt/hadoop/bin/hadoop dfs -cat /usr/hadoop/test-output/part-r-00000
To copy the results back to the local filesystem:
- $ mkdir /tmp/test-output
- $ /opt/hadoop/bin/hadoop dfs -copyToLocal /usr/hadoop/test-output /tmp/test-output
Problem one: when setting up passwordless ssh login to the local machine, I kept being prompted for a password.
I tried all sorts of things, including changing the permissions of /home/hadoop/.ssh/authorized_keys; people who hit a similar situation reportedly fixed it by changing that file's mode to 644, so that is worth trying first. In my case, though, the problem remained after the change, so what then?
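For reference, the permission changes mentioned above look like this (tightening ~/.ssh itself is a commonly recommended companion step, since sshd refuses keys whose files are too open); they did not help in my case, but they are worth trying first:
- $ chmod 700 ~/.ssh
- $ chmod 644 ~/.ssh/authorized_keys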
At a complete loss, the only thing left I could think of was to reinstall ssh, removing the old installation first:
sudo apt-get autoremove openssh-server
sudo apt-get install openssh-server
After that the problem was gone!
Problem two: HDFS formatting failures. If you need to re-format, be sure to empty the files under the metadata storage directory first!
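With the configuration above the metadata lives under hadoop.tmp.dir, so clearing it before a re-format looks like this; note that this throws away everything currently stored in HDFS:
- $ /opt/hadoop/bin/stop-all.sh
- $ rm -rf /app/hadoop/tmp/*
- $ /opt/hadoop/bin/hadoop namenode -format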