Installing Hadoop CDH5

2100 reads · 0 comments · 2014-09-29 · gagagixi
Category: HADOOP

To satisfy the runtime requirements of our company's applications, our Hadoop cluster was upgraded from version 1.0 to CDH5. Here is a write-up of the installation, shared for anyone who needs it.

I. Machine Preparation

    Linux: CentOS 5.8, x86_64. If you are running a 6.x release, you can still follow the steps below.
Five machines were prepared for this installation:
192.168.32.70 (master), 192.168.32.71 (slave1), 192.168.32.72 (slave2), 192.168.32.73 (slave3), 192.168.32.79 (slave4).
Edit the HOSTNAME entry in /etc/sysconfig/network and set it to an easy-to-remember name (optional; use whatever is convenient for you).
Edit /etc/hosts (on all five machines):
192.168.32.70 master
192.168.32.71 slave1
192.168.32.72 slave2
192.168.32.73 slave3
192.168.32.79 slave4

II. Environment Preparation

1. Set up passwordless SSH
> On every machine, run ssh-keygen -t rsa and press Enter through the prompts;
> On the master, run: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys;
> scp the file to the other machines:
    scp ~/.ssh/authorized_keys root@slave1:~/.ssh/
    scp ~/.ssh/authorized_keys root@slave2:~/.ssh/
                  ......
    scp ~/.ssh/authorized_keys root@slave4:~/.ssh/
> Check that passwordless login works:

[root@master hadoop-conf]# ssh slave1
Last login: Wed Sep 24 16:07:12 2014 from master
[root@slave1 ~]#
If you are not prompted for a password, it worked.
2. Install JDK 7
> Download the jdk-7u51-linux-x64.rpm package from the Oracle website;
> rpm -ivh jdk-7u51-linux-x64.rpm
> Add the environment variables:
vi /etc/profile
and append:
     JAVA_HOME=/usr/java/latest
     PATH=$PATH:$JAVA_HOME/bin
     CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib
     export JAVA_HOME PATH CLASSPATH
> Apply the changes:
source /etc/profile
3. Create the hadoop user
    groupadd hdfs
    useradd hadoop -g hdfs

III. Install CDH5

1. Download the RPM package
> cd into /data/tools/ (my usual directory for downloaded software; pick any directory you like);
    wget "5/x86_64/cloudera-cdh-5-0.x86_64.rpm"   ---------if your Linux release is 6.x, change the 5 to 6 here and below;
    yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
> Import the Cloudera repository GPG key:
    rpm --import 5/x86_64/cdh/RPM-GPG-KEY-cloudera
2. Install the packages
> master: install NN, NM, DN, MR, and hadoop-client
yum clean all; yum install hadoop-hdfs-namenode
yum clean all; yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
yum clean all; yum install hadoop-client
> slave1: install RM, NM, DN, MR, and hadoop-client
yum clean all; yum install hadoop-yarn-resourcemanager
yum clean all; yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
yum clean all; yum install hadoop-client
> slave2, slave3, slave4: install NM, DN, MR, and hadoop-client
yum clean all; yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
yum clean all; yum install hadoop-client
3. Create the data directories (my machines have a single disk, cache1; if you have several, create one set of directories per disk)
DN:
mkdir -p /data/cache1/dfs/dn
mkdir -p /data/cache1/dfs/mapred/local
chown -R hdfs:hadoop /data/cache1/dfs/dn
chown -R mapred:hadoop /data/cache1/dfs/mapred/local
NN:
mkdir -p /data/cache1/dfs/nn
chown -R hdfs:hadoop /data/cache1/dfs/nn
chmod 700 /data/cache1/dfs/nn
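The DataNode layout above can also be scripted so that adding a second data disk (e.g. cache2) is a one-word change. A minimal sketch; BASE is a stand-in of my own so the commands can be tried safely anywhere — on a real DataNode set BASE=/data, run as root, and uncomment the chown lines (the hdfs and mapred users are created by the CDH packages):

```shell
# Sketch of the DataNode directory layout, looped over disks.
# BASE defaults to a throwaway temp dir so this can be dry-run anywhere;
# on a real node use BASE=/data and run as root.
BASE=${BASE:-$(mktemp -d)}
for disk in cache1; do                       # list every data disk here
  mkdir -p "$BASE/$disk/dfs/dn" "$BASE/$disk/dfs/mapred/local"
  # On a real node, also fix ownership:
  # chown -R hdfs:hadoop   "$BASE/$disk/dfs/dn"
  # chown -R mapred:hadoop "$BASE/$disk/dfs/mapred/local"
done
ls "$BASE/cache1/dfs"
```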
4. Edit the configuration files
Edit the files on the master, then scp them to each slave.
1) /etc/hadoop/conf/core-site.xml — the IP address below is the NameNode's;

[root@master conf]# cat core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.32.70:8020</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
2) /etc/hadoop/conf/hdfs-site.xml

[root@master conf]# cat /etc/hadoop/conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/cache1/dfs/dn/</value>
  </property>
</configuration>
3) /etc/hadoop/conf/yarn-site.xml — the IP addresses below point at the machine running the RM, 192.168.32.71 in this example;

[root@master conf]# cat yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>List of directories to store localized files in.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/var/lib/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>192.168.32.71:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>192.168.32.71:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>0.0.0.0:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>192.168.32.71:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>192.168.32.71:8033</value>
  </property>
  <property>
    <description>Where to store container logs.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/var/log/hadoop-yarn/containers</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/var/log/hadoop-yarn/apps</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
      $HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
      $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
      $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
      $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
    </value>
  </property>
</configuration>
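One file this walkthrough never shows is /etc/hadoop/conf/mapred-site.xml. On a YARN (MRv2) cluster, MapReduce jobs usually also need the standard Hadoop 2 property mapreduce.framework.name set to yarn; a minimal sketch (an assumption on my part, not part of this setup's listings — verify against your cluster):

```xml
<!-- /etc/hadoop/conf/mapred-site.xml: minimal sketch, not shown in the
     walkthrough above. mapreduce.framework.name is the standard Hadoop 2
     switch that makes MapReduce jobs submit to YARN. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```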
4)/etc/hadoop/conf/hadoop-env.sh 

[root@master conf]# cat hadoop-env.sh
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""
# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true ${HADOOP_OPTS}"
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_NAMENODE_OPTS}"
HADOOP_JOBTRACKER_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dmapred.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}"
HADOOP_TASKTRACKER_OPTS="-Dsecurity.audit.logger=ERROR,console -Dmapred.audit.logger=ERROR,console ${HADOOP_TASKTRACKER_OPTS}"
HADOOP_DATANODE_OPTS="-Dsecurity.audit.logger=ERROR,DRFAS ${HADOOP_DATANODE_OPTS}"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_SECONDARYNAMENODE_OPTS}"
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData ${HADOOP_JAVA_PLATFORM_OPTS}"
# On secure datanodes, user to run the datanode as after dropping privileges
export HADOOP_SECURE_DN_USER=hdfs
# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=/var/local/hadoop/logs
# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=$HADOOP_LOG_DIR
# The directory where pid files are stored. /tmp by default.
export HADOOP_PID_DIR=/var/local/hadoop/pid
export HADOOP_SECURE_DN_PID_DIR=$HADOOP_PID_DIR
# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
export JAVA_HOME=/usr/java/latest
5) Edit /etc/hadoop/conf/slaves;
Add the slaves:
slave1
slave2
slave3
slave4
6) scp the configuration to each slave (repeat for slave2 through slave4):
scp /etc/hadoop/conf/* root@slave1:/etc/hadoop/conf/
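Copying to four slaves by hand is error-prone, so step 6) can be wrapped in a loop. A sketch using the host names from this article; it defaults to a dry run that only prints each command, so set DRY_RUN=0 to actually copy:

```shell
# Push the master's config files to every slave. DRY_RUN=1 (the default)
# only prints each command for review; DRY_RUN=0 really runs scp.
SLAVES="slave1 slave2 slave3 slave4"
DRY_RUN=${DRY_RUN:-1}
for h in $SLAVES; do
  cmd="scp /etc/hadoop/conf/* root@$h:/etc/hadoop/conf/"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    scp /etc/hadoop/conf/* "root@$h:/etc/hadoop/conf/"
  fi
done
```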

IV. Start the Services

1) NN (master)
/etc/init.d/hadoop-hdfs-namenode init
/etc/init.d/hadoop-hdfs-namenode start
2) DN (slave1, which also runs the RM)
/etc/init.d/hadoop-hdfs-datanode start
/etc/init.d/hadoop-yarn-nodemanager start
/etc/init.d/hadoop-yarn-resourcemanager start
3) DN (slave2/slave3/slave4)
/etc/init.d/hadoop-hdfs-datanode start
/etc/init.d/hadoop-yarn-nodemanager start

V. Check the Web UI

Open the ResourceManager web UI on port 8088 of the RM machine (configured above via yarn.resourcemanager.webapp.address); it is the counterpart of the Hadoop 1.0 JobTracker UI on port 50030.

VI. Problems Encountered and Fixes

Starting the NN failed with: log4j:ERROR Could not find value for key log4j.appender.DRFAAUDIT;
Fix: add the following to /etc/hadoop/conf/log4j.properties:
log4j.appender.DRFAAUDIT=org.apache.log4j.ConsoleAppender
log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout

VII. Summary

The installation process differs quite a bit between Hadoop 1.0 and this CDH5 cluster, but it is not that painful: take it one step at a time and ask around when you hit a problem.

                                      
