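To meet the runtime requirements of our company's programs, the Hadoop cluster was upgraded from version 1.0 to CDH5. Having gone through another cluster installation, I am sharing the steps for anyone who needs them.
I. Machine preparation
Linux version: CentOS 5.8, x86_64. If your Linux version is 6.x, you can follow the same steps. I prepared five machines for this installation:
192.168.32.70 (master), 192.168.32.71 (slave1), 192.168.32.72 (slave2), 192.168.32.73 (slave3), 192.168.32.79 (slave4);
Edit HOSTNAME in /etc/sysconfig/network and set it to a name that is easy to remember. This is optional; keep the existing name if that suits you better;
Edit /etc/hosts (on all five machines):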
192.168.32.70 master
192.168.32.71 slave1
192.168.32.72 slave2
192.168.32.73 slave3
192.168.32.79 slave4
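Since the same five entries have to be added on every machine, a small idempotent helper can save some typing. The following is only a sketch: the function names add_host_entry and populate_hosts are my own and not part of any tool, and it assumes the file contains no conflicting entries for these names.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME
# Append "IP NAME" to FILE unless NAME already appears as a host field
# on a non-comment line.
add_host_entry() {
    file=$1; ip=$2; name=$3
    if ! grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# populate_hosts [FILE]
# Add the five cluster entries from this article; FILE defaults to /etc/hosts.
populate_hosts() {
    hosts_file=${1:-/etc/hosts}
    add_host_entry "$hosts_file" 192.168.32.70 master
    add_host_entry "$hosts_file" 192.168.32.71 slave1
    add_host_entry "$hosts_file" 192.168.32.72 slave2
    add_host_entry "$hosts_file" 192.168.32.73 slave3
    add_host_entry "$hosts_file" 192.168.32.79 slave4
}
```

Run populate_hosts as root on each machine. Running it a second time adds nothing, because every name is checked before it is appended.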
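II. Environment preparation
1. Set up passwordless SSH
> On every machine, run ssh-keygen -t rsa and press Enter at every prompt;
> On master, run: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys;
> scp the file to the other machines: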
scp ~/.ssh/authorized_keys root@slave1:~/.ssh/
scp ~/.ssh/authorized_keys root@slave2:~/.ssh/
......
scp ~/.ssh/authorized_keys root@slave4:~/.ssh/
> Check that passwordless login works:
[root@master hadoop-conf]# ssh slave1
Last login: Wed Sep 24 16:07:12 2014 from master
[root@slave1 ~]#
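The copy and the login check can also be done in a loop over the slaves. This is a minimal sketch under a few assumptions: the SLAVES list mirrors the host names in /etc/hosts above, and the DRY_RUN variable and the function names are my own inventions for illustration. BatchMode=yes is a standard OpenSSH option that makes ssh fail instead of prompting for a password.

```shell
#!/bin/sh
# Slave host names as defined in /etc/hosts in this article.
SLAVES="slave1 slave2 slave3 slave4"

# distribute_keys: copy master's authorized_keys to every slave.
# With DRY_RUN=1 the scp commands are only printed, not executed.
distribute_keys() {
    for host in $SLAVES; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp $HOME/.ssh/authorized_keys root@${host}:~/.ssh/"
        else
            scp "$HOME/.ssh/authorized_keys" "root@${host}:~/.ssh/"
        fi
    done
}

# check_passwordless: report per slave whether key-based login works.
check_passwordless() {
    for host in $SLAVES; do
        if ssh -o BatchMode=yes "root@${host}" true; then
            echo "${host}: ok"
        else
            echo "${host}: FAILED"
        fi
    done
}
```

Call distribute_keys first (it asks for each slave's root password once), then check_passwordless. Note that this only enables logins from master to the slaves, which is what the steps above set up.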
2. Install JDK 7
> Download jdk-7u51-linux-x64.rpm from the official site;
> rpm -ivh jdk-7u51-linux-x64.rpm
> Add the environment variables;
vi /etc/profile
and append:
JAVA_HOME=/usr/java/latest
PATH=$PATH:$JAVA_HOME/bin
CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export JAVA_HOME CLASSPATH
> Run source so that the change takes effect;
source /etc/profile
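Before moving on, it can help to confirm that JAVA_HOME really points at a JDK. The helper below is a hypothetical convenience (check_java_home is my own name); it only checks that bin/java exists and is executable, nothing more.

```shell
#!/bin/sh
# check_java_home [DIR]
# Succeed if DIR (default: $JAVA_HOME) contains an executable bin/java.
check_java_home() {
    dir=${1:-$JAVA_HOME}
    if [ -n "$dir" ] && [ -x "$dir/bin/java" ]; then
        echo "JAVA_HOME ok: $dir"
    else
        echo "JAVA_HOME is unset or has no bin/java: '$dir'" >&2
        return 1
    fi
}
```

A typical use would be check_java_home && java -version on each machine after sourcing /etc/profile.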
3. Create the hadoop user
groupadd hdfs
useradd hadoop -g hdfs
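III. Install CDH5
1. Download the rpm package
> Change to /data/tools/ (my usual directory for software; choose any directory you like);
wget "5/x86_64/cloudera-cdh-5-0.x86_64.rpm" --------- if your Linux version is 6.x, change the 5 to 6 here, and likewise below;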
yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
> Import the Cloudera repository GPG key;
rpm --import 5/x86_64/cdh/RPM-GPG-KEY-cloudera
2. Install
> master: install NN, NM, DN, MR, hadoop-client
yum clean all; yum install hadoop-hdfs-namenode
yum clean all; yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
yum clean all; yum install hadoop-client
> slave1: install RM, NM, DN, MR, hadoop-client
yum clean all; yum install hadoop-yarn-resourcemanager
yum clean all; yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
yum clean all; yum install hadoop-client
> slave2, slave3, slave4: install NM, DN, MR, hadoop-client
yum clean all; yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
yum clean all; yum install hadoop-client
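The role-to-package mapping above can be captured in one function so that the same script works on every node. This is just a sketch: packages_for_host is a name I made up, and it assumes the host names match the ones used in this article.

```shell
#!/bin/sh
# packages_for_host HOST
# Print the packages this article installs on HOST.
packages_for_host() {
    # Installed on every node: NM, DN, MR and the client.
    common="hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce hadoop-client"
    case "$1" in
        master) echo "hadoop-hdfs-namenode $common" ;;
        slave1) echo "hadoop-yarn-resourcemanager $common" ;;
        slave2|slave3|slave4) echo "$common" ;;
        *) echo "unknown host: $1" >&2; return 1 ;;
    esac
}
```

On each machine this could then be used as yum clean all; yum install $(packages_for_host "$(hostname)"), provided the HOSTNAME was changed as described in the machine preparation step.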
3. Create directories (my machines have a single disk, cache1; if you have several disks, create one set of directories per disk)
DN:
mkdir -p /data/cache1/dfs/dn
mkdir -p /data/cache1/dfs/mapred/local
chown -R hdfs:hadoop /data/cache1/dfs/dn
chown -R mapred:hadoop /data/cache1/dfs/mapred/local
NN:
mkdir -p /data/cache1/dfs/nn
chown -R hdfs:hadoop /data/cache1/dfs/nn
chmod 700 /data/cache1/dfs/nn
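For machines with more than one disk, the DN commands can be generated per disk instead of typed by hand. The sketch below only prints the commands so they can be reviewed first; dn_dir_commands is a hypothetical helper name, and cache2 in the usage note is an invented second disk.

```shell
#!/bin/sh
# dn_dir_commands DISK...
# Print the DN directory commands from this article for each disk under /data.
dn_dir_commands() {
    for disk in "$@"; do
        base="/data/${disk}/dfs"
        echo "mkdir -p ${base}/dn"
        echo "mkdir -p ${base}/mapred/local"
        echo "chown -R hdfs:hadoop ${base}/dn"
        echo "chown -R mapred:hadoop ${base}/mapred/local"
    done
}
```

For example, dn_dir_commands cache1 cache2 prints eight commands; once they look right, pipe the output to sh as root.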
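4. Edit the configuration files
Edit the configuration files on master, then scp them to each slave;
1) /etc/hadoop/conf/core-site.xml (the IP address in fs.defaultFS is the NN address);
[root@master conf]# cat core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.32.70:8020</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>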
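2) /etc/hadoop/conf/hdfs-site.xml;
[root@master conf]# cat /etc/hadoop/conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/cache1/dfs/dn/</value>
  </property>
</configuration>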
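With several disks, dfs.datanode.data.dir takes a comma-separated list of directories. The following sketch builds that value and prints a matching property element; the helper names dn_dirs_value and property are my own, and cache2 is again a hypothetical second disk.

```shell
#!/bin/sh
# dn_dirs_value DISK...
# Print a comma-separated dfs.datanode.data.dir value, one DN directory per disk.
dn_dirs_value() {
    value=""
    for disk in "$@"; do
        # Prefix a comma only when something is already in the list.
        value="${value:+$value,}/data/${disk}/dfs/dn"
    done
    echo "$value"
}

# property NAME VALUE
# Print one Hadoop <property> element.
property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}
```

For instance, property dfs.datanode.data.dir "$(dn_dirs_value cache1 cache2)" prints an element that can be pasted into hdfs-site.xml. Once the file is in place, hdfs getconf -confKey dfs.datanode.data.dir should echo the value Hadoop actually sees.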
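3) /etc/hadoop/conf/yarn-site.xml;
[root@master conf]# cat yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>List of directories to store localized files in.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/var/lib/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>192.168.32.71:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>192.168.32.71:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>0.0.0.0:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>192.168.32.71:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>192.168.32.71:8033</value>
  </property>
  <property>
    <description>Where to store container logs.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/var/log/hadoop-yarn/containers</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/var/log/hadoop-yarn/apps</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
      $HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
      $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
      $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
      $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
    </value>
  </property>
</configuration>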
4) /etc/hadoop/conf/hadoop-env.sh;
[root@master conf]# cat hadoop-env.sh
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""
# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true ${HADOOP_OPTS}"
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_NAMENODE_OPTS}"
HADOOP_JOBTRACKER_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dmapred.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}"
HADOOP_TASKTRACKER_OPTS="-Dsecurity.audit.logger=ERROR,console -Dmapred.audit.logger=ERROR,console ${HADOOP_TASKTRACKER_OPTS}"
HADOOP_DATANODE_OPTS="-Dsecurity.audit.logger=ERROR,DRFAS ${HADOOP_DATANODE_OPTS}"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_SECONDARYNAMENODE_OPTS}"
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData ${HADOOP_JAVA_PLATFORM_OPTS}"
# On secure datanodes, user to run the datanode as after dropping privileges
export HADOOP_SECURE_DN_USER=hdfs
# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=/var/local/hadoop/logs
# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=$HADOOP_LOG_DIR
# The directory where pid files are stored. /tmp by default.
export HADOOP_PID_DIR=/var/local/hadoop/pid
export HADOOP_SECURE_DN_PID_DIR=$HADOOP_PID_DIR
# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
export JAVA_HOME=/usr/java/latest
5) Add the slaves:
slave1
slave2
slave3
slave4
6) scp the files to each slave (repeat for slave2 through slave4; -r is needed because conf is a directory);
scp -r /etc/hadoop/conf/* root@slave1:/etc/hadoop/conf/
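Copying the configuration to all four slaves can be wrapped in a loop. This is a sketch under the assumption that /etc/hadoop/conf already exists on every slave (the packages create it); push_conf, SLAVES, CONF_DIR and DRY_RUN are names I chose for illustration.

```shell
#!/bin/sh
# Slave host names and the configuration directory used in this article.
SLAVES="slave1 slave2 slave3 slave4"
CONF_DIR=/etc/hadoop/conf

# push_conf: copy the contents of CONF_DIR to the same path on each slave.
# With DRY_RUN=1 the scp commands are only printed, not executed.
push_conf() {
    for host in $SLAVES; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp -r ${CONF_DIR}/* root@${host}:${CONF_DIR}/"
        else
            # The glob is left unquoted on purpose so the shell expands it.
            scp -r "${CONF_DIR}"/* "root@${host}:${CONF_DIR}/"
        fi
    done
}
```

Running DRY_RUN=1 push_conf first shows exactly what would be copied where; since passwordless SSH is already in place, the real run needs no password.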
IV. Startup
1) Start the NN (master)
/etc/init.d/hadoop-hdfs-namenode init
/etc/init.d/hadoop-hdfs-namenode start
2) Start the DN on slave1 (which also hosts the RM)
/etc/init.d/hadoop-hdfs-datanode start
/etc/init.d/hadoop-yarn-nodemanager start
/etc/init.d/hadoop-yarn-resourcemanager start
3) Start the DN on slave2/slave3/slave4
/etc/init.d/hadoop-hdfs-datanode start
/etc/init.d/hadoop-yarn-nodemanager start
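The per-host start sequence can be written down once and reused. The sketch below assumes the init scripts are named after the packages installed earlier (for example hadoop-hdfs-datanode) and that each accepts a start argument; services_for_host and start_commands are hypothetical helper names. The one-time namenode init step is deliberately left out, since it should not be repeated on every start.

```shell
#!/bin/sh
# services_for_host HOST
# Print, in start order, the services this article starts on HOST.
services_for_host() {
    case "$1" in
        master) echo "hadoop-hdfs-namenode" ;;
        slave1) echo "hadoop-hdfs-datanode hadoop-yarn-nodemanager hadoop-yarn-resourcemanager" ;;
        slave2|slave3|slave4) echo "hadoop-hdfs-datanode hadoop-yarn-nodemanager" ;;
        *) echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# start_commands HOST
# Print one "/etc/init.d/NAME start" line per service for HOST.
start_commands() {
    for svc in $(services_for_host "$1"); do
        echo "/etc/init.d/${svc} start"
    done
}
```

On a given node, start_commands "$(hostname)" | sh would then start everything that node is supposed to run.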
V. Verification
(This is similar to the JobTracker address in Hadoop 1.0, i.e. port 50030)
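One way to check from the command line is to query the ResourceManager over HTTP. This sketch assumes the RM web UI listens on port 8088 on slave1, as suggested by yarn.resourcemanager.webapp.address in yarn-site.xml; rm_url and rm_is_up are my own helper names, and /ws/v1/cluster/info is the RM's cluster-information REST endpoint.

```shell
#!/bin/sh
# rm_url HOST [PORT]
# Print the base URL of the ResourceManager web UI (PORT defaults to 8088).
rm_url() {
    echo "http://$1:${2:-8088}"
}

# rm_is_up HOST [PORT]
# Succeed if the RM answers its cluster-info REST endpoint within 5 seconds.
rm_is_up() {
    curl -sf --max-time 5 "$(rm_url "$@")/ws/v1/cluster/info" >/dev/null
}
```

For example, rm_is_up 192.168.32.71 && echo "RM is up" gives a quick yes/no answer; opening the URL printed by rm_url 192.168.32.71 in a browser shows the web UI itself.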
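VI. Problems encountered during installation and their solutions
Starting the NN reports the error: log4j:ERROR Could not find value for key log4j.appender.DRFAAUDIT;
Solution: add the following settings to /etc/hadoop/conf/log4j.properties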
log4j.appender.DRFAAUDIT=org.apache.log4j.ConsoleAppender
log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout
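If the fix has to be applied on several nodes, it is convenient to append the two lines only when they are missing. A minimal sketch, with ensure_line and fix_drfaaudit as hypothetical helper names:

```shell
#!/bin/sh
# ensure_line FILE LINE
# Append LINE to FILE unless an identical line is already present.
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# fix_drfaaudit [FILE]
# Add the two DRFAAUDIT appender settings from this article to FILE
# (default: /etc/hadoop/conf/log4j.properties).
fix_drfaaudit() {
    props=${1:-/etc/hadoop/conf/log4j.properties}
    ensure_line "$props" "log4j.appender.DRFAAUDIT=org.apache.log4j.ConsoleAppender"
    ensure_line "$props" "log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout"
}
```

Run fix_drfaaudit as root and then restart the NN; calling it again leaves the file unchanged.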
Installing CDH5 differs quite a lot from installing Hadoop 1.0, but it is not too troublesome: take it one step at a time and ask around whenever you hit a problem.