Installing and Configuring Hadoop 2.2

mysqlmpp123 · Posted on 2016-10-30 10:09:08

1. Preparation
Software
CentOS-5.8-i386
Hadoop2.2.0 http://apache.claz.org/hadoop/common/hadoop-2.2.0/
VMWare 8
Eclipse JUNO
JDK 7u45
FileZilla
Putty
Plan
A cluster of two machines:
192.168.1.103 master
192.168.1.133 node1
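2. Installing the base environment
Install the operating system
Disk size: 10 GB

Set the network connection mode to bridged
Set the hostname to master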
hostname master
vi /etc/sysconfig/network
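The post does not show what goes into this file. It uses plain KEY=value shell syntax, so a minimal sketch of its content on master might look like the following. The NETWORKING line is the usual CentOS default and is an assumption here.

```shell
# /etc/sysconfig/network on master (sketch)
NETWORKING=yes      # assumed CentOS default, not shown in the original post
HOSTNAME=master     # keeps the name set with "hostname master" after a reboot
```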
Synchronize the time
ntpdate cn.pool.ntp.org
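Use the setup command to configure the system environment
Configure and inspect the network
Check the host's network status
Change the IP address and gateway
Turn off the firewall
Verify the network configuration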
cat /etc/sysconfig/network-scripts/ifcfg-eth0
/sbin/service network restart # restart the network service
ifconfig # check the IP configuration
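The content of ifcfg-eth0 is not shown in the post. As a rough sketch, a static configuration for master could look like this. Only IPADDR comes from the plan; the netmask and gateway are assumptions and have to match your own network.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 on master (sketch)
DEVICE=eth0
BOOTPROTO=static        # fixed address instead of DHCP
ONBOOT=yes              # bring the interface up at boot
IPADDR=192.168.1.103    # master's address from the plan
NETMASK=255.255.255.0   # assumption
GATEWAY=192.168.1.1     # assumption; use your router's address
```

On CentOS 5 the firewall can also be turned off from the command line instead of the setup tool: service iptables stop stops it now, and chkconfig iptables off keeps it off after a reboot.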
Edit the hosts file
vi /etc/hosts
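The entries to add follow directly from the plan (master at 192.168.1.103, node1 at 192.168.1.133). A small sketch that prints them is shown here; the helper name hosts_entries is made up for illustration.

```shell
# Print the host entries for the two machines in the plan.
hosts_entries() {
  printf '%s\t%s\n' 192.168.1.103 master
  printf '%s\t%s\n' 192.168.1.133 node1
}

hosts_entries
# As root, append them on both machines:   hosts_entries >> /etc/hosts
# Then check name resolution, for example: ping -c 3 node1
```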
Test
Connect to the Internet
Change the DNS server
vi /etc/resolv.conf
service network restart
Internet access should now work
Install the JDK
Use FileZilla to upload the software to /home/software
tar -zxvf jdk-7u45-linux-i586.tar.gz
tar -zxvf hadoop-2.2.0.tar.gz
vi /etc/profile.d/java.sh
source /etc/profile
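The content of java.sh is not given in the post. A possible version is sketched here, assuming both archives were unpacked in /home/software (the JDK 7u45 tarball unpacks into a directory named jdk1.7.0_45). HADOOP_HOME and the PATH additions are assumptions, added so that the hadoop and hdfs commands used later can be run from any directory.

```shell
# /etc/profile.d/java.sh (sketch)
export JAVA_HOME=/home/software/jdk1.7.0_45      # assumes the JDK was unpacked in /home/software
export HADOOP_HOME=/home/software/hadoop-2.2.0   # assumption, see the note above
export PATH="$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
```

After sourcing /etc/profile, running java -version is a quick way to confirm that the JDK is picked up.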
Create the data, name, and temp directories
Create the hadoop user and grant it permissions
As root, set the hadoop user's password
passwd hadoop
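The directory and user steps are listed without commands. One way to carry them out is sketched here. The helper name make_hadoop_dirs is made up, and the chown line assumes that useradd creates a group with the same name as the user, which is the CentOS default.

```shell
# Create the three storage directories under a base directory.
make_hadoop_dirs() {
  base=${1:?usage: make_hadoop_dirs BASE_DIR}
  mkdir -p "$base/data" "$base/name" "$base/temp"
}

# As root on the real machine (not executed here):
#   make_hadoop_dirs /home/software
#   useradd hadoop
#   chown -R hadoop:hadoop /home/software
```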
Passwordless SSH
su hadoop
cd /home/hadoop/
ssh-keygen -q -t rsa -N "" -f /home/hadoop/.ssh/id_rsa
cd .ssh
cat id_rsa.pub > authorized_keys
Set permissions
chmod go-wx authorized_keys
For the cluster, copy id_rsa.pub to node1:/home/hadoop/.ssh/authorized_keys
Test the environment
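The copy to node1 and the test are described without commands. A sketch of one way to do both follows. The helper install_pubkey and the file name /tmp/master.pub are made up for illustration. The helper appends the key rather than overwriting the file and applies the strict permissions that sshd usually insists on.

```shell
# Append a public key to authorized_keys with the permissions sshd expects.
# $1 = public key file, $2 = target .ssh directory
install_pubkey() {
  mkdir -p "$2" && chmod 700 "$2" || return 1
  cat "$1" >> "$2/authorized_keys" || return 1
  chmod 600 "$2/authorized_keys"
}

# Not executed here. From master, as the hadoop user:
#   scp ~/.ssh/id_rsa.pub hadoop@node1:/tmp/master.pub   # hypothetical file name
# On node1, as the hadoop user:
#   install_pubkey /tmp/master.pub /home/hadoop/.ssh
# Back on master, this should log in without a password prompt:
#   ssh node1 hostname
```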
3. Hadoop single-node configuration
Edit the configuration files
hadoop-env.sh and yarn-env.sh
hadoop-env.sh
Set JAVA_HOME to the JDK path on this machine, and set the same variable in yarn-env.sh; otherwise the JDK is not picked up
Set HEAPSIZE
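The actual values are not shown in the post. The lines in question might look like the following sketch. The JDK path assumes the archive was unpacked in /home/software, and the heap size of 512 MB is purely an example value.

```shell
# In etc/hadoop/hadoop-env.sh (sketch)
export JAVA_HOME=/home/software/jdk1.7.0_45   # assumed JDK location
export HADOOP_HEAPSIZE=512                    # daemon heap size in MB; 512 is an assumed value

# In etc/hadoop/yarn-env.sh, set the same JAVA_HOME:
# export JAVA_HOME=/home/software/jdk1.7.0_45
```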
Edit core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/software/temp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
Edit hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>file:/home/software/name</value>
<description> </description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:/home/software/data</value>
</property>
<property>
<name>dfs.http.address</name>
<value>master:9002</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>1073741824</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
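The size values in core-site.xml and hdfs-site.xml are given in bytes, which makes them hard to read at a glance. The arithmetic behind them can be checked with a few lines of shell:

```shell
# Recompute the byte values used in core-site.xml and hdfs-site.xml.
echo "io.file.buffer.size      = $((128 * 1024)) bytes (128 KiB)"
echo "dfs.block.size           = $((128 * 1024 * 1024)) bytes (128 MiB)"
echo "dfs.datanode.du.reserved = $((1024 * 1024 * 1024)) bytes (1 GiB)"
```

So each DataNode keeps 1 GiB of disk free for non-HDFS use, and files are split into 128 MiB blocks.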
Edit mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
Edit yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8990</value>
<description>host is the hostname of the resource manager and
port is the port on which the NodeManagers contact the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8991</value>
<description>host is the hostname of the resourcemanager and port is the port
on which the Applications in the cluster talk to the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
<description>In case you do not want to use the default scheduler</description>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8993</value>
<description>the host is the hostname of the ResourceManager and the port is the port on
which the clients can talk to the Resource Manager. </description>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/software/tmp/node</value>
<description>the local directories used by the nodemanager</description>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>master:8994</value>
<description>the nodemanagers bind to this port</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>5120</value>
<description>the amount of memory on the NodeManager in MB</description>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/home/software/tmp/app-logs</value>
<description>directory on hdfs where the application logs are moved to </description>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/home/software/tmp/node</value>
<description>the directories used by Nodemanagers as log directories</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>shuffle service that needs to be set for Map Reduce to run </description>
</property>
</configuration>
Edit capacity-scheduler.xml
<configuration>
<property>
<name>yarn.scheduler.capacity.maximum-applications</name>
<value>10000</value>
<description>
Maximum number of applications that can be pending and running.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.1</value>
<description>
Maximum percent of resources in the cluster which can be used to run
application masters i.e. controls number of concurrent running
applications.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
<description>
The ResourceCalculator implementation to be used to compare
Resources in the scheduler.
The default i.e. DefaultResourceCalculator only uses Memory while
DominantResourceCalculator uses dominant-resource to compare
multi-dimensional resources such as Memory, CPU etc.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>unfunded,default</value>
<description>
The queues at this level (root is the root queue).
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.capacity</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.unfunded.capacity</name>
<value>50</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>50</value>
<description>Default queue target capacity.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
<value>1</value>
<description>
Default queue user limit a percentage from 0.0 to 1.0.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
<value>100</value>
<description>
The maximum capacity of the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.state</name>
<value>RUNNING</value>
<description>
The state of the default queue. State can be one of RUNNING or STOPPED.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
<value>*</value>
<description>
The ACL of who can submit jobs to the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
<value>*</value>
<description>
The ACL of who can administer jobs on the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.node-locality-delay</name>
<value>-1</value>
<description>
Number of missed scheduling opportunities after which the CapacityScheduler
attempts to schedule rack-local containers.
Typically this should be set to number of racks in the cluster, this
feature is disabled by default, set to -1.
</description>
</property>
</configuration>
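One thing to watch when changing the queue list: the CapacityScheduler expects the capacities of the queues at each level to add up to 100. A small sketch of that check for the two queues configured here (unfunded and default at 50 each) follows; the helper name is made up.

```shell
# Return success only if the given queue capacities (percent) add up to 100.
capacities_sum_to_100() {
  total=0
  for c in "$@"; do
    total=$((total + c))
  done
  [ "$total" -eq 100 ]
}

# unfunded = 50, default = 50, as in the configuration
if capacities_sum_to_100 50 50; then
  echo "queue capacities OK"
else
  echo "queue capacities do not add up to 100"
fi
```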
Edit slaves
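The slaves file lists the worker hostnames, one per line. Its content is not shown in the post, so the following is only a sketch; the helper name write_slaves is made up, and etc/hadoop is the configuration directory of the Hadoop 2.2.0 tarball.

```shell
# Write one worker hostname per line into CONF_DIR/slaves.
# $1 = Hadoop configuration directory, remaining arguments = worker hostnames
write_slaves() {
  conf_dir=${1:?usage: write_slaves CONF_DIR HOST...}
  shift
  printf '%s\n' "$@" > "$conf_dir/slaves"
}

# Not executed here. Single-node setup, where master is also the only worker:
#   write_slaves /home/software/hadoop-2.2.0/etc/hadoop master
```

For the two-machine cluster, pass both master and node1 as hostnames.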
Format the NameNode
hadoop namenode -format
Start
cd /home/software/hadoop-2.2.0/sbin
./start-dfs.sh
./start-yarn.sh
View the Hadoop web consoles
http://192.168.1.103:9002/dfshealth.jsp
http://192.168.1.103:8088/cluster
Test Hadoop
Upload a file test.txt to /home/software
Create an HDFS directory
hdfs dfs -mkdir /tmp
hdfs dfs -copyFromLocal /home/software/test.txt /tmp/
hdfs dfs -ls /tmp
hadoop jar /home/software/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 100
Stop Hadoop
To stop Hadoop, run the following commands in order:
./stop-yarn.sh
./stop-dfs.sh
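4. Hadoop cluster configuration
Prepare the environment
Copy the master virtual machine files and rename the copy node1
Start both virtual machines
Change node1's IP address
Restart the network service
service network restart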
vi /etc/sysconfig/network
Change the hostname
hostname node1
Delete the stale data files on the DataNode and on master
Edit slaves
Start the nodes from master
cd /home/software/hadoop-2.2.0/sbin/
./start-dfs.sh
./start-yarn.sh
jps
Master:
[hadoop@master sbin]$ jps
5037 DataNode
5419 NodeManager
5312 ResourceManager
5727 Jps
5154 SecondaryNameNode
Node1:
[hadoop@node1 hadoop]$ jps
5418 Jps
5235 DataNode
Test (same as the single-node test)
hdfs dfs -mkdir /tmp
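This failed with an unexplained error (note that the jps output on master lists no NameNode process). Deleting all files under data and name and reformatting the NameNode fixed it.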
http://192.168.1.103:9002/dfsnodelist.jsp?whatNodes=LIVE
http://192.168.1.103:8088/cluster/apps
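The recovery described above (delete everything under data and name, then reformat the NameNode) can be sketched as follows. The helper name clear_hdfs_dirs is made up, and the order of the surrounding stop and start commands is an assumption. Reformatting erases all HDFS metadata, so this is only suitable for a fresh test cluster like this one.

```shell
# Empty the data and name directories under a base directory.
clear_hdfs_dirs() {
  base=${1:?usage: clear_hdfs_dirs BASE_DIR}
  rm -rf "$base/data"/* "$base/name"/*
}

# Not executed here. Sequence on the real cluster, as the hadoop user:
#   cd /home/software/hadoop-2.2.0/sbin
#   ./stop-yarn.sh && ./stop-dfs.sh
#   clear_hdfs_dirs /home/software     # run on master and on node1
#   hadoop namenode -format            # on master only
#   ./start-dfs.sh && ./start-yarn.sh
```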
