Last edited by 乔帮主 on 2014-11-23 18:54.
I. Experiment Environment

1. Environment overview

Physical laptop: i5 2.27GHz (4 CPUs), 4GB RAM, 320GB disk, 32-bit Windows 7
Virtualization: Product VMware® Workstation Version 7.0.0 build-203739
VM setup guide (including VMware Tools and Linux/Windows shared-folder configuration, for anyone who has not done this before): http://ideapad.it168.com/thread-2088751-1-1.html

My Linux VM configuration (master, slave1, slave2, identical):
CPU: 1 socket, 2 cores
RAM: 512MB
Disk: 10GB
Linux ISO: CentOS-6.0-i386-bin-DVD.iso (32-bit)
Hadoop software version: hadoop-0.20.2.tar.gz
root password: rootroot

OS release:
[root@h1 etc]# cat issue
CentOS Linux release 6.0 (Final)
Kernel \r on an \m

Hostname    IP             Role     Daemons
h1          192.168.2.102  master   namenode and jobtracker
h2          192.168.2.103  slave    datanode and tasktracker
h4          192.168.2.105  slave    datanode and tasktracker

II. Fully Distributed Installation
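The three-node topology in the table above must be reflected identically in every node's /etc/hosts. As a sketch, the mapping can be written once by a small helper and then pushed to each machine (write_hosts is a hypothetical helper, not part of the original post; the IPs and hostnames are the ones used in this environment):

```shell
#!/bin/sh
# Sketch: write the cluster's /etc/hosts entries once, so the identical
# file can be copied to h1, h2 and h4. IPs/hostnames are from the table
# above; write_hosts itself is a hypothetical helper.

write_hosts() {
    # Overwrite the given file with the three-node mapping.
    cat > "$1" <<'EOF'
127.0.0.1       localhost.localdomain localhost
192.168.2.102   h1
192.168.2.103   h2
192.168.2.105   h4
EOF
}

# As root on each node you would run:
#   write_hosts /etc/hosts
```

Keeping the file identical everywhere matters because Hadoop's masters/slaves files refer to nodes by hostname, and every node must resolve every other node's name the same way.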
Now configure the first host, h4.

1. Configure the hosts file

[grid@h4 .ssh]$ cat /etc/hosts
192.168.2.105 h4 # Added by NetworkManager
127.0.0.1 localhost.localdomain localhost
::1 h4 localhost6.localdomain6 localhost6
192.168.2.102 h1
192.168.2.103 h2
192.168.2.105 h4

2. Create a dedicated Hadoop account (on all three VMs: h1, h2 and h4)

groupadd hadoop          create the hadoop group, which will be grid's primary group
useradd grid -g hadoop   create the grid user

[root@h1 etc]# passwd grid
Changing password for user grid.
New password: grid
BAD PASSWORD: it is too short
BAD PASSWORD: it is too simple
Retype new password: grid
passwd: all authentication tokens updated successfully.

(All three VMs are running at the same time; the Windows Task Manager shows them side by side.)

3. Set up SSH on h1, h2 and h4

On h4:

[grid@h4 ~]$ ssh-keygen -t rsa        generate a key pair with the RSA algorithm: one public key and one private key
Generating public/private rsa key pair.
Enter file in which to save the key (/home/grid/.ssh/id_rsa):
Created directory '/home/grid/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/grid/.ssh/id_rsa.        this is the private key
Your public key has been saved in /home/grid/.ssh/id_rsa.pub.        this is the public key
The key fingerprint is:
50:29:8a:78:ac:0e:a1:72:10:2d:01:66:77:f0:7e:c1 grid@h4
The key's randomart image is:        the RSA randomart
+--[ RSA 2048]----+
|+= o.. .. |
|= o o o.. |
| = . o.E |
|+ + o .. |
|.= . .S |
|= . . |
|+. |
| . |
| |
+-----------------+

[grid@h4 ~]$ ls -lrta        success simply means the hidden .ssh directory now exists under the home directory
total
-rw-r--r--. 1 grid hadoop  500 Jan 24  2007 .emacs
drwxr-xr-x. 2 grid hadoop 4096 Nov 12  2010 .gnome2
-rw-r--r--. 1 grid hadoop  124 May 31  2011 .bashrc
-rw-r--r--. 1 grid hadoop  176 May 31  2011 .bash_profile
-rw-r--r--. 1 grid hadoop   18 May 31  2011 .bash_logout
drwxr-xr-x. 4 root root   4096 Sep  1 21:14 ..
drwx------. 5 grid hadoop 4096 Sep  1 21:34 .
drwx------. 2 grid hadoop 4096 Sep  1 21:34 .ssh
drwxr-xr-x. 4 grid hadoop 4096 Sep  2  2012 .mozilla
[grid@h4 ~]$ cd .ssh
[grid@h4 .ssh]$ ll
total 8
-rw-------. 1 grid hadoop 1675 Sep  1 21:34 id_rsa
-rw-r--r--. 1 grid hadoop  389 Sep  1 21:34 id_rsa.pub
[grid@h4 .ssh]$ cp id_rsa.pub authorized_keys        create the authorization file
[grid@h4 .ssh]$ cat authorized_keys        view the public key now inside authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr6+D01KKqeMUrkyakulV3su+9RU+jJ6sNJMlydxFq38oGBsJBwcskVL/I9ds7vE5g7coP+cMzgtRyj1ns+elgF0g3/uhtSerad4QdWXVLZgUjyUxijkm+nI3SSdwLihzsNNgH4GzeKX3HQAH/7S+rLoZSBPi//w9HYfO6VeXdo7N2lkvUxNW2z/h7JuYPMEqiaOIWAeLK7AJXhjJaeJkZh/ccGuEx4uBLRxqce5zjbNsFapoD2bact1w80a7mrgzAN3cVcQuQPzmpdj750negxMtai+QRmPDlSx2ZXtbarI4opSVmBiqpY84PJ/h9m5wptQ3hg/1XIxv4gyqwLSxZw== grid@h4

With this public key in authorized_keys, matched against the private key, we can log in without a password.

On h2:

[grid@h2 ~]$ ssh-keygens -t rsa        (a typo; the correct command follows)
-bash: ssh-keygens: command not found
[grid@h2 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/grid/.ssh/id_rsa):
Created directory '/home/grid/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/grid/.ssh/id_rsa.
Your public key has been saved in /home/grid/.ssh/id_rsa.pub.
The key fingerprint is:
14:55:b9:d1:4a:60:a1:5c:47:37:30:49:09:aa:30:3d grid@h2
The key's randomart image is:        every key's randomart is different
+--[ RSA 2048]----+
| ..BBB*o |
| . . * .*o.. |
| o E = . + |
| o + o |
| . S |
| |
| |
| |
| |
+-----------------+
[grid@h2 ~]$ cd .ssh
[grid@h2 .ssh]$ ll
total 12
-rw-------. 1 grid hadoop 1675 Sep  1 21:59 id_rsa        h2 now has its own private and public key as well
-rw-r--r--. 1 grid hadoop  389 Sep  1 21:59 id_rsa.pub

On h4:

[grid@h4 .ssh]$ scp authorized_keys h2:/home/grid/.ssh/        send h4's authorization file to h2

On h2:

[grid@h2 .ssh]$ ll
total 12
-rw-r--r--. 1 grid hadoop  778 Sep  1 22:02 authorized_keys
-rw-------. 1 grid hadoop 1675 Sep  1 21:59 id_rsa
-rw-r--r--. 1 grid hadoop  389 Sep  1 21:59 id_rsa.pub
[grid@h2 .ssh]$ cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr6+D01KKqeMUrkyakulV3su+9RU+jJ6sNJMlydxFq38oGBsJBwcskVL/I9ds7vE5g7coP+cMzgtRyj1ns+elgF0g3/uhtSerad4QdWXVLZgUjyUxijkm+nI3SSdwLihzsNNgH4GzeKX3HQAH/7S+rLoZSBPi//w9HYfO6VeXdo7N2lkvUxNW2z/h7JuYPMEqiaOIWAeLK7AJXhjJaeJkZh/ccGuEx4uBLRxqce5zjbNsFapoD2bact1w80a7mrgzAN3cVcQuQPzmpdj750negxMtai+QRmPDlSx2ZXtbarI4opSVmBiqpY84PJ/h9m5wptQ3hg/1XIxv4gyqwLSxZw== grid@h4
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA5iKGfOGKh3d8BYr4vkkNaEtZkxCbBzBn6pfD0n3h82/1f9PwEtT4CEgqzBssYvQ2Nbc6dUy2NbDD9j5dIwQENS/fAJDwccdiJjEYMo5+o4ocPABx6OVM0r9nsUkyU7bxeHjap3ZUmcC1UvgW5asOsRMl7ePCze+rnt5D5ldZ+VOKh0NgtY2/CST8qXHmedfZFbQSEhIPf5Lh4A6oSoRHTFQbDN4apvf5s7Cm5/NgPiyhU+KbHBz96pNCxkjuOwj69a7kx4AgQYJoYc0T9O6YfjfVy3l1a7N2aJ6jp4SMv0GaohgzIrBNXwoFK6skuyf10yIxvNlGzkhTYK9GS9hjJw== grid@h2

The authorization file now holds the public keys of both h4 and h2 (h2's own key was appended to it the same way as shown for h1 below); only h1's key is still missing.

On h1:

[grid@h1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/grid//.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/grid//.ssh/id_rsa.
Your public key has been saved in /home/grid//.ssh/id_rsa.pub.
The key fingerprint is:
b6:4e:a6:05:d3:37:e7:3d:ca:44:7b:cf:2c:d2:5b:a4 grid@h1
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
| |
| . |
| o S o o .|
| + o = o o |
| = +.E .|
| * o.oo* |
| . . o..o+|
+-----------------+

On h2:

[grid@h2 .ssh]$ scp authorized_keys h1:/home/grid/.ssh/

On h1:

[grid@h1 .ssh]$ ll
total 12
-rw-r--r--. 1 grid hadoop  778 Sep  1 22:12 authorized_keys
-rw-------. 1 grid hadoop 1675 Sep  1 22:12 id_rsa
-rw-r--r--. 1 grid hadoop  389 Sep  1 22:12 id_rsa.pub
[grid@h1 .ssh]$ cat id_rsa.pub >> authorized_keys        append h1's public key; the file now holds all three nodes' keys
[grid@h1 .ssh]$ cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr6+D01KKqeMUrkyakulV3su+9RU+jJ6sNJMlydxFq38oGBsJBwcskVL/I9ds7vE5g7coP+cMzgtRyj1ns+elgF0g3/uhtSerad4QdWXVLZgUjyUxijkm+nI3SSdwLihzsNNgH4GzeKX3HQAH/7S+rLoZSBPi//w9HYfO6VeXdo7N2lkvUxNW2z/h7JuYPMEqiaOIWAeLK7AJXhjJaeJkZh/ccGuEx4uBLRxqce5zjbNsFapoD2bact1w80a7mrgzAN3cVcQuQPzmpdj750negxMtai+QRmPDlSx2ZXtbarI4opSVmBiqpY84PJ/h9m5wptQ3hg/1XIxv4gyqwLSxZw== grid@h4
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA5iKGfOGKh3d8BYr4vkkNaEtZkxCbBzBn6pfD0n3h82/1f9PwEtT4CEgqzBssYvQ2Nbc6dUy2NbDD9j5dIwQENS/fAJDwccdiJjEYMo5+o4ocPABx6OVM0r9nsUkyU7bxeHjap3ZUmcC1UvgW5asOsRMl7ePCze+rnt5D5ldZ+VOKh0NgtY2/CST8qXHmedfZFbQSEhIPf5Lh4A6oSoRHTFQbDN4apvf5s7Cm5/NgPiyhU+KbHBz96pNCxkjuOwj69a7kx4AgQYJoYc0T9O6YfjfVy3l1a7N2aJ6jp4SMv0GaohgzIrBNXwoFK6skuyf10yIxvNlGzkhTYK9GS9hjJw== grid@h2
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA5V1lyss14a8aWFEkTk/aBgKHFLMX/XZX/xtXVUqJl8NkTQVLQ37+XLyqvTfrcJSja70diqB3TrwBp3K5eXNxp3EOr6EGHsi0B6D8owsg0bCDhxHGHu8RX8WB4DH9UOv1uPL5BESAPHjuemQuQaQzLagqrnXbrKix8CzdIEgmnOknYiS49q9msnzawqo3luQFRU7MQvAU9UZqkxotrnzHqh0tgjJ3Sq6O6nscA7w//Xmb0JGobVQAFCDJQdn/z1kOq7E5WNhVa8ynF9GOF7cMdppug7Ibw1RZ9cKa+igi1KhhavS5H7XCM64NuGfC87aQE9nz0ysS3Kh8PT5h6zlxfw== grid@h1
[grid@h1 .ssh]$ scp authorized_keys h2:/home/grid/.ssh/        push the complete file back to h2
[grid@h1 .ssh]$ scp authorized_keys h4:/home/grid/.ssh/        and to h4

4. Install the JDK (on h1, h2 and h4)

First copy jdk-6u25-ea-bin-b03-linux-i586-27_feb_2011-rpm.bin to /usr on h1, h2 and h4, then make it executable:

chmod 777 jdk-6u25-ea-bin-b03-linux-i586-27_feb_2011-rpm.bin

Run the installer as root:

./jdk-6u25-ea-bin-b03-linux-i586-27_feb_2011-rpm.bin

cd into /etc, edit profile with vim, and add the following environment variables just before the umask 022 line:

export JAVA_HOME=/usr/java/jdk1.6.0_25
export JRE_HOME=/usr/java/jdk1.6.0_25/jre
export PATH=$PATH:/usr/java/jdk1.6.0_25/bin
export CLASSPATH=./:/usr/java/jdk1.6.0_25/lib:/usr/java/jdk1.6.0_25/jre/lib

source profile        reload the file so the variables take effect

5. Download and extract Hadoop (on h1)
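Before walking through this step by hand, the extraction can be captured in a small helper (a sketch: unpack_hadoop is a hypothetical function, not from the original post, and it assumes the tarball has already been uploaded to the node):

```shell
#!/bin/sh
# Sketch of step 5: unpack the Hadoop tarball under the grid user's home
# directory. Assumes hadoop-0.20.2.tar.gz was already uploaded to the node;
# unpack_hadoop is a hypothetical helper for illustration.

unpack_hadoop() {
    tarball="$1"
    target="$2"
    mkdir -p "$target"
    # -z gunzips and -x unpacks in one step, equivalent to running
    # gzip -d followed by tar -xvf as shown below.
    tar -zxf "$tarball" -C "$target"
}

# On h1 this would be:
#   unpack_hadoop /home/grid/hadoop-0.20.2.tar.gz /home/grid
```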
Download URL: http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-0.20.2/
Upload the downloaded hadoop-0.20.2.tar.gz to /home/grid/:

[grid@h1 grid]$ mv hadoop-0.20.2.tar.gz /home/grid/
[grid@h1 grid]$ tar -zxvf hadoop-0.20.2.tar.gz

Alternatively, decompress first and then unpack:

gzip -d hadoop-0.20.2.tar.gz        -d (decompress) gunzips and removes the source file
tar -xvf hadoop-0.20.2.tar          unpack the tar archive

6. Edit hadoop-env.sh (on h1)

Add export JAVA_HOME=/usr/java/jdk1.6.0_25:

[grid@h1 conf]$ pwd
/home/grid/hadoop-0.20.2/conf
[grid@h1 conf]$ ll hadoop-env.sh
[grid@h1 conf]$ vim hadoop-env.sh
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
export JAVA_HOME=/usr/java/jdk1.6.0_25

Uncomment the JAVA_HOME line (remove the leading #) and point it at the JDK directory.

7. Edit core-site.xml (on h1)

[grid@h1 conf]$ vim core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>              <!-- default filesystem: the namenode -->
    <value>hdfs://192.168.2.102:9000</value>  <!-- namenode IP address and port -->
  </property>
</configuration>

8. Edit hdfs-site.xml (on h1)

[grid@h1 hadoop-0.20.2]$ mkdir data        create the directory that will hold HDFS data
[grid@h1 hadoop-0.20.2]$ cd conf
[grid@h1 conf]$ pwd
/home/grid/hadoop-0.20.2/conf
[grid@h1 conf]$ vim hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>                     <!-- where datanodes store their blocks -->
    <value>/home/grid/hadoop-0.20.2/data</value>  <!-- the path created above -->
  </property>
  <property>
    <name>dfs.replication</name>  <!-- how many copies of each block HDFS keeps -->
    <value>2</value>              <!-- one per datanode; we have two, so two replicas -->
  </property>
</configuration>

By default HDFS stores its data under /tmp/hadoop-${user.name}. These properties matter: they redirect storage to the locations we planned.

9. Edit mapred-site.xml (on h1)

[grid@h1 conf]$ vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>    <!-- jobtracker host ip:port -->
    <value>192.168.2.102:9001</value>  <!-- the master node; 9001 is the conventional port, no need to change it -->
  </property>
</configuration>

10. Edit the masters and slaves files (on h1)

[grid@h1 conf]$ vim masters        lists the hosts running the namenode and jobtracker, one per line
h1
[grid@h1 conf]$ vim slaves         lists the hosts running a datanode and tasktracker, one per line
h2
h4

11. Copy the hadoop-0.20.2 directory to h2 and h4

[grid@h1 grid]$ scp -r ./hadoop-0.20.2/ h4:/home/grid/        copy to h4
[grid@h1 grid]$ scp -r ./hadoop-0.20.2/ h2:/home/grid/        copy to h2

12. Format the distributed filesystem (on h1; this formats the namenode)

[grid@h1 bin]$ pwd
/home/grid/hadoop-0.20.2/bin
[grid@h1 bin]$ ./hadoop namenode -format
12/09/02 20:19:34 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = h1/192.168.2.102
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf ... ranches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
12/09/02 20:19:35 INFO namenode.FSNamesystem: fsOwner=grid,hadoop
12/09/02 20:19:35 INFO namenode.FSNamesystem: supergroup=supergroup
12/09/02 20:19:35 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/09/02 20:19:36 INFO common.Storage: Image file of size 94 saved in 0 seconds.
12/09/02 20:19:36 INFO common.Storage: Storage directory /tmp/hadoop-grid/dfs/name has been successfully formatted.
12/09/02 20:19:36 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at h1/192.168.2.102
************************************************************/

Formatting the namenode builds the directory structures that will hold HDFS metadata.

13. Start Hadoop (this only needs to be done on h1)

Command: bin/start-all.sh

[grid@h1 bin]$ ./start-all.sh
starting namenode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-namenode-h1.out
h4: starting datanode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-h4.out
h2: starting datanode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-h2.out
The authenticity of host 'h1 (::1)' can't be established.
RSA key fingerprint is c0:84:4f:27:ef:aa:a8:77:24:b7:00:72:fc:bb:32:aa.
Are you sure you want to continue connecting (yes/no)? yes
h1: Warning: Permanently added 'h1' (RSA) to the list of known hosts.
h1: starting secondarynamenode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-secondarynamenode-h1.out
starting jobtracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-jobtracker-h1.out
h4: starting tasktracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-h4.out
h2: starting tasktracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-h2.out

14. Check that the daemons started

On h1 (the master):
[grid@h1 bin]$ pwd
/usr/java/jdk1.6.0_25/bin
[grid@h1 bin]$ ./jps        list the Java processes on the master
28037 NameNode              namenode daemon (PID 28037)
28950 Jps
28220 SecondaryNameNode     secondary namenode daemon (PID 28220)
28259 JobTracker            jobtracker daemon (PID 28259)

A second way to inspect the Java processes:
[grid@h1 bin]$ ps -ef | grep java

On h4 (a slave):
[grid@h4 logs]$ cd /usr/java/jdk1.6.0_25/bin
[grid@h4 bin]$ ./jps        list the Java processes on this slave
9754 DataNode        datanode daemon
31085 Jps
9847 TaskTracker     tasktracker daemon

On h2 (a slave):
[grid@h2 logs]$ cd /usr/java/jdk1.6.0_25/bin
[grid@h2 bin]$ ./jps        list the Java processes on this slave
7435 DataNode        datanode daemon
7535 TaskTracker     tasktracker daemon
2261 Jps

15. Test Hadoop

(1) Create a text file, leonarding.txt:
[grid@h1 grid]$ vim leonarding.txt
(2) Its content is "I Love You Hadoop":
[grid@h1 grid]$ cat leonarding.txt
I Love You Hadoop
[grid@h1 grid]$ cd hadoop-0.20.2/bin
(3) Create a directory named leo on the HDFS filesystem:
[grid@h1 bin]$ ./hadoop fs -mkdir /leo
(4) Copy leonarding.txt into the leo directory:
[grid@h1 bin]$ ./hadoop fs -copyFromLocal leonarding.txt /leo
(5) List the contents of the directory on HDFS:
[grid@h1 bin]$ ./hadoop fs -ls /leo
Found 1 items
-rw-r--r-- 2 grid supergroup 0 2012-09-02 21:08 /leo/leonarding.txt
(6) View the content of leonarding.txt on HDFS:
[grid@h1 bin]$ ./hadoop fs -cat /leo/leonarding.txt

Experiment complete.

Leonarding
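The six commands of step 15 can also be collected into one repeatable smoke-test script (a sketch: the default HADOOP path is the one used in this walkthrough, smoke_test is a hypothetical helper, and the cluster started in step 13 is assumed to be up):

```shell
#!/bin/sh
# Sketch: step 15's smoke test as a single script. Override the HADOOP
# variable to point at a different hadoop binary; the default is the
# path used in this walkthrough. smoke_test is a hypothetical helper.
HADOOP="${HADOOP:-/home/grid/hadoop-0.20.2/bin/hadoop}"

smoke_test() {
    echo "I Love You Hadoop" > leonarding.txt        # (1)-(2) create the local test file
    "$HADOOP" fs -mkdir /leo                         # (3) directory on HDFS
    "$HADOOP" fs -copyFromLocal leonarding.txt /leo  # (4) upload the file
    "$HADOOP" fs -cat /leo/leonarding.txt            # (6) read it back
}
```

If the cluster is healthy, running smoke_test ends by printing "I Love You Hadoop" read back from HDFS.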
2012.9.2
Tianjin & autumn
Sharing technology ~ harvesting happiness
Blog:http://space.itpub.net/26686207