1. Getting Started

HDFS: storage; MapReduce: computation; Spark / Flink; Yarn: resource and job scheduling; pseudo-distributed deployment.

Requirements: environment and configuration files, parameter files, passwordless ssh, start the daemons and verify them with the jps command.
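For the passwordless-ssh requirement on a single node, a minimal sketch looks like the following (the hadoop user and the default key paths are assumptions, not taken from this note):

# run as the hadoop user on the single node
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa        # generate a key pair with an empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys                # sshd rejects loosely-permissioned authorized_keys
ssh localhost date                              # should print the date without asking for a password

With that in place, starting HDFS and running jps should show the three daemons: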
[hadoop@hadoop002 ~]$ jps
28288 NameNode           (NN)
27120 Jps
28410 DataNode           (DN)
28575 SecondaryNameNode  (SNN)

2. MapReduce job on Yarn
[hadoop@hadoop002 hadoop]$ cp mapred-site.xml.template mapred-site.xml

Configure parameters as follows:

etc/hadoop/mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

etc/hadoop/yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Start the ResourceManager daemon and the NodeManager daemon:
$ sbin/start-yarn.sh

Open the web UI: http://47.75.249.8:8088/
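A quick way to confirm that the two Yarn daemons actually came up (pids and hostnames will differ; the node listing is optional):

$ jps                   # ResourceManager and NodeManager should now appear alongside the HDFS daemons
$ bin/yarn node -list   # lists the NodeManagers registered with the ResourceManager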
3. Running an MR job

Linux has a local file system (commands such as mkdir, ls); HDFS is a distributed file system: format it with -format and operate on it with hdfs dfs -<subcommand>.

Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>

Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop input

Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar grep input output 'dfs[a-z.]+'

Examine the output files. Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*

or view the output files on the distributed filesystem:
$ bin/hdfs dfs -cat output/*
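Note that MapReduce refuses to write into an output directory that already exists, so before rerunning the grep example delete the old output first (the path matches the example above):

$ bin/hdfs dfs -rm -r output    # otherwise the job fails because the output directory already exists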
-------------------------------------------------
bin/hdfs dfs -mkdir /user/hadoop/input
bin/hdfs dfs -put etc/hadoop/core-site.xml /user/hadoop/input
bin/hadoop jar \
share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar \
grep \
/user/hadoop/input \
/user/hadoop/output \
'fs[a-z.]+'
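To check the result of this second run (the output path is the one used in the command above):

bin/hdfs dfs -cat /user/hadoop/output/*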
4. Starting the three HDFS processes bound to hadoop002

NN:  the fs.defaultFS parameter in core-site.xml
DN:  the slaves file
SNN: the following properties in hdfs-site.xml:

<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop002:50090</value>
</property>
<property>
    <name>dfs.namenode.secondary.https-address</name>
    <value>hadoop002:50091</value>
</property>
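For completeness, a sketch of the other two pieces listed above; the hostname hadoop002 follows this note, while port 9000 is an assumption (whatever port you configure there is what the NN binds to):

etc/hadoop/core-site.xml (the NameNode binds to the host named here):
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop002:9000</value>
</property>

etc/hadoop/slaves (a DataNode starts on every host listed here, one per line):
hadoop002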
5. jps

[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ jps
16188 DataNode
16379 SecondaryNameNode
16566 Jps
16094 NameNode
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$

5.1 Location

[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$ which jps
/usr/java/jdk1.7.0_80/bin/jps
[hadoop@hadoop002 hadoop-2.6.0-cdh5.7.0]$
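In other words, jps is a JDK tool, not a Hadoop one. A couple of optional checks (the JDK path is the one shown above; -l prints the full main-class name of each JVM):

which jps                            # /usr/java/jdk1.7.0_80/bin/jps
/usr/java/jdk1.7.0_80/bin/jps -l     # e.g. org.apache.hadoop.hdfs.server.namenode.NameNode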
5.2 Other users

[root@hadoop002 ~]# jps
16188 -- process information unavailable
16607 Jps
16379 -- process information unavailable
16094 -- process information unavailable
[root@hadoop002 ~]#

[root@hadoop002 ~]# useradd jepson
[root@hadoop002 ~]# su - jepson
[jepson@hadoop002 ~]$ jps
16664 Jps
[jepson@hadoop002 ~]$

Here "process information unavailable" does not mean the processes are down: they are actually still running, just owned by another user.

[root@hadoop002 ~]# kill -9 16094
[root@hadoop002 ~]# jps
16188 -- process information unavailable
16379 -- process information unavailable
16702 Jps
16094 -- process information unavailable
[root@hadoop002 ~]# ps -ef|grep 16094
root     16722 16590  0 22:19 pts/4    00:00:00 grep 16094
[root@hadoop002 ~]#

After the kill, pid 16094 still shows up in jps as "process information unavailable", yet ps proves it is genuinely gone. So the correct way to handle "process information unavailable" is:

1. Find the process id (pid).
2. Run ps -ef | grep pid to check whether the process really exists.
3. If it exists, step 2 also tells you which user is running it; su - to that user and inspect it there. Note that if you delete the /tmp/hsperfdata_${user}/pid file, the process keeps running but jps stops showing it, and any script that relies on jps output will break.
4. If it does not exist, clear the stale entry: rm -f /tmp/hsperfdata_${user}/pid

http://blog.itpub.net/30089851/viewspace-1994344/
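A walk-through of those steps using the pid and user from this example (16094, started by the hadoop user):

ps -ef | grep 16094 | grep -v grep      # step 2: is the process really running?
# nothing printed -> the process is gone and only the jps bookkeeping file is left behind
rm -f /tmp/hsperfdata_hadoop/16094      # step 4: clear the stale entry
jps                                     # the "process information unavailable" line should disappear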
6. Additional commands

ssh root@ip -p 22
ssh root@47.75.249.8 date    (run a single command on the remote machine)
rz / sz                      (upload / download files between the local desktop and the server)
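rz and sz come from the lrzsz package and only work in terminal clients that support ZMODEM (for example Xshell or SecureCRT); a sketch, assuming a CentOS host:

yum install -y lrzsz    # provides the rz and sz commands
rz                      # opens a dialog in the terminal client to upload a local file to the server
sz test.log             # pushes test.log from the server down to the local machine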
How do you transfer files between two Linux machines?

hadoop000 --> hadoop002:
[ruoze@hadoop000 ~]$ scp test.log root@47.75.249.8:/tmp/
    scp a file from the current Linux machine to the remote machine.

hadoop000 <-- hadoop002:
[ruoze@hadoop002 ~]$ scp test.log root@hadoop000:/tmp/
    But hadoop002 is a production machine you are not allowed to log into, so run the copy on hadoop000 and pull the file instead:
    scp root@47.75.249.8:/tmp/test.log /tmp/rz.log

However, in production you will never be given the password, so set up mutual ssh trust between the machines:
http://blog.itpub.net/30089851/viewspace-1992210/

Pitfalls: transfer the pub file with scp; configure the ips and hostnames of all machines in /etc/hosts.
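A minimal sketch of that trust relationship (run on hadoop000 as the ruoze user; the hostnames follow this note):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa   # generate hadoop000's key pair
ssh-copy-id ruoze@hadoop002                # appends hadoop000's id_rsa.pub to hadoop002's authorized_keys
ssh ruoze@hadoop002 date                   # should run without prompting for a password

This only works if /etc/hosts on both machines maps hadoop000 and hadoop002 to their ips, which is exactly the pitfall noted above.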
--------------------------------------------
Homework:
1. Yarn pseudo-distributed deployment: +1 blog post
2. MR JOB: +1 blog post
3. HDFS processes started on hadoop002: +1 blog post
4. Consolidate the jps notes into 1 blog post
5. Install one more VM and set up ssh trust between multiple machines: 1 blog post
6. Extension:
rm -rf ~/.ssh
For machine A to access machine B without a password, whose pub file is copied to whom?