一、首先新建虛擬機
二、配置靜態IP
1、首先查看虛擬網路編輯器 查看起始IP

2.1、修改靜態IP
輸入指令:vi /etc/sysconfig/network-scripts/ifcfg-ens33
修改BOOTPROTO=static
增加IPADDR、NETWASK、GATEWAY、DNS1
![]()

2.2、輸入指令:vi /etc/sysconfig/network增加以下兩條

2.3、輸入指令:vi /etc/hosts 添加上IP和主機名

2.4、輸入:reboot 重啟虛擬機
三、安裝JDK
3.1、在opt目錄下創建module、jdk檔案夾
輸入命令:cd /opt/
輸入命令:mkdir module
輸入命令:mkdir jdk
輸入命令:mkdir hadoop
3.2、卸載當前jdk
輸入命令:java -version 查看當前jdk版本
輸入命令:yum remove java* 卸載所有jdk

3.3、使用FileZilla鏈接虛擬機
將jdk壓縮包上傳到hadoop102的opt/jdk目錄下、
hadoop壓縮包上傳到hadoop102的opt/hadoop目錄下,

3.4、解壓壓縮包到制定目錄
輸入指令:cd /opt/jdk
輸入指令:tar -zxvf jdk(jdk壓縮包) -C /opt/module/

3.5、配置profile檔案并讓其生效
輸入指令:pwd 查看當前目錄
輸入指令:vi /etc/profile 在檔案末尾添加JAVA_HOME

輸入指令:source /etc/profile
輸入指令:java -version

四、安裝hadoop
4.1、解壓hadoop到指定目錄
切換到根目錄
輸入指令:mkdir kkb
輸入指令:cd kkb
輸入指令:mkdir install
輸入指令:cd /opt/hadoop
輸入指令:tar -zxvf hadoop(hadoop壓縮包) -C /kkb/install

4.2、配置profile檔案并使其生效
輸入指令:vi /etc/profile 配置HADOOP_HOME環境

輸入指令:source /etc/profile
輸入指令:hadoop version

五、克隆出hadoop103、hadoop104,并向hadoop102一樣步驟修改靜態IP
六、配置ssh免密登錄
6.1、以102為例配置ssh
輸入指令:cd ~/.ssh
輸入指令:ssh-keygen -t rsa
連續輸入三個回車,生成密匙
6.2、分發密匙,優先分發給自己,再分發給103、104
輸入指令:ssh-copy-id 192.168.88.130
輸入指令:ssh-copy-id 192.168.88.131
輸入指令:ssh-copy-id 192.168.88.132
6.3、在103、104上按照6.1-6.2的步驟配置ssh


七、配置集群分發腳本xsync
7.1、在/usr/local/bin目錄下創建xsync檔案
輸入指令:vi /usr/local/bin/xsync
7.2、xsync內容檔案如下:
#!/bin/bash
#1 獲取輸入引數個數,如果沒有引數,直接退出
pcount=$#
if((pcount==0)); then
echo no args;
exit;
fi
#2 獲取檔案名稱
p1=$1
fname=`basename $p1`
echo fname=$fname
#3 獲取上級目錄到絕對路徑
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir
#4 獲取當前用戶名稱
user=`whoami`
#5 回圈
for((host=103; host<105; host++)); do
#echo $pdir/$fname $user@hadoop$host:$pdir
echo --------------- hadoop$host ----------------
rsync -rvl $pdir/$fname $user@hadoop$host:$pdir
done
7.3、修改檔案權限
輸入指令:chomd a+x xsync
八、hadoop3的集群配置
8.1、執行checknative
輸入指令:cd /kkb/install/hadoop-3.1.4/
輸入指令:bin/hadoop checknative

8.2、安裝openssl-deve1
輸入指令:yum -y install openssl-deve1
8.3、修改dfs、yarn組態檔
輸入指令:cd /kkb/install/hadoop-3.1.4/etc/hadoop
輸入指令:vim /hadoop-env.sh 在末尾添加以下內容
export JAVA_HOME=/kkb/install/jdk1.8.0_162
輸入指令:vim core-site.xml 在標簽內添加以下內容
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop102:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/kkb/install/hadoop-3.1.4/hadoopDatas/tempDatas</value>
</property>
<!-- 緩沖區大小,實際作業中根據服務器性能動態調整;默認值4096 -->
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
<!-- 開啟hdfs的垃圾桶機制,洗掉掉的資料可以從垃圾桶中回收,單位分鐘;默認值0 -->
<property>
<name>fs.trash.interval</name>
<value>10080</value>
</property>
</configuration>
輸入指令:vim /hdfs-site.xml
<configuration>
<!-- NameNode存盤元資料資訊的路徑,實際作業中,一般先確定磁盤的掛載目錄,然后多個目錄用,進行分割 -->
<!-- 集群動態上下線
<property>
<name>dfs.hosts</name>
<value>/kkb/install/hadoop-3.1.4/etc/hadoop/accept_host</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/kkb/install/hadoop-3.1.4/etc/hadoop/deny_host</value>
</property>
-->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop102:9868</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop102:9870</value>
</property>
<!-- namenode保存fsimage的路徑 -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///kkb/install/hadoop-3.1.4/hadoopDatas/namenodeDatas</value>
</property>
<!-- 定義dataNode資料存盤的節點位置,實際作業中,一般先確定磁盤的掛載目錄,然后多個目錄用,進行分割 -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///kkb/install/hadoop-3.1.4/hadoopDatas/datanodeDatas</value>
</property>
<!-- namenode保存editslog的目錄 -->
<property>
<name>dfs.namenode.edits.dir</name>
<value>file:///kkb/install/hadoop-3.1.4/hadoopDatas/dfs/nn/edits</value>
</property>
<!-- secondarynamenode保存待合并的fsimage -->
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///kkb/install/hadoop-3.1.4/hadoopDatas/dfs/snn/name</value>
</property>
<!-- secondarynamenode保存待合并的editslog -->
<property>
<name>dfs.namenode.checkpoint.edits.dir</name>
<value>file:///kkb/install/hadoop-3.1.4/hadoopDatas/dfs/nn/snn/edits</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
</configuration>
輸入指令:vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.job.ubertask.enable</name>
<value>true</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop102:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop102:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
</configuration>
輸入指令:vi yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop102</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- 如果vmem、pmem資源不夠,會報錯,此處將資源監察置為false -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
輸入指令:vi workers
hadoop102
hadoop103
hadoop104
8.4、創建檔案存放目錄
輸入指令:mkdir -p /kkb/install/hadoop-3.1.4/hadoopDatas/tempDatas
輸入指令:mkdir -p /kkb/install/hadoop-3.1.4/hadoopDatas/namenodeDatas
輸入指令:mkdir -p /kkb/install/hadoop-3.1.4/hadoopDatas/datanodeDatas
輸入指令:mkdir -p /kkb/install/hadoop-3.1.4/hadoopDatas/dfs/nn/edits
輸入指令:mkdir -p /kkb/install/hadoop-3.1.4/hadoopDatas/dfs/snn/name
輸入指令:mkdir -p /kkb/install/hadoop-3.1.4/hadoopDatas/dfs/nn/snn/edits
8.5、使用xsync分發組態檔
輸入指令:xsync hadoop-3.1.4 102在install目錄下將hadoop分發給103、104
九、啟動hdfs、yarn
9.1、在102上格式化集群(只能格式一次、不能頻繁格式)
輸入指令:hdfs namenode -format
9.2、在102的hadoop-3.1.4目錄下啟動dfs、yarn
輸入指令:sbin/start-dfs.sh
輸入指令:sbin/start-yarn.sh
9.3、jps命令查看啟動行程



9.4、驗證集群是否啟動成功
在瀏覽器打開:192.168.88.130:8088
在瀏覽器打開:192.168.88.130:9870


十、在Windows中配置hadoop
10.1、修改windows的hosts檔案
地址:C:\Windows\System32\drivers\etc\hosts
10.2、配置Windows本中配置hadoop環境
將集群所用的hadoop-3.1.4.tar.gz解壓到一個沒有中文、空格的目錄下
10.3、配置hadoop的環境變數


10.4、將下圖的hadoop.dll檔案拷貝到C:\\Windows\System32
10.5、將hadoop集群的一下5個組態檔core-site.xml、hdfs-site.xml、mapred-site.xml、yarn-site.xml、workers,拷貝到windows下hadoop的C:\hadoop-3.1.4\etc\hadoop目錄下
10.6、打開cmd運行hadoop命令
十一、安裝maven
11.1、下載安裝包 apache-maven-3.6.1-bin.zip 并解壓到某目錄、配置環境變數


11.2、cmd中運行mvn -v
11.3、找到maven解壓的目錄,找到settings.xml檔案,添加以下內容


11.4、打開IDEA,新建一個maven工程,配置pom檔案,內容如下:
<properties>
<hadoop.version>3.1.4</hadoop.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>${hadoop.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/junit/junit -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>
<version>RELEASE</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.0</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<encoding>UTF-8</encoding>
<!-- <verbal>true</verbal>-->
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<minimizeJar>true</minimizeJar>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
十二、詞頻統計程式實作
12.1、撰寫mapper類
package wordcount;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class MyMapper extends Mapper <LongWritable, Text,Text, IntWritable>{
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
//獲得當前行的資料
String line = value.toString();
//獲得一個個的單詞
String lineBuffer = line;
String[] keys = new String[]{" ", "\t", " ", ".", "(", ")", "(", ")"};
for (String k : keys){
lineBuffer = lineBuffer.replace(k, ",");
}
String[] wordsBuffer = lineBuffer.split(",");
List<String> words = new ArrayList<>();
for (String w : wordsBuffer){
if (!w.equals("")){
words.add(w);
}
}
//每個單詞編程kv對
for (String word : words) {
//將kv對輸出出去
context.write(new Text(word),new IntWritable(1));
}
}
}
12.2、撰寫Reducer類
package wordcount;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
public class MyReducer extends Reducer<Text, IntWritable,Text,IntWritable> {
//bear,List(2,3,3)
@Override
protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum=0;
for (IntWritable value : values) {
int count = value.get();
sum +=count;
}
context.write(key,new IntWritable(sum));
}
}
12.3、組裝main程式
package wordcount;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class WordCount extends Configured implements Tool {
public static void main(String[] args) throws Exception {
int run = ToolRunner.run(new Configuration(), new WordCount(), args); // 集群代碼
System.exit(run);
}
@Override
public int run(String[] args) throws Exception {
Job job = Job.getInstance(super.getConf(), "wordcount");
job.setJarByClass(WordCount.class);
job.setInputFormatClass(TextInputFormat.class);
TextInputFormat.addInputPath(job, new Path(args[0]));
job.setMapperClass(MyMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setReducerClass(MyReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setOutputFormatClass(TextOutputFormat.class);
TextOutputFormat.setOutputPath(job, new Path(args[1]));
job.setNumReduceTasks(Integer.parseInt(args[2]));
boolean b = job.waitForCompletion(true);
return b ? 0 : 1;
}
}
12.4、將程式打包、點擊maven的package
十三、在集群中實作
13.1、使用FileZilla鏈接hadoop102
找到打包檔案,和測驗檔案,一起上傳到hadoop102
13.2、將測驗檔案上傳到hdfs
在hadoop-3.1.4目錄下
輸入指令:bin/hdfs dfs -mkdir -p /test-wrh 在hdfs上創建test-wrh檔案夾
輸入指令:bin/hdfs dfs -put 測驗檔案地址 /test-wrh/ 將測驗檔案上傳到test-wrh

13.3、在IDEA中拷貝地址
13.4、運行程式
在包含程式的目錄中:
輸入指令:hadoop jar jar包名 Reference /輸入路徑 /輸出路徑 3個節點
13.5、將結果從hdfs上下載
![]()
輸入指令:hadoop fs -get /輸出路徑/part-r-00000 /下載路徑

13.6、查看結果
輸入指令:vim part-r-00000

轉載請註明出處,本文鏈接:https://www.uj5u.com/qita/394198.html
標籤:其他
下一篇:spark復習資料
