基于ubuntu的hadoop完全分布式集群搭建-有解無憂

借鑒網址1

借鑒網址2

hadoop官方配置教程

搭建虛擬機，克隆（或者先配置JAVA和Hadoop環境再克隆，之后要改主機名和映射以及SSH免密）

可以利用xsync集群分發腳本一臺機器配置其他機器分發

修改主機名和ip映射

檢查

配置ssh免密登錄

這里地址為 ~/.ssh

配置JAVA

Hadoop配置

Hadoop完全分布式集群搭建

    #Hadoop’s Java configuration is driven by two types of important configuration files:
    1.Read-only default configuration - core-default.xml, hdfs-default.xml, yarn-default.xml and mapred-default.xml.#默認組態檔在官網可以找到
    2.Site-specific configuration - etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml and etc/hadoop/mapred-site.xml. #這些都在etc/hadoop/中

集群分配示意圖

/*For large installations, these are generally running on separate hosts.*/
HDFS daemons are NameNode, SecondaryNameNode, and DataNode.
?
YARN daemons are ResourceManager, NodeManager, and WebAppProxy.
?
If MapReduce is to be used, then the MapReduce Job History Server will also be running.

開始配置

etc/hadoop/hadoop-env.sh中配置JAVA_HOME

配置core-site.xml

##core-site.xml  自己決定引數
<configuration>
         <!-- 指定 NameNode 的地址 -->
         <property>
                 <name>fs.defaultFS</name>
                 <value>hdfs://Master:8080</value>
         </property>
?
         <!-- 指定 hadoop 資料的存盤目錄 -->
         <property>
                 <name>hadoop.tmp.dir</name>
                 <value>/usr/local/hadoop/tmp</value>
         </property>
</configuration>
?

配置hdfs-site.xml

<configuration>
    <!-- nn web 端訪問地址-->
    <property>
         <name>dfs.namenode.http-address</name>
         <value>Master:9870</value>
     </property>
?
    <!-- 2nn web 端訪問地址-->
     <property>
         <name>dfs.namenode.secondary.http-address</name>
         <value>Slave2:9868</value>
     </property>
?
        <property>
         <name>dfs.namenode.name.dir</name>
         <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
?
    <property>
         <name>dfs.datanode.data.dir</name>
         <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

配置 yarn-site.xml

<configuration>
     <!-- 指定 MR 走 shuffle -->
     <property>
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
     </property>
     
     <!-- 指定 ResourceManager 的地址-->
     <property>
         <name>yarn.resourcemanager.hostname</name>
         <value>Slave1</value>
     </property>
     
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Slave1:8088</value>
      </property>
</configuration>

配置mapred-site.xm

<configuration>
    <!-- 指定 MapReduce 程式運行在 Yarn 上 -->
     <property>
         <name>mapreduce.framework.name</name>
         <value>yarn</value>
     </property>
</configuration>

配置workers

List all worker hostnames or IP addresses in your etc/hadoop/workers file, one per line. Helper scripts (described below) will use the etc/hadoop/workers file to run commands on many hosts at once. It is not used for any of the Java-based Hadoop configuration. In order to use this functionality, ssh trusts (via either passphraseless ssh or some other means, such as Kerberos) must be established for the accounts used to run Hadoop.

Master
Slave1
Slave2

同步所有節點組態檔（這里使用xsync同步檔案腳本）

xsync腳本撰寫并且配置到PATH里可以echo $PATH然后隨便找個地方丟進去

更改執行權限 chmod 777 xsync

隨便建一個檔案測驗是否能傳到其他服務器上

xsync同步可以看到檔案更新

啟動hadoop集群

在Master節點上

首次啟動需要格式化只有首次啟動需要

啟動HDFS

在配置了ResourceManager的節點啟動Yarn

這里是Slave1

在Slave1配置historyserver

查看是否配置完成(與之前畫的圖一樣)

嘗試web頁面是否能打開

hdfs:

http://master:9870

如果在主機里沒有配置master與ip地址關聯的話

http://master的ip地址:9870

同理試試yarn:

http://Slave1:8088

由此，Hadoop集群配置完成

xsync

#!/bin/bash
#1. 判斷引數個數
if [ $# -lt 1 ]
then
     echo Not Enough Arguement!
     exit;
fi
?
#2. 遍歷集群所有機器
for host in Master Slave1 Slave2 ##這里注意更改!!
do
     echo ==================== $host ====================
     #3. 遍歷所有目錄，挨個發送
     for file in $@
     do
         #4. 判斷檔案是否存在
         if [ -e $file ]
             then
                 #5. 獲取父目錄
                 pdir=$(cd -P $(dirname $file); pwd)
                 #6. 獲取當前檔案的名稱
                 fname=$(basename $file)
                 ssh $host "mkdir -p $pdir"
                 rsync -av $pdir/$fname $host:$pdir
             else
                echo $file does not exists!
         fi
     done
done

轉載請註明出處，本文鏈接：https://www.uj5u.com/qita/394123.html

標籤：其他

上一篇：DNS服務器搭建

下一篇：Linux Shell語言考試系統（純Shell腳本，帶虛擬機共享檔案夾配置教程）