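Contents
Testing in local mode
Data and requirements
Defining the Map class
Defining the Reduce class
Defining a custom output type and its sort order
Defining the Partitioner class
Defining the Driver/main class
Results
Testing on a Hadoop cluster
Exporting the jar and uploading it to the cluster
Running the jar (upload the data to HDFS first)
Results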
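Testing in local mode
Data and requirements
MapReduce is a parallel programming model for computing over large datasets. Here it is used to sort and partition a dataset of NBA player statistics.
Data format:
Player, position, height (m), weight, age, years in the league, games played, minutes per game, offense score, defense score, All-Star selection (是 = yes), salary (萬美元, i.e. units of 10,000 USD)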
斯蒂芬-庫里,得分后衛,1.91,86,29,7,79,33.38,31.933,4,是,3468
勒布朗-詹姆斯,大前鋒,2.03,113,32,13,74,37.75,36.14,8,是,3329
保羅-米爾薩普,中鋒,2.03,112,32,10,69,33.95,22.712,7,是,3127
戈登-海沃德,小前鋒,2.03,103,27,6,73,34.45,25.382,5,是,2973
......
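Requirement: sort the records by salary in descending order and partition them by position.
Output format:
Player, position, height, weight, age, offense score, defense score, salary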
凱爾-洛里,控球后衛,1.85m,89,31歲,29.35,4,2870萬美元
邁克-康利,控球后衛,1.85m,79,29歲,26.785,4,2853萬美元
拉塞爾-維斯布魯克,控球后衛,1.91m,91,28歲,42.95,9,2853萬美元
達米安-利拉德,控球后衛,1.91m,88,27歲,32.857,4,2615萬美元
......
戈登-海沃德,小前鋒,2.03m,103,27歲,25.382,5,2973萬美元
德瑪爾-德羅贊,小前鋒,2.01m,99,28歲,31.219,5,2774萬美元
尼古拉斯-巴圖姆,小前鋒,2.03m,91,28歲,21.042,6,2243萬美元
卡瓦伊-萊納德,小前鋒,2.01m,104,26歲,30.024,5,1887萬美元
......
勒布朗-詹姆斯,大前鋒,2.03m,113,32歲,36.14,8,3329萬美元
卡梅隆-安東尼,大前鋒,2.03m,109,33歲,25.298,5,2624萬美元
凱文-杜蘭特,大前鋒,2.06m,109,29歲,29.919,9,2500萬美元
奧托-波特,大前鋒,2.03m,93,24歲,15.952,5,2477萬美元
......
保羅-米爾薩普,中鋒,2.03m,112,32歲,22.712,7,3127萬美元
布雷克-格里芬,中鋒,2.08m,114,28歲,27.488,6,2951萬美元
艾爾-霍弗德,中鋒,2.08m,111,31歲,19.956,6,2773萬美元
安德烈-德拉蒙德,中鋒,2.11m,127,24歲,18.751,11,2378萬美元
......
斯蒂芬-庫里,得分后衛,1.91m,86,29歲,31.933,4,3468萬美元
詹姆斯-哈登,得分后衛,1.96m,100,28歲,41.288,7,2830萬美元
CJ-麥科勒姆,得分后衛,1.91m,86,26歲,26.522,2,2396萬美元
布拉德利-比爾,得分后衛,1.96m,94,24歲,26.568,3,2378萬美元
......
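Before touching Hadoop, the requirement can be sanity-checked in plain Java. The sketch below is a hypothetical, Hadoop-free simulation (the class and method names are invented for illustration; it assumes Java 9+ for `List.of`). It groups the sample input records by position, which is the partitioner's job, and sorts each group by salary from highest to lowest, which is what the key's `compareTo` achieves during the shuffle.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Hadoop-free sketch of what the job computes:
// group records by position, then sort each group by salary, highest first.
public class NbaJobSketch {

    // Groups raw CSV lines by position (column 1) and sorts each group by salary (column 11), descending
    static Map<String, List<String[]>> partitionAndSort(List<String> lines) {
        Map<String, List<String[]>> byPosition = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            byPosition.computeIfAbsent(f[1], p -> new ArrayList<>()).add(f);
        }
        Comparator<String[]> bySalaryDesc =
                Comparator.comparingLong((String[] f) -> Long.parseLong(f[11])).reversed();
        for (List<String[]> group : byPosition.values()) {
            group.sort(bySalaryDesc);
        }
        return byPosition;
    }

    public static void main(String[] args) {
        // The four sample input records from the data-format example
        List<String> lines = List.of(
                "斯蒂芬-庫里,得分后衛,1.91,86,29,7,79,33.38,31.933,4,是,3468",
                "勒布朗-詹姆斯,大前鋒,2.03,113,32,13,74,37.75,36.14,8,是,3329",
                "保羅-米爾薩普,中鋒,2.03,112,32,10,69,33.95,22.712,7,是,3127",
                "戈登-海沃德,小前鋒,2.03,103,27,6,73,34.45,25.382,5,是,2973");
        partitionAndSort(lines).forEach((pos, group) -> {
            for (String[] f : group) {
                System.out.println(pos + " -> " + f[0] + " " + f[11]);
            }
        });
    }
}
```

On a real cluster each group becomes one reduce task's output file, so the sort only has to hold within a partition, not across the whole dataset.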
Defining the Map class
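// The classes in this post can all live in DriverNBA.java (only DriverNBA is public),
// so the imports for the whole file go at the top
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Parses one CSV line and emits the selected fields as the key; the value is an empty Text
class MapperNBA extends Mapper<LongWritable, Text, SerializeNBA, Text> {
    // Reused for every record: context.write() serializes them immediately
    SerializeNBA k = new SerializeNBA();
    Text v = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] split = value.toString().split(",");
        // Skip blank or truncated lines instead of failing the task
        if (split.length < 12) {
            return;
        }
        // Columns used: 0 name, 1 position, 2 height, 3 weight, 4 age, 8 offense, 9 defense, 11 salary
        k.setName(split[0]);
        k.setPos(split[1]);
        k.setHeight(Double.parseDouble(split[2]));
        k.setWeight(Long.parseLong(split[3]));
        k.setAge(Integer.parseInt(split[4]));
        k.setAttack(Double.parseDouble(split[8]));
        k.setDefend(Integer.parseInt(split[9]));
        k.setIncome(Long.parseLong(split[11]));
        context.write(k, v);
    }
}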
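To see exactly which columns the mapper keeps, the hypothetical helper below (`FieldIndexDemo` is not part of the original job) applies the same `split(",")` and the same indexes to the first sample record. Columns 5 to 7 (years in the league, games played, minutes per game) and 10 (All-Star) are never passed to a setter, so they do not reach the output.

```java
// Hypothetical helper: shows which CSV columns MapperNBA keeps, by index
public class FieldIndexDemo {
    // Indexes read by MapperNBA; 5, 6, 7 and 10 are dropped
    static final int[] KEPT = {0, 1, 2, 3, 4, 8, 9, 11};

    static String[] keptFields(String line) {
        String[] split = line.split(",");
        String[] kept = new String[KEPT.length];
        for (int i = 0; i < KEPT.length; i++) {
            kept[i] = split[KEPT[i]];
        }
        return kept;
    }

    public static void main(String[] args) {
        String line = "斯蒂芬-庫里,得分后衛,1.91,86,29,7,79,33.38,31.933,4,是,3468";
        // prints 斯蒂芬-庫里,得分后衛,1.91,86,29,31.933,4,3468
        System.out.println(String.join(",", keptFields(line)));
    }
}
```

Apart from the units that `SerializeNBA.toString()` appends (m, 歲, 萬美元), this matches 斯蒂芬-庫里's line in the expected output.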
Defining the Reduce class
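// Keys arrive already sorted by salary (descending); each record is written out unchanged
class ReducerNBA extends Reducer<SerializeNBA, Text, SerializeNBA, Text> {
    @Override
    protected void reduce(SerializeNBA key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Players with equal salaries compare as equal and share one reduce() call.
        // Hadoop refills `key` as the value iterator advances, so each player is still written.
        for (Text value : values) {
            context.write(key, value);
        }
    }
}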
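One subtlety in this reducer: `compareTo` returns 0 for equal salaries, so the framework treats 邁克-康利 and 拉塞爾-維斯布魯克 (both 2853) as one key and hands them to a single `reduce()` call, where the loop over `values` writes both. The sketch below reproduces only that grouping step in plain Java, using the point-guard salaries from the sample output. `ReduceGroupingSketch` and `Player` are hypothetical names, and the `record` needs Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of reduce-side grouping: consecutive sorted keys that
// compare as equal (same salary here) end up in one reduce() call.
public class ReduceGroupingSketch {

    record Player(String name, long income) {}

    // Same ordering as SerializeNBA.compareTo: higher salary first, equal salary -> 0
    static int compare(Player a, Player b) {
        return Long.compare(b.income(), a.income());
    }

    // Splits an already-sorted list into runs of keys that compare as equal
    static List<List<Player>> group(List<Player> sorted) {
        List<List<Player>> groups = new ArrayList<>();
        for (Player p : sorted) {
            if (groups.isEmpty() || compare(groups.get(groups.size() - 1).get(0), p) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(p);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Player> pointGuards = new ArrayList<>(List.of(
                new Player("達米安-利拉德", 2615),
                new Player("邁克-康利", 2853),
                new Player("凱爾-洛里", 2870),
                new Player("拉塞爾-維斯布魯克", 2853)));
        pointGuards.sort(ReduceGroupingSketch::compare);
        for (List<Player> g : group(pointGuards)) {
            System.out.println(g); // one line per reduce() call
        }
    }
}
```

If one output line per `reduce()` call were ever needed, adding a tie-breaker such as the player name to `compareTo` would keep equal salaries from being grouped.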
Defining a custom output type and its sort order
// Custom key type: holds one output record, controls how it is serialized, and defines the sort order
class SerializeNBA implements WritableComparable<SerializeNBA> {
    private String name;
    private String pos;
    private double height;
    private long weight;
    private int age;
    private double attack;
    private int defend;
    private long income;

    // Hadoop instantiates keys by reflection, so a public no-argument constructor is required
    public SerializeNBA() {}

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getPos() { return pos; }
    public void setPos(String pos) { this.pos = pos; }
    public double getHeight() { return height; }
    public void setHeight(double height) { this.height = height; }
    public long getWeight() { return weight; }
    public void setWeight(long weight) { this.weight = weight; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public double getAttack() { return attack; }
    public void setAttack(double attack) { this.attack = attack; }
    public int getDefend() { return defend; }
    public void setDefend(int defend) { this.defend = defend; }
    public long getIncome() { return income; }
    public void setIncome(long income) { this.income = income; }

    // write() and readFields() serialize and deserialize the fields for transfer between
    // map and reduce tasks; readFields() must read them in exactly the order write() wrote them
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(name);
        dataOutput.writeUTF(pos);
        dataOutput.writeDouble(height);
        dataOutput.writeLong(weight);
        dataOutput.writeInt(age);
        dataOutput.writeDouble(attack);
        dataOutput.writeInt(defend);
        dataOutput.writeLong(income);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.name = dataInput.readUTF();
        this.pos = dataInput.readUTF();
        this.height = dataInput.readDouble();
        this.weight = dataInput.readLong();
        this.age = dataInput.readInt();
        this.attack = dataInput.readDouble();
        this.defend = dataInput.readInt();
        this.income = dataInput.readLong();
    }

    // Output line format; 歲 = years old, 萬美元 = units of 10,000 US dollars
    @Override
    public String toString() {
        return name + "," +
                pos + "," +
                height + "m," +
                weight + "," +
                age + "歲," +
                attack + "," +
                defend + "," +
                income + "萬美元";
    }

    // Sorts by salary, highest first; equal salaries compare as 0 and are grouped into one reduce() call
    @Override
    public int compareTo(SerializeNBA o) {
        return Long.compare(o.income, this.income);
    }
}
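Two properties of this class are easy to break: `readFields()` has to read the fields in exactly the order `write()` wrote them, and `compareTo()` has to rank the higher salary first. The sketch below checks both without a Hadoop dependency, using `java.io` streams in place of Hadoop's own buffers. `SerializationRoundTrip` and its trimmed four-field `Player` class are hypothetical stand-ins for `SerializeNBA`, not the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical Hadoop-free stand-in for SerializeNBA: round-trips a record
// through bytes and checks the descending salary order.
public class SerializationRoundTrip {

    static class Player implements Comparable<Player> {
        // Default to "" because writeUTF(null) throws NullPointerException
        String name = "";
        String pos = "";
        double height;
        long income;

        void write(DataOutput out) throws IOException {
            out.writeUTF(name);
            out.writeUTF(pos);
            out.writeDouble(height);
            out.writeLong(income);
        }

        void readFields(DataInput in) throws IOException {
            // Same order as write(); swapping two reads would corrupt the record
            name = in.readUTF();
            pos = in.readUTF();
            height = in.readDouble();
            income = in.readLong();
        }

        @Override
        public int compareTo(Player o) {
            return Long.compare(o.income, this.income); // descending by salary
        }
    }

    // Serializes a player to bytes and reads it back into a fresh object
    static Player roundTrip(Player p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        p.write(out);
        out.flush();
        Player copy = new Player();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        Player curry = new Player();
        curry.name = "斯蒂芬-庫里";
        curry.pos = "得分后衛";
        curry.height = 1.91;
        curry.income = 3468;
        Player copy = roundTrip(curry);
        System.out.println(copy.name + "," + copy.pos + "," + copy.height + "," + copy.income);
    }
}
```

The same null caveat applies to `SerializeNBA`: every `String` field must be set before `write()` is called.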
Defining the Partitioner class
// Routes each record to a reduce task (and therefore an output file) by position
class PartitionerNBA extends Partitioner<SerializeNBA, Text> {
    @Override
    public int getPartition(SerializeNBA serializeNBA, Text text, int numPartitions) {
        // toString() produces "name,position,...", so field 1 is the position
        String pos = serializeNBA.toString().split(",")[1];
        // Returns 0-4, so the driver must configure 5 reduce tasks
        switch (pos) {
            case "大前鋒": return 0;   // power forward
            case "小前鋒": return 1;   // small forward
            case "控球后衛": return 2; // point guard
            case "中鋒": return 3;     // center
            default: return 4;         // everything else, e.g. shooting guard (得分后衛)
        }
    }
}
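Because `PartitionerNBA` can return 4, the job needs at least five reduce tasks. As far as I know, Hadoop's map task rejects a partition number outside 0 to numReduceTasks - 1 with an "Illegal partition" error, and with only one reduce task it skips a custom partitioner entirely. The hypothetical `PartitionCheck` class below copies the partition rules into a pure function so that bound can be checked without a cluster.

```java
import java.util.List;

// Hypothetical pure-Java copy of the PartitionerNBA rules plus a bounds check
public class PartitionCheck {

    static int partitionFor(String pos) {
        switch (pos) {
            case "大前鋒": return 0;
            case "小前鋒": return 1;
            case "控球后衛": return 2;
            case "中鋒": return 3;
            default: return 4;
        }
    }

    // True if every position maps to a valid reduce-task index in [0, numReduceTasks)
    static boolean fitsReducers(List<String> positions, int numReduceTasks) {
        for (String pos : positions) {
            int p = partitionFor(pos);
            if (p < 0 || p >= numReduceTasks) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> positions = List.of("大前鋒", "小前鋒", "控球后衛", "中鋒", "得分后衛");
        System.out.println(fitsReducers(positions, 5)); // true
        System.out.println(fitsReducers(positions, 4)); // false: 得分后衛 maps to 4
    }
}
```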
Defining the Driver/main class
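// Configures and submits the job. Pass <input> <output> as arguments on a cluster;
// without arguments it falls back to the local Windows paths used for the local test.
public class DriverNBA {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(DriverNBA.class);

        job.setMapperClass(MapperNBA.class);
        job.setReducerClass(ReducerNBA.class);

        // Map output and final output both use SerializeNBA as the key and an (empty) Text as the value
        job.setMapOutputKeyClass(SerializeNBA.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(SerializeNBA.class);
        job.setOutputValueClass(Text.class);

        // PartitionerNBA returns 0-4, so one reduce task per partition
        job.setPartitionerClass(PartitionerNBA.class);
        job.setNumReduceTasks(5);

        // The output directory must not exist yet, or the job fails at submission
        String in = args.length >= 2 ? args[0] : "E:\\com.raymone.hadoop\\data\\NBA";
        String out = args.length >= 2 ? args[1] : "E:\\com.raymone.hadoop\\data\\NBA_OUT";
        FileInputFormat.setInputPaths(job, new Path(in));
        FileOutputFormat.setOutputPath(job, new Path(out));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}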
Results

Testing on a Hadoop cluster
Exporting the jar and uploading it to the cluster
Upload the jar and rename it to nba.jar (mv /com.raymone.hadoop-1.0-SNAPSHOT.jar /nba.jar).

Running the jar (upload the data to HDFS first)

Results
Open the NameNode web UI on port 9870:
The output is now stored in HDFS, spread across the DataNodes.
From there you can output whichever data you need:
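For example, to post-process the results outside Hadoop, note that each output line is `SerializeNBA.toString()` followed by a tab: `TextOutputFormat` writes key, tab, value, and the value here is an empty `Text`. The hypothetical `OutputLineParser` below (not part of the original post) strips that and converts the units back. 萬美元 means 10,000 US dollars, so 2870萬美元 is 28,700,000 dollars.

```java
// Hypothetical post-processing helper for lines of the job's output files
public class OutputLineParser {

    // Drops the "\t<empty value>" part that TextOutputFormat appends after the key
    static String[] fields(String outputLine) {
        return outputLine.split("\t", 2)[0].split(",");
    }

    // Salary (column 7) is in 萬美元, i.e. units of 10,000 US dollars; returns whole dollars
    static long salaryInDollars(String outputLine) {
        return Long.parseLong(fields(outputLine)[7].replace("萬美元", "")) * 10_000L;
    }

    // Height (column 2) carries the "m" suffix added by SerializeNBA.toString()
    static double heightInMeters(String outputLine) {
        return Double.parseDouble(fields(outputLine)[2].replace("m", ""));
    }

    public static void main(String[] args) {
        String line = "凱爾-洛里,控球后衛,1.85m,89,31歲,29.35,4,2870萬美元\t";
        System.out.println(salaryInDollars(line)); // 28700000
        System.out.println(heightInMeters(line));  // 1.85
    }
}
```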

Please credit the source when reposting. Article link: https://www.uj5u.com/qita/347090.html
