最近剛學flink自己,寫了 ```StreamExecutionEnvironment env =StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Tuple2<String,Integer>> ds =env.fromElements(Tuple2.of("tom", 12),Tuple2.of("jack", 20)
,Tuple2.of("rose", 18),Tuple2.of("tom",20),Tuple2.of("jack", 32)
,Tuple2.of("rose", 11),Tuple2.of("frank", 12),Tuple2.of("putin", 67)
,Tuple2.of("frank", 22),Tuple2.of("tom", 18),Tuple2.of("putin", 99)
,Tuple2.of("jack", 28),Tuple2.of("rose", 19),Tuple2.of("putin", 33)
,Tuple2.of("rose", 88),Tuple2.of("jack", 47),Tuple2.of("tom", 66));
KeyedStream<Tuple2<String,Integer>, String> ks =ds.keyBy(new KeySelector<Tuple2<String,Integer>, String>() {
@Override
public String getKey(Tuple2<String, Integer> value) throws Exception {
String key =value.f0;
return key;
}
}); ```
但是print列印之后發現rose和frank被分到同一組了,請問這是為啥?我看keby說是通過hashcode實作的,但是這2個的hashcode并不一樣啊?
還有就是```DataStream<Tuple2<String, Integer>> st =ks.max(1);```之后print發現回傳的并不是各個分組的Integer最大值的元素,自己搞了半天沒弄明白,請問有哪位大佬能解釋下么?
轉載請註明出處,本文鏈接:https://www.uj5u.com/qita/20589.html
標籤:Spark
上一篇:k8s加入節點沒反應的問題
