val lines = rdd.map(_._2)
  .map(line => {
    ... ...
  })
  .reduceByKey((a, b) => a.merge(b))
  .map {
    case ((ip, dateTime, query, status), stats) => {
      ... ...
      (ip, dateTime, query, count, bytes)
    }
  }
At this point I have a dataset of tuples (ip, dateTime, query, count, bytes).
Question: how do I obtain:
1) for each ip, the summed count and the summed bytes;
2) for each query, the summed count and the summed bytes?
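One possible approach, offered as a sketch rather than a definitive answer: re-key the tuples by the field of interest and sum the (count, bytes) pairs with reduceByKey, once per grouping. The element types (String for ip, dateTime and query, Long for count and bytes), the names Record and totalsBy, and the sample rows are all assumptions made for illustration; the original post does not specify them.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object Totals {
  // Tuple layout from the question: (ip, dateTime, query, count, bytes).
  // The field types are assumed; adjust them to match the real data.
  type Record = (String, String, String, Long, Long)

  // Hypothetical helper: sum count and bytes per key, where `key`
  // selects the grouping field (ip or query) from each record.
  def totalsBy(records: RDD[Record])(key: Record => String): RDD[(String, (Long, Long))] =
    records
      .map(r => (key(r), (r._4, r._5)))
      .reduceByKey { case ((c1, b1), (c2, b2)) => (c1 + c2, b1 + b2) }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("totals-sketch")
    val sc = new SparkContext(conf)

    // Made-up sample rows standing in for `lines` from the question.
    val lines: RDD[Record] = sc.parallelize(Seq(
      ("10.0.0.1", "2015-01-01 00:00", "/search?q=a", 2L, 100L),
      ("10.0.0.1", "2015-01-01 00:01", "/search?q=b", 3L, 250L),
      ("10.0.0.2", "2015-01-01 00:01", "/search?q=a", 1L, 40L)
    ))

    // The same RDD feeds two aggregations, so cache it to avoid recomputation.
    lines.cache()

    val byIp = totalsBy(lines)(_._1)    // 1) totals per ip
    val byQuery = totalsBy(lines)(_._3) // 2) totals per query

    byIp.collect().foreach(println)
    byQuery.collect().foreach(println)

    sc.stop()
  }
}
```

Each grouping is a separate map plus reduceByKey, so the two results are independent RDDs. If `lines` comes from a DStream, the same two transformations can be applied inside the per-batch function instead of on a single RDD.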
