data.mapPartitionsWithIndex { (index, points) =>
  // want to read the data of partition index + i here
}
How can I access the data of partition index+i inside the braces? I'm a beginner, please advise!
uj5u.com helpful user replied:
Why do you want to access another partition's data? mapPartitions(func) and mapPartitionsWithIndex(func) exist for performance optimization: they let your function run once PER partition, which is why the function type must be Iterator[T] => Iterator[U] (or (Int, Iterator[T]) => Iterator[U] for mapPartitionsWithIndex, where the Int is the partition index). You get the whole partition's data through one iterator, but you should not, and cannot, access other partitions' data.
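To make the per-partition contract concrete, here is a minimal sketch. The object name, the helper partitionSums, the sample data, and the local[2] master are all assumptions made for illustration; they do not come from the original question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object PartitionLocalExample {
  // Hypothetical helper: the function passed to mapPartitionsWithIndex runs
  // once per partition and sees only that partition's index and iterator.
  def partitionSums(data: RDD[Int]): RDD[(Int, Int)] =
    data.mapPartitionsWithIndex { (index, points) =>
      // `points` iterates over this partition's elements only.
      Iterator((index, points.sum))
    }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("partition-local-example")
      .getOrCreate()

    // Hypothetical sample data spread over 3 partitions.
    val data = spark.sparkContext.parallelize(1 to 9, numSlices = 3)

    // Prints one (partitionIndex, sum) pair per partition.
    partitionSums(data).collect().foreach(println)

    spark.stop()
  }
}
```

Inside the closure there is no handle to any other partition: anything that needs data from elsewhere has to be expressed as a separate RDD operation.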
uj5u.com helpful user replied:
Why do you want to access another partition's data? mapPartitions(func) and mapPartitionsWithIndex(func) are meant for optimization. They let you visit each partition in turn, which is why the function is handed an Iterator: through it you can traverse all the data within that partition, but one partition's iterator cannot reach the data of other partitions.
uj5u.com helpful user replied:
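The replies above already explain it clearly. If you need to access data across rows, use a self join.
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/69113.html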
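One possible reading of the self-join suggestion, applied to the original "partition index + i" question, is sketched below. It assumes each partition is small enough to be held in memory as a Vector, and the names NeighborPartitionJoin and pairWithOffset are hypothetical. It is not necessarily what the replier had in mind, and the join causes a shuffle.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object NeighborPartitionJoin {
  // Hypothetical helper: pairs the contents of partition p with the
  // contents of partition p + i by joining the RDD with a re-keyed copy
  // of itself.
  def pairWithOffset(data: RDD[Int], i: Int): RDD[(Int, (Vector[Int], Vector[Int]))] = {
    // Key each partition's contents by its index.
    // Assumption: a whole partition fits in memory as a Vector.
    val byPartition = data.mapPartitionsWithIndex { (index, points) =>
      Iterator((index, points.toVector))
    }
    // Re-key partition q as q - i, so it lines up with partition p = q - i.
    val shifted = byPartition.map { case (index, pts) => (index - i, pts) }
    // Result: (p, (points of partition p, points of partition p + i)).
    // Partitions without a partner at p + i are dropped by the inner join.
    byPartition.join(shifted)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("neighbor-partition-join")
      .getOrCreate()

    // Hypothetical sample data spread over 3 partitions.
    val data = spark.sparkContext.parallelize(1 to 9, numSlices = 3)

    pairWithOffset(data, 1).collect().foreach(println)

    spark.stop()
  }
}
```

If the data already carries a natural key (for example a row id), keying by that and joining on key and key + i avoids materializing whole partitions.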
Tags: Spark
