目錄
一.Shuffle Write框架 1.不聚合,不排序(BypassMergeSortShuffleWriter) 2.不聚合,但排序(SortShuffleWriter) 3.聚合,排序或者不排序
二.Shuffle Read框架 1.不聚合,不按key排序 2.不聚合,按key排序 3.聚合,排序或者不排序
三.支持高效聚合和排序的資料結構 四.Spark和MapReduce的shuffle機制對比 五.總結
一.Shuffle Write框架
框架通用流程:
<style>#mermaid-svg-Nnmsk7TwgRZLRuGP .label{font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family);fill:#333;color:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .label text{fill:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .node rect,#mermaid-svg-Nnmsk7TwgRZLRuGP .node circle,#mermaid-svg-Nnmsk7TwgRZLRuGP .node ellipse,#mermaid-svg-Nnmsk7TwgRZLRuGP .node polygon,#mermaid-svg-Nnmsk7TwgRZLRuGP .node path{fill:#ECECFF;stroke:#9370db;stroke-width:1px}#mermaid-svg-Nnmsk7TwgRZLRuGP .node .label{text-align:center;fill:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .node.clickable{cursor:pointer}#mermaid-svg-Nnmsk7TwgRZLRuGP .arrowheadPath{fill:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .edgePath .path{stroke:#333;stroke-width:1.5px}#mermaid-svg-Nnmsk7TwgRZLRuGP .flowchart-link{stroke:#333;fill:none}#mermaid-svg-Nnmsk7TwgRZLRuGP .edgeLabel{background-color:#e8e8e8;text-align:center}#mermaid-svg-Nnmsk7TwgRZLRuGP .edgeLabel rect{opacity:0.9}#mermaid-svg-Nnmsk7TwgRZLRuGP .edgeLabel span{color:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .cluster rect{fill:#ffffde;stroke:#aa3;stroke-width:1px}#mermaid-svg-Nnmsk7TwgRZLRuGP .cluster text{fill:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family);font-size:12px;background:#ffffde;border:1px solid #aa3;border-radius:2px;pointer-events:none;z-index:100}#mermaid-svg-Nnmsk7TwgRZLRuGP .actor{stroke:#ccf;fill:#ECECFF}#mermaid-svg-Nnmsk7TwgRZLRuGP text.actor>tspan{fill:#000;stroke:none}#mermaid-svg-Nnmsk7TwgRZLRuGP .actor-line{stroke:grey}#mermaid-svg-Nnmsk7TwgRZLRuGP .messageLine0{stroke-width:1.5;stroke-dasharray:none;stroke:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .messageLine1{stroke-width:1.5;stroke-dasharray:2, 2;stroke:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP #arrowhead path{fill:#333;stroke:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .sequenceNumber{fill:#fff}#mermaid-svg-Nnmsk7TwgRZLRuGP #sequencenumber{fill:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP #crosshead path{fill:#333;stroke:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .messageText{fill:#333;stroke:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .labelBox{stroke:#ccf;fill:#ECECFF}#mermaid-svg-Nnmsk7TwgRZLRuGP .labelText,#mermaid-svg-Nnmsk7TwgRZLRuGP .labelText>tspan{fill:#000;stroke:none}#mermaid-svg-Nnmsk7TwgRZLRuGP .loopText,#mermaid-svg-Nnmsk7TwgRZLRuGP .loopText>tspan{fill:#000;stroke:none}#mermaid-svg-Nnmsk7TwgRZLRuGP .loopLine{stroke-width:2px;stroke-dasharray:2, 2;stroke:#ccf;fill:#ccf}#mermaid-svg-Nnmsk7TwgRZLRuGP .note{stroke:#aa3;fill:#fff5ad}#mermaid-svg-Nnmsk7TwgRZLRuGP .noteText,#mermaid-svg-Nnmsk7TwgRZLRuGP .noteText>tspan{fill:#000;stroke:none}#mermaid-svg-Nnmsk7TwgRZLRuGP .activation0{fill:#f4f4f4;stroke:#666}#mermaid-svg-Nnmsk7TwgRZLRuGP .activation1{fill:#f4f4f4;stroke:#666}#mermaid-svg-Nnmsk7TwgRZLRuGP .activation2{fill:#f4f4f4;stroke:#666}#mermaid-svg-Nnmsk7TwgRZLRuGP .mermaid-main-font{font-family:"trebuchet ms", verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-Nnmsk7TwgRZLRuGP .section{stroke:none;opacity:0.2}#mermaid-svg-Nnmsk7TwgRZLRuGP .section0{fill:rgba(102,102,255,0.49)}#mermaid-svg-Nnmsk7TwgRZLRuGP .section2{fill:#fff400}#mermaid-svg-Nnmsk7TwgRZLRuGP .section1,#mermaid-svg-Nnmsk7TwgRZLRuGP .section3{fill:#fff;opacity:0.2}#mermaid-svg-Nnmsk7TwgRZLRuGP .sectionTitle0{fill:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .sectionTitle1{fill:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .sectionTitle2{fill:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .sectionTitle3{fill:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .sectionTitle{text-anchor:start;font-size:11px;text-height:14px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-Nnmsk7TwgRZLRuGP .grid .tick{stroke:#d3d3d3;opacity:0.8;shape-rendering:crispEdges}#mermaid-svg-Nnmsk7TwgRZLRuGP .grid .tick text{font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-Nnmsk7TwgRZLRuGP .grid path{stroke-width:0}#mermaid-svg-Nnmsk7TwgRZLRuGP .today{fill:none;stroke:red;stroke-width:2px}#mermaid-svg-Nnmsk7TwgRZLRuGP .task{stroke-width:2}#mermaid-svg-Nnmsk7TwgRZLRuGP .taskText{text-anchor:middle;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-Nnmsk7TwgRZLRuGP .taskText:not([font-size]){font-size:11px}#mermaid-svg-Nnmsk7TwgRZLRuGP .taskTextOutsideRight{fill:#000;text-anchor:start;font-size:11px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-Nnmsk7TwgRZLRuGP .taskTextOutsideLeft{fill:#000;text-anchor:end;font-size:11px}#mermaid-svg-Nnmsk7TwgRZLRuGP .task.clickable{cursor:pointer}#mermaid-svg-Nnmsk7TwgRZLRuGP .taskText.clickable{cursor:pointer;fill:#003163 !important;font-weight:bold}#mermaid-svg-Nnmsk7TwgRZLRuGP .taskTextOutsideLeft.clickable{cursor:pointer;fill:#003163 !important;font-weight:bold}#mermaid-svg-Nnmsk7TwgRZLRuGP .taskTextOutsideRight.clickable{cursor:pointer;fill:#003163 !important;font-weight:bold}#mermaid-svg-Nnmsk7TwgRZLRuGP .taskText0,#mermaid-svg-Nnmsk7TwgRZLRuGP .taskText1,#mermaid-svg-Nnmsk7TwgRZLRuGP .taskText2,#mermaid-svg-Nnmsk7TwgRZLRuGP .taskText3{fill:#fff}#mermaid-svg-Nnmsk7TwgRZLRuGP .task0,#mermaid-svg-Nnmsk7TwgRZLRuGP .task1,#mermaid-svg-Nnmsk7TwgRZLRuGP .task2,#mermaid-svg-Nnmsk7TwgRZLRuGP .task3{fill:#8a90dd;stroke:#534fbc}#mermaid-svg-Nnmsk7TwgRZLRuGP .taskTextOutside0,#mermaid-svg-Nnmsk7TwgRZLRuGP .taskTextOutside2{fill:#000}#mermaid-svg-Nnmsk7TwgRZLRuGP .taskTextOutside1,#mermaid-svg-Nnmsk7TwgRZLRuGP .taskTextOutside3{fill:#000}#mermaid-svg-Nnmsk7TwgRZLRuGP .active0,#mermaid-svg-Nnmsk7TwgRZLRuGP .active1,#mermaid-svg-Nnmsk7TwgRZLRuGP .active2,#mermaid-svg-Nnmsk7TwgRZLRuGP .active3{fill:#bfc7ff;stroke:#534fbc}#mermaid-svg-Nnmsk7TwgRZLRuGP .activeText0,#mermaid-svg-Nnmsk7TwgRZLRuGP .activeText1,#mermaid-svg-Nnmsk7TwgRZLRuGP .activeText2,#mermaid-svg-Nnmsk7TwgRZLRuGP .activeText3{fill:#000 !important}#mermaid-svg-Nnmsk7TwgRZLRuGP .done0,#mermaid-svg-Nnmsk7TwgRZLRuGP .done1,#mermaid-svg-Nnmsk7TwgRZLRuGP .done2,#mermaid-svg-Nnmsk7TwgRZLRuGP .done3{stroke:grey;fill:#d3d3d3;stroke-width:2}#mermaid-svg-Nnmsk7TwgRZLRuGP .doneText0,#mermaid-svg-Nnmsk7TwgRZLRuGP .doneText1,#mermaid-svg-Nnmsk7TwgRZLRuGP .doneText2,#mermaid-svg-Nnmsk7TwgRZLRuGP .doneText3{fill:#000 !important}#mermaid-svg-Nnmsk7TwgRZLRuGP .crit0,#mermaid-svg-Nnmsk7TwgRZLRuGP .crit1,#mermaid-svg-Nnmsk7TwgRZLRuGP .crit2,#mermaid-svg-Nnmsk7TwgRZLRuGP .crit3{stroke:#f88;fill:red;stroke-width:2}#mermaid-svg-Nnmsk7TwgRZLRuGP .activeCrit0,#mermaid-svg-Nnmsk7TwgRZLRuGP .activeCrit1,#mermaid-svg-Nnmsk7TwgRZLRuGP .activeCrit2,#mermaid-svg-Nnmsk7TwgRZLRuGP .activeCrit3{stroke:#f88;fill:#bfc7ff;stroke-width:2}#mermaid-svg-Nnmsk7TwgRZLRuGP .doneCrit0,#mermaid-svg-Nnmsk7TwgRZLRuGP .doneCrit1,#mermaid-svg-Nnmsk7TwgRZLRuGP .doneCrit2,#mermaid-svg-Nnmsk7TwgRZLRuGP .doneCrit3{stroke:#f88;fill:#d3d3d3;stroke-width:2;cursor:pointer;shape-rendering:crispEdges}#mermaid-svg-Nnmsk7TwgRZLRuGP .milestone{transform:rotate(45deg) scale(0.8, 0.8)}#mermaid-svg-Nnmsk7TwgRZLRuGP .milestoneText{font-style:italic}#mermaid-svg-Nnmsk7TwgRZLRuGP .doneCritText0,#mermaid-svg-Nnmsk7TwgRZLRuGP .doneCritText1,#mermaid-svg-Nnmsk7TwgRZLRuGP .doneCritText2,#mermaid-svg-Nnmsk7TwgRZLRuGP .doneCritText3{fill:#000 !important}#mermaid-svg-Nnmsk7TwgRZLRuGP .activeCritText0,#mermaid-svg-Nnmsk7TwgRZLRuGP .activeCritText1,#mermaid-svg-Nnmsk7TwgRZLRuGP .activeCritText2,#mermaid-svg-Nnmsk7TwgRZLRuGP .activeCritText3{fill:#000 !important}#mermaid-svg-Nnmsk7TwgRZLRuGP .titleText{text-anchor:middle;font-size:18px;fill:#000;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-Nnmsk7TwgRZLRuGP g.classGroup text{fill:#9370db;stroke:none;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family);font-size:10px}#mermaid-svg-Nnmsk7TwgRZLRuGP g.classGroup text .title{font-weight:bolder}#mermaid-svg-Nnmsk7TwgRZLRuGP g.clickable{cursor:pointer}#mermaid-svg-Nnmsk7TwgRZLRuGP g.classGroup rect{fill:#ECECFF;stroke:#9370db}#mermaid-svg-Nnmsk7TwgRZLRuGP g.classGroup line{stroke:#9370db;stroke-width:1}#mermaid-svg-Nnmsk7TwgRZLRuGP .classLabel .box{stroke:none;stroke-width:0;fill:#ECECFF;opacity:0.5}#mermaid-svg-Nnmsk7TwgRZLRuGP .classLabel .label{fill:#9370db;font-size:10px}#mermaid-svg-Nnmsk7TwgRZLRuGP .relation{stroke:#9370db;stroke-width:1;fill:none}#mermaid-svg-Nnmsk7TwgRZLRuGP .dashed-line{stroke-dasharray:3}#mermaid-svg-Nnmsk7TwgRZLRuGP #compositionStart{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-Nnmsk7TwgRZLRuGP #compositionEnd{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-Nnmsk7TwgRZLRuGP #aggregationStart{fill:#ECECFF;stroke:#9370db;stroke-width:1}#mermaid-svg-Nnmsk7TwgRZLRuGP #aggregationEnd{fill:#ECECFF;stroke:#9370db;stroke-width:1}#mermaid-svg-Nnmsk7TwgRZLRuGP #dependencyStart{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-Nnmsk7TwgRZLRuGP #dependencyEnd{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-Nnmsk7TwgRZLRuGP #extensionStart{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-Nnmsk7TwgRZLRuGP #extensionEnd{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-Nnmsk7TwgRZLRuGP .commit-id,#mermaid-svg-Nnmsk7TwgRZLRuGP .commit-msg,#mermaid-svg-Nnmsk7TwgRZLRuGP .branch-label{fill:lightgrey;color:lightgrey;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-Nnmsk7TwgRZLRuGP .pieTitleText{text-anchor:middle;font-size:25px;fill:#000;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-Nnmsk7TwgRZLRuGP .slice{font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-Nnmsk7TwgRZLRuGP g.stateGroup text{fill:#9370db;stroke:none;font-size:10px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-Nnmsk7TwgRZLRuGP g.stateGroup text{fill:#9370db;fill:#333;stroke:none;font-size:10px}#mermaid-svg-Nnmsk7TwgRZLRuGP g.statediagram-cluster .cluster-label text{fill:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP g.stateGroup .state-title{font-weight:bolder;fill:#000}#mermaid-svg-Nnmsk7TwgRZLRuGP g.stateGroup rect{fill:#ECECFF;stroke:#9370db}#mermaid-svg-Nnmsk7TwgRZLRuGP g.stateGroup line{stroke:#9370db;stroke-width:1}#mermaid-svg-Nnmsk7TwgRZLRuGP .transition{stroke:#9370db;stroke-width:1;fill:none}#mermaid-svg-Nnmsk7TwgRZLRuGP .stateGroup .composit{fill:white;border-bottom:1px}#mermaid-svg-Nnmsk7TwgRZLRuGP .stateGroup .alt-composit{fill:#e0e0e0;border-bottom:1px}#mermaid-svg-Nnmsk7TwgRZLRuGP .state-note{stroke:#aa3;fill:#fff5ad}#mermaid-svg-Nnmsk7TwgRZLRuGP .state-note text{fill:black;stroke:none;font-size:10px}#mermaid-svg-Nnmsk7TwgRZLRuGP .stateLabel .box{stroke:none;stroke-width:0;fill:#ECECFF;opacity:0.7}#mermaid-svg-Nnmsk7TwgRZLRuGP .edgeLabel text{fill:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .stateLabel text{fill:#000;font-size:10px;font-weight:bold;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-Nnmsk7TwgRZLRuGP .node circle.state-start{fill:black;stroke:black}#mermaid-svg-Nnmsk7TwgRZLRuGP .node circle.state-end{fill:black;stroke:white;stroke-width:1.5}#mermaid-svg-Nnmsk7TwgRZLRuGP #statediagram-barbEnd{fill:#9370db}#mermaid-svg-Nnmsk7TwgRZLRuGP .statediagram-cluster rect{fill:#ECECFF;stroke:#9370db;stroke-width:1px}#mermaid-svg-Nnmsk7TwgRZLRuGP .statediagram-cluster rect.outer{rx:5px;ry:5px}#mermaid-svg-Nnmsk7TwgRZLRuGP .statediagram-state .divider{stroke:#9370db}#mermaid-svg-Nnmsk7TwgRZLRuGP .statediagram-state .title-state{rx:5px;ry:5px}#mermaid-svg-Nnmsk7TwgRZLRuGP .statediagram-cluster.statediagram-cluster .inner{fill:white}#mermaid-svg-Nnmsk7TwgRZLRuGP .statediagram-cluster.statediagram-cluster-alt .inner{fill:#e0e0e0}#mermaid-svg-Nnmsk7TwgRZLRuGP .statediagram-cluster .inner{rx:0;ry:0}#mermaid-svg-Nnmsk7TwgRZLRuGP .statediagram-state rect.basic{rx:5px;ry:5px}#mermaid-svg-Nnmsk7TwgRZLRuGP .statediagram-state rect.divider{stroke-dasharray:10,10;fill:#efefef}#mermaid-svg-Nnmsk7TwgRZLRuGP .note-edge{stroke-dasharray:5}#mermaid-svg-Nnmsk7TwgRZLRuGP .statediagram-note rect{fill:#fff5ad;stroke:#aa3;stroke-width:1px;rx:0;ry:0}:root{--mermaid-font-family: '"trebuchet ms", verdana, arial';--mermaid-font-family: "Comic Sans MS", "Comic Sans", cursive}#mermaid-svg-Nnmsk7TwgRZLRuGP .error-icon{fill:#522}#mermaid-svg-Nnmsk7TwgRZLRuGP .error-text{fill:#522;stroke:#522}#mermaid-svg-Nnmsk7TwgRZLRuGP .edge-thickness-normal{stroke-width:2px}#mermaid-svg-Nnmsk7TwgRZLRuGP .edge-thickness-thick{stroke-width:3.5px}#mermaid-svg-Nnmsk7TwgRZLRuGP .edge-pattern-solid{stroke-dasharray:0}#mermaid-svg-Nnmsk7TwgRZLRuGP .edge-pattern-dashed{stroke-dasharray:3}#mermaid-svg-Nnmsk7TwgRZLRuGP .edge-pattern-dotted{stroke-dasharray:2}#mermaid-svg-Nnmsk7TwgRZLRuGP .marker{fill:#333}#mermaid-svg-Nnmsk7TwgRZLRuGP .marker.cross{stroke:#333}
:root { --mermaid-font-family: "trebuchet ms", verdana, arial;}</style>
<style>#mermaid-svg-Nnmsk7TwgRZLRuGP {
color: rgba(0, 0, 0, 0.75);
font: ;
}</style>
map磁區標記
聚合
排序
磁區輸出
聚合和排序是可選的,
map磁區標記就是在<key,value> record上加上partition資訊,變為<partition,key,value>,
1.不聚合,不排序(BypassMergeSortShuffleWriter)
每個partition都有個buffer,輸出到一個磁區檔案(本地磁盤),最后合并為一個檔案,
優點:速度很快, 缺點:partition過多,buffer就越多,記憶體消耗過大,并且輸出的磁區檔案很多,
常見的算子有:groupByKey(100),partitionBy(100),sortByKey(100),磁區數較少,默認磁區上限引數為200
2.不聚合,但排序(SortShuffleWriter)
排序程序需要Array陣列(PartitonPairBuffer)來進行排序, 可以擴容和spill到磁盤,spill到磁盤的多個檔案再進行全域排序(一個map task的排序),結果輸出到磁盤,
這個可以解決第一種型別的缺點,排序后可直接寫入一個檔案, 常見的算子有:groupByKey(300),partitionBy(300),sortByKey(300),磁區數較多
<key,value> record,變為<(partition,key),value> record,相當于進行雙重排序, spark暫時并沒有實作這個排序的資料操作,用戶可自定義,
3.聚合,排序或者不排序
聚合程序需要類似HashMap的資料結構來進行聚合,
HashMap的key是 “partitionId+key” , <key,value> record 聚合成<key,func(list(value))> ,對list進行func操作,
比如reduceByKey()進行sum()操作,
有兩種聚合方式: 1.兩步聚合,全部存在list中再func ,占用記憶體比較大, 2.在線聚合,每個record加入HashMap就進行func操作,
排序,再用Array陣列,
spark使用了特殊設計的HashMap,即PartitionAppendOnlyMap,可同時聚合和排序,
二.Shuffle Read框架
框架通用流程:
<style>#mermaid-svg-mbeerF3gBl9ZsPk0 .label{font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family);fill:#333;color:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .label text{fill:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .node rect,#mermaid-svg-mbeerF3gBl9ZsPk0 .node circle,#mermaid-svg-mbeerF3gBl9ZsPk0 .node ellipse,#mermaid-svg-mbeerF3gBl9ZsPk0 .node polygon,#mermaid-svg-mbeerF3gBl9ZsPk0 .node path{fill:#ECECFF;stroke:#9370db;stroke-width:1px}#mermaid-svg-mbeerF3gBl9ZsPk0 .node .label{text-align:center;fill:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .node.clickable{cursor:pointer}#mermaid-svg-mbeerF3gBl9ZsPk0 .arrowheadPath{fill:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .edgePath .path{stroke:#333;stroke-width:1.5px}#mermaid-svg-mbeerF3gBl9ZsPk0 .flowchart-link{stroke:#333;fill:none}#mermaid-svg-mbeerF3gBl9ZsPk0 .edgeLabel{background-color:#e8e8e8;text-align:center}#mermaid-svg-mbeerF3gBl9ZsPk0 .edgeLabel rect{opacity:0.9}#mermaid-svg-mbeerF3gBl9ZsPk0 .edgeLabel span{color:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .cluster rect{fill:#ffffde;stroke:#aa3;stroke-width:1px}#mermaid-svg-mbeerF3gBl9ZsPk0 .cluster text{fill:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family);font-size:12px;background:#ffffde;border:1px solid #aa3;border-radius:2px;pointer-events:none;z-index:100}#mermaid-svg-mbeerF3gBl9ZsPk0 .actor{stroke:#ccf;fill:#ECECFF}#mermaid-svg-mbeerF3gBl9ZsPk0 text.actor>tspan{fill:#000;stroke:none}#mermaid-svg-mbeerF3gBl9ZsPk0 .actor-line{stroke:grey}#mermaid-svg-mbeerF3gBl9ZsPk0 .messageLine0{stroke-width:1.5;stroke-dasharray:none;stroke:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .messageLine1{stroke-width:1.5;stroke-dasharray:2, 2;stroke:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 #arrowhead path{fill:#333;stroke:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .sequenceNumber{fill:#fff}#mermaid-svg-mbeerF3gBl9ZsPk0 #sequencenumber{fill:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 #crosshead path{fill:#333;stroke:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .messageText{fill:#333;stroke:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .labelBox{stroke:#ccf;fill:#ECECFF}#mermaid-svg-mbeerF3gBl9ZsPk0 .labelText,#mermaid-svg-mbeerF3gBl9ZsPk0 .labelText>tspan{fill:#000;stroke:none}#mermaid-svg-mbeerF3gBl9ZsPk0 .loopText,#mermaid-svg-mbeerF3gBl9ZsPk0 .loopText>tspan{fill:#000;stroke:none}#mermaid-svg-mbeerF3gBl9ZsPk0 .loopLine{stroke-width:2px;stroke-dasharray:2, 2;stroke:#ccf;fill:#ccf}#mermaid-svg-mbeerF3gBl9ZsPk0 .note{stroke:#aa3;fill:#fff5ad}#mermaid-svg-mbeerF3gBl9ZsPk0 .noteText,#mermaid-svg-mbeerF3gBl9ZsPk0 .noteText>tspan{fill:#000;stroke:none}#mermaid-svg-mbeerF3gBl9ZsPk0 .activation0{fill:#f4f4f4;stroke:#666}#mermaid-svg-mbeerF3gBl9ZsPk0 .activation1{fill:#f4f4f4;stroke:#666}#mermaid-svg-mbeerF3gBl9ZsPk0 .activation2{fill:#f4f4f4;stroke:#666}#mermaid-svg-mbeerF3gBl9ZsPk0 .mermaid-main-font{font-family:"trebuchet ms", verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-mbeerF3gBl9ZsPk0 .section{stroke:none;opacity:0.2}#mermaid-svg-mbeerF3gBl9ZsPk0 .section0{fill:rgba(102,102,255,0.49)}#mermaid-svg-mbeerF3gBl9ZsPk0 .section2{fill:#fff400}#mermaid-svg-mbeerF3gBl9ZsPk0 .section1,#mermaid-svg-mbeerF3gBl9ZsPk0 .section3{fill:#fff;opacity:0.2}#mermaid-svg-mbeerF3gBl9ZsPk0 .sectionTitle0{fill:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .sectionTitle1{fill:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .sectionTitle2{fill:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .sectionTitle3{fill:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .sectionTitle{text-anchor:start;font-size:11px;text-height:14px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-mbeerF3gBl9ZsPk0 .grid .tick{stroke:#d3d3d3;opacity:0.8;shape-rendering:crispEdges}#mermaid-svg-mbeerF3gBl9ZsPk0 .grid .tick text{font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-mbeerF3gBl9ZsPk0 .grid path{stroke-width:0}#mermaid-svg-mbeerF3gBl9ZsPk0 .today{fill:none;stroke:red;stroke-width:2px}#mermaid-svg-mbeerF3gBl9ZsPk0 .task{stroke-width:2}#mermaid-svg-mbeerF3gBl9ZsPk0 .taskText{text-anchor:middle;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-mbeerF3gBl9ZsPk0 .taskText:not([font-size]){font-size:11px}#mermaid-svg-mbeerF3gBl9ZsPk0 .taskTextOutsideRight{fill:#000;text-anchor:start;font-size:11px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-mbeerF3gBl9ZsPk0 .taskTextOutsideLeft{fill:#000;text-anchor:end;font-size:11px}#mermaid-svg-mbeerF3gBl9ZsPk0 .task.clickable{cursor:pointer}#mermaid-svg-mbeerF3gBl9ZsPk0 .taskText.clickable{cursor:pointer;fill:#003163 !important;font-weight:bold}#mermaid-svg-mbeerF3gBl9ZsPk0 .taskTextOutsideLeft.clickable{cursor:pointer;fill:#003163 !important;font-weight:bold}#mermaid-svg-mbeerF3gBl9ZsPk0 .taskTextOutsideRight.clickable{cursor:pointer;fill:#003163 !important;font-weight:bold}#mermaid-svg-mbeerF3gBl9ZsPk0 .taskText0,#mermaid-svg-mbeerF3gBl9ZsPk0 .taskText1,#mermaid-svg-mbeerF3gBl9ZsPk0 .taskText2,#mermaid-svg-mbeerF3gBl9ZsPk0 .taskText3{fill:#fff}#mermaid-svg-mbeerF3gBl9ZsPk0 .task0,#mermaid-svg-mbeerF3gBl9ZsPk0 .task1,#mermaid-svg-mbeerF3gBl9ZsPk0 .task2,#mermaid-svg-mbeerF3gBl9ZsPk0 .task3{fill:#8a90dd;stroke:#534fbc}#mermaid-svg-mbeerF3gBl9ZsPk0 .taskTextOutside0,#mermaid-svg-mbeerF3gBl9ZsPk0 .taskTextOutside2{fill:#000}#mermaid-svg-mbeerF3gBl9ZsPk0 .taskTextOutside1,#mermaid-svg-mbeerF3gBl9ZsPk0 .taskTextOutside3{fill:#000}#mermaid-svg-mbeerF3gBl9ZsPk0 .active0,#mermaid-svg-mbeerF3gBl9ZsPk0 .active1,#mermaid-svg-mbeerF3gBl9ZsPk0 .active2,#mermaid-svg-mbeerF3gBl9ZsPk0 .active3{fill:#bfc7ff;stroke:#534fbc}#mermaid-svg-mbeerF3gBl9ZsPk0 .activeText0,#mermaid-svg-mbeerF3gBl9ZsPk0 .activeText1,#mermaid-svg-mbeerF3gBl9ZsPk0 .activeText2,#mermaid-svg-mbeerF3gBl9ZsPk0 .activeText3{fill:#000 !important}#mermaid-svg-mbeerF3gBl9ZsPk0 .done0,#mermaid-svg-mbeerF3gBl9ZsPk0 .done1,#mermaid-svg-mbeerF3gBl9ZsPk0 .done2,#mermaid-svg-mbeerF3gBl9ZsPk0 .done3{stroke:grey;fill:#d3d3d3;stroke-width:2}#mermaid-svg-mbeerF3gBl9ZsPk0 .doneText0,#mermaid-svg-mbeerF3gBl9ZsPk0 .doneText1,#mermaid-svg-mbeerF3gBl9ZsPk0 .doneText2,#mermaid-svg-mbeerF3gBl9ZsPk0 .doneText3{fill:#000 !important}#mermaid-svg-mbeerF3gBl9ZsPk0 .crit0,#mermaid-svg-mbeerF3gBl9ZsPk0 .crit1,#mermaid-svg-mbeerF3gBl9ZsPk0 .crit2,#mermaid-svg-mbeerF3gBl9ZsPk0 .crit3{stroke:#f88;fill:red;stroke-width:2}#mermaid-svg-mbeerF3gBl9ZsPk0 .activeCrit0,#mermaid-svg-mbeerF3gBl9ZsPk0 .activeCrit1,#mermaid-svg-mbeerF3gBl9ZsPk0 .activeCrit2,#mermaid-svg-mbeerF3gBl9ZsPk0 .activeCrit3{stroke:#f88;fill:#bfc7ff;stroke-width:2}#mermaid-svg-mbeerF3gBl9ZsPk0 .doneCrit0,#mermaid-svg-mbeerF3gBl9ZsPk0 .doneCrit1,#mermaid-svg-mbeerF3gBl9ZsPk0 .doneCrit2,#mermaid-svg-mbeerF3gBl9ZsPk0 .doneCrit3{stroke:#f88;fill:#d3d3d3;stroke-width:2;cursor:pointer;shape-rendering:crispEdges}#mermaid-svg-mbeerF3gBl9ZsPk0 .milestone{transform:rotate(45deg) scale(0.8, 0.8)}#mermaid-svg-mbeerF3gBl9ZsPk0 .milestoneText{font-style:italic}#mermaid-svg-mbeerF3gBl9ZsPk0 .doneCritText0,#mermaid-svg-mbeerF3gBl9ZsPk0 .doneCritText1,#mermaid-svg-mbeerF3gBl9ZsPk0 .doneCritText2,#mermaid-svg-mbeerF3gBl9ZsPk0 .doneCritText3{fill:#000 !important}#mermaid-svg-mbeerF3gBl9ZsPk0 .activeCritText0,#mermaid-svg-mbeerF3gBl9ZsPk0 .activeCritText1,#mermaid-svg-mbeerF3gBl9ZsPk0 .activeCritText2,#mermaid-svg-mbeerF3gBl9ZsPk0 .activeCritText3{fill:#000 !important}#mermaid-svg-mbeerF3gBl9ZsPk0 .titleText{text-anchor:middle;font-size:18px;fill:#000;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-mbeerF3gBl9ZsPk0 g.classGroup text{fill:#9370db;stroke:none;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family);font-size:10px}#mermaid-svg-mbeerF3gBl9ZsPk0 g.classGroup text .title{font-weight:bolder}#mermaid-svg-mbeerF3gBl9ZsPk0 g.clickable{cursor:pointer}#mermaid-svg-mbeerF3gBl9ZsPk0 g.classGroup rect{fill:#ECECFF;stroke:#9370db}#mermaid-svg-mbeerF3gBl9ZsPk0 g.classGroup line{stroke:#9370db;stroke-width:1}#mermaid-svg-mbeerF3gBl9ZsPk0 .classLabel .box{stroke:none;stroke-width:0;fill:#ECECFF;opacity:0.5}#mermaid-svg-mbeerF3gBl9ZsPk0 .classLabel .label{fill:#9370db;font-size:10px}#mermaid-svg-mbeerF3gBl9ZsPk0 .relation{stroke:#9370db;stroke-width:1;fill:none}#mermaid-svg-mbeerF3gBl9ZsPk0 .dashed-line{stroke-dasharray:3}#mermaid-svg-mbeerF3gBl9ZsPk0 #compositionStart{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-mbeerF3gBl9ZsPk0 #compositionEnd{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-mbeerF3gBl9ZsPk0 #aggregationStart{fill:#ECECFF;stroke:#9370db;stroke-width:1}#mermaid-svg-mbeerF3gBl9ZsPk0 #aggregationEnd{fill:#ECECFF;stroke:#9370db;stroke-width:1}#mermaid-svg-mbeerF3gBl9ZsPk0 #dependencyStart{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-mbeerF3gBl9ZsPk0 #dependencyEnd{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-mbeerF3gBl9ZsPk0 #extensionStart{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-mbeerF3gBl9ZsPk0 #extensionEnd{fill:#9370db;stroke:#9370db;stroke-width:1}#mermaid-svg-mbeerF3gBl9ZsPk0 .commit-id,#mermaid-svg-mbeerF3gBl9ZsPk0 .commit-msg,#mermaid-svg-mbeerF3gBl9ZsPk0 .branch-label{fill:lightgrey;color:lightgrey;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-mbeerF3gBl9ZsPk0 .pieTitleText{text-anchor:middle;font-size:25px;fill:#000;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-mbeerF3gBl9ZsPk0 .slice{font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-mbeerF3gBl9ZsPk0 g.stateGroup text{fill:#9370db;stroke:none;font-size:10px;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-mbeerF3gBl9ZsPk0 g.stateGroup text{fill:#9370db;fill:#333;stroke:none;font-size:10px}#mermaid-svg-mbeerF3gBl9ZsPk0 g.statediagram-cluster .cluster-label text{fill:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 g.stateGroup .state-title{font-weight:bolder;fill:#000}#mermaid-svg-mbeerF3gBl9ZsPk0 g.stateGroup rect{fill:#ECECFF;stroke:#9370db}#mermaid-svg-mbeerF3gBl9ZsPk0 g.stateGroup line{stroke:#9370db;stroke-width:1}#mermaid-svg-mbeerF3gBl9ZsPk0 .transition{stroke:#9370db;stroke-width:1;fill:none}#mermaid-svg-mbeerF3gBl9ZsPk0 .stateGroup .composit{fill:white;border-bottom:1px}#mermaid-svg-mbeerF3gBl9ZsPk0 .stateGroup .alt-composit{fill:#e0e0e0;border-bottom:1px}#mermaid-svg-mbeerF3gBl9ZsPk0 .state-note{stroke:#aa3;fill:#fff5ad}#mermaid-svg-mbeerF3gBl9ZsPk0 .state-note text{fill:black;stroke:none;font-size:10px}#mermaid-svg-mbeerF3gBl9ZsPk0 .stateLabel .box{stroke:none;stroke-width:0;fill:#ECECFF;opacity:0.7}#mermaid-svg-mbeerF3gBl9ZsPk0 .edgeLabel text{fill:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .stateLabel text{fill:#000;font-size:10px;font-weight:bold;font-family:'trebuchet ms', verdana, arial;font-family:var(--mermaid-font-family)}#mermaid-svg-mbeerF3gBl9ZsPk0 .node circle.state-start{fill:black;stroke:black}#mermaid-svg-mbeerF3gBl9ZsPk0 .node circle.state-end{fill:black;stroke:white;stroke-width:1.5}#mermaid-svg-mbeerF3gBl9ZsPk0 #statediagram-barbEnd{fill:#9370db}#mermaid-svg-mbeerF3gBl9ZsPk0 .statediagram-cluster rect{fill:#ECECFF;stroke:#9370db;stroke-width:1px}#mermaid-svg-mbeerF3gBl9ZsPk0 .statediagram-cluster rect.outer{rx:5px;ry:5px}#mermaid-svg-mbeerF3gBl9ZsPk0 .statediagram-state .divider{stroke:#9370db}#mermaid-svg-mbeerF3gBl9ZsPk0 .statediagram-state .title-state{rx:5px;ry:5px}#mermaid-svg-mbeerF3gBl9ZsPk0 .statediagram-cluster.statediagram-cluster .inner{fill:white}#mermaid-svg-mbeerF3gBl9ZsPk0 .statediagram-cluster.statediagram-cluster-alt .inner{fill:#e0e0e0}#mermaid-svg-mbeerF3gBl9ZsPk0 .statediagram-cluster .inner{rx:0;ry:0}#mermaid-svg-mbeerF3gBl9ZsPk0 .statediagram-state rect.basic{rx:5px;ry:5px}#mermaid-svg-mbeerF3gBl9ZsPk0 .statediagram-state rect.divider{stroke-dasharray:10,10;fill:#efefef}#mermaid-svg-mbeerF3gBl9ZsPk0 .note-edge{stroke-dasharray:5}#mermaid-svg-mbeerF3gBl9ZsPk0 .statediagram-note rect{fill:#fff5ad;stroke:#aa3;stroke-width:1px;rx:0;ry:0}:root{--mermaid-font-family: '"trebuchet ms", verdana, arial';--mermaid-font-family: "Comic Sans MS", "Comic Sans", cursive}#mermaid-svg-mbeerF3gBl9ZsPk0 .error-icon{fill:#522}#mermaid-svg-mbeerF3gBl9ZsPk0 .error-text{fill:#522;stroke:#522}#mermaid-svg-mbeerF3gBl9ZsPk0 .edge-thickness-normal{stroke-width:2px}#mermaid-svg-mbeerF3gBl9ZsPk0 .edge-thickness-thick{stroke-width:3.5px}#mermaid-svg-mbeerF3gBl9ZsPk0 .edge-pattern-solid{stroke-dasharray:0}#mermaid-svg-mbeerF3gBl9ZsPk0 .edge-pattern-dashed{stroke-dasharray:3}#mermaid-svg-mbeerF3gBl9ZsPk0 .edge-pattern-dotted{stroke-dasharray:2}#mermaid-svg-mbeerF3gBl9ZsPk0 .marker{fill:#333}#mermaid-svg-mbeerF3gBl9ZsPk0 .marker.cross{stroke:#333}
:root { --mermaid-font-family: "trebuchet ms", verdana, arial;}</style>
<style>#mermaid-svg-mbeerF3gBl9ZsPk0 {
color: rgba(0, 0, 0, 0.75);
font: ;
}</style>
資料獲取
聚合
排序
后續transformation
聚合和排序是可選的,
1.不聚合,不按key排序
從各個map task中獲取一種磁區的資料,輸出到buffer中,再從buffer中獲取資料,
優點:簡單, 適用算子:partitionBy()
2.不聚合,按key排序
排序程序需要Array陣列(PartitonPairBuffer)來進行排序, 可以擴容和spill到磁盤,spill到磁盤的多個檔案再進行全域排序,
適用算子:sortByKey()
3.聚合,排序或者不排序
spark使用了特殊設計的HashMap,即ExternalAppendOnlyMap,可同時聚合和排序,
和map端的聚合的PartitionAppendOnlyMap不同,ExternalAppendOnlyMap不包含Partition資訊,
三.支持高效聚合和排序的資料結構
AppendOnlyMap就是只支持添加和更新 的HashMap,
HashMap基于陣列+鏈表,AppendOnlyMap只基于陣列( Array[(K, V)] ),
擴容:利用率達到70%,擴大一倍,因此所有key需要rehash,
排序:將所有<K, V> record移到陣列前端(不留空隙),然后可以快排,按key值排序,比如聚合后需要排序的操作,對于其他操作,只需要按key的Hash排序即可,比如groupByKey(),
AppendOnlyMap只支持記憶體,ExternalAppendOnlyMap支持記憶體+磁盤, ExternalAppendOnlyMap就是靠AppendOnlyMap和磁盤來解決問題,
全域聚合:最小堆進行多路歸并,
四.Spark和MapReduce的shuffle機制對比
MapReduce的優點: MapReduce是強制性按key排序,不管map端還是reduce端,原生支持sortByKey(),而且聚合操作,可以從前往后直接遍歷,很高效, MapReduce的缺點: 1.有些操作并不需要按key排序,如groupByKey(), 2.不能在線聚合,耗記憶體,從reduce函式可看出,函式引數是陣列,(即先combine,再reduce)
spark改進: 1.提供按key排序,按partition排序多種方式, 2.基于hashMap,能夠在線聚合,(比如reduceByKey(),groupByKey()不能,)
五.總結
mr強制按key排序,有聚合操作的話,就是先排序,然后聚合 ,
spark可以不按key排序,不聚合:partitionBy() 不聚合,按key排序:sortByKey() 基于Array 聚合,不排序:reduceByKey() 基于HashMap 聚合,按key排序:rdd.reduceByKey().sortByKey() 先聚合,再排序