作者：耿宗杰

前言

關于pprof的文章在網上已是汗牛充棟，卻是千篇一律的命令介紹，鮮有真正實操的，本文將參考Go社區資料，結合自己的經驗，實戰Go程式的性能分析與優化程序，

優化思路

首先說一下性能優化的一般思路，系統性能的分析優化，一定是從大到小的步驟來進行的，即從業務架構的優化，到系統架構的優化，再到系統模塊間的優化，最后到代碼撰寫層面的優化，業務架構的優化是最具性價比的，技術難度相對較小，卻可以帶來大幅的性能提升，比如通過和同事或外部門溝通，減少了一些介面呼叫或者去掉了不必要的復雜的業務邏輯，可以輕松提升整個系統的性能，

系統架構的優化，比如加入快取，由http改進為rpc等，也可以在少量投入下帶來較大的性能提升，最后是程式代碼級別的性能優化，這又分為兩方面，一是合格的資料結構與使用，二才是在此基礎上的性能剖析，比如在Go語言中使用slice這種方便的資料結構時，盡可能提前申請足夠的記憶體防止append超過容量時的記憶體申請和資料拷貝；使用并發保護時盡量由RWMutex 代替mutex，甚至在極高并發場景下使用更細粒度的原子操作代替鎖等等，

優化實踐

下面進入正文，待優化程式是社區中一個例子，代碼有點長，實作的演算法是著名的計算機科學家Tarjan的求圖的強連通分量演算法，關于這個演算法的思想請自行google（就別自行百度了~），以下為實操程序（會有那么一丟丟長，，，）：

初始版本代碼 havlak1.go:

// Go from multi-language-benchmark/src/havlak/go_pro// Copyright 2011 Google Inc.//// Licensed under the Apache License, Version 2.0 (the "License");// you may not use this file except in compliance with the License.// You may obtain a copy of the License at////     http://www.apache.org/licenses/LICENSE-2.0//// Unless required by applicable law or agreed to in writing, software// distributed under the License is distributed on an "AS IS" BASIS,// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.// See the License for the specific language governing permissions and// limitations under the License.// Test Program for the Havlak loop finder.//// This program constructs a fairly large control flow// graph and performs loop recognition. This is the Go// version.//package mainimport (   "flag"   "fmt"   "log"   "os"   "runtime/pprof")type BasicBlock struct {   Name     int   InEdges  []*BasicBlock   OutEdges []*BasicBlock}func NewBasicBlock(name int) *BasicBlock {   return &BasicBlock{Name: name}}func (bb *BasicBlock) Dump() {   fmt.Printf("BB#%06d:", bb.Name)   if len(bb.InEdges) > 0 {      fmt.Printf(" in :")      for _, iter := range bb.InEdges {         fmt.Printf(" BB#%06d", iter.Name)      }   }   if len(bb.OutEdges) > 0 {      fmt.Print(" out:")      for _, iter := range bb.OutEdges {         fmt.Printf(" BB#%06d", iter.Name)      }   }   fmt.Printf("\n")}func (bb *BasicBlock) NumPred() int {   return len(bb.InEdges)}func (bb *BasicBlock) NumSucc() int {   return len(bb.OutEdges)}func (bb *BasicBlock) AddInEdge(from *BasicBlock) {   bb.InEdges = append(bb.InEdges, from)}func (bb *BasicBlock) AddOutEdge(to *BasicBlock) {   bb.OutEdges = append(bb.OutEdges, to)}//-----------------------------------------------------------type CFG struct {   Blocks []*BasicBlock   Start  *BasicBlock}func NewCFG() *CFG {   return &CFG{}}func (cfg *CFG) NumNodes() int {   return len(cfg.Blocks)}func (cfg *CFG) CreateNode(node int) *BasicBlock {   if node < len(cfg.Blocks) {      return cfg.Blocks[node]   }   if node != len(cfg.Blocks) {      println("oops", node, len(cfg.Blocks))      panic("wtf")   }   bblock := NewBasicBlock(node)   cfg.Blocks = append(cfg.Blocks, bblock)   if len(cfg.Blocks) == 1 {      cfg.Start = bblock   }   return bblock}func (cfg *CFG) Dump() {   for _, n := range cfg.Blocks {      n.Dump()   }}//-----------------------------------------------------------type BasicBlockEdge struct {   Dst *BasicBlock   Src *BasicBlock}func NewBasicBlockEdge(cfg *CFG, from int, to int) *BasicBlockEdge {   self := new(BasicBlockEdge)   self.Src = https://www.cnblogs.com/Jcloud/archive/2022/12/16/cfg.CreateNode(from)   self.Dst = cfg.CreateNode(to)   self.Src.AddOutEdge(self.Dst)   self.Dst.AddInEdge(self.Src)   return self}//-----------------------------------------------------------// Basic Blocks and Loops are being classified as regular, irreducible,// and so on. This enum contains a symbolic name for all these classifications//const (   _             = iota // Go has an interesting iota concept   bbTop                // uninitialized   bbNonHeader          // a regular BB   bbReducible          // reducible loop   bbSelf               // single BB loop   bbIrreducible        // irreducible loop   bbDead               // a dead BB   bbLast               // sentinel)// UnionFindNode is used in the Union/Find algorithm to collapse// complete loops into a single node. These nodes and the// corresponding functionality are implemented with this class//type UnionFindNode struct {   parent    *UnionFindNode   bb        *BasicBlock   loop      *SimpleLoop   dfsNumber int}// Init explicitly initializes UnionFind nodes.//func (u *UnionFindNode) Init(bb *BasicBlock, dfsNumber int) {   u.parent = u   u.bb = bb   u.dfsNumber = dfsNumber   u.loop = nil}// FindSet implements the Find part of the Union/Find Algorithm//// Implemented with Path Compression (inner loops are only// visited and collapsed once, however, deep nests would still// result in significant traversals).//func (u *UnionFindNode) FindSet() *UnionFindNode {   var nodeList []*UnionFindNode   node := u   for ; node != node.parent; node = node.parent {      if node.parent != node.parent.parent {         nodeList = append(nodeList, node)      }   }   // Path Compression, all nodes' parents point to the 1st level parent.   for _, ll := range nodeList {      ll.parent = node.parent   }   return node}// Union relies on path compression.//func (u *UnionFindNode) Union(B *UnionFindNode) {   u.parent = B}// Constants//// Marker for uninitialized nodes.const unvisited = -1// Safeguard against pathological algorithm behavior.const maxNonBackPreds = 32 * 1024// IsAncestor//// As described in the paper, determine whether a node 'w' is a// "true" ancestor for node 'v'.//// Dominance can be tested quickly using a pre-order trick// for depth-first spanning trees. This is why DFS is the first// thing we run below.//// Go comment: Parameters can be written as w,v int, inlike in C, where//   each parameter needs its own type.//func isAncestor(w, v int, last []int) bool {   return ((w <= v) && (v <= last[w]))}// listContainsNode//// Check whether a list contains a specific element. //func listContainsNode(l []*UnionFindNode, u *UnionFindNode) bool {   for _, ll := range l {      if ll == u {         return true      }   }   return false}// DFS - Depth-First-Search and node numbering.//func DFS(currentNode *BasicBlock, nodes []*UnionFindNode, number map[*BasicBlock]int, last []int, current int) int {   nodes[current].Init(currentNode, current)   number[currentNode] = current   lastid := current   for _, target := range currentNode.OutEdges {      if number[target] == unvisited {         lastid = DFS(target, nodes, number, last, lastid+1)      }   }   last[number[currentNode]] = lastid   return lastid}// FindLoops//// Find loops and build loop forest using Havlak's algorithm, which// is derived from Tarjan. Variable names and step numbering has// been chosen to be identical to the nomenclature in Havlak's// paper (which, in turn, is similar to the one used by Tarjan).//func FindLoops(cfgraph *CFG, lsgraph *LSG) {   if cfgraph.Start == nil {      return   }   size := cfgraph.NumNodes()   nonBackPreds := make([]map[int]bool, size)   backPreds := make([][]int, size)   number := make(map[*BasicBlock]int)   header := make([]int, size, size)   types := make([]int, size, size)   last := make([]int, size, size)   nodes := make([]*UnionFindNode, size, size)   for i := 0; i < size; i++ {      nodes[i] = new(UnionFindNode)   }   // Step a:   //   - initialize all nodes as unvisited.   //   - depth-first traversal and numbering.   //   - unreached BB's are marked as dead.   //   for i, bb := range cfgraph.Blocks {      number[bb] = unvisited      nonBackPreds[i] = make(map[int]bool)   }   DFS(cfgraph.Start, nodes, number, last, 0)   // Step b:   //   - iterate over all nodes.   //   //   A backedge comes from a descendant in the DFS tree, and non-backedges   //   from non-descendants (following Tarjan).   //   //   - check incoming edges 'v' and add them to either   //     - the list of backedges (backPreds) or   //     - the list of non-backedges (nonBackPreds)   //   for w := 0; w < size; w++ {      header[w] = 0      types[w] = bbNonHeader      nodeW := nodes[w].bb      if nodeW == nil {         types[w] = bbDead         continue // dead BB      }      if nodeW.NumPred() > 0 {         for _, nodeV := range nodeW.InEdges {            v := number[nodeV]            if v == unvisited {               continue // dead node            }            if isAncestor(w, v, last) {               backPreds[w] = append(backPreds[w], v)            } else {               nonBackPreds[w][v] = true            }         }      }   }   // Start node is root of all other loops.   header[0] = 0   // Step c:   //   // The outer loop, unchanged from Tarjan. It does nothing except   // for those nodes which are the destinations of backedges.   // For a header node w, we chase backward from the sources of the   // backedges adding nodes to the set P, representing the body of   // the loop headed by w.   //   // By running through the nodes in reverse of the DFST preorder,   // we ensure that inner loop headers will be processed before the   // headers for surrounding loops.   //   for w := size - 1; w >= 0; w-- {      // this is 'P' in Havlak's paper      var nodePool []*UnionFindNode      nodeW := nodes[w].bb      if nodeW == nil {         continue // dead BB      }      // Step d:      for _, v := range backPreds[w] {         if v != w {            nodePool = append(nodePool, nodes[v].FindSet())         } else {            types[w] = bbSelf         }      }      // Copy nodePool to workList.      //      workList := append([]*UnionFindNode(nil), nodePool...)      if len(nodePool) != 0 {         types[w] = bbReducible      }      // work the list...      //      for len(workList) > 0 {         x := workList[0]         workList = workList[1:]         // Step e:         //         // Step e represents the main difference from Tarjan's method.         // Chasing upwards from the sources of a node w's backedges. If         // there is a node y' that is not a descendant of w, w is marked         // the header of an irreducible loop, there is another entry         // into this loop that avoids w.         //         // The algorithm has degenerated. Break and         // return in this case.         //         nonBackSize := len(nonBackPreds[x.dfsNumber])         if nonBackSize > maxNonBackPreds {            return         }         for iter := range nonBackPreds[x.dfsNumber] {            y := nodes[iter]            ydash := y.FindSet()            if !isAncestor(w, ydash.dfsNumber, last) {               types[w] = bbIrreducible               nonBackPreds[w][ydash.dfsNumber] = true            } else {               if ydash.dfsNumber != w {                  if !listContainsNode(nodePool, ydash) {                     workList = append(workList, ydash)                     nodePool = append(nodePool, ydash)                  }               }            }         }      }      // Collapse/Unionize nodes in a SCC to a single node      // For every SCC found, create a loop descriptor and link it in.      //      if (len(nodePool) > 0) || (types[w] == bbSelf) {         loop := lsgraph.NewLoop()         loop.SetHeader(nodeW)         if types[w] != bbIrreducible {            loop.IsReducible = true         }         // At this point, one can set attributes to the loop, such as:         //         // the bottom node:         //    iter  = backPreds[w].begin();         //    loop bottom is: nodes[iter].node);         //         // the number of backedges:         //    backPreds[w].size()         //         // whether this loop is reducible:         //    type[w] != BasicBlockClass.bbIrreducible         //         nodes[w].loop = loop         for _, node := range nodePool {            // Add nodes to loop descriptor.            header[node.dfsNumber] = w            node.Union(nodes[w])            // Nested loops are not added, but linked together.            if node.loop != nil {               node.loop.Parent = loop            } else {               loop.AddNode(node.bb)            }         }         lsgraph.AddLoop(loop)      } // nodePool.size   } // Step c}// External entry point.func FindHavlakLoops(cfgraph *CFG, lsgraph *LSG) int {   FindLoops(cfgraph, lsgraph)   return lsgraph.NumLoops()}//======================================================// Scaffold Code//======================================================// Basic representation of loops, a loop has an entry point,// one or more exit edges, a set of basic blocks, and potentially// an outer loop - a "parent" loop.//// Furthermore, it can have any set of properties, e.g.,// it can be an irreducible loop, have control flow, be// a candidate for transformations, and what not.//type SimpleLoop struct {   // No set, use map to bool   basicBlocks map[*BasicBlock]bool   Children    map[*SimpleLoop]bool   Parent      *SimpleLoop   header      *BasicBlock   IsRoot       bool   IsReducible  bool   Counter      int   NestingLevel int   DepthLevel   int}func (loop *SimpleLoop) AddNode(bb *BasicBlock) {   loop.basicBlocks[bb] = true}func (loop *SimpleLoop) AddChildLoop(child *SimpleLoop) {   loop.Children[child] = true}func (loop *SimpleLoop) Dump(indent int) {   for i := 0; i < indent; i++ {      fmt.Printf("  ")   }   // No ? operator ?   fmt.Printf("loop-%d nest: %d depth %d ",      loop.Counter, loop.NestingLevel, loop.DepthLevel)   if !loop.IsReducible {      fmt.Printf("(Irreducible) ")   }   // must have > 0   if len(loop.Children) > 0 {      fmt.Printf("Children: ")      for ll := range loop.Children {         fmt.Printf("loop-%d", ll.Counter)      }   }   if len(loop.basicBlocks) > 0 {      fmt.Printf("(")      for bb := range loop.basicBlocks {         fmt.Printf("BB#%06d ", bb.Name)         if loop.header == bb {            fmt.Printf("*")         }      }      fmt.Printf("\b)")   }   fmt.Printf("\n")}func (loop *SimpleLoop) SetParent(parent *SimpleLoop) {   loop.Parent = parent   loop.Parent.AddChildLoop(loop)}func (loop *SimpleLoop) SetHeader(bb *BasicBlock) {   loop.AddNode(bb)   loop.header = bb}//------------------------------------// Helper (No templates or such)//func max(x, y int) int {   if x > y {      return x   }   return y}// LoopStructureGraph//// Maintain loop structure for a given CFG.//// Two values are maintained for this loop graph, depth, and nesting level.// For example://// loop        nesting level    depth//----------------------------------------// loop-0      2                0//   loop-1    1                1//   loop-3    1                1//     loop-2  0                2//var loopCounter = 0type LSG struct {   root  *SimpleLoop   loops []*SimpleLoop}func NewLSG() *LSG {   lsg := new(LSG)   lsg.root = lsg.NewLoop()   lsg.root.NestingLevel = 0   return lsg}func (lsg *LSG) NewLoop() *SimpleLoop {   loop := new(SimpleLoop)   loop.basicBlocks = make(map[*BasicBlock]bool)   loop.Children = make(map[*SimpleLoop]bool)   loop.Parent = nil   loop.header = nil   loop.Counter = loopCounter   loopCounter++   return loop}func (lsg *LSG) AddLoop(loop *SimpleLoop) {   lsg.loops = append(lsg.loops, loop)}func (lsg *LSG) Dump() {   lsg.dump(lsg.root, 0)}func (lsg *LSG) dump(loop *SimpleLoop, indent int) {   loop.Dump(indent)   for ll := range loop.Children {      lsg.dump(ll, indent+1)   }}func (lsg *LSG) CalculateNestingLevel() {   for _, sl := range lsg.loops {      if sl.IsRoot {         continue      }      if sl.Parent == nil {         sl.SetParent(lsg.root)      }   }   lsg.calculateNestingLevel(lsg.root, 0)}func (lsg *LSG) calculateNestingLevel(loop *SimpleLoop, depth int) {   loop.DepthLevel = depth   for ll := range loop.Children {      lsg.calculateNestingLevel(ll, depth+1)      ll.NestingLevel = max(loop.NestingLevel, ll.NestingLevel+1)   }}func (lsg *LSG) NumLoops() int {   return len(lsg.loops)}func (lsg *LSG) Root() *SimpleLoop {   return lsg.root}//======================================================// Testing Code//======================================================func buildDiamond(cfgraph *CFG, start int) int {   bb0 := start   NewBasicBlockEdge(cfgraph, bb0, bb0+1)   NewBasicBlockEdge(cfgraph, bb0, bb0+2)   NewBasicBlockEdge(cfgraph, bb0+1, bb0+3)   NewBasicBlockEdge(cfgraph, bb0+2, bb0+3)   return bb0 + 3}func buildConnect(cfgraph *CFG, start int, end int) {   NewBasicBlockEdge(cfgraph, start, end)}func buildStraight(cfgraph *CFG, start int, n int) int {   for i := 0; i < n; i++ {      buildConnect(cfgraph, start+i, start+i+1)   }   return start + n}func buildBaseLoop(cfgraph *CFG, from int) int {   header := buildStraight(cfgraph, from, 1)   diamond1 := buildDiamond(cfgraph, header)   d11 := buildStraight(cfgraph, diamond1, 1)   diamond2 := buildDiamond(cfgraph, d11)   footer := buildStraight(cfgraph, diamond2, 1)   buildConnect(cfgraph, diamond2, d11)   buildConnect(cfgraph, diamond1, header)   buildConnect(cfgraph, footer, from)   footer = buildStraight(cfgraph, footer, 1)   return footer}var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to this file")func main() {   flag.Parse()   if *cpuprofile != "" {      f, err := os.Create(*cpuprofile)      if err != nil {         log.Fatal(err)      }      pprof.StartCPUProfile(f)      defer pprof.StopCPUProfile()   }   lsgraph := NewLSG()   cfgraph := NewCFG()   cfgraph.CreateNode(0) // top   cfgraph.CreateNode(1) // bottom   NewBasicBlockEdge(cfgraph, 0, 2)   for dummyloop := 0; dummyloop < 15000; dummyloop++ {      FindHavlakLoops(cfgraph, NewLSG())   }   n := 2   for parlooptrees := 0; parlooptrees < 10; parlooptrees++ {      cfgraph.CreateNode(n + 1)      buildConnect(cfgraph, 2, n+1)      n = n + 1      for i := 0; i < 100; i++ {         top := n         n = buildStraight(cfgraph, n, 1)         for j := 0; j < 25; j++ {            n = buildBaseLoop(cfgraph, n)         }         bottom := buildStraight(cfgraph, n, 1)         buildConnect(cfgraph, n, top)         n = bottom      }      buildConnect(cfgraph, n, 1)   }   FindHavlakLoops(cfgraph, lsgraph)   for i := 0; i < 50; i++ {      FindHavlakLoops(cfgraph, NewLSG())   }   fmt.Printf("# of loops: %d (including 1 artificial root node)\n", lsgraph.NumLoops())   lsgraph.CalculateNestingLevel()}

我們借助macbook系統上的time命令來列印程式運行的時間（內核態、用戶態、總時間）：

編譯后運行程式：

用戶態耗時23.07s，內核態耗時0.4s，總耗時13.7s（用了兩個核，170%），因為程式里面已經先開啟了pprof統計cpu耗時，直接top命令看：

可以看到，pprof資料采集持續了12.99s，采樣時長是19.49s（還是兩核的緣故），

這里要說一下，無論是cpu耗時統計還是記憶體占用統計，都是間隔采樣，cpu耗時時每隔一段時間（大概是10ms）對呼叫堆疊的函式進行記錄，最后分析在所有的記錄次數中，各個函式出現的次數，包括在運行中的次數，和入堆疊次數（說明它呼叫了別的函式），記憶體占用統計是每分配512K記錄一次分配路徑，

耗時最多的是mapaccess1_fast64，這是運行時中的map讀寫時的資料查詢函式，如果編譯程式時沒有禁用行內，看到的會有所不同，其中會顯示FindHavlakLoops函式，并標識為inline，因為FindHavlakLoops里面就呼叫了FindLoops，所以在編譯器會直接把這個函式展開，用FindLoops替換FindHavlakLoops函式，也可以在不禁用行內編譯時，設定pprof的noinlines開關為true，默認為false，即noinlines=true，

這里看到最大的函式并不是業務函式而是系統函式，那沒啥好優化系統函式的（只能是咱用的不對了唄~），就看是哪里呼叫的這個系統函式：

web mapaccess1_fast64

可以看到，呼叫最多的是DFS和FindLoops，那就看看這倆函式里面是怎么使用map的，來看DFS：

可以看到，DFS函式里耗時較長又是map操作的，是242 246 和250三行，對于這里的優化方法，是用list結構代替map結構，因為能用list達到效果的情況下沒必要用map，這個咋一看沒問題，但好像又有點別扭對吧？其實是因為這個程式的本身特點，這里有明顯的一點，就是BasicBlock結構的特殊性，本身的Name屬性就是自身的索引下標，查找某個BasicBlock不需要使用map來記錄，直接通過下標訪問顯然是最低的時間復雜度（關于map和list的時間復雜度就不多說了，map看起來是O1，但其實沒有考慮hash計算本身的程序和解決hash沖突的成本，而list是必然的O1），通過把這部分的map修改為list資料結構，版本H2，再編譯查看耗時情況：

此時耗時降低到了8s，再次查看cpu耗時分布：

可以看到，top1變成了scanobject函式，不再是map讀寫了，看到這個函式，以及下邊的mallocgc，我們要知道，這都是記憶體相關的，前者是記憶體物件掃描的處理函式，是垃圾回收三色標記的一個程序，后者則是直接管理記憶體分配和回收的（注意是同時負責記憶體分配和回收，malloc && gc），這說明目前程式花了很多時間在進行記憶體管理，看比例是8%，那就要分析一下了，是什么東西產生了這么多記憶體變化，合不合理，分析記憶體，就是pprof的memprofile功能了，添加一下相關的記憶體統計代碼，具體怎么添加大家肯定都會，就不貼了（網上多得是），添加完之后，重新編譯生成版本H3，執行H3，生成對應的記憶體監測檔案：

查看記憶體的分配情況，在這里不禁止行內，因為在實際運行時該行內的函式也是會被展開替換掉的：

可以看到，節點創建是第一，FindLoops是第二，因為創建BasicBlock首先很簡單，沒有復雜的程序，其次這是程式中的一個基礎物件結構，若要改結構體，那可能涉及到演算法也得改，這個顯然是不合適的，不僅可能改出bug，還可能收益不高，那我們就順著看第二位的FindLoops，這個就是前邊我們看到的呼叫mapaccess內置函式的另一個業務函式，所以優化的方法跟上面類似，還是優化map的使用，替換為list，這里有一點特殊的是，替換成list之后，map的追加需要改一下，加一個判斷重復的邏輯，所以新加了一個appendUnique方法，再次編譯，版本H4，查看耗時：

這時程式耗時降到了5.6s，再次查看記憶體分配，確認優化效果：

可以看到，FindLoops函式已經不在高位，記憶體消耗降下去了，再看一下此時的cpu耗時分布：

可以看到，FindLoops成為top1，scanobject函式成了第二位，這就對了，因為我們就是要讓cpu更多的去運行業務代碼，把時間花到真正需要的地方，

這時就可以進行下一輪的性能優化了（這就是性能調優，一遍一遍的排查耗時，壓縮不必要的cpu時間，壓榨計算性能），繼續看一下此時FindLoops到底在哪兒化的時間多，是否合理：

從各個陳述句的耗時來看，好像沒啥大問題，沒有很離譜的耗時（360ms那行是因為回圈的緣故），這說明編碼上沒有大問題，按照我們前邊一開始說的程式優化的步驟，就需要往上找了，看能不能優化邏輯來減少不必要的計算，以達到提升性能的目的（即，每一個計算步驟處理的都沒啥大問題了，那能不能少算點兒），這里FindLoops在程式入口，花了不少時間來初始化一系列臨時變數，加起來有180ms，這就意味著每次呼叫FindLoops函式，都要先花180ms的準備作業，這部分臨時變數的多次創建+初始化，可以通過加記憶體快取的方式來減少重復創建和申請，這里涉及到的標準解法其實就是物件池（像Go創建的web服務，在并發量很高的情況下，每一次http請求的決議都需要創建物件+申請記憶體+反序列這一系列的初始動作，這時就可以借助sync.Pool來減少這種重復作業，提升性能），同理，這里也是加入了一個記憶體快取物件cache：

把原本的記憶體申請初始化程序做了替換，在原有快取不夠的情況下，申請新的變數，否者截取原有快取使用，減少記憶體申請：

調整完畢后，編譯新版本H5，再看下耗時：

這時候程式的耗時已經降到4.1s，相比一開始的13.7s，已經提升了兩倍多，看下現在的cpu耗時分布：

可以看到，相比上次的分布，前兩位都是業務代碼函式了，說明進一步提高了業務代碼的耗時占比，降低了無關系統函式的負載，這種直接使用全域變數的方式加cache不是并發安全的，但是因為這里程式的邏輯本身也不是并發的，所以這里沒問題，

到這里，實操的優化程序就走完了，提煉總結一下優化的步驟和使用的主要方法命令有：

1.通過top5查看cpu耗時：確定是字典資料操作占比最大；

2.通過web命令查看是誰主要呼叫了字典操作函式：確定有DFS和FindLoops；

3.使用list 命令查看DFS函式中每行代碼的耗時，找到跟map操作相關的部分，確定優化方法：把map換成list；

4.重新運行，再用top命令查看cpu耗時分布：找到耗時最大的是記憶體物件掃描函式scanobject；

5.加入記憶體剖析代碼，運行查看記憶體分配的大小分布：確定FindLoops占用較大；

6.使用list命令查看FindLoops函式的每行記憶體開銷：確定是map創建占用記憶體較大；

7.同樣使用list資料結構替換map，重新運行，再看記憶體大小分布：確定FindLoops已不在高位，同時再看CPU耗時時，scanobject已經降下去了，目的達到；

8.此時又開始新的一輪排查，繼續查看cpu耗時排行榜，FindLoops耗時居首；

9.繼續使用list方法看FindLoops函式每行的耗時，沒有明顯的問題，那就要換思路，從排查編碼問題轉換為排查邏輯問題，減少計算程序：這里是加快取；

10.加完快取看到性能明顯提升了，說明優化對了，這時又該回圈進入下一輪的排查的優化了，以此往復，，，直到壓榨出我們能達到的最大性能！

以上就是本次程式優化的整體步驟和思路，程序中秉持的思路方法是一貫的，就是不斷的用pprof排查top級的函式耗時和記憶體占用，通過優化資料結構、減少不必要的計算程序、降低記憶體分配和回收的負擔等方式來提升性能，這一點對所有的程式都適用，也是后續可以借鑒的方法論，

兩點感悟

1.優化程式的大前提是你一定要對程式有足夠深入的理解！（或者說我們不能優化程式優化出bug來啊，，，），最后生產H6版本，之所以又對性能提升了一倍，全是建立在作者對程式完全理解的基礎之上的，所以他才可以調整回圈邏輯，復用程式物件，調整呼叫邏輯等等，這一層再往上層思考，就到了程式邏輯，系統架構內置整體的業務架構層面了，這其實就又回到了一開始我們說的程式優化由大到小的總體思路上了，

2.帶GC的語言相比不帶的，反而更考驗記憶體管理的技術，Go語言在開發效率上的提升，是把垃圾回收交給了系統來處理，但別忘了，消耗的依然是我們自己的cpu時間（羊毛，那不得從羊身上來...），所以我們在使用每種結構體，進行每種計算操作時，都要明白其背后涉及的記憶體開銷，要把記憶體變化放到潛意識里去管理，

以上就是本次pprof優化Go程式的實操程序及總結感想，供參考，感謝~

轉載請註明出處，本文鏈接：https://www.uj5u.com/houduan/540122.html

標籤：其他

上一篇：網易云VIP音樂NCM檔案轉MP3,C語言版本。

下一篇：全自動化資料洞察！資料分布對比可視化！?

Go語言性能剖析利器--pprof實戰

前言

優化思路

優化實踐

兩點感悟