How to improve MongoDB find query performance?

I have a collection, collection1, that contains documents like this:

{
  _id: 123,
  field1: "test",
  array1: [
    {
      array2: [
        {
          field2: 1,
          object1: {
            field3: "test"
          }
        }
      ]
    }
  ]
}

I am trying to get all documents from the collection, filtered by the fields field1, field2 and field3. My query looks like:

db.collection1.find(
{
  field1: "test",
  array1: {
    $elemMatch: {
      array2: {
        $elemMatch: {
          field2: {
            $gte: 1
          }, 
          "object1.field3": "test"
        }
      }
    }
  }
})

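The collection has about 125,000 documents. Considering that the query has to walk through two nested arrays to filter, one would expect it to be slow. It is: it takes about 30-40 seconds. So, to improve its performance, I created an index on all 3 fields:

db.collection1.createIndex({"array1.array2.object1.field3": 1, "array1.array2.field2": 1, "field1": 1});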

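With the index, the query became about twice as fast and takes roughly 15 seconds. However, that is still too slow. I would like to get the query under 5 seconds. Any ideas on how to speed it up? If it helps, I can add the query planner output for both queries (with and without the index).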

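Edit: I tried all 6 possible orderings of the fields in the index, and they all gave the same result. So I then looked more closely at the query planner and the execution stats of the query, and I noticed something: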

"queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "db.collection1",
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "indexName" : "fields_index"
            }
        }
    },
    "executionStats" : {
        "executionSuccess" : true,
        "executionTimeMillis" : "15602.784",
        "planningTimeMillis" : "0.248",
        "executionStages" : {
            "stage" : "FETCH",
            "nReturned" : "0",
            "executionTimeMillisEstimate" : "15602.130",
            "inputStages" : [
                {
                    "stage" : "IXSCAN",
                    "nReturned" : "300220",
                    "executionTimeMillisEstimate" : "87.616",
                    "indexName" : "fields_index"
                },
                {
                    "nReturned" : "0",
                    "executionTimeMillisEstimate" : "0.018"
                }
            ]
        }
    },
    "serverInfo" : {
        "host" : "mongo-instance",
        "port" : 27017,
        "version" : "3.6.0"
    },
    "ok" : 1

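It seems that the FETCH stage is the one that takes extremely long, not the index scan. Why is that? Also, with the arguments I am using, the query is not supposed to return any results. The FETCH stage does return 0 results, but the index scan returns 300220 documents. Why?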
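To make the stage-by-stage numbers easier to compare, here is a minimal sketch in plain JavaScript that walks the executionStages tree from the output above and lists each stage with its nReturned and estimated time. The helper name flattenStages is made up for illustration, and the sketch assumes the numeric fields arrive as strings, as they do in this output.

```javascript
// Copy of the "executionStages" subtree from the explain output above.
const executionStages = {
  stage: "FETCH",
  nReturned: "0",
  executionTimeMillisEstimate: "15602.130",
  inputStages: [
    { stage: "IXSCAN", nReturned: "300220", executionTimeMillisEstimate: "87.616", indexName: "fields_index" },
    { nReturned: "0", executionTimeMillisEstimate: "0.018" },
  ],
};

// Hypothetical helper: depth-first walk that returns one row per stage.
function flattenStages(stage, depth = 0, rows = []) {
  rows.push({
    depth,
    stage: stage.stage || "(unnamed)",
    nReturned: Number(stage.nReturned),
    millis: Number(stage.executionTimeMillisEstimate),
  });
  // Explain output may use "inputStage" (one child) or "inputStages" (several).
  const children = stage.inputStages || (stage.inputStage ? [stage.inputStage] : []);
  for (const child of children) {
    flattenStages(child, depth + 1, rows);
  }
  return rows;
}

const stageRows = flattenStages(executionStages);
for (const row of stageRows) {
  console.log(`${"  ".repeat(row.depth)}${row.stage}: nReturned=${row.nReturned}, ~${row.millis} ms`);
}
```

Laid out this way, the index scan hands roughly 300,000 entries to FETCH in under 100 ms, and nearly all of the remaining time is spent after that point.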

uj5u.com user reply:

The order isn't the problem here. The problem is that you don't fully understand how Mongo indexes arrays.

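What Mongo does is flatten the array and index each element separately. This means that a document like the one below will still match the index, which makes the FETCH stage much bigger than it needs to be.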

{
  _id: 123,
  field1: "test",
  array1: [
    {
      array2: [
        {
          field2: 1,
          object1: {
            field3: "no-test"
          }
        },
        {
          field2: 2,
          object1: {
            field3: "test"
          }
        }
      ]
    }
  ]
}
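As a rough illustration of the flattening described above, the following plain JavaScript sketch produces one index entry per array2 element of that document. It is a simplified model for intuition only, not how MongoDB stores keys internally, and the function name multikeyEntries is invented for this example.

```javascript
// Simplified model of a multikey index: one entry per array2 element.
// Illustration only; MongoDB's real key generation is more involved.
function multikeyEntries(doc) {
  const entries = [];
  for (const outer of doc.array1) {
    for (const leaf of outer.array2) {
      entries.push({
        field3: leaf.object1.field3,
        field2: leaf.field2,
        field1: doc.field1,
        _id: doc._id,
      });
    }
  }
  return entries;
}

// The example document from above.
const sampleDoc = {
  _id: 123,
  field1: "test",
  array1: [
    {
      array2: [
        { field2: 1, object1: { field3: "no-test" } },
        { field2: 2, object1: { field3: "test" } },
      ],
    },
  ],
};

const indexEntries = multikeyEntries(sampleDoc);
console.log(indexEntries.length); // 2 entries for a single document

// A scan that can only constrain field1 touches every entry of the document.
const touchedByField1 = indexEntries.filter((e) => e.field1 === "test");
console.log(touchedByField1.length); // 2
```

Under this model the number of index entries grows with the number of array elements, which is one way an index scan can report more entries than there are documents in the collection.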

So, what can we do?

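1. First, let's order the index in a more natural way, with field1 (the "test" equality filter) as the first field in the compound index.
2. Index the complete element of array2. As I mentioned, each key is currently flattened, which makes the index redundant when you are querying on a whole element. So instead of this: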

"array1.array2.object1.field3": 1, "array1.array2.field2": 1

you should do:

"array1.array2": 1

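This will obviously create a larger index tree, which may affect the performance of updates. If the nested objects are too large, step 2 may not be suitable for you, but it will improve your query speed.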

uj5u.com user reply:

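Here is a variation on the theme. I created a collection with 200,000 documents. 100,000 of them have field1 set to NOTtest, so they don't even make the first cut. In the other 100,000, each array1 has length 2 and each inner array2 has length 3. 20,000 of these leaf elements are set to object1.field3: "test" and field2: 4, which makes it > 1 and satisfies the two-condition query (the OP used $gte 1; I used $gt 1 to make it clearer). So, out of 200,000 documents, only 5 documents satisfy the desired query. On a MacBook Pro, the following query produces the 5 documents in 2.4 seconds, with no indexes. The trick is to use $map to "dive" into the arrays to reach the desired target array, then use $filter to produce either a populated array or an empty one. An empty array means no match and is filtered out in the next stage.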

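This approach has the added advantage of returning only the subdocuments that have the matching fields. The challenge with $elemMatch is that a match on a subdocument in an array returns the whole array, which may include non-matching subdocuments. Those must be filtered further down the pipeline or post-processed in client code.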

db.foo.aggregate([
    {$match: {field1: "test"}},

    {$project: {
        XX:{$map: {input: "$array1", as:"z1", in:
                {QQ: {$filter: {input: "$$z1.array2",
                                as: "z2",
                                cond: {$and:[
                                    {$eq:["$$z2.object1.field3", "test"]},
                                    {$gt:["$$z2.field2",1]}
                                ]}
                     }}
        }
        }}
    }}

    ,{$match: {$expr: {
        // total of length of QQ array(s) must be > 0                                                                      
        $gt:[ {$reduce: {input: "$XX",
                         initialValue: 0,
                         in: {$add:["$$value",{$size: "$$this.QQ"}]}
               }}, 0]
        }
    }}
]);
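For readers who find the nested operators hard to follow, the same shape of computation can be sketched against an in-memory array in plain JavaScript, with Array.prototype.map, filter and reduce standing in for $map, $filter and $reduce. The three sample documents are invented for illustration; this is an analogy for how the stages fit together, not a replacement for running the pipeline on the server.

```javascript
// Invented sample data: only _id 1 has a leaf with field3 "test" and field2 > 1.
const docs = [
  { _id: 1, field1: "test", array1: [
    { array2: [
      { field2: 4, object1: { field3: "test" } },
      { field2: 0, object1: { field3: "test" } },
    ] },
    { array2: [{ field2: 9, object1: { field3: "other" } }] },
  ] },
  { _id: 2, field1: "test", array1: [
    { array2: [{ field2: 1, object1: { field3: "test" } }] },
  ] },
  { _id: 3, field1: "NOTtest", array1: [
    { array2: [{ field2: 4, object1: { field3: "test" } }] },
  ] },
];

const result = docs
  // First $match: cheap top-level filter.
  .filter((d) => d.field1 === "test")
  // $project: $map over array1, $filter over each array2.
  .map((d) => ({
    _id: d._id,
    XX: d.array1.map((z1) => ({
      QQ: z1.array2.filter(
        (z2) => z2.object1.field3 === "test" && z2.field2 > 1
      ),
    })),
  }))
  // Second $match: keep documents whose QQ arrays are not all empty.
  .filter((d) => d.XX.reduce((total, x) => total + x.QQ.length, 0) > 0);

console.log(JSON.stringify(result));
```

Note that the surviving document keeps one QQ array per array1 element, including empty ones, which is why the final stage sums the lengths instead of checking a single array.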

With the amount of material now greatly reduced, you can $unwind and $project as needed to tailor the output.
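As a rough picture of what that tailoring could look like, the sketch below flattens an XX/QQ result into one row per matching leaf, using flatMap in plain JavaScript as a stand-in for two $unwind stages followed by a $project. The sample input and the output field names are assumptions made up for this example.

```javascript
// Invented sample of what the pipeline's output might look like.
const projected = [
  { _id: 1, XX: [
    { QQ: [{ field2: 4, object1: { field3: "test" } }] },
    { QQ: [] },
  ] },
  { _id: 7, XX: [
    { QQ: [
      { field2: 2, object1: { field3: "test" } },
      { field2: 3, object1: { field3: "test" } },
    ] },
  ] },
];

// flatMap over XX, then over QQ, mirrors unwinding "$XX" and then "$XX.QQ".
// Empty QQ arrays contribute no rows, like $unwind's default behavior.
const leafRows = projected.flatMap((d) =>
  d.XX.flatMap((x) =>
    x.QQ.map((leaf) => ({
      _id: d._id,
      field2: leaf.field2,
      field3: leaf.object1.field3,
    }))
  )
);

console.log(leafRows.length); // 3
```

Because the filtering already happened, the unwinding here only multiplies the handful of matching leaves, not the full arrays.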

$map can be "chained" to any depth:

var r = [
    {array1: [
        {array2: [
            {array3: [
                {array4: [
                    {f: "X"},
                    {f: "A"},
                    {f: "A"}
                ]}
            ]
            }
        ]}
    ]}
    ,
    {array1: [
        {array2: [
            {array3: [
                {array4: [
                    {f: "X"},
                    {f: "X"}
                ]}
            ]
            }
        ]}
    ]}
]

db.foo2.drop();
db.foo2.insert(r);

c = db.foo2.aggregate([
    {$project: {XX:
      {$map: {input: "$array1", as:"z1", in:
              {$map: {input: "$$z1.array2", as: "z2", in:
                      {$map: {input: "$$z2.array3", as: "z3", in:
                              {QQ: {$filter: {input: "$$z3.array4",
                                              as: "z4",
                                              cond: {$eq:["$$z4.f","A"]}
                                             }}
                              }
                       }}
             }}
         }}
   }}
]);

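Admittedly the output is a bit heavy, but this approach avoids multiple levels of $unwind, which could blow up the data set by several orders of magnitude.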
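Since the nesting follows a regular pattern, the expression can also be generated instead of typed by hand. The following is a minimal sketch of such a generator in plain JavaScript; buildNested is a hypothetical helper name, and the z1, z2, ... variable names and the QQ wrapper simply copy the conventions of the pipeline above.

```javascript
// Hypothetical helper: builds $map for every array level except the last,
// and {QQ: {$filter: ...}} for the innermost array.
// names: array field names from outermost to innermost.
// cond:  function that receives the leaf variable (e.g. "$$z4") and
//        returns the $filter condition.
function buildNested(names, cond, depth = 1, prefix = "$") {
  const input = prefix + names[0];
  const asName = "z" + depth;
  if (names.length === 1) {
    return { QQ: { $filter: { input, as: asName, cond: cond("$$" + asName) } } };
  }
  return {
    $map: {
      input,
      as: asName,
      in: buildNested(names.slice(1), cond, depth + 1, "$$" + asName + "."),
    },
  };
}

const expr = buildNested(
  ["array1", "array2", "array3", "array4"],
  (leaf) => ({ $eq: [leaf + ".f", "A"] })
);

// The stage that would be passed to aggregate(): [{ $project: { XX: expr } }]
console.log(JSON.stringify({ $project: { XX: expr } }, null, 2));
```

For the four-level example this generates the same XX expression as the hand-written pipeline, and adding a fifth level only means adding one more name to the list.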

uj5u.com user reply:

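I figured out the issue. What I did not mention in my original question is that I am using AWS's DocumentDB service, which offers MongoDB compatibility. According to the AWS documentation, under the section "$ne, $nin, $nor, $not, $exists, and $elemMatch Indexing", DocumentDB does not support using indexes with $elemMatch. The reason my query was using the index at all is that it was used for field1, which is not inside the $elemMatch. It did not work for the other two fields, however, so the query still had to scan thousands of results and filter them by field2 and field3.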

The way I fixed it was to rewrite my query. According to the MongoDB documentation, I don't need $elemMatch for my query. So my query now looks like:

db.collection1.find(
{
  field1: "test",
  "array1.array2.field2": {
    $gte: 1
  }, 
  "array1.array2.object1.field3": "test"
})

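For my purposes the query is functionally the same, but it actually uses the index. Running the query now takes less than 1 second. Thanks everyone for the help and the great suggestions!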
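One caveat for anyone copying this rewrite: in general, the dotted-path form and the $elemMatch form do not match exactly the same documents. With $elemMatch both conditions must hold for the same array2 element, while with dotted paths each condition may be satisfied by a different element. The sketch below models the two matching rules in plain JavaScript; the function names and the sample document are invented for illustration.

```javascript
// Models the nested $elemMatch query: one leaf must satisfy BOTH conditions.
function matchesWithElemMatch(doc) {
  return (
    doc.field1 === "test" &&
    doc.array1.some((outer) =>
      outer.array2.some(
        (leaf) => leaf.field2 >= 1 && leaf.object1.field3 === "test"
      )
    )
  );
}

// Models the dotted-path query: each condition is checked independently
// against all leaves, so different leaves may satisfy different conditions.
function matchesWithDottedPaths(doc) {
  const leaves = doc.array1.flatMap((outer) => outer.array2);
  return (
    doc.field1 === "test" &&
    leaves.some((leaf) => leaf.field2 >= 1) &&
    leaves.some((leaf) => leaf.object1.field3 === "test")
  );
}

// Invented document: no single leaf has both field2 >= 1 and field3 "test".
const splitDoc = {
  field1: "test",
  array1: [
    { array2: [
      { field2: 0, object1: { field3: "test" } },
      { field2: 5, object1: { field3: "other" } },
    ] },
  ],
};

console.log(matchesWithElemMatch(splitDoc));   // false
console.log(matchesWithDottedPaths(splitDoc)); // true
```

If the data never contains documents like splitDoc, the two forms return the same results; otherwise the dotted-path query can return extra documents that need to be filtered out afterwards.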

Please credit the source when reposting. Link to this article: https://www.uj5u.com/gongcheng/413276.html
