Optimizing MongoDB aggregation query performance

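I have the following database structure:

workspaces:

	Key	Index
PK	id	id
	content

projects:

	Key	Index
PK	id	id
FK	workspace	workspace_1
	deleted	deleted_1
	content

items:

	Key	Index
PK	id	id
FK	project	project_1
	type	_type_1
	deleted	deleted_1
	content

I need to count the items of each type for every project in a given workspace. Expected output: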

[
  { _id: 'projectId1', itemType1Count: 100, itemType2Count: 50, itemType3Count: 200 },
  { _id: 'projectId2', itemType1Count: 40, itemType2Count: 100, itemType3Count: 300 },
  ....
]

After several attempts and some debugging, I came up with a query that produces the output I need:

const pipeline = [
    { $match: { workspace: 'workspaceId1' } },
    {
      $lookup: {
        from: 'items',
        let: { id: '$_id' },
        pipeline: [
          {
            $match: {
              $expr: {
                $eq: ['$project', '$$id'],
              },
            },
          },
          // project only fields necessary for later pipelines to not overload
          // memory and to not get `exceeded memory limit for $group` error
          { $project: { _id: 1, type: 1, deleted: 1 } },
        ],
        as: 'items',
      },
    },
    // Use $unwind here to optimize aggregation pipeline, see:
    // https://stackoverflow.com/questions/45724785/aggregate-lookup-total-size-of-documents-in-matching-pipeline-exceeds-maximum-d
    // Without $unwind we may get an `matching pipeline exceeds maximum document size` error.
    // Error appears not in all requests and it's really strange and hard to debug.
    { $unwind: '$items' },
    { $match: { 'items.deleted': { $eq: false } } },
    {
      $group: {
        _id: '$_id',
        items: { $push: '$items' },
      },
    },
    {
      $project: {
        _id: 1,
        // Note: I have only 3 possible item types, so it's OK that it's names hardcoded.
        itemType1Count: {
          $size: {
            $filter: {
              input: '$items',
              cond: { $eq: ['$$this.type', 'type1'] },
            },
          },
        },
        itemType2Count: {
          $size: {
            $filter: {
              input: '$items',
              cond: { $eq: ['$$this.type', 'type2'] },
            },
          },
        },
        itemType3Count: {
          $size: {
            $filter: {
              input: '$items',
              cond: { $eq: ['$$this.type', 'type3'] },
            },
          },
        },
      },
    },
  ]

const counts = await Project.aggregate(pipeline)

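The query works as expected, but it is very slow. With about 1000 items in one workspace, it takes about 8 seconds to complete. Any ideas on how to make it faster are appreciated.

Thanks.

A uj5u.com user replied:

Assuming your collections are properly indexed and the indexes cover the "right" fields, we can still make some tweaks to the query itself.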
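The answer takes the indexes as given. As a rough sketch of what the "right" fields might look like for this query, the snippet below lists candidate indexes and creates them through the Node.js driver's `createIndex`. The compound key on `items` and the `ensureIndexes` helper are assumptions made for illustration; they are not stated in the question. Whether the planner actually uses an index inside the `$lookup` sub-pipeline depends on the server version, so it is worth confirming with `explain`.

```javascript
// Candidate indexes for the query in the question.
// These are assumptions for illustration, not taken from the original post.
const candidateIndexes = [
  // Supports the initial { $match: { workspace: ... } } on projects.
  { collection: 'projects', keys: { workspace: 1 } },
  // Supports matching items by project and deleted flag in the $lookup sub-pipeline.
  { collection: 'items', keys: { project: 1, deleted: 1 } },
];

// `db` is expected to be a connected Db handle from the official `mongodb` driver.
// Returns a promise that resolves once every createIndex call has finished.
function ensureIndexes(db, specs = candidateIndexes) {
  return Promise.all(
    specs.map(({ collection, keys }) => db.collection(collection).createIndex(keys))
  );
}
```

Creating an index that already exists with the same keys is a no-op, so a helper like this can be run at application startup.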

Approach 1: Keep the existing collection schema

db.projects.aggregate([
  {
    $match: {
      workspace: "workspaceId1"
    }
  },
  {
    $lookup: {
      from: "items",
      let: {id: "$_id"},
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                {$eq: ["$project","$$id"]},
                {$eq: ["$deleted",false]}
              ]
            }
          }
        },
        // project only fields necessary for later pipelines to not overload
        // memory and to not get `exceeded memory limit for $group` error
        {
          $project: {
            _id: 1,
            type: 1,
            deleted: 1
          }
        }
      ],
      as: "items"
    }
  },
  // Use $unwind here to optimize aggregation pipeline, see:
  // https://stackoverflow.com/questions/45724785/aggregate-lookup-total-size-of-documents-in-matching-pipeline-exceeds-maximum-d
  // Without $unwind we may get an `matching pipeline exceeds maximum document size` error.
  // Error appears not in all requests and it's really strange and hard to debug.
  {
    $unwind: "$items"
  },
  {
    $group: {
      _id: "$_id",
      itemType1Count: {
        $sum: {
            "$cond": {
                "if": {$eq: ["$items.type","type1"]},
                "then": 1,
                "else": 0
            }
        }
      },
      itemType2Count: {
        $sum: {
            "$cond": {
                "if": {$eq: ["$items.type","type2"]},
                "then": 1,
                "else": 0
            }
        }
      },
      itemType3Count: {
        $sum: {
            "$cond": {
                "if": {$eq: ["$items.type","type3"]},
                "then": 1,
                "else": 0
            }
        }
      }
    }
  }
])

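There are 2 main changes:

Move the items.deleted: false condition into the $lookup sub-pipeline so that fewer items documents are looked up.
Skip items: { $push: '$items' }. Instead, do a conditional sum in the later $group stage.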
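To see why the second change yields the same counts, here is a plain JavaScript analogy with no MongoDB involved. The original pipeline collects every item into an array per project and then filters that array once per type, while the revised `$group` only keeps three running counters. The sample rows and function names are made up for illustration.

```javascript
// Rows as they look after $unwind: one row per (project, item) pair.
// Hypothetical sample data.
const unwound = [
  { _id: 'p1', items: { type: 'type1' } },
  { _id: 'p1', items: { type: 'type2' } },
  { _id: 'p1', items: { type: 'type1' } },
  { _id: 'p2', items: { type: 'type3' } },
];

// Original shape: $push all items per project, then $filter + $size per type.
function countByPushAndFilter(rows) {
  const grouped = new Map();
  for (const row of rows) {
    if (!grouped.has(row._id)) grouped.set(row._id, []);
    grouped.get(row._id).push(row.items); // the whole array is held in memory
  }
  return [...grouped].map(([id, items]) => ({
    _id: id,
    itemType1Count: items.filter((i) => i.type === 'type1').length,
    itemType2Count: items.filter((i) => i.type === 'type2').length,
    itemType3Count: items.filter((i) => i.type === 'type3').length,
  }));
}

// Revised shape: $sum of $cond, so only three counters are kept per project.
function countByConditionalSum(rows) {
  const grouped = new Map();
  for (const row of rows) {
    if (!grouped.has(row._id)) {
      grouped.set(row._id, {
        _id: row._id,
        itemType1Count: 0,
        itemType2Count: 0,
        itemType3Count: 0,
      });
    }
    const acc = grouped.get(row._id);
    acc.itemType1Count += row.items.type === 'type1' ? 1 : 0;
    acc.itemType2Count += row.items.type === 'type2' ? 1 : 0;
    acc.itemType3Count += row.items.type === 'type3' ? 1 : 0;
  }
  return [...grouped.values()];
}
```

Both functions return the same result for the same rows; the second never builds the intermediate per-project array, which is the part that the `$push` stage made expensive.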

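Here is a Mongo Playground for your reference (at least to check the correctness of the new query).

Approach 2: If the collection schema can be modified, we can denormalize projects.workspace into the items collection like this: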

{
    "_id": "i1",
    "project": "p1",
    "workspace": "workspaceId1",
    "type": "type1",
    "deleted": false
}

This way, you can skip the $lookup. A simple $match followed by a $group is enough.

db.items.aggregate([
  {
    $match: {
      "deleted": false,
      "workspace": "workspaceId1"
    }
  },
  {
    $group: {
      _id: "$project",
      itemType1Count: {
        $sum: {
          "$cond": {
            "if": {$eq: ["$type","type1"]},
            "then": 1,
            "else": 0
          }
        }
      },
      ...

Here is a Mongo Playground with the denormalized schema for your reference.
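The pipeline above is cut off in the source. As a sketch of how the per-type counters could be generated instead of written out by hand, the hypothetical helper below builds a `$match` + `$group` pipeline from a map of output field names to type values. The helper name `buildItemCountPipeline` and its parameters are assumptions for illustration; it uses the array form of `$cond`, which is equivalent to the `if`/`then`/`else` form used above.

```javascript
// Builds a pipeline for the denormalized items collection.
// `countFields` maps an output field name to the item type it should count.
// Hypothetical helper, not part of the original answer.
function buildItemCountPipeline(workspaceId, countFields) {
  const counters = {};
  for (const [field, type] of Object.entries(countFields)) {
    // Add 1 for each item whose type matches, 0 otherwise.
    counters[field] = { $sum: { $cond: [{ $eq: ['$type', type] }, 1, 0] } };
  }
  return [
    // Filter first so that $group only sees live items of one workspace.
    { $match: { workspace: workspaceId, deleted: false } },
    { $group: { _id: '$project', ...counters } },
  ];
}

const itemCountPipeline = buildItemCountPipeline('workspaceId1', {
  itemType1Count: 'type1',
  itemType2Count: 'type2',
  itemType3Count: 'type3',
});
// The result can then be passed to db.items.aggregate(itemCountPipeline).
```

With this schema, an index on the items collection that starts with `workspace` (for example `{ workspace: 1, deleted: 1 }`, again an assumption) would likely help the `$match` stage.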

Tags: MongoDB, performance, mongodb-query, aggregation-framework, query-optimization