How do I sort and chunk a dependency tree of actions so that as many actions as possible are batched together at each step?


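I am looking for a basic JavaScript implementation of this sorting/chunking algorithm. I can't picture in my head how to build a topological sort/graph and then a level graph (to do the batching).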

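Say you have a bunch of actions for creating/inserting records into a bunch of different database tables. Some records can be inserted without depending on the output of any other insert. Some need to wait for one other thing to finish. And others need to wait for many things to finish, which might finish at different times in the flow.

How do you write an algorithm to sort and chunk the actions in the dependency tree so that the inserts/database actions can be optimally batched? By optimally batched, I mean that if you can insert 10 records into the same table at once, then do that. Any time you can batch insert, you should, to minimize the number of database calls/inserts.

Here is a snippet from my example code, with a fake sequence of actions, using a simple data structure that captures all the required information.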

...
{ action: 'create', table: 'tb1', set: 'key12', input: {
  p: { type: 'binding', path: ['key10', 'z'] },
  q: { type: 'binding', path: ['key11', 'a'] }
} },
{ action: 'create', table: 'tb4', set: 'key13' },
{ action: 'create', table: 'tb3', set: 'key14' },
{ action: 'create', table: 'tb4', set: 'key15', input: {
  a: { type: 'binding', path: ['key8', 'z'] },
} },
...

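Notice that our "dependency node item" has 4 possible properties:

action: In our case this is always "create", but it could be something else in the future.
table: The name of the table to insert into.
set: The name of the variable to add to the shared global "scope" of the action dependency tree, so that other actions can read it as input.
input: The inputs to the action, which in our case are all "binding" inputs (they could also be literal values, but that case is too easy). A binding input reads some property/column value from a record stored in the shared scope of the dependency tree.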
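
The dependency information can be read straight off this structure: an action depends on the `set` names that appear as the first element of each binding `path`. A minimal sketch, assuming that literal inputs would carry some other `type` and contribute no dependency; `dependenciesOf` is a hypothetical helper name, not part of the original code:

```javascript
// Hypothetical helper: returns the `set` names an action must wait for.
// Assumes only inputs with type 'binding' create a dependency, and that
// path[0] is the name of the record being read.
function dependenciesOf(node) {
  const deps = new Set()
  for (const value of Object.values(node.input ?? {})) {
    if (value.type === 'binding') deps.add(value.path[0])
  }
  return [...deps]
}

const example = { action: 'create', table: 'tb1', set: 'key12', input: {
  p: { type: 'binding', path: ['key10', 'z'] },
  q: { type: 'binding', path: ['key11', 'a'] }
} }
console.log(dependenciesOf(example)) // [ 'key10', 'key11' ]
```

An action with no `input` yields an empty list, which is the "easy case" of an independent insert.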

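Given that, it should be possible somehow to construct a simple algorithm that chunks the actions into subsets which can be run in parallel and batched. For example, our code below would end up as something like this structure (I created this output by hand, so there could be mistakes, though I think I got it right. Also note that although the tables are numbered, that does not imply any order among them; the names were just simple to pick):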

// this is "close to" the desired result, because
// there might be a slightly different sort applied
// to this when implemented, but the chunking pattern
// and grouping of elements should be exactly like this
// (I am pretty sure, I manually did this
// so there could be small mistakes, but I doubt it)
const closeToDesiredResult = [
  [
    [
      { action: 'create', table: 'tb1', set: 'key1' },
      { action: 'create', table: 'tb1', set: 'key21' },
    ],
    [
      { action: 'create', table: 'tb2', set: 'key2' },
      { action: 'create', table: 'tb2', set: 'key3' },
      { action: 'create', table: 'tb2', set: 'key23' },
    ],
    [
      { action: 'create', table: 'tb4', set: 'key6' },
      { action: 'create', table: 'tb4', set: 'key8' },
      { action: 'create', table: 'tb4', set: 'key13' },
    ],
    [
      { action: 'create', table: 'tb3', set: 'key5' },
      { action: 'create', table: 'tb3', set: 'key7' },
      { action: 'create', table: 'tb3', set: 'key9' },
      { action: 'create', table: 'tb3', set: 'key14' },
      { action: 'create', table: 'tb3', set: 'key24' },
    ],
    [
      { action: 'create', table: 'tb6', set: 'key17' },
    ],
    [
      { action: 'create', table: 'tb5', set: 'key16' },
    ]
  ],
  [
    [
      { action: 'create', table: 'tb1', set: 'key4', input: {
        x: { type: 'binding', path: ['key2', 'baz'] }
      } },
    ],
    [
      { action: 'create', table: 'tb3', set: 'key10', input: {
        y: { type: 'binding', path: ['key6', 'foo'] },
        z: { type: 'binding', path: ['key1', 'bar'] }
      } },
    ],
    [
      { action: 'create', table: 'tb4', set: 'key15', input: {
        a: { type: 'binding', path: ['key8', 'z'] },
      } },
    ]
  ],
  [
    [
      { action: 'create', table: 'tb4', set: 'key11', input: {
        a: { type: 'binding', path: ['key10', 'z'] },
        b: { type: 'binding', path: ['key1', 'bar'] }
      } },
    ],
    [
      { action: 'create', table: 'tb6', set: 'key18', input: {
        m: { type: 'binding', path: ['key4', 'x'] },
      } },
      { action: 'create', table: 'tb6', set: 'key19', input: {
        m: { type: 'binding', path: ['key4', 'x'] },
        n: { type: 'binding', path: ['key13', 'a'] },
      } },
    ]
  ],
  [
    [
      // key12 reads key11, so it has to come one step after key11
      { action: 'create', table: 'tb1', set: 'key12', input: {
        p: { type: 'binding', path: ['key10', 'z'] },
        q: { type: 'binding', path: ['key11', 'a'] }
      } },
    ],
    [
      { action: 'create', table: 'tb2', set: 'key22', input: {
        w: { type: 'binding', path: ['key18', 'm'] },
        x: { type: 'binding', path: ['key17', 'm'] },
      } },
    ],
    [
      { action: 'create', table: 'tb6', set: 'key20', input: {
        m: { type: 'binding', path: ['key18', 'm'] },
        n: { type: 'binding', path: ['key17', 'm'] },
      } },
    ]
  ]
]

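Notice there are 4 top-level chunks in the resulting array. These are the main steps. Within each step, everything is grouped by table, so the groups can all run in parallel, and within each table group, the records can all be batch inserted. Boom.

How would you implement this? My brain seems to have a hard time grasping it.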

const actionTree = generateActionTree()
const chunkedActionTree = chunkDependencyTree(actionTree)

function chunkDependencyTree(list) {
  const independentOnesMapByTableName = {}
  list.forEach(node => {
    // easy case
    if (!node.input) {
      const group = independentOnesMapByTableName[node.table]
        = independentOnesMapByTableName[node.table] ?? []
      group.push(node)
    } else {
      // I am at a loss for words...
    }
  })
}

function generateActionTree() {
  // this would be constructed through a bunch of real-world
  // functions, queuing up all the actions
  // and pointing outputs to inputs.
  return [
    { action: 'create', table: 'tb1', set: 'key1' },
    { action: 'create', table: 'tb2', set: 'key2' },
    { action: 'create', table: 'tb2', set: 'key3' },
    { action: 'create', table: 'tb3', set: 'key5' },
    { action: 'create', table: 'tb4', set: 'key6' },
    { action: 'create', table: 'tb3', set: 'key7' },
    { action: 'create', table: 'tb4', set: 'key8' },
    { action: 'create', table: 'tb3', set: 'key9' },
    { action: 'create', table: 'tb3', set: 'key10', input: {
      y: { type: 'binding', path: ['key6', 'foo'] },
      z: { type: 'binding', path: ['key1', 'bar'] }
    } },
    { action: 'create', table: 'tb1', set: 'key4', input: {
      x: { type: 'binding', path: ['key2', 'baz'] }
    } },
    { action: 'create', table: 'tb4', set: 'key11', input: {
      a: { type: 'binding', path: ['key10', 'z'] },
      b: { type: 'binding', path: ['key1', 'bar'] }
    } },
    { action: 'create', table: 'tb1', set: 'key12', input: {
      p: { type: 'binding', path: ['key10', 'z'] },
      q: { type: 'binding', path: ['key11', 'a'] }
    } },
    { action: 'create', table: 'tb4', set: 'key13' },
    { action: 'create', table: 'tb3', set: 'key14' },
    { action: 'create', table: 'tb4', set: 'key15', input: {
      a: { type: 'binding', path: ['key8', 'z'] },
    } },
    { action: 'create', table: 'tb5', set: 'key16' },
    { action: 'create', table: 'tb6', set: 'key17' },
    { action: 'create', table: 'tb6', set: 'key18', input: {
      m: { type: 'binding', path: ['key4', 'x'] },
    } },
    { action: 'create', table: 'tb6', set: 'key19', input: {
      m: { type: 'binding', path: ['key4', 'x'] },
      n: { type: 'binding', path: ['key13', 'a'] },
    } },
    { action: 'create', table: 'tb6', set: 'key20', input: {
      m: { type: 'binding', path: ['key18', 'm'] },
      n: { type: 'binding', path: ['key17', 'm'] },
    } },
    { action: 'create', table: 'tb1', set: 'key21' },
    { action: 'create', table: 'tb2', set: 'key22', input: {
      w: { type: 'binding', path: ['key18', 'm'] },
      x: { type: 'binding', path: ['key17', 'm'] },
    } },
    { action: 'create', table: 'tb2', set: 'key23' },
    { action: 'create', table: 'tb3', set: 'key24' },
  ]
}

I think this is roughly topological sorting, but I'm not quite sure how to apply it to this specific situation.

A uj5u.com user replied:

We have this:

const actionTree = [
    { action: 'create', table: 'tb1', set: 'key1' },
    { action: 'create', table: 'tb2', set: 'key2' },
...
    { action: 'create', table: 'tb3', set: 'key24' },
  ];

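We want to fill in this:

batches = [];

Assuming we only need to make sure that every set an action depends on has already been inserted, and that the second item of an input's path: ['key8', 'z'] (z in this case) doesn't affect anything because sets are atomic, all we have to do is: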

const batches = [];

// All actions already placed in some batch (batches flattened).
const batched = () => batches.reduce((p, a) => [...p, ...a], []);
// Actions still waiting to be placed.
const unbatched = () => actionTree.filter(b => batched().indexOf(b) < 0);

// An action is ready when it has no input, or when every input reads from a set that is already batched.
const nextbatchfilter = (a) => (!("input" in a)) || (Object.values(a.input).filter(i => batched().map(a => a.set).indexOf(i.path[0]) < 0).length == 0);

while (unbatched().length > 0) {
    const nextbatch = unbatched().filter(nextbatchfilter);
    if (nextbatch.length == 0) {
        console.log("could not build dependency graph with all items, " + unbatched().length + " items remaining");
        break; // avoid infinite loop in case of impossible tree
    }
    batches.push(nextbatch);
}

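Here batched() shows which actions have already been batched, by flattening batches; unbatched() is the opposite and shows which actions still need to be batched. nextbatchfilter shows which of the unbatched actions can be batched next. With these, we can get the job done.

The code can be modified to reduce the excessive CPU rework of computing unbatched and batched, by materializing the intermediate state: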

const batches = [];
let unbatched = Array.from(actionTree);
const batchedsets = new Set(); // `set` names that are already in some batch

// An action is ready when it has no input, or when every input reads from a set that is already batched.
const nextbatchfilter = (a) => (!("input" in a)) || (Object.values(a.input).filter(i => !(batchedsets.has(i.path[0]))).length == 0);

while (unbatched.length > 0) {
    const nextbatch = unbatched.filter(nextbatchfilter);
    if (nextbatch.length == 0) {
        console.log("could not build dependency graph with all items, " + unbatched.length + " items remaining");
        break; // avoid infinite loop in case of impossible tree
    }
    // Drop the ready actions before batchedsets changes, so the filter still gives the same answer.
    unbatched = unbatched.filter(a => !nextbatchfilter(a));
    batches.push(nextbatch);
    nextbatch.forEach(a => batchedsets.add(a.set));
}

In both cases, the batches output does not group the actions by table. To view it grouped by table, as in the example, all that is needed is:

batches.map(b=>Array.from(new Set(b.map(a=>a.table))).map(t=>b.filter(a=>a.table==t)));

Optionally, the batches could be built with this grouping already in place.
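
One way to build the batches with the grouping already in place is to collect each step into a Map keyed by table name. This is only a sketch under the same assumption as before (only `path[0]` of each input matters); `chunkByStepAndTable` is a hypothetical name, and unlike the snippets above it throws instead of logging when nothing can be scheduled:

```javascript
// Sketch: each step is built as a Map from table name to its actions,
// so the per-table grouping exists as soon as the step is complete.
function chunkByStepAndTable(actions) {
  const done = new Set(); // `set` names already scheduled in an earlier step
  let pending = [...actions];
  const steps = [];
  while (pending.length > 0) {
    // Ready: no input at all, or every input reads from a scheduled set.
    const ready = pending.filter(a =>
      Object.values(a.input ?? {}).every(i => done.has(i.path[0])));
    if (ready.length === 0) {
      throw new Error(`${pending.length} actions have unmet or cyclic dependencies`);
    }
    const byTable = new Map();
    for (const a of ready) {
      if (!byTable.has(a.table)) byTable.set(a.table, []);
      byTable.get(a.table).push(a);
    }
    steps.push([...byTable.values()]);
    const readySet = new Set(ready);
    pending = pending.filter(a => !readySet.has(a));
    ready.forEach(a => done.add(a.set));
  }
  return steps;
}

const sample = [
  { action: 'create', table: 'tb1', set: 'key1' },
  { action: 'create', table: 'tb2', set: 'key2' },
  { action: 'create', table: 'tb1', set: 'key3' },
  { action: 'create', table: 'tb1', set: 'key4', input: {
    x: { type: 'binding', path: ['key2', 'baz'] } } },
];
const steps = chunkByStepAndTable(sample);
console.log(JSON.stringify(steps.map(step => step.map(group => group.map(a => a.set)))));
// [[["key1","key3"],["key2"]],[["key4"]]]
```

Each inner array is one table's batch, so a step can be handed to the database layer as one bulk insert per table.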

Edit: added infinite-loop protection to both solutions.

A uj5u.com user replied:

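Your data structure isn't clear to me. What are the single-letter ids p, q, etc.? I also don't understand the role of the tables. You can insert multiple items into the same table in one write, can't you? I'm assuming these things don't matter for the underlying ordering problem.

I'm treating the set field as a "job" and the corresponding keys mentioned in the inputs as dependencies: jobs that must be completed before it.

I don't have time to be thorough here, and I don't have a JavaScript environment handy, so this is in Python.

Let's extract a dependency graph from the verbose data structure, then look for "levels." The first level is all nodes with no dependencies. The second is all nodes whose dependencies are met by nodes in any previous level, and so on. Rinse and repeat.

Note that, unlike what I suggested in my comment, this is not a level graph as conventionally defined.

Also, I'm not bothering with data structures for efficiency. You could do this in O(n log n) time. My code is O(n^2).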
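
The repeated scan over all remaining jobs can be avoided by keeping, for each job, a counter of dependencies that are not yet done, which is essentially Kahn's algorithm processed one level at a time. The following is a minimal sketch in JavaScript (the language the question asks for), not a port of the Python in this answer. `levelsOf` is a hypothetical name, and the sketch assumes every `set` name is unique and that only `path[0]` of each input matters:

```javascript
// Sketch: level-by-level Kahn's algorithm. Every action and every
// dependency edge is handled a constant number of times.
function levelsOf(actions) {
  const remaining = new Map();  // set name -> number of unmet dependencies
  const dependents = new Map(); // set name -> names of actions waiting on it
  for (const a of actions) {
    const deps = new Set(Object.values(a.input ?? {}).map(i => i.path[0]));
    remaining.set(a.set, deps.size);
    for (const d of deps) {
      if (!dependents.has(d)) dependents.set(d, []);
      dependents.get(d).push(a.set);
    }
  }
  let level = actions.filter(a => remaining.get(a.set) === 0).map(a => a.set);
  const levels = [];
  let placed = 0;
  while (level.length > 0) {
    levels.push(level);
    placed += level.length;
    const next = [];
    for (const name of level) {
      for (const waiter of dependents.get(name) ?? []) {
        remaining.set(waiter, remaining.get(waiter) - 1);
        if (remaining.get(waiter) === 0) next.push(waiter);
      }
    }
    level = next;
  }
  // Anything left over sits on a cycle or reads a set that is never created.
  if (placed < actions.length) throw new Error('cycle or missing dependency');
  return levels;
}

const jobs = [
  { action: 'create', table: 'tb1', set: 'key1' },
  { action: 'create', table: 'tb4', set: 'key6' },
  { action: 'create', table: 'tb3', set: 'key10', input: {
    y: { type: 'binding', path: ['key6', 'foo'] },
    z: { type: 'binding', path: ['key1', 'bar'] } } },
  { action: 'create', table: 'tb4', set: 'key11', input: {
    a: { type: 'binding', path: ['key10', 'z'] },
    b: { type: 'binding', path: ['key1', 'bar'] } } },
  { action: 'create', table: 'tb1', set: 'key12', input: {
    p: { type: 'binding', path: ['key10', 'z'] },
    q: { type: 'binding', path: ['key11', 'a'] } } },
];
console.log(JSON.stringify(levelsOf(jobs)));
// [["key1","key6"],["key10"],["key11"],["key12"]]
```

A job lands in the level right after the one containing its last-finished dependency, which matches the definition of levels given above.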

Sorry if I'm misunderstanding your question. Sorry also for the untested, possibly buggy implementation here.

from collections import defaultdict

def ExtractGraph(cmds):
  """Gets a dependency graph from verbose command input data."""
  graph = defaultdict(set)
  for cmd in cmds:
    node = cmd['set']
    graph[node].update(set())
    inputs = cmd.get('input')
    if inputs:
      for _, val in inputs.items():
        graph[node].add(val['path'][0])
  return graph

def FindSources(graph):
  """Returns the nodes of the given graph having no dependencies."""
  sources = set()
  for node, edges in graph.items():
    if not edges:
      sources.add(node)
  return sources

def GetLevels(dependencies):
  """Returns sequence levels satisfying given dependency graph."""
  sources = FindSources(dependencies)
  level = set(sources)
  done = set(level)
  todos = dependencies.keys() - done
  levels = []
  while level:
    levels.append(level)
    # Next level is jobs that have all dependencies done
    new_level = set()
    # A clever data structure could find the next level in O(k log n)
    # for a level size of k and n jobs. This needs O(n).
    for todo in todos:
      if dependencies[todo].issubset(done):
        new_level.add(todo)
    todos.difference_update(new_level)
    done.update(new_level)
    level = new_level
  return levels

cmds = [
    { 'action' : 'create', 'table' : 'tb1', 'set' : 'key1' },
    { 'action' : 'create', 'table' : 'tb2', 'set' : 'key2' },
    { 'action' : 'create', 'table' : 'tb2', 'set' : 'key3' },
    { 'action' : 'create', 'table' : 'tb3', 'set' : 'key5' },
    { 'action' : 'create', 'table' : 'tb4', 'set' : 'key6' },
    { 'action' : 'create', 'table' : 'tb3', 'set' : 'key7' },
    { 'action' : 'create', 'table' : 'tb4', 'set' : 'key8' },
    { 'action' : 'create', 'table' : 'tb3', 'set' : 'key9' },
    { 'action' : 'create', 'table' : 'tb3', 'set' : 'key10', 'input' : {
      'y' : { 'type' : 'binding', 'path' : ['key6', 'foo'] },
      'z' : { 'type' : 'binding', 'path' : ['key1', 'bar'] }
    } },
    { 'action' : 'create', 'table' : 'tb1', 'set' : 'key4', 'input' : {
      'x' : { 'type' : 'binding', 'path' : ['key2', 'baz'] }
    } },
    { 'action' : 'create', 'table' : 'tb4', 'set' : 'key11', 'input' : {
      'a' : { 'type' : 'binding', 'path' : ['key10', 'z'] },
      'b' : { 'type' : 'binding', 'path' : ['key1', 'bar'] }
    } },
    { 'action' : 'create', 'table' : 'tb1', 'set' : 'key12', 'input' : {
      'p' : { 'type' : 'binding', 'path' : ['key10', 'z'] },
      'q' : { 'type' : 'binding', 'path' : ['key11', 'a'] }
    } },
    { 'action' : 'create', 'table' : 'tb4', 'set' : 'key13' },
    { 'action' : 'create', 'table' : 'tb3', 'set' : 'key14' },
    { 'action' : 'create', 'table' : 'tb4', 'set' : 'key15', 'input' : {
      'a' : { 'type' : 'binding', 'path' : ['key8', 'z'] },
    } },
    { 'action' : 'create', 'table' : 'tb5', 'set' : 'key16' },
    { 'action' : 'create', 'table' : 'tb6', 'set' : 'key17' },
    { 'action' : 'create', 'table' : 'tb6', 'set' : 'key18', 'input' : {
      'm' : { 'type' : 'binding', 'path' : ['key4', 'x'] },
    } },
    { 'action' : 'create', 'table' : 'tb6', 'set' : 'key19', 'input' : {
      'm' : { 'type' : 'binding', 'path' : ['key4', 'x'] },
      'n' : { 'type' : 'binding', 'path' : ['key13', 'a'] },
    } },
    { 'action' : 'create', 'table' : 'tb6', 'set' : 'key20', 'input' : {
      'm' : { 'type' : 'binding', 'path' : ['key18', 'm'] },
      'n' : { 'type' : 'binding', 'path' : ['key17', 'm'] },
    } },
    { 'action' : 'create', 'table' : 'tb1', 'set' : 'key21' },
    { 'action' : 'create', 'table' : 'tb2', 'set' : 'key22', 'input' : {
      'w' : { 'type' : 'binding', 'path' : ['key18', 'm'] },
      'x' : { 'type' : 'binding', 'path' : ['key17', 'm'] },
    } },
    { 'action' : 'create', 'table' : 'tb2', 'set' : 'key23' },
    { 'action' : 'create', 'table' : 'tb3', 'set' : 'key24' },
]
    
dependencies = ExtractGraph(cmds)
levels = GetLevels(dependencies)
print(levels)

When run, it finds several levels:

[
{'key9', 'key3', 'key24', 'key7', 'key17', 'key8', 'key21', 'key1',
     'key5', 'key2', 'key16', 'key6', 'key23', 'key13', 'key14'}, 
{'key15', 'key4', 'key10'}, 
{'key19', 'key18', 'key11'}, 
{'key22', 'key12', 'key20'}
]

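For a spot check, let's look at key12. It has 10 and 11 as dependencies. key10 has 6 and 1. key11 has 10 and 1. Keys 1 and 6 have none. We find

1 and 6 (no dependencies) in level 0,
10 (needs 1 and 6) in level 1,
11 (needs 10 and 1) in level 2,
12 (needs 10 and 11) in level 3.

Each job is done as soon as its dependencies are met. So that's encouraging. More thorough testing is a must, however.

If you need to further group these levels into separate inserts per table, that's a post-processing step. The resulting inserts within a level can be done in parallel.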
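
That post-processing step can be sketched in JavaScript, the language the question asks for. This is an illustration rather than part of the Python above: `groupLevelsByTable` is a hypothetical name, `levels` is assumed to be an array of arrays of `set` names like the printed output, and every name is assumed to match exactly one action:

```javascript
// Sketch: turn levels of `set` names into steps of per-table groups.
function groupLevelsByTable(levels, actions) {
  const bySet = new Map(actions.map(a => [a.set, a])); // look up an action by its set name
  return levels.map(level => {
    const groups = new Map(); // table name -> actions of this level for that table
    for (const name of level) {
      const action = bySet.get(name);
      if (!groups.has(action.table)) groups.set(action.table, []);
      groups.get(action.table).push(action);
    }
    return [...groups.values()];
  });
}

const cmds = [
  { action: 'create', table: 'tb1', set: 'key1' },
  { action: 'create', table: 'tb2', set: 'key2' },
  { action: 'create', table: 'tb1', set: 'key21' },
  { action: 'create', table: 'tb1', set: 'key4', input: {
    x: { type: 'binding', path: ['key2', 'baz'] } } },
];
const grouped = groupLevelsByTable([['key1', 'key2', 'key21'], ['key4']], cmds);
console.log(JSON.stringify(grouped.map(step => step.map(g => g.map(a => a.set)))));
// [[["key1","key21"],["key2"]],[["key4"]]]
```

Each inner array can then become one bulk insert, and the inserts of a single level have no dependencies on each other.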
