Calculating the combined file size of thousands of files

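We have a software package that performs tasks by assigning a job number to a batch of files. A batch can contain any number of files. The files are then stored in a directory structure similar to the following: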

/asc/array1/.storage/10/10297/10297-Low-res.m4a
...
/asc/array1/.storage/3/3814/3814-preview.jpg

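The file names are generated automatically. The directories directly under .storage are the file number divided by 1000, rounded down (10297 goes into 10, 3814 goes into 3).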
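
The layout described above can be sketched as a small shell function. The name storage_dir and the default root are illustrative only and not part of the software package; the bucket is assumed to be the integer quotient of the file number by 1000, which matches both sample paths.

```shell
# Hypothetical helper: derive the storage directory for a file number.
# Assumption: the bucket directory is the file number divided by 1000,
# rounded down (10297 -> 10, 3814 -> 3), as in the sample paths.
storage_dir() {
    num=$1
    root=${2:-/asc/array1/.storage}
    printf '%s/%d/%d\n' "$root" "$((num / 1000))" "$num"
}

storage_dir 10297   # prints /asc/array1/.storage/10/10297
storage_dir 3814    # prints /asc/array1/.storage/3/3814
```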

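There is also a database that associates job numbers and file numbers with the client in question. By running a SQL query, I can list the job number, the client, and the full path of each file. Example: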

213     sample-data     /asc/array1/.storage/10/10297/10297-Low-res.m4a
...
214     client-abc      /asc/array1/.storage/3/3814/3814-preview.jpg

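My task is to calculate the total storage used by each client. So I wrote a quick and dirty bash script that iterates over each line, runs du on the file, and adds the result to an associative array. I then plan to echo the totals or generate a CSV file to ingest into PowerBI or some other tool. Is this the best way to handle the problem? Here is a copy of the script: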

#!/bin/bash

declare -A clientArr

# 1 == Job Num
# 2 == Client
# 3 == Path
while read line; do
    client=$(echo "$line" | awk '{ print $2 }')
    path=$(echo "$line" | awk '{ print $3 }')

    if [ -f "$path" ]; then
        size=$(du -s "$path" | awk '{ print $1 }')
        clientArr[$client]=$(( ${clientArr[$client]:-0} + size ))
    fi
done < /tmp/pm_report.txt

for key in "${!clientArr[@]}"; do
    echo "$key,${clientArr[$key]}"
done
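
One possible tightening of the loop above, offered as a sketch rather than a definitive rewrite: `read` can split the three columns itself, so awk does not have to run twice per line, and the per-client sums can be left to a single awk at the end. The function name sum_per_client is hypothetical, paths are assumed to contain no whitespace, and `du -k` is used so that the totals are in 1024-byte units.

```shell
# Sketch: one du per file, but no awk per line.
# $1: report file with "job client path" on each line.
sum_per_client() {
    while read -r _job client path; do
        [ -f "$path" ] || continue          # skip missing files
        size=$(du -sk -- "$path") || continue
        size=${size%%[!0-9]*}               # keep the leading number only
        printf '%s %s\n' "$client" "$size"
    done < "$1" |
    awk '{ total[$1] += $2 }
         END { for (c in total) print c "," total[c] }'
}

# Usage (path taken from the script above):
#   sum_per_client /tmp/pm_report.txt > usage.csv
```

This still starts one du process per file, so it mainly helps readability; with thousands of files the process count remains the dominant cost.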

Reply from a uj5u.com user:

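Assumptions:

you have GNU coreutils du
the file names do not contain whitespace

This uses no shell loop, calls du only once, and iterates over the pm_report file twice: once to build the list of paths for du, and once to sum the sizes per client.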

file=/tmp/pm_report.txt

awk '{printf "%s\0", $3}' "$file" \
| du -s --files0-from=- 2>/dev/null \
| awk '
    NR == FNR {du[$2] = $1; next}
    {client_du[$2] += du[$3]}
    END {
      OFS = "\t"
      for (client in client_du) print client, client_du[client]
    }
  ' - "$file"
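
As a sketch of how the same single-du idea could produce the CSV mentioned in the question, the pipeline can be wrapped in a function. The name client_usage_csv and the header names are hypothetical. This variant builds the NUL-separated list with tr, in case the awk at hand does not emit NUL bytes from printf, and it still assumes GNU du and paths without whitespace.

```shell
# Sketch: one du call for all files, CSV output with sizes in KiB.
# $1: report file with "job client path" on each line.
client_usage_csv() {
    report=$1
    awk '{ print $3 }' "$report" | tr '\n' '\0' |
    du -sk --files0-from=- 2>/dev/null |
    awk -v OFS=',' '
        # First input (stdin): du output, "size<TAB>path"
        NR == FNR { du[$2] = $1; next }
        # Second input: the report; missing files add 0
        { total[$2] += du[$3] }
        END {
            print "client", "size_kib"
            for (client in total) print client, total[client]
        }
    ' - "$report"
}

# Usage (path taken from the question):
#   client_usage_csv /tmp/pm_report.txt > usage.csv
```

If du prints nothing at all (for example, when none of the listed files exist), the `NR == FNR` test also matches the report lines and only the header is printed, so that case would need separate handling if it can occur.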

Reply from a uj5u.com user:

Using a file foo:

$ cat foo
213     sample-data     foo          # this file
214     client-abc      bar          # some file I had in the dir
215     some            nonexistent  # didn't have this one

And awk:

$ gawk '                             # using GNU awk
@load "filefuncs"                    # for this default extension
!stat($3,statdata) {                 # "returns zero upon success"
    a[$2]+=statdata["size"]          # get the size and update array
}
END {                                # in the end
    for(i in a)                      # iterate all
        print i,a[i]                 # and output
}' foo foo                           # running twice for testing array grouping

Output:

client-abc 70
sample-data 18
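
Note that statdata["size"] is the file size in bytes, whereas du reports allocated disk space, so the two answers can give different totals for the same files. Where the gawk extension is not available, a similar byte total could be sketched with GNU stat. The function name client_bytes is hypothetical, `stat -c` and `xargs -r` are GNU-specific, and paths are again assumed to contain no whitespace.

```shell
# Sketch: sum file sizes in bytes per client without gawk extensions.
# $1: report file with "job client path" on each line.
client_bytes() {
    report=$1
    awk '{ print $3 }' "$report" | tr '\n' '\0' |
    xargs -0 -r stat -c '%s %n' 2>/dev/null |
    awk '
        # First input (stdin): stat output, "size path"
        NR == FNR { bytes[$2] = $1; next }
        # Second input: the report; files that stat could not read are skipped
        ($3 in bytes) { total[$2] += bytes[$3] }
        END { for (client in total) print client, total[client] }
    ' - "$report"
}

# Usage with the sample file from this answer:
#   client_bytes foo
```

As in the gawk version, a client whose files are all missing does not appear in the output.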

Please credit the source when reposting. Link to this article: https://www.uj5u.com/caozuo/382634.html

Tags: linux bash du