Linux: counting the number of emails sent per date for a given domain

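I have logs from some Linux SMTP servers running postfix. I want to process the logs so that I can see how many emails a given domain sent on each date, without having to write a script.

For example, my mail.log file contains the following: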

Jan  1 14:05:31 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 78B06EC0073)
Jan  1 15:05:00 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 874BE4587C4)
Jan  1 15:05:00 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 98C484E1571)
Jan  2 10:08:15 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 4456D154E12)
Jan  2 15:07:00 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 4F54515C154)
Jan  2 14:59:11 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 9856C984E16)
Feb  1 13:14:35 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as EC1874415E8)

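The output I want is:

- First, the domain/address that sent the mail

- The number of emails sent on each date for that domain (for example, 2 emails sent on Jan 1)

So the output here should be: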

http://mail.example.org[127.0.0.1]:25
Jan 1 2
Jan 2 1
Feb 1 1

http://mail.example2.org[127.0.0.1]:25
Jan 1 1
Jan 2 2

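I already have two commands that do each of these things separately, but I really don't know how to combine them:

1. Count how many emails a domain sent in total:

[user@linux ~]$ grep -h "status=sent" mail.log | cut -d' ' -f9 | awk '{c[$0] += 1} END {for(i in c){printf "%6s %4d\n", i, c[i]}}' | sort -M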

relay=http://mail.example2.org[127.0.0.1]:25,    3
relay=http://mail.example.org[127.0.0.1]:25,    4

2. Count how many emails were sent per day:

[user@linux ~]$ grep -h "status=sent" mail.log | cut -c-6 | awk '{c[$0] += 1} END {for(i in c){printf "%6s %4d\n", i, c[i]}}' | sort -k2

Feb  1    1
Jan  1    3
Jan  2    3

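Does anyone know a good command that could help me do this? Any help would be much appreciated, thanks!

Answer from a uj5u.com user:

With your shown samples, please try the following awk code. It was written and tested with GNU awk and should work with any version.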

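awk '
{
  gsub(/^relay=|,$/,"",$8)          # strip the "relay=" prefix and the trailing comma from field 8
}
{
  arr1[$1 OFS $2 OFS $8]++          # count lines per (month, day, relay)
}
END{
  for(i in arr1){
    split(i,arr2)                   # arr2[1]=month, arr2[2]=day, arr2[3]=relay
    arr3[arr2[3]]=(arr3[arr2[3]]?arr3[arr2[3]] ORS:"") (arr2[1] OFS arr2[2] OFS arr1[i])
  }
  for(i in arr3){
    print i ORS arr3[i]             # relay on its own line, then its "month day count" lines
  }
}
'  Input_file

Explanation: In the main program, gsub removes the leading relay= and the trailing comma from the 8th field. The program then builds an array arr1 indexed by $1 OFS $2 OFS $8 and increments its count by 1 for every line of Input_file. In the END block, it loops over all elements of arr1 and splits each index i into arr2. It then builds a new array arr3 indexed by the 3rd element of arr2, which is the relay value from Input_file, and appends arr2[1] OFS arr2[2] OFS arr1[i] to its value, separating entries with ORS. Once arr3 is complete, a second for loop walks all of its items and prints each index, then ORS (a newline), then the value of arr3, which yields the required output. Because for (i in ...) visits elements in an unspecified order, the dates under each relay are not guaranteed to come out in calendar order.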
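The counting idiom described above can be tried on a toy input. This is only a minimal sketch, not the answerer's code: the function name count_by_key and the hostA/hostB values are made up for illustration, and the result is piped through sort purely to make the order predictable.

```shell
# count_by_key (hypothetical name): count identical "month day host" triples.
count_by_key() {
  awk '
    { count[$1 OFS $2 OFS $3]++ }        # composite key, joined with OFS (a space)
    END {
      for (key in count) {               # visiting order is unspecified
        split(key, part)                 # part[1]=month, part[2]=day, part[3]=host
        print part[3], part[1], part[2], count[key]
      }
    }
  ' | sort                               # sort only to get a stable order
}

# Toy input; hostA and hostB are placeholder values.
printf '%s\n' 'Jan 1 hostA' 'Jan 1 hostA' 'Jan 2 hostB' | count_by_key
```

On this input it should print hostA Jan 1 2 followed by hostB Jan 2 1.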

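Answer from a uj5u.com user:

Assumptions:

- a line contains at most one instance of the string relay=
- relay= may not always appear in the same space-delimited field
- the output for a given domain/address should be in calendar order (which in this case should also be the order in which the dates are read from mail.log)

Adding a couple of lines with status=rejected, which should not be counted: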

$ cat mail.log
Jan  1 14:05:31 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 78B06EC0073)
Jan  1 14:17:27 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=rejected (250 2.0.0 Ok: queued as 78B06EC0073)
Jan  1 15:05:00 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 874BE4587C4)
Jan  1 15:05:00 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 98C484E1571)
Jan  2 10:08:15 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 4456D154E12)
Jan  2 12:13:31 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=rejected (250 2.0.0 Ok: queued as 78B06EC0073)
Jan  2 15:07:00 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 4F54515C154)
Jan  2 14:59:11 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 9856C984E16)
Feb  1 13:14:35 mail postfix/smtp[31349]: E6EC84105D: to=<[email protected]>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, delays=0.22/0.05/0.36/1.1, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as EC1874415E8)

One idea using GNU awk (needed for arrays of arrays):

awk '
BEGIN         { regex = "\\<relay=[^, ]" }

/status=sent/ { date=$1 FS $2
                addr=""

                for (i=3;i<=NF;i++) {                     # loop through fields looking for "relay="
                    if ($i ~ regex) {                     # and if found then parse out the domain/address
                       split($i,arr,"=")
                       addr=arr[2]
                       gsub(",","",addr)
                       break                              # at most one "relay=" per line, so stop scanning
                    }
                }

                if (addr != "") {                         # if we found an address then increment our counter
                   counts[addr][date]++

                   if (date != prevdate) {                # and keep track of the order in which dates have been processed
                      dates[++dtorder]=date
                      prevdate=date
                   }
                }
              }

END           { for (addr in counts) {
                    print addr

                    for (i=1;i<=dtorder;i++)              # loop through dates[] in the same order in which they were processed
                        if (dates[i] in counts[addr])
                           print dates[i],counts[addr][dates[i]]
                }
              }
' mail.log

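Notes:

- for (addr in counts) is not guaranteed to process array entries in any particular order
- dates[++dtorder]=date keeps track of the order in which dates are processed; the END{...} block then uses it to make sure the dates are output in the same order; this assumes the dates in mail.log appear in calendar order, which in turn removes the need to figure out how to sort Jan, Feb, Mar, etc. in calendar order

This produces: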

http://mail.example2.org[127.0.0.1]:25
Jan 1 1
Jan 2 2
http://mail.example.org[127.0.0.1]:25
Jan 1 2
Jan 2 1
Feb 1 1
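
If the dates in a log were not already in calendar order, one possible way around the month-name problem mentioned in the notes is to prefix each line with a numeric month and day, sort on that, and strip the prefix again. The following is a rough sketch under that assumption; the function name month_sorted is hypothetical, and it expects lines that start with a month abbreviation and a day.

```shell
# month_sorted (hypothetical name): put "Mon day ..." lines in calendar order.
month_sorted() {
  awk '
    BEGIN {
      # Build a lookup table: monthnum["Jan"]=1 ... monthnum["Dec"]=12
      n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
      for (i = 1; i <= n; i++) monthnum[name[i]] = i
    }
    # Prefix each line with a zero-padded month and day as a sort key
    { printf "%02d %02d %s\n", monthnum[$1], $2, $0 }
  ' | sort | cut -d' ' -f3-              # sort on the key, then drop it again
}

printf '%s\n' 'Feb 1' 'Jan 2' 'Jan 1' | month_sorted
```

For the three sample dates this should print Jan 1, Jan 2 and Feb 1, in that order.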

Answer from a uj5u.com user:

Not the exact output you want, but quite simple (tested with GNU and BSD awk, sort and uniq):

$ awk -F'=|,?[[:space:]]+' '{print $10,$1,$2}' mail.log | sort | uniq -c
      1 http://mail.example.org[127.0.0.1]:25 Feb 1
      2 http://mail.example.org[127.0.0.1]:25 Jan 1
      1 http://mail.example.org[127.0.0.1]:25 Jan 2
      1 http://mail.example2.org[127.0.0.1]:25 Jan 1
      2 http://mail.example2.org[127.0.0.1]:25 Jan 2

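The awk field separator is set with the option -F'=|,?[[:space:]]+' to either an = sign or an optional comma followed by at least one whitespace character (space, tab, form feed...). With this separator, the fields you are interested in are number 10 (the origin), 1 (the month) and 2 (the day). sort | uniq -c sorts the result and prints one line per unique input, preceded by its count.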
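
To check which field number the origin lands in, it can help to print every field with its index. This is a small sketch, not part of the answer: the function name number_fields is hypothetical, and the recipient address in the sample line is a placeholder, since the addresses in the question are obscured.

```shell
# number_fields (hypothetical name): print every field with its index.
number_fields() {
  awk -F'=|,?[[:space:]]+' '{ for (i = 1; i <= NF; i++) printf "%d:%s\n", i, $i }'
}

# One sample line; the recipient address is a placeholder.
printf '%s\n' 'Jan  1 14:05:31 mail postfix/smtp[31349]: E6EC84105D: to=<a@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7' |
  number_fields
```

With this sample line, field 1 is Jan, field 2 is 1, and field 10 is http://mail.example.org[127.0.0.1]:25. If the recipient part of a real log line contains extra whitespace, the origin moves to a different field number.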

But the months are sorted alphabetically. If you want the output sorted first by origin and then by increasing date, we can add sort options:

$ awk -F'=|,?[[:space:]]+' '{print $10,$1,$2}' mail.log | sort -k1,1 -k2,2M -k3,3n |
  uniq -c
      2 http://mail.example.org[127.0.0.1]:25 Jan 1
      1 http://mail.example.org[127.0.0.1]:25 Jan 2
      1 http://mail.example.org[127.0.0.1]:25 Feb 1
      1 http://mail.example2.org[127.0.0.1]:25 Jan 1
      2 http://mail.example2.org[127.0.0.1]:25 Jan 2
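
The effect of the sort keys can be seen in isolation on a few made-up lines. In this sketch the function name sort_origin_date and the hostA value are hypothetical; LC_ALL=C is set on the assumption that the log uses English month abbreviations, and the day key is sorted numerically (-k3,3n) so that day 2 comes before day 10.

```shell
# sort_origin_date (hypothetical name): order "origin month day" lines by
# origin, then by month in calendar order, then by day as a number.
sort_origin_date() {
  LC_ALL=C sort -k1,1 -k2,2M -k3,3n
}

printf '%s\n' 'hostA Jan 10' 'hostA Feb 1' 'hostA Jan 2' | sort_origin_date
```

This should print hostA Jan 2, then hostA Jan 10, then hostA Feb 1. The -M option is available in GNU and BSD sort but is not required by POSIX.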

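-k2,2M sorts the second key, the month name, in calendar order rather than alphabetically. Finally, if you want exactly the output you showed, we can add one last awk script for the final formatting:

$ awk -F'=|,?[[:space:]]+' '{print $10,$1,$2}' mail.log | sort -k1,1 -k2,2M -k3,3n |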
  uniq -c | awk '$2!=p {p=$2; print (NR!=1) ? "\n" p : p} {print $3,$4,$1}'
http://mail.example.org[127.0.0.1]:25
Jan 1 2
Jan 2 1
Feb 1 1

http://mail.example2.org[127.0.0.1]:25
Jan 1 1
Jan 2 2

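Each time the origin changes ($2!=p), the last awk script stores the new origin in the variable p for later comparison, prints a blank line (except before the first line, hence (NR!=1) ? "\n" p : p), and then prints the new origin. For every line it also prints the month ($3), the day ($4) and the count ($1).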
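
The pipeline as written counts every line in the log, whatever its status. A possible variant that only counts lines containing status=sent, as the question asks, is sketched below. It is an adaptation rather than the answerer's exact command: the function name relay_counts is hypothetical, the recipient addresses in the sample are placeholders, the final awk is restructured slightly, and LC_ALL=C assumes English month abbreviations.

```shell
# relay_counts (hypothetical name): per-relay, per-date counts of sent mail.
# Reads the log files given as arguments, or standard input if there are none.
relay_counts() {
  awk -F'=|,?[[:space:]]+' '/status=sent/ { print $10, $1, $2 }' "$@" |
    LC_ALL=C sort -k1,1 -k2,2M -k3,3n |
    uniq -c |
    awk '$2 != p { if (NR != 1) print ""; p = $2; print p }
         { print $3, $4, $1 }'
}

# Sample lines modelled on the question; the recipient addresses are placeholders.
relay_counts <<'EOF'
Jan  1 14:05:31 mail postfix/smtp[31349]: E6EC84105D: to=<a@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 78B06EC0073)
Jan  1 14:17:27 mail postfix/smtp[31349]: E6EC84105D: to=<b@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, dsn=2.0.0, status=rejected (250 2.0.0 Ok: queued as 78B06EC0073)
Jan  1 15:05:00 mail postfix/smtp[31349]: E6EC84105D: to=<c@example.com>, relay=http://mail.example.org[127.0.0.1]:25, delay=1.7, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 874BE4587C4)
Jan  2 15:07:00 mail postfix/smtp[31349]: E6EC84105D: to=<d@example.com>, relay=http://mail.example2.org[127.0.0.1]:25, delay=1.7, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 4F54515C154)
EOF
```

On real logs it would be called as relay_counts mail.log. The rejected line in the sample is skipped, so the first relay is reported with a count of 2 for Jan 1.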

Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/439530.html
