Filter a file by date, where the date may appear anywhere within a specific column

Suppose I have a file with two columns:

blahblah2020-02-03_moreblah | VALUE |
blah2021-03-04blah | VALUE |

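Using awk, I need to select only those lines where the date in the first column is earlier than some other date I have. The annoying part is that the date may have any odd string on either side of it, or nothing at all, but it is always formatted as YYYY-mm-dd. I'm not sure how I ended up in a situation where I have to use awk, but here I am, and I'd really appreciate the help!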

Reply from a uj5u.com user:

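Assumptions:

dates will always be in the format YYYY-MM-DD (confirmed in the OP's description)
any date of interest will only be located in the first |-delimited field
the first field will contain at most one date string (i.e., no need to worry about the first field containing multiple date strings)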

Using GNU awk 4.0 (or newer) for FPAT support:

awk -v testdt="${dt}" '                                        # pass bash variable "dt" in as awk variable "testdt"
BEGIN { FPAT="[12][0-9]{3}-[012][0-9]-[0123][0-9]"             # define the pattern we are looking for; the first match becomes field #1
#       FPAT="[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}"    # one of a few alternatives
      }

NF && $1 < testdt                                              # NF > 0 means FPAT matched (otherwise $1 is empty and would sort before any date); print the line if the date sorts before testdt
' input.dat

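NOTE: if the input could contain ####-##-## data that is not a valid date, the OP may need to tweak the FPAT definition and/or add more logic to validate the match as an actual date before running the test ( $1 < testdt)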
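
The validation mentioned in the note could look something like the sketch below. It is only one possible approach: `filter_valid_before` is a hypothetical helper name, the range check is deliberately coarse (month 01-12, day 01-31, with no knowledge of month lengths or leap years), and the digits are spelled out instead of using `{4}` so that it does not depend on interval-expression support.

```shell
# Hypothetical helper: print lines whose first YYYY-MM-DD string looks like a
# plausible calendar date and sorts before the target date passed as $1.
# Input is read from stdin.
filter_valid_before() {
    awk -v testdt="$1" '
    match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) {
        dt    = substr($0, RSTART, RLENGTH)   # the matched date string
        month = substr(dt, 6, 2) + 0          # "+ 0" forces a numeric value
        day   = substr(dt, 9, 2) + 0
        # Coarse sanity check only, then a plain string comparison of the dates.
        if (month >= 1 && month <= 12 && day >= 1 && day <= 31 && dt < testdt)
            print
    }'
}
```

It would be called as `filter_valid_before "${dt}" < input.dat`. Lines without any date, or with something like 2019-13-45, are skipped.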

Using the OP's 2-line sample input, here are some results for different values of the ( bash) variable dt:

$ dt='2019-06-01'
$ awk -v testdt="${dt}" 'BEGIN {FPAT="[12][0-9]{3}-[012][0-9]-[0123][0-9]"} NF && $1 < testdt' input.dat
       -- no output --

$ dt='2020-06-01'
$ awk -v testdt="${dt}" 'BEGIN {FPAT="[12][0-9]{3}-[012][0-9]-[0123][0-9]"} NF && $1 < testdt' input.dat
blahblah2020-02-03_moreblah | VALUE |

$ dt='2021-06-01'
$ awk -v testdt="${dt}" 'BEGIN {FPAT="[12][0-9]{3}-[012][0-9]-[0123][0-9]"} NF && $1 < testdt' input.dat
blahblah2020-02-03_moreblah | VALUE |
blah2021-03-04blah | VALUE |

Reply from a uj5u.com user:

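Using any POSIX awk in any shell on every Unix box (the regexp relies on interval expressions such as {4}, which a few older awks, e.g. mawk 1.3.3, do not support):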

$ awk -v tgt='2020-05-01' 'match($0,/[0-9]{4}(-[0-9]{2}){2}/) && (substr($0,RSTART,RLENGTH) < tgt)' file
blahblah2020-02-03_moreblah | VALUE |

$ awk -v tgt='2021-05-01' 'match($0,/[0-9]{4}(-[0-9]{2}){2}/) && (substr($0,RSTART,RLENGTH) < tgt)' file
blahblah2020-02-03_moreblah | VALUE |
blah2021-03-04blah | VALUE |
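
The one-liner above searches the whole line ($0). If a date could also show up in the VALUE column, a variant that only looks at the first |-delimited field might be safer. The following is a rough sketch under that assumption; `filter_first_field_before` is a hypothetical name, and the digits are written out rather than using `{4}` so it also runs on awks without interval expressions.

```shell
# Hypothetical helper: print lines whose FIRST "|"-delimited field contains a
# YYYY-MM-DD string that sorts before the target date passed as $1.
# Input is read from stdin.
filter_first_field_before() {
    awk -F'|' -v tgt="$1" '
    # match() sets RSTART/RLENGTH; a pattern with no action prints the line.
    match($1, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) && (substr($1, RSTART, RLENGTH) < tgt)
    '
}
```

It would be called as `filter_first_field_before '2020-05-01' < file`. A date that appears only in a later field is ignored.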

Reply from a uj5u.com user:

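\d\d\d\d-\d\d-\d\d (see https://regexone.com/) will match the date. That works, but if you want a "less than" test there is a better solution: typically you would write a script in Python using this regex syntax, collect all the dates, and then filter them according to whether each one is greater or less than the date you have. Roughly: for each date collected, check whether it is less than your date.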
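
The collect-then-filter idea does not strictly need Python; since the question asks for awk, here is a minimal awk sketch of the same approach. Note that awk regular expressions have no \d, so [0-9] is used instead. `dates_before` is a hypothetical name, and unlike the other answers this sketch picks up every date on a line, not just the first one.

```shell
# Hypothetical helper: list every YYYY-MM-DD string found on stdin that sorts
# before the target date passed as $1, prefixed with its input line number.
dates_before() {
    awk -v tgt="$1" '
    {
        rest = $0
        # Repeatedly match, then drop everything up to the end of that match.
        while (match(rest, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/)) {
            dt = substr(rest, RSTART, RLENGTH)
            if (dt < tgt)
                print NR ": " dt
            rest = substr(rest, RSTART + RLENGTH)
        }
    }'
}
```

It would be called as `dates_before '2020-06-01' < file`. The comparison is a plain string comparison, which orders YYYY-MM-DD dates correctly because the fields run from most to least significant and are zero-padded.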
