Find groups of files that end with the same 17 characters

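I'm collecting files whose names have a unique part and a common part, and I'm trying to match on the common part. I'm currently trying this in bash, but I can use Python or anything else.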

file1_02_01_2021_002244.mp4
file2_02_01_2021_002244.mp4
file3_02_01_2021_002244.mp4
# _02_01_2021_002244.mp4 should be the 'match all files that contain this string'

file1_03_01_2021_092200.mp4
file2_03_01_2021_092200.mp4
file3_03_01_2021_092200.mp4
# _03_01_2021_092200.mp4 is the match
...    
file201_01_01_2022_112230.mp4
file202_01_01_2022_112230.mp4
file203_01_01_2022_112230.mp4
# _01_01_2022_112230.mp4 is the match

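The goal is to find all files that match from the end of the file name back to the first unique character, and then move them into a folder. The action part will be easy; I only need help with the matching.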

# pseudocode: loop over each group of files that share the same ending
for file in $(find . -type f "all that match the same last 17 characters of the file name"); do
    do things
done

Here is my example directory:

total 28480
drwxr-xr-x  2 user  user    64B Feb 24 10:49 dir1
drwxr-xr-x  2 user  user    64B Feb 24 10:49 dir2
-rw-r--r--  2 user  user   6.8M Feb 24 08:59 file1_02_01_2021_002244.mp4
-rw-r--r--  2 user  user   468K Feb 24 09:06 file1_03_01_2021_092200.mp4
-rw-r--r--  2 user  user   4.5M Feb 24 08:59 file2_02_01_2021_002244.mp4
-rw-r--r--  2 user  user   665K Feb 24 09:06 file2_03_01_2021_092200.mp4
-rw-r--r--  1 user  user     0B Feb 24 10:49 otherfile1
-rw-r--r--  1 user  user     0B Feb 24 10:49 otherfile2

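I got it working with the suggestions from the answer marked as correct. Their Python approach would probably work better (especially for file names that contain spaces), but I'm not proficient enough in Python to make it do everything I want. The full script is below: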

#!/usr/local/bin/bash
# this is my solution
# create array with the unique patterns (everything from the first underscore on)
aPATTERN=($(find . -type f -name "*.mp4" | sed 's/^[^_]*//' | sort -u))

# iterate through all patterns, do things
for each in "${aPATTERN[@]}"; do
        # create a temp working directory for files that match the pattern
        vDIR=$(gmktemp -d -p "$(pwd)")
        # create array of all files found matching the pattern (still splits on whitespace)
        aFIND=($(find . -mindepth 1 -maxdepth 1 -type f -iname "*$each"))
        # move all files that match the pattern to the working temp directory
        for file in "${aFIND[@]}"; do
                mv -iv "$file" "$vDIR"
        done
        # reset the found files array, get ready for next pattern
        aFIND=()
done

uj5u.com user reply:

In Python:

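import os

os.chdir("folder_path")

# pair each file name with its last 22 characters (the shared suffix, e.g. "_02_01_2021_002244.mp4")
data = [[file[-22:], file] for file in os.listdir()]

# group the file names by suffix
output = {}
for pattern, filename in data:
    output.setdefault(pattern, []).append(filename)
print(output)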

This creates a dict that maps each pattern to the list of files that end with it.

Output:

{
    '_03_01_2021_092200.mp4': ['file1_03_01_2021_092200.mp4', 'file3_03_01_2021_092200.mp4', 'file2_03_01_2021_092200.mp4'], 
    '_01_01_2022_112230.mp4': ['file202_01_01_2022_112230.mp4', 'file201_01_01_2022_112230.mp4', 'file203_01_01_2022_112230.mp4'], 
    '_02_01_2021_002244.mp4': ['file1_02_01_2021_002244.mp4', 'file2_02_01_2021_002244.mp4', 'file3_02_01_2021_002244.mp4']
}
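Building on that dict, the move step could be sketched as follows. This is only an illustration: the function name `move_groups`, the `suffix_len` and `ext` parameters, and the idea of naming each destination folder after the pattern (minus the leading underscore and the extension) are assumptions, not part of the answer above.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def move_groups(folder, suffix_len=22, ext=".mp4"):
    """Group files in `folder` by their last `suffix_len` characters and
    move each group into its own subfolder. Returns {pattern: [names]}."""
    folder = Path(folder)
    groups = defaultdict(list)
    # sorted() snapshots the listing before any subfolder is created.
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name.endswith(ext) and len(path.name) > suffix_len:
            groups[path.name[-suffix_len:]].append(path.name)

    for pattern, names in groups.items():
        # Hypothetical naming scheme: "_02_01_2021_002244.mp4" -> "02_01_2021_002244"
        target = folder / pattern[: -len(ext)].lstrip("_")
        target.mkdir(exist_ok=True)
        for name in names:
            shutil.move(str(folder / name), str(target / name))
    return dict(groups)
```

Because the names never pass through a shell, file names containing spaces need no special handling here. Files that do not end in `.mp4` (such as `otherfile1`) and directories are left where they are.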

uj5u.com user reply:

Try playing with this.

First, get all the patterns, sorted and de-duplicated:

find ./data -type f -name "*.mp4" | sed 's/^[^_]*//'|sort -u

Or use a regular expression:

find ./data -type f -regextype sed -regex '.*_[0-9]\{2\}_[0-9]\{2\}_[0-9]\{4\}_[0-9]\{6\}\.mp4$'| sed 's/^[^_]*//'|sort -u

Then iterate over the patterns with a while loop to find the files for each pattern:

while IFS= read -r pattern
do
   # find and exec
   find ./data -type f -name "*$pattern" -exec mv {} /to/whatever/you/want/ \;
   # or, as an alternative, find and xargs
   # find ./data -type f -name "*$pattern" | xargs -I {} mv {} /to/whatever/you/want/
done < <(find ./data -type f -name "*.mp4" | sed 's/^[^_]*//'|sort -u)
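For comparison, here is a rough Python equivalent of the `sed 's/^[^_]*//' | sort -u` step. It is a sketch that works on bare file names rather than full paths; the helper names `suffix_from_first_underscore` and `unique_patterns` are made up for this example.

```python
def suffix_from_first_underscore(name):
    """Drop everything before the first underscore, like sed 's/^[^_]*//'."""
    head, sep, tail = name.partition("_")
    # With no underscore, sep and tail are empty, just as sed would
    # consume the whole name and leave an empty line.
    return sep + tail


def unique_patterns(names):
    """Sorted, de-duplicated suffixes, similar to piping through `sort -u`."""
    return sorted({suffix_from_first_underscore(n) for n in names})


names = [
    "file1_02_01_2021_002244.mp4",
    "file2_02_01_2021_002244.mp4",
    "file1_03_01_2021_092200.mp4",
    "file201_01_01_2022_112230.mp4",
]
print(unique_patterns(names))
# ['_01_01_2022_112230.mp4', '_02_01_2021_002244.mp4', '_03_01_2021_092200.mp4']
```

One thing to keep in mind with the shell pipeline: `sed` sees the whole path printed by `find`, so an underscore in a directory name would change where the cut happens. Working on the base name, as this sketch does, avoids that.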

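uj5u.com user reply:

There are several ways to solve this, including writing a bash script, but if it were me, I'd go for the quick and easy way. Use grep and read:

PATTERN=_02_01_2021_002244.mp4
find . -name '*.mp4' | grep -F -- "$PATTERN" | while IFS= read -r A; do echo "$A"; done

There may be a better way that I haven't thought of, but this gets the job done.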

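uj5u.com user reply:

Try this:

#!/bin/bash

# +(...) is an extended glob: one or more repetitions of the pattern inside
shopt -s extglob

while IFS= read -r line
do
    if [[ "$line" == *_+([0-9])_+([0-9])_+([0-9])_+([0-9])\.mp4 ]]
    then
        echo "MATCH: $line"
    else
        echo "no match: $line"
    fi
done < <(/bin/ls -c1)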

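Remember that the pattern here is built with globbing (wildcards), not regular expressions.

That's why I don't use [0-9]{2} to match two digits: in globbing, {} does not act as a repetition count the way it does in regular expressions.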
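The same glob-versus-regex difference can be seen in Python, shown here as a small sketch. Note that Python's `fnmatch` is a plain glob matcher and does not support bash's extended `+(...)` syntax, so every digit is spelled out; the sample file name is taken from the question.

```python
import fnmatch
import re

name = "file1_02_01_2021_002244.mp4"

# Glob: each digit needs its own [0-9]; there is no repetition count.
glob_pattern = "*_[0-9][0-9]_[0-9][0-9]_[0-9][0-9][0-9][0-9]_" + "[0-9]" * 6 + ".mp4"
print(fnmatch.fnmatchcase(name, glob_pattern))  # True

# In a glob, {2} is just the literal characters "{2}", so this does not match.
print(fnmatch.fnmatchcase(name, "*_[0-9]{2}_*.mp4"))  # False

# Regex: {2} means "exactly two of the preceding item".
regex = re.compile(r"_[0-9]{2}_[0-9]{2}_[0-9]{4}_[0-9]{6}\.mp4$")
print(bool(regex.search(name)))  # True
```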

To use a regular expression, use:

#!/bin/bash

while IFS= read -r line
do
    # no leading '*' here: in a regex, '*' is a quantifier, not a wildcard
    if [[ $(echo "$line" | grep -cE '_[0-9]{2}_[0-9]{2}_[0-9]{4}_[0-9]{6}\.mp4$') -ne 0 ]]
    then
        echo "MATCH: $line"
    else
        echo "no match: $line"
    fi
done < <(/bin/ls -c1)

This is a more precise match, because you can specify how many digits each sub-pattern accepts.
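If the same fixed-width regular expression is wrapped in a capture group, it can both filter the listing and supply the grouping key. The following Python sketch assumes the date and time layout from the question (2, 2, 4 and 6 digits); `SUFFIX_RE` and `group_by_suffix` are names invented for this example.

```python
import re
from collections import defaultdict

# Capture the whole shared suffix so it can be used as the dict key.
SUFFIX_RE = re.compile(r"(_[0-9]{2}_[0-9]{2}_[0-9]{4}_[0-9]{6}\.mp4)$")


def group_by_suffix(names):
    """Return {suffix: [names]} for names ending in the expected suffix;
    anything else (other files, directories) is skipped."""
    groups = defaultdict(list)
    for name in names:
        match = SUFFIX_RE.search(name)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)


listing = [
    "dir1",
    "otherfile1",
    "file1_02_01_2021_002244.mp4",
    "file2_02_01_2021_002244.mp4",
    "file1_03_01_2021_092200.mp4",
]
for suffix, files in group_by_suffix(listing).items():
    print(suffix, files)
```

Unlike slicing off a fixed number of characters, this only groups names that really have the expected shape, so entries such as `otherfile1` or `dir1` never end up under an accidental key.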

Please credit the source when reposting. Link to this article: https://www.uj5u.com/caozuo/432227.html

Tags: bash, shell
