Python - Multiprocessing multiple folders

As a newcomer to programming, I am still exploring the concepts of multiprocessing and multithreading.

I wrote a small script that reads a file, copies the files it lists into several temporary folders, and does the following for each folder:

Create a tag.
Build a package.
Push it to Nexus.

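There are ~500 folders, and they are processed sequentially. How can I use multiprocessing here so that 100 folders are processed in parallel at a time, or even more? Also, is it possible to keep track of these processes so that the build fails if even one child process fails?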

I have read several articles on multiprocessing, but I can't wrap my head around it :(

Any guidance would help me a lot, thanks.

folder1
   --war file
   --metadata

folder2
   --war file
   --metadata
....
....

folder500
   --war file
   --metadata

Code snippet

import re, shutil, os
from pathlib import Path

target = "/home/work"
file_path = target + "/file.txt"

file_map = {}
count = 1

def commands_to_run_on_each_folder(filetype, tmp_folder):
    target_folder = tmp_folder + '/tmp' + str(count)

    # os.system(<first command, to create the tag>)
    # os.system(<second command, to build the package>)
    # <several file operations, where `filetype` is used to pick the files with the right extension>
    # <curl command to upload it to Nexus>

# Read the text file and assemble it in a dictionary.
with open(file_path, 'r') as f:
    lines = f.read().splitlines()
    for i, line in enumerate(lines):
        match = re.match(r".*\.war", line)
        if match:
            j = i-1 if i > 1 else 0
            for k in range(j, i):
                file_map[match.string] = lines[k]
# Iterate over the dictionary and copy the files to the temporary folders.
for key, value in file_map.items():
    os.mkdir(target + '/tmp' + str(count))
    shutil.copy(key, target + '/tmp' + str(count))
    shutil.copy(value, target + '/tmp' + str(count))
    commands_to_run_on_each_folder("war", target)
    count += 1
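In the question's parsing loop, the inner for k in range(j, i) runs at most once. The dictionary therefore just maps each line containing ".war" to the line immediately before it, and a match on the very first line is skipped. Below is a minimal sketch of a shorter loop that behaves the same way, with the dot in ".war" escaped. The question does not show file.txt, so the sample contents are hypothetical, and so is the helper name pair_war_with_previous_line.

```python
import re


def pair_war_with_previous_line(lines):
    """Map each line that contains ".war" to the line just before it.

    Behaves like the parsing loop in the question: a match on the first
    line (index 0) has no previous line and is skipped.
    """
    pairs = {}
    for i, line in enumerate(lines):
        if i > 0 and re.match(r".*\.war", line):
            pairs[line] = lines[i - 1]
    return pairs


# Hypothetical file.txt contents: a metadata path followed by its .war path.
sample = [
    "folder1/metadata.json",
    "folder1/app.war",
    "folder2/metadata.json",
    "folder2/app.war",
]
print(pair_war_with_previous_line(sample))
```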

OS: Ubuntu 18.04, Memory: 22 GB container

Answer:

This is not a good target for multiprocessing, but it is a good target for GNU parallel.

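The actual build happens in external programs: Python is only calling system commands. You can certainly make several os.system calls in parallel from Python, but this script is better run using a find | parallel pattern.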
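os.system waits for each command to finish. To run several commands at the same time from Python, you need to start them without waiting, for example with subprocess.Popen, and collect their exit codes afterwards. Below is a minimal sketch of that idea. The three commands are placeholders (tiny Python one-liners) standing in for the real per-folder build commands, which the question does not show.

```python
import subprocess
import sys

# Placeholder commands: each stands in for one folder's real
# tag/build/upload command, which the question does not spell out.
commands = [
    [sys.executable, "-c", f"print('building folder{n}')"]
    for n in range(1, 4)
]

# Popen returns as soon as the child process has started,
# so all commands run at the same time.
procs = [subprocess.Popen(cmd) for cmd in commands]

# wait() blocks until that process exits and returns its exit code.
exit_codes = [p.wait() for p in procs]

# Fail the whole run if any command exited with a non-zero status.
if any(code != 0 for code in exit_codes):
    sys.exit(f"{sum(code != 0 for code in exit_codes)} command(s) failed")
```

This starts every command at once with no upper limit. With ~500 folders you would want to cap how many processes run at the same time, and that is what parallel --jobs N (or a pool in Python) handles for you.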

What I would do is rewrite the script so that it only handles one folder. Then I would run:

find /path/to/root/folder -mindepth 1 -maxdepth 1 -type d | parallel --bar -I{} python3 script.py {}

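Since you are on Ubuntu, find is already available, and GNU parallel can be installed with: sudo apt install parallel. Note that this is a bash command run in the shell, not Python.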

Arguments against doing this in Python:

Don't reinvent the wheel.

Easy to customize: you can change the number of processes by adding --jobs N.

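Your code only calls other processes: you are using Python the way you would use a scripting language like bash (which is fine), so it makes more sense to treat it as a build script that runs once per folder.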

You get a progress bar and other goodies for free!

On the other hand, if you really do want to do this in Python, that is possible too.

Note that the current recommendation is to use subprocess rather than os.system.

Answer:

This is easy with concurrent.futures. I modified your script as follows:

#!/usr/bin/env python3
import concurrent.futures
import logging
import pathlib
import re
import shutil

logging.basicConfig(
    level=logging.DEBUG,
    format="%(levelname)s:%(processName)s:%(message)s",
)


def worker(path1, path2, src, target, logger):
    logger.debug("Create dir %s", target)
    target.mkdir(exist_ok=True)

    logger.debug("Copy files")
    shutil.copy(src / path1, target / path1)
    shutil.copy(src / path2, target / path2)

    logger.debug("Extra commands to run on %s", target)
    # TODO: Add actions here.
    # commands_to_run_on_each_folder(...)


def main():
    # Read the text file and assemble it in a dictionary.
    tasks = {}
    with open("file.txt", 'r') as f:
        lines = f.read().splitlines()
        for i, line in enumerate(lines):
            match = re.match(r".*\.war", line)
            if match:
                j = i-1 if i > 1 else 0
                for k in range(j, i):
                    tasks[match.string] = lines[k]

    logger = logging.getLogger()

    # src: the directory this script lives in.
    src = pathlib.Path(__file__).parent

    with concurrent.futures.ProcessPoolExecutor() as executor:
        for taskid, (path2, path1) in enumerate(tasks.items(), 1):
            target = pathlib.Path(f"/tmp/dir{taskid}")
            # Calls `worker` function with parameters path1, path2, ...
            # concurrently
            executor.submit(worker, path1, path2, src, target, logger)


if __name__ == "__main__":
    main()

Here is some sample output:

DEBUG:ForkProcess-1:Create dir /tmp/dir1
DEBUG:ForkProcess-1:Copy files
DEBUG:ForkProcess-1:Extra commands to run on /tmp/dir1
DEBUG:ForkProcess-1:Create dir /tmp/dir2
DEBUG:ForkProcess-1:Copy files
DEBUG:ForkProcess-1:Extra commands to run on /tmp/dir2
DEBUG:ForkProcess-1:Create dir /tmp/dir3
DEBUG:ForkProcess-1:Copy files
DEBUG:ForkProcess-1:Extra commands to run on /tmp/dir3
DEBUG:ForkProcess-1:Create dir /tmp/dir4
DEBUG:ForkProcess-1:Copy files
DEBUG:ForkProcess-1:Extra commands to run on /tmp/dir4

Notes

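I use logging instead of print because logging works better in a multiprocessing environment.
To turn logging off, change the level to logging.WARNING.
I use pathlib rather than os.path.
Note: the submit call does not wait. If the worker function takes a long time to run, submit still returns immediately.
With the with construct, the executor waits for all concurrent tasks to finish before exiting. That is what you want.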
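Because submit returns immediately, an exception raised inside worker is stored in the returned Future. It is only re-raised when you call that future's result() method. The loop above discards the futures, so a failing folder would go unnoticed. The question asks for the build to fail on any error. One way to do that is to keep the futures and call concurrent.futures.wait with FIRST_EXCEPTION. Below is a minimal sketch. run_fail_fast is a hypothetical helper name, and the executor class is a parameter so the same code works with ProcessPoolExecutor (as in this answer) or ThreadPoolExecutor.

```python
import concurrent.futures


def run_fail_fast(fn, items,
                  executor_cls=concurrent.futures.ProcessPoolExecutor,
                  max_workers=None):
    """Call fn(item) for every item in parallel and stop at the first error.

    Returns the results (in no particular order) if every call succeeds.
    If a call raises, tasks that have not started yet are cancelled and
    the exception is re-raised here, in the calling process.
    """
    with executor_cls(max_workers=max_workers) as executor:
        futures = [executor.submit(fn, item) for item in items]
        # Returns once any future raises, or once all have finished.
        done, not_done = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_EXCEPTION)
        for future in not_done:
            # cancel() only affects tasks that are still queued; tasks that
            # are already running finish before the executor shuts down.
            future.cancel()
        # result() re-raises the worker's exception, if there was one.
        return [future.result() for future in done]
```

With ProcessPoolExecutor, fn must be a module-level function so that it can be pickled, and the call should stay under the if __name__ == "__main__": guard, as in the answer. In the answer's main(), the per-task arguments could be passed as one tuple to a small module-level wrapper around worker. An exception that propagates out of main() makes python3 exit with a non-zero status, which fails the build.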
