I have a list of file names:
files = (
"myinstruction.txt",
"myinfo.txt",
"mydata.txt",
"myclients.txt",
"foo.txt",
)
and a set of directories that may contain those files (the paths may lead to nested directory structures):
search_paths = (
"C:/Users/Foo/Desktop/thisfolder/",
"F:/Documents/mylibrary/",
"F:/Folder/mylibrary/",
"E:/Otherfolder/foolibrary/",
)
What is the most efficient function we could write to find the full paths of these files?
Reply from a uj5u.com user:
Multithreading is a good fit here.
First, turn the tuple of file names into a set so that membership tests are faster.
Then it is as simple as this:
from concurrent.futures import ThreadPoolExecutor
import os
FILES = {
"myinstruction.txt",
"myinfo.txt",
"mydata.txt",
"myclients.txt",
"foo.txt"
}
SEARCH_PATHS = [
"C:/Users/Foo/Desktop/thisfolder/",
"F:/Documents/mylibrary/",
"F:/Folder/mylibrary/",
"E:/Otherfolder/foolibrary/"
]
def process_directory(directory):
    # Walk one search path and collect the full path of every wanted file.
    output = []
    for root, _, files in os.walk(directory):
        for file in files:
            if file in FILES:
                output.append(os.path.join(root, file))
    return output

result = []
with ThreadPoolExecutor() as executor:
    # Each search path is submitted as a separate task to the thread pool.
    for rv in executor.map(process_directory, SEARCH_PATHS):
        result.extend(rv)
print(result)
This way, each directory is checked in its own (concurrent) thread. Because os.walk() is I/O-bound, multithreading is a suitable choice.
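To make this approach reusable, it can be wrapped in a function that takes the names and search paths as arguments. The following is only a minimal sketch, not the answerer's code: the name `find_files` and its return shape (a dict mapping each file name to the list of full paths where it was found) are assumptions made for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def find_files(names, search_paths):
    """Return {file name: [full paths]} for every name found under search_paths.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    wanted = set(names)  # set membership tests are O(1) on average

    def walk_one(directory):
        # Walk a single search path; os.walk() yields nothing if it does not exist.
        hits = []
        for root, _, files in os.walk(directory):
            for name in files:
                if name in wanted:
                    hits.append((name, os.path.join(root, name)))
        return hits

    found = {}
    with ThreadPoolExecutor() as executor:
        # executor.map() returns results in the same order as search_paths.
        for hits in executor.map(walk_one, search_paths):
            for name, full_path in hits:
                found.setdefault(name, []).append(full_path)
    return found
```

With the names from the question, `find_files(FILES, SEARCH_PATHS)` would return such a dict. A name that is not found anywhere is simply absent from it, and a name that exists in several directories lists every match.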
Reply from a uj5u.com user:
As suggested in the comments, you can use os.walk().
Unless you use memoization, I am not sure there is a faster way than this.
Sample code:
import os
file_list = (
"myinstruction.txt",
"myinfo.txt",
"mydata.txt",
"myclients.txt",
"foo.txt",
)
search_paths = (
"C:/Users/Foo/Desktop/thisfolder/",
"F:/Documents/mylibrary/",
"F:/Folder/mylibrary/",
"E:/Otherfolder/foolibrary/",
)
# search for files inside search_paths using os.walk()
for path in search_paths:
    for root, dirs, files in os.walk(path):
        for file in files:
            if file in file_list:
                print(os.path.join(root, file))
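The memoization mentioned in this answer could look roughly like the sketch below: each search path is walked once, the result is cached as an index, and later lookups become dictionary lookups. This is an illustration under assumptions, not part of the original answer. `build_index` and `lookup` are hypothetical names, and the cached index goes stale if files are added or removed after the first walk.

```python
import os
from functools import lru_cache


@lru_cache(maxsize=None)
def build_index(directory):
    """Walk directory once and map each file name to a tuple of full paths.

    Hypothetical helper; lru_cache memoizes the result per directory string.
    """
    index = {}
    for root, _, files in os.walk(directory):
        for name in files:
            index.setdefault(name, []).append(os.path.join(root, name))
    return {name: tuple(paths) for name, paths in index.items()}


def lookup(names, search_paths):
    """Return the full paths of all wanted names, reusing cached indexes."""
    results = []
    for directory in search_paths:
        index = build_index(directory)  # walks the tree only on the first call
        for name in names:
            results.extend(index.get(name, ()))
    return results
```

The first call to `lookup` pays the full cost of walking each directory, so this only helps when the same search paths are queried repeatedly. Calling `build_index.cache_clear()` discards the cached indexes when the directories are known to have changed.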
