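I want to write Python code that will be run with the following command:
python3 myProgram.py 4 A B C D stemfile
where 4 is the number of files and A, B, C, D are the 4 files. I then want to generate all non-empty combinations of A, B, C, D (A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, ABCD). Before that, however, the program reads stemfile.names. Only if stemfile.names contains the line | Final Pseudo Deletion Count is 0. does it generate the 15 combinations above. Otherwise it reports noisy data, leaves D out, and prints only the combinations of the remaining 3 files. The output is then: (A, B, C, AB, AC, BC, ABC)
So in my code I always pass D as the last file argument and run the loop one time fewer. However, D is not always the last argument. The command can look like this: python3 myProgram.py 4 B D C A stemfile
In that case my code leaves A out of the combinations, but whenever the line is not found in stemfile.names I want to remove only the D file from the equation. How can I do that?
Later in the code, when the combination is A, it stores A in a separate output file; when it is AB, it stores the union of files A and B in a separate file, and so on for every combination. With noisy data, the D file must not appear in any output file.
As another example, if I run: python3 myProgram.py 3 A D B stemfile
and stemfile.names does not contain the line | Final Pseudo Deletion Count is 0., then the output combinations are: A, B, AB, and it will create only 2 output files.
My code is attached below: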
import sys
import itertools
from itertools import combinations


def union(files):
    lines = set()
    for file in files:
        with open(file) as fin:
            lines.update(fin.readlines())
    return lines


def main():
    number = int(sys.argv[1])
    dataset = sys.argv[number + 2]
    with open(dataset + '.names') as myfile:
        if '| Final Pseudo Deletion Count is 0.' in myfile.read():
            a_list = sys.argv[2:number + 2]
            print("All possible combinations:\n")
            for L in range(1, len(a_list) + 1):
                for subset in itertools.combinations(a_list, L):
                    print(*list(subset), sep=',')
            print("...............................")
            matrix = [itertools.combinations(a_list, r)
                      for r in range(1, len(a_list) + 1)]
            combinations = [c for combinations in matrix for c in combinations]
            for combination in combinations:
                filenames = [f'{name}' for name in combination]
                output = f'{"".join(combination)}_output'
                print(f'Writing union of {filenames} to {output}')
                with open(output, 'w') as fout:
                    fout.writelines(union(filenames))
        else:
            a_list = sys.argv[2:number + 1]
            # Here I am reducing a number only
            print("Noisy data.\n")
            print("So all possible combinations:\n")
            for L in range(1, len(a_list) + 1):
                for subset in itertools.combinations(a_list, L):
                    print(*list(subset), sep=',')
            print("................................")
            matrix = [itertools.combinations(a_list, r)
                      for r in range(1, len(a_list) + 1)]
            combinations = [c for combinations in matrix for c in combinations]
            for combination in combinations:
                filenames = [f'{name}' for name in combination]
                output = f'{"".join(combination)}_output'
                print(f'Writing union of {filenames} to {output}')
                with open(output, 'w') as fout:
                    fout.writelines(union(filenames))


if __name__ == '__main__':
    main()
Please help me.
Answer:
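I think you should break this down into smaller, more specific questions. There is a lot of detail here that does not bear on the specific problem you are facing. Still, I have taken a stab at what I think you are asking.
I think you are trying to figure out how to remove an item from the command-line arguments. If so, you cannot change what is passed to the program, but you can modify the input list after parsing. As I said in the comments, I really think you should read up on the argparse library. I am not sure this is exactly what you are looking for, but here is some code that uses argparse. It requires the full file name of each input file, and the last argument must be the stem file.
After parsing the arguments you get a list of pathlib.Path objects. You can simply remove the D file from that list.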
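To show just that filtering step in isolation, here is a minimal sketch. The helper name drop_by_stem and the sample file names are made up for illustration; the only library feature it relies on is the stem attribute of pathlib.Path, which is the file name without its final suffix.

```python
from pathlib import Path


def drop_by_stem(paths, stem):
    """Return the paths whose stem differs from `stem`, keeping their order."""
    return [p for p in paths if p.stem != stem]


# D is removed wherever it sits in the argument order, with or without a suffix.
files = [Path('B'), Path('D.txt'), Path('C'), Path('A')]
kept = drop_by_stem(files, 'D')
print([p.name for p in kept])  # ['B', 'C', 'A']
```

Because the filter compares names instead of positions, it does not matter whether D is the first, middle, or last file on the command line.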
import argparse
import itertools
import pathlib

# When this line is present in the stem file the data is clean and every
# input file is used; when it is missing the data is noisy and D is dropped.
CLEAN_DATA_LINE = '| Final Pseudo Deletion Count is 0.'


def get_parser():
    parser = argparse.ArgumentParser()
    # One or more input files followed by the stem file, for example:
    #   python3 myProgram.py A B C D stemfile.names
    parser.add_argument('filenames', type=pathlib.Path, nargs='+')
    parser.add_argument('stemfile', type=pathlib.Path)
    return parser


def union(files):
    # Collect the distinct lines found across all of the given files.
    lines = set()
    for file in files:
        with open(file) as fin:
            lines.update(fin.readlines())
    return lines


def main():
    parser = get_parser()
    args = parser.parse_args()
    stemfile_lines = args.stemfile.read_text().splitlines()
    if CLEAN_DATA_LINE in stemfile_lines:
        filenames = args.filenames
    else:
        # Noisy data: drop D wherever it appears in the argument list.
        filenames = [p for p in args.filenames if p.stem != 'D']
    matrix = [itertools.combinations(filenames, r) for r in range(1, len(filenames) + 1)]
    combinations = [c for combinations in matrix for c in combinations]
    print(' '.join([str([p.stem for p in c]) for c in combinations]))
    for combination in combinations:
        output = f'{"".join([p.stem for p in combination])}_output.txt'
        print(f'Writing union of {[p.stem for p in combination]} to {output}')
        with open(output, 'w') as fout:
            # Write the union of this combination only, not of every input file.
            fout.writelines(union(combination))


if __name__ == '__main__':
    main()
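If you would rather keep the original command line (count, files, stem) instead of switching to argparse, the same idea can be sketched with a plain argument list. This is only a sketch under assumptions: the helper names select_files and nonempty_combinations and the constants CLEAN_MARKER and NOISY_FILE are made up for illustration, and it assumes the file to drop is literally named D. The contents of stemfile.names are passed in as a string so the selection logic can be tried without any files on disk.

```python
import itertools

CLEAN_MARKER = '| Final Pseudo Deletion Count is 0.'
NOISY_FILE = 'D'  # assumption: the file to exclude is literally named D


def select_files(argv, names_text):
    """Pick the input files from argv = [prog, count, file..., stem].

    All files are kept when the marker is present in names_text;
    otherwise NOISY_FILE is removed, whatever its position.
    """
    number = int(argv[1])
    files = argv[2:number + 2]
    if CLEAN_MARKER in names_text:
        return files
    return [f for f in files if f != NOISY_FILE]


def nonempty_combinations(items):
    """Return all combinations of size 1..len(items), smallest first."""
    return [c for r in range(1, len(items) + 1)
            for c in itertools.combinations(items, r)]


# Hypothetical invocation with D in the middle and no marker in the .names text.
argv = ['myProgram.py', '4', 'B', 'D', 'C', 'A', 'stemfile']
noisy = select_files(argv, 'some other content')
print([''.join(c) for c in nonempty_combinations(noisy)])
# ['B', 'C', 'A', 'BC', 'BA', 'CA', 'BCA']
```

In a real run, argv would be sys.argv and names_text would be the contents of the .names file (for example, read from the stem argument plus '.names'), and each combination would then be passed to union() as before.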
