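I am trying to compare specific values between two CSV files. I read both files with csv.DictReader() and have a nested for loop, with each loop iterating over one of the readers. Normally, the inner for loop would restart and run through all of its rows on every iteration of the outer loop, but that is not what happens for me. Stepping through with my debugger, I can see that on the second iteration of the outer loop the code skips the inner loop entirely, as if there were nothing to iterate over. Is this caused by some property of iterating over a DictReader object? If so, how can I work around it? My code snippet is attached below.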
import csv

with open('csv1.csv', 'r') as inFile1:
    with open('csv2.csv', 'r') as inFile2:
        reader1 = csv.DictReader(inFile1)
        reader2 = csv.DictReader(inFile2)
        for row1 in reader1:
            # reader2 only yields rows on the first pass of the outer loop
            for row2 in reader2:
                if row1['key1'] == row2['key2']:
                    pass  # [do other operations here]
Reply from a uj5u.com user:
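Once you have exhausted an iterator, it does not reset automatically.
Instead, you must provide a fresh inner iterator for each iteration of the outer loop.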
import csv

with open('csv1.csv', 'r') as inFile1:
    reader1 = csv.DictReader(inFile1)
    for row1 in reader1:
        # Reopen the second file so the inner reader starts from the top
        with open('csv2.csv', 'r') as inFile2:
            reader2 = csv.DictReader(inFile2)
            for row2 in reader2:
                if row1['key1'] == row2['key2']:
                    pass  # [do other operations here]
Alternatively, if the files are of a reasonable size, simply read them into memory before processing:
import csv

with open('csv1.csv', 'r') as inFile1, open('csv2.csv', 'r') as inFile2:
    # list() materializes every row, so the data can be iterated repeatedly
    csv1 = list(csv.DictReader(inFile1))
    csv2 = list(csv.DictReader(inFile2))

for dict1 in csv1:
    for dict2 in csv2:
        if dict1['key1'] == dict2['key2']:
            pass  # [do other operations here]
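The exhaustion behaviour described at the start of this answer can be reproduced without any files. Below is a minimal sketch that uses io.StringIO as a stand-in for an open file; the sample data and the names `data`, `reader`, `first_pass` and `second_pass` are made up for illustration.

```python
import csv
import io

# Hypothetical in-memory CSV standing in for a real file
data = io.StringIO("key2,value\na,1\nb,2\n")
reader = csv.DictReader(data)

# The first pass consumes every row of the reader
first_pass = [row["key2"] for row in reader]
# The second pass finds nothing left, because the reader is exhausted
second_pass = [row["key2"] for row in reader]

print(first_pass)
print(second_pass)
```

The second list comes back empty, which is the same thing the debugger shows when the inner loop is skipped.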
Reply from a uj5u.com user:
@djones's answer works, but it is inefficient because it takes O(n x m) time, where n and m are the numbers of rows in the two files.
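If you instead build a dict from the first file, keyed by the value of key1, and then iterate over the rows of the second file looking up each key2 value in that dict, the problem can be solved in linear time. Because a dict lookup takes O(1) time on average, the overall time complexity becomes O(n + m):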
import csv

with open('csv1.csv', 'r') as inFile1:
    # Index the first file by key1 for constant-time lookups
    rows1 = {row1['key1']: row1 for row1 in csv.DictReader(inFile1)}

with open('csv2.csv', 'r') as inFile2:
    for row2 in csv.DictReader(inFile2):
        key = row2['key2']
        if key in rows1:
            print(rows1[key], row2)
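One thing to keep in mind with the dict comprehension above: if a key1 value occurs more than once in the first file, later rows replace earlier ones, so only the last row per key survives. A minimal sketch with in-memory data (the contents of `csv1_text` are hypothetical):

```python
import csv
import io

# Hypothetical contents for the first file; key1 'a' appears twice
csv1_text = "key1,value\na,1\nb,2\na,3\n"

# Same indexing pattern as above, applied to the in-memory text
rows1 = {row["key1"]: row for row in csv.DictReader(io.StringIO(csv1_text))}

print(len(rows1))            # number of distinct keys, not number of rows
print(rows1["a"]["value"])   # value from the last row with key1 == 'a'
```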
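If the keys of the tables in the two CSV files are in a many-to-many relationship, you can read the first file into a dict of lists instead. You can still look up keys from the second file in constant time, and the whole process finishes in O(n + m + k) time, where n and m are the numbers of records in the two files and k is the number of matches: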
import csv

rows1 = {}
with open('csv1.csv', 'r') as inFile1:
    for row1 in csv.DictReader(inFile1):
        # Group all rows that share the same key1 value
        rows1.setdefault(row1['key1'], []).append(row1)

with open('csv2.csv', 'r') as inFile2:
    for row2 in csv.DictReader(inFile2):
        # The empty-tuple default means unmatched keys produce no iterations
        for row1 in rows1.get(row2['key2'], ()):
            print(row1, row2)
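To try the grouping approach without creating files, here is a self-contained sketch that wraps it in a function and feeds it in-memory data. The helper name `join_rows`, its parameters, and the sample CSV text are all assumptions for illustration and are not part of the original answer. It also swaps in collections.defaultdict for setdefault, which should behave the same way here.

```python
import csv
import io
from collections import defaultdict


def join_rows(file1, file2, key1="key1", key2="key2"):
    """Return (row1, row2) pairs whose key columns match (hypothetical helper)."""
    # Group the rows of the first file by their key1 value
    groups = defaultdict(list)
    for row1 in csv.DictReader(file1):
        groups[row1[key1]].append(row1)

    pairs = []
    for row2 in csv.DictReader(file2):
        # .get avoids creating empty lists for keys that have no match
        for row1 in groups.get(row2[key2], ()):
            pairs.append((row1, row2))
    return pairs


# Hypothetical sample data: key 'a' appears twice in the first file
file1 = io.StringIO("key1,left\na,1\na,2\nb,3\n")
file2 = io.StringIO("key2,right\na,x\nc,y\nb,z\n")

pairs = join_rows(file1, file2)
for row1, row2 in pairs:
    print(row1["left"], row2["right"])
```

Each file is read exactly once, so neither reader needs to be reset or reopened.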
