How do I compare each line of the first text file with each line of the second text file in Python?

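I have two text files, f1 and f2, each containing 100k lines of names. I want to compare the first line of f1 with every line of f2, then the second line of f1 with every line of f2, and so on. I tried a nested for loop, as in the code below, but it does not work.

What am I doing wrong? I can't seem to find it. Could someone please tell me?

Thanks in advance.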

old.txt

sourcreameggnest
saturnnixgreentea
saxophonedesertham
footballplumvirgo
soybeansthesting
cauliflowertornado
sourcreameggnest
saturnnixgreentea

new.txt

goldfishpebbleduck
saxophonedesertham
footballplumvirgo
abloomtheavengers
venisonflowersea
goodfellaswalker
saturnnixgreentea

Code:

with open('old.txt', 'r') as f1, open('new.txt', 'r') as f2:

    for line1 in f1:
        print('Line 1:- ' + line1, end='')

        for line2 in f2:
            print('Line 2:- ' + line2, end='')

            if line1.strip() == line2:
                print("Inside comparison" + line1, end='')

Output:

Line 1:- goldfishpebbleduck
Line 2:- sourcreameggnest
Line 2:- saturnnixgreentea
Line 2:- saxophonedesertham
Line 2:- footballplumvirgo
Line 2:- soybeansthesting
Line 2:- cauliflowertornado
Line 2:- sourcreameggnest
Line 2:- saturnnixgreentea
Line 1:- saxophonedesertham
Line 1:- footballplumvirgo
Line 1:- abloomtheavengers
Line 1:- venisonflowersea
Line 1:- goodfellaswalker
Line 1:- saturnnixgreentea

Answer:

Combining the answers from @LukasNeugebauer and @Thierry Lathuille, your code should look like this:

with open('old.txt', 'r') as f1, open('new.txt', 'r') as f2:
    lines1 = f1.readlines()
    lines2 = f2.readlines()
    for line1 in lines1:
        print('Line 1:- ' + line1, end='')
        if line1 in lines2:
            print("Inside comparison" + line1, end='')

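In case you are wondering whether an in check is faster than looping over the second list and comparing each value with ==, I tested it. For two files of 10,000 lines of random strings each, processing them completely with two loops took about 2.8 seconds, while using the in operator took only about 0.8 seconds.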
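A comparison like the one described above could be reproduced roughly as follows. This is only a sketch, not the script the answer used: the helper names (random_lines, count_matches_nested, count_matches_in) are made up for illustration, the data is generated in memory rather than read from files, and the timings will differ from machine to machine.

```python
import random
import string
import timeit


def random_lines(count, length=12, seed=0):
    """Build `count` random lowercase lines, each ending in a newline (hypothetical helper)."""
    rng = random.Random(seed)
    return ["".join(rng.choices(string.ascii_lowercase, k=length)) + "\n"
            for _ in range(count)]


def count_matches_nested(lines1, lines2):
    """Compare every line of lines1 with every line of lines2 using ==."""
    matches = 0
    for line1 in lines1:
        for line2 in lines2:
            if line1 == line2:
                matches += 1
    return matches


def count_matches_in(lines1, lines2):
    """Count the lines of lines1 that occur at least once in the list lines2."""
    return sum(1 for line1 in lines1 if line1 in lines2)


if __name__ == "__main__":
    # 1,000 lines keeps the run short; raise the count to see the gap widen.
    lines1 = random_lines(1000, seed=1)
    lines2 = random_lines(1000, seed=2)
    nested = timeit.timeit(lambda: count_matches_nested(lines1, lines2), number=1)
    with_in = timeit.timeit(lambda: count_matches_in(lines1, lines2), number=1)
    print(f"nested loops: {nested:.3f} s, in operator: {with_in:.3f} s")
```

Both variants are still quadratic in the worst case. The in operator on a list is faster mainly because the scan runs inside the interpreter's C code and stops at the first match. Note that the two functions count slightly different things when the second list contains duplicates.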

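If your files are no larger than about 1 megabyte, I really would not bother optimizing this; otherwise, you should think carefully about what you are actually comparing and which shortcuts you can take.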

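Edit: Some comments suggested turning the list of lines from the second file into a set (changing the third line to lines2 = set(f2.readlines())). That makes the code much faster (the same example I used above now runs in only 4 ms, 200 times faster), but it may not actually solve the problem, because converting a list to a set removes all duplicates. Use it only if you are sure the duplicates can be discarded.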
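To make the trade-off concrete, here is a small sketch of the set variant on in-memory data. The function name lines_also_in is hypothetical, and the sample lines are borrowed from the question without their newlines. Duplicates in the first list are still visited one by one; what is lost is how often a line occurs in the second list.

```python
def lines_also_in(lines1, lines2):
    """Return the lines of lines1 that occur in lines2, using a set for fast lookup."""
    lookup = set(lines2)  # duplicates in lines2 collapse into a single entry here
    return [line for line in lines1 if line in lookup]


old_lines = ["sourcreameggnest", "saturnnixgreentea", "saturnnixgreentea"]
new_lines = ["saturnnixgreentea", "saturnnixgreentea", "goldfishpebbleduck"]

print(lines_also_in(old_lines, new_lines))
# ['saturnnixgreentea', 'saturnnixgreentea']
print(len(set(new_lines)))
# 2 (the repeated line in new_lines is stored only once)
```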

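Answer:

After the first pass of the outer loop, you have already read to the end of the second file, so the inner loop has nothing left to iterate over. By the way, I did not know you could loop over an open file. Just store the lines first. Also, I do not understand why you strip the '\n' from only one of the lines.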
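The exhaustion is easy to see without touching the disk. The sketch below uses io.StringIO as a stand-in for an open text file, which is an assumption made purely for illustration; a real file object opened in text mode behaves the same way when iterated. Rewinding with seek(0) is shown only as an alternative and is not part of the answer above.

```python
import io

# io.StringIO stands in for an open text file here.
f2 = io.StringIO("saxophonedesertham\nfootballplumvirgo\n")

first_pass = [line for line in f2]   # reads up to the end of the "file"
second_pass = [line for line in f2]  # nothing left: the position is already at the end

print(first_pass)   # ['saxophonedesertham\n', 'footballplumvirgo\n']
print(second_pass)  # []

# Rewinding makes the lines available again, although reading them into
# a list once is usually simpler than seeking inside a loop.
f2.seek(0)
third_pass = list(f2)
print(third_pass == first_pass)  # True
```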

with open('old.txt', 'r') as f1, open('new.txt', 'r') as f2:
    lines1 = f1.readlines()
    lines2 = f2.readlines()
    for line1 in lines1:
        print('Line 1:- ' + line1, end='')

        for line2 in lines2:
            print('Line 2:- ' + line2, end='')

            if line1 == line2:
                print("Inside comparison" + line1, end='')

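Answer:

Given the number of lines in the files, I would avoid nested loops altogether, since they are O(n^2). Instead, I would first load the lines of the second file into a dictionary (or else a set).

I would then iterate over the lines of the first file, check whether each one is in the dictionary, and act accordingly. This uses some extra space, linear in the number of lines in the second file, but reduces the time complexity to O(n), because dictionary lookups take constant time.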

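As for why your current solution is incorrect: as @Thierry Lathuille pointed out, the second iterator is exhausted after the first run of the outer loop, so the remaining iterations check nothing. The remedy is to read the lines of each file into a list that you can loop over repeatedly (lines1 = f1.readlines(); lines2 = f2.readlines()). Also, the use of strip is incorrect if you meant it to skip blank lines. They would still be compared, as empty strings, with the added drawback that stripping one line but not the other produces unwanted differences.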
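The effect of stripping only one side can be checked directly. The helper normalized below is a hypothetical name. It sketches one way to treat both files identically, assuming blank lines should be skipped and only the trailing newline removed.

```python
line1 = "saturnnixgreentea\n"
line2 = "saturnnixgreentea\n"

# Stripping one side only: the newline left on line2 makes equal names look different.
print(line1.strip() == line2)                    # False
# Treating both sides the same way gives the expected result.
print(line1.rstrip("\n") == line2.rstrip("\n"))  # True


def normalized(lines):
    """Drop blank lines and remove the trailing newline from the rest."""
    return [line.rstrip("\n") for line in lines if line.strip()]


print(normalized(["sourcreameggnest\n", "\n", "saturnnixgreentea"]))
# ['sourcreameggnest', 'saturnnixgreentea']
```

Normalizing this way also handles a last line that has no trailing newline, which would otherwise never compare equal to the same name followed by '\n'.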

In any case, with numbers this large, an approach with quadratic time complexity is not feasible.

Answer:

with open('old.txt', 'r') as f1, open('new.txt', 'r') as f2:
    lines2 = f2.readlines()
    l2 = dict()

    # fill the dictionary for file 2: each name maps to the line numbers where it occurs
    for i in range(len(lines2)):
        l2.setdefault(lines2[i], []).append(i)
    for line in f1.readlines():
        if line in l2:
            for j in l2[line]:
                print(line, j, ...)

Each of the two loops should have O(n) complexity, and a lookup via in on a dictionary should be O(1).
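The same idea can be written against in-memory lists, which makes it easy to try out. This is a sketch under a few assumptions: the names index_lines and find_matches are invented for illustration, collections.defaultdict replaces the plain dictionary, and trailing newlines are removed from both inputs before comparing.

```python
from collections import defaultdict


def index_lines(lines):
    """Map each line (without its trailing newline) to the positions where it occurs."""
    positions = defaultdict(list)
    for number, line in enumerate(lines):
        positions[line.rstrip("\n")].append(number)
    return positions


def find_matches(lines1, lines2):
    """For each line of lines1 found in lines2, return (line, positions in lines2)."""
    positions = index_lines(lines2)  # one pass over lines2
    matches = []
    for line in lines1:              # one pass over lines1
        name = line.rstrip("\n")
        if name in positions:        # dictionary lookup, constant time on average
            matches.append((name, positions[name]))
    return matches


old_lines = ["sourcreameggnest\n", "saturnnixgreentea\n", "saxophonedesertham\n"]
new_lines = ["goldfishpebbleduck\n", "saxophonedesertham\n", "saturnnixgreentea\n"]

for name, where in find_matches(old_lines, new_lines):
    print(name, where)
# saturnnixgreentea [2]
# saxophonedesertham [1]
```

Because each name keeps its list of positions, duplicates in the second file are preserved, unlike with a set.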

Tags: Python python-3.x python-2.7