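Comparing two files with multiple columns

I need to compare the first column of File1 with the first column of File2. If they match, compare the second columns of the two files. If the second columns do not match, print that line from File1 and save the output to another file.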

File1.txt

80002288    b17
97380002001 b18
97380002220 b17
97380002233 b18
80002333    b17
16501111    b04
16505044    b04
16505042    b04
97316505030  b05
16505043    b04
16505048    b04

File2.txt

97366630003 a01
97380002288 b17
97380002001 b17
97380002220 b17
97380002233 b17
97380002333 b17
97316501111 b04
97316505044 b04
97316505042 b04
97316505030 b04
97316505043 b04

Expected output

97380002001 b17
97316505030 b04

Reply from a uj5u.com user:

Method 1: Without any external libraries

Use the following code to get the output using only Python

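# Open the output file once; every mismatching line is written to it.
with open('files3.txt', 'w') as files3:
    with open('files1.txt') as files1:
        for line_a in files1.readlines():
            words_a = line_a.split()
            # files2.txt is re-read for each line of files1.txt.
            with open('files2.txt') as files2:
                for line_b in files2.readlines():
                    words_b = line_b.split()
                    # Same first column but a different second column.
                    if words_a[0] == words_b[0] and words_a[1] != words_b[1]:
                        # Note: this keeps the line from files2.txt.
                        diff_words = ' '.join(words_b)
                        files3.write(diff_words + '\n')
                        print(diff_words)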

Output of the above code

97380002001 b17
97380002233 b17
97316505030 b04

Method 2: Using the pandas library

You can also do this with Python's pandas library. First install pandas:

pip install pandas

Then run the Python code below to create the required file

import pandas as pd

# you can replace files1.txt and files2.txt with the complete path if the files aren't in the same folder
# sep=r'\s+' splits on any run of whitespace
df1 = pd.read_csv("files1.txt", sep=r'\s+', names=['c1', 'c2'])
df2 = pd.read_csv("files2.txt", sep=r'\s+', names=['c1', 'c2'])

df3 = pd.merge(df1, df2, on='c1')
df3 = df3[(df3["c2_x"] != (df3["c2_y"]))]

# use below if you want to save values from file 2
print(df3[['c1', 'c2_y']].to_string(index=False, header=False))
df3[['c1', 'c2_y']].to_csv("files3.txt", sep=' ', index=False, header=False)

# use below if you want to save values from file 1
# print(df3[['c1', 'c2_x']].to_string(index=False, header=False))
# df3[['c1', 'c2_x']].to_csv("files3.txt", sep=' ', index=False, header=False)

# use below code to save values from both files
# print(df3.to_string(index=False, header=False))
# df3.to_csv("files3.txt", sep=' ', index=False, header=False)

Output of the above code

97380002001 b17
97380002233 b17
97316505030 b04
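
The c2_x and c2_y columns used above do not exist in the input files. When both frames share a non-key column name, pd.merge appends the default suffixes _x (left frame) and _y (right frame). Below is a minimal sketch of the same merge with explicit suffixes. It assumes small in-memory samples in place of files1.txt and files2.txt; the _file1/_file2 suffix names and dtype=str are illustrative choices, not part of the original answer.

```python
import io
import pandas as pd

# Hypothetical in-memory excerpts standing in for files1.txt and files2.txt.
file1_text = "97380002001 b18\n97380002220 b17\n97316505030  b05\n"
file2_text = "97380002001 b17\n97380002220 b17\n97316505030 b04\n"

# dtype=str (an assumption here) keeps the keys as text instead of integers.
df1 = pd.read_csv(io.StringIO(file1_text), sep=r"\s+", names=["c1", "c2"], dtype=str)
df2 = pd.read_csv(io.StringIO(file2_text), sep=r"\s+", names=["c1", "c2"], dtype=str)

# Explicit suffixes replace the default "_x" / "_y" column names.
merged = pd.merge(df1, df2, on="c1", suffixes=("_file1", "_file2"))

# Keep rows whose first column matched but whose second column differs.
mismatched = merged[merged["c2_file1"] != merged["c2_file2"]]

# Rows taken from file 1, as the question asks.
rows = mismatched[["c1", "c2_file1"]].values.tolist()
print(rows)
```

Selecting c2_file2 instead of c2_file1 would give the values from file 2.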

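Reply from a uj5u.com user:

Either of these might be what you want, but the expected output you posted does not match any interpretation of your requirements. Using any awk in any shell on every Unix box:

To print the lines from file1: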

$ awk 'NR==FNR{a[$1]=$2; next} ($1 in a) && (a[$1] != $2)' file2 file1
97380002001 b18
97380002233 b18
97316505030  b05

To print the lines from file2, just swap the input file names:

$ awk 'NR==FNR{a[$1]=$2; next} ($1 in a) && (a[$1] != $2)' file1 file2
97380002001 b17
97380002233 b17
97316505030 b04
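
For readers less familiar with awk, the same two-pass logic can be sketched in Python: the first input is read once into a dictionary (the NR==FNR block), then each line of the second input is kept if its key is known and its value differs. This is only an illustration under stated assumptions: the function name mismatched_lines is hypothetical, the lists are an excerpt of the sample data from the question, and lines with fewer than two fields are skipped, which the awk command does not do.

```python
def mismatched_lines(lookup_lines, report_lines):
    """Return lines of report_lines whose first field appears in
    lookup_lines with a different second field."""
    # First pass (awk: NR==FNR{a[$1]=$2; next}): map column 1 to column 2.
    seen = {}
    for line in lookup_lines:
        fields = line.split()
        if len(fields) >= 2:  # assumption: skip blank or short lines
            seen[fields[0]] = fields[1]

    # Second pass (awk: ($1 in a) && (a[$1] != $2)): keep differing lines.
    result = []
    for line in report_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] in seen and seen[fields[0]] != fields[1]:
            result.append(line.rstrip("\n"))
    return result


# Excerpt of the sample data posted in the question.
file1 = ["80002288    b17", "97380002001 b18", "97380002220 b17",
         "97380002233 b18", "97316505030  b05"]
file2 = ["97380002288 b17", "97380002001 b17", "97380002220 b17",
         "97380002233 b17", "97316505030 b04"]

print(mismatched_lines(file2, file1))  # lines from file1
print(mismatched_lines(file1, file2))  # lines from file2
```

As with the awk command, swapping the two arguments switches which file the reported lines come from.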

Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/420723.html
