Removing lines that are reversed duplicates of each other from a text file in Python

I have this file (with thousands of lines). Each line contains two numbers separated by whitespace:

3466    937
3466    5233
3466    8579
3466    10310
3466    15931
3466    17038
3466    18720
3466    19607
10310   1854
10310   3466
10310   4583
10310   5233
10310   9572
10310   10841
10310   13056
10310   14982
10310   16310

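I need to use Python to remove lines that repeat an earlier line in reverse order, i.e. 10310 3466 and 3466 10310 should appear as a single line (either 10310 3466 or 3466 10310). Any ideas? Thank you.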

Answer from a uj5u.com user:

One approach is to use frozenset to build a key that does not depend on the order of the two numbers:

# change data.csv to the name of your file
with open("data.csv") as infile:
    uniques = set(frozenset(line.strip().split()) for line in infile)
    for value in uniques:
        print(*value)

Output (for the given input; the order of the lines may differ between runs, because sets are unordered):

10310 3466
5233 10310
10310 4583
19607 3466
1854 10310
3466 8579
10310 9572
10310 13056
10310 14982
5233 3466
17038 3466
15931 3466
10310 10841
937 3466
18720 3466
16310 10310
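The output order above is arbitrary. If the original line order matters, the same frozenset key can be combined with a "seen" set so that only the first occurrence of each pair is kept. The following is a minimal sketch under that assumption; the function name dedupe_reversed_pairs and the sample list are hypothetical and not part of the original answer:

```python
def dedupe_reversed_pairs(lines):
    """Return lines without reversed duplicates, keeping first occurrences in order."""
    seen = set()   # order-insensitive keys already encountered
    kept = []      # normalized lines, in their original order
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        key = frozenset(fields)
        if key not in seen:
            seen.add(key)
            kept.append(" ".join(fields))
    return kept


# Hypothetical sample taken from the question's data
sample = ["3466    937", "3466    10310", "10310   1854", "10310   3466"]
print(dedupe_reversed_pairs(sample))
# ['3466 937', '3466 10310', '10310 1854']
```

Passing an open file object instead of the sample list should work the same way, since iterating over a file yields its lines.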

Alternatively, use sorted to convert each line and its reversed counterpart into the same key:

# change data.csv to the name of your file
with open("data.csv") as infile:
    uniques = set(" ".join(sorted(line.strip().split())) for line in infile)
    for value in uniques:
        print(value)
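Note that sorted compares the fields as strings here, so "10310" sorts before "3466". That still produces a consistent key, but the numbers within each line are not in numeric order. If numeric order is wanted, one possible variation is to convert the fields to int and use a sorted tuple as the key. This is a sketch, not part of the original answer; the name unique_pairs_sorted and the sample data are hypothetical:

```python
def unique_pairs_sorted(lines):
    """Return unique (low, high) integer pairs, sorted numerically."""
    keys = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # assumption: ignore lines that do not hold exactly two numbers
        # Sorting the two integers gives reversed pairs the same tuple
        keys.add(tuple(sorted(int(field) for field in fields)))
    return sorted(keys)


# Hypothetical sample input
sample = ["10310 3466", "3466 10310", "3466 937", "5 5"]
for low, high in unique_pairs_sorted(sample):
    print(low, high)
# 5 5
# 937 3466
# 3466 10310
```

A tuple key also keeps a line such as "5 5" intact, and sorting the final set makes the output deterministic.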

To better understand why the frozenset approach works, consider the code below:

frozenset((1, 2)) == frozenset((2, 1))
Out[2]: True

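As you can see, the two frozensets are equal regardless of the order of the tuple used as input. The same is true of regular sets, but regular sets cannot be stored inside another set, whereas frozensets are hashable. From the documentation:

The frozenset type is immutable and hashable: its contents cannot be altered after it is created; it can therefore be used as a dictionary key or as an element of another set.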
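The difference in hashability can be checked directly. The short sketch below (variable names are illustrative only) uses a frozenset as a dictionary key and shows that a regular set cannot be placed inside another set:

```python
# A frozenset works as a dictionary key, and lookup ignores element order
pair_counts = {}
pair_counts[frozenset(("10310", "3466"))] = 1
found = pair_counts[frozenset(("3466", "10310"))]

# A regular set is mutable and therefore unhashable
try:
    {set(("10310", "3466"))}
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

print(found, set_is_hashable)  # 1 False
```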

Note

To write the deduplicated lines to a new file, do the following:

# change data.csv to the name of your file
with open("data.csv") as infile:
    uniques = set(frozenset(line.strip().split()) for line in infile)

    # change output.csv to the name of your output file
    with open("output.csv", mode="w") as outfile:
        for value in uniques:
            outfile.write(f'{" ".join(value)}\n')
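One caveat with the frozenset key: a line whose two numbers are identical (for example "7 7") collapses to a single-element frozenset, so it would be written out as just "7". If such lines can occur in the data, a small formatting helper can restore the second number. This is a sketch under that assumption; format_key is a hypothetical helper, not part of the original answer:

```python
def format_key(key):
    """Turn a frozenset key back into a line, repeating a lone element."""
    items = sorted(key, key=int)  # numeric order, for stable output
    if len(items) == 1:
        items = items * 2         # "7 7" was collapsed to {"7"}
    return " ".join(items)


print(format_key(frozenset("7 7".split())))        # 7 7
print(format_key(frozenset("10310 3466".split()))) # 3466 10310
```

The helper could replace " ".join(value) in the write call above.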

Answer from a uj5u.com user:

It seems the order of the two numbers does not matter, so you can do this:

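filename = 'data.txt'

# avoid naming variables list, min or max: that would shadow the built-ins
pairs = []

with open(filename) as file:
    for line in file:
        # split() with no argument handles any run of spaces or tabs
        nums = line.split()
        a, b = int(nums[0]), int(nums[1])
        # put the smaller number first so reversed pairs become identical
        low, high = (a, b) if a <= b else (b, a)
        pairs.append(str(low) + ' ' + str(high))

unique_pairs = set(pairs)
with open("output.txt", mode="w") as outfile:
    for pair in unique_pairs:
        outfile.write(pair + '\n')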
