I have this file (several thousand lines long). Each line contains two numbers separated by a space:
3466 937
3466 5233
3466 8579
3466 10310
3466 15931
3466 17038
3466 18720
3466 19607
10310 1854
10310 3466
10310 4583
10310 5233
10310 9572
10310 10841
10310 13056
10310 14982
10310 16310
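Using Python, I need to remove lines that repeat an earlier line in reverse order, i.e. 10310 3466 and 3466 10310 should appear as a single line (either 10310 3466 or 3466 10310). Any ideas? Thank you.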
Answer from a uj5u.com user:
One approach is to use frozenset to build a key that is insensitive to order:
# change data.csv to the name of your file
with open("data.csv") as infile:
    # a frozenset key ignores order, so "a b" and "b a" collapse into one entry
    uniques = set(frozenset(line.strip().split()) for line in infile)

for value in uniques:
    print(*value)
Output (for the given input; sets are unordered, so the order of the lines may differ):
10310 3466
5233 10310
10310 4583
19607 3466
1854 10310
3466 8579
10310 9572
10310 13056
10310 14982
5233 3466
17038 3466
15931 3466
10310 10841
937 3466
18720 3466
16310 10310
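Because a set has no defined order, the output above does not follow the order of the input file. If the original order matters, one possible variation is to remember the keys already seen and keep the first occurrence of each pair. This is only a sketch: the function name dedupe_pairs_keep_order and the sample list are assumptions made for illustration, not part of the answer above.

```python
def dedupe_pairs_keep_order(lines):
    """Drop reversed duplicates, keeping the first occurrence of each pair in order."""
    seen = set()  # order-insensitive keys encountered so far
    kept = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        key = frozenset(parts)
        if key not in seen:
            seen.add(key)
            # store the line text, not the key, so a line like "5 5" stays a pair
            kept.append(" ".join(parts))
    return kept


# hypothetical sample; with a real file, pass the open file object instead
sample = ["3466 937", "3466 10310", "10310 1854", "10310 3466", "937 3466"]
for row in dedupe_pairs_keep_order(sample):
    print(row)
```

Keeping the line text rather than printing the frozenset also avoids a subtle effect: a line whose two numbers are identical becomes a one-element frozenset, so printing the key would show that number only once.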
Alternatively, use sorted to convert each line, whichever order it is in, to the same key:
# change data.csv to the name of your file
with open("data.csv") as infile:
    uniques = set(" ".join(sorted(line.strip().split())) for line in infile)

for value in uniques:
    print(value)
To better understand the frozenset approach, consider the code below:
frozenset((1, 2)) == frozenset((2, 1))
Out[2]: True
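As you can see, the two frozensets are equal regardless of the order of the tuple used as input. The same holds for regular sets, but, as the documentation states, frozensets are also hashable:
The frozenset type is immutable and hashable: its contents cannot be altered after it is created; it can therefore be used as a dictionary key or as an element of another set.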
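To make the hashability point concrete, here is a small sketch showing a frozenset used as a dictionary key while a regular set is rejected. The variable names and the counting dictionary are assumptions chosen for illustration.

```python
pair = frozenset(("3466", "10310"))
reversed_pair = frozenset(("10310", "3466"))

# A frozenset can be a dictionary key; both orders land on the same entry.
counts = {}
counts[pair] = counts.get(pair, 0) + 1
counts[reversed_pair] = counts.get(reversed_pair, 0) + 1
print(counts[pair])  # 2

# A regular set is mutable, hence unhashable, and cannot be nested in a set.
try:
    {set(("3466", "10310"))}
    set_is_hashable = True
except TypeError:
    set_is_hashable = False
print(set_is_hashable)  # False
```

This is why the answer wraps each line in a frozenset before adding it to the outer set: a plain set of sets would raise a TypeError.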
Note
To write the deduplicated lines to a new file, do the following:
# change data.csv to the name of your file
with open("data.csv") as infile:
    uniques = set(frozenset(line.strip().split()) for line in infile)

# change output.csv to the name of your output file
with open("output.csv", mode="w") as outfile:
    for value in uniques:
        outfile.write(f'{" ".join(value)}\n')
Answer from a uj5u.com user:
It seems the order of the two numbers does not matter, so you can do this:
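filename = 'data.txt'
pairs = []  # not named "list", which would shadow the built-in
with open(filename) as file:
    lines = file.readlines()

for line in lines:
    nums = line.split()
    if len(nums) < 2:
        continue  # skip blank or incomplete lines
    a, b = int(nums[0]), int(nums[1])
    # put the smaller number first so both orders produce the same string
    low, high = a, b
    if b < a:
        low, high = b, a
    pairs.append(str(low) + ' ' + str(high))

uniqueSet = set(pairs)
with open("output.txt", mode="w") as outfile:
    for l in uniqueSet:
        outfile.write(l + '\n')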
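The smaller-number-first normalization in this answer can also be expressed with integer tuples instead of joined strings, which makes it easy to sort the result numerically. The following is a sketch under that assumption; normalize_pair and unique_pairs are hypothetical helper names, and a line that does not hold exactly two integers would raise a ValueError.

```python
def normalize_pair(line):
    """Turn 'a b' into the tuple (smaller, larger) of ints (hypothetical helper)."""
    a, b = map(int, line.split())
    return (a, b) if a <= b else (b, a)


def unique_pairs(lines):
    """Return the distinct unordered pairs found in lines, sorted numerically."""
    return sorted({normalize_pair(line) for line in lines if line.strip()})


# hypothetical sample; with a real file, pass the open file object instead
sample = ["10310 3466", "3466 10310", "3466 937", "7 7"]
for small, large in unique_pairs(sample):
    print(small, large)
```

Comparing integers rather than strings means that "0937 3466" and "3466 937" would be treated as the same pair, which may or may not be desirable depending on the data.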
