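A seemingly simple problem that has turned out to be a bit annoying. I have a list of chromosomes (there are 23 chromosomes: chromosomes 1 to 21, followed by chromosome X and chromosome Y), like this: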
['chr11','chr14','chr16','chr13','chr4','chr13','chr2','chr1','chr2','chr3','chr14','chrX',]
I want to sort it into the following order:
['chr1', 'chr2','chr2','chr3','chr4','chr11','chr13','chr13', 'chr14','chr14','chr16','chrX']
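However, because Python's sort is lexicographic, it orders the labels as chr1, chr10, chr11, chr12, ..., chr2, and so on. And because I have chromosome X, sorting by integer value doesn't seem to be an option either. Do I have to specify a custom key to sort the list? Or is there some obvious solution I'm missing?
As always, any help is greatly appreciated!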
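To make the problem concrete, here is a minimal sketch of what the default sort does with labels like these (the list and variable names are only illustrative):

```python
# Plain string sorting compares character by character,
# so 'chr10' and 'chr11' land before 'chr2' because '1' < '2'.
chroms = ['chr11', 'chr2', 'chr1', 'chrX', 'chr10']
lexicographic = sorted(chroms)
print(lexicographic)
```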
uj5u.com user replied:
You can use natsorted; what you want is a natural sort, after all ;)
l = ['chr11','chr14','chr16','chr13','chr4','chr13','chr2',
'chr1','chr2','chr3','chr14','chrX','chrY']
from natsort import natsorted
out = natsorted(l)
Output:
['chr1', 'chr2', 'chr2', 'chr3', 'chr4', 'chr11', 'chr13',
'chr13', 'chr14', 'chr14', 'chr16', 'chrX', 'chrY']
uj5u.com user replied:
You can create a custom key:
key={s:i for i,s in
     enumerate([f'chr{x}' for x in list(range(1,22))+['X','Y']],1)}
>>> key
{'chr1': 1, 'chr2': 2, 'chr3': 3, 'chr4': 4, 'chr5': 5, 'chr6': 6, 'chr7': 7, 'chr8': 8, 'chr9': 9, 'chr10': 10, 'chr11': 11, 'chr12': 12, 'chr13': 13, 'chr14': 14, 'chr15': 15, 'chr16': 16, 'chr17': 17, 'chr18': 18, 'chr19': 19, 'chr20': 20, 'chr21': 21, 'chrX': 22, 'chrY': 23}
Then use that dict as the lookup in sorted:
li = ['chr11','chr14','chr16','chr13','chr4','chr13','chr2',
'chr1','chr2','chr3','chr14','chrX','chrY']
>>> sorted(li, key=lambda s: key[s])
['chr1', 'chr2', 'chr2', 'chr3', 'chr4', 'chr11', 'chr13', 'chr13', 'chr14', 'chr14', 'chr16', 'chrX', 'chrY']
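One thing to watch with a lookup table: a label that is not in the dict raises KeyError. A minimal sketch, assuming you would rather push unknown labels to the end than fail (the 'chrM' label and the `chrom_rank` helper are hypothetical, not part of the answer):

```python
# Rank table: chr1..chr21, then chrX and chrY, numbered from 1.
labels = [f'chr{x}' for x in list(range(1, 22)) + ['X', 'Y']]
rank = {label: i for i, label in enumerate(labels, 1)}

def chrom_rank(label):
    # Assumption: unknown labels get a rank just past the end of the table.
    # sorted() is stable, so unknown labels keep their input order.
    return rank.get(label, len(rank) + 1)

li = ['chr11', 'chrM', 'chr2', 'chrX', 'chr1', 'chrY']
ordered = sorted(li, key=chrom_rank)
print(ordered)
```

Whether unknown labels should sort last or raise an error depends on your data; failing loudly is often the safer choice when the set of chromosomes is fixed.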
uj5u.com user replied:
natsort, as @mozway already mentioned, is the quickest way.
Here is a solution that does not use an external library.
sorted(l, key=lambda x: int(val) if (val:=x[3:]).isnumeric() else ord(val))
It gives the same output.
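The `:=` operator in the one-liner needs Python 3.8 or later. If that is a problem, or the lambda is hard to read, the same idea can be spelled out as a named function. This is only a sketch: `chrom_key` is a name made up here, and it returns a tuple so that numbered chromosomes sort before lettered ones without relying on `ord`:

```python
def chrom_key(label):
    # Drop the 'chr' prefix and look at what is left.
    suffix = label[3:]
    if suffix.isdigit():
        # Numbered chromosomes: group 0, ordered by integer value.
        return (0, int(suffix), '')
    # Lettered chromosomes (X, Y, ...): group 1, ordered alphabetically.
    return (1, 0, suffix)

l = ['chr11', 'chr4', 'chr13', 'chr2', 'chrY', 'chr1', 'chrX']
out = sorted(l, key=chrom_key)
print(out)
```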
uj5u.com user replied:
You can try replacing X and Y with 22 and 23 respectively inside the lambda, then stripping the 'chr' prefix, and sorting the list by the integer that remains:
l = ['chr1', 'chr2','chr2','chr3','chr4','chr11','chr13','chr13', 'chr14','chr14','chr16','chrX']
sorted( l, key= lambda x: int(x.replace('X','22').replace('Y','23').replace('chr','')))
# OUTPUT
['chr1', 'chr2', 'chr2', 'chr3', 'chr4', 'chr11', 'chr13', 'chr13', 'chr14', 'chr14', 'chr16','chrX']
uj5u.com user replied:
You can use human (natural) sorting like this:
import re

def atoi(text):
    # Turn digit runs into ints so they compare numerically.
    return int(text) if text.isdigit() else text

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    '''
    # The capturing group keeps the digit runs in the split result.
    return [ atoi(c) for c in re.split(r'(\d+)', text) ]

alist=['chr11','chr14','chr16','chr13','chr4','chr13','chr2','chr1','chr2','chr3','chr14','chrX',]
alist.sort(key=natural_keys)
print(alist)
Output:
['chr1', 'chr2', 'chr2', 'chr3', 'chr4', 'chr11', 'chr13', 'chr13', 'chr14', 'chr14', 'chr16', 'chrX']
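To see why this works, it helps to look at the keys themselves. A minimal sketch, assuming the split pattern is `(\d+)` (one or more digits, captured so that the digits stay in the result):

```python
import re

def natural_keys(text):
    # Split into alternating text and digit runs; digit runs become ints.
    return [int(c) if c.isdigit() else c for c in re.split(r'(\d+)', text)]

key_2 = natural_keys('chr2')
key_11 = natural_keys('chr11')
key_x = natural_keys('chrX')
print(key_2, key_11, key_x)
```

Lists compare element by element, so 'chr2' and 'chr11' tie on 'chr' and are then ordered by 2 < 11, while 'chrX' sorts after both because the string 'chr' is a prefix of 'chrX'.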
Or you can use the natsort library like this:
import natsort
chroms=['chr11','chr14','chr16','chr13','chr4','chr13','chr2',
'chr1','chr2','chr3','chr14','chrX','chrY']
result=natsort.natsorted(chroms)
print(result)
Output:
['chr1', 'chr2', 'chr2', 'chr3', 'chr4', 'chr11', 'chr13', 'chr13', 'chr14', 'chr14', 'chr16', 'chrX', 'chrY']
