I have a file, say temp.rule, with m rows, where each row looks like this:
att1,att2,att3,...attN,class,fitness
Suppose my file looks like this:
A,B,C,1,0.67
D,E,F,1,0.84
P,Q,R,2,0.77
S,T,U,2,0.51
G,H,I,1,0.45
J,K,L,1,0.82
M,N,O,2,0.28
V,W,X,2,0.41
Y,Z,A,2,0.51
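In the first row, A, B, C are the attributes, 1 is the class, and 0.67 is the fitness. Now I want to sort the rows by fitness within each class and assign ranks. Afterwards, my file should look like this: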
P,Q,R,2,0.77,5
S,T,U,2,0.51,3.5
Y,Z,A,2,0.51,3.5
V,W,X,2,0.41,2
M,N,O,2,0.28,1
D,E,F,1,0.84,4
J,K,L,1,0.82,3
A,B,C,1,0.67,2
G,H,I,1,0.45,1
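Class 2 has 5 rows, so they are sorted by fitness and assigned ranks 1 to 5. The same goes for class 1: it has 4 rows, so they are sorted by fitness and assigned ranks 1 to 4. I have done the sorting part, but I am unable to assign ranks like this. I have also created a dictionary that records how many rows belong to class 1, class 2, and so on. The 3.5 is there because, in case of a tie, I want to take the average of the consecutive ranks.
Here is my attempt: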
rule_file_name = 'temp.rule'

# Read every line and split it into comma-separated fields.
rule_fit_val = []
with open(rule_file_name) as rule_fp:
    for line in rule_fp:
        rule_fit_val.append(line.replace("\n", "").split(","))

def convert_fitness_to_float(lst):
    # Keep every field as a string except the last one (fitness), which becomes a float.
    return lst[:-1] + [float(lst[-1])]

rule_fit_val = [convert_fitness_to_float(i) for i in rule_fit_val]
# Sort by (class, fitness), both in descending order.
rule_fit_val = sorted(rule_fit_val, key=lambda x: x[-2:], reverse=True)

item_list = []
for i in rule_fit_val:
    i = list(map(str, i))
    s = ','.join(i).replace("\n", "")
    item_list.append(s)
print(*item_list, sep='\n')

with open("check_sorted_fitness.rule", "w") as outfile:
    outfile.write("\n".join(item_list))

# Count how many rows belong to each class.
list1 = []
for i in rule_fit_val:
    list1.append(i[-2])
freq = {}
for items in list1:
    freq[items] = list1.count(items)
my_dict_new = {k: v for k, v in freq.items()}
print(my_dict_new)
Please help me figure out how to assign ranks like this.
A uj5u.com user replied:
Consider using the pandas module; then you can get something like this:
import pandas as pd
df = pd.read_csv('temp.rule', names=['att1','att2','att3','class','fitness'])
#-----------------^^^^^^^^^ your file ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ column headers
>>> df
'''
  att1 att2 att3  class  fitness
0    A    B    C      1     0.67
1    D    E    F      1     0.84
2    P    Q    R      2     0.77
3    S    T    U      2     0.51
4    G    H    I      1     0.45
5    J    K    L      1     0.82
6    M    N    O      2     0.28
7    V    W    X      2     0.41
8    Y    Z    A      2     0.51
'''
out = (df.assign(rank=df.groupby('class')['fitness'].
                 transform(lambda x: x.rank())).
       sort_values(['class','fitness'], ascending=False))
>>> out
'''
  att1 att2 att3  class  fitness  rank
2    P    Q    R      2     0.77   5.0
3    S    T    U      2     0.51   3.5
8    Y    Z    A      2     0.51   3.5
7    V    W    X      2     0.41   2.0
6    M    N    O      2     0.28   1.0
1    D    E    F      1     0.84   4.0
5    J    K    L      1     0.82   3.0
0    A    B    C      1     0.67   2.0
4    G    H    I      1     0.45   1.0
'''
out.to_csv('out.rule', header=False, index=False)
#-----------^^^^^^^^ new file
>>> out.rule
'''
P,Q,R,2,0.77,5.0
S,T,U,2,0.51,3.5
Y,Z,A,2,0.51,3.5
V,W,X,2,0.41,2.0
M,N,O,2,0.28,1.0
D,E,F,1,0.84,4.0
J,K,L,1,0.82,3.0
A,B,C,1,0.67,2.0
G,H,I,1,0.45,1.0
'''
UPD
Now it does not matter how many columns the file has, as long as the last two columns are "class" and "fitness", in that order:
import pandas as pd
df = pd.read_csv('temp.rule', header=None)
df = df.rename(columns={df.columns[-1]:'fitness',df.columns[-2]:'class'})
out = (df.assign(rank=df.groupby('class')['fitness'].
                 transform(lambda x: x.rank())).
       sort_values(['class','fitness'],ascending=False))
out.to_csv('out.rule',header=False,index=False)
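If pandas is not an option, the same per-class ranking with averaged ties can probably be done in plain Python, closer to the original attempt. The following is only a minimal sketch: the helper names average_ranks, rank_rows and rank_file are made up for illustration, and it assumes, as above, that the last two fields of every row are the class and the fitness.

```python
import csv
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every value equal to the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_rows(rows):
    """Append a per-class average rank to each row, then sort by class and fitness, descending."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[-2]].append(row)  # assumption: second-to-last field is the class
    ranked = []
    for members in groups.values():
        fitness = [float(r[-1]) for r in members]  # assumption: last field is the fitness
        for row, rank in zip(members, average_ranks(fitness)):
            ranked.append(row + [f"{rank:g}"])  # "g" prints 5.0 as 5 and keeps 3.5
    # After appending the rank, class is at index -3 and fitness at index -2.
    ranked.sort(key=lambda r: (r[-3], float(r[-2])), reverse=True)
    return ranked


def rank_file(src, dst):
    """Read rows from `src`, rank them, and write the result to `dst`."""
    with open(src, newline="") as f:
        rows = [row for row in csv.reader(f) if row]  # skip blank lines
    with open(dst, "w", newline="") as f:
        csv.writer(f).writerows(rank_rows(rows))
```

Calling rank_file('temp.rule', 'out.rule') should then write the ranked rows. Note that the class is compared as a string here, as in the question's code, so class labels with more than one digit would need converting to int before sorting.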
