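I have a pandas DataFrame with a set of values (prices). Within each initiator_id group, I need to sort the prices in ascending order if type == sell and in descending order if type == buy. Then I add an id within each group. Right now I do: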
df['bidnum'] = df.groupby(['initiator_id', 'type']).cumcount()
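What is an efficient way to sort in ascending order within each ('initiator_id', 'type == sell') group and in descending order within each ('initiator_id', 'type == buy') group?
This is what the original dataset looks like now: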
initiator_id price type bidnum
1 170.81 sell 0
2 170.81 sell 0
2 169.19 buy 0
3 170.81 sell 0
3 169.19 buy 0
3 70.81 sell 1
4 170.81 sell 0
4 169.19 buy 0
4 70.81 sell 1
4 69.19 buy 1
I need something like:
initiator_id, price, type
1, 100,sell
1, 99, sell
1, 98, sell
1, 110, buy
1, 120, buy
1, 125, buy
This way, the sell subgroup within each initiator_id group is sorted in descending order and the buy subgroup in ascending order.
A uj5u.com user replied:
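If you can assume that your "price" column will always contain non-negative values, we can "cheat": give the price a negative sign for either the buy or the sell rows, sort, and then take the absolute value to get the original prices back.
First, flip the sign. If the type is "buy", the price stays positive (2 * 1 - 1 = 1). If the type is "sell", the price becomes negative (2 * 0 - 1 = -1).
df["price"] = df["price"] * (2 * (df["type"] == "buy").astype(int) - 1)
Now sort the values as usual. I have included the "initiator_id" and "type" columns to match your expected output:
df = df.sort_values(["initiator_id", "type", "price"])
Finally, take the absolute value of the "price" column to retrieve the original values:
df["price"] = df["price"].abs()
Expected output of this operation on your sample input: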
initiator_id price type bidnum
0 1 170.81 sell 0
2 2 169.19 buy 0
1 2 170.81 sell 0
4 3 169.19 buy 0
3 3 170.81 sell 0
5 3 70.81 sell 1
9 4 69.19 buy 1
7 4 169.19 buy 0
6 4 170.81 sell 0
8 4 70.81 sell 1
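For reference, the three steps above can be put together as a minimal, self-contained sketch. It assumes non-negative prices and rebuilds the sample data from the question. Two things here are not in the answer itself: the temporary helper column (`signed_price`, a name chosen purely for illustration), which keeps the original `price` column from being overwritten, and the final `cumcount` call, which assumes the asker wants `bidnum` renumbered after sorting.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "initiator_id": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    "price": [170.81, 170.81, 169.19, 170.81, 169.19, 70.81,
              170.81, 169.19, 70.81, 69.19],
    "type": ["sell", "sell", "buy", "sell", "buy", "sell",
             "sell", "buy", "sell", "buy"],
})

# +1 for "buy" rows, -1 for "sell" rows (assumes prices are non-negative).
sign = 2 * (df["type"] == "buy").astype(int) - 1

# "signed_price" is a hypothetical helper column used only for sorting.
result = (
    df.assign(signed_price=df["price"] * sign)
      .sort_values(["initiator_id", "type", "signed_price"])
      .drop(columns="signed_price")
)

# Renumber rows inside each (initiator_id, type) group in the new order.
result["bidnum"] = result.groupby(["initiator_id", "type"]).cumcount()
print(result)
```

Because "buy" sorts before "sell" alphabetically, buy rows come first within each `initiator_id`, which is the same layout as the output shown above.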
A uj5u.com user replied:
One solution:
import pandas as pd
# DataFrame.append was removed in pandas 2.0, so collect the sorted groups and concatenate them once.
sorted_groups = []
for (initiator_id, order_type), group in df.groupby(['initiator_id', 'type']):
    # Sort ascending for "buy" groups and descending for "sell" groups.
    sorted_groups.append(group.sort_values('price', ascending=(order_type == 'buy')))
final_df = pd.concat(sorted_groups, ignore_index=True)
Output:
initiator_id price type
0 1 170.81 sell
1 2 169.19 buy
2 2 170.81 sell
3 3 169.19 buy
4 3 170.81 sell
5 3 70.81 sell
6 4 69.19 buy
7 4 169.19 buy
8 4 170.81 sell
9 4 70.81 sell
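A loop-free variant of the same idea, offered as a sketch rather than as part of the answer above: sort the sell rows in descending order and the buy rows in ascending order separately, concatenate them, and then apply a stable sort on `initiator_id` alone so that the order inside each group is preserved. The sample data is copied from the question, the variable names are made up for illustration, and the `bidnum` renumbering at the end is an assumption about what the asker wants.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "initiator_id": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    "price": [170.81, 170.81, 169.19, 170.81, 169.19, 70.81,
              170.81, 169.19, 70.81, 69.19],
    "type": ["sell", "sell", "buy", "sell", "buy", "sell",
             "sell", "buy", "sell", "buy"],
})

# Sort each side of the book in its own direction.
sells = df[df["type"] == "sell"].sort_values("price", ascending=False)
buys = df[df["type"] == "buy"].sort_values("price", ascending=True)

# mergesort is stable, so rows sharing an initiator_id keep the
# relative order they had after the concatenation (sells, then buys).
final_df = (
    pd.concat([sells, buys])
      .sort_values("initiator_id", kind="mergesort")
      .reset_index(drop=True)
)

# Renumber rows inside each (initiator_id, type) group in the new order.
final_df["bidnum"] = final_df.groupby(["initiator_id", "type"]).cumcount()
print(final_df)
```

Unlike the outputs above, this places sell rows before buy rows within each `initiator_id`, which matches the layout the asker sketched in the question.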
A uj5u.com user replied:
Everyone else has given a solution that uses pandas. Here I offer one that does not use pandas.
Input CSV:
initiator_id,price,type,bidnum
1,170.81,sell,0
2,170.81,sell,0
2,169.19,buy,0
3,170.81,sell,0
3,169.19,buy,0
3,70.81,sell,1
4,170.81,sell,0
4,169.19,buy,0
4,70.81,sell,1
4,69.19,buy,1
Output CSV:
initiator_id,price,type,bidnum
1,170.81,sell,0
2,170.81,sell,0
2,169.19,buy,0
3,170.81,sell,0
3,70.81,sell,1
3,169.19,buy,0
4,170.81,sell,0
4,70.81,sell,1
4,69.19,buy,1
4,169.19,buy,0
Code:
from collections import OrderedDict
import numpy
"""
the reason why this code uses exec is so that the ordering of columns can be arbitrary
"""
def remove_duplicates(seq):
    # Drop repeated items while keeping the first occurrence of each.
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]
def returnLastIndex(temp2):
    # Index of the last row that has the same initiator_id as row temp2.
    global mydict
    temp3 = mydict['initiator_id'][temp2]
    while True:
        temp2 = temp2 + 1
        try:
            if mydict['initiator_id'][temp2] != temp3:
                return temp2-1
        except:
            return temp2-1
def returnFirstIndex(temp2):
    # Index of the first row that has the same initiator_id as row temp2.
    global mydict
    temp3 = mydict['initiator_id'][temp2]
    while temp2 >= 1:
        temp2 = temp2 - 1
        if mydict['initiator_id'][temp2] != temp3:
            return temp2 + 1
    return 0
with open("input.csv") as file:
    lines = file.readlines()
new_lines = []
new_headers = []
for x in range(len(lines)): #loop to remove headers and newlines
    if x == 0:
        for y in lines[x].strip().split(","):
            new_headers.append(y)
    else:
        new_lines.append(lines[x].strip())
# Build one column list per header, then convert each to a numpy array.
mydict = OrderedDict()
for x in new_headers:
    exec("mydict['" + x + "'] = []")
for x in range(len(new_headers)):
    for y in new_lines:
        if new_headers[x] == "initiator_id":
            exec("mydict['" + new_headers[x] + "'].append(int('" + y.split(",")[x] + "'))")
        elif new_headers[x] == "price":
            exec("mydict['" + new_headers[x] + "'].append(float('" + y.split(",")[x] + "'))")
        else:
            exec("mydict['" + new_headers[x] + "'].append('" + y.split(",")[x] + "')")
for x in new_headers:
    exec("mydict['" + x + "'] = numpy.array(mydict['" + x + "'])")
# Sort every column by initiator_id so each group is contiguous.
temp1 = mydict['initiator_id'].argsort()
for x in (new_headers):
    exec("mydict['" + x + "'] = mydict['" + x + "'][temp1]")
# Find the first and last row index of each initiator_id group.
splice_list_first = []
for x in range(len(mydict['initiator_id'])):
    splice_list_first.append(returnFirstIndex(x))
splice_list_last = []
for x in range(len(mydict['initiator_id'])):
    splice_list_last.append(returnLastIndex(x))
splice_list_first = remove_duplicates(splice_list_first)
splice_list_last = remove_duplicates(splice_list_last)
master_string = ",".join(new_headers) + "\n"
for x in range(len(splice_list_first)):
    # temp4 holds the rows of the current initiator_id group.
    temp4 = OrderedDict()
    for y in new_headers:
        exec("temp4['" + y + "'] = mydict['" + y + "'][" + str(splice_list_first[x]) + ":" + str(splice_list_last[x] + 1) + "]")
    sell_index = []
    buy_index = []
    for z in range(len(temp4['type'])):
        if temp4['type'][z] == "sell":
            sell_index.append(z)
        if temp4['type'][z] == "buy":
            buy_index.append(z)
    # Sell rows: sort by price in descending order and append to the output.
    temp5 = OrderedDict()
    for a in range(len(sell_index)):
        for b in new_headers:
            try:
                exec("temp5['" + b + "']")
            except:
                exec("temp5['" + b + "'] = []")
            exec("temp5['" + b + "'].append(temp4['" + b + "'][" + str(sell_index[a]) + ":" + str(sell_index[a] + 1) + "][0])")
    try:
        for c in new_headers:
            exec("temp5['" + c + "'] = numpy.array(temp5['" + c + "'])")
        temp7 = temp5['price'].argsort()[::-1]
        for d in (new_headers):
            exec("temp5['" + d + "'] = temp5['" + d + "'][temp7]")
        for e in range(len(temp5['initiator_id'])):
            for f in new_headers:
                master_string = master_string + str(temp5[f][e]) + ","
            master_string = master_string[:-1] + "\n"
    except Exception as g:
        # The group has no sell rows.
        pass
    # Buy rows: sort by price in ascending order and append to the output.
    temp6 = OrderedDict()
    for a in range(len(buy_index)):
        for b in new_headers:
            try:
                exec("temp6['" + b + "']")
            except:
                exec("temp6['" + b + "'] = []")
            exec("temp6['" + b + "'].append(temp4['" + b + "'][" + str(buy_index[a]) + ":" + str(buy_index[a] + 1) + "][0])")
    try:
        for c in new_headers:
            exec("temp6['" + c + "'] = numpy.array(temp6['" + c + "'])")
        temp7 = temp6['price'].argsort()
        for d in (new_headers):
            exec("temp6['" + d + "'] = temp6['" + d + "'][temp7]")
        for e in range(len(temp6['initiator_id'])):
            for f in new_headers:
                master_string = master_string + str(temp6[f][e]) + ","
            master_string = master_string[:-1] + "\n"
    except Exception as g:
        # The group has no buy rows.
        pass
print(master_string)
f = open("output.csv", "w")
f.write(master_string)
f.close()
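The same result can be sketched far more compactly with only the standard library, avoiding `exec` altogether. This is an alternative under stated assumptions, not the author's code: it assumes the CSV has `initiator_id`, `price`, and `type` columns, and the function name `sort_orders` and the in-memory sample text are hypothetical. Rows are ordered with a single sort key: initiator first, sell rows before buy rows, then price descending for sells and ascending for buys.

```python
import csv
import io

def sort_orders(csv_text):
    """Return CSV text sorted per initiator: sells descending, then buys ascending."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)

    def key(row):
        price = float(row["price"])
        is_buy = row["type"] == "buy"
        # Any type other than "buy" is treated like "sell" (an assumption).
        # Negating sell prices makes an ascending sort yield descending sells.
        return (int(row["initiator_id"]), is_buy, price if is_buy else -price)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(sorted(rows, key=key))
    return out.getvalue()

# Sample input copied from the answer above.
sample = """initiator_id,price,type,bidnum
1,170.81,sell,0
2,170.81,sell,0
2,169.19,buy,0
3,170.81,sell,0
3,169.19,buy,0
3,70.81,sell,1
4,170.81,sell,0
4,169.19,buy,0
4,70.81,sell,1
4,69.19,buy,1
"""
print(sort_orders(sample))
```

Since `csv.DictReader` looks columns up by name, the column order can still be arbitrary, which was the stated reason for using `exec` in the original code.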
