Python: How do I generate two new columns in a pandas DataFrame based on the sorted order of the values? - 有解無憂

I have the following table as input:

	x	y
0	-0.872803	137.097977
1	-0.418766	821.549805
2	-0.657833	712.427856
3	-0.922091	126.871956
4	-0.847130	217.126068
5	0.692070	2166.090820
6	-0.858773	297.893188
7	-0.466285	634.510315
8	-0.774720	91.447876
9	-0.111050	1200.390625
10	0.325138	1759.597900

I need to generate something like this:

	x	y	pos_when_sorted_by_x	pos_when_sorted_by_y
0	-0.872803	137.097977	9	8
1	-0.418766	821.549805	3	3
2	-0.657833	712.427856	5	4
3	-0.922091	126.871956	10	9
4	-0.847130	217.126068	7	7
5	0.692070	2166.090820	0	0
6	-0.858773	297.893188	8	6
7	-0.466285	634.510315	4	5
8	-0.774720	91.447876	6	10
9	-0.111050	1200.390625	2	2
10	0.325138	1759.597900	1	1

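pos_when_sorted_by_x and pos_when_sorted_by_y are the zero-based positions each row would occupy if the DataFrame were sorted by x or by y, respectively, in descending order (the largest value gets position 0).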

A uj5u.com user replied:

Use rank:

df[['x_pos', 'y_pos']] = df.agg('rank', ascending=False).sub(1).astype(int)
print(df)

# Output:
           x            y  x_pos  y_pos
0  -0.872803   137.097977      9      8
1  -0.418766   821.549805      3      3
2  -0.657833   712.427856      5      4
3  -0.922091   126.871956     10      9
4  -0.847130   217.126068      7      7
5   0.692070  2166.090820      0      0
6  -0.858773   297.893188      8      6
7  -0.466285   634.510315      4      5
8  -0.774720    91.447876      6     10
9  -0.111050  1200.390625      2      2
10  0.325138  1759.597900      1      1
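
One caveat with the rank approach: with the default method='average', tied values receive fractional ranks, which astype(int) then truncates. The sample data has no ties, so this does not show up above. As a minimal sketch, assuming you want ties broken by order of appearance, you could pass method='first'. The data below is made up for illustration and is not from the question.

```python
import pandas as pd

# Hypothetical data with a tie in 'x' (not from the question).
df = pd.DataFrame({'x': [0.5, 0.2, 0.5, -1.0],
                   'y': [10.0, 40.0, 30.0, 20.0]})

# method='first' gives tied values distinct ranks, in order of appearance.
pos = df[['x', 'y']].rank(method='first', ascending=False).sub(1).astype(int)
df[['x_pos', 'y_pos']] = pos
print(df)
```

With method='first' every row gets a unique integer position, which is also what the argsort and sort_values approaches produce.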

An alternative using numpy's argsort:

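import numpy as np

# Select only x and y so that position columns added earlier are not ranked too.
df[['x_pos', 'y_pos']] = np.argsort(np.argsort(-1 * df[['x', 'y']].to_numpy(), axis=0), axis=0)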
print(df)

# Output:
           x            y  x_pos  y_pos
0  -0.872803   137.097977      9      8
1  -0.418766   821.549805      3      3
2  -0.657833   712.427856      5      4
3  -0.922091   126.871956     10      9
4  -0.847130   217.126068      7      7
5   0.692070  2166.090820      0      0
6  -0.858773   297.893188      8      6
7  -0.466285   634.510315      4      5
8  -0.774720    91.447876      6     10
9  -0.111050  1200.390625      2      2
10  0.325138  1759.597900      1      1

Note: the multiplication by -1 is there because argsort has no descending option.
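
It may help to see why argsort is applied twice. The first argsort returns the indices that would sort the values; the second argsort inverts that permutation, giving each element's position in the sorted order. Here is a small sketch on a one-dimensional array, with values chosen for illustration:

```python
import numpy as np

values = np.array([-0.87, -0.42, 0.69, -0.11])

# Indices that sort the values in descending order (negate, then sort ascending).
order = np.argsort(-values)
print(order)   # [2 3 1 0]

# Inverting the permutation gives each element's position in that order.
ranks = np.argsort(order)
print(ranks)   # [3 2 0 1]
```

The largest value (0.69) gets position 0 and the smallest (-0.87) gets position 3, matching what rank(ascending=False) minus 1 returns when there are no ties.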

A uj5u.com user replied:

You can use Series.rank with ascending=False and subtract 1 so that the ranks start from zero.

import pandas as pd
df = pd.DataFrame({'x': [-0.872803,
  -0.418766,
  -0.657833,
  -0.922091,
  -0.84713,
  0.69207,
  -0.858773,
  -0.466285,
  -0.77472,
  -0.11105,
  0.325138],
 'y': [137.097977,
  821.549805,
  712.427856,
  126.871956,
  217.126068,
  2166.09082,
  297.893188,
  634.510315,
  91.447876,
  1200.390625,
  1759.5979]})

df['pos_x'] = (df.x.rank(ascending=False)-1).astype(int)
df['pos_y'] = (df.y.rank(ascending=False)-1).astype(int)

Output:

           x            y  pos_x  pos_y
0  -0.872803   137.097977      9      8
1  -0.418766   821.549805      3      3
2  -0.657833   712.427856      5      4
3  -0.922091   126.871956     10      9
4  -0.847130   217.126068      7      7
5   0.692070  2166.090820      0      0
6  -0.858773   297.893188      8      6
7  -0.466285   634.510315      4      5
8  -0.774720    91.447876      6     10
9  -0.111050  1200.390625      2      2
10  0.325138  1759.597900      1      1

A uj5u.com user replied:

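You can also sort by each column in descending order and then look up where each original index label lands in the sorted index: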

dfs_x = df.sort_values(by='x', ascending=False)
dfs_y = df.sort_values(by='y', ascending=False)
df['pos_x'] = df.index.map(dfs_x.index.get_loc)
df['pos_y'] = df.index.map(dfs_y.index.get_loc)
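
For completeness, here is a self-contained sketch of the sort_values approach using the data from the question. Note that Index.get_loc returns a single integer only when the index labels are unique, so this assumes a unique index such as the default RangeIndex.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [-0.872803, -0.418766, -0.657833, -0.922091, -0.847130, 0.692070,
          -0.858773, -0.466285, -0.774720, -0.111050, 0.325138],
    'y': [137.097977, 821.549805, 712.427856, 126.871956, 217.126068,
          2166.090820, 297.893188, 634.510315, 91.447876, 1200.390625,
          1759.597900],
})

# Sort by each column, descending; the sorted index records the new row order.
dfs_x = df.sort_values(by='x', ascending=False)
dfs_y = df.sort_values(by='y', ascending=False)

# For every original label, find its position in the sorted index.
df['pos_x'] = df.index.map(dfs_x.index.get_loc)
df['pos_y'] = df.index.map(dfs_y.index.get_loc)
print(df)
```

This produces the same positions as the rank and argsort answers, at the cost of two full sorts and one label lookup per row.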

Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/383796.html
