I have a dataframe (much larger than this example) like the one below, in which every row of the first two columns is repeated 5 times.
import pandas as pd
df = pd.DataFrame({'text':['the weather is nice','the weather is nice','the weather is nice','the weather is nice','the weather is nice',
'the house is beautiful','the house is beautiful','the house is beautiful','the house is beautiful','the house is beautiful',
'the day is long','the day is long','the day is long','the day is long','the day is long'],
'reference':['weather','weather','weather','weather','weather',
'house','house','house','house','house',
'day','day','day','day','day'],
'id':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]})
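I want to split this pandas dataframe into two dataframes: the first two consecutive rows of each repeated block should go into one dataframe, and the other three into a second dataframe, as shown below.
Desired output: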
first df:
                     text  reference  id
0     the weather is nice    weather   1
1     the weather is nice    weather   2
3  the house is beautiful      house   6
4  the house is beautiful      house   7
5         the day is long        day  11
6         the day is long        day  12
second df:
                     text  reference  id
0     the weather is nice    weather   3
1     the weather is nice    weather   4
2     the weather is nice    weather   5
3  the house is beautiful      house   8
4  the house is beautiful      house   9
5  the house is beautiful      house  10
6         the day is long        day  13
7         the day is long        day  14
8         the day is long        day  15
Obviously, selecting every n-th row does not work (e.g. df.iloc[::3, :] or df[df.index % 3 == 0]), so I am wondering how the output above can be produced.
A uj5u.com user replied:
If you want to group by the reference values (the first 2 items of each group vs. the rest):
mask = df.groupby('reference').cumcount().gt(1)
groups = [g for k,g in df.groupby(mask)]
# or manually
# df1 = df[~mask]
# df2 = df[mask]
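As a quick check of the cumcount approach, here is a minimal sketch on a shortened stand-in for the question's frame (the reduced columns and the names df1/df2 are assumptions made for illustration). It shows that each group contributes its first two rows to one frame and the remaining rows to the other.

```python
import pandas as pd

# Small stand-in for the question's frame: 3 groups of 5 rows each.
df = pd.DataFrame({
    'reference': ['weather'] * 5 + ['house'] * 5 + ['day'] * 5,
    'id': range(1, 16),
})

# cumcount() numbers the rows inside each group 0, 1, 2, ...
# so gt(1) is True from the third row of each group onward.
mask = df.groupby('reference').cumcount().gt(1)

df1 = df[~mask]  # first two rows of every group
df2 = df[mask]   # remaining rows of every group

print(df1['id'].tolist())  # [1, 2, 6, 7, 11, 12]
print(df2['id'].tolist())  # [3, 4, 5, 8, 9, 10, 13, 14, 15]
```

Because the mask is computed per group, this variant should also cope with groups that do not all have exactly 5 rows.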
Using position:
import numpy as np
# True from the third row of every block of 5 onward
mask = (np.arange(len(df)) % 5) > 1
# or, with a default RangeIndex
# mask = (df.index % 5) > 1
# then same as above, using groupby or slicing
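The positional variant can be sketched as follows, assuming every block has exactly 5 rows. The frame with a non-default index is made up for illustration. It is there to show that np.arange(len(df)) depends only on row position, whereas a mask built from the index labels would only line up when the index is the default 0..n-1.

```python
import numpy as np
import pandas as pd

# Illustrative frame whose index does NOT start at 0.
df = pd.DataFrame({'id': range(1, 16)}, index=range(3, 18))

# Position of each row within its block of 5: 0, 1, 2, 3, 4, 0, 1, ...
pos_in_block = np.arange(len(df)) % 5

mask = pos_in_block > 1   # True for the last three rows of each block
df1 = df[~mask]           # first two rows of each block
df2 = df[mask]            # other three rows of each block

print(df1['id'].tolist())  # [1, 2, 6, 7, 11, 12]
```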
A uj5u.com user replied:
Build a mask m:
import numpy as np
m = np.tile([True, True, False, False, False], len(df) // 5)
df1 = df[m]
df2 = df[~m]
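One caveat with the tiled mask: len(df) // 5 copies of the pattern only match the frame when its length is an exact multiple of 5. A possible workaround, sketched below, is np.resize, which repeats the pattern cyclically up to the requested length. The helper name split_by_block and its parameters are hypothetical, not part of the answer above.

```python
import numpy as np
import pandas as pd

def split_by_block(df, head=2, block=5):
    """Hypothetical helper: by row position, return (first `head` rows of
    every `block` rows, the remaining rows)."""
    pattern = np.arange(block) < head   # e.g. [True, True, False, False, False]
    # np.resize repeats the pattern cyclically and cuts it to len(df),
    # so a trailing partial block is handled as well.
    m = np.resize(pattern, len(df))
    return df[m], df[~m]

# 12 rows: two full blocks of 5 plus a partial block of 2.
df = pd.DataFrame({'id': range(1, 13)})
df1, df2 = split_by_block(df)

print(df1['id'].tolist())  # [1, 2, 6, 7, 11, 12]
print(df2['id'].tolist())  # [3, 4, 5, 8, 9, 10]
```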
