pandas: split a dataframe based on conditions on certain columns and rows

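I have a dataframe (much larger than this example), shown below, in which every row's values in the first two columns are repeated 5 times in a row.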

import pandas as pd
df = pd.DataFrame({'text':['the weather is nice','the weather is nice','the weather is nice','the weather is nice','the weather is nice',
                        'the house is beautiful','the house is beautiful','the house is beautiful','the house is beautiful','the house is beautiful',
                        'the day is long','the day is long','the day is long','the day is long','the day is long'],
               'reference':['weather','weather','weather','weather','weather',
                            'house','house','house','house','house',
                            'day','day','day','day','day'],
               'id':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]})

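I want to split this pandas dataframe into two dataframes: the first two consecutive rows of each repeated block go into one dataframe, and the other three go into the second, as shown below.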

Desired output:

first df:

                      text reference  id
0      the weather is nice   weather   1
1      the weather is nice   weather   2
2   the house is beautiful     house   6
3   the house is beautiful     house   7
4         the day is long       day  11
5         the day is long       day  12

second df:
                      text reference  id
0      the weather is nice   weather   3
1      the weather is nice   weather   4
2      the weather is nice   weather   5
3   the house is beautiful     house   8
4   the house is beautiful     house   9
5   the house is beautiful     house  10
6         the day is long       day  13
7         the day is long       day  14
8         the day is long       day  15

Obviously, selecting every n-th row does not work (e.g. df.iloc[::3, :] or df[df.index % 3 == 0]), so I'm wondering how the output above can be produced.
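A quick check on the question's sample frame shows why a fixed stride fails: iloc[::3] ignores the 5-row blocks, while the rows wanted for the first frame are exactly those whose position modulo 5 is 0 or 1. This is a sketch that assumes the default RangeIndex produced by the question's constructor.

```python
import pandas as pd

# Rebuild the question's sample frame: three sentences, each repeated 5 times.
texts = ['the weather is nice', 'the house is beautiful', 'the day is long']
refs = ['weather', 'house', 'day']
df = pd.DataFrame({
    'text': [t for t in texts for _ in range(5)],
    'reference': [r for r in refs for _ in range(5)],
    'id': list(range(1, 16)),
})

# A stride of 3 picks positions 0, 3, 6, 9, 12 regardless of block boundaries.
stride_ids = df.iloc[::3]['id'].tolist()
print(stride_ids)  # [1, 4, 7, 10, 13]

# The desired first frame needs positions 0 and 1 of every 5-row block.
wanted_ids = df[df.index % 5 < 2]['id'].tolist()
print(wanted_ids)  # [1, 2, 6, 7, 11, 12]
```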

Answer:

If you want to group by the reference value (first 2 items vs. the rest):

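# cumcount() numbers rows 0, 1, 2, ... within each 'reference' group; gt(1) marks the 3rd row onward
mask = df.groupby('reference').cumcount().gt(1)
# groups[0] holds the first two rows per group (mask False), groups[1] the rest (mask True)
groups = [g for k,g in df.groupby(mask)]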

# or manually
# df1 = df[~mask]
# df2 = df[mask]
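Below is a self-contained sketch of the cumcount approach on the question's sample data. The reset_index(drop=True) calls are an addition so that each part is renumbered from 0 like the desired output, and groupby(...).head(2) is included only as a cross-check that selects the same first two rows per group. It assumes each reference value occupies a single block: if a value reappeared later in the frame, cumcount would keep counting across both occurrences.

```python
import pandas as pd

# Rebuild the question's sample frame: three sentences, each repeated 5 times.
texts = ['the weather is nice', 'the house is beautiful', 'the day is long']
refs = ['weather', 'house', 'day']
df = pd.DataFrame({
    'text': [t for t in texts for _ in range(5)],
    'reference': [r for r in refs for _ in range(5)],
    'id': list(range(1, 16)),
})

# cumcount numbers the rows 0, 1, 2, ... within each 'reference' group,
# so gt(1) is True from the third row of each group onward.
mask = df.groupby('reference').cumcount().gt(1)

# Drop the original index so each part is numbered 0..n-1.
df1 = df[~mask].reset_index(drop=True)
df2 = df[mask].reset_index(drop=True)

# groupby().head(2) keeps the first two rows of each group, in original order.
head_ids = df.groupby('reference').head(2)['id'].tolist()

print(df1)
print(df2)
```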

Using position:

import numpy as np

# True for the last three rows of each 5-row block
mask = (np.arange(len(df)) % 5) > 1

# or with a range index
# mask = (df.index % 5) > 1

# then same as above using groupby or slicing
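For the positional variant, the comparison has to be > 1 so that the mask marks the last three rows of each 5-row block, matching the cumcount mask. This sketch checks that the two masks agree on the sample data and then splits by slicing; it assumes every block has exactly 5 rows and that the frame keeps its default RangeIndex.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample frame: three sentences, each repeated 5 times.
texts = ['the weather is nice', 'the house is beautiful', 'the day is long']
refs = ['weather', 'house', 'day']
df = pd.DataFrame({
    'text': [t for t in texts for _ in range(5)],
    'reference': [r for r in refs for _ in range(5)],
    'id': list(range(1, 16)),
})

# Position of each row inside its 5-row block: 0, 1, 2, 3, 4, 0, 1, ...
pos_mask = (np.arange(len(df)) % 5) > 1

# The group-based mask, converted to a NumPy array for comparison.
group_mask = df.groupby('reference').cumcount().gt(1).to_numpy()
assert (pos_mask == group_mask).all()

# Split by boolean slicing.
df1 = df[~pos_mask]
df2 = df[pos_mask]
print(df1['id'].tolist())
print(df2['id'].tolist())
```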

Answer:

Make a mask m:

import numpy as np

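# Repeat the pattern (first two rows True) once per 5-row block; assumes len(df) is a multiple of 5
m = np.tile([True, True, False, False, False], len(df) // 5)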

df1 = df[m]
df2 = df[~m]
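np.tile(..., len(df) // 5) yields a mask shorter than the frame whenever len(df) is not a multiple of 5, and boolean indexing then fails on the length mismatch. One way around this, sketched here on a hypothetical 17-row frame whose last block is incomplete, is np.resize, which repeats the pattern and truncates it to the requested length.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: three full 5-row blocks plus a partial block of 2 rows.
df = pd.DataFrame({'id': list(range(1, 18))})

pattern = np.array([True, True, False, False, False])

# np.tile only covers 3 * 5 = 15 rows here, which is too short for 17 rows.
tiled_len = len(np.tile(pattern, len(df) // 5))

# np.resize repeats the pattern and cuts it to exactly len(df) elements.
m = np.resize(pattern, len(df))

df1 = df[m]
df2 = df[~m]
print(df1['id'].tolist())
print(df2['id'].tolist())
```

The partial block still gets the "first two rows" treatment, since the repeated pattern starts with True, True.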
