Getting row/column names and integer-based indexes in a dataframe where a condition holds

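Given a dataframe df with a simple Index (not a MultiIndex), corresponding to a two-dimensional real matrix with row and column names, and a boolean expression e over the elements of df, I would like to get:

the names and integer-based indexes of the rows
the names and integer-based indexes of the columns

of all elements that satisfy the expression e. The expression e is nothing special: I am interested in the rows/columns of the elements that are greater than a threshold.

After reading the documentation and a good number of questions and answers here, I wrote the code given below. It contains two solutions:

One based on numpy. Basically, I extract the numbers from the dataframe and treat them as a numpy array. This solution seems reasonable: given the basic nature of the task, the code is simple enough.
One based on the methods provided by pandas. Even though pandas is designed for more complex scenarios than simple numeric matrices, this solution seems too complicated for what I want to accomplish.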

Setting up the data

import numpy as np
import pandas as pd

n_rows, n_cols, v = 4, 5, 3

rows = ["r" + str(i) for i in range(n_rows)]
columns = ["c" + str(i) for i in range(n_cols)]
values = np.zeros( (n_rows, n_cols), dtype=int)

ii = np.random.randint(n_rows, size=(2,))
jj = np.random.randint(n_cols, size=(2,))

poss = zip(ii, jj)
for pos in poss:
    print(f"target set at {pos} -> ({rows[pos[0]]}, {columns[pos[1]]})")
    values[pos] = v + 1

print(" === values ===")
print(values)

df = pd.DataFrame(values, index=rows, columns=columns)
print(" === df === ")
print(df)

With output:

target set at (2, 4) -> (r2, c4)
target set at (1, 0) -> (r1, c0)
 === values ===
[[0 0 0 0 0]
 [4 0 0 0 0]
 [0 0 0 0 4]
 [0 0 0 0 0]]
 === df === 
    c0  c1  c2  c3  c4
r0   0   0   0   0   0
r1   4   0   0   0   0
r2   0   0   0   0   4
r3   0   0   0   0   0

Solution with `numpy`

print("\n === USING NUMPY ===")
data = df.to_numpy()
indexes = np.argwhere(data > v)
for ind in indexes:
    print(f"(numpy) target found at {ind} -> ({rows[ind[0]]}, {columns[ind[1]]})")

With output:

 === USING NUMPY ===
(numpy) target found at [1 0] -> (r1, c0)
(numpy) target found at [2 4] -> (r2, c4)
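
As a possible variation on the numpy approach above, the labels can be looked up on the dataframe itself instead of on the separate rows and columns lists. The sketch below is only an illustration: it uses fixed data that mirrors the output shown above rather than random positions, and the names row_pos, col_pos, row_names and col_names are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data mirroring the example above (no randomness).
values = np.zeros((4, 5), dtype=int)
values[1, 0] = 4
values[2, 4] = 4
df = pd.DataFrame(values,
                  index=[f"r{i}" for i in range(4)],
                  columns=[f"c{j}" for j in range(5)])
v = 3

# np.nonzero returns one array of integer positions per axis,
# ordered row by row.
row_pos, col_pos = np.nonzero(df.to_numpy() > v)

# An Index can be indexed with an integer array, so the labels
# for all matches are fetched in one step.
row_names = df.index[row_pos]
col_names = df.columns[col_pos]

for i, j, r, c in zip(row_pos, col_pos, row_names, col_names):
    print(f"(numpy 2) target found at ({i}, {j}) -> ({r}, {c})")
```

Reading the labels from df.index and df.columns keeps the lookup tied to the dataframe, which may matter if the original lists are no longer around or the frame has been reordered.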

Solution with `pandas`

print("\n === WITH PANDAS ===")

# select the rows with at least one column satisfying the condition
cond = (df > v).any(axis=1)
df2 = df[cond]
print(df2, "\n")

# stack 
stacked = df2.stack()
print(stacked, "\n")

# filter (again!)
stacked2 = stacked.loc[stacked>v]
print("indexes in stacked:", stacked2.index.to_list(), "\n")

# get index (it is a MultiIndex at this point)
target_rows = [a for (a, _) in stacked2.index.to_list()]
target_cols = [b for (_, b) in stacked2.index.to_list()]

target_rows_idx = [df.index.get_loc(row_name) for row_name in target_rows]
target_cols_idx = [columns.index(col_name) for col_name in target_cols]

for pos in zip(target_rows_idx, target_cols_idx):
    print(f"(pandas) target found at {pos} -> ({rows[pos[0]]}, {columns[pos[1]]})")

With output:

 === WITH PANDAS ===
    c0  c1  c2  c3  c4
r1   4   0   0   0   0
r2   0   0   0   0   4 

r1  c0    4
    c1    0
    c2    0
    c3    0
    c4    0
r2  c0    0
    c1    0
    c2    0
    c3    0
    c4    4
dtype: int64 

indexes in stacked: [('r1', 'c0'), ('r2', 'c4')] 

(pandas) target found at (1, 0) -> (r1, c0)
(pandas) target found at (2, 4) -> (r2, c4)

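Is there a simpler way to write this code with pandas?

A uj5u.com user replied:

Since stack drops NaN values by default, we can mask out the values first and then stack (this avoids having to filter twice). Then simply grab the index and use get_loc on both the index and the columns to convert the labels to integer positions: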

stacked = df[df > v].stack()
label_idx = stacked.index.tolist()
integer_idx = [(df.index.get_loc(r), df.columns.get_loc(c))
               for r, c in label_idx]

for i, j in zip(integer_idx, label_idx):
    print(f'(pandas 2) target found at {i} -> {j}')

Output:

(pandas 2) target found at (0, 0) -> ('r0', 'c0')
(pandas 2) target found at (1, 4) -> ('r1', 'c4')

stacked:

r0  c0    4.0
r1  c4    4.0
dtype: float64

label_idx:

[('r0', 'c0'), ('r1', 'c4')]

integer_idx:

[(0, 0), (1, 4)]

For reproducibility, the random seed used was:

np.random.seed(22)
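
The mask-then-stack idea in this answer can also be written without a Python-level get_loc call per label. The following is a minimal sketch, assuming fixed data in place of the seeded random positions; the explicit dropna() is added on the assumption that the NaN-dropping behaviour of stack may differ between pandas versions, so it is worth checking against the installed release. The names row_labels, col_labels, row_pos and col_pos are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data (the answer above used seeded random positions).
values = np.zeros((4, 5), dtype=int)
values[1, 0] = 4
values[2, 4] = 4
df = pd.DataFrame(values,
                  index=[f"r{i}" for i in range(4)],
                  columns=[f"c{j}" for j in range(5)])
v = 3

# Mask, stack, then drop NaN explicitly so the result does not rely on
# stack() discarding missing values on its own.
stacked = df[df > v].stack().dropna()

# Split the (row, column) MultiIndex into its two levels of labels.
row_labels = stacked.index.get_level_values(0)
col_labels = stacked.index.get_level_values(1)

# get_indexer maps a whole array of labels to integer positions at once.
row_pos = df.index.get_indexer(row_labels)
col_pos = df.columns.get_indexer(col_labels)

label_idx = stacked.index.tolist()
integer_idx = list(zip(row_pos.tolist(), col_pos.tolist()))

for i, j in zip(integer_idx, label_idx):
    print(f"(pandas 3) target found at {i} -> {j}")
```

Note that get_indexer expects unique labels, which matches the simple, non-duplicated Index assumed in the question.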

A uj5u.com user replied:

I would use pd.Series.items() (called iteritems() before it was removed in pandas 2.0):

>>> [x for x, y in df.gt(3).stack().items() if y]
[('r1', 'c3'), ('r2', 'c3')]

For the integer indexes:

>>> [(df.index.get_loc(a), df.columns.get_loc(b)) for (a, b), y in df.gt(3).stack().items() if y]
[(1, 3), (2, 3)]

df in this case:

>>> df
    c0  c1  c2  c3  c4
r0   0   0   0   0   0
r1   0   0   0   4   0
r2   0   0   0   4   0
r3   0   0   0   0   0
>>>
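
The list comprehension in this answer filters the stacked boolean Series element by element. A possible alternative, sketched below, is to let the boolean Series select its own True entries. This is an illustration and not the answerer's code: the dataframe is rebuilt by hand to match the one printed above, and the names mask and hits are made up.

```python
import numpy as np
import pandas as pd

# Rebuild the dataframe printed above (values taken from that output).
values = np.zeros((4, 5), dtype=int)
values[1, 3] = 4
values[2, 3] = 4
df = pd.DataFrame(values,
                  index=[f"r{i}" for i in range(4)],
                  columns=[f"c{j}" for j in range(5)])
v = 3

# Stacking the boolean frame yields a boolean Series keyed by
# (row label, column label); it contains no NaN, so nothing is dropped.
mask = df.gt(v).stack()

# Boolean indexing of the Series with itself keeps only the True entries.
hits = mask[mask].index

label_idx = hits.tolist()
integer_idx = [(df.index.get_loc(r), df.columns.get_loc(c))
               for r, c in label_idx]

print(label_idx)
print(integer_idx)
```

Because the comparison happens before stacking, the values stay boolean and the integer data is never converted to float, unlike the masking approach in the first answer.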

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qiye/343402.html