使用Pandas在Python中進行分組列操作 -有解無憂

我有一個由.csv匯入的資料框架，其中包含n個試驗。試驗按列排列，有一個標題（波長1代表試驗1，波長2代表試驗2等）。我們正在跟蹤一個化學反應程序中溶液的吸收情況。你可以在鏈接中看到EXCEL檔案的SS。試驗被分成三組（g糖是IDV，吸光度（nm）是DV）。對于每個試驗：

我需要確定最大和最小值是什么。這當然可以用max()和min()來完成，但是當我們每0.25秒采樣一次時，資料可能會有噪音，這意味著我必須對其進行平滑處理。我已經建立了一個函式來做到這一點。我們也可能只是每隔一秒采樣一次，因為無論如何都會更平滑。

每組三個試驗需要繪制在同一張圖上，以便進行比較。N個試驗數量將創建N/3張圖。

我是有MATLAB的中級背景的人。這也不是我在那里能夠搞清楚的事情。

我做了什么？

到目前為止，我都做了些什么？

我曾試圖將每個試驗的標題做成一個串列，然后使用for回圈，使用df.column_name命令在資料中移動：

data = pd.read_csv('data.csv'/span>)
col_name = data.columns.values
print(col_name)
for i in col_name:
    print(data.col_name[i])

這段代碼一直作業到第4行，在那里回傳錯誤。AttributeError: 'DataFrame' object has no attribute 'col_name'. 在這里，我想用wavelength1試驗中的所有數值制作一個系列或集合（不管它在這里叫什么），以便繪制/操縱/等等。值得注意的是，我已經通過手動方式實作了多圖和多線的作業：但我想把它自動化，因為這就是編碼的意義所在。下面是 "手動 "版本的四張圖中的一張：

import pandas as pd import matplotlib.pyplot as plt #import matplotlib as matplotlib data = pd.read_csv('data.csv') plt.close("all") n_rows = 2 n_columns = 2 #initialize figure[/span]。 figure_size = (30,15) font_size = 13 #plot first three runs 'Trial 1') x=data.time2 y=data.wavelength2 plt1.plot(x,y,label='Trial 2') plt1.set_title('0.3g Glucose', fontweight="brought", size=font_size) x=data.time3 y=data.wavelength3 plt1.plot(x,y,label='Trial 3') plt1.set_ylabel('Wavelength (nm)', fontsize = font_size) plt1.set_xlabel('時間（s）', fontsize = font_size) plt1.legend(fontsize=font_size)

我的第一個想法是只做：

for i in range（0,num_col）。
    plot(time,data.wavelength(i))

但是這并不奏效。我相信這是很簡單的事情，但我不明白。

示例資料：

最終腳本

(包括在內，便于復制/粘貼。)

import pandas as pd
import matplotlib.pyplot as plt
import itertools


def grouper(n, iterable, fillvalue=None) 。
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx" 
    args = [iter（iterable）] * n
    return itertools.zip_longest(fillvalue=fillvalue, *args)


def get_trials(df, column_group_names) 。
    ""從資料框架中獲取列，去掉缺失的值。""。
    column_group = df[list（column_group_names）]
    column_group = column_group.dropna(how='all')
    return column_group


def subplot_axes_iterator(n_rows, n_columns) 。
    for i in range（n_rows）。
        for j in range（n_columns）。
            yield i, j



df = pd.read_csv("data.csv"/span>)

# 獲取最大和最小值。
print("ANALYSIS OF MIN AND MAX VALUES")
analysis_df = pd.DataFrame()
analysis_df["min"/span>] = df.min()
analysis_df["min_idx"] = df.idxmin()
analysis_df["max"] = df.max()
analysis_df["max_idx"] = df.idxmax()
print(analysis_df)


# Plot data[/span]。
n_rows = 2
n_columns =3
figure_size = (15, 10)
font_size = 13 3, df.columns)
axes_iterator = subplot_axes_iterator(n_rows, n_columns)
plot_names = [
    "Group 1",
    "Group 2"。
    "Group 3"。
    "第4組"。
    "第5組"。
]

for column_group_names, axes_position, plot_name in. 
        zip（col_iterator, axes_iterator, plot_names）。
    print(f "plotting {column_group_names} at {axes_position}")
    column_group = get_trials(df, column_group_names)
    column_group.plot(ax=axes[axes_position])
    axes[axes_position].set_title(plot_name, fontweight="bold"/span>, size=font_size)
    axes[axes_position].set_xlabel("時間（s）", fontsize=font_size)
    axes[axes_position].set_ylabel("Wavelength (nm)"/span>, fontsize=font_size)
plt.tight_layout()
plt.show()

轉載請註明出處，本文鏈接：https://www.uj5u.com/caozuo/310256.html

標籤：

上一篇：從React發送資料到Node后端

下一篇：利用三角法用java畫三角形