I have a set of lists (in Python) containing the following data:
['425842', '2008', 'Monday', '23:30:00', '10']
['425843', '2008', 'Tuesday', '23:30:00', '9']
['425844', '2009', 'Monday', '23:30:00', '2']
['425845', '2009', 'Monday', '23:30:00', '3']
['425846', '2010', 'Monday', '23:30:00', '2']
['425847', '2010', 'Monday', '23:30:00', '10']
['425848', '2010', 'Tuesday', '23:30:00', '10']
I want to compute the average of the values in the last column (the fifth column, index 4 when counting from zero), grouped by year, for example:
[2008, 9.5]
[2009, 2.5]
[2010, 7.3]
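I tried to do this with Python's built-in zip function, but zip returns an iterator. Can you help me solve this?
A helpful uj5u.com user replied:
Use pandas to group the data by year, then take the mean of the values in the fifth column.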
data = [
['425842', '2008', 'Monday', '23:30:00', '10'],
['425843', '2008', 'Tuesday', '23:30:00', '9'],
['425844', '2009', 'Monday', '23:30:00', '2'],
['425845', '2009', 'Monday', '23:30:00', '3'],
['425846', '2010', 'Monday', '23:30:00', '2'],
['425847', '2010', 'Monday', '23:30:00', '10'],
['425848', '2010', 'Tuesday', '23:30:00', '10'],
]
import pandas as pd
df = pd.DataFrame(data, columns=["id", "year", "day","time","value"])
df["value"] = pd.to_numeric(df["value"])
print(df.groupby("year")["value"].mean())
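A helpful uj5u.com user replied:
zip does not help here at all; you probably want to build a dictionary that collects the values for each year so that you can average them.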
data = [
['425842', '2008', 'Monday', '23:30:00', '10'],
['425843', '2008', 'Tuesday', '23:30:00', '9'],
['425844', '2009', 'Monday', '23:30:00', '2'],
['425845', '2009', 'Monday', '23:30:00', '3'],
['425846', '2010', 'Monday', '23:30:00', '2'],
['425847', '2010', 'Monday', '23:30:00', '10'],
['425848', '2010', 'Tuesday', '23:30:00', '10'],
]
year_totals = {year: [] for year in set(year for _, year, _, _, _ in data)}
for _, year, _, _, value in data:
    year_totals[year].append(int(value))
averages = {y: sum(t) / len(t) for y, t in year_totals.items()}
print(averages)  # e.g. {'2010': 7.333333333333333, '2008': 9.5, '2009': 2.5} (key order may vary)
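The same dictionary idea can also be sketched with collections.defaultdict, which avoids creating a key for every year up front. This is only an illustrative variant, not the answer's original code: the helper name average_by_year and the rounding to one decimal place are assumptions made here so the result matches the [year, average] shape shown in the question.

```python
from collections import defaultdict

data = [
    ['425842', '2008', 'Monday', '23:30:00', '10'],
    ['425843', '2008', 'Tuesday', '23:30:00', '9'],
    ['425844', '2009', 'Monday', '23:30:00', '2'],
    ['425845', '2009', 'Monday', '23:30:00', '3'],
    ['425846', '2010', 'Monday', '23:30:00', '2'],
    ['425847', '2010', 'Monday', '23:30:00', '10'],
    ['425848', '2010', 'Tuesday', '23:30:00', '10'],
]


def average_by_year(rows):
    """Return [[year, average], ...] sorted by year (hypothetical helper)."""
    values_by_year = defaultdict(list)
    for _, year, _, _, value in rows:
        # Missing years get an empty list automatically.
        values_by_year[int(year)].append(int(value))
    # Rounding to one decimal is an assumption, to mirror the question's example.
    return [[year, round(sum(vals) / len(vals), 1)]
            for year, vals in sorted(values_by_year.items())]


print(average_by_year(data))  # [[2008, 9.5], [2009, 2.5], [2010, 7.3]]
```

Sorting the items before building the result keeps the years in ascending order regardless of the order in which rows arrive.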
A helpful uj5u.com user replied:
This should work:
data = [['425842', '2008', 'Monday', '23:30:00', '10'],
['425843', '2008', 'Tuesday', '23:30:00', '9'],
['425844', '2009', 'Monday', '23:30:00', '2'],
['425845', '2009', 'Monday', '23:30:00', '3'],
['425846', '2010', 'Monday', '23:30:00', '2'],
['425847', '2010', 'Monday', '23:30:00', '10'],
['425848', '2010', 'Tuesday', '23:30:00', '10']]
sums = {}
for i in data:
    if i[1] not in sums:
        sums[i[1]] = [int(i[-1])]
    else:
        sums[i[1]].append(int(i[-1]))
sums = {i: sum(sums[i]) / len(sums[i]) for i in sums}
output = [[i, sums[i]] for i in sums]
Value of output:
[['2008', 9.5], ['2009', 2.5], ['2010', 7.333333333333333]]
A helpful uj5u.com user replied:
You can use itertools.groupby to group the list by year and compute the average of each group:
data = [['425842', '2008', 'Monday', '23:30:00', '10'],
['425843', '2008', 'Tuesday', '23:30:00', '9'],
['425844', '2009', 'Monday', '23:30:00', '2'],
['425845', '2009', 'Monday', '23:30:00', '3'],
['425846', '2010', 'Monday', '23:30:00', '2'],
['425847', '2010', 'Monday', '23:30:00', '10'],
['425848', '2010', 'Tuesday', '23:30:00', '10']]
import itertools

# Group consecutive rows by year and collect the last column as integers.
groups = {int(key): [int(row[4]) for row in rows]
          for key, rows in itertools.groupby(data, key=lambda row: row[1])}
averages = {key: sum(values) / len(values) for key, values in groups.items()}
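One caveat: itertools.groupby only groups consecutive items that share a key, so it relies on the rows already being ordered by year, as they are in this data. Below is a minimal sketch for input that may not be ordered, assuming it is acceptable to sort a copy first. The function name averages_sorted_groupby and the shuffled sample rows are made up for illustration.

```python
import itertools
from operator import itemgetter

# Same rows as in the question, deliberately out of year order.
data = [
    ['425846', '2010', 'Monday', '23:30:00', '2'],
    ['425842', '2008', 'Monday', '23:30:00', '10'],
    ['425844', '2009', 'Monday', '23:30:00', '2'],
    ['425847', '2010', 'Monday', '23:30:00', '10'],
    ['425843', '2008', 'Tuesday', '23:30:00', '9'],
    ['425845', '2009', 'Monday', '23:30:00', '3'],
    ['425848', '2010', 'Tuesday', '23:30:00', '10'],
]


def averages_sorted_groupby(rows):
    """Average the last column per year, sorting first (hypothetical helper)."""
    year_of = itemgetter(1)
    # Sort by the same key used for grouping so equal years become adjacent.
    ordered = sorted(rows, key=year_of)
    result = {}
    for year, group in itertools.groupby(ordered, key=year_of):
        values = [int(row[4]) for row in group]
        result[int(year)] = sum(values) / len(values)
    return result


print(averages_sorted_groupby(data))
# {2008: 9.5, 2009: 2.5, 2010: 7.333333333333333}
```

Without the sorted call, this shuffled input would yield several separate groups for the same year, and later groups would overwrite earlier ones in the dictionary.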
Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/445822.html
