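I have a DataFrame containing dates and the cluster each date belongs to (the clustering was done earlier, based on the temperatures collected each day). I want to plot this data sequentially, like a stacked bar chart, with the color of each element determined by its assigned cluster. Here is my table (I have data for up to 100 days):
| Date | order | ClusterNo2 | constant |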
|---|---|---|---|
| 2020-08-07 | 1 | 3.0 | 1 |
| 2020-08-08 | 2 | 0.0 | 1 |
| 2020-08-09 | 3 | 1.0 | 1 |
| 2020-08-10 | 4 | 3.0 | 1 |
| 2020-08-11 | 5 | 1.0 | 1 |
| 2020-08-12 | 6 | 1.0 | 1 |
| 2020-08-13 | 7 | 3.0 | 1 |
| 2020-08-14 | 8 | 2.0 | 1 |
| 2020-08-15 | 9 | 2.0 | 1 |
| 2020-08-16 | 10 | 2.0 | 1 |
| 2020-08-17 | 11 | 2.0 | 1 |
| 2020-08-18 | 12 | 1.0 | 1 |
| 2020-08-19 | 13 | 1.0 | 1 |
| 2020-08-20 | 14 | 0.0 | 1 |
| 2020-08-21 | 15 | 0.0 | 1 |
| 2020-08-22 | 16 | 1.0 | 1 |
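Note: I cannot simply group the data by cluster, because the plot must stay sequential. I thought about writing code that walks through the sequence and counts the number of elements in each run of the same cluster, but I would still face the same plotting problem. Does anyone know how to solve this?
The expected result should look something like this: the number inside each bar is the cluster, the x-axis is time in days, and the width of each bar is the number of consecutive days observed with the same cluster.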

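Answer from a uj5u.com user:
You can use the dates as the x-axis, the 'constant' column as the y-axis, and the cluster id for the coloring.
You can create a custom legend from a list of colored rectangles.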
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import pandas as pd
import numpy as np
N = 100
df = pd.DataFrame({'Date': pd.date_range('2020-08-07', periods=N, freq='D'),
                   'order': np.arange(1, N + 1),
                   'ClusterNo2': np.random.randint(0, 4, N).astype(float),
                   'constant': 1})
df['ClusterNo2'] = df['ClusterNo2'].astype(int) # convert to integers
fig, ax = plt.subplots(figsize=(15, 3))
num_clusters = df['ClusterNo2'].max() + 1
colors = plt.cm.Set2.colors
ax.bar(x=range(len(df)), height=df['constant'], width=1, color=[colors[i] for i in df['ClusterNo2']], edgecolor='none')
ax.set_xticks(range(len(df)))
labels = ['' if i % 3 != 0 else day.strftime('%d\n%b %Y') if i == 0 or day.day <= 3 else day.strftime('%d')
          for i, day in enumerate(df['Date'])]
ax.set_xticklabels(labels)
ax.margins(x=0, y=0)
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
legend_handles = [plt.Rectangle((0, 0), 0, 0, color=colors[i], label=f'{i}') for i in range(num_clusters)]
ax.legend(handles=legend_handles, title='Clusters', bbox_to_anchor=(1.01, 1.01), loc='upper left')
fig.tight_layout()
plt.show()
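
The custom legend described above can also be built from `matplotlib.patches.Patch` proxy artists instead of zero-size rectangles. The following is only a minimal sketch under stated assumptions: the cluster values are the 16 sample rows from the question, the clusters are assumed to be labelled 0 to 3, and the output file name `legend_demo.png` is a placeholder chosen for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# ClusterNo2 values of the 16 sample rows in the question, in date order.
clusters = [3, 0, 1, 3, 1, 1, 3, 2, 2, 2, 2, 1, 1, 0, 0, 1]
num_clusters = 4  # assumption: clusters are labelled 0..3
colors = plt.cm.Set2.colors  # same qualitative palette as in the answer

fig, ax = plt.subplots(figsize=(8, 2))
# One bar per day, width 1, so neighbouring bars touch.
ax.bar(range(len(clusters)), 1, width=1, color=[colors[c] for c in clusters])
ax.margins(x=0, y=0)

# Proxy artists: they are not drawn on the axes, they only populate the legend.
legend_handles = [Patch(facecolor=colors[i], label=str(i)) for i in range(num_clusters)]
legend = ax.legend(handles=legend_handles, title='Clusters',
                   bbox_to_anchor=(1.01, 1.01), loc='upper left')

fig.tight_layout()
fig.savefig('legend_demo.png')  # placeholder file name; plt.show() works as well
```

Because the handles are independent of the plotted bars, the legend lists every cluster exactly once, however many bars share a color.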

Answer from a uj5u.com user:
You can draw an ordinary bar chart, with one bar per day. If you also set the width to 1, the patches appear contiguous.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
# simulate data
total_datapoints = 16
total_clusters = 4
order = np.arange(total_datapoints)
clusters = np.random.randint(0, total_clusters, size=total_datapoints)
# map clusters to colors
cmap = plt.cm.tab10
bounds = np.arange(total_clusters + 1)
norm = BoundaryNorm(bounds, cmap.N)
colors = [cmap(norm(cluster)) for cluster in clusters]
# plot
fig, ax = plt.subplots()
ax.bar(order, np.ones_like(order), width=1, color=colors, align='edge')
# xticks
change_points = np.where(np.diff(clusters) != 0)[0] + 1
change_points = np.unique([0] + change_points.tolist() + [total_datapoints])
ax.set_xticks(change_points)
# annotate clusters
for ii, dx in enumerate(np.diff(change_points)):
    xx = change_points[ii] + dx/2
    ax.text(xx, 0.5, str(clusters[int(xx)]), ha='center', va='center')
ax.set_xlabel('Time (days)')
plt.show()
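
The same idea can be applied directly to a DataFrame shaped like the one in the question, drawing one bar per run of consecutive days so that the bar width equals the run length. This is a rough sketch rather than a definitive implementation: the rows are the 16 sample rows from the question, and the names `run_id`, `pos`, `runs` and the file name `cluster_runs.png` are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame({
    'Date': pd.date_range('2020-08-07', periods=16, freq='D'),
    'ClusterNo2': [3, 0, 1, 3, 1, 1, 3, 2, 2, 2, 2, 1, 1, 0, 0, 1],
})

# A new run starts wherever the cluster differs from the previous row;
# the cumulative sum of those change flags gives every run its own id.
run_id = (df['ClusterNo2'] != df['ClusterNo2'].shift()).cumsum().rename('run')

# One row per run: its cluster, the row position of its first day, its length in days.
runs = (df.assign(pos=range(len(df)))
          .groupby(run_id)
          .agg(cluster=('ClusterNo2', 'first'),
               start=('pos', 'first'),
               length=('pos', 'count'))
          .reset_index(drop=True))

colors = plt.cm.Set2.colors
fig, ax = plt.subplots(figsize=(10, 2))
# One bar per run, left edge at the run's first day, width equal to the run length.
ax.bar(x=runs['start'].tolist(), height=1, width=runs['length'].tolist(), align='edge',
       color=[colors[c] for c in runs['cluster']], edgecolor='white')

# Write the cluster number in the middle of each bar.
for cluster, start, length in runs.itertuples(index=False):
    ax.text(start + length / 2, 0.5, str(cluster), ha='center', va='center')

# Label the start of each run with its date.
starts = runs['start'].tolist()
ax.set_xticks(starts)
ax.set_xticklabels(df['Date'].iloc[starts].dt.strftime('%d %b'), rotation=90)
ax.set_yticks([])
ax.margins(x=0, y=0)
ax.set_xlabel('Time (days)')
fig.tight_layout()
fig.savefig('cluster_runs.png')  # placeholder file name; plt.show() works as well
```

For the sample rows this yields ten bars; the widest one covers the four consecutive days assigned to cluster 2.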
