Creating a DataFrame from a list of nested tuples with Pandas (Python)

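I'm struggling with a simple task. After reading a lot of StackOverflow posts and searching the Pandas documentation, I decided to ask for help here.

Problem

I have a list of nested tuples, as shown below: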

(Image: terminal log of the list of nested tuples)

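I need to create a Pandas DataFrame with one column for each element of the inner tuples. When I remove the last two elements of each inner tuple (e.g. ((58, '2022-04-28', 85.0199966430664, 'BUY'), (67, '2022-05-11', 77.54000091552734, 'STOP BUY'))) I get the expected result: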

(Image: DataFrame with the expected result)

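So far, so good.

But note that I'm dealing with NaN values in the last two columns, and this is where it gets tricky. When I add two more values to each inner tuple (e.g. ((58, '2022-04-28', 85.0199966430664, 'BUY', 8501.99966430664, 100), (67, '2022-05-11', 77.54000091552734, 'STOP BUY', -747.9995727539062, -0.08797925220982794))) I get a DataFrame in which every NaN value in the last two columns is filled with the new values I added, as shown in the image below:

(Image: DataFrame with the unexpected result)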


How can I create one column for each item of the inner tuples?

My code looks like this:

from itertools import chain

import pandas as pd

# Simply converting an existing dictionary into a DataFrame
final_report_df = pd.DataFrame.from_dict(final_report, orient="index")
# I'm using chain only to reduce the level of nested lists I had previously
prepare_data_to_df = list(chain.from_iterable(all_orders))
df_all_orders = pd.DataFrame(prepare_data_to_df, columns=["Id", "Date", "Price", "Label", "Profit/Loss ($)", "Profit/Loss (%)"])
df_all_orders.drop("Id", axis=1, inplace=True)
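
For reference, the behavior described above can be reproduced with a minimal sketch. The `all_orders` list here is an assumption built from the two sample tuples quoted in the question; the real list presumably holds many such pairs. The name `df_flat` is also just illustrative.

```python
from itertools import chain

import pandas as pd

# Assumed sample: a single (order, stop order) pair taken from the question.
all_orders = [
    (
        (58, "2022-04-28", 85.0199966430664, "BUY", 8501.99966430664, 100),
        (67, "2022-05-11", 77.54000091552734, "STOP BUY",
         -747.9995727539062, -0.08797925220982794),
    ),
]

# Flattening puts both tuples of a pair into the same six positional columns.
rows = list(chain.from_iterable(all_orders))
columns = ["Id", "Date", "Price", "Label", "Profit/Loss ($)", "Profit/Loss (%)"]
df_flat = pd.DataFrame(rows, columns=columns)

# Every row supplies six values, so the last two columns contain no NaN,
# even though the first row's values are really the amount invested and
# the number of shares rather than a profit or loss.
print(df_flat)
print(df_flat.isna().sum().sum())
```

Because the columns are assigned purely by position, Pandas has no way to know that the fifth and sixth values mean different things in the two tuples.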

Given tuples like this:

((58, '2022-04-28', 85.0199966430664, 'BUY', 8501.99966430664, 100), (67, '2022-05-11', 77.54000091552734

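As the expected result, I'd like 7 columns:

| Date | Price | Label | Profit/Loss ($) | Profit/Loss (%) | Amount Invested | Stock Shares |

Date, Price, and Label would be filled in for both tuples, while Profit/Loss ($) and Profit/Loss (%) would be filled in only in the row for the second tuple. Finally, Stock Shares would be filled with the last value of the first tuple, and Amount Invested with the second-to-last value of the first tuple.

I hope my explanation isn't confusing...

Thanks in advance.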

A uj5u.com user replied:

As I understand it, you have the following structure:

[
  (
    (A1, B1, C1, D1, Y, Z),
    (A2, B2, C2, D2, W, X)
  ), ...
]

You are trying to convert it into a DataFrame with the following structure:

A   B   C   D    W    X    Y    Z
----------------------------------
A1  B1  C1  D1  NaN  NaN   Y    Z
A2  B2  C2  D2   W    X   NaN  NaN

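I'm sure there are several ways to solve this. My preference is to create two separate DataFrames, one for the first tuple of each pair and one for the second, and then do an outer merge.

The following worked when I tried it with your sample data: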

import pandas as pd

# Create dictionaries from the first and second tuples, respectively
orders = {i: all_orders[i][0] for i in range(len(all_orders))}
stop_orders = {i: all_orders[i][1] for i in range(len(all_orders))}

# Convert dictionaries into DFs and give appropriate column names
orders_df = pd.DataFrame.from_dict(orders, orient="index")
orders_df.columns = ["ID", "Date", "Price", "Label", "Amount Invested", "Stock Shares"]
stop_orders_df = pd.DataFrame.from_dict(stop_orders, orient="index")
stop_orders_df.columns = ["ID", "Date", "Price", "Label", "Profit/Loss ($)", "Profit/Loss (%)"]

# Execute an outer merge so all columns are retained and columns that are in one DF but not in the other are filled with NA
all_orders_df = pd.merge(orders_df, stop_orders_df, how="outer")

Hope that helps! If you have a large amount of data there may be more efficient approaches, but the one above should get the job done.
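
One possible single-pass alternative, offered as a sketch rather than as the answerer's method: turn each inner tuple into a dict keyed by the column names that apply to its position in the pair, and let `pd.DataFrame` fill the missing keys with NaN. The `all_orders` sample and the names `ORDER_COLUMNS`, `STOP_COLUMNS`, and `to_records` are assumptions made for illustration, and the sketch assumes every element of `all_orders` is exactly one (order, stop order) pair.

```python
import pandas as pd

# Assumed sample: a single (order, stop order) pair taken from the question.
all_orders = [
    (
        (58, "2022-04-28", 85.0199966430664, "BUY", 8501.99966430664, 100),
        (67, "2022-05-11", 77.54000091552734, "STOP BUY",
         -747.9995727539062, -0.08797925220982794),
    ),
]

# Hypothetical column layouts: the last two names differ by position in the pair.
ORDER_COLUMNS = ["ID", "Date", "Price", "Label", "Amount Invested", "Stock Shares"]
STOP_COLUMNS = ["ID", "Date", "Price", "Label", "Profit/Loss ($)", "Profit/Loss (%)"]


def to_records(pairs):
    """Yield one dict per inner tuple, keyed by the columns for its position."""
    for order, stop_order in pairs:
        yield dict(zip(ORDER_COLUMNS, order))
        yield dict(zip(STOP_COLUMNS, stop_order))


# Keys that are missing from a record become NaN in the resulting DataFrame.
all_orders_df = pd.DataFrame(list(to_records(all_orders))).drop(columns="ID")
print(all_orders_df)
```

With this layout the rows stay in their original order, and dropping `ID` leaves the seven columns the question asks for, although in a different column order than the one listed there.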

