Is there a way to convert a list of dictionaries into separate columns of a DataFrame?

FYI, I can't use PySpark!

My data looks like this:

0   { "CountryOfManufacture": "China", "Tags": ["U...
1   { "CountryOfManufacture": "China", "Tags": ["U...
2   { "CountryOfManufacture": "China", "Tags": [] }
3   { "CountryOfManufacture": "Japan", "Tags": ["3...
4   { "CountryOfManufacture": "Japan", "Tags": ["1...
... ...
222 { "CountryOfManufacture": "USA", "ShelfLife": ...
223 { "CountryOfManufacture": "USA", "ShelfLife": ...
224 { "CountryOfManufacture": "USA", "ShelfLife": ...
225 { "CountryOfManufacture": "USA", "ShelfLife": ...
226 { "CountryOfManufacture": "USA", "ShelfLife": .

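So the dictionaries don't all contain the same fields. I'm only interested in the first one (CountryOfManufacture), and I'd like to split it out and then add it to another DataFrame.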

Thanks!

Answer:

If all your dictionaries have the same keys (or even if they don't! See Pranav's comment below!), then pandas.DataFrame.from_records will work well.

import pandas as pd

data = [{'CountryOfManufacture': 'China', 'col_2': 'a'},
        {'CountryOfManufacture': 'Japan', 'col_2': 'b'},
        {'CountryOfManufacture': 'China', 'col_2': 'c'},
        {'CountryOfManufacture': 'USA', 'col_2': 'd'}]

df = pd.DataFrame.from_records(data)
print(df.head())

#   CountryOfManufacture col_2
# 0                China     a
# 1                Japan     b
# 2                China     c
# 3                  USA     d
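Pranav's comment is not reproduced here, but the "even if they don't" case is easy to check: when the records have different keys, from_records builds the union of all keys and fills the gaps with NaN. A minimal sketch with records shaped like the question's data (the ShelfLife value is a made-up placeholder, since the real one is truncated in the question):

```python
import pandas as pd

# Records with different keys, like the question's data: some rows have
# "Tags", the USA row has "ShelfLife" instead.
data = [
    {"CountryOfManufacture": "China", "Tags": ["USB Powered"]},
    {"CountryOfManufacture": "Japan", "Tags": ["32GB", "USB Powered"]},
    # "placeholder" is invented; the real ShelfLife value is elided.
    {"CountryOfManufacture": "USA", "ShelfLife": "placeholder"},
]

# from_records takes the union of all keys; a key missing from a row
# becomes NaN in that row.
df = pd.DataFrame.from_records(data)
print(df)
```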

如果您只需要一列，您可以在, 之后選擇該列df["CountryOfManufacture"]，或者使用exclude關鍵字并提供您不需要的所有列的串列df = pd.DataFrame.from_records(data, exclude=['col_2'])

Follow-up from the asker:

When I try from_records, my result looks like this:

                                        CustomFields
0  { "CountryOfManufacture": "China", "Tags": ["U...
1  { "CountryOfManufacture": "China", "Tags": ["U...
2    { "CountryOfManufacture": "China", "Tags": [] }
3  { "CountryOfManufacture": "Japan", "Tags": ["3...
4  { "CountryOfManufacture": "Japan", "Tags": ["1...

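I think this is because my data is in an unusual format. It originally came from a CSV file, where this is one of the columns. All the other columns have int/float/object dtypes, while this column already looks like dictionaries when you view it in Excel.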
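One way to handle this at load time, offered as a sketch rather than something posted in the thread: a CSV cell is always read as text, so pass json.loads to read_csv through its converters argument and each cell is parsed into a dict as it is read. The inline CSV text and the Id column are hypothetical stand-ins for the real file; the column name CustomFields comes from the output above, and the sketch assumes every cell holds valid JSON (an empty cell would make json.loads raise).

```python
import io
import json

import pandas as pd

# Stand-in for the real CSV file; "Id" is a hypothetical extra column.
# Quotes inside a quoted CSV field are escaped by doubling them.
csv_text = '''Id,CustomFields
1,"{ ""CountryOfManufacture"": ""China"", ""Tags"": [""USB Powered""] }"
2,"{ ""CountryOfManufacture"": ""China"", ""Tags"": [] }"
3,"{ ""CountryOfManufacture"": ""Japan"", ""Tags"": [""32GB"",""USB Powered""] }"
'''

# json.loads runs on each raw cell, so the column holds dicts, not strings.
df = pd.read_csv(io.StringIO(csv_text), converters={"CustomFields": json.loads})

# Pull out only the key of interest; dict.get returns None if a row lacks it.
df["CountryOfManufacture"] = df["CustomFields"].map(
    lambda d: d.get("CountryOfManufacture")
)
print(df[["Id", "CountryOfManufacture"]])
```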

The data in your example is formatted the way I expected, but this is what mine looks like when I convert it to a list:

['{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }', '{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }', '{ "CountryOfManufacture": "China", "Tags": [] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }', '{ "CountryOfManufacture": "Japan", "Tags": ["16GB","USB Powered"] }', '{ "CountryOfManufacture": "China", "Tags": ["Comedy"] }', ...

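As you can see, each dictionary is wrapped in an extra pair of quotes, i.e. each element is a string rather than a dict. For example, the first element is '{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }'.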

Is there a way to fix this without PySpark?

Thanks!
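A minimal sketch of one fix that needs only the standard library and pandas (not something posted in the thread): since each element is a JSON string, parse it with json.loads first, then hand the resulting dicts to from_records. The Series below is a hypothetical stand-in for the CustomFields column, built from three entries of the list above; it assumes every entry is valid JSON.

```python
import json

import pandas as pd

# Stand-in for the CustomFields column: each element is a JSON *string*.
custom_fields = pd.Series([
    '{ "CountryOfManufacture": "China", "Tags": ["USB Powered"] }',
    '{ "CountryOfManufacture": "China", "Tags": [] }',
    '{ "CountryOfManufacture": "Japan", "Tags": ["32GB","USB Powered"] }',
], name="CustomFields")

# json.loads turns each string into a real dict.
dicts = custom_fields.map(json.loads)

# from_records now sees proper records, one column per key.
expanded = pd.DataFrame.from_records(dicts.tolist())

# Keep only the column of interest.
country = expanded["CountryOfManufacture"]
print(country.tolist())
```

If the strings were Python dict reprs (single-quoted) rather than JSON, ast.literal_eval would play the same role as json.loads.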
