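I have a table with a dataset like the one below. It has 10 columns and 30+ rows. The order of the records inside the "names" dict cell matters. Each title (grantor/grantee) can appear more than once.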
import pandas as pd

data = [
    ['4/18/2005', [{'grantor': 'Company1'}, {'grantee': 'Company2'}]],
    ['3/29/2005', [{'grantor': 'Company3'}, {'grantor': 'Company1'}, {'grantor': 'Company4'}, {'grantee': 'Company5'}, {'grantee': 'Company2'}]],
    ['3/29/2005', [{'grantor': 'Company2'}, {'grantor': 'Company9'}, {'grantor': 'Apple'}, {'grantee': 'CompnayX'}, {'grantee': 'CompanyY'}, {'grantee': 'CompanyR'}]]
]
df = pd.DataFrame(data=data, columns=['fdate', 'names'])
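I want to do the following two tasks:
- Process a single row. I want to read the names cell of a selected row and convert it into something like this (row 1):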
data = [
[{'Title': 'grantor', 'Company': 'Company1'}],
[{'Title': 'grantee', 'Company': 'Company2'}]
]
df = pd.DataFrame(data)
- Expand the entire dataset:
data = [
[{'fdate': '4/18/2005', 'pos':'1', 'Title': 'grantor', 'Company': 'Company1'}],
[{'fdate': '4/18/2005', 'pos':'2', 'Title': 'grantee', 'Company': 'Company2'}],
[{'fdate': '3/29/2005', 'pos':'1', 'Title': 'grantor', 'Company': 'Company3'}],
[{'fdate': '3/29/2005', 'pos':'2', 'Title': 'grantor', 'Company': 'Company1'}],
[{'fdate': '3/29/2005', 'pos':'3', 'Title': 'grantor', 'Company': 'Company4'}],
[{'fdate': '3/29/2005', 'pos':'4', 'Title': 'grantee', 'Company': 'Company5'}],
[{'fdate': '3/29/2005', 'pos':'5', 'Title': 'grantee', 'Company': 'Company2'}]
]
df = pd.DataFrame(data)
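As a hedged alternative to looping in pure Python, both tasks can probably be handled with pandas built-ins: `DataFrame.explode` turns each list element into its own row (keeping list order), and `groupby(...).cumcount()` numbers the entries within each source row. This is a sketch built on the `df` from the question; `names_to_frame` and `expand_names` are hypothetical helper names, and `pos` comes out as an integer rather than the string `'1'`, `'2'`, ... shown above. It assumes every dict in `names` holds exactly one title/company pair.

```python
import pandas as pd

data = [
    ['4/18/2005', [{'grantor': 'Company1'}, {'grantee': 'Company2'}]],
    ['3/29/2005', [{'grantor': 'Company3'}, {'grantor': 'Company1'}, {'grantor': 'Company4'},
                   {'grantee': 'Company5'}, {'grantee': 'Company2'}]],
    ['3/29/2005', [{'grantor': 'Company2'}, {'grantor': 'Company9'}, {'grantor': 'Apple'},
                   {'grantee': 'CompnayX'}, {'grantee': 'CompanyY'}, {'grantee': 'CompanyR'}]],
]
df = pd.DataFrame(data=data, columns=['fdate', 'names'])


def names_to_frame(df, row_label):
    """Task 1 (hypothetical helper): flatten one row's 'names' cell into Title/Company rows."""
    records = [
        {'Title': title, 'Company': company}
        for entry in df.at[row_label, 'names']  # list of single-key dicts, in order
        for title, company in entry.items()
    ]
    return pd.DataFrame(records, columns=['Title', 'Company'])


def expand_names(df):
    """Task 2 (hypothetical helper): one output row per entry of every 'names' list."""
    # explode() emits one row per list element, in list order; after
    # rename_axis/reset_index the 'row' column records the source row.
    long = df.explode('names').rename_axis('row').reset_index()
    # 1-based position of each entry within its source row.
    long['pos'] = long.groupby('row').cumcount() + 1
    # Each dict holds exactly one {title: company} pair (assumption).
    long['Title'] = long['names'].map(lambda d: next(iter(d)))
    long['Company'] = long['names'].map(lambda d: next(iter(d.values())))
    return long[['fdate', 'pos', 'Title', 'Company']]


print(names_to_frame(df, 0))
print(expand_names(df))
```

Unlike the expected output above, `expand_names` also expands the third row; slice `df` first (for example `df.iloc[:2]`) if only some rows are wanted.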
Answer:
Here is my attempt at a solution.
I have marked the part that completes Task 1 with a comment.
import pprint
import pandas as pd

# Task 1
def process_row(datum, i):
    """Return row i's 'names' entries as [[{'Title': ..., 'Company': ...}], ...]."""
    ret = []
    new_dict = {}
    information = datum[i][1]
    for info in information:
        for (key, value) in info.items():
            new_dict['Title'] = key
            new_dict['Company'] = value
            ret.append([new_dict.copy()])
            new_dict.clear()
    return ret

# Need to create two dicts for Task 2
def process_date(datum, i):
    return {'fdate': datum[i][0]}
pd.set_option('display.max_colwidth', None)

data = [
    ['4/18/2005', [{'grantor': 'Company1'}, {'grantee': 'Company2'}]],
    ['3/29/2005', [{'grantor': 'Company3'}, {'grantor': 'Company1'}, {'grantor': 'Company4'}, {'grantee': 'Company5'}, {'grantee': 'Company2'}]],
    ['3/29/2005', [{'grantor': 'Company2'}, {'grantor': 'Company9'}, {'grantor': 'Apple'}, {'grantee': 'CompnayX'}, {'grantee': 'CompanyY'}, {'grantee': 'CompanyR'}]]
]

# This line prints the result for Task 1.
# You should iterate if you need the function to apply to other rows.
print(process_row(data, 0))
# print(process_date(data, 0))
# Task 2: merge each entry with its row's date and a 1-based position
new_data = []
for i in range(len(data)):
    temp = process_row(data, i)
    for index, element in enumerate(temp):
        new_dict = {}
        new_dict.update(process_date(data, i))
        new_dict.update({'pos': index + 1})
        for kv in element:
            new_dict.update(kv)
        new_data.append([new_dict])

pp = pprint.PrettyPrinter(indent=2)
pp.pprint(new_data)
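Here is a screenshot of the output.
The first line is the output of Task 1. The rest is the output of Task 2; note that it also includes the third element of the original data.
Output: https://imgur.com/a/M6TvO70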
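The loop above wraps each record in a one-element list (`[new_dict]`), so passing `new_data` to `pd.DataFrame` would give a single column of dicts rather than the Task 2 table. A minimal pure-Python sketch, assuming each dict in `names` holds exactly one title/company pair, builds flat dicts directly and lets `enumerate(..., start=1)` supply `pos` (as an integer):

```python
import pandas as pd

data = [
    ['4/18/2005', [{'grantor': 'Company1'}, {'grantee': 'Company2'}]],
    ['3/29/2005', [{'grantor': 'Company3'}, {'grantor': 'Company1'}, {'grantor': 'Company4'},
                   {'grantee': 'Company5'}, {'grantee': 'Company2'}]],
    ['3/29/2005', [{'grantor': 'Company2'}, {'grantor': 'Company9'}, {'grantor': 'Apple'},
                   {'grantee': 'CompnayX'}, {'grantee': 'CompanyY'}, {'grantee': 'CompanyR'}]],
]

# One flat record per (row, entry); enumerate(start=1) gives the 1-based position
# of the entry inside its row's 'names' list, preserving the original order.
records = [
    {'fdate': fdate, 'pos': pos, 'Title': title, 'Company': company}
    for fdate, names in data
    for pos, entry in enumerate(names, start=1)
    for title, company in entry.items()
]

# Columns follow dict insertion order: fdate, pos, Title, Company.
result = pd.DataFrame(records)
print(result)
```

Because the records are already flat, no unwrapping step is needed before building the DataFrame.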
