I'm trying to de-nest the following JSON in Python to create a CSV table. Can anyone help?
Input JSON:
{
"paging": { "start": 0, "count": 10, "links": [] },
"elements": [
{
"followerGains": {
"organicFollowerGain": 2,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634169600000, "end": 1634256000000 }
},
{
"followerGains": {
"organicFollowerGain": -1,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634256000000, "end": 1634342400000 }
},
{
"followerGains": {
"organicFollowerGain": -2,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634342400000, "end": 1634428800000 }
},
{
"followerGains": {
"organicFollowerGain": 0,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634428800000, "end": 1634515200000 }
}
]
}
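I tried the code below, but it flattens everything into a single row. I read in another thread that json_normalize() would build the data into columns. Can someone show me how to handle this case?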
The code I'm using:
import json
import pandas as pd
# json_normalize is a top-level pandas function since 1.0; the pandas.io.json import path is deprecated
from pandas import json_normalize

# Load the raw JSON export
with open('C:/Users/Muj/Downloads/Linkedin data/follower_statistics_per_day.json') as f:
    data = json.load(f)

def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

fd = flatten_json(data)
flat_data = json_normalize(fd)
flat_data.to_csv('C:/Users/Muj/Downloads/Linkedin data/test1.csv', index=False)
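Why this produces a single row: flatten_json walks the entire document, so every item of the elements list turns into extra keys of one flat dict, and json_normalize then makes that one dict into one row. The sketch below reproduces this on a trimmed two-element sample taken from the input above (timeRange omitted for brevity). It swaps the type() checks and manual counter for isinstance and enumerate, a stylistic change that keeps the same behaviour.

```python
def flatten_json(y):
    """Recursively flatten nested dicts/lists into ONE flat dict."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: store it under the accumulated key, minus the trailing '_'
            out[name[:-1]] = x

    flatten(y)
    return out


# Trimmed sample: two elements from the question, timeRange left out for brevity
sample = {
    "paging": {"start": 0, "count": 10, "links": []},
    "elements": [
        {"followerGains": {"organicFollowerGain": 2, "paidFollowerGain": 0},
         "organizationalEntity": "urn:li:organization:28849398"},
        {"followerGains": {"organicFollowerGain": -1, "paidFollowerGain": 0},
         "organizationalEntity": "urn:li:organization:28849398"},
    ],
}

flat = flatten_json(sample)

# Each element becomes a group of *columns* (elements_0_..., elements_1_...),
# not a new row; the empty "links" list produces no key at all.
for key, value in flat.items():
    print(key, value)
```

Because the elements are spread across keys such as elements_0_... and elements_1_..., no amount of post-processing on this dict gives one row per day; the list has to be handed to json_normalize as a list.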
Can anyone help?
The desired output is:
| organicFollowerGain | paidFollowerGain | organizationalEntity |
|---|---|---|
| 2 | 0 | urn:li:organization:28849398 |
| -1 | 0 | urn:li:organization:28849398 |
Answer:
You don't need your flatten_json function. Just pass the elements section directly to json_normalize:
flat_data = json_normalize(data['elements'])
That returns:
organizationalEntity,followerGains.organicFollowerGain,followerGains.paidFollowerGain,timeRange.start,timeRange.end
urn:li:organization:28849398,2,0,1634169600000,1634256000000
urn:li:organization:28849398,-1,0,1634256000000,1634342400000
urn:li:organization:28849398,-2,0,1634342400000,1634428800000
urn:li:organization:28849398,0,0,1634428800000,1634515200000
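The key difference is that json_normalize now receives a list of records, so it emits one row per element. As a sketch (assuming pandas 1.0 or later, where the function is available as pd.json_normalize), the same frame can also be obtained by passing the whole document and naming the list through the record_path parameter:

```python
import pandas as pd

# Two elements from the question's input (trimmed for brevity)
data = {
    "paging": {"start": 0, "count": 10, "links": []},
    "elements": [
        {"followerGains": {"organicFollowerGain": 2, "paidFollowerGain": 0},
         "organizationalEntity": "urn:li:organization:28849398",
         "timeRange": {"start": 1634169600000, "end": 1634256000000}},
        {"followerGains": {"organicFollowerGain": -1, "paidFollowerGain": 0},
         "organizationalEntity": "urn:li:organization:28849398",
         "timeRange": {"start": 1634256000000, "end": 1634342400000}},
    ],
}

# One row per element of the list
by_list = pd.json_normalize(data["elements"])

# Same rows: json_normalize pulls the "elements" list out of the document itself
by_path = pd.json_normalize(data, record_path="elements")

print(by_list.shape)  # (2, 5)
print(list(by_list.columns))
```

Either way the nested keys come back as dotted column names such as followerGains.organicFollowerGain.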
Then you just need to rename the column headers and drop any columns you don't want.
# Rename columns to only use the final section in dot name
flat_data.rename(dict((x, x.split('.')[-1]) for x in flat_data.columns if '.' in x), axis=1, inplace=True)
# Drop start and end columns
flat_data.drop(['start', 'end'], axis=1, inplace=True)
That then returns:
organizationalEntity,organicFollowerGain,paidFollowerGain
urn:li:organization:28849398,2,0
urn:li:organization:28849398,-1,0
urn:li:organization:28849398,-2,0
urn:li:organization:28849398,0,0
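Putting it all together:
# json_normalize is a top-level pandas function since 1.0; the pandas.io.json import path is deprecated
from pandas import json_normalize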
data = {
"paging": { "start": 0, "count": 10, "links": [] },
"elements": [
{
"followerGains": {
"organicFollowerGain": 2,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634169600000, "end": 1634256000000 }
},
{
"followerGains": {
"organicFollowerGain": -1,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634256000000, "end": 1634342400000 }
},
{
"followerGains": {
"organicFollowerGain": -2,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634342400000, "end": 1634428800000 }
},
{
"followerGains": {
"organicFollowerGain": 0,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634428800000, "end": 1634515200000 }
}
]
}
flat_data = json_normalize(data['elements'])
# Rename columns to only use the final section in dot name
flat_data.rename(dict((x, x.split('.')[-1]) for x in flat_data.columns if '.' in x), axis=1, inplace=True)
# Drop start and end columns
flat_data.drop(['start', 'end'], axis=1, inplace=True)
flat_data.to_csv('out.csv', index=False)
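A note on the renaming step: keeping only the last dotted segment works for this data, but it would quietly produce duplicate column names if two nested objects shared a key name. An alternative sketch (not part of the answer above; it assumes pandas 1.0+ for pd.json_normalize) selects and renames the wanted columns with one explicit mapping, which also gives the column order from the desired output:

```python
import pandas as pd

# Two elements from the question's input (trimmed for brevity)
elements = [
    {"followerGains": {"organicFollowerGain": 2, "paidFollowerGain": 0},
     "organizationalEntity": "urn:li:organization:28849398",
     "timeRange": {"start": 1634169600000, "end": 1634256000000}},
    {"followerGains": {"organicFollowerGain": -1, "paidFollowerGain": 0},
     "organizationalEntity": "urn:li:organization:28849398",
     "timeRange": {"start": 1634256000000, "end": 1634342400000}},
]

flat = pd.json_normalize(elements)

# Dotted source column -> output header; any column not listed is dropped
wanted = {
    "followerGains.organicFollowerGain": "organicFollowerGain",
    "followerGains.paidFollowerGain": "paidFollowerGain",
    "organizationalEntity": "organizationalEntity",
}
result = flat[list(wanted)].rename(columns=wanted)

# to_csv without a path returns the CSV text; pass 'out.csv' to write a file instead
csv_text = result.to_csv(index=False)
print(csv_text)
```

If a later API version adds another field, it is simply ignored rather than appearing as an unexpected extra column.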
Answer:
Try the following (no external libraries, just core Python):
import csv
data = {
"paging": { "start": 0, "count": 10, "links": [] },
"elements": [
{
"followerGains": {
"organicFollowerGain": 2,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634169600000, "end": 1634256000000 }
},
{
"followerGains": {
"organicFollowerGain": -1,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634256000000, "end": 1634342400000 }
},
{
"followerGains": {
"organicFollowerGain": -2,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634342400000, "end": 1634428800000 }
},
{
"followerGains": {
"organicFollowerGain": 0,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634428800000, "end": 1634515200000 }
},
]}
holder = []
for e in data['elements']:
    temp = [str(e['followerGains']['organicFollowerGain'])]
    temp.append(str(e['followerGains']['paidFollowerGain']))
    temp.append(e['organizationalEntity'])
    temp.append(str(e['timeRange']['start']))
    temp.append(str(e['timeRange']['end']))
    holder.append(temp)

# newline='' stops the csv module from writing blank lines between rows on Windows
with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['organicFollowerGain', 'paidFollowerGain', 'organizationalEntity', 'start', 'end'])
    writer.writerows(holder)
out.csv
organicFollowerGain,paidFollowerGain,organizationalEntity,start,end
2,0,urn:li:organization:28849398,1634169600000,1634256000000
-1,0,urn:li:organization:28849398,1634256000000,1634342400000
-2,0,urn:li:organization:28849398,1634342400000,1634428800000
0,0,urn:li:organization:28849398,1634428800000,1634515200000
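A variation on the same standard-library approach, sketched with csv.DictWriter so each value is tied to its header name rather than to its position in a list. The helper name elements_to_csv is hypothetical; it writes to any open text file, and an io.StringIO buffer is used here only so the result can be printed.

```python
import csv
import io

FIELDNAMES = ["organicFollowerGain", "paidFollowerGain", "organizationalEntity", "start", "end"]


def elements_to_csv(elements, fh):
    """Hypothetical helper: write one CSV row per element to the text file fh."""
    writer = csv.DictWriter(fh, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for e in elements:
        # Pull each nested value out explicitly; the dict keys must match FIELDNAMES
        writer.writerow({
            "organicFollowerGain": e["followerGains"]["organicFollowerGain"],
            "paidFollowerGain": e["followerGains"]["paidFollowerGain"],
            "organizationalEntity": e["organizationalEntity"],
            "start": e["timeRange"]["start"],
            "end": e["timeRange"]["end"],
        })


# Two elements from the question's input (trimmed for brevity)
elements = [
    {"followerGains": {"organicFollowerGain": 2, "paidFollowerGain": 0},
     "organizationalEntity": "urn:li:organization:28849398",
     "timeRange": {"start": 1634169600000, "end": 1634256000000}},
    {"followerGains": {"organicFollowerGain": -1, "paidFollowerGain": 0},
     "organizationalEntity": "urn:li:organization:28849398",
     "timeRange": {"start": 1634256000000, "end": 1634342400000}},
]

buf = io.StringIO()
elements_to_csv(elements, buf)
print(buf.getvalue())
```

To write a real file instead, open it with newline='' as the csv module documentation recommends: with open('out.csv', 'w', newline='') as f: elements_to_csv(data['elements'], f).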
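Answer:
For anyone wondering how to do this, here is the answer, thanks to @Waylan and @balderman:
# json_normalize is a top-level pandas function since 1.0; the pandas.io.json import path is deprecated
from pandas import json_normalize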
data = {
"paging": { "start": 0, "count": 10, "links": [] },
"elements": [
{
"followerGains": {
"organicFollowerGain": 2,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634169600000, "end": 1634256000000 }
},
{
"followerGains": {
"organicFollowerGain": -1,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634256000000, "end": 1634342400000 }
},
{
"followerGains": {
"organicFollowerGain": -2,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634342400000, "end": 1634428800000 }
},
{
"followerGains": {
"organicFollowerGain": 0,
"paidFollowerGain": 0
},
"organizationalEntity": "urn:li:organization:28849398",
"timeRange": { "start": 1634428800000, "end": 1634515200000 }
}
]
}
flat_data = json_normalize(data['elements'])
# Rename columns to only use the final section in dot name
flat_data.rename(dict((x, x.split('.')[-1]) for x in flat_data.columns if '.' in x), axis=1, inplace=True)
# Drop start and end columns
flat_data.drop(['start', 'end'], axis=1, inplace=True)
flat_data.to_csv('out.csv', index=False)
Hope this helps someone down the road! Cheers!
