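I am trying to append data from json_response, a list that contains Twitter data, to a CSV file using the function append_to_csv.
I understand the structure of json_response. It contains data on users who follow two politicians, 5 and 13 users respectively.
1) author_id, created_at, tweet_id (the id key) and text are in data.
2) description/bio is in ['includes']['users'].
3) url/image_url is in ['includes']['media'].
However, my nested loop does not append any data to sample_data.csv, and it does not raise any error. Does it have something to do with my indentation?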
print(json.dumps(json_response, indent=4, sort_keys=True)) # look at json_response object.
[
{
"data": [
{
"author_id": "2877379617",
"created_at": "2021-03-25T12:11:14.000Z",
"id": "1375057688355336195",
"text": "@prettynobodyco She blocked me in 2015 - for pointing out that Tim Kaine enables sexual assault in the military and the evidence was his killing of the MJIA and publicly stated that Military commanders should remain in charge of military rape cases. She's Tanden level awful. Congrats!"
},
{
"author_id": "1265018154444562440",
"created_at": "2021-03-22T19:48:59.000Z",
"id": "1374085719472361474",
"text": "@MehcatCat @AlasscanIsBack @PattyArquette @timkaine Funny, they blocked me. \ud83e\udd23\ud83e\udd23"
},
{
"author_id": "2378324935",
"created_at": "2021-03-07T21:32:13.000Z",
"id": "1368675879312887810",
"text": "@DrWinarick @KatieOGrady4 I apologize for any drama. Katie O Grady blocked me because we had a disagreement about Tim Kaine on one of your older posts. I guess I can't please everyone haha. :/"
},
{
"author_id": "821870502943817729",
"created_at": "2021-02-12T23:53:59.000Z",
"id": "1360376637385244673",
"text": "She blocked me a long ass time ago when I asked her why we shoulf care about Tim Kaine's personal view on abortion if it didn't impact legislation"
},
{
"attachments": {
"media_keys": [
"16_1341045032732770306"
]
},
"author_id": "17232340",
"created_at": "2020-12-21T15:37:07.000Z",
"id": "1341045038420275205",
"text": "@DSingh4Biden @moomintroll8 @timkaine @GovernorVA That's why I replied to you. She blocked me previously, for what silliness I can't remember. Tough being a troll AND a snowflake!"
}
],
"includes": {
"media": [
{
"media_key": "16_1341045032732770306",
"type": "animated_gif"
}
],
"users": [
{
"created_at": "2014-11-15T02:23:57.000Z",
"description": "",
"id": "2877379617",
"name": "Laura Saylor",
"username": "lauraleesaylor"
},
{
"created_at": "2020-05-25T20:33:36.000Z",
"description": "Weird Writer & Lunatic Linguist\nWicked Witch of the East\nshe/her",
"id": "1265018154444562440",
"name": "Zauberkind",
"username": "Zauberkind2"
},
{
"created_at": "2014-03-08T07:22:31.000Z",
"description": "#Resist, #BLM, #Vaxxed, liberal, autistic, kidney transplant survivor, political nerd, mental health advocate, fighter for equality, truth, justice, etc.",
"id": "2378324935",
"name": "Trevor \"Trev\" McKee Achilles",
"username": "MrTAchilles"
},
{
"created_at": "2017-01-19T00:02:52.000Z",
"description": "statist / Progressive Gun Nut/ Single and hating it\n\n / \n\nstraight????? /\n\npronouns / brain worm survivor\n\n",
"id": "821870502943817729",
"name": "Puppet Enthusiast",
"username": "nihilisticpillo"
},
{
"created_at": "2008-11-07T15:09:46.000Z",
"description": "Liberal-Veteran-Dog Lover | Taste for irony, but in moderation | Humor is reason gone mad. ~Groucho Marx | I follow & unfollow back #VeteransResist #Resist",
"id": "17232340",
"name": "anti-Fascist Jim",
"username": "JimnBL"
}
]
},
"meta": {
"newest_id": "1375057688355336195",
"next_token": "b26v89c19zqg8o3fos5vyedr54ngvtx3nuqvnx6pglrb1",
"oldest_id": "1341045038420275205",
"result_count": 5
}
},
{
"data": [
{
"author_id": "737885223858384896",
"created_at": "2021-03-26T21:56:02.000Z",
"id": "1375567243082338314",
"text": "@hogan_1969 @LindseyGrahamSC LOL She Blocked me.. could not admit the truth could she now. okay so where is her source for the shirts? and that is what he said. I (quote) We immediately surge the border all those seeking asylum. What about his lie about the cages? no Answer lol."
},
{
"author_id": "847612931487416323",
"created_at": "2021-03-26T21:55:24.000Z",
"id": "1375567083791073283",
"text": "@hogan_1969 @TeichTerry @thehill @LindseyGrahamSC @hogan_1969 just blocked me for showing her the actual numbers \ud83e\udd23\n\n#LiberalsHateFacts"
},
{
"author_id": "18634205",
"created_at": "2021-03-08T12:29:00.000Z",
"id": "1368901564363051010",
"text": "Huh. Made me think if @LeaderMcConnell @LindseyGrahamSC @marcorubio @SenTedCruz feel trapped under the thumb of Trumpy. And who else? @IvankaTrump? @MELANIATRUMP ? @DonaldJTrumpJr ? I\u2019d say Eric, but he blocked me."
},
{
"author_id": "27327319",
"created_at": "2021-03-02T11:53:16.000Z",
"id": "1366718245521211393",
"text": "@fedupinNHtoo @LindseyGrahamSC Exactly. I asked that question of a Republican on Facebook last night and she blocked me"
},
{
"author_id": "917634626247647232",
"created_at": "2021-02-28T18:16:45.000Z",
"id": "1366089974907432961",
"text": "@gop this is for you! @tedcruz @LindseyGrahamSC @MittRomney @mikepompeo\n#BitchyMcC blocked me!\ud83d\udc4d\nWatch \"Jack Off Jill - Hypocrite lyrics\" on YouTube"
},
{
"author_id": "1231059979844456448",
"created_at": "2021-02-26T04:25:49.000Z",
"id": "1365156089554067459",
"text": "@KelleyALynch1 @marwilliamson @therecount @LindseyGrahamSC She's fine with that just as she's fine with Biden's Nazis in Ukraine. She wants war with Russia, too. She blocked me for this tweet because she couldn't even condemn Biden's Nazis in Ukraine. She's a fauxgressive warmonger, a wolf in sheep's clothing. \n"
},
{
"author_id": "1315477593303310336",
"created_at": "2021-02-23T00:00:41.000Z",
"id": "1364002202843451399",
"text": "@MistyKitty3 @BlairMurray83 @FrankAmari2 @LindseyGrahamSC \ud83e\udd23 Someone didn\u2019t like what I said and blocked me."
},
{
"author_id": "1069115263671562240",
"created_at": "2021-02-22T04:36:06.000Z",
"id": "1363709124891070467",
"text": "@trinkity88 @LindseyGrahamSC Apparently, @Trinkitty88 blocked me because FACTS are TOO HARD to handle!\ud83e\udd23\ud83e\udd23\ud83e\udd23\ud83e\udd23\ud83e\udd23\ud83e\udd23"
},
{
"author_id": "1303321972227690496",
"created_at": "2021-02-20T19:38:49.000Z",
"id": "1363211526316969985",
"text": "@horsin64 @GovMurphy @LindseyGrahamSC You blocked me because you\u2019re a nifkin. It\u2019s not cyber tough you Nancy I\u2019d say it to your face. American lives matter before anyone else. America first and you don\u2019t like it because you have trump derangement. You\u2019re a psycho"
},
{
"author_id": "27943005",
"created_at": "2021-02-19T20:00:38.000Z",
"id": "1362854626924650497",
"text": "@TonyRom31334975 @staceyabrams @AnnaForFlorida @LindseyGrahamSC The guy blocked me on Twitter and had to unblock me after the Knight First Amendment Institute sued him and won> I am certain It won't talk to me, but imagine..hehe?!"
},
{
"attachments": {
"media_keys": [
"3_1361344652264280068"
]
},
"author_id": "1126249378279297027",
"created_at": "2021-02-15T16:00:32.000Z",
"id": "1361344654395011079",
"text": "@Jamie1074 @Breaking911 You know what\n\nIt's funny that they blocked me because I actually did agree with them on Lindsey Graham...\n\nCome on, man !"
},
{
"author_id": "1207432044390699008",
"created_at": "2021-02-14T07:58:21.000Z",
"id": "1360860918687559681",
"text": "@LindseyGrahamSC I really don't know why you haven't blocked me yet. Pile of human shit. I just read a letter that John McCain wrote me and for some reason it made me think about you and what he would think about your behavior. I guarantee you'd be in for an ass whippin'. Dick."
},
{
"author_id": "926909484",
"created_at": "2021-02-13T20:53:03.000Z",
"id": "1360693490880032770",
"text": "@LadyReverbs @themariefonseca @styvanswift @LindseyGrahamSC Lady, you might be able to see Marie\u2019s tweets. She blocked me. She may call this a victory for Trump. The reality is that seven members of the @GOP voted to convict. They are the true patriots of the Republican Party."
}
],
"includes": {
"media": [
{
"media_key": "3_1361344652264280068",
"type": "photo",
"url": ""
}
],
"users": [
{
"created_at": "2016-06-01T05:55:21.000Z",
"description": "Biden Inflation the worst in 30 years. His Handlers trying to Rebrand Brandon is Hilarious.",
"id": "737885223858384896",
"name": "Biden is a complete mess and you know it.",
"username": "zelda3024"
},
{
"created_at": "2017-03-31T00:54:05.000Z",
"description": "Love God, Love Family, Love Country, Love Freedom - if we put those things first everything else will be great. MAGA",
"id": "847612931487416323",
"name": "Joey Bagadonuts",
"username": "AmericanGr8ness"
},
{
"created_at": "2009-01-05T15:25:55.000Z",
"description": "small & local garlic farmer; independent American; old surfer dude; working to find and speak truth to power; \ud83c\uddfa\ud83c\uddf8; mahalo and Maluhia",
"id": "18634205",
"name": "MacGregorGarlic",
"username": "MacGregorGarlic"
},
{
"created_at": "2009-03-28T22:53:28.000Z",
"description": "Let's Go Darwin!",
"id": "27327319",
"name": "Karen Kennedy",
"username": "KayKay68"
},
{
"created_at": "2017-10-10T06:15:18.000Z",
"description": "Mom\ud83d\udc95Cannactivist\ud83c\udf3fSecularHumanist\ud83c\udf10 BLM\u270a\ud83c\udfff\ud83c\udf08Ally\ud83e\udd8bCPTSD\u2695\ufe0f FTD\ud83e\udd14MeToo\ud83c\udf38ProChoice\ud83d\udc93CRPS\ud83d\ude23ClimateChange\ud83c\udf0e DACA\ud83c\uddfa\ud83c\uddf2AdoptDontShop\ud83d\udc3e#Steelers \ud83d\udda4\ud83d\udc9b #Vaxxed2TheMax\u270a\ud83d\udc9a",
"id": "917634626247647232",
"name": "Raven The Hemptress #LegalizeGlobally\ud83d\udc9a\ud83c\udf3f\u267f",
"username": "Kraven_Raven24"
},
{
"created_at": "2020-02-22T03:35:56.000Z",
"description": "Monetarism is the underlying cause of our disease; human progress and peace through development is the cure. Eurasian integration will benefit all of humanity!",
"id": "1231059979844456448",
"name": "\ud83c\udd70pocalypsis \ud83c\udd70pocalypseos \u2014 BRI Is The Future",
"username": "apocalypseos"
},
{
"created_at": "2020-10-12T02:21:21.000Z",
"description": "Father of two beautiful boys. Believer in the Constitution of the United States. Protector of my own rights. #Meatatarian",
"id": "1315477593303310336",
"name": "\ud83e\udd85 Steven Duggin \u2665\ufe0f \ud83c\uddfa\ud83c\uddf8\ud83d\uddfd",
"username": "itsStevenDuggin"
},
{
"created_at": "2018-12-02T06:25:16.000Z",
"description": "",
"id": "1069115263671562240",
"name": "Barhag",
"username": "TheBarhag"
},
{
"created_at": "2020-09-08T13:19:17.000Z",
"description": "Not the liberals cup of tea",
"id": "1303321972227690496",
"name": "Christy",
"username": "Christy54177764"
},
{
"created_at": "2009-03-31T19:34:24.000Z",
"description": "NY-grown, FL-tanned, scribe, word nerd, TV junkie, game show champ, yenta, wife, twin mama, hot sauce collector, Bloody Mary maven &, says @NYPost, savvy gadfly",
"id": "27943005",
"name": "Lesley Abravanel",
"username": "lesleyabravanel"
},
{
"created_at": "2019-05-08T22:15:51.000Z",
"description": "\u2600\ufe0f I post Yuuko Aioi pictures daily \u2600\ufe0f\n\nI also like being wholesome, making new friends, posting about games, my everyday life, cats, NASCAR, good vibes, fumos!",
"id": "1126249378279297027",
"name": "Vaxen #DailyYuuko \u2603\ufe0f",
"username": "YuukoEnjoyer"
},
{
"created_at": "2019-12-18T22:47:10.000Z",
"description": "The Republican party is bad for America. The Conservatives are Trump bootlickers who are afraid to stand up to him. This great nation is in serious trouble.",
"id": "1207432044390699008",
"name": "Angry Patriot",
"username": "AngryPatriot20"
},
{
"created_at": "2012-11-05T05:19:37.000Z",
"description": "Employment lawyer. Represent employers and employees. 30 years ago, my mentor told me to seek the truth as a lawyer. Still do that. Tweets are not legal advice.",
"id": "926909484",
"name": "Alfred Southerland",
"username": "TexasEEOLaw"
}
]
},
"meta": {
"newest_id": "1375567243082338314",
"next_token": "b26v89c19zqg8o3fosnr8q7zstmzppg3jgd1cvynkb919",
"oldest_id": "1360693490880032770",
"result_count": 13
}
}
]
import csv
import dateutil.parser

# Create file
csvFile = open("sample_data.csv", "a", newline="", encoding='utf-8')
csvWriter = csv.writer(csvFile)
# Create headers for the data I want to save. I only want to save these columns in my dataset
csvWriter.writerow(
["author_id", "created_at", "tweet_id", "text", "bio", "image_url"])
csvFile.close()
def append_to_csv(json_response, csvFile):
    # counter variable
    global author_id, created_at, tweet_id, text, bio, image_url
    # open CSV file
    csvFile = open(csvFile, "a", newline="", encoding='utf-8')
    csvWriter = csv.writer(csvFile)
    # loop through each tweet
    for each_dict in json_response:
        # loop 1. author ID, time created, tweet ID tweet text
        for tweet in each_dict['data']:
            # 1. Author ID
            author_id = tweet['author_id']
            # 2. Time created
            created_at = dateutil.parser.parse(tweet['created_at'])
            # 3. Tweet ID
            tweet_id = tweet['id']
            # 4. Tweet text
            text = tweet['text']
            # loop 2. description/bio loop
            for dic in each_dict['includes']['users']:
                # 5. description
                if 'description' in dic:
                    bio = dic['description']
                else:
                    bio = " "
                    # loop 3. image_url/url loop
                    for element in each_dict['includes']['media']:
                        # 6. image url
                        if 'url' in element:
                            image_url = element['url']
                        else:
                            image_url = " "
                        # assemble all data in a list
                        res = [author_id, created_at, tweet_id, text, bio, image_url]
                        csvWriter.writerow(res)
                    # close CSV file
                    csvFile.close()
append_to_csv(json_response, "sample_data.csv")
As you can see, df contains only the predefined column names.
# import sample_data.csv as df
df = pd.read_csv(r'path...\sample_data.csv')
print(df)
Empty DataFrame
Columns: [author_id, created_at, tweet_id, text, bio, image_url]
Index: []
Edit: I changed the indentation of loop 3 and of csvFile.close().
def append_to_csv(json_response, csvFile):
    # counter variable
    global author_id, created_at, tweet_id, text, bio, image_url
    # open CSV file
    csvFile = open(csvFile, "a", newline="", encoding='utf-8')
    csvWriter = csv.writer(csvFile)
    # loop through each tweet
    for each_dict in json_response:
        # loop 1. author ID, time created, tweet ID tweet text
        for tweet in each_dict['data']:
            # 1. Author ID
            author_id = tweet['author_id']
            # 2. Time created
            created_at = dateutil.parser.parse(tweet['created_at'])
            # 3. Tweet ID
            tweet_id = tweet['id']
            # 4. Tweet text
            text = tweet['text']
            # loop 2. description/bio loop
            for dic in each_dict['includes']['users']:
                # 5. description
                if 'description' in dic:
                    bio = dic['description']
                else:
                    bio = " "
                # loop 3. image_url/url loop
                for element in each_dict['includes']['media']:
                    # 6. image url
                    if 'url' in element:
                        image_url = element['url']
                    else:
                        image_url = " "
                    # assemble all data in a list
                    res = [author_id, created_at, tweet_id, text, bio, image_url]
                    csvWriter.writerow(res)
    # close CSV file
    csvFile.close()
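The problem now is that append_to_csv appends the same tweet 5 times for the 5 users following the first politician and 13 times for the 13 users following the second politician, so df ends up with 194 rows instead of 18.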
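uj5u.com user reply:
There are two each_dict objects in json_response. They hold 5 and 13 tweets respectively (each_dict['data']). In addition, each_dict['includes']['users'] holds 5 and 13 elements respectively.
You get 194 rows because, in the first iteration of for each_dict in json_response:, you save data 5x5=25 times (loop 2 runs 5 times for each tweet in loop 1). In the second iteration you save data 13x13=169 times (loop 2 runs 13 times for each tweet in loop 1). Together that is 25+169=194.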
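The arithmetic can be checked with a small sketch. The batch sizes are the ones from the question (5 and 13 tweets, the same number of users, one media entry each); the name batches and the placeholder lists are made up for illustration.

```python
# Stand-in batches with the sizes from the question.
batches = [
    {"data": [0] * 5, "users": [0] * 5, "media": [0]},
    {"data": [0] * 13, "users": [0] * 13, "media": [0]},
]

rows_written = 0
for batch in batches:
    for tweet in batch["data"]:
        for user in batch["users"]:
            for media in batch["media"]:
                # This is where the edited code in the question calls writerow.
                rows_written += 1

# One row per tweet is what the question actually wants.
rows_expected = sum(len(batch["data"]) for batch in batches)
print(rows_written, rows_expected)  # 194 18
```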
You should append the data to the csv outside loop 2, once per tweet. That is,
for each_dict in json_response:
    for tweet in each_dict['data']:
        # ...
        for dic in each_dict['includes']['users']:
            # ...
        res = [author_id, created_at, tweet_id, text, bio, image_url]
        csvWriter.writerow(res)
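A fuller sketch of the one-row-per-tweet idea, under some assumptions: the function name append_rows, the in-memory buffer and the tiny sample response are made up for illustration, and only the key names (author_id, id, attachments, media_keys, media_key, description, url) come from the JSON in the question. Users and media are indexed first, so each tweet picks up its own bio and image URL rather than whichever one the inner loops saw last.

```python
import csv
import io


def append_rows(json_response, out):
    """Write one CSV row per tweet to the open text file `out`."""
    writer = csv.writer(out)
    for batch in json_response:
        includes = batch.get("includes", {})
        # Index users by id and media by key so each tweet finds its own entries.
        bios = {u["id"]: u.get("description", " ") for u in includes.get("users", [])}
        urls = {m["media_key"]: m.get("url", " ") for m in includes.get("media", [])}
        for tweet in batch["data"]:
            keys = tweet.get("attachments", {}).get("media_keys", [])
            # Only the first attached media item is used in this sketch.
            image_url = urls.get(keys[0], " ") if keys else " "
            writer.writerow([
                tweet["author_id"],
                tweet["created_at"],
                tweet["id"],
                tweet["text"],
                bios.get(tweet["author_id"], " "),
                image_url,
            ])


# Tiny made-up response with the same shape as the question's data.
sample = [{
    "data": [
        {"author_id": "1", "created_at": "2021-03-25T12:11:14.000Z",
         "id": "10", "text": "first"},
        {"author_id": "2", "created_at": "2021-03-22T19:48:59.000Z",
         "id": "11", "text": "second",
         "attachments": {"media_keys": ["3_1"]}},
    ],
    "includes": {
        "users": [{"id": "1", "description": "bio one"},
                  {"id": "2", "description": "bio two"}],
        "media": [{"media_key": "3_1", "type": "photo",
                   "url": "https://example.com/a.jpg"}],
    },
}]

buffer = io.StringIO()
append_rows(sample, buffer)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(len(rows))  # 2
```

With a real file, the same function can be called inside with open("sample_data.csv", "a", newline="", encoding="utf-8") as f: and passed f.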
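In addition, I suggest using a pandas dataframe to store the information you need and save it to csv. It makes the code more readable, and you don't have to worry about open buffers. See the suggestion below, which includes some renaming: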
import dateutil.parser
import pandas as pd

rows = []
for each_dict in json_response:
    includes = each_dict.get('includes', {})
    for tweet in each_dict['data']:
        row = {}
        row["author_id"] = tweet['author_id']
        row["created_at"] = dateutil.parser.parse(tweet['created_at'])
        row["tweet_id"] = tweet['id']
        row["text"] = tweet['text']
        # Defaults, so that every row has the same columns.
        row["bio"] = ' '
        row["image_url"] = ' '
        for user in includes.get('users', []):
            # Keep only the bio of the user who wrote this tweet.
            if user["id"] == row["author_id"]:
                row["bio"] = user.get('description', ' ')  # add .encode('utf-16','surrogatepass').decode('utf-16') if you get a UnicodeError
        # Keep only the media attached to this tweet.
        media_keys = tweet.get('attachments', {}).get('media_keys', [])
        for media in includes.get('media', []):
            if media['media_key'] in media_keys:
                row['image_url'] = media.get('url', ' ')
        rows.append(row)
# DataFrame.append was removed in pandas 2.0, so the rows are collected in a list
# and the dataframe is built once. The column names come from the dictionary keys.
df = pd.DataFrame(rows)
df.to_csv('path/to/file.csv')
Output
    tweet_id             author_id            created_at                ...
0   1375057688355336195  2877379617           2021-03-25T12:11:14.000Z  ...
1   1374085719472361474  1265018154444562440  2021-03-22T19:48:59.000Z  ...
...
17  1360693490880032770  926909484            2021-02-13T20:53:03.000Z  ...
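uj5u.com user reply:
It looks like the else branch of if 'description' in dic: is never executed. If your code is indented the way it is shown, the csvWriter.writerow part is therefore never executed either.
As a result, nothing is written to your file.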
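A small sketch of why that happens. The two user entries are trimmed copies of users from the question's JSON, and the counter else_branch_runs is made up for illustration. Every user carries a description key, even when its value is an empty string, so the test is always true.

```python
# Trimmed copies of two users from the question; the first has an empty bio.
users = [
    {"id": "2877379617", "description": ""},
    {"id": "17232340", "description": "Liberal-Veteran-Dog Lover"},
]

else_branch_runs = 0
for dic in users:
    if 'description' in dic:
        # True even for the empty string: the key exists.
        bio = dic['description']
    else:
        # Anything nested here, such as a writerow call, is skipped
        # for every user in the sample data.
        else_branch_runs += 1

print(else_branch_runs)  # 0
```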
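A comment on code style:
- Use with open(file) as file_variable: instead of opening and closing the file manually. This can save you some trouble, for example the trouble you would run into if the else branch did get executed and the file were closed multiple times :)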
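A minimal sketch of the with form, applied to the header-writing step from the question. The temporary directory is an assumption so that the example runs anywhere; it is not part of the original code.

```python
import csv
import os
import tempfile

# Hypothetical location; the question writes to sample_data.csv in the working directory.
path = os.path.join(tempfile.mkdtemp(), "sample_data.csv")

with open(path, "a", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["author_id", "created_at", "tweet_id", "text", "bio", "image_url"])
# The file is closed here, even if an exception was raised inside the block.

print(csv_file.closed)  # True
```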
