How do I copy a value into a specific number of rows at specific positions in a Pandas column?

I am converting json_response to a DataFrame with the following code:

df = pd.DataFrame(columns=["created_at", "username", "description", "tweet_id"]) #an empty dataframe to save data

data_nested = pd.json_normalize(json_response['data'])
df_temp = data_nested[["created_at", "username", "description"]].copy()
df = pd.concat([df, df_temp], ignore_index=True)
df.reset_index(inplace=True, drop=True)

Here is my sample json_response:

{
    "data": [
        {
            "created_at": "2020-01-01T12:24:45.000Z",
            "description": "This is a sample description",
            "id": "12345678",
            "name": "Sample Name",
            "username": "sample_name"
        }
    ],
    "meta": {
        "next_token": "sample_token",
        "result_count": 1
    }
}

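This response is the result of querying the "Retweeted_by" endpoint of the Twitter API v2. I am trying to save the "tweet_id" for each response in the loop (so that I know which result rows correspond to which requested tweet_id) by running df['tweet_id'] = tweet_id. I know that with this approach the last tweet_id replaces every other value in the column.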
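
The overwrite described above can be reproduced without calling the API. This is a minimal sketch, assuming two made-up tweet IDs (111 and 222) and hand-written usernames in place of real responses:

```python
import pandas as pd

# Rows from a first (made-up) response, tagged with its tweet ID.
df = pd.DataFrame({"username": ["alice", "bob"]})
df["tweet_id"] = 111

# Rows from a second response are appended, then the column is assigned again.
df_temp = pd.DataFrame({"username": ["carol"]})
df = pd.concat([df, df_temp], ignore_index=True)
df["tweet_id"] = 222  # scalar assignment sets every row, old and new

overwritten_ids = df["tweet_id"].tolist()
```

After the second assignment, the rows that belonged to tweet 111 carry 222 as well, which is the behavior the question is trying to avoid.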

I also tried using the index, as follows:

idx = df["username"].last_valid_index()
if pd.isnull(idx) or idx is None:
  df.loc[0, "tweet_id"] = tweet_id
else:
  df.loc[idx + 1, "tweet_id"] = tweet_id

But this fails too: if result_count in the json_response is greater than 1, it saves the tweet_id in the next row and leaves the previous rows as NaN.

Can anyone suggest a solution? Thank you.

Answer:

Based on our exchange in the comments, here is the solution I propose:

import pandas as pd
import requests

tweet_id_list = [1, 2, 3]  # a list of all of your tweet ids

# Here you loop through each id and fetch its retweets.
# You could make this async, but I would be careful since rate limits are very
# strict on Twitter. They can disable your token if you go over the limit a lot.

all_dfs = []
for tweet_id in tweet_id_list:
    # "url" is a placeholder; substitute the real endpoint and authentication.
    response = requests.get(f"url/{tweet_id}")
    json_response = response.json()

    temp_df = pd.DataFrame.from_records(json_response['data'])
    temp_df['tweet_id'] = tweet_id  # the scalar is written to every row of temp_df

    all_dfs.append(temp_df)

# If you then want one big table with all the retweets and tweet_ids,
# simply do:

df = pd.concat(all_dfs, ignore_index=True)

Just a bit of explanation.

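You are creating a DataFrame (temp_df) for the retweets of each tweet_id. You are also creating an extra column in that DataFrame, called tweet_id. When you assign a single value to a DataFrame column, it is propagated to every row of that DataFrame.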
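
The propagation of a single value to every row can be seen in isolation. This is a minimal sketch, assuming two made-up retweeter records and a hypothetical tweet ID of 12345:

```python
import pandas as pd

# Two made-up retweeters returned for a single (hypothetical) tweet ID.
records = [
    {"id": "1", "username": "alice"},
    {"id": "2", "username": "bob"},
]
temp_df = pd.DataFrame.from_records(records)

# A scalar on the right-hand side is repeated for every existing row.
temp_df["tweet_id"] = 12345

tagged_ids = temp_df["tweet_id"].tolist()
```

Because temp_df only ever holds the rows of one response, the scalar assignment cannot touch rows that belong to a different tweet.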

You then collect the DataFrame for each tweet_id into a list, all_dfs.

After the loop finishes, you are left with a list of DataFrames. If you want one big table, you can concatenate them, as shown in the code.
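
The whole pattern can be tried offline. The sketch below replaces the HTTP call with canned payloads (made-up data), and it assumes that a response with no retweeters omits the "data" key; that assumption is not confirmed by the sample in the question, which only shows a result_count of 1.

```python
import pandas as pd

# Canned payloads standing in for real API responses (made-up data).
fake_responses = {
    101: {"data": [{"id": "1", "username": "alice"},
                   {"id": "2", "username": "bob"}],
          "meta": {"result_count": 2}},
    102: {"meta": {"result_count": 0}},  # assumed shape: no "data" key
    103: {"data": [{"id": "3", "username": "carol"}],
          "meta": {"result_count": 1}},
}

all_dfs = []
for tweet_id, json_response in fake_responses.items():
    rows = json_response.get("data", [])
    if not rows:
        continue  # nothing to tag for this tweet
    temp_df = pd.DataFrame.from_records(rows)
    temp_df["tweet_id"] = tweet_id  # same ID on every row of this response
    all_dfs.append(temp_df)

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, 0, ...
df = pd.concat(all_dfs, ignore_index=True)
```

Each row of the combined table keeps the tweet_id of the request that produced it, regardless of how many rows a single response contained.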


Tags: Python, json, pandas DataFrame
