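I have the scraping script below. I need to loop over a number of links that differ only by the T_ID contained in the data dictionary. The script prints results for the first T_ID only. Any idea how to fix the loop so that it prints results for every T_ID?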
import pandas as pd
import requests
import json
import csv
import sys
from bs4 import BeautifulSoup

data = {'T_ID': [3396750, 3396753, 3396755, 3396757, 3396759]}
base_url = "XXXX"
username = "XXXX"
password = "XXXX"
toget = data
allowed_results = 50
max_results = "maxResults=" + str(allowed_results)
tc = "/tcyc?"
result_count = -1
start_index = 0
df = pd.DataFrame(
    columns=['id', 'name', 'gId', 'dKey', 'tPlan'])

for eachId in toget['T_ID']:
    while result_count != 0:
        start_at = "startAt=" + str(start_index)
        url = f'{base_url}{eachId}{tc}&{start_at}&{max_results}'
        response = requests.get(url, auth=(username, password))
        json_response = json.loads(response.text)
        print(json_response)
        page_info = json_response["meta"]["pageInfo"]
        start_index = page_info["startIndex"] + allowed_results
        result_count = page_info["resultCount"]
        items2 = json_response["data"]
        print(items2)
        for item in items2:
            new_item = {'id': item['id'], **item['fields']}
            # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
            df = pd.concat([df, pd.DataFrame([new_item])], ignore_index=True)
            print(item["id"])
            print(item["project"])
            print(item["fields"]["name"])
            print(item["fields"]["gId"])
            print(item["fields"]["dKey"])
            print(item["fields"]["tPlan"])
Reply from a uj5u.com user:
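The loop isn't stopping early; it actually runs over every id. The problem is that start_index and result_count are never reset after the first eachId. Once the first id has finished paginating, result_count is 0, so the while condition is already false for every later id and the loop body is skipped. Even if the loop did run, start_index would no longer be 0, so the request for the next id would look something like:
`'XXXX.com/3396753/tcyc?&startAt=123&maxResults=50'`
That would most likely return a result_count of 0, so the while loop would end immediately. The same thing then happens for each remaining id.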
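The effect described above can be reproduced without the real API. The following is a minimal sketch, not the original script: `requests_per_id`, `items_per_id`, and `reset_each_id` are hypothetical names, and the HTTP call is replaced by simple arithmetic that assumes every id has the same number of items.

```python
def requests_per_id(ids, items_per_id=3, page_size=2, reset_each_id=False):
    """Count how many simulated requests each id triggers."""
    calls = {t_id: 0 for t_id in ids}
    # Pagination state defined once, outside the loop, as in the question.
    start_index, result_count = 0, -1
    for t_id in ids:
        if reset_each_id:
            # Start pagination from scratch for this id.
            start_index, result_count = 0, -1
        while result_count != 0:
            calls[t_id] += 1
            # Stand-in for the API: number of items left on this page.
            result_count = max(0, min(page_size, items_per_id - start_index))
            start_index += page_size
    return calls


shared_state = requests_per_id([1, 2, 3])
fresh_state = requests_per_id([1, 2, 3], reset_each_id=True)
print(shared_state)  # {1: 3, 2: 0, 3: 0}
print(fresh_state)   # {1: 3, 2: 3, 3: 3}
```

With shared state, only the first id is ever queried; with a reset per id, each one is paginated in full.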
Move the initial result_count = -1 and start_index = 0 inside the for loop, just before the while, since you want both values reset for each 'T_ID':
import pandas as pd
import requests
import json
import csv
import sys
from bs4 import BeautifulSoup

data = {'T_ID': [3396750, 3396753, 3396755, 3396757, 3396759]}
base_url = "XXXX"
username = "XXXX"
password = "XXXX"
toget = data
allowed_results = 50
max_results = "maxResults=" + str(allowed_results)
tc = "/tcyc?"
df = pd.DataFrame(
    columns=['id', 'name', 'gId', 'dKey', 'tPlan'])

for eachId in toget['T_ID']:
    # Reset the pagination state for every id
    start_index = 0
    result_count = -1
    while result_count != 0:
        start_at = "startAt=" + str(start_index)
        url = f'{base_url}{eachId}{tc}&{start_at}&{max_results}'
        response = requests.get(url, auth=(username, password))
        json_response = json.loads(response.text)
        print(json_response)
        page_info = json_response["meta"]["pageInfo"]
        # Advance to the next page; a resultCount of 0 ends the loop
        start_index = page_info["startIndex"] + allowed_results
        result_count = page_info["resultCount"]
        items2 = json_response["data"]
        print(items2)
        for item in items2:
            new_item = {'id': item['id'], **item['fields']}
            # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
            df = pd.concat([df, pd.DataFrame([new_item])], ignore_index=True)
            print(item["id"])
            print(item["project"])
            print(item["fields"]["name"])
            print(item["fields"]["gId"])
            print(item["fields"]["dKey"])
            print(item["fields"]["tPlan"])
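One way to make it harder to leak pagination state between ids is to move the loop into a generator, so the state is local to each id by construction. The sketch below is an illustration under assumptions, not part of the original answer: `iter_rows`, `fake_fetch_page`, and `FAKE_DATA` are hypothetical names, and the stub only imitates the response shape (`meta.pageInfo.startIndex`, `meta.pageInfo.resultCount`, `data`) seen in the script above.

```python
# Hypothetical in-memory stand-in for the real endpoint.
FAKE_DATA = {
    3396750: [
        {"id": 1, "fields": {"name": "a"}},
        {"id": 2, "fields": {"name": "b"}},
        {"id": 3, "fields": {"name": "c"}},
    ],
    3396753: [{"id": 4, "fields": {"name": "d"}}],
}


def fake_fetch_page(t_id, start_at, max_results):
    """Return one page shaped like the JSON response used in the thread."""
    page = FAKE_DATA[t_id][start_at:start_at + max_results]
    return {
        "meta": {"pageInfo": {"startIndex": start_at, "resultCount": len(page)}},
        "data": page,
    }


def iter_rows(t_ids, fetch_page, page_size):
    """Yield one flat dict per item, paginating separately for each id."""
    for t_id in t_ids:
        # Fresh pagination state for every id.
        start_index, result_count = 0, -1
        while result_count != 0:
            response = fetch_page(t_id, start_index, page_size)
            page_info = response["meta"]["pageInfo"]
            start_index = page_info["startIndex"] + page_size
            result_count = page_info["resultCount"]
            for item in response["data"]:
                yield {"id": item["id"], **item["fields"]}


rows = list(iter_rows([3396750, 3396753], fake_fetch_page, page_size=2))
print(rows)
```

In the real script, `fetch_page` would wrap the `requests.get` call, and the collected `rows` could be passed to `pd.DataFrame(rows)` once at the end instead of growing the frame one row at a time.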
