Python BeautifulSoup: how do I get data in JSON format?

I want to get the data in JSON format. Right now I am collecting the data as a dictionary, which is a bit messy for me. Here is my code:

import time

import requests
from bs4 import BeautifulSoup

my_dict = {"job_title": [], "time_posted": [], "number_of_proposal": [], "page_link": []}
for page_num in range(1, 12):
    time.sleep(3)
    # 'my_url' is the question's placeholder for the real URL template
    url = 'my_url'.format(page_num)
    print(url)
    headers = requests.utils.default_headers()
    print(headers)
    headers.update(
        {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0', })
    print(headers)
    r = requests.get(url, headers=headers).text
    soup = BeautifulSoup(r, 'lxml')

    box = soup.select('.item__top_container?ListItem?3pRrO')
    for i in box:
        job_title = i.select('.item__title?ListItem?2FRMT')[0].text.lower()
        job_title = job_title.replace('opportunity', ' opportunity').replace(
            'urgent', ' urgent').strip()
        print(job_title)
        time_posted = i.select('time')[0].text.lower()
        remove_month_year = ["month", "year"]
        print(time_posted)
        proposal = i.select(
            '.item__info?ListItem?1ci50 li:nth-child(3)')[0].text.replace('Proposals', '').strip()
        keywords = ['scrap', 'data mining']
        if(any(key_words in job_title for key_words in keywords)):
            if(not any(remove_m_y in time_posted for remove_m_y in remove_month_year)):
                my_dict["job_title"].append(job_title)
                my_dict["time_posted"].append(time_posted)
                my_dict["number_of_proposal"].append(proposal)
                my_dict["page_link"].append(url)

My dictionary data looks like this:

{'job_title': ['web scraping of product reviews', 'yell web scraping in python', 'google business scraping',],'time_posted': ['6 days ago', '9 days ago', '3 days ago'], 'page_link': ['url1','url2','url3']}

My expected result would look like this:

{"job_title":"web scraping of product reviews","time_posted":"6 days ago","page_link":"url1"},{"job_title":"yell web scraping in python","time_posted":"9 days ago","page_link":"url2"}

A uj5u.com user replied:

You can change the structure with the following code:

my_list = []

for i in range(len(my_dict["job_title"])):
    my_list.append({
        "job_title": my_dict["job_title"][i],
        "time_posted": my_dict["time_posted"][i],
        "number_of_proposal": my_dict["number_of_proposal"][i],
        "page_link": my_dict["page_link"][i]
    })
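
Because the question asks for JSON specifically, the list of dictionaries still has to be serialized. Here is a minimal sketch using the standard-library json module. The sample values in my_list are copied from the question's example and only stand in for the scraped data, and the name json_text is made up for this illustration:

```python
import json

# Sample records copied from the question's example (stand-ins for scraped data).
my_list = [
    {"job_title": "web scraping of product reviews",
     "time_posted": "6 days ago", "page_link": "url1"},
    {"job_title": "yell web scraping in python",
     "time_posted": "9 days ago", "page_link": "url2"},
]

# json.dumps returns a JSON string; indent=2 is only there for readability.
json_text = json.dumps(my_list, indent=2)
print(json_text)
```

To write the result to a file instead, json.dump(my_list, file_object) takes an open file as its second argument.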

Better still, build the list directly in the first loop, in the shape you ultimately need.

my_list = []
for i in box:
    job_title = i.select('.item__title?ListItem?2FRMT')[0].text.lower()
    job_title = job_title.replace('opportunity', ' opportunity').replace(
        'urgent', ' urgent').strip()
    print(job_title)
    time_posted = i.select('time')[0].text.lower()
    remove_month_year = ["month", "year"]
    print(time_posted)
    proposal = i.select(
        '.item__info?ListItem?1ci50 li:nth-child(3)')[0].text.replace('Proposals', '').strip()
    keywords = ['scrap', 'data mining']
    if(any(key_words in job_title for key_words in keywords)):
        if(not any(remove_m_y in time_posted for remove_m_y in remove_month_year)):
            # proposal and url are the names defined in the question's code
            my_list.append({
                "job_title": job_title,
                "time_posted": time_posted,
                "number_of_proposal": proposal,
                "page_link": url
            })
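
The two nested if conditions in the loop can also be pulled into a small helper, which makes the filter easier to test without hitting the site. This is only a sketch: the names should_keep, KEYWORDS and EXCLUDED_AGE_UNITS are hypothetical and do not appear in the original code, while the keyword and unit values are taken from the question.

```python
# Values taken from the question; the constant and function names are hypothetical.
KEYWORDS = ("scrap", "data mining")
EXCLUDED_AGE_UNITS = ("month", "year")


def should_keep(job_title, time_posted):
    """Return True if the title contains a keyword and the post is not months or years old."""
    title = job_title.lower()
    age = time_posted.lower()
    has_keyword = any(keyword in title for keyword in KEYWORDS)
    is_too_old = any(unit in age for unit in EXCLUDED_AGE_UNITS)
    return has_keyword and not is_too_old


print(should_keep("web scraping of product reviews", "6 days ago"))    # True
print(should_keep("web scraping of product reviews", "2 months ago"))  # False
print(should_keep("logo design", "1 day ago"))                         # False
```

Inside the loop, the two if statements would then collapse to a single `if should_keep(job_title, time_posted):` before the append.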

A uj5u.com user replied:

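I think you have defined your data structure incorrectly. Judging by your expected result, I understand that you want: {"job_title": "title 1", "time_posted":"6 days ago" ... }, {"job_title": "title2"...}

That is, a list of dictionaries. What you have now is a dictionary whose values are lists.

You have two options:

1.- Process your dictionary to get the structure you want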

final_list = []

for _ in range(len(my_dict["job_title"])):
    item_dict = {}
    for key in my_dict:
        item_dict[key] = my_dict[key].pop(0)
    final_list.append(item_dict)

print(final_list) 
# [{'job_title': 'web scraping of product reviews', 'time_posted': '6 days ago', 'page_link': 'url1'}, {'job_title': 'yell web scraping in python', 'time_posted': '9 days ago', 'page_link': 'url2'}, {'job_title': 'google business scraping', 'time_posted': '3 days ago', 'page_link': 'url3'}]

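2.- The same as what user jugi suggested, which is the best option. He had already answered while I was writing this, but I am posting it anyway because my option 1 is slightly different.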
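
Note that option 1 uses pop(0), so it empties the lists in my_dict as it goes. If the original dictionary should stay intact, one possible alternative (a sketch, not part of the original answer) is to transpose it with zip. The sample data below is copied from the question:

```python
# Sample data copied from the question's example.
my_dict = {
    "job_title": ["web scraping of product reviews", "yell web scraping in python"],
    "time_posted": ["6 days ago", "9 days ago"],
    "page_link": ["url1", "url2"],
}

# zip(*my_dict.values()) yields one tuple per entry; pairing each tuple with
# the keys rebuilds one dictionary per entry without modifying my_dict.
final_list = [dict(zip(my_dict.keys(), row)) for row in zip(*my_dict.values())]

print(final_list)
print(my_dict["job_title"])  # the source lists are unchanged
```

zip stops at the shortest list, so entries are silently dropped if the lists have different lengths.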

A uj5u.com user replied:

You can use a comprehension to create a dictionary for each entry:

# Just using x because it's shorter. This does not create a copy
x = my_dict
x = [{'job_title': x['job_title'][i], 'time_posted': x['time_posted'][i],
      'page_link': x['page_link'][i]} for i in range(len(x['page_link']))]

>>> x
[{'job_title': 'web scraping of product reviews',
  'page_link': 'url1',
  'time_posted': '6 days ago'},
 {'job_title': 'yell web scraping in python',
  'page_link': 'url2',
  'time_posted': '9 days ago'},
 {'job_title': 'google business scraping',
  'page_link': 'url3',
  'time_posted': '3 days ago'}]

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/472716.html

Tags: Python json python-3.x list beautifulsoup
