I am trying to scrape this page:
https://workspace.google.com/marketplace/search/word
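I first tried PhantomJS with BeautifulSoup (which failed), then used Playwright to grab the full content of the page, but I cannot see the links to the extensions. Are they only generated when the cursor hovers over them? Is there a way to get them?
Here is the Playwright code I used: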
from playwright.sync_api import Playwright, sync_playwright, expect


def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=True)
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://workspace.google.com/marketplace/search/word")
    print(page.content())

    # ---------------------
    context.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)
Answer:
You can try the following example, which uses Playwright together with bs4.
Code:
from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

data = []
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    page = context.new_page()
    page.goto('https://workspace.google.com/marketplace/search/word')
    # Give the page's JavaScript a few seconds to render the result cards.
    page.wait_for_timeout(4000)
    soup = BeautifulSoup(page.content(), 'lxml')
    browser.close()

# Each result card is an <a class="RwHvCd"> with a relative href starting with "./".
for card in soup.select('a.RwHvCd'):
    link = 'https://workspace.google.com' + card.get('href').replace('./', '/')
    data.append(link)
    print(link)
Output:
https://workspace.google.com/marketplace/app/word_cloud_generator/360115564222
https://workspace.google.com/marketplace/app/pdf_to_word_doc_converter/363901784508
https://workspace.google.com/marketplace/app/word_cloud_generator/1066049374643
https://workspace.google.com/marketplace/app/word_cloud_for_docs/58662699010
https://workspace.google.com/marketplace/app/drive_word_cloud/401630517929
https://workspace.google.com/marketplace/app/word_search_game_with_google_drive/766902391959
https://workspace.google.com/marketplace/app/word_cloud_generator/251884431535
https://workspace.google.com/marketplace/app/word_counter_max_for_google_docs/364683295233
https://workspace.google.com/marketplace/app/online_word_cloud/275091946896
https://workspace.google.com/marketplace/app/bjorns_word_clouds/423122543905
https://workspace.google.com/marketplace/app/glue_word_finder/295291845080
https://workspace.google.com/marketplace/app/bjorns_3d_word_clouds/845381453179
https://workspace.google.com/marketplace/app/bjorns_word_clouds/775726653147
https://workspace.google.com/marketplace/app/bjorns_wordzones/635623875397
https://workspace.google.com/marketplace/app/dochub_pdf_sign_and_edit/1179802238
https://workspace.google.com/marketplace/app/doctopus/979668934766
https://workspace.google.com/marketplace/app/yet_another_mail_merge_mail_merge_for_gm/52669349336
https://workspace.google.com/marketplace/app/form_publisher/827172627657
https://workspace.google.com/marketplace/app/tex_equation_editor/197218123452
https://workspace.google.com/marketplace/app/clozeit/679357385347
https://workspace.google.com/marketplace/app/spellright/225027406829
https://workspace.google.com/marketplace/app/vocabularycom/903174061747
https://workspace.google.com/marketplace/app/mail_merge/218858140171
https://workspace.google.com/marketplace/app/create_print_labels_label_maker_for_aver/585829216542
https://workspace.google.com/marketplace/app/mailmeteor_mail_merge_for_gmail/1008170693301
https://workspace.google.com/marketplace/app/adobe_acrobat_
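Judging by the output above, the links are already present in the rendered DOM and do not depend on hovering; the page just needs time to render before its content is read. As a possible variation, here is a minimal sketch that waits for the result links to appear instead of sleeping for a fixed 4 seconds, and builds the absolute URLs with urllib.parse.urljoin instead of string replacement. The names to_absolute and collect_links are hypothetical helpers, the a.RwHvCd selector is simply copied from the code above and may stop matching if the site changes its markup, and resolving the hrefs against the site root is an assumption inferred from the output above.

```python
from urllib.parse import urljoin

# Assumption: the relative hrefs resolve against the site root, as the output above suggests.
BASE_URL = "https://workspace.google.com/"
SEARCH_URL = "https://workspace.google.com/marketplace/search/word"
# Selector copied from the answer above; it may change whenever the site is redeployed.
CARD_SELECTOR = "a.RwHvCd"


def to_absolute(href: str) -> str:
    """Turn a relative href such as './marketplace/app/...' into an absolute URL."""
    return urljoin(BASE_URL, href)


def collect_links(url: str = SEARCH_URL) -> list:
    """Open the search page, wait for the result links, and return absolute URLs."""
    # Imported here so that to_absolute() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # headless=False mirrors the answer above; headless mode is untested here.
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto(url)
        # Block until at least one result link is visible (default timeout: 30 s).
        page.wait_for_selector(CARD_SELECTOR)
        # Read every href in a single round trip to the browser.
        hrefs = page.locator(CARD_SELECTOR).evaluate_all(
            "els => els.map(e => e.getAttribute('href'))"
        )
        browser.close()
    return [to_absolute(h) for h in hrefs if h]
```

Calling collect_links() and printing each item should give the same kind of list as the output above, provided the selector still matches.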
