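I want to scrape the website https://ens.dk/en/our-services/oil-and-gas-related-data/monthly-and-yearly-production. It has 2 sets of links: SI units and Oil Field units.
I am trying to scrape the list of SI units links, and I created a function named get_gas_links: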
import io
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs, SoupStrainer
import re

url = "https://ens.dk/en/our-services/oil-and-gas-related-data/monthly-and-yearly-production"
first_page = requests.get(url)
soup = bs(first_page.content)

def pasrse_page(link):
    print(link)
    df = pd.read_html(link, skiprows=1, headers=1)
    return df

def get_gas_links():
    glinks=[]
    gas_links = soup.find_all("a", href = re.compile("si.htm"))
    for i in gas_links:
        glinks.append("https://ens.dk/" + i.get("herf"))
    return glinks

get_gas_links()
My main goal is to scrape 3 tables from every link. Before scraping the tables, however, I am trying to scrape the list of links.
But it shows the error: TypeError: must be str, not NoneType
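Reply from a uj5u.com user:
You are using the wrong regular expression, and in the wrong way. That is why soup cannot find any links that meet the criteria. You can check the following source and validate extracted_link however you need.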
def get_gas_links():
    glinks=[]
    gas_links = soup.find('table').find_all('a')
    for i in gas_links:
        extracted_link = i['href']
        #you can validate the extracted link however you want
        glinks.append("https://ens.dk/" + extracted_link)
    return glinks
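
Apart from the choice of selector, the TypeError in the question is consistent with i.get("herf") returning None: a tag's .get() returns None when the requested attribute does not exist, and "herf" looks like a typo for "href". Concatenating a str with None then fails. The following is a minimal sketch of the validation step mentioned above, using only the standard library. The helper name build_links, the sample href values, and the assumption that SI-unit links contain "si.htm" are illustrative and not confirmed against the live page.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://ens.dk/"

def build_links(hrefs, pattern=r"si\.htm", base=BASE_URL):
    """Return absolute URLs for the href values that match `pattern`.

    `hrefs` may contain None, which is what a tag's .get("href") yields
    when the attribute is missing (or misspelled, e.g. "herf").
    """
    matcher = re.compile(pattern)  # the dot is escaped so it matches a literal "."
    links = []
    for href in hrefs:
        if href is None:
            # Skip anchors without an href instead of raising TypeError
            continue
        if matcher.search(href):
            # urljoin handles leading slashes and already-absolute URLs
            links.append(urljoin(base, href))
    return links

# Hypothetical href values, standing in for something like
# [a.get("href") for a in soup.find_all("a")]
sample_hrefs = ["/example/prod_si.htm", None, "/example/prod_of.htm"]
print(build_links(sample_hrefs))
```

With a real page, the list passed to build_links could come from soup.find('table').find_all('a') as in the answer above. Using urljoin rather than plain string concatenation avoids a doubled slash when an href already starts with "/".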
