我想從這個 url 中抓取 3 個表https://ens.dk/sites/ens.dk/files/OlieGas/mp202112ofu.htm
表 1:石油產量 表 2:天然氣產量 表 3:水產量
不需要包含圖表只需 3 個表格
我已經撰寫了代碼來抓取鏈接,但不知道如何從鏈接中抓取表格
import io
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs, SoupStrainer
import re
url = "https://ens.dk/en/our-services/oil-and-gas-related-data/monthly-and-yearly-production"
first_page = requests.get(url)
soup = bs(first_page.content)
def pasrse_page(link):
print(link)
df = pd.read_html(link, skiprows=1, headers=1)
return df
def get_gas_links():
glinks=[]
gas_links = soup.find('table').find_all('a')
for i in gas_links:
extracted_link = i['href']
#you can validate the extracted link however you want
glinks.append("https://ens.dk/" extracted_link)
return glinks
get_gas_links()
輸出鏈接
轉載請註明出處,本文鏈接:https://www.uj5u.com/shujuku/424263.html
