I have a nasty string that was converted from HTML, like this:
<p><topic url="car-colours">Toyota Camry</topic> has <a href="/colours/dark-red">Dark Red</a><span> (2020)</span>, <a href="/colours/pearl-white">Pearl White</a><span> (2016 - 2017)
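I want to extract the colour names from this string and put them in a list. I was thinking I could extract every substring between the ">" and "<" characters, since all the colours sit between them, but I don't know how.
My goal is a list that stores all the colours of the Toyota Camry, for example: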
toyota_camry_colours = ["Dark Red", "Pearl White"]
Any ideas how I can do this? In bash I would use something like grep or awk, but I don't know the Python equivalent.
A uj5u.com user replied:
The BeautifulSoup module is designed for parsing HTML.
from bs4 import BeautifulSoup
html = """\
<p><topic url="car-colours">Toyota Camry</topic> has <a href="/colours/dark-red">Dark Red</a><span> (2020)</span>, <a href="/colours/pearl-white">Pearl White</a><span> (2016 - 2017)"""
soup = BeautifulSoup(html, 'html.parser')
# Each colour name is the text of an <a> element.
for link in soup.find_all('a'):
    print(link.text)
Output:
Dark Red
Pearl White
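The loop above prints the colours, while the question asks for a list. With BeautifulSoup, a list comprehension such as [link.text for link in soup.find_all('a')] should do it. If installing a third-party package is not an option, here is a rough sketch of the same idea using only the standard library's html.parser. The class name ColourLinkParser and its attributes are made up for this example, and it assumes every <a> element in the string is a colour link, which holds for the sample but may not hold for a full page.

```python
from html.parser import HTMLParser


class ColourLinkParser(HTMLParser):
    """Collect the text of every <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.colours = []      # one entry per <a> element seen
        self._in_link = False  # True while we are inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self.colours.append("")  # start a new, empty entry

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current entry.
        if self._in_link:
            self.colours[-1] += data


html = (
    '<p><topic url="car-colours">Toyota Camry</topic> has '
    '<a href="/colours/dark-red">Dark Red</a><span> (2020)</span>, '
    '<a href="/colours/pearl-white">Pearl White</a><span> (2016 - 2017)'
)

parser = ColourLinkParser()
parser.feed(html)
parser.close()

toyota_camry_colours = parser.colours
print(toyota_camry_colours)  # ['Dark Red', 'Pearl White']
```

Like BeautifulSoup's html.parser backend, this tolerates the unclosed <p> and <span> tags in the sample string.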
A uj5u.com user replied:
A simple regular expression will help: /colours/([\w-]+)
import re
txt = '<p><topic url="car-colours">Toyota Camry</topic> has <a href="/colours/dark-red">Dark Red</a><span>' \
' (2020)</span>, <a href="/colours/pearl-white">Pearl White</a><span> (2016 - 2017)'
colors = re.findall(r"/colours/([\w-]+)", txt)
print(colors) # ['dark-red', 'pearl-white']
colors = [" ".join(word.capitalize() for word in color.split("-")) for color in colors]
print(colors) # ['Dark Red', 'Pearl White']
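The approach above rebuilds the names from the URL slug, so it depends on the slug matching the displayed name word for word. A variation closer to what the question describes (taking the text between ">" and "<") is to capture the link text itself. This is only a sketch: it assumes each colour link has exactly the form <a href="/colours/...">Name</a> with no extra attributes or nested tags, and a regex like this is fragile on real-world HTML.

```python
import re

txt = (
    '<p><topic url="car-colours">Toyota Camry</topic> has '
    '<a href="/colours/dark-red">Dark Red</a><span> (2020)</span>, '
    '<a href="/colours/pearl-white">Pearl White</a><span> (2016 - 2017)'
)

# Match an <a> tag whose href starts with /colours/ and capture
# everything up to the next "<", i.e. the visible link text.
toyota_camry_colours = re.findall(r'<a href="/colours/[\w-]+">([^<]+)</a>', txt)
print(toyota_camry_colours)  # ['Dark Red', 'Pearl White']
```

This keeps the capitalisation exactly as it appears in the page, so no post-processing with capitalize() is needed.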
