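I have an HTML file with two tags, each containing a link and some text. I managed to replace the link, but I don't know how to replace the text inside the tag. I don't really understand how tags are modified, and I'd like to learn.
My code: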
import requests
from bs4 import BeautifulSoup

link = 'http://127.0.0.1:5500/dat.html'
response = requests.get(link).text
with open('parse.html', 'w', encoding='utf-8') as file:
    file.write(response)
soup = BeautifulSoup(response, 'lxml')
res = response.replace("https://www.google.com/", "https://reddit.com/")
with open("parse.html", "w") as outf:
    outf.write(res)
html:
<body>
<h1>
<a href="https://google.com/" target="_blank">google</a>
</h1>
<h1>
<a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>
I need:
<body>
<h1>
<a href="https://reddit.com/" target="_blank">reddit</a>
</h1>
<h1>
<a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>
Answer from a uj5u.com user:
You can select all the relevant <a> tags and change their attributes and their .string:
from bs4 import BeautifulSoup
html_doc = """
<body>
<h1>
<a href="https://google.com/" target="_blank">google</a>
</h1>
<h1>
<a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a>
</h1>
</body>
"""
soup = BeautifulSoup(html_doc, "html.parser")
for a in soup.select('a[href*="google.com"]'):
    a["href"] = "https://reddit.com/"
    a.string = "reddit"
print(soup.prettify())
Prints:
<body>
 <h1>
  <a href="https://reddit.com/" target="_blank">
   reddit
  </a>
 </h1>
 <h1>
  <a href="https://ru.wikipedia.org/wiki/" target="_blank">
   wiki
  </a>
 </h1>
</body>
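To connect this back to the question's goal of producing an updated file, here is a minimal sketch that wraps the same idea in a function. The helper name `retarget_links` and its parameters are hypothetical, chosen only for illustration. It matches on a substring of the href, which is a simplifying assumption: a URL that merely contains "google.com" somewhere in its path would also match.

```python
from bs4 import BeautifulSoup


def retarget_links(html, old_domain, new_href, new_text):
    """Rewrite every <a> whose href contains old_domain (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        if old_domain in a["href"]:
            a["href"] = new_href  # tag attributes are accessed like dict entries
            a.string = new_text   # assigning .string replaces the tag's contents
    # str(soup) serializes the modified tree without prettify()'s extra whitespace
    return str(soup)


html_doc = """<body>
<h1><a href="https://google.com/" target="_blank">google</a></h1>
<h1><a href="https://ru.wikipedia.org/wiki/" target="_blank">wiki</a></h1>
</body>"""

result = retarget_links(html_doc, "google.com", "https://reddit.com/", "reddit")
print(result)
```

The returned string can then be written to parse.html, opened with encoding='utf-8' as in the question. Using str(soup) instead of prettify() avoids adding line breaks inside the <a> tags, which some layouts render as extra spaces around the link text.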
