import json
import re
import urllib.request

from bs4 import BeautifulSoup


class BlogApi(object):
    def __init__(self):
        # WordPress REST endpoint that lists the site's blog posts as JSON
        rotmgjson = "https://remaster.realmofthemadgod.com/index.php?rest_route=/wp/v2/posts/"
        with urllib.request.urlopen(rotmgjson) as url:
            self.post_json = json.loads(url.read().decode())

    async def content(self, thread=0, parse=True):
        """Returns the content of a blog post.
        Thread is 0 (latest) by default.
        Parse is True by default and returns (image URLs, text);
        with parse=False the raw rendered HTML string is returned."""
        dirty_content = self.post_json[thread]['content']['rendered']
        if not parse:
            return dirty_content
        soup = BeautifulSoup(dirty_content, features="html.parser")
        # Collect the src of every <img> whose URL contains ".png"
        images = [img.get('src')
                  for img in soup.find_all('img', {'src': re.compile(r'\.png')})]
        return images, soup.text
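I'm using the class above to get all the text and image URLs out of an HTML string. The full string looks like this: https://controlc.com/c3cdf2ef.
My problem is that the image URLs obviously don't end up in the same string as the text. My goal is to have them appear at the same positions relative to the text as they do on the web page. For example, the string I return should look like this: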
https://remaster.realmofthemadgod.com/wp-content/uploads/2022/05/steam_forgerenovation.png
Realmers,
The Forge is about to change. Coming in June the blacksmith will present the Heroes with her renovated forge. She’s equipped it with better and more reliable equipment, capable of making items no one thought could be made that way.
Here’s what’s going to change:
https://remaster.realmofthemadgod.com/wp-content/uploads/2022/04/c5c640b1-a033-4547-aabb-5af37a8ce4c5-1024x616.png ...
The real post is much longer and has more images, but that's the idea.
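An alternative to post-processing soup.text is to walk the parsed tree in document order and emit each text node or image URL as it is encountered, so images land exactly where they sit in the HTML. The sketch below assumes BeautifulSoup with html.parser; the function name interleave_text_and_images is hypothetical, and it is not the method used in the original post.

```python
from bs4 import BeautifulSoup, NavigableString, Tag


def interleave_text_and_images(html):
    """Hypothetical helper: return the page's text and <img> src URLs,
    one per line, in the order they appear in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    # soup.descendants yields every tag and string in document order
    for node in soup.descendants:
        if isinstance(node, Tag) and node.name == "img":
            src = node.get("src")
            if src:
                lines.append(src)
        # Exact type check skips Comment, Doctype and similar subclasses
        elif type(node) is NavigableString:
            text = node.strip()
            if text:
                lines.append(text)
    return "\n".join(lines)
```

Calling it on the post's rendered HTML (dirty_content in the class above) should give output shaped like the example. One trade-off of this simple walk: inline tags such as <b> or <a> split a sentence into several text nodes, so each piece ends up on its own line; grouping strings by their enclosing block element would be a possible refinement.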
Reply from a uj5u.com user:
You can replace each <img src=my_image.png/> element with text, for example its src:
for image in (images := soup.find_all('img', {'src': re.compile(r'\.png')})):
    image.replace_with(image.get('src'))
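That leaves you with nothing but text when you call soup.text. Still, this is more of a pragmatic workaround than anything elegant, let alone a recommended approach.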
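A self-contained run of the replace_with approach on a small stand-in for the post body is sketched below. The example.com URLs are placeholders, not the real post. One deviation from the answer is assumed: soup.text joins strings with no separator, so this sketch calls get_text("\n", strip=True) to put each paragraph and each URL on its own line, matching the desired output.

```python
import re

from bs4 import BeautifulSoup

# Illustrative HTML standing in for the rendered post (placeholder URLs)
html = (
    '<p><img src="https://example.com/banner.png"/></p>'
    "<p>Realmers,</p>"
    "<p>Here is what is going to change:</p>"
    '<p><img src="https://example.com/chart.png"/></p>'
)

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list, so replacing tags while looping over it is safe;
# each .png <img> becomes a plain string holding its URL, in the same spot
for image in soup.find_all("img", {"src": re.compile(r"\.png")}):
    image.replace_with(image.get("src"))

# Newline separator keeps URLs and paragraphs on separate lines;
# strip=True drops whitespace-only strings
text = soup.get_text("\n", strip=True)
print(text)
```

Because the images are swapped for strings in place, their URLs come out exactly where the images sat in the markup, with no separate images list to merge back in.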
