I want to store the image URL in a CSV file (opened in Excel), but I get "data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" instead of the image URL.
class NewsSpider(scrapy.Spider):
    name = "articles"

    def start_requests(self):
        url = input("Enter the article url: ")
        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        Feature_Image = response.xpath('//*[@id="article-wrapper"]/article/section[2]/div/div/div/img//@src').get()
        Feature_Image = response.urljoin(Feature_Image)
        yield {
            'Publication Date': Published_Date,
            'Feature_Image': Feature_Image,
            'Article Content': Content
        }
        # =============== Data Store
        Data = [[Category, Headlines, Author, Source, Published_Date, Feature_Image, Content, url]]
        try:
            df = pd.DataFrame(Data, columns=['Category', 'Headlines', 'Author', 'Source', 'Published_Date', 'Feature_Image', 'Content', 'URL'])
            print(df)
            with open('C:/Users/Public/pagedata.csv', 'a') as f:
                df.to_csv(f, header=False)
        except:
            df = pd.DataFrame(Data, columns=['Category', 'Headlines', 'Author', 'Source', 'Published_Date', 'Feature_Image', 'Content', 'URL'])
            print(df)
            df.to_csv('C:/Users/Public/pagedata.csv', mode='a')
Answer from a uj5u.com user:
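The image URL is already absolute, so there is no need to pass it through the urljoin() method. Your XPath expression selects only a single image, so drop the extra forward slash before @src (/@src rather than //@src).
The real reason you are not getting the correct image URL is that @src holds the placeholder data URI you see in your output; the original image URL is stored in the @data-src attribute.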
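The attribute choice described above can be sketched as a small standard-library helper. This is only an illustration: `pick_image_url` is a hypothetical name, and the assumption that lazy-loaded pages put the real URL in data-src and a data: placeholder in src is taken from this one page, not from any general rule.

```python
from urllib.parse import urljoin


def pick_image_url(attrs, base_url):
    """Return the real image URL from an <img> tag's attributes, or None.

    attrs is a plain dict of attribute names to values (hypothetical helper).
    """
    # Prefer data-src (assumed to hold the real URL on lazy-loading pages),
    # then fall back to src.
    for name in ("data-src", "src"):
        value = attrs.get(name)
        # Skip missing values and inline data: URIs, which are placeholders here.
        if value and not value.startswith("data:"):
            # urljoin leaves absolute URLs unchanged and resolves relative ones.
            return urljoin(base_url, value)
    return None


placeholder = "data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="
attrs = {
    "src": placeholder,
    "data-src": "https://skift.com/wp-content/uploads/2022/10/American_Express_office_in_Rome-1-e1665181357253-1024x682.jpg",
}
page = "https://skift.com/2022/10/08/american-express-travels-rebound-and-other-top-stories-this-week/"
print(pick_image_url(attrs, page))
```

In a spider, the dict could come from the selected img element's attributes; whether every article on the site follows the same markup is not something the thread confirms.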
Try:
import scrapy


class NewsSpider(scrapy.Spider):
    name = "articles"

    def start_requests(self):
        # https://skift.com/2022/10/08/american-express-travels-rebound-and-other-top-stories-this-week/
        url = input("Enter the article url: ")
        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        Feature_Image = response.xpath('//*[@id="article-wrapper"]/article/section[2]/div/div/div/img/@data-src').get()
        yield {
            #'Publication Date': Published_Date,
            'Feature_Image': Feature_Image,
            #'Article Content': Content
        }
Output:
{'Feature_Image': 'https://skift.com/wp-content/uploads/2022/10/American_Express_office_in_Rome-1-e1665181357253-1024x682.jpg'}
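To get that URL into the CSV file the question mentions, one option is to append rows with the standard csv module and write the header only when the file is new. This is a minimal sketch, not the asker's code: `append_row` is a hypothetical helper, and the column names are copied from the DataFrame in the question.

```python
import csv
import os

# Column names taken from the question's DataFrame.
FIELDS = ["Category", "Headlines", "Author", "Source",
          "Published_Date", "Feature_Image", "Content", "URL"]


def append_row(path, row):
    """Append one article row (a dict) to a CSV file, writing the header once."""
    # Write the header only if the file does not exist yet or is empty.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module control line endings,
    # which avoids blank rows between records on Windows.
    with open(path, "a", newline="", encoding="utf-8") as f:
        # restval fills any column missing from the row with an empty string.
        writer = csv.DictWriter(f, fieldnames=FIELDS, restval="")
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Calling append_row('C:/Users/Public/pagedata.csv', {'Feature_Image': Feature_Image, 'URL': response.url}) from the callback would add one line per scraped article. Scrapy's built-in feed export (for example, running the spider with -o pagedata.csv) is another way to write yielded items to CSV without extra code.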
