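I have a URL and need to scrape all of its images with the mechanize gem, but some image URLs only appear in <link rel="icon"> tags.
I need to get the image from this element: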
<link rel="icon" href="https://mywebsite.com/wp-content/uploads/2021/10/cropped-favicon-32x32.png" sizes="32x32">
Here is the code I tried. It only picks up images from <img> tags. How can I get both kinds in a single result?
require 'mechanize'
url = "https://mywebsite.com/"
agent = Mechanize.new
page = agent.get(url)
page.images.each do |image|
  puts image # prints every image found in an <img> tag
end
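One way to get both into a single list is to collect the <img> sources and the icon hrefs separately, then resolve relative paths against the page URL and drop duplicates. The helper below is a hypothetical sketch (all_image_urls is not part of Mechanize) that uses only URI.join from Ruby's standard library:

```ruby
require 'uri'

# Hypothetical helper (not part of Mechanize): merges <img> sources and
# <link rel="icon"> hrefs into one list of absolute, de-duplicated URLs.
# page_url is the address the page was fetched from; relative references
# such as "/logo.png" or "b.jpg" are resolved against it.
# URI.join raises URI::InvalidURIError on malformed references.
def all_image_urls(page_url, img_srcs, icon_hrefs)
  # drop missing attributes and blank values
  refs = (img_srcs + icon_hrefs).compact.map(&:strip).reject(&:empty?)
  absolute = refs.map { |ref| URI.join(page_url, ref).to_s }
  absolute.uniq
end
```

With Mechanize, the two inputs could come from page.images.map(&:src) and from page.xpath('//link[contains(@rel, "icon")]').map { |l| l['href'] }.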
Answer:
I looked at Mechanize's page.links, but it only returns anchors (<a> elements), so I tried XPath:
page.xpath('//link[contains(@rel, "icon")]').each do |icon|
  p icon.attr('href')
end
and got:
"https://ownwebsite.com/wp-content/uploads/2021/10/cropped-favicon-32x32.png"
"https://ownwebsite.com/wp-content/uploads/2021/10/cropped-favicon-192x192.png"
"https://ownwebsite.com/wp-content/uploads/2021/10/cropped-favicon-180x180.png"
Here is a Replit that returns all the images.
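XPath's contains() is a case-sensitive substring test on the whole rel attribute. In HTML, rel holds a space-separated list of link types that is compared case-insensitively, so rel="Shortcut Icon" would slip past the query. A token-based check is a slightly stricter alternative; icon_rel? below is a hypothetical predicate, and the icon-like values beyond the standard icon keyword are assumptions based on common vendor extensions:

```ruby
# Link types treated as icon-like. "icon" is the standard HTML keyword;
# the apple-* and mask-icon values are assumed vendor extensions.
ICON_RELS = %w[icon apple-touch-icon apple-touch-icon-precomposed mask-icon].freeze

# rel is a space-separated, case-insensitive token list, so each token is
# compared on its own instead of searching for a substring.
def icon_rel?(rel)
  rel.to_s.downcase.split.any? { |token| ICON_RELS.include?(token) }
end
```

It could be applied to a Mechanize page with page.search('link').select { |l| icon_rel?(l['rel']) }.map { |l| l['href'] }.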
Answer:
page.search('link').each do |link|
  href = link['href'].to_s
  # print <link> hrefs that contain a common image extension
  if [".gif", ".png", ".jpg", ".jpeg"].any? { |ext| href.include?(ext) }
    puts href
  end
end
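The include? checks match an extension anywhere in the string, so a URL such as logo.png.html is accepted, while an uppercase name such as LOGO.PNG is missed. A sketch of a stricter test, assuming you only care about the extension of the URL path: it parses the href with URI and compares File.extname case-insensitively. The name image_href? is hypothetical, and the extension list (including .ico, .svg and .webp) is an assumption to adjust as needed:

```ruby
require 'uri'

# Assumed list of image extensions; not exhaustive.
IMAGE_EXTENSIONS = %w[.gif .png .jpg .jpeg .ico .svg .webp].freeze

# True when the URL's path ends in an image extension. Parsing first means
# query strings ("?ver=2") and fragments are ignored, and "logo.png.html"
# is not mistaken for an image. Unparseable hrefs count as non-images.
def image_href?(href)
  path = URI.parse(href.to_s).path.to_s
  IMAGE_EXTENSIONS.include?(File.extname(path).downcase)
rescue URI::InvalidURIError
  false
end
```

In the loop above, the condition would become image_href?(href).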
