I have a CSV file containing a list of HTTP URLs. I need to check whether each listed URL can be reached over HTTP. How can I do that?
A uj5u.com user replied:
You can check the URLs with a Python script.
As input, you need a CSV file with this structure (the script reads it from links.csv):
name,link
google,https://google.com
bla,https://doesnot.exist.com
Copy the following Python code into a file named check_url.py
Then run it: python3 check_url.py
import csv
import socket
import urllib.error
import urllib.parse
import urllib.request


# Return 1 if the hostname resolves via DNS, otherwise 0.
def hostname_resolves(hostname):
    try:
        socket.gethostbyname(hostname)
        return 1
    except socket.error:
        return 0


# Read the input file: the first row is the header, the rest are data rows.
with open("links.csv", newline="", encoding="UTF8") as file:
    csvreader = csv.reader(file)
    header = next(csvreader)
    rows = [row for row in csvreader]

# Iterate over the links and check whether each one can be reached
# and responds with a valid HTTP response code.
for row in rows:
    # extract url
    url = row[1]
    print("check url: " + url)

    # extract host (hostname drops any port or credentials from the netloc)
    parsed_url = urllib.parse.urlparse(url)
    host = parsed_url.hostname

    # try to resolve host over dns
    resolvable = hostname_resolves(host) if host else 0

    # if the host could be resolved, try to do an http request
    url_reachable_over_http = 0
    if resolvable == 1:
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                http_status_code = response.getcode()
        except urllib.error.HTTPError as error:
            # urlopen raises HTTPError for 4xx/5xx responses; the server still answered
            http_status_code = error.code
        except (urllib.error.URLError, OSError):
            # no HTTP response at all: refused connection, timeout, TLS failure, ...
            http_status_code = None
        if http_status_code is not None and http_status_code < 500:
            url_reachable_over_http = 1
    row.append(url_reachable_over_http)

# write the result to a new csv file
with open("links_checked_result.csv", "w", newline="", encoding="UTF8") as f:
    writer = csv.writer(f)
    # write the header (unchanged from the input file)
    writer.writerow(header)
    # write the data
    writer.writerows(rows)
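As an aside on the HTTP step of the script: urllib.request.urlopen raises HTTPError for 4xx and 5xx responses instead of returning them, so a check that only reads getcode() never sees those codes. The following is a minimal sketch of that step factored into a function, not the answer's own code. It assumes a HEAD request is sufficient (no response body is downloaded), and the names http_status, is_reachable, and the opener parameter are hypothetical.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, or None if no response arrived.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib.request.urlopen.
    """
    # Assumption: HEAD is enough to tell whether the server answers.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx: the server answered, so a status code is available.
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, TLS error, ...
        return None


def is_reachable(status):
    """Apply the same rule as the script: any status below 500 counts."""
    return 1 if status is not None and status < 500 else 0
```

With network access, is_reachable(http_status(url)) would stand in for the inline check in the loop. A server that rejects HEAD with 405 still counts as reachable under the "below 500" rule.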
The script's output should be a file links_checked_result.csv with the following content:
name,link
google,https://google.com,1
bla,https://doesnot.exist.com,0
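In the result shown above, the header row has two columns while each data row has three, because the script writes the input header unchanged. If a label for the new column is wanted, one option is to extend the header before writing. This is a minimal sketch under assumptions: the column name reachable and the helper write_result are hypothetical, and an in-memory buffer replaces the output file so the example runs without touching disk.

```python
import csv
import io


def write_result(out, header, rows, column_name="reachable"):
    """Write header plus one extra column label, then the checked rows."""
    writer = csv.writer(out)
    # The header gains one label so it lines up with the appended 0/1 value.
    writer.writerow(header + [column_name])
    writer.writerows(rows)


# Sample data mirroring the rows from the example above.
header = ["name", "link"]
rows = [
    ["google", "https://google.com", 1],
    ["bla", "https://doesnot.exist.com", 0],
]

buffer = io.StringIO(newline="")
write_result(buffer, header, rows)
print(buffer.getvalue())
```

In the script itself, the same helper could be given the file object opened for links_checked_result.csv instead of the buffer.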
