Remove everything after .com in Python

I have a file, urls.tmp, that contains 3 URLs:

https://site1.com.br/wp-content/uploads/2020/06/?SD
https://site2.com.br/wp-content/uploads/tp-datademo/home-4/data/tp-hotel-booking/?SD
https://site3.com.br/wp-content/uploads/revslider/hotel-home/?MD

I want to remove everything after each "com.br/".

I tried this code:

# open the file
sys.stdout = open("urls.tmp", "w")

# start remove
for i in "urls.tmp":
    url_parts = urllib.parse.urlparse(i)
    result = '{uri.scheme}://{uri.netloc}/'.format(uri=url_parts)
    print(result) #overwrite the file

# close the file
sys.stdout.close()

But the output gave me this strange result:

:///
:///
:///
:///
:///
:///
:///
:///

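I'm a beginner. What am I doing wrong?

uj5u.com user reply:

You are iterating over the string "urls.tmp" itself, one character at a time, whereas you want to go through the opened file object line by line.

So try this: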

import urllib.parse

with open("urls.tmp", "r") as urls_file:
    for line in urls_file:
        # strip the trailing newline before parsing
        url_parts = urllib.parse.urlparse(line.strip())
        result = "{uri.scheme}://{uri.netloc}/".format(uri=url_parts)
        print(result)
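
To see why the original attempt printed `:///` once per line, here is a small sketch. The names `chars` and `outputs` are illustrative only and do not come from the question; the sketch simply repeats the loop from the question over the string "urls.tmp" and collects the results instead of printing them.

```python
import urllib.parse

# Iterating over a str yields its characters, not the lines of the file it names.
chars = list("urls.tmp")

# Each single character parses to an empty scheme and an empty netloc,
# so the format string collapses to ":///".
outputs = []
for ch in chars:
    parts = urllib.parse.urlparse(ch)
    outputs.append("{uri.scheme}://{uri.netloc}/".format(uri=parts))

print(len(outputs))   # one entry per character of "urls.tmp"
print(set(outputs))   # every entry is the same string
```

The string "urls.tmp" has eight characters, which matches the eight `:///` lines in the question.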

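Edit: the author updated the original question to mention that the source file should be overwritten with the processed URLs. Here is an example: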

import urllib.parse

new_urls = []

# read all lines first, so the file can be reopened for writing afterwards
with open("urls.tmp", "r") as urls_file:
    old_urls = urls_file.readlines()

for line in old_urls:
    url_parts = urllib.parse.urlparse(line)
    proc_url = "{uri.scheme}://{uri.netloc}/\n".format(uri=url_parts)
    new_urls.append(proc_url)

with open("urls.tmp", "w") as urls_file:
    urls_file.writelines(new_urls)

uj5u.com user reply:

See Savva Surenkov's answer for the fix to your problem.

You can also use the string split method, for example:

url = r"https://site1.com.br/wp-content/uploads/2020/06/?SD"

split_by = "com.br/"

new_url = url.split(split_by)[0] + split_by
# this gives you the part before <split_by>, and then we attach <split_by> again
assert new_url == "https://site1.com.br/"

If you want to add some extra checks, you could look into regular expressions.
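
As a rough sketch of what such a check could look like: the pattern and the helper name `strip_after_com_br` below are assumptions made for illustration, not part of the original answer. The pattern only accepts http or https URLs whose host ends in ".com.br" and returns None for anything else.

```python
import re

# Hypothetical pattern: scheme, "://", a host ending in ".com.br", then "/".
PATTERN = re.compile(r"^(https?://[^/\s]+\.com\.br/)")

def strip_after_com_br(url):
    """Return the URL up to and including 'com.br/', or None if it does not match."""
    match = PATTERN.match(url)
    return match.group(1) if match else None

print(strip_after_com_br("https://site1.com.br/wp-content/uploads/2020/06/?SD"))
print(strip_after_com_br("https://example.org/page"))
```

Returning None for non-matching input makes it easy to skip or log unexpected lines rather than silently writing a mangled URL.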

Something you did not ask about, but which may help you as a beginner: I suggest using

with open("urls.tmp", "w") as f:
    ...  # do something with f

or

import pathlib

urls = pathlib.Path("urls.tmp").read_text()
# which gives you all lines in single string

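instead of a bare open. If you want to learn more, I suggest reading about context managers.

f-strings have been available since Python 3.6, and I find them preferable to "{}".format.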
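
For comparison, here is a minimal sketch of the same formatting step written with an f-string. The variable names `parts` and `result` are chosen for illustration; the URL is one of the three from the question.

```python
import urllib.parse

url = "https://site3.com.br/wp-content/uploads/revslider/hotel-home/?MD"
parts = urllib.parse.urlparse(url)

# Equivalent to "{uri.scheme}://{uri.netloc}/".format(uri=parts)
result = f"{parts.scheme}://{parts.netloc}/"
print(result)
```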

uj5u.com user reply:

You can also use the string's find() method.

urllist=[
'https://site1.com.br/wp-content/uploads/2020/06/?SD',
'https://site2.com.br/wp-content/uploads/tp-datademo/home-4/data/tp-hotel-booking/?SD',
'https://site3.com.br/wp-content/uploads/revslider/hotel-home/?MD']

newlist=[]
breaktext='com.br/'
for item in urllist:
    position=item.find(breaktext)
    # keep everything up to and including breaktext
    newlist.append(item[:position + len(breaktext)])

print(newlist)
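
One caveat with find(): it returns -1 when the text is not present, and the slice would then cut the URL at an arbitrary position. A minimal sketch of a guarded version follows; the helper name `cut_after` and the choice to return the URL unchanged when the marker is missing are assumptions, not part of the original answer.

```python
def cut_after(url, marker="com.br/"):
    """Keep everything up to and including marker; return url unchanged if marker is absent."""
    position = url.find(marker)
    if position == -1:
        # marker not found: leave the URL as it is rather than slicing blindly
        return url
    return url[:position + len(marker)]

print(cut_after("https://site2.com.br/wp-content/uploads/tp-datademo/home-4/data/tp-hotel-booking/?SD"))
print(cut_after("https://example.org/page"))
```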

