Generator comprehension with the open function

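I'm trying to figure out the best way to use a generator when parsing a file line by line. Which of these two uses of a generator comprehension is better?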

First option:

with open('some_file') as file:
    lines = (line for line in file)

Second option:

lines = (line for line in open('some_file'))

I know they produce the same result, but which one is faster or more efficient?

Answer:

You can't combine a generator expression and a context manager (the with statement) this way.

Generators are lazy. They don't actually read their source data until something requests an item from them.
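
To make the laziness concrete, here is a minimal sketch. The helper noisy_source() and the log list are hypothetical names used only for illustration; they record when the source is actually read.

```python
def noisy_source(log):
    """Yield three numbers, recording each read in `log` (hypothetical helper)."""
    for n in range(3):
        log.append(f"read {n}")
        yield n


log = []
squares = (n * n for n in noisy_source(log))  # creating the generator reads nothing
log_after_creation = list(log)

first = next(squares)  # the first item is pulled from the source only now
log_after_first = list(log)

print(log_after_creation)  # []
print(log_after_first)     # ['read 0']
```

Only the request made by next() causes the source to be touched, and only for a single item.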

This appears to work:

with open('some_file') as file:
    lines = (line for line in file)

But when you actually try to read a line later in the program

for line in lines:
    print(line)

it fails with ValueError: I/O operation on closed file.

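That is because the context manager has already closed the file (that is its sole purpose in life), while the generator does not start reading until the for loop begins requesting data.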
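
The failure can be reproduced end to end with a throwaway file. This is a sketch under the assumption that a temporary file created with tempfile stands in for 'some_file'; the variable names are illustrative.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\n")
    path = tmp.name

with open(path) as file:
    lines = (line for line in file)  # lazy: no line has been read yet
# The with block has ended, so the file is already closed here.

try:
    for line in lines:
        print(line)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
os.remove(path)
```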

Your second suggestion

lines = (line for line in open('some_file'))

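suffers from the opposite problem. You open() the file, but unless you manually close() it (which you can't, because you have no reference to the file handle), it stays open for an indeterminate time: nothing closes it explicitly, and CPython only closes it once the file object happens to be garbage collected. That is exactly the situation context managers solve.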
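
A short sketch of this behavior follows. It assumes a temporary file in place of 'some_file', and it keeps the file object in a variable named handle purely so that its state can be inspected, which the one-line version does not allow.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\n")
    path = tmp.name

handle = open(path)  # kept in a variable only so we can inspect it
lines = (line for line in handle)

collected = list(lines)         # exhaust the generator
still_open = not handle.closed  # reading to the end does not close the file

handle.close()  # only possible because we kept a reference
os.remove(path)
print(collected, still_open)
```

Reaching the end of the file is not the same as closing it, so even a fully consumed generator leaves the cleanup to chance.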

Overall, if you want to read the file, you can... read the file:

with open('some_file') as file:
    lines = list(file)

Or you can use a real generator:

def lazy_reader(*args, **kwargs):
    with open(*args, **kwargs) as file:
        yield from file

Then you can do

for line in lazy_reader('some_file', encoding="utf8"):
    print(line)

and lazy_reader() will close the file once the last line has been read.
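
The sketch below checks that claim, and also what happens when iteration stops early. tracked_reader() is a hypothetical variant of lazy_reader() that additionally stores the file object in a list so it can be inspected; it is not part of the original answer.

```python
import os
import tempfile


def tracked_reader(path, opened):
    """Like lazy_reader(), but stores the file object in `opened` for inspection."""
    with open(path) as file:
        opened.append(file)
        yield from file


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

# Case 1: read every line, so the generator runs to completion.
opened = []
all_lines = list(tracked_reader(path, opened))
closed_after_exhaustion = opened[0].closed

# Case 2: stop after the first line, then close the generator explicitly.
opened = []
reader = tracked_reader(path, opened)
first = next(reader)
open_while_suspended = not opened[0].closed
reader.close()  # raises GeneratorExit inside the generator; the with block cleans up
closed_after_close = opened[0].closed

os.remove(path)
print(closed_after_exhaustion, open_while_suspended, closed_after_close)
```

If a loop over such a generator is abandoned early, the file stays open until the generator is closed or garbage collected, so calling close() on it is the safe choice in that case.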

Answer:

If you want to test things like this, I recommend looking at the timeit module.

Let's set up a working version of your two tests. I'll add a few extra options, all of which perform about the same.

Here are several options:

def test1(file_path):
    with open(file_path, "r", encoding="utf-8") as file_in:
        return [line for line in file_in]

def test2(file_path):
    return [line for line in open(file_path, "r", encoding="utf-8")]

def test3(file_path):
    with open(file_path, "r", encoding="utf-8") as file_in:
        return file_in.readlines()

def test4(file_path):
    with open(file_path, "r", encoding="utf-8") as file_in:
        return list(file_in)

def test5(file_path):
    with open(file_path, "r", encoding="utf-8") as file_in:
        yield from file_in

Let's test them on a text file that is 10 times the complete works of Shakespeare, which I happen to have on hand for tests like this.

If I do:

print(test1('shakespeare2.txt') == test2('shakespeare2.txt'))
print(test1('shakespeare2.txt') == test3('shakespeare2.txt'))
print(test1('shakespeare2.txt') == test4('shakespeare2.txt'))
print(test1('shakespeare2.txt') == list(test5('shakespeare2.txt')))

I see that all of the tests produce the same result.

Now let's time them:

import timeit

setup = '''
file_path = "shakespeare2.txt"

def test1(file_path):
    with open(file_path, "r", encoding="utf-8") as file_in:
        return [line for line in file_in]

def test2(file_path):
    return [line for line in open(file_path, "r", encoding="utf-8")]

def test3(file_path):
    with open(file_path, "r", encoding="utf-8") as file_in:
        return file_in.readlines()

def test4(file_path):
    with open(file_path, "r", encoding="utf-8") as file_in:
        return list(file_in)

def test5(file_path):
    with open(file_path, "r", encoding="utf-8") as file_in:
        yield from file_in
'''

print(timeit.timeit("test1(file_path)", setup=setup, number=100))
print(timeit.timeit("test2(file_path)", setup=setup, number=100))
print(timeit.timeit("test3(file_path)", setup=setup, number=100))
print(timeit.timeit("test4(file_path)", setup=setup, number=100))
print(timeit.timeit("list(test5(file_path))", setup=setup, number=100))
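
As an alternative sketch, timeit also accepts a globals argument, which avoids repeating the function definitions inside a setup string. The function names read_with_list and read_with_readlines and the small synthetic file are assumptions made for illustration, so the absolute numbers will not be comparable to the Shakespeare file.

```python
import os
import tempfile
import timeit


def read_with_list(file_path):
    with open(file_path, "r", encoding="utf-8") as file_in:
        return list(file_in)


def read_with_readlines(file_path):
    with open(file_path, "r", encoding="utf-8") as file_in:
        return file_in.readlines()


# Build a small synthetic input file (an assumption for this sketch).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("to be or not to be\n" * 1000)
    file_path = tmp.name

timings = {}
for func in (read_with_list, read_with_readlines):
    # Pass the names the statement needs through `globals` instead of `setup`.
    timings[func.__name__] = timeit.timeit(
        "func(file_path)",
        globals={"func": func, "file_path": file_path},
        number=20,
    )

os.remove(file_path)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```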

On my laptop, this shows me (in seconds):

9.65
9.79
9.29
9.08
9.85

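This suggests to me that, from a performance perspective, it doesn't matter which one you choose. So don't use your test2() strategy, which buys no speed and never closes the file explicitly :-)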

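Note that test5() may matter from a memory-management perspective (credit to @tomalak): it yields one line at a time instead of building the whole list, an advantage the timing above hides because it wraps the generator in list().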
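
The memory difference can be sketched with the standard tracemalloc module. The helper peak_bytes(), the two counting functions, and the synthetic 50,000-line file are assumptions for illustration; exact byte counts depend on the Python version and platform.

```python
import os
import tempfile
import tracemalloc


def lazy_reader(*args, **kwargs):
    with open(*args, **kwargs) as file:
        yield from file


def peak_bytes(func):
    """Return the peak traced allocation, in bytes, while func() runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    for i in range(50_000):
        tmp.write(f"line number {i} of the synthetic test file\n")
    path = tmp.name


def count_eagerly():
    with open(path, encoding="utf-8") as file:
        return len(list(file))  # every line is held in memory at once


def count_lazily():
    # Only one line at a time is alive while counting.
    return sum(1 for _ in lazy_reader(path, encoding="utf-8"))


eager_peak = peak_bytes(count_eagerly)
lazy_peak = peak_bytes(count_lazily)
os.remove(path)
print(f"eager peak: {eager_peak} bytes, lazy peak: {lazy_peak} bytes")
```

When each line can be processed and discarded, the lazy version keeps the peak close to the size of the read buffer rather than the size of the file.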

