Reducing a function's running time (Python)

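I am trying to write a Python function that takes a list of strings and returns a dictionary whose keys are indices and whose values are the most frequent character at each index across all the strings. For example, list1 = ['one', 'two', 'twin', 'who'] should return index 0 = t, index 1 = w, index 2 = o, index 3 = n, because, for instance, the most common character at index 1 across all the strings is 'w'. I found a solution, but with a list of thousands of strings it takes too long to run. Could you help me reduce the execution time?

This is what I tried, but it seems too slow to run on a list of thousands of strings: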

list1 = ['one', 'two', 'twin', 'who']

width = len(max(list1, key=len))

chars = {}

for i, item in enumerate(zip(*[s.ljust(width) for s in list1])):
    set1 = set(item)
    if ' ' in set1:
        set1.remove(' ')
    chars[i] = max(set1, key=item.count)
print(chars)

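Reply from a uj5u.com user:

Whether it is fast enough depends on your use case, but this solution takes a few seconds to go through the default word list available on OS X.

Python provides a counter object for you in collections.Counter, so you don't have to keep track of the counts for multiple possible values yourself.

I have combined it with defaultdict, so that a Counter is created automatically for each index.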

from collections import defaultdict, Counter


with open("/usr/share/dict/words") as f:
    words = f.read().splitlines()
    letters = defaultdict(Counter)

    for word in words:
        for idx, letter in enumerate(word):
            letters[idx].update((letter, ))

for idx, counter in letters.items():
    print(idx, counter.most_common(1))

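Whether this is fast enough depends on the use case you mentioned; it could be made faster if necessary, but it may already be fast enough. For 235,886 words, the running time is: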

python3 letterfreq.py  2.67s user 0.04s system 99% cpu 2.734 total

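This assumes that every word is lowercase; if that is not the case, lowercase each word before adding it to your Counter object.

If you want to implement this without the Counter and defaultdict parts of the standard library (they are only helpers that save you from re-implementing the same small pieces of code), you can do exactly the same thing by hand: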
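To tie this back to the question, here is a minimal sketch that wraps the same defaultdict/Counter idea in a function which takes a list of strings and returns the {index: character} dictionary the asker describes. The function name most_common_per_index and the lowercase flag are made up for illustration and do not come from the original answer; ties go to the character that was seen first.

```python
from collections import Counter, defaultdict


def most_common_per_index(strings, lowercase=True):
    """Return {index: most frequent character at that index}.

    Hypothetical helper, not part of the original answer.
    """
    counters = defaultdict(Counter)  # one Counter per character position
    for s in strings:
        if lowercase:
            s = s.lower()  # count 'T' and 't' as the same letter
        for idx, ch in enumerate(s):
            counters[idx][ch] += 1
    # most_common(1) returns [(char, count)]; keep only the character
    return {idx: c.most_common(1)[0][0] for idx, c in counters.items()}


list1 = ['one', 'two', 'twin', 'who']
print(most_common_per_index(list1))  # {0: 't', 1: 'w', 2: 'o', 3: 'n'}
```

Each character is visited exactly once, so the work grows with the total number of characters rather than with repeated item.count() scans over every column.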

with open("/usr/share/dict/words") as f:
    words = f.read().splitlines()
    letter_positions = {}

    for word in words:
        for idx, letter in enumerate(word):
            if idx not in letter_positions:
                letter_positions[idx] = {}

            if letter not in letter_positions[idx]:
                letter_positions[idx][letter] = 0

            letter_positions[idx][letter] += 1

final_dict = {}

for idx, counts in letter_positions.items():
    most_popular = sorted(counts.items(), key=lambda v: v[1], reverse=True)
    print(idx, most_popular)
    final_dict[idx] = most_popular[0][0]

print(final_dict)

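You can then pick as many entries from most_popular as you need when going through the list afterwards.

Since we no longer go through the defaultdict and Counter abstractions, the running time is now less than half of what it was before: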
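As a small sketch of what "picking as many entries as you need" can look like, the snippet below uses a made-up counts dictionary shaped like letter_positions[idx] from the code above. The sample values and variable names are assumptions for illustration only.

```python
import heapq

# Hypothetical counts for one index, shaped like letter_positions[idx] above
counts = {'t': 2, 'o': 1, 'w': 1}

# sorted() orders every entry; slice off as many as you need
most_popular = sorted(counts.items(), key=lambda v: v[1], reverse=True)
top_two = most_popular[:2]

# max() finds the single winner in one pass, without sorting anything
best_letter = max(counts, key=counts.get)

# heapq.nlargest() returns the n largest entries without a full sort
top_two_heap = heapq.nlargest(2, counts.items(), key=lambda v: v[1])

print(top_two, best_letter, top_two_heap)
```

If only the single most frequent letter is needed, max() avoids sorting the whole dictionary for every index.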

python3 letterfreq2.py  1.08s user 0.03s system 98% cpu 1.124 total
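The timings above come from the answerer's machine and word list. To check how the two approaches compare on your own data, a rough timeit sketch along the following lines may help. The function names, the random word list, and the repeat count are all assumptions made for this example, and the numbers it prints will differ from machine to machine.

```python
import random
import string
import timeit
from collections import Counter, defaultdict


def with_counter(words):
    # Same idea as the defaultdict(Counter) version above
    letters = defaultdict(Counter)
    for word in words:
        for idx, letter in enumerate(word):
            letters[idx].update((letter,))
    return {idx: c.most_common(1)[0][0] for idx, c in letters.items()}


def with_plain_dicts(words):
    # Same idea as the hand-written nested-dict version above
    positions = {}
    for word in words:
        for idx, letter in enumerate(word):
            counts = positions.setdefault(idx, {})
            counts[letter] = counts.get(letter, 0) + 1
    return {idx: max(c, key=c.get) for idx, c in positions.items()}


# Synthetic stand-in for a real word list (assumption for this sketch)
random.seed(0)
words = [
    ''.join(random.choices(string.ascii_lowercase, k=random.randint(3, 10)))
    for _ in range(20_000)
]

for fn in (with_counter, with_plain_dicts):
    seconds = timeit.timeit(lambda: fn(words), number=3)
    print(fn.__name__, round(seconds, 3))
```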

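It is usually a good idea to work through what you are trying to do and come up with a strategy - i.e. "OK, I need to keep track of how many times a letter appears at this position... so for that I need some way to keep a value for each index... and then for each letter...".

Reply from a uj5u.com user:

I only made a few improvements based on your algorithm.

First, you can use itertools.zip_longest() instead of zip(), which removes the need for ljust() and the width variable: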

from itertools import zip_longest

list1 = ['one', 'two', 'twin', 'who']

chars = {}

for i, item in enumerate(zip_longest(*list1)):
    set1 = set(item)
    if None in set1:
        set1.remove(None)
    chars[i] = max(set1, key=item.count)
print(chars)

Then replace max(set1, key=item.count) with the more efficient Counter(item).most_common(1)[0][0], combined with or set1.most_common(2)[1][0] to skip None values:

from itertools import zip_longest
from collections import Counter

list1 = ['one', 'two', 'twin', 'who']

chars = {}

for i, item in enumerate(zip_longest(*list1)):
    set1 = Counter(item)
    chars[i] = set1.most_common(1)[0][0] or set1.most_common(2)[1][0]
print(chars)

Since itertools and collections are built-in Python modules, you can import them directly without running pip install.
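The or fallback works because None is falsy and every column produced by zip_longest contains at least one real character. A slightly more explicit variant, sketched here as an alternative rather than as the answerer's own code, filters out the None padding before counting:

```python
from collections import Counter
from itertools import zip_longest

list1 = ['one', 'two', 'twin', 'who']

chars = {}
for i, column in enumerate(zip_longest(*list1)):
    # Skip the None padding that zip_longest adds for shorter strings
    counts = Counter(ch for ch in column if ch is not None)
    chars[i] = counts.most_common(1)[0][0]

print(chars)  # {0: 't', 1: 'w', 2: 'o', 3: 'n'}
```

This way most_common() is called only once per column and never has to rank the padding value.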
