Sample input
[
"hi #weekend",
"good morning #madrid #fun",
"spend my #weekend in #madrid",
"#madrid <3"
]
Expected output
{'weekend': 2, 'madrid': 3, 'fun': 1}
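Rules:
The program should not count an empty hashtag ("#") as a hashtag.
A hashtag that does not start with a letter should not be counted as a hashtag.
Lowercase and uppercase hashtags should be treated as different hashtags.
This is what I have so far. My goal is to build these rules into the program.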
from collections import Counter

def analyze(posts):
    counter = Counter(
        x[1:] for x in ' '.join(posts).split() if x.startswith('#')
    )
    return dict(counter)

posts = [
    "hi #weekend",
    "good morning #madrid #fun",
    "spend my #weekend in #madrid",
    "#madrid <3"]
print(analyze(posts))
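As a quick check of where this attempt stands against the rules, the sketch below runs the same logic on a few edge cases. The sample inputs are made up for illustration and are not part of the original question.

```python
from collections import Counter

def analyze(posts):
    # Same logic as the attempt above: every '#'-prefixed token is counted.
    counter = Counter(
        x[1:] for x in ' '.join(posts).split() if x.startswith('#')
    )
    return dict(counter)

# Illustrative edge cases: a bare '#', a tag starting with a digit,
# and two tags that differ only in case.
edge_cases = ["#", "#1st", "#Fun #fun"]
print(analyze(edge_cases))
# {'': 1, '1st': 1, 'Fun': 1, 'fun': 1}
```

The bare "#" is counted as an empty tag and "#1st" is counted even though it starts with a digit, so the first two rules are not met yet. Case sensitivity already holds, since nothing lowercases the tokens.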
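Reply from a uj5u.com user:
Given the condition that a hashtag must start with a letter, I suggest using a regular expression to extract all hashtags that start with a letter: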
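import re
from collections import Counter

def analyze(posts):
    # '#' followed by one letter, then any number of letters or digits
    hits = re.findall(r'#[A-Za-z][A-Za-z0-9]*', ' '.join(posts))
    # strip the leading '#' before counting
    return Counter(hit[1:] for hit in hits)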
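To see how a regex of this kind behaves under the three rules, here is a minimal sketch. It assumes a hashtag is "#" followed by an ASCII letter and then ASCII letters or digits (the character classes from the answer above); the name `analyze_regex` and the sample posts are hypothetical.

```python
import re
from collections import Counter

# Assumption: '#', then one ASCII letter, then any ASCII letters/digits.
# The capture group returns the tag without the leading '#'.
HASHTAG = re.compile(r'#([A-Za-z][A-Za-z0-9]*)')

def analyze_regex(posts):
    """Count hashtags that start with a letter; matching is case-sensitive."""
    return dict(Counter(HASHTAG.findall(' '.join(posts))))

sample = ["# alone", "#1st place", "#Fun and #fun", "#fun!"]
print(analyze_regex(sample))
# {'Fun': 1, 'fun': 2}
```

One difference from the split-based approaches is worth knowing: the regex stops at the first character outside the class, so "#fun!" is counted as `fun`, whereas splitting on whitespace would keep the tag as `fun!`. The pattern also matches a "#" in the middle of a token, which may or may not be what you want.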
Reply from a uj5u.com user:
Try this:
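a = [
    "hi #weekend",
    "good morning #madrid #fun",
    "spend my #weekend in #madrid",
    "#madrid <3"
]
my_dict = {}
for j in a:
    for i in j.split():
        # len(i) > 1 skips a bare "#" and avoids an IndexError on i[1]
        if i.startswith("#") and len(i) > 1 and i[1].isalpha():
            if i[1:] in my_dict:
                my_dict[i[1:]] += 1  # seen before: increment the count
            else:
                my_dict[i[1:]] = 1   # first occurrence
print(my_dict)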
Output:
{'weekend': 2, 'madrid': 3, 'fun': 1}
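A bare "#" token has no character at index 1, so any check on the first character after "#" has to avoid indexing past the end. One way to do that, sketched below, is to slice instead of index: `token[1:2]` is an empty string for "#", and `''.isalpha()` is `False`. The function name `count_hashtags` and the sample inputs are hypothetical.

```python
def count_hashtags(posts):
    """Count hashtags that start with a letter, using a plain dict."""
    counts = {}
    for post in posts:
        for token in post.split():
            # token[1:2] is '' for a bare '#', and ''.isalpha() is False,
            # so this never raises IndexError.
            if token.startswith('#') and token[1:2].isalpha():
                tag = token[1:]
                # dict.get supplies 0 the first time a tag is seen
                counts[tag] = counts.get(tag, 0) + 1
    return counts

print(count_hashtags(["#", "#2day", "#Madrid #madrid", "#madrid"]))
# {'Madrid': 1, 'madrid': 2}
```

Note that `str.isalpha()` also accepts non-ASCII letters, so a tag such as "#ñu" would be counted here; if only ASCII letters should qualify, a stricter check is needed.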
Reply from a uj5u.com user:
You can use collections.Counter with a simple comprehension:
l = [
    "hi #weekend",
    "good morning #madrid #fun",
    "spend my #weekend in #madrid",
    "#madrid <3"
]
from collections import Counter
counts = Counter(w[1:] for w in ' '.join(l).split()
                 if len(w)>1 and w.startswith('#') and w[1].isalpha())
Output:
>>> counts
Counter({'madrid': 3, 'weekend': 2, 'fun': 1})
# as dictionary
>>> dict(counts)
{'weekend': 2, 'madrid': 3, 'fun': 1}
