I have a list of lists containing tweets:
twitter_dataset_list = [['322185112684994561', '@Bill_Porter nice to know that your site is back :-)'], ['322185112684994545', 'I had a bad day']]
I want to compare the message in each element against the following lists to see whether it is positive or negative:
positive_keyword_list = ['nice']
negative_keyword_list = ['bad']
If a message is positive or negative, I want to append a flag to its inner list, like this:
[['322185112684994561', '@Bill_Porter nice to know that your site is back :-)', 1], ['322185112684994545', 'I had a bad day', -1]]
This is what I have tried, but I am not sure how to iterate and index into the sublists:
for element in twitter_dataset_list:
    if any(word in twitter_dataset_list[0][1] for word in positive_keyword_list) == True:
        twitter_dataset_list.append('1')
    elif any(word in twitter_dataset_list[0][1] for word in negative_keyword_list) == True:
        twitter_dataset_list.append('-1')
    else:
        twitter_dataset_list[0][1].append('0')
print(twitter_dataset_list)
So how do I iterate over the twitter_dataset_list?
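uj5u.com user reply:
First, the enumerate function is useful here, because it gives you both the index and the value as you iterate over the list.
Second, you can unpack on the fly with the for i, (id, text) in syntax.
Finally, you can use _ for any unpacked value that is not actually used in the loop. (Here I don't need the ID, so I use _ to signal that it can be ignored.)
For more details on the different ways to unpack in a loop, see the Data Structures chapter of the Python documentation.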
for i, (_, text) in enumerate(twitter_dataset_list):
    if any(word in text for word in positive_keyword_list):
        twitter_dataset_list[i].append(1)
    elif any(word in text for word in negative_keyword_list):
        twitter_dataset_list[i].append(-1)
    else:
        twitter_dataset_list[i].append(0)
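Since each item the loop yields is itself the inner list, appending to it already changes twitter_dataset_list in place, so the index is not strictly required. A minimal sketch of that variant, reusing the data from the question (the helper name classify is made up for illustration and is not part of the answer above):

```python
twitter_dataset_list = [
    ['322185112684994561', '@Bill_Porter nice to know that your site is back :-)'],
    ['322185112684994545', 'I had a bad day'],
]
positive_keyword_list = ['nice']
negative_keyword_list = ['bad']


def classify(text):
    """Hypothetical helper: return 1, -1 or 0 using plain substring matching."""
    if any(word in text for word in positive_keyword_list):
        return 1
    if any(word in text for word in negative_keyword_list):
        return -1
    return 0


for element in twitter_dataset_list:
    # element is the inner list [id, text]; append() mutates it in place
    element.append(classify(element[1]))

print(twitter_dataset_list)
```

Keeping the classification in its own function also makes it easier to swap the matching rule later without touching the loop.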
uj5u.com user reply:
I would suggest not mutating the original data, and building a new list instead:
positive_keyword_set = {"nice",}
negative_keyword_set = {"bad",}
tweets_with_sentiments = []
for tweet_id, tweet in twitter_dataset_list:
    sentiment = 0
    words = tweet.lower().split()
    if negative_keyword_set.intersection(words):
        sentiment = -1
    elif positive_keyword_set.intersection(words):
        sentiment = 1
    tweets_with_sentiments.append([tweet_id, tweet, sentiment])
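Note that I also converted your keyword lists to sets. This allows O(1) lookups, because the values stored in a set are hashed. It also lets you simply call set.intersection() on the words of the tweet to find the keywords: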
>>> tweet = '@Bill_Porter nice to know that your site is back :-)'
>>> tweet.lower().split()
['@bill_porter',
'nice',
'to',
'know',
'that',
'your',
'site',
'is',
'back',
':-)']
>>> positive_keyword_set.intersection(tweet.split())
{'nice'}
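One side effect of splitting into words is that only whole tokens match, whereas word in text also matches inside longer words. The sketch below contrasts the two on a few made-up sample strings (they are not from the question). It also shows that split() leaves punctuation attached, so a token such as 'bad.' does not match 'bad':

```python
negative_keyword_set = {"bad"}

# Made-up sample strings, for illustration only
samples = ["I had a bad day", "I lost my badge", "That was bad."]

results = []
for text in samples:
    lowered = text.lower()
    # Substring test: True if the keyword appears anywhere, even inside a word
    substring_hit = any(word in lowered for word in negative_keyword_set)
    # Token test: True only if a whitespace-separated token equals the keyword
    token_hit = bool(negative_keyword_set.intersection(lowered.split()))
    results.append((substring_hit, token_hit))
    print(f"{text!r}: substring={substring_hit}, token={token_hit}")
```

Which behaviour is preferable depends on the data; stripping punctuation from the tokens before intersecting is one possible refinement.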
In fact, I would suggest using a dict to store the tweet sentiments:
positive_keyword_set = {"nice",}
negative_keyword_set = {"bad",}
tweets_with_sentiments = {}
for tweet_id, tweet in twitter_dataset_list:
    sentiment = 0
    if negative_keyword_set.intersection(tweet.split()):
        sentiment = -1
    elif positive_keyword_set.intersection(tweet.split()):
        sentiment = 1
    tweets_with_sentiments[int(tweet_id)] = dict(tweet=tweet, sentiment=sentiment)
Your data structure can now be accessed by tweet ID in O(1):
>>> tweets_with_sentiments
{322185112684994561: {'tweet': '@Bill_Porter nice to know that your site is back :-)', 'sentiment': 1},
322185112684994545: {'tweet': 'I had a bad day', 'sentiment': -1}}
>>> tweets_with_sentiments[322185112684994561]["sentiment"]
1
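With the dict form, selecting tweets by sentiment is a single comprehension. A small sketch, with the same structure written out by hand so that it runs on its own (the variable name negative_ids is only illustrative):

```python
tweets_with_sentiments = {
    322185112684994561: {"tweet": "@Bill_Porter nice to know that your site is back :-)", "sentiment": 1},
    322185112684994545: {"tweet": "I had a bad day", "sentiment": -1},
}

# Collect the IDs of every tweet flagged as negative
negative_ids = [
    tweet_id
    for tweet_id, info in tweets_with_sentiments.items()
    if info["sentiment"] == -1
]
print(negative_ids)
```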
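uj5u.com user reply:
I would create a function to handle the sentiment, since I assume this part of the code is likely to evolve (I used the Blobtext lib for a similar application):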
twitter_dataset_list = [['322185112684994561', '@Bill_Porter nice to know that your site is back :-)'],
                        ['322185112684994545', 'I had a bad day']]

def text_positivity(tweet_text: str) -> list:
    # https://www.adamsmith.haus/python/answers/how-to-check-if-a-string-contains-an-element-from-a-list-in-python#:~:text=Use any() to check,to build the generator expression.
    positive_keyword_list = ['nice']
    negative_keyword_list = ['bad']
    # Pad each keyword with spaces so that it only matches as a separate word
    if any(' ' + keyword.lower() + ' ' in tweet_text.lower() for keyword in positive_keyword_list):
        return [1]
    if any(' ' + keyword.lower() + ' ' in tweet_text.lower() for keyword in negative_keyword_list):
        return [-1]
    return [0]

# Concatenate each [id, text] pair with its one-element sentiment list
twitter_dataset_list = [tweet_details + text_positivity(tweet_text=tweet_details[1]) for tweet_details in twitter_dataset_list]
print(twitter_dataset_list)
Edited the answer to take Kumar's comment about partial matches into account. It also uses lower() so that matching is case-insensitive.
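One limitation of padding keywords with spaces is that a keyword at the very start or end of the tweet, or next to punctuation, is not matched. A possible alternative is a regular expression with \b word boundaries. This is only a sketch: build_pattern and text_positivity_regex are hypothetical names, the sample strings are made up, and the keyword lists are assumed to be non-empty:

```python
import re

positive_keyword_list = ['nice']
negative_keyword_list = ['bad']


def build_pattern(keywords):
    """Compile a case-insensitive pattern matching any keyword as a whole word.

    Assumes keywords is non-empty; re.escape guards against special characters.
    """
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


POSITIVE_RE = build_pattern(positive_keyword_list)
NEGATIVE_RE = build_pattern(negative_keyword_list)


def text_positivity_regex(tweet_text: str) -> list:
    """Hypothetical variant of text_positivity that uses word-boundary regexes."""
    if POSITIVE_RE.search(tweet_text):
        return [1]
    if NEGATIVE_RE.search(tweet_text):
        return [-1]
    return [0]


print(text_positivity_regex("Bad day."))         # keyword at the start, capitalised
print(text_positivity_regex("I lost my badge"))  # keyword only inside a longer word
print(text_positivity_regex("Nice!"))            # keyword followed by punctuation
```

Compiling the patterns once at module level avoids rebuilding them for every tweet.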
