How do I find the most common words in specific rows and a specific column of data.csv and list how often they occur? [duplicate]

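This question already has answers here: How do I select rows from a DataFrame based on column values? (13 answers) Pandas DataFrame: select the entire row with the highest value in a specified column (4 answers) Count the frequency of words in a pandas DataFrame (3 answers) Closed 18 hours ago.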

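I want to use Python to get the 20 most common words in the descriptions of the 10 longest movies in data.csv. So far I have the 10 longest movies, but I can't get the most common words for just those movies; my code only gives the most common words across all of data.csv. I have tried Counter, Pandas, Numpy, and Mathlib, but I don't know how to make Python look for the most common words only in specific rows and a specific column (the movie descriptions) of the table.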

My code:

import pandas as pd
import numpy as np
df = pd.read_csv("data.csv")
small_df = df[['title','duration_min','description']]
result_time = small_df.sort_values('duration_min', ascending=False)
print("TOP 10 LONGEST: ")
print(result_time.head(n=10))

most_common = pd.Series(' '.join(result_time['description']).lower().split()).value_counts()[:20]
print("20 Most common words from TOP 10 longest movies: ")
print(most_common)

My output:

TOP 10 LONGEST: 
                             title  duration_min                                        description
6840        The School of Mischief         253.0  A high school teacher volunteers to transform ...
4482                No Longer kids         237.0  Hoping to prevent their father from skipping t...
3687            Lock Your Girls In         233.0  A widower believes he must marry off his three...
5100               Raya and Sakina         230.0  When robberies and murders targeting women swe...
5367                        Sangam         228.0  Returning home from war after being assumed de...
3514                        Lagaan         224.0  In 1890s India, an arrogant British commander ...
3190                  Jodhaa Akbar         214.0  In 16th-century India, what begins as a strate...
6497                  The Irishman         209.0  Hit man Frank Sheeran looks back at the secret...
3277      Kabhi Khushi Kabhie Gham         209.0  Years after his father disowns his adopted bro...
4476  No Direction Home: Bob Dylan         208.0  Featuring rare concert footage and interviews ...
20 Most common words from TOP 10 longest movies: 
a        10134
the       7153
to        5653
and       5573
of        4691
in        3840
his       3005
with      1967
her       1803
an        1727
for       1558
on        1528
their     1468
when      1320
this      1240
from      1114
as        1050
is         988
by         894
after      865
dtype: int64

Here is the data file: https://www.dropbox.com/s/hxch4v08bkthvz1/data.csv?dl=1

A uj5u.com user replied:

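You can select the first 10 rows of the sorted DataFrame with iloc[0:10]. In your code, head(n=10) is only used for printing, so the join still runs over every row of result_time.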
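As a minimal sketch of that point, assuming a small made-up DataFrame in place of data.csv (the rows below are hypothetical, not taken from the real file): head() returns a new object and leaves result_time unchanged, while iloc[0:2] picks the same leading rows by position.

```python
import pandas as pd

# Hypothetical toy data standing in for data.csv (not the real file).
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "duration_min": [90.0, 200.0, 150.0, 120.0],
    "description": ["short film", "long epic", "medium drama", "medium comedy"],
})

result_time = df.sort_values("duration_min", ascending=False)

# head() returns a new DataFrame; result_time itself keeps all rows.
top2_head = result_time.head(n=2)
print(len(result_time))  # still 4

# iloc slices by position, so on the sorted frame it selects
# the same rows that head() returned.
top2_iloc = result_time.iloc[0:2]
print(top2_iloc.equals(top2_head))
```

Either result can be used for the word count, as long as the count is run on the selected rows rather than on result_time as a whole.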

In this case, the solution looks like this, with minimal changes to your existing code:

import pandas as pd
import numpy as np    
df = pd.read_csv("data.csv")
small_df = df[['title','duration_min','description']]
result_time = small_df.sort_values('duration_min', ascending=False)
print("TOP 10 LONGEST: ")
print(result_time.head(n=10))

most_common = pd.Series(' '.join(result_time.iloc[0:10]['description']).lower().split()).value_counts()[:20]
print("20 Most common words from TOP 10 longest movies: ")
print(most_common)
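The same idea can also be written with DataFrame.nlargest and collections.Counter. This is only a sketch under stated assumptions: the helper name top_words and the toy rows are hypothetical, and the dropna() call assumes some descriptions could be missing, which the question does not say.

```python
from collections import Counter

import pandas as pd

# Hypothetical toy data standing in for data.csv (not the real file).
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "duration_min": [90.0, 200.0, 150.0, 120.0],
    "description": [
        "the the the short one",
        "a war and a wedding",
        "a long war",
        "the quiet comedy",
    ],
})


def top_words(frame, n_rows, n_words):
    """Count words in the descriptions of the n_rows longest titles."""
    # Keep only the n_rows rows with the largest duration_min.
    longest = frame.nlargest(n_rows, "duration_min")
    # Join the selected descriptions, lowercase them, split on whitespace.
    words = " ".join(longest["description"].dropna()).lower().split()
    return Counter(words).most_common(n_words)


result = top_words(df, n_rows=2, n_words=2)
print(result)
```

With the real file, the call would be top_words(df, n_rows=10, n_words=20). Words from shorter titles (such as "the" in row A above) do not appear in the count because only the selected rows are joined.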


Tags: Python, pandas, list, numpy, dictionary
