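This question already has answers here: How do I select rows from a DataFrame based on column values? (13 answers) Pandas DataFrame: select the entire rows with the highest values in a specified column (4 answers) Count the frequency of words in a pandas DataFrame (3 answers) Closed 18 hours ago.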
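I want to use Python to get the 20 most common words in the descriptions of the 10 longest movies in data.csv. So far I can get the 10 longest movies, but I cannot get the most common words from just those movies: my code only gives the most common words across the whole of data.csv. I have tried Counter, Pandas, Numpy, and Mathlib, but I don't know how to make Python look for the most common words only in specific rows and a specific column (the movie descriptions) of the table.
My code: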
import pandas as pd
import numpy as np
df = pd.read_csv("data.csv")
small_df = df[['title','duration_min','description']]
result_time = small_df.sort_values('duration_min', ascending=False)
print("TOP 10 LONGEST: ")
print(result_time.head(n=10))
most_common = pd.Series(' '.join(result_time['description']).lower().split()).value_counts()[:20]
print("20 Most common words from TOP 10 longest movies: ")
print(most_common)
My output:
TOP 10 LONGEST:
title duration_min description
6840 The School of Mischief 253.0 A high school teacher volunteers to transform ...
4482 No Longer kids 237.0 Hoping to prevent their father from skipping t...
3687 Lock Your Girls In 233.0 A widower believes he must marry off his three...
5100 Raya and Sakina 230.0 When robberies and murders targeting women swe...
5367 Sangam 228.0 Returning home from war after being assumed de...
3514 Lagaan 224.0 In 1890s India, an arrogant British commander ...
3190 Jodhaa Akbar 214.0 In 16th-century India, what begins as a strate...
6497 The Irishman 209.0 Hit man Frank Sheeran looks back at the secret...
3277 Kabhi Khushi Kabhie Gham 209.0 Years after his father disowns his adopted bro...
4476 No Direction Home: Bob Dylan 208.0 Featuring rare concert footage and interviews ...
20 Most common words from TOP 10 longest movies:
a 10134
the 7153
to 5653
and 5573
of 4691
in 3840
his 3005
with 1967
her 1803
an 1727
for 1558
on 1528
their 1468
when 1320
this 1240
from 1114
as 1050
is 988
by 894
after 865
dtype: int64
Here is the data file: https://www.dropbox.com/s/hxch4v08bkthvz1/data.csv?dl=1
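Answer from a uj5u.com user:
You can select the first 10 rows of a DataFrame with iloc[0:10]. Because result_time is already sorted by duration_min in descending order, its first 10 rows are the 10 longest movies. The original code joins result_time['description'] for every row, which is why it counts words across the whole file.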
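To illustrate why iloc is the right tool here, the following is a minimal sketch on made-up data (the titles, durations, and index labels are hypothetical, not taken from data.csv). It shows that iloc slices by position in the sorted order, so it still works when the index labels are out of order after sorting, as they are in the output above.

```python
import pandas as pd

# Hypothetical toy data; the index labels are deliberately non-sequential.
df = pd.DataFrame(
    {"title": ["A", "B", "C", "D"], "duration_min": [90, 253, 120, 237]},
    index=[10, 20, 30, 40],
)

# Sort longest first, as in the question.
by_length = df.sort_values("duration_min", ascending=False)

# iloc selects by position in the sorted frame, not by index label.
top2 = by_length.iloc[0:2]
print(top2["title"].tolist())
```

For this purpose `by_length.head(2)` selects the same rows, so either form can be used before joining the descriptions.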
In this case, with minimal changes to the existing code, the solution looks like this:
import pandas as pd
import numpy as np
df = pd.read_csv("data.csv")
small_df = df[['title','duration_min','description']]
# Sort by duration, longest first
result_time = small_df.sort_values('duration_min', ascending=False)
print("TOP 10 LONGEST: ")
print(result_time.head(n=10))
# Join only the descriptions of the first 10 rows, then split into words and count them
most_common = pd.Series(' '.join(result_time.iloc[0:10]['description']).lower().split()).value_counts()[:20]
print("20 Most common words from TOP 10 longest movies: ")
print(most_common)
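As a variation on the answer above, here is a hedged sketch that wraps the same idea in a reusable function. It is not the answerer's code: it swaps in `DataFrame.nlargest` for sort-then-slice, uses `collections.Counter` with a simple regex tokenizer instead of `str.split`, and optionally drops stop words. The function name `top_words`, the `STOP_WORDS` set, and the sample rows are all made up for illustration; only the column names `duration_min` and `description` come from the question.

```python
from collections import Counter
import re

import pandas as pd

# Hypothetical stop-word list for illustration; extend it as needed.
STOP_WORDS = {"a", "an", "the", "to", "and", "of", "in", "his", "her"}


def top_words(df, n_rows=10, n_words=20, stop_words=frozenset()):
    """Return the n_words most common words in the descriptions
    of the n_rows rows with the largest duration_min."""
    # Keep only the longest titles before touching the text.
    longest = df.nlargest(n_rows, "duration_min")
    # Skip missing descriptions, then build one lowercase string.
    text = " ".join(longest["description"].dropna()).lower()
    # Simple tokenizer: runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text)
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n_words)


# Made-up sample rows, not taken from data.csv.
sample = pd.DataFrame({
    "title": ["Short", "Long", "Longer"],
    "duration_min": [80.0, 200.0, 250.0],
    "description": [
        "A cat naps.",
        "A teacher helps a student.",
        "The teacher travels to India.",
    ],
})

result = top_words(sample, n_rows=2, n_words=3, stop_words=STOP_WORDS)
print(result)
```

With the real file, a call such as `top_words(pd.read_csv("data.csv"))` would presumably give the same kind of result as the answer, except that punctuation is stripped from the words; passing `stop_words=STOP_WORDS` filters out words like "a" and "the" that dominate the output in the question.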
