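I'm working with a dataframe of challenges for a class, where each challenge has a category. I need to be able to identify each one as being based on "linux", "windows", or "primer". I created a dictionary like this: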
import pandas as pd
topic_keywords_dict = {
    'Linux': {
        'identification': ['linux'],
        'topic': ['bash', 'boot', 'process', 'auditing'],
    },
    'Windows': {
        'identification': ['windows', 'memory'],
        'topic': ['boot', 'process', 'artifacts', 'memory', 'active_directory', 'sysinternal'],
    },
    'Primer': {
        'identification': ['primer'],
        'topic': [
            'kernel', 'CLI', 'registry', 'process', 'NTFS', 'boot', 'auditing',
            'security', 'active_directory', 'networking', 'surveys',
        ],
    },
}
I have a dataframe that looks like this:
challenge_count_df = pd.DataFrame({'Challenge': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
'Count' : [32, 22, 40, 12, 10, 60, 32, 22, 44, 90],
'Value' : ["0","5","10","15","5","10","5","10","15","10"],
'Category' : ['linux_bash','primer_02','windows_active_directory','basic_linux','linux_kitty','alpha_primer','windows_auditing','linux_logging', 'linux', 'primer']})
That gives me something like this:
>>> challenge_count_df
Challenge Count Value Category
0 A 32 0 linux_bash
1 B 22 5 primer_02
2 C 40 10 windows_active_directory
3 D 12 15 basic_linux
4 E 10 5 linux_kitty
5 F 60 10 alpha_primer
6 G 32 5 windows_auditing
7 H 22 10 linux_logging
8 I 44 15 linux
9 J 90 10 primer
I'm thinking of using something like this:
challenge_count_df[challenge_count_df['Category'].contains('|'.join(topic_keywords_dict[dict_key]['identification']))]
and possibly wrapping that approach in a lambda, like this:
challenge_count_df['key_dict'] = challenge_count_df['Category'].apply(lambda x: key_dict if x .contains('|'.join(topic_keywords_dict[dict_key]['identification'])) for key_dict in topic_keywords_dict)
But I think I'm doing the for loop wrong inside the lambda... Can someone help me understand what I'm doing wrong?
- - - - - - - - - - EDIT - - - - - - - - -
The expected result looks like this:
>>> challenge_count_df
Challenge Count Value Category key_dict
0 A 32 0 linux_bash linux
1 B 22 5 primer_02 primer
2 C 40 10 windows_active_directory windows
3 D 12 15 basic_linux linux
4 E 10 5 linux_kitty linux
5 F 60 10 alpha_primer primer
6 G 32 5 windows_auditing windows
7 H 22 10 linux_logging linux
8 I 44 15 linux linux
9 J 90 10 primer primer
A uj5u.com user replied:
I'd suggest just writing a function: this is a bit much for a one-liner. Try something like this:
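def func(category):
    # Return the first dictionary key whose identification keywords appear in the category
    for platform, data in topic_keywords_dict.items():
        if any(x in category for x in data['identification']):
            return platform
    # No keyword matched
    return None

challenge_count_df['key_dict'] = challenge_count_df['Category'].apply(func)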
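As for why the lambda in the question fails: a lambda body has to be a single expression, and `key if condition for key in ...` is not one (a conditional expression needs an `else`, and a bare `for` clause is only allowed inside a comprehension or generator expression). Also, inside `apply` each `x` is a plain Python string, which has no `.contains` method. If a one-liner is still preferred, here is a minimal sketch using `next()` over a generator expression. The trimmed dictionary and the extra 'unknown_topic' row are assumptions made only to keep the example short and to show the no-match case.

```python
import pandas as pd

# Trimmed copy of the question's dictionary; only 'identification' is needed here
topic_keywords_dict = {
    'Linux': {'identification': ['linux']},
    'Windows': {'identification': ['windows', 'memory']},
    'Primer': {'identification': ['primer']},
}

# Small sample frame; 'unknown_topic' is a made-up row to show the fallback
challenge_count_df = pd.DataFrame({
    'Challenge': ['A', 'B', 'C', 'D'],
    'Category': ['linux_bash', 'primer_02', 'windows_active_directory', 'unknown_topic'],
})

# next() yields the first key whose keywords occur in the category,
# or the default (None) when no key matches
challenge_count_df['key_dict'] = challenge_count_df['Category'].apply(
    lambda category: next(
        (key for key, data in topic_keywords_dict.items()
         if any(word in category for word in data['identification'])),
        None,
    )
)
print(challenge_count_df)
```

This behaves like the function above (first matching key wins), but the named function is probably easier to read and debug.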
A uj5u.com user replied:
Use Series.str.contains to identify the matching rows and assign the dictionary key to them:
for k, v in topic_keywords_dict.items():
    m = challenge_count_df['Category'].str.contains('|'.join(v['identification']))
    challenge_count_df.loc[m, 'key_dict'] = k
print(challenge_count_df)
Challenge Count Value Category key_dict
0 A 32 0 linux_bash Linux
1 B 22 5 primer_02 Primer
2 C 40 10 windows_active_directory Windows
3 D 12 15 basic_linux Linux
4 E 10 5 linux_kitty Linux
5 F 60 10 alpha_primer Primer
6 G 32 5 windows_auditing Windows
7 H 22 10 linux_logging Linux
8 I 44 15 linux Linux
9 J 90 10 primer Primer
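Two things to note about the loop: it assigns the dictionary keys as written ('Linux', 'Windows', 'Primer'), whereas the expected output in the question shows them in lowercase, and if a category matched the keywords of more than one key, the key processed last would overwrite the earlier one. A possible alternative, sketched here under the assumption that lowercase labels are what is wanted, is to build one regex from all keywords, pull out the matched keyword with Series.str.extract, and map it back to its key. The `keyword_to_key` helper dict and the 'other_topic' row are names introduced only for this illustration.

```python
import re
import pandas as pd

# Trimmed copy of the question's dictionary; only 'identification' is needed here
topic_keywords_dict = {
    'Linux': {'identification': ['linux']},
    'Windows': {'identification': ['windows', 'memory']},
    'Primer': {'identification': ['primer']},
}

# Invert the dictionary: each keyword points at its (lowercased) key
keyword_to_key = {
    word: key.lower()
    for key, data in topic_keywords_dict.items()
    for word in data['identification']
}

# One capture group containing every keyword, e.g. (linux|windows|memory|primer)
pattern = '(' + '|'.join(re.escape(word) for word in keyword_to_key) + ')'

challenge_count_df = pd.DataFrame({
    'Category': ['linux_bash', 'alpha_primer', 'windows_auditing', 'basic_linux', 'other_topic'],
})

# extract() returns the first keyword found in each string (NaN if none),
# and map() translates that keyword into its dictionary key
challenge_count_df['key_dict'] = (
    challenge_count_df['Category'].str.extract(pattern, expand=False).map(keyword_to_key)
)
print(challenge_count_df)
```

With this version a category containing several keywords is labelled by whichever keyword appears first in the string, and rows without any keyword are left as missing values.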
Tags: Python, pandas, lambda
