How do I label rows with a keyword if another column's value contains a string from a list?

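I'm working with a DataFrame that holds one type of challenge for a class. I need to be able to identify each challenge as "linux", "windows", or "primer" based. I created a dictionary like this: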

import pandas as pd

topic_keywords_dict = {
    'Linux': {
        'identification': ['linux'],
        'topic': ['bash', 'boot', 'process', 'auditing'],
    },
    'Windows': {
        'identification': ['windows', 'memory'],
        'topic': [
            'boot', 'process', 'artifacts', 'memory',
            'active_directory', 'sysinternal',
        ],
    },
    'Primer': {
        'identification': ['primer'],
        'topic': [
            'kernel', 'CLI', 'registry', 'process', 'NTFS', 'boot',
            'auditing', 'security', 'active_directory', 'networking',
            'surveys',
        ],
    },
}

I have a DataFrame that looks like this:

challenge_count_df = pd.DataFrame({'Challenge': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
                                  'Count' : [32, 22, 40, 12, 10, 60, 32, 22, 44, 90],
                                  'Value' : ["0","5","10","15","5","10","5","10","15","10"],
                                  'Category' : ['linux_bash','primer_02','windows_active_directory','basic_linux','linux_kitty','alpha_primer','windows_auditing','linux_logging', 'linux', 'primer']})

That gives me something like this:

>>> challenge_count_df
  Challenge  Count Value                  Category
0         A     32     0                linux_bash
1         B     22     5                 primer_02
2         C     40    10  windows_active_directory
3         D     12    15               basic_linux
4         E     10     5               linux_kitty
5         F     60    10              alpha_primer
6         G     32     5          windows_auditing
7         H     22    10             linux_logging
8         I     44    15                     linux
9         J     90    10                    primer

I was thinking of using something like this:

challenge_count_df[challenge_count_df['Category'].contains('|'.join(topic_keywords_dict[dict_key]['identification']))]

and possibly applying it through a lambda built on the approach above:

challenge_count_df['key_dict'] = challenge_count_df['Category'].apply(lambda x: key_dict if x .contains('|'.join(topic_keywords_dict[dict_key]['identification'])) for key_dict in topic_keywords_dict)

But I think I'm getting the for loop inside the lambda wrong... Can someone help me understand what I'm doing wrong?

- - - - - - - - - - EDIT - - - - - - - - -

The expected result looks like this:

>>> challenge_count_df
  Challenge  Count Value                  Category   key_dict
0         A     32     0                linux_bash   linux
1         B     22     5                 primer_02   primer
2         C     40    10  windows_active_directory   windows
3         D     12    15               basic_linux   linux
4         E     10     5               linux_kitty   linux
5         F     60    10              alpha_primer   primer
6         G     32     5          windows_auditing   windows
7         H     22    10             linux_logging   linux
8         I     44    15                     linux   linux
9         J     90    10                    primer   primer

A uj5u.com user replied:

I'd suggest just writing a function: this is a bit much for a one-liner. Try something like this:

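def func(category):
    # Return the first dictionary key whose identification keyword
    # appears as a substring of the category string.
    for platform, data in topic_keywords_dict.items():
        if any(x in category for x in data['identification']):
            return platform
    return None

challenge_count_df['key_dict'] = challenge_count_df['Category'].apply(func)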
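
On why the lambda in the question fails: inside `Series.apply`, `x` is a plain Python `str`, which has no `.contains` method (that lives on `Series.str`); `dict_key` is never defined; and `value if condition for ...` is not a valid expression, because a conditional expression needs an `else` and a `for` clause has to sit inside a comprehension or generator. A minimal sketch of a working one-expression form follows. It uses a trimmed copy of the dictionary and a shortened DataFrame for illustration, and the name `label_category` is hypothetical. The expected output in the question shows lowercase labels while the dictionary keys are capitalized, so lowercasing the key here is an assumption about what is wanted.

```python
import pandas as pd

# Trimmed copy of the question's dictionary; only 'identification' is used here.
topic_keywords_dict = {
    'Linux': {'identification': ['linux']},
    'Windows': {'identification': ['windows', 'memory']},
    'Primer': {'identification': ['primer']},
}

# Shortened sample data (an assumption, not the full table from the question).
challenge_count_df = pd.DataFrame({
    'Challenge': ['A', 'B', 'C', 'D'],
    'Category': ['linux_bash', 'primer_02',
                 'windows_active_directory', 'unknown_topic'],
})


def label_category(category):
    """Return the lowercased key of the first entry whose identification
    keyword occurs in `category`, or None when nothing matches."""
    return next(
        (key.lower()
         for key, data in topic_keywords_dict.items()
         if any(word in category for word in data['identification'])),
        None,
    )


challenge_count_df['key_dict'] = challenge_count_df['Category'].apply(label_category)
print(challenge_count_df)
```

`next(generator, None)` stops at the first matching key, so it behaves like the explicit loop with an early `return`.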

A uj5u.com user replied:

Use Series.str.contains to identify the matching rows and assign the dictionary key:

for k, v in topic_keywords_dict.items():
    m = challenge_count_df['Category'].str.contains('|'.join(v['identification']))
    challenge_count_df.loc[m, 'key_dict'] = k

print (challenge_count_df)
  Challenge  Count Value                  Category key_dict
0         A     32     0                linux_bash    Linux
1         B     22     5                 primer_02   Primer
2         C     40    10  windows_active_directory  Windows
3         D     12    15               basic_linux    Linux
4         E     10     5               linux_kitty    Linux
5         F     60    10              alpha_primer   Primer
6         G     32     5          windows_auditing  Windows
7         H     22    10             linux_logging    Linux
8         I     44    15                     linux    Linux
9         J     90    10                    primer   Primer
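
Two details of the loop above may matter in practice: a row that matches more than one key ends up with the key processed last, and rows that match nothing are left as NaN. The sketch below is one possible variation, not part of the original answer. It swaps in `numpy.select`, which keeps the first matching key, and escapes each keyword with `re.escape` so it is treated literally in the pattern. The sample rows, the `'unmatched'` default, and the variable names are assumptions made for illustration.

```python
import re

import numpy as np
import pandas as pd

# Trimmed copy of the question's dictionary; only 'identification' is used here.
topic_keywords_dict = {
    'Linux': {'identification': ['linux']},
    'Windows': {'identification': ['windows', 'memory']},
    'Primer': {'identification': ['primer']},
}

# Hypothetical sample; 'linux_memory' matches both Linux and Windows.
df = pd.DataFrame({
    'Category': ['linux_bash', 'alpha_primer', 'windows_auditing',
                 'linux_memory', 'misc'],
})

# One boolean mask per dictionary key, in dictionary order.
masks = [
    df['Category'].str.contains(
        '|'.join(re.escape(word) for word in data['identification']),
        na=False,
    ).to_numpy(dtype=bool)
    for data in topic_keywords_dict.values()
]

# np.select picks, for each row, the choice of the first mask that is True.
df['key_dict'] = np.select(masks, list(topic_keywords_dict), default='unmatched')
print(df)
```

With this sample, `'linux_memory'` is labeled `Linux` because that mask comes first, whereas the sequential `.loc` loop would leave it as `Windows`.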


Tags: Python, pandas, lambda
