How to modify a DataFrame

I have a DataFrame like this:

Keyword                   devops       aws       ec2   python     java    dotnet
optional/mandatory        optional mandatory mandatory optional optional mandatory
Candidate Name                                                         
User1                       1          1          1        1        1         1
User2                       1          0          0        1        1         1
...                        ...        ...        ...      ...      ...       ...

User67                      1          1          1        0        0         0

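I want to calculate a percentage match for each candidate name and add it as a new Percentage Match column. The optional/mandatory row plays an important role in the calculation. For example, the keywords aws, ec2, and dotnet are mandatory: if any of them has a count of 0, the percentage match for that candidate is set to 0%. In other words, every mandatory keyword must have a count of 1, and only then is the percentage calculated for that candidate.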

Expected output:

Keyword                   devops       aws       ec2   python     java    dotnet    Percentage
optional/mandatory        optional mandatory mandatory optional optional mandatory    Match
Candidate Name                                                         
User1                       1          1          1        1        1         1        100%
User2                       1          1          1        0        0         1        66.66%
...                        ...        ...        ...      ...      ...       ...

User67                      1          1          1        0        0         0         0%

How the percentage is calculated:

User1 is 100% because all mandatory keywords have a count of 1.

User2 is 66.66% because two optional keywords have a count of 0, so the result is (total number of ones / total keywords) * 100 = (4/6) * 100.

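User3 is 0% because one of the 3 mandatory keywords has a count of zero, so the percentage is set directly to 0% regardless of the optional keyword counts.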
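
The rule described above can be sketched in plain Python before bringing pandas into it. This is only an illustration: the function name percent_match and the dictionary layout are hypothetical, and the three sample rows are copied from the expected output table.

```python
def percent_match(counts, mandatory):
    """Return the percentage match for one candidate.

    counts    -- dict mapping keyword -> 0 or 1 (hypothetical layout)
    mandatory -- set of keywords that must all have a count of 1
    """
    # Any mandatory keyword with a count of 0 forces the result to 0%
    if any(counts.get(keyword, 0) == 0 for keyword in mandatory):
        return 0.0
    # Otherwise: (number of ones / total keywords) * 100
    return 100.0 * sum(counts.values()) / len(counts)


mandatory = {"aws", "ec2", "dotnet"}

# Rows taken from the expected output table
user1 = {"devops": 1, "aws": 1, "ec2": 1, "python": 1, "java": 1, "dotnet": 1}
user2 = {"devops": 1, "aws": 1, "ec2": 1, "python": 0, "java": 0, "dotnet": 1}
user67 = {"devops": 1, "aws": 1, "ec2": 1, "python": 0, "java": 0, "dotnet": 0}

for name, row in [("User1", user1), ("User2", user2), ("User67", user67)]:
    print(name, round(percent_match(row, mandatory), 2))
```

Rounding to two decimals gives 66.67 for User2, whereas the table above shows the truncated value 66.66%.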

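The keyword row and the candidate name column keep changing, and more keywords and candidate names may be added in the future. How can I produce the expected output?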

My code:

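import numpy as np
import pandas as pd

# read_csv already returns a DataFrame, so no pd.DataFrame() wrapper is needed
df = pd.read_csv('Sample.csv')
# Collapse the raw counts to a 0/1 presence flag
df["Count"] = np.where(df["Count"] > 0, 1, 0)
print(df)
# One row per candidate, one column per (optional/mandatory, Keyword) pair
df = df.pivot_table(index="Candidate Name", columns=["optional/mandatory","Keyword"], values="Count")
print(df)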

Output 1:

Candidate Name    optional/mandatory Keyword  Count
      User1       optional           devops      1
      User1      mandatory              aws      0
      User1      mandatory              ec2      1
      User1       optional           python      1
      User1       optional             java      1
      User1      mandatory           dotnet      0
      User2       optional           devops      1
      User2      mandatory              aws      1
      User2      mandatory              ec2      0
      User3       optional           devops      1
      User3      mandatory              ec2      1
      User3      mandatory              aws      0
      User3       optional             java      1

optional/mandatory        mandatory             optional            
Keyword              aws dotnet  ec2   devops java python
Candidate Name                                           
    User1            0.0    0.0  1.0      1.0  1.0    1.0
    User2            1.0    NaN  0.0      1.0  NaN    NaN
    User3            0.0    NaN  1.0      1.0  1.0    NaN

My attempt at getting the desired output:

import pandas as pd

cols = pd.MultiIndex.from_arrays([
    ['optional', 'mandatory', 'mandatory', 'mandatory', 'optional', 'optional'],
    ['devops', 'aws', 'ec2', 'dotnet', 'python', 'java']
])

data = [
    ['User 1', 1,1,1,1,1,1],
    ['User 2',1,0,0,1,1,1],
    # ...,
    ['User N',0,1,1,1,0,0]
]

df = pd.DataFrame(
    columns=cols,
    data=[applicant[1:] for applicant in data],
    index=pd.Index([applicant[0] for applicant in data], name='Candidate Name')
)

df.sort_index(axis='columns', inplace=True)

Then we can use apply to run a function against each row:


def calculate_skill_coverage_percent(r):
    if sum(r['mandatory']) < len(r['mandatory']):
        return 0
    else:
        return sum(r) / len(r)
df['Percent_Match'] = df.apply(calculate_skill_coverage_percent, axis='columns')

print(df)

Output:

                 mandatory            optional             Percent_Match
                 aws dotnet ec2   devops java python              
Candidate Name                                                        
User 1                 1      1   1        1    1      1           1.0
User 2                 0      1   0        1    1      1           0.0
...
User N                 1      1   1        0    0      0           0.5

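I'm a bit confused because the values in pd.MultiIndex.from_arrays and data are hard-coded. How can I get these values from the DataFrame in Output 1 above, or is there another approach you would suggest?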

cols = pd.MultiIndex.from_arrays([
    ['optional', 'mandatory', 'mandatory', 'mandatory', 'optional', 'optional'],
    ['devops', 'aws', 'ec2', 'dotnet', 'python', 'java']
])

data = [
    ['User 1', 1,1,1,1,1,1],
    ['User 2',1,0,0,1,1,1],
    # ...,
    ['User N',0,1,1,1,0,0]
]

Answer from a uj5u.com user:

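Instead of calculate_skill_coverage_percent, take the mean across each row (the mean is the sum divided by the length), and use Series.mask to set the result to 0 wherever the mean of the mandatory columns is less than 1: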

s = df.xs('mandatory', level=0, axis=1).mean(axis=1)
df['Percent_Match'] = df.mean(axis=1).mask(s.lt(1), 0)

print(df)
               mandatory            optional             Percent_Match
                     aws dotnet ec2   devops java python              
Candidate Name                                                        
User 1                 1      1   1        1    1      1           1.0
User 2                 0      1   0        1    1      1           0.0
User N                 1      1   1        0    0      0           0.5
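
The values above are fractions, while the question asks for a percentage string. A minimal sketch of one way to get there, assuming the same three-row frame as above: the column name Percentage Match comes from the question's expected output, and the two-decimal format is an assumption. It also checks the mandatory columns with eq(1).all(axis=1), which is an alternative to comparing their mean against 1.

```python
import pandas as pd

# Same three rows as printed above, rebuilt here so the snippet is self-contained
cols = pd.MultiIndex.from_arrays([
    ["mandatory", "mandatory", "mandatory", "optional", "optional", "optional"],
    ["aws", "dotnet", "ec2", "devops", "java", "python"],
])
df = pd.DataFrame(
    [[1, 1, 1, 1, 1, 1], [0, 1, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]],
    index=pd.Index(["User 1", "User 2", "User N"], name="Candidate Name"),
    columns=cols,
)

# True only for candidates whose mandatory keywords are all 1
mandatory_ok = df["mandatory"].eq(1).all(axis=1)

# Row mean as a percentage, forced to 0 where a mandatory keyword is missing
percent = df.mean(axis=1).where(mandatory_ok, 0).mul(100).round(2)

# Format as strings such as "50.00%" (the format is an assumption)
labels = percent.map("{:.2f}%".format)
df["Percentage Match"] = labels

print(df)
```

Keeping the numeric percent Series alongside the formatted labels is deliberate: the string column is convenient for display but cannot be sorted or filtered numerically.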

Edit:

print (df)
   Candidate Name optional/mandatory Keyword  Count
0           User1           optional  devops      1
1           User1          mandatory     aws      0
2           User1          mandatory     ec2      1
3           User1           optional  python      1
4           User1           optional    java      1
5           User1          mandatory  dotnet      0
6           User2           optional  devops      1
7           User2          mandatory     aws      1
8           User2          mandatory     ec2      0
9           User3           optional  devops      1
10          User3          mandatory     ec2      1
11          User3          mandatory     aws      0
12          User3           optional    java      1

Use DataFrame.pivot to reshape:

df = (df.pivot(index='Candidate Name', columns=['optional/mandatory','Keyword'], values='Count')
        .fillna(0)
        .astype(int)
        .sort_index(axis='columns'))
print (df)
optional/mandatory mandatory            optional            
Keyword                  aws dotnet ec2   devops java python
Candidate Name                                              
User1                      0      0   1        1    1      1
User2                      1      0   0        1    0      0
User3                      0      0   1        1    1      0
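
Putting the reshape and the percentage step together removes the need for any hard-coded MultiIndex or data list. The sketch below is an illustration under stated assumptions: the long-format rows are hypothetical stand-ins for Sample.csv, and it uses pivot_table with aggfunc="max" and fill_value=0 (a swapped-in alternative to pivot plus fillna) so that repeated (candidate, keyword) rows do not raise an error.

```python
import pandas as pd

# Hypothetical long-format rows standing in for Sample.csv
long_df = pd.DataFrame({
    "Candidate Name": ["User1"] * 3 + ["User2"] * 3 + ["User3"] * 2,
    "optional/mandatory": ["mandatory", "mandatory", "optional"] * 2
                          + ["mandatory", "optional"],
    "Keyword": ["aws", "ec2", "python", "aws", "ec2", "python", "aws", "python"],
    "Count": [3, 1, 2, 1, 1, 0, 5, 1],
})

# Collapse raw counts to a 0/1 presence flag
long_df["Count"] = (long_df["Count"] > 0).astype(int)

# Wide layout: keywords a candidate never mentioned are filled with 0
wide = long_df.pivot_table(
    index="Candidate Name",
    columns=["optional/mandatory", "Keyword"],
    values="Count",
    aggfunc="max",
    fill_value=0,
)

# Every mandatory keyword must be present, otherwise the match is 0
mandatory_ok = wide["mandatory"].eq(1).all(axis=1)
percent = wide.mean(axis=1).where(mandatory_ok, 0).mul(100).round(2)
wide["Percent_Match"] = percent

print(wide)
```

Here User3 has no ec2 row at all, so the fill value of 0 makes it fail the mandatory check. Note that wide["mandatory"] raises a KeyError if the input contains no mandatory keywords, so real data may need a guard for that case.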
