I have a DataFrame like this:
Keyword devops aws ec2 python java dotnet
optional/mandatory optional mandatory mandatory optional optional mandatory
Candidate Name
User1 1 1 1 1 1 1
User2 1 0 0 1 1 1
... ... ... ... ... ... ...
User67 1 1 1 0 0 0
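I want to calculate a percentage match for each candidate name and add it as a new Percentage Match column. The optional/mandatory row plays an important role in the calculation. For example, the keywords aws, ec2 and dotnet are mandatory: if any of them has a count of 0, the candidate's percentage match is set to 0%. In other words, every mandatory keyword must have a count of 1, and only then is the percentage calculated for that candidate.
Expected output: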
Keyword devops aws ec2 python java dotnet Percentage
optional/mandatory optional mandatory mandatory optional optional mandatory Match
Candidate Name
User1 1 1 1 1 1 1 100%
User2 1 1 1 0 0 1 66.66%
... ... ... ... ... ... ...
User67 1 1 1 0 0 0 0%
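Percentage calculation:
User1 is 100% because all mandatory keywords have a count of 1 (as do all the optional ones).
User2 is 66.66% because two optional keywords have a count of 0, so the result is (total number of ones / total keywords) * 100 = (4/6) * 100.
User3 is 0% because one of the 3 mandatory keywords has a count of zero, so the percentage is set directly to 0% regardless of the optional keyword counts.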
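The rule described above can be sketched in plain Python for a single candidate. This is only an illustration of the arithmetic: the function name percent_match and the dict/set inputs are hypothetical and not part of the original code.

```python
def percent_match(counts, mandatory):
    """Return the percentage match for one candidate.

    counts: dict mapping keyword -> 0 or 1 (hypothetical input shape)
    mandatory: set of keywords that must all have a count of 1
    """
    # A single missing mandatory keyword forces the result to 0.
    if any(counts.get(keyword, 0) == 0 for keyword in mandatory):
        return 0.0
    # Otherwise: number of ones / total number of keywords * 100.
    return sum(counts.values()) / len(counts) * 100


mandatory = {"aws", "ec2", "dotnet"}
user2 = {"devops": 1, "aws": 1, "ec2": 1, "python": 0, "java": 0, "dotnet": 1}
print(round(percent_match(user2, mandatory), 2))  # 66.67
```

Note that rounding gives 66.67, whereas the expected output above shows the truncated value 66.66.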
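The keyword row and the candidate name column keep changing, and there may be more keywords and candidate names in the future. How can I produce the expected output?
My code: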
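import numpy as np
import pandas as pd

# Load the long-format data: one row per (candidate, keyword) pair.
# read_csv already returns a DataFrame, so no extra wrapping is needed.
df = pd.read_csv('Sample.csv')
# Clip counts to 0/1 so each keyword is either present or absent.
df["Count"] = np.where(df["Count"] > 0, 1, 0)
print(df)
# Reshape to one row per candidate; missing pairs become NaN.
df = df.pivot_table(index="Candidate Name", columns=["optional/mandatory","Keyword"], values="Count")
print(df)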
Output 1:
Candidate Name optional/mandatory Keyword Count
User1 optional devops 1
User1 mandatory aws 0
User1 mandatory ec2 1
User1 optional python 1
User1 optional java 1
User1 mandatory dotnet 0
User2 optional devops 1
User2 mandatory aws 1
User2 mandatory ec2 0
User3 optional devops 1
User3 mandatory ec2 1
User3 mandatory aws 0
User3 optional java 1
optional/mandatory mandatory optional
Keyword aws dotnet ec2 devops java python
Candidate Name
User1 0.0 0.0 1.0 1.0 1.0 1.0
User2 1.0 NaN 0.0 1.0 NaN NaN
User3 0.0 NaN 1.0 1.0 1.0 NaN
My attempt at getting the desired output:
import pandas as pd

cols = pd.MultiIndex.from_arrays([
    ['optional', 'mandatory', 'mandatory', 'mandatory', 'optional', 'optional'],
    ['devops', 'aws', 'ec2', 'dotnet', 'python', 'java']
])
data = [
    ['User 1', 1,1,1,1,1,1],
    ['User 2',1,0,0,1,1,1],
    # ...,
    ['User N',0,1,1,1,0,0]
]
df = pd.DataFrame(
    columns=cols,
    data=[applicant[1:] for applicant in data],
    index=pd.Index([applicant[0] for applicant in data], name='Candidate Name')
)
df.sort_index(axis='columns', inplace=True)
And then, we can use apply to run a function against each of the rows:
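def calculate_skill_coverage_percent(r):
    # Any mandatory keyword with a count of 0 means no match at all.
    if sum(r['mandatory']) < len(r['mandatory']):
        return 0
    else:
        # Fraction of all keywords (mandatory and optional) that are present.
        return sum(r) / len(r)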
df['Percent_Match'] = df.apply(calculate_skill_coverage_percent, axis='columns')
print(df)
Output:
mandatory optional Percent_Match
aws dotnet ec2 devops java python
Candidate Name
User 1 1 1 1 1 1 1 1.0
User 2 0 1 0 1 1 1 0.0
...
User N 1 1 1 0 0 0 0.5
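I am a bit stuck because the values in pd.MultiIndex.from_arrays and data are hard-coded. How can I get these values from the DataFrame in Output 1 above? Any other suggestions are also welcome.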
cols = pd.MultiIndex.from_arrays([
    ['optional', 'mandatory', 'mandatory', 'mandatory', 'optional', 'optional'],
    ['devops', 'aws', 'ec2', 'dotnet', 'python', 'java']
])
data = [
    ['User 1', 1,1,1,1,1,1],
    ['User 2',1,0,0,1,1,1],
    # ...,
    ['User N',0,1,1,1,0,0]
]
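A uj5u.com user replied:
Instead of calculate_skill_coverage_percent, use mean across each row (the mean is the sum divided by the length), and use Series.mask to set the result to 0 wherever the mean of the mandatory columns is less than 1: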
s = df.xs('mandatory', level=0, axis=1).mean(axis=1)
df['Percent_Match'] = df.mean(axis=1).mask(s.lt(1), 0)
print(df)
mandatory optional Percent_Match
aws dotnet ec2 devops java python
Candidate Name
User 1 1 1 1 1 1 1 1.0
User 2 0 1 0 1 1 1 0.0
User N 1 1 1 0 0 0 0.5
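The result above is a fraction (1.0, 0.0, 0.5), while the expected output in the question shows percentage strings. A minimal sketch of one way to convert it follows, assuming the same three sample rows as above. The score is kept in a separate Series here, and the variable names score and percent are illustrative only.

```python
import pandas as pd

# Same layout as the sample above: (optional/mandatory, keyword) column pairs.
cols = pd.MultiIndex.from_tuples([
    ("mandatory", "aws"), ("mandatory", "dotnet"), ("mandatory", "ec2"),
    ("optional", "devops"), ("optional", "java"), ("optional", "python"),
])
df = pd.DataFrame(
    [[1, 1, 1, 1, 1, 1], [0, 1, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]],
    index=pd.Index(["User 1", "User 2", "User N"], name="Candidate Name"),
    columns=cols,
)

# Mean of the mandatory columns is 1 only when every mandatory keyword is 1.
mandatory_mean = df.xs("mandatory", level=0, axis=1).mean(axis=1)
# Row mean over all keywords, replaced by 0 where a mandatory keyword is missing.
score = df.mean(axis=1).mask(mandatory_mean.lt(1), 0)
# Convert the fraction to a string such as "50.00%".
percent = score.mul(100).round(2).map("{:.2f}%".format)
print(percent.tolist())  # ['100.00%', '0.00%', '50.00%']
```

Formatting as strings is only for display; keeping the numeric score is more convenient if the values need to be sorted or filtered later.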
Edit:
print (df)
Candidate Name optional/mandatory Keyword Count
0 User1 optional devops 1
1 User1 mandatory aws 0
2 User1 mandatory ec2 1
3 User1 optional python 1
4 User1 optional java 1
5 User1 mandatory dotnet 0
6 User2 optional devops 1
7 User2 mandatory aws 1
8 User2 mandatory ec2 0
9 User3 optional devops 1
10 User3 mandatory ec2 1
11 User3 mandatory aws 0
12 User3 optional java 1
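Use DataFrame.pivot to reshape (the arguments are passed by keyword, which pandas 2.0 and later require):
df = (df.pivot(index='Candidate Name',
               columns=['optional/mandatory','Keyword'],
               values='Count')
        .fillna(0)
        .astype(int)
        .sort_index(axis='columns'))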
print (df)
optional/mandatory mandatory optional
Keyword aws dotnet ec2 devops java python
Candidate Name
User1 0 0 1 1 1 1
User2 1 0 0 1 0 0
User3 0 0 1 1 1 0
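Putting the two steps together, the reshape and the score can be sketched end to end without hard-coding any keyword or candidate. The rows below are made up for illustration (they are not the asker's data), and the check uses an explicit "all mandatory counts equal 1" test with Series.where, a small variation on the mask approach shown earlier.

```python
import pandas as pd

# Made-up long-format rows: one row per (candidate, keyword) pair.
long_df = pd.DataFrame({
    "Candidate Name": ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "optional/mandatory": ["mandatory", "mandatory", "optional", "optional"] * 2
                          + ["mandatory", "mandatory"],
    "Keyword": ["aws", "ec2", "devops", "python"] * 2 + ["aws", "ec2"],
    "Count": [1, 1, 1, 0, 1, 0, 1, 1, 2, 1],
})
# Clip counts to 0/1, as in the question.
long_df["Count"] = long_df["Count"].gt(0).astype(int)

# One row per candidate; pairs that never occur become 0.
wide = (long_df.pivot(index="Candidate Name",
                      columns=["optional/mandatory", "Keyword"],
                      values="Count")
               .fillna(0)
               .astype(int)
               .sort_index(axis="columns"))

# True only for candidates whose mandatory keywords are all present.
mandatory_ok = wide["mandatory"].eq(1).all(axis=1)
# Share of keywords present, forced to 0 when a mandatory one is missing.
score = wide.mean(axis=1).where(mandatory_ok, 0)
print(score)
```

Because the mandatory columns are selected by the top column level rather than by name, new keywords or candidates in the input should flow through without code changes.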
