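Python sklearn: An Introduction to LabelEncoder (Encoding and Decoding), Its Usage, and Worked Examples
Contents
Introduction to LabelEncoder (encoding and decoding)
Methods
Using LabelEncoder
LabelEncoder worked examples
1. Basic example
2. Label-encoding when data is missing and the test data contains new values (not seen in the train data)
Introduction to LabelEncoder (encoding and decoding)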
LabelEncoder is a class defined in sklearn.preprocessing._label:
class LabelEncoder(TransformerMixin, BaseEstimator)
Its docstring describes it as follows: encode target labels with values between 0 and n_classes-1. This transformer should be used to encode target values, i.e. y, and not the input X. Read more in the User Guide.
.. versionadded:: 0.12
Attributes
classes_ : array of shape (n_class,). Holds the label for each class.
Examples
The docstring's two examples (normalizing numeric labels, and transforming non-numerical labels that are hashable and comparable into numerical labels) are the same ones shown in the basic example of this article.
See also
sklearn.preprocessing.OrdinalEncoder : Encode categorical features using an ordinal encoding scheme.
sklearn.preprocessing.OneHotEncoder : Encode categorical features as a one-hot numeric array.
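The docstring stresses that LabelEncoder is meant for the target y, and points to OrdinalEncoder for categorical features. The following is a minimal sketch of that split; the toy data and variable names are invented for illustration and do not come from the original article.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Toy data invented for illustration: two categorical feature columns and one target.
X = [["red", "S"], ["blue", "M"], ["red", "L"], ["green", "M"]]
y = ["spam", "ham", "spam", "eggs"]

# LabelEncoder works on a 1-D sequence of target labels.
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)

# OrdinalEncoder works on a 2-D feature matrix and encodes each column independently.
ordinal_encoder = OrdinalEncoder()
X_encoded = ordinal_encoder.fit_transform(X)

print(list(label_encoder.classes_))  # sorted target labels
print(y_encoded)                     # one integer code per target label
print(ordinal_encoder.categories_)   # one sorted category array per feature column
print(X_encoded)                     # float codes, one column per feature
```

Both encoders assign codes by sorted order of the values they saw during fitting, so the main practical difference here is the expected input shape.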
Methods
fit(y): Fit label encoder.
fit_transform(y): Fit label encoder and return encoded labels.
get_params(deep=True): Get parameters for this estimator.
inverse_transform(y): Transform labels back to original encoding.
set_params(**params): Set the parameters of this estimator.
transform(y): Transform labels to normalized encoding.
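To make the method list concrete, here is a short sketch that exercises the main methods on the city labels used elsewhere in this article. The label "berlin" is an invented value, chosen only to show what happens when transform() meets a label the encoder was not fitted on.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

labels = ["paris", "paris", "tokyo", "amsterdam"]

# fit() followed by transform() ...
two_step = LabelEncoder().fit(labels).transform(labels)

# ... gives the same codes as fit_transform() in one call.
le = LabelEncoder()
one_step = le.fit_transform(labels)
same_codes = np.array_equal(two_step, one_step)

# inverse_transform() maps the integer codes back to the original labels.
round_trip = list(le.inverse_transform(one_step))

# LabelEncoder takes no constructor arguments, so get_params() is empty.
params = le.get_params()

# transform() rejects labels that were not present when the encoder was fitted.
try:
    le.transform(["berlin"])
    unseen_rejected = False
except ValueError:
    unseen_rejected = True

print(one_step, same_codes, round_trip, params, unseen_rejected)
```

The ValueError on unseen labels is the reason the second worked example in this article adds an 'Unknown' class before transforming test data.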
Using LabelEncoder
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# DataScienceNYY is the author's own helper package, not part of pandas or
# scikit-learn; it must be importable for this walkthrough to run.
from DataScienceNYY.DataAnalysis import dataframe_fillAnyNull, Dataframe2LabelEncoder

# Build the data. The last row of each frame mixes None, 'un', '' and ' ',
# and the test frame holds values ('王十一', 'UFO') that never occur in train.
train_data_dict = {'Name': ['張三', '李四', '王五', '趙六', '張七', '李八', '王十', 'un'],
                   'Age': [22, 23, 24, 25, 22, 22, 22, None],
                   'District': ['北京', '上海', '廣東', '深圳', '山東', '河南', '浙江', ' '],
                   'Job': ['CEO', 'CTO', 'CFO', 'COO', 'CEO', 'CTO', 'CEO', '']}
test_data_dict = {'Name': ['張三', '李四', '王十一', None],
                  'Age': [22, 23, 22, 'un'],
                  'District': ['北京', '上海', '廣東', ''],
                  'Job': ['CEO', 'CTO', 'UFO', ' ']}
train_data_df = pd.DataFrame(train_data_dict)
test_data_df = pd.DataFrame(test_data_dict)
print(train_data_df, '\n', test_data_df)

# Fill in the missing data, column by column
for col in train_data_df.columns:
    train_data_df[col] = dataframe_fillAnyNull(train_data_df, col)
    test_data_df[col] = dataframe_fillAnyNull(test_data_df, col)
print(train_data_df, '\n', test_data_df)

# Label-encode the train and test data together
train_data, test_data = Dataframe2LabelEncoder(train_data_df, test_data_df)
print(train_data, '\n', test_data)
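The walkthrough above depends on dataframe_fillAnyNull, whose source is not shown. As a rough idea of what such a fill step could look like, here is a hypothetical stand-in named fill_any_null. It assumes that None/NaN, empty strings, and whitespace-only strings all count as missing, and the placeholder value "missing" is likewise an assumption; the author's helper may behave differently.

```python
import pandas as pd


def fill_any_null(df, col, placeholder="missing"):
    """Hypothetical stand-in, not the author's dataframe_fillAnyNull.

    Returns df[col] as strings, with None/NaN, empty strings, and
    whitespace-only strings replaced by `placeholder`.
    """
    def clean(value):
        if pd.isna(value):
            return placeholder
        text = str(value).strip()
        return text if text else placeholder

    return df[col].map(clean)


# Small demo frame invented for illustration.
demo_df = pd.DataFrame({
    "District": ["Beijing", "Shanghai", " ", None],
    "Job": ["CEO", "", "CFO", "CTO"],
})
for col in demo_df.columns:
    demo_df[col] = fill_any_null(demo_df, col)

print(demo_df)
```

Casting every value to a string is a design choice of this sketch, so that each column holds a single type before it is label-encoded.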
LabelEncoder worked examples
1. Basic example
LabelEncoder can be used to normalize labels.
>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])
It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.
>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
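In both examples the integer code of a label is simply its position in the sorted classes_ array. A small sketch that makes this mapping explicit (the variable name label_to_code is chosen here for illustration):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["paris", "paris", "tokyo", "amsterdam"])

# classes_ is sorted, and each label's code is its index in classes_.
label_to_code = {str(label): code for code, label in enumerate(le.classes_)}
print(label_to_code)

# The dictionary agrees with what transform() returns.
tokyo_code = int(le.transform(["tokyo"])[0])
print(tokyo_code)
```

Building such a dictionary can be handy for inspecting or logging the encoding without calling the encoder again.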
2. Label-encoding when data is missing and the test data contains new values (not seen in the train data)
Reference article: Python sklearn: Using LabelEncoder, the steps required before applying LabelEncoder
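import numpy as np
from sklearn.preprocessing import LabelEncoder

# Assumes train_df and test_df are pandas DataFrames and col is the name of a
# string-valued column present in both.
# Fit the encoder on the train data
LE = LabelEncoder()
LE.fit(train_df[col])
# Replace test values never seen in training with 'Unknown', then add that
# placeholder to LE.classes_ so that transform() accepts it
known = set(LE.classes_)
test_df[col] = test_df[col].map(lambda s: s if s in known else 'Unknown')
if 'Unknown' not in known:
    LE.classes_ = np.append(LE.classes_, 'Unknown')
# Transform the train and test data with the same encoder
train_df[col] = LE.transform(train_df[col])
test_df[col] = LE.transform(test_df[col])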
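The snippet above handles a single column and leaves train_df, test_df, and col undefined. The following sketch wraps the same 'Unknown' approach in a function and runs it on a small frame. The function name encode_frames and the demo data are hypothetical; this is not the author's Dataframe2LabelEncoder, whose source is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_frames(train_df, test_df, unknown="Unknown"):
    """Hypothetical helper: label-encode every column of both frames,
    mapping test values unseen in training to a shared `unknown` class."""
    train_out, test_out = train_df.copy(), test_df.copy()
    for col in train_df.columns:
        # Work on object arrays so labels stay plain Python objects.
        train_vals = train_df[col].to_numpy(dtype=object)
        le = LabelEncoder().fit(train_vals)
        known = set(le.classes_)
        # Replace unseen test values with the placeholder class.
        test_vals = np.array(
            [v if v in known else unknown for v in test_df[col]], dtype=object
        )
        # Register the placeholder so transform() accepts it.
        if unknown not in known:
            le.classes_ = np.append(le.classes_, unknown)
        train_out[col] = le.transform(train_vals)
        test_out[col] = le.transform(test_vals)
    return train_out, test_out


# Demo data invented for illustration: 'UFO' never appears in training.
train_df = pd.DataFrame({"Job": ["CEO", "CTO", "CFO", "COO", "CEO"]})
test_df = pd.DataFrame({"Job": ["CEO", "CTO", "UFO"]})
train_enc, test_enc = encode_frames(train_df, test_df)
print(train_enc["Job"].tolist())
print(test_enc["Job"].tolist())
```

Because 'Unknown' is appended to classes_, it receives the code that follows the last training class, and the codes of the training labels stay unchanged.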
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/199943.html
