Python sklearn: A detailed guide to LabelEncoder, covering an introduction (encoding and decoding), usage, and worked examples - 有解無憂

Introduction to LabelEncoder (encoding and decoding)

Methods

How to use LabelEncoder

LabelEncoder examples

1. Basic example

2. Label-encoding data that has missing values and test-set values unseen in the training set

Introduction to LabelEncoder (encoding and decoding)

class LabelEncoder Found at: sklearn.preprocessing._label

class LabelEncoder(TransformerMixin, BaseEstimator):
    """Encode target labels with value between 0 and n_classes-1.
This transformer should be used to encode target values, *i.e.* `y`, and not the input `X`.
Read more in the :ref:`User Guide <preprocessing_targets>`.

.. versionadded:: 0.12

Attributes
----------
classes_ : array of shape (n_class,)
Holds the label for each class.

Examples
--------
`LabelEncoder` can be used to normalize labels.

>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])

It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.

>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']

See also
--------
sklearn.preprocessing.OrdinalEncoder : Encode categorical features using an ordinal encoding scheme.
sklearn.preprocessing.OneHotEncoder : Encode categorical features as a one-hot numeric array.

"""
    def fit(self, y):
        """Fit label encoder

        Parameters
        ----------
        y : array-like of shape (n_samples,)
            Target values.

        Returns
        -------
        self : returns an instance of self.
        """
        y = column_or_1d(y, warn=True)
        self.classes_ = _encode(y)
        return self

    def fit_transform(self, y):
        """Fit label encoder and return encoded labels

        Parameters
        ----------
        y : array-like of shape [n_samples]
            Target values.

        Returns
        -------
        y : array-like of shape [n_samples]
        """
        y = column_or_1d(y, warn=True)
        self.classes_, y = _encode(y, encode=True)
        return y

    def transform(self, y):
        """Transform labels to normalized encoding.

        Parameters
        ----------
        y : array-like of shape [n_samples]
            Target values.

        Returns
        -------
        y : array-like of shape [n_samples]
        """
        check_is_fitted(self)
        y = column_or_1d(y, warn=True)
        # transform of empty array is empty array
        if _num_samples(y) == 0:
            return np.array([])
        _, y = _encode(y, uniques=self.classes_, encode=True)
        return y

    def inverse_transform(self, y):
        """Transform labels back to original encoding.

        Parameters
        ----------
        y : numpy array of shape [n_samples]
            Target values.

        Returns
        -------
        y : numpy array of shape [n_samples]
        """
        check_is_fitted(self)
        y = column_or_1d(y, warn=True)
        # inverse transform of empty array is empty array
        if _num_samples(y) == 0:
            return np.array([])
        diff = np.setdiff1d(y, np.arange(len(self.classes_)))
        if len(diff):
            raise ValueError(
                "y contains previously unseen labels: %s" % str(diff))
        y = np.asarray(y)
        return self.classes_[y]

    def _more_tags(self):
        return {'X_types': ['1dlabels']}
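
The `inverse_transform` source above rejects any code outside `0..len(classes_)-1`, and `transform` likewise refuses labels that were not present during `fit`. The following is a minimal sketch of both failure modes; the city labels are made up for illustration, and the exact error text may differ between sklearn versions.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["paris", "tokyo", "amsterdam"])

# transform() on a label that was not present during fit() raises ValueError
try:
    le.transform(["berlin"])
    transform_failed = False
except ValueError as exc:
    transform_failed = True
    print("transform failed:", exc)

# inverse_transform() on a code outside 0..len(classes_)-1 also raises ValueError
try:
    le.inverse_transform([0, 3])
    inverse_failed = False
except ValueError as exc:
    inverse_failed = True
    print("inverse_transform failed:", exc)
```

This is why test data containing new categories needs some preprocessing before it is passed to a fitted encoder.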

Methods

`fit`(y)	Fit label encoder
`fit_transform`(y)	Fit label encoder and return encoded labels
`get_params`([deep])	Get parameters for this estimator.
`inverse_transform`(y)	Transform labels back to original encoding.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(y)	Transform labels to normalized encoding.
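
As a rough illustration of how the methods in this table relate to each other, the sketch below checks that `fit` followed by `transform` gives the same codes as `fit_transform`, and that `inverse_transform` restores the input. The animal labels are invented for the example.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

labels = ["cat", "dog", "cat", "bird"]

# fit() followed by transform() ...
le_two_step = LabelEncoder().fit(labels)
codes_two_step = le_two_step.transform(labels)

# ... gives the same codes as the single fit_transform() call
le_one_step = LabelEncoder()
codes_one_step = le_one_step.fit_transform(labels)

# inverse_transform() maps the codes back to the original labels
restored = le_one_step.inverse_transform(codes_one_step)

# LabelEncoder takes no constructor parameters, so get_params() is empty
params = le_one_step.get_params()

print(codes_one_step, list(restored), params)
```

The codes follow the sorted order of the unique labels (`bird`, `cat`, `dog`), not the order in which the labels first appear.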

How to use LabelEncoder

import pandas as pd
from sklearn.preprocessing import LabelEncoder
# DataScienceNYY appears to be the author's own helper package; it is not part of sklearn or pandas
from DataScienceNYY.DataAnalysis import dataframe_fillAnyNull, Dataframe2LabelEncoder


# Build the data
train_data_dict = {'Name': ['張三', '李四', '王五', '趙六', '張七', '李八', '王十', 'un'],
                   'Age': [22, 23, 24, 25, 22, 22, 22, None],
                   'District': ['北京', '上海', '廣東', '深圳', '山東', '河南', '浙江', ' '],
                   'Job': ['CEO', 'CTO', 'CFO', 'COO', 'CEO', 'CTO', 'CEO', '']}
test_data_dict = {'Name': ['張三', '李四', '王十一', None],
                  'Age': [22, 23, 22, 'un'],
                  'District': ['北京', '上海', '廣東', ''],
                  'Job': ['CEO', 'CTO', 'UFO', ' ']}
train_data_df = pd.DataFrame(train_data_dict)
test_data_df = pd.DataFrame(test_data_dict)
print(train_data_df, '\n', test_data_df)


# Fill in the missing data
for col in train_data_df.columns:
    train_data_df[col] = dataframe_fillAnyNull(train_data_df, col)
    test_data_df[col] = dataframe_fillAnyNull(test_data_df, col)
print(train_data_df, '\n', test_data_df)


# Label-encode the data
train_data, test_data = Dataframe2LabelEncoder(train_data_df, test_data_df)
print(train_data, '\n', test_data)
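
The two helpers imported from `DataScienceNYY` are not shown in the article, so their behavior can only be guessed. The sketch below is one possible stand-in using only pandas and sklearn. The function names `fill_any_null` and `encode_frames`, the `"Unknown"` placeholder, and the sample data are all assumptions made for illustration, not the author's implementation.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

PLACEHOLDER = "Unknown"  # assumed fill value, not taken from the original helpers


def fill_any_null(df, col):
    """Hypothetical stand-in: turn None/NaN and blank strings into PLACEHOLDER."""
    filled = df[col].astype("object").fillna(PLACEHOLDER)
    stripped = filled.astype(str).str.strip()
    return stripped.replace("", PLACEHOLDER)


def encode_frames(train_df, test_df):
    """Hypothetical stand-in: label-encode every column, fitting on train only."""
    train_out, test_out = train_df.copy(), test_df.copy()
    for col in train_df.columns:
        le = LabelEncoder()
        # Include PLACEHOLDER at fit time so it always has a code
        le.fit(list(train_df[col]) + [PLACEHOLDER])
        known = set(le.classes_)
        # Test values unseen in train are mapped to PLACEHOLDER before transform
        test_col = test_df[col].map(lambda v: v if v in known else PLACEHOLDER)
        train_out[col] = le.transform(train_df[col])
        test_out[col] = le.transform(test_col)
    return train_out, test_out


# Made-up string columns with missing, blank, and unseen values
train_df = pd.DataFrame({"City": ["Beijing", "Shanghai", None, "Shenzhen"],
                         "Job": ["CEO", "CTO", "CFO", ""]})
test_df = pd.DataFrame({"City": ["Beijing", "Guangdong", None],
                        "Job": ["CEO", "UFO", " "]})
for col in train_df.columns:
    train_df[col] = fill_any_null(train_df, col)
    test_df[col] = fill_any_null(test_df, col)

train_data, test_data = encode_frames(train_df, test_df)
print(train_data, "\n", test_data)
```

The sketch sticks to string columns on purpose: a numeric column containing `None` is stored as floats by pandas, so `22` in one frame and `22.0` in the other would end up as different labels unless the column is normalized first.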

LabelEncoder examples

1. Basic example

LabelEncoder can be used to normalize labels.

>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])

It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.

>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
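
The class docstring notes that LabelEncoder is meant for the target `y` rather than the input `X`, and points to `OrdinalEncoder` for categorical features. As a small sketch of that alternative, the example below encodes two made-up feature columns in one call; the data is invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Two categorical feature columns (city, job) for three samples
X = np.array([["paris", "CEO"], ["tokyo", "CTO"], ["amsterdam", "CEO"]], dtype=object)

oe = OrdinalEncoder()
X_encoded = oe.fit_transform(X)

# One sorted category array per column, analogous to LabelEncoder.classes_
print(oe.categories_)
print(X_encoded)

# inverse_transform() restores the original values column by column
X_restored = oe.inverse_transform(X_encoded)
```

Unlike LabelEncoder, `OrdinalEncoder` expects a 2D input and returns floating-point codes by default.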

2. Label-encoding data that has missing values and test-set values unseen in the training set

Reference article: Python sklearn: how to use LabelEncoder, and the steps required before using it

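import numpy as np
from sklearn.preprocessing import LabelEncoder

# train_df and test_df are pandas DataFrames and col is the name of a
# string column; all three are assumed to be defined earlier.

# Fit the encoder on the training data
LE = LabelEncoder()
LE.fit(train_df[col])

# Map test values that never appeared in the training data to 'Unknown',
# then register 'Unknown' as an extra class in LE.classes_
test_df[col] = test_df[col].map(lambda s: 'Unknown' if s not in LE.classes_ else s)
LE.classes_ = np.append(LE.classes_, 'Unknown')

# Encode the train and test data with the same encoder
train_df[col] = LE.transform(train_df[col])
test_df[col] = LE.transform(test_df[col])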
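
Appending `'Unknown'` to `classes_` after fitting can leave the array unsorted, whereas `fit` itself always produces sorted classes. One alternative sketch, which avoids relying on that detail, is to include the placeholder when fitting. It assumes that the string `'Unknown'` does not collide with a real category; the frames and column name below are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

UNKNOWN = "Unknown"  # assumed placeholder; must not collide with a real category

# Made-up example data
train_df = pd.DataFrame({"Job": ["CEO", "CTO", "CFO", "CEO"]})
test_df = pd.DataFrame({"Job": ["CEO", "UFO", "CTO"]})
col = "Job"

# Fit on the training values plus the placeholder, so classes_ stays sorted
le = LabelEncoder()
le.fit(list(train_df[col]) + [UNKNOWN])

# Replace test values that were not seen during fit with the placeholder
known = set(le.classes_)
test_values = test_df[col].map(lambda s: s if s in known else UNKNOWN)

train_codes = le.transform(train_df[col])
test_codes = le.transform(test_values)

print(list(le.classes_))
print(train_codes, test_codes)
```

Here the unseen value `UFO` receives the same code as the placeholder, so the encoder never sees a label it was not fitted on.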

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/199943.html
