Mapping and data transformation in pandas

pandas provides several functions that use a mapping to carry out an operation:

replace(): replaces elements;
map(): creates a new column;
rename(): replaces index labels.

1. replace(): replacing elements with a mapping

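During data processing you often need to replace existing elements in a data structure with new ones. To do that, define a mapping in which each old element is a key and its replacement is the value.

For example, the dictionary fruits maps fruit IDs to fruit names:

fruits={101:'orange',102:'apple',103:'banana'}

Suppose a DataFrame stores the fruit ID, quantity and unit price. To replace the fruit IDs with fruit names, call replace() and pass it the fruits mapping.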

The basic syntax of replace(), as documented in older pandas releases, is:

obj.replace(to_replace=None,value=None,inplace=False,limit=None,regex=False,method='pad')

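The parameters are:

obj: a DataFrame or Series object;
to_replace: str, regex, list, dict, Series, int, float or None; the values to be replaced;
value: scalar, dict, list, str or regex, default None; the value that replaces anything matching to_replace. For a DataFrame, a dict of values can specify which value to use for each column (columns not in the dict are left unchanged). Regular expressions, strings, lists, or dicts of such objects are also allowed;
inplace: bool, default False; if True, the original data is modified;
limit: int, default None; limits how many consecutive values are filled (deprecated since pandas 2.1);
regex: bool or the same types as to_replace, default False; whether to interpret to_replace and/or value as regular expressions. If True, to_replace must be a string. Alternatively, regex itself can be a regular expression or a list, dict or array of regular expressions, in which case to_replace must be None;
method: one of {'pad', 'ffill', 'bfill', None}; the fill method used for replacement, similar to the methods for filling missing values. It applies when to_replace is a scalar, list or tuple and value is None (deprecated since pandas 2.1).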
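The regex parameter is the least obvious of these, so here is a minimal sketch of it. The product codes and the "fru-" prefix are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical product codes; the "fru-" prefix is an assumption for this sketch.
codes = pd.Series(["fru-101", "fru-102", "veg-201"])

# regex=True: to_replace is interpreted as a regular expression.
stripped = codes.replace(to_replace=r"^fru-", value="", regex=True)

# Alternative form: pass the pattern via regex and leave to_replace as None.
stripped_alt = codes.replace(regex=r"^fru-", value="")

print(stripped.tolist())      # ['101', '102', 'veg-201']
print(stripped_alt.tolist())
```

Both calls should give the same result; only entries that match the pattern are changed.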

[Example 1] Use replace() and a mapping to replace the fruit IDs in the fruit DataFrame with fruit names.
The sample code test1.py is as follows:

import pandas as pd
# Mapping from fruit ID to fruit name
fruits = {101:'orange',102:'apple',103:'banana'}
# Create the fruit DataFrame
data = pd.DataFrame({'fru_No':[101,102,103]
                    ,'fru_Num':[1000,2000,3000]
                    ,'price':[3.56,4.2,2.5]})
# Replace matching values with the mapping; here only fru_No contains any of the keys
newDf = data.replace(fruits)
print(newDf)
# Output:
#    fru_No  fru_Num  price
# 0  orange     1000   3.56
# 1   apple     2000   4.20
# 2  banana     3000   2.50
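Note that a flat dict passed to DataFrame.replace() is applied to every column. The sketch below uses a slightly altered, hypothetical version of the fruit data, in which one quantity happens to equal a fruit ID, to show how a nested dict restricts the replacement to one column.

```python
import pandas as pd

fruits = {101: "orange", 102: "apple", 103: "banana"}

# Hypothetical variant of the fruit data: the first quantity (103)
# collides with a fruit ID on purpose.
data = pd.DataFrame({"fru_No": [101, 102, 103],
                     "fru_Num": [103, 2000, 3000],
                     "price": [3.56, 4.2, 2.5]})

# Flat dict: every column is searched, so the quantity 103 is replaced too.
everywhere = data.replace(fruits)

# Nested dict {column: mapping}: only fru_No is searched.
only_id = data.replace({"fru_No": fruits})

print(everywhere)
print(only_id)
```

When the keys can also occur in other columns, the nested form (or calling replace() on the single column) is the safer choice.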

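Further uses of replace() are shown in the sample code example1.py:

import numpy as np
import pandas as pd
s = pd.Series([-1000,-999,2,3,4,5,-2000])
# Replace a single value
print(s.replace(-2000,np.nan))
# Output:
# 0   -1000.0
# 1    -999.0
# 2       2.0
# 3       3.0
# 4       4.0
# 5       5.0
# 6       NaN
# dtype: float64

# Replace several values with one value
print(s.replace([-1000,-999],0))
# Output:
# 0       0
# 1       0
# 2       2
# 3       3
# 4       4
# 5       5
# 6   -2000
# dtype: int64

# Replace different values with different replacements
print(s.replace([-1000,-999],[np.nan,0]))
# Output:
# 0       NaN
# 1       0.0
# 2       2.0
# 3       3.0
# 4       4.0
# 5       5.0
# 6   -2000.0
# dtype: float64

# Use a dict to specify different replacements
print(s.replace({-1000:np.nan,-999:0,-2000:np.nan}))
# Output:
# 0    NaN
# 1    0.0
# 2    2.0
# 3    3.0
# 4    4.0
# 5    5.0
# 6    NaN
# dtype: float64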

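2. map(): adding elements with a mapping

[Example 1] showed how to replace fruit IDs with fruit names using a function and a mapping. Sometimes, however, you need to keep the fruit IDs and add the fruit names to the dataset as well.

In that case, use map() with the fruits mapping to add the new elements.

map() operates on a Series, which includes a single column of a DataFrame. It accepts a function or a dict representing the mapping as its argument. Its basic syntax is: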

Series.map(arg,na_action=None)

The parameters are:

arg: function, dict or Series; the mapping correspondence;
na_action: one of {None, 'ignore'}, default None; if 'ignore', NA values are propagated without being passed to the mapping.
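The effect of na_action is easiest to see with a function argument. The following is a minimal sketch; the series of names and the "fruit: " label are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical fruit names with one missing entry.
names = pd.Series(["orange", np.nan, "banana"], dtype=object)

# Default (na_action=None): the function is also called on the missing value.
labelled_all = names.map("fruit: {}".format)

# na_action="ignore": the missing value is propagated and never reaches the function.
labelled_skip = names.map("fruit: {}".format, na_action="ignore")

print(labelled_all.tolist())
print(labelled_skip.tolist())
```

With the default, the missing entry is formatted into the string "fruit: nan"; with na_action="ignore" it stays missing.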

[Example 2] Use map() and a mapping to add the fruit names to the fruit DataFrame.
The sample code test2.py is as follows:

import pandas as pd
# Mapping from fruit ID to fruit name
fruits = {101:'orange',102:'apple',103:'banana'}
# Create the fruit DataFrame
data = pd.DataFrame({'fru_No':[101,102,103],'fru_Num':[1000,2000,3000],'price':
                    [3.56,4.2,2.5]})
# Use the mapping to add a fru_name column to data
data['fru_name'] = data['fru_No'].map(fruits)
print(data)
# Output:
#    fru_No  fru_Num  price  fru_name
# 0     101     1000   3.56    orange
# 1     102     2000   4.20     apple
# 2     103     3000   2.50    banana
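One practical difference between map() and replace() is how they treat values missing from the mapping. The sketch below uses a hypothetical ID 104 that has no entry in fruits; the "unknown" fallback is likewise an assumption.

```python
import pandas as pd

fruits = {101: "orange", 102: "apple", 103: "banana"}

# 104 is a hypothetical ID that does not appear in the mapping.
ids = pd.Series([101, 104, 103])

mapped = ids.map(fruits)        # unmapped IDs become NaN
replaced = ids.replace(fruits)  # unmapped IDs are left as they are

# One way to supply a default for unmapped IDs after map().
with_default = ids.map(fruits).fillna("unknown")

print(mapped.tolist())
print(replaced.tolist())
print(with_default.tolist())
```

So map() builds the new column purely from the mapping, whereas replace() only touches the values it finds.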

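3. rename(): renaming row and column index labels

Data processing sometimes requires converting axis labels with a mapping. The pandas rename() function takes a dict representing the mapping as its argument and replaces the index labels of an axis.
The basic syntax of rename() is:

DataFrame.rename(mapper=None,index=None,columns=None,axis=None,copy=True,
inplace=False,level=None)
or
Series.rename(index=None,**kwargs)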

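The parameters are:

mapper, index, columns: dict or function; the transformation applied to the labels of the corresponding axis. Use mapper together with axis to choose the target axis, or use index and columns directly; columns renames the columns;
axis: int or str, optional; the axis that mapper targets, given as an axis name ('index', 'columns') or a number (0, 1); defaults to 'index';
copy: bool, default True; whether to copy the underlying data;
inplace: bool, default False; if True, the original data is modified;
level: int or level name, default None; for a MultiIndex, only labels in the specified level are renamed.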
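Since these parameters accept a function as well as a dict, a short sketch of the function form and of mapper with axis may help. The two-row frame and the "row" prefix are illustrative assumptions.

```python
import pandas as pd

# Small hypothetical frame, cut down from the fruit data.
data = pd.DataFrame({"fru_No": [101, 102], "price": [3.56, 4.2]})

# A function is applied to every label on the chosen axis.
upper_cols = data.rename(columns=str.upper)

# The same operation written with mapper and axis.
upper_cols_alt = data.rename(str.upper, axis="columns")

# A lambda builds new row labels from the old integer ones.
prefixed = data.rename(index=lambda i: f"row{i + 1}")

# Keys that match no existing label are ignored by default.
unchanged = data.rename(columns={"no_such_column": "x"})

print(list(upper_cols.columns))   # ['FRU_NO', 'PRICE']
print(list(prefixed.index))       # ['row1', 'row2']
```

A function is handy when the new labels follow a rule; a dict is better when only a few specific labels change.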

rename() returns a DataFrame or Series (or None when inplace=True).
[Example 3] Use rename() and a mapping to rename the row and column index labels of the fruit DataFrame.
The sample code test3.py is as follows:

import pandas as pd
# Mapping for the row index labels
reindex = {0:'row1',1:'row2',2:'row3'}
# Create the fruit DataFrame
data = pd.DataFrame({'fru_No':[101,102,103],'fru_Num':[1000,2000,3000],'price':
                    [3.56,4.2,2.5]})
print(data)
# Output:
#    fru_No  fru_Num  price
# 0     101     1000   3.56
# 1     102     2000   4.20
# 2     103     3000   2.50

# Rename the row index with the mapping; a new DataFrame is returned and data is unchanged
newDf = data.rename(reindex)
print(newDf)
# Output:
#       fru_No  fru_Num  price
# row1     101     1000   3.56
# row2     102     2000   4.20
# row3     103     3000   2.50

# With inplace=True, data itself is modified and rename() returns None
newDf = data.rename(reindex,inplace=True)
print(newDf) # newDf is None; the row labels of data are now row1, row2, row3
# Mapping for the column labels
recolumns = {'fru_No':'col1','fru_Num':'col2','price':'col3'}
# Rename row and column labels together; the rows were already renamed in place above,
# so reindex matches nothing here and only the columns change
newDf = data.rename(index=reindex,columns=recolumns)
print(newDf)
# Output:
#       col1  col2  col3
# row1   101  1000  3.56
# row2   102  2000  4.20
# row3   103  3000  2.50

# Rename a single row label and a single column label
newDf = data.rename(index={'row2':'s1'},columns={'fru_No':'111'})
print(newDf)
# Output:
#       111  fru_Num  price
# row1  101     1000   3.56
# s1    102     2000   4.20
# row3  103     3000   2.50

Note: by default rename() returns a new, modified DataFrame and leaves the original DataFrame unchanged. To modify the object the function is called on, set the inplace option to True.

Reference: https://www.92python.com/view/145.html
