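I am working with data from a CSV file, which I have read into a DataFrame with pd.read_csv. If an entry has a value in the "Mobile_phone" column, I want to duplicate that row and put the "Mobile_phone" value in the "Work_phone" column.
This is the data I start with: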
         Full name  Work_phone  Mobile_phone    Company
1     Amanda Brown  1234567896   77895641225  A company
2  Bert Sutherland  1234567897                B company
3  Charlie Chaplin  1234567898                C company
4    Derek Simpson  1234567899   77895641228  D company
This is the data I would like to get back. It removes the need for the "Mobile_phone" data, so that I can use the result with another dataset:
         Full name   Work_phone  Mobile_phone    Company
1     Amanda Brown   1234567896                A company
2     Amanda Brown  77895641225                A company
3  Bert Sutherland   1234567897                B company
4  Charlie Chaplin   1234567898                C company
5    Derek Simpson   1234567899                D company
6    Derek Simpson  77895641228                D company
Answer from a uj5u.com user:
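We can use set_index and stack to reshape from wide to long format. Then clean up by removing the old column headers with droplevel, call reset_index to restore the RangeIndex and get a DataFrame again, and finally reorder the columns: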
new_df = (
    df.set_index(['Full name', 'Company'])  # Columns to save
    .stack()  # go to long format
    .droplevel(-1)  # remove old column headers
    .reset_index(name='Work_phone')  # Restore Index and name new column
    [['Full name', 'Work_phone', 'Company']]  # re-order columns
)
new_df:
         Full name   Work_phone    Company
0     Amanda Brown   1234567896  A company
1     Amanda Brown  77895641225  A company
2  Bert Sutherland   1234567897  B company
3  Charlie Chaplin   1234567898  C company
4    Derek Simpson   1234567899  D company
5    Derek Simpson  77895641228  D company
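Whether stack drops the NaN entries by itself depends on the pandas version: as far as I know, the newer implementation (opt-in through future_stack=True since pandas 2.1) keeps NaN values that were present in the input. A minimal sketch that makes the drop explicit, so the result should not depend on that default. The extra dropna call is an addition for illustration, not part of the answer above, and the sample df simply repeats the setup shown further down:

```python
import pandas as pd
from numpy import nan

# Same sample data as the setup in this answer
df = pd.DataFrame({
    'Full name': ['Amanda Brown', 'Bert Sutherland', 'Charlie Chaplin',
                  'Derek Simpson'],
    'Work_phone': [1234567896, 1234567897, 1234567898, 1234567899],
    'Mobile_phone': ['77895641225', nan, nan, '77895641228'],
    'Company': ['A company', 'B company', 'C company', 'D company']
})

new_df = (
    df.set_index(['Full name', 'Company'])  # columns to keep on every row
    .stack()                                # wide -> long: one row per phone value
    .dropna()                               # explicitly drop missing phone numbers
    .droplevel(-1)                          # remove the old column headers
    .reset_index(name='Work_phone')         # back to a DataFrame
    [['Full name', 'Work_phone', 'Company']]
)
print(new_df)
```

On versions where stack already drops NaN, the extra dropna is a no-op.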
Additionally, if needed, we can use reindex instead of selecting columns, which adds the Mobile_phone column back:
new_df = (
    df.set_index(['Full name', 'Company'])  # Columns to save
    .stack()  # go to long format
    .droplevel(-1)  # remove old column headers
    .reset_index(name='Work_phone')  # Restore Index and name new column
    .reindex(
        # re-order columns and add missing columns
        columns=['Full name', 'Work_phone', 'Mobile_phone', 'Company']
    )
)
new_df:
         Full name   Work_phone  Mobile_phone    Company
0     Amanda Brown   1234567896           NaN  A company
1     Amanda Brown  77895641225           NaN  A company
2  Bert Sutherland   1234567897           NaN  B company
3  Charlie Chaplin   1234567898           NaN  C company
4    Derek Simpson   1234567899           NaN  D company
5    Derek Simpson  77895641228           NaN  D company
Setup used:
import pandas as pd
from numpy import nan

df = pd.DataFrame({
    'Full name': ['Amanda Brown', 'Bert Sutherland', 'Charlie Chaplin',
                  'Derek Simpson'],
    'Work_phone': [1234567896, 1234567897, 1234567898, 1234567899],
    'Mobile_phone': ['77895641225', nan, nan, '77895641228'],
    'Company': ['A company', 'B company', 'C company', 'D company']
})
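Since the question loads the data from a CSV file, here is a small sketch of building an equivalent frame with pd.read_csv. The in-memory CSV text is made up for illustration, and dtype=str is an assumption on my part: without it, a numeric column with blank cells is typically parsed as float64, so the mobile numbers would no longer look like phone numbers.

```python
import io

import pandas as pd

# Hypothetical CSV content mirroring the sample data in the question
csv_text = """Full name,Work_phone,Mobile_phone,Company
Amanda Brown,1234567896,77895641225,A company
Bert Sutherland,1234567897,,B company
Charlie Chaplin,1234567898,,C company
Derek Simpson,1234567899,77895641228,D company
"""

# dtype=str keeps every phone number as text; blank cells still become NaN
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(df)
```

With a real file, the io.StringIO(...) argument would be replaced by the file path.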
Note: if Mobile_phone contains empty strings ('') instead of NaN, you may need to mask those out first; otherwise stack will not automatically remove the unwanted rows:
df['Mobile_phone'] = df['Mobile_phone'].mask(df['Mobile_phone'].eq(''))
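A slightly more defensive sketch of the same idea, assuming blank cells might also contain only whitespace (that assumption, the is_blank name, and the three-row sample frame are mine, not from the question):

```python
import pandas as pd

# Small illustrative frame: one real number, one empty string, one whitespace-only cell
df = pd.DataFrame({
    'Full name': ['Amanda Brown', 'Bert Sutherland', 'Charlie Chaplin'],
    'Work_phone': [1234567896, 1234567897, 1234567898],
    'Mobile_phone': ['77895641225', '', '   '],
    'Company': ['A company', 'B company', 'C company'],
})

# True where the cell is empty after stripping surrounding whitespace
is_blank = df['Mobile_phone'].str.strip().eq('')

# mask replaces the flagged cells with NaN so they can be dropped later
df['Mobile_phone'] = df['Mobile_phone'].mask(is_blank)
print(df)
```

Cells that are already NaN are left alone, because comparing NaN with '' yields False.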
Answer from a uj5u.com user:
TLDR
work_phone_df = df.drop("Mobile_phone", axis=1)
mobile_phone_df = df.drop("Work_phone", axis=1).dropna(subset=["Mobile_phone"]).rename(columns={"Mobile_phone": "Work_phone"})
new_df = pd.concat([work_phone_df, mobile_phone_df])
# if you need to sort your data and fix the index
new_df = new_df.sort_values(["Full name"]).reset_index(drop=True)
Each step explained:
First, get a copy of the DataFrame that contains each person's name, company, and work phone.
work_phone_df = df.drop("Mobile_phone", axis=1)
         Full name  Work_phone    Company
0     Amanda Brown  1234567896  A company
1  Bert Sutherland  1234567897  B company
2  Charlie Chaplin  1234567898  C company
3    Derek Simpson  1234567899  D company
Then, get a copy of the DataFrame with only the people who have a mobile phone, and rename the "Mobile_phone" column to "Work_phone".
mobile_phone_df = df.drop("Work_phone", axis=1).dropna(subset=["Mobile_phone"]).rename(columns={"Mobile_phone": "Work_phone"})
       Full name   Work_phone    Company
0   Amanda Brown  77895641225  A company
3  Derek Simpson  77895641228  D company
Now you can concatenate the two.
new_df = pd.concat([work_phone_df, mobile_phone_df])
         Full name   Work_phone    Company
0     Amanda Brown   1234567896  A company
1  Bert Sutherland   1234567897  B company
2  Charlie Chaplin   1234567898  C company
3    Derek Simpson   1234567899  D company
0     Amanda Brown  77895641225  A company
3    Derek Simpson  77895641228  D company
I'm not sure whether you need the result sorted, but if so you can use
new_df = new_df.sort_values(["Full name"])
         Full name   Work_phone    Company
0     Amanda Brown   1234567896  A company
0     Amanda Brown  77895641225  A company
1  Bert Sutherland   1234567897  B company
2  Charlie Chaplin   1234567898  C company
3    Derek Simpson   1234567899  D company
3    Derek Simpson  77895641228  D company
If you need to renumber the index, you can do the following
new_df = new_df.reset_index(drop=True)
         Full name   Work_phone    Company
0     Amanda Brown   1234567896  A company
1     Amanda Brown  77895641225  A company
2  Bert Sutherland   1234567897  B company
3  Charlie Chaplin   1234567898  C company
4    Derek Simpson   1234567899  D company
5    Derek Simpson  77895641228  D company
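One caveat with sorting by name: sort_values uses a non-stable algorithm by default, so the order of the two rows for the same person is not formally guaranteed, and the rows end up in alphabetical rather than original order. A possible variant, sketched here as an assumption about what is wanted, sorts on the original index with a stable algorithm instead, so each mobile row lands right after its work row:

```python
import pandas as pd
from numpy import nan

df = pd.DataFrame({
    'Full name': ['Amanda Brown', 'Bert Sutherland', 'Charlie Chaplin',
                  'Derek Simpson'],
    'Work_phone': [1234567896, 1234567897, 1234567898, 1234567899],
    'Mobile_phone': ['77895641225', nan, nan, '77895641228'],
    'Company': ['A company', 'B company', 'C company', 'D company']
})

work_phone_df = df.drop(columns="Mobile_phone")
mobile_phone_df = (
    df.drop(columns="Work_phone")
    .dropna(subset=["Mobile_phone"])                 # only people with a mobile
    .rename(columns={"Mobile_phone": "Work_phone"})  # reuse the Work_phone column
)

new_df = (
    pd.concat([work_phone_df, mobile_phone_df])
    .sort_index(kind="mergesort")  # stable: work row stays ahead of mobile row
    .reset_index(drop=True)        # renumber 0..n-1
)
print(new_df)
```

Passing kind="mergesort" to sort_values(["Full name"]) would give the same stability guarantee if alphabetical order is preferred.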
