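I keep running into this situation and have not found a good solution. I am asking for a solution in Python, but a solution in R would also help.
I keep getting data that looks like this: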
import pandas as pd
data = {'Col1': ['Bob', '101', 'First Street', '', 'Sue', '102', 'Second Street', '', 'Alex' , '200', 'Third Street', '']}
df = pd.DataFrame(data)
    Col1
0   Bob
1   101
2   First Street
3
4   Sue
5   102
6   Second Street
7
8   Alex
9   200
10  Third Street
11
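The pattern really does repeat like this in my real data. Sometimes there is one blank row (or more than one), and sometimes there is none. The important part is that I need to convert this column into rows, one row per record.
I would like the data to look like this: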
Name Address Street
0 Bob 101 First Street
1 Sue 102 Second Street
2 Alex 200 Third Street
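I tried playing with this, but nothing worked. My idea was to step through a few rows at a time, assign the values to the appropriate columns, and build a data frame row by row.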
x = len(df['Col1'])
holder = pd.DataFrame()
new_df = pd.DataFrame()
while x < 4:
    temp = df.iloc[:5]
    holder['Name'] = temp['Col1'].iloc[0]
    holder['Address'] = temp['Col1'].iloc[1]
    holder['Street'] = temp['Col1'].iloc[2]
    new_df = pd.concat([new_df, holder])
    df = temp[5:]
    df.reset_index()
    holder = pd.DataFrame()
    x = len(df['Col1'])
new_df.head(10)
uj5u.com user reply:
In R:
data <- data.frame(
Col1 = c('Bob', '101', 'First Street', '', 'Sue', '102', 'Second Street', '', 'Alex' , '200', 'Third Street', '')
)
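# Rows containing "Street" mark the end of each record
k <- which(grepl("Street", data$Col1))
j <- k - 1
i <- k - 2
data.frame(
  Name = data[i, ],
  Address = data[j, ],
  Street = data[k, ]
)
  Name Address        Street
1  Bob     101  First Street
2  Sue     102 Second Street
3 Alex     200  Third Street
Alternatively, if the street line does not always end in "Street" but the address is always a number, you can also try:
# as.numeric() warns about NAs introduced by coercion; that is expected here
j <- which(apply(data, 1, function(x) !is.na(as.numeric(x))))
i <- j - 1
k <- j + 1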
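Since the question asks for Python first, the same "anchor on the numeric row" idea can be sketched in pandas. This is only a sketch under the assumption stated in the answer above, that the address is always a run of digits and is never the first or last row; the names `col`, `j`, and `result` are made up for illustration.

```python
import numpy as np
import pandas as pd

data = {'Col1': ['Bob', '101', 'First Street', '', 'Sue', '102', 'Second Street', '',
                 'Alex', '200', 'Third Street', '']}
df = pd.DataFrame(data)

col = df['Col1'].reset_index(drop=True)

# Positions of the rows made up only of digits (assumed to be the address rows).
j = np.flatnonzero(col.str.isdigit().to_numpy())

# The name is assumed to sit one row above the address, the street one row below.
result = pd.DataFrame({
    'Name': col.iloc[j - 1].to_numpy(),
    'Address': col.iloc[j].to_numpy(),
    'Street': col.iloc[j + 1].to_numpy(),
})
print(result)
```

Because it keys on the digit-only rows instead of a fixed block length, this version does not care how many blank rows separate the records.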
uj5u.com user reply:
Python 3
In Python 3, you can convert the DataFrame to an array and then reshape it.
n = df.shape[0]
df2 = pd.DataFrame(
    data=df.to_numpy().reshape((n // 4, 4), order='C'),
    columns=['Name', 'Address', 'Street', 'Empty'])
This produces the following for your sample data:
Name Address Street Empty
0 Bob 101 First Street
1 Sue 102 Second Street
2 Alex 200 Third Street
If you like, you can drop the last column:
df2 = df2.drop(['Empty'], axis=1)
Name Address Street
0 Bob 101 First Street
1 Sue 102 Second Street
2 Alex 200 Third Street
As a one-liner:
df2 = pd.DataFrame(data=df.to_numpy().reshape((df.shape[0] // 4, 4), order='C'), columns=['Name', 'Address', 'Street', 'Empty']).drop(['Empty'], axis=1)
Name Address Street
0 Bob 101 First Street
1 Sue 102 Second Street
2 Alex 200 Third Street
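The reshape above relies on every record taking exactly four rows. The question mentions that the real data sometimes has several blank rows between records and sometimes none, so one possible variation is to drop the blank rows first and reshape into three columns. The sample data below is deliberately altered (two blanks after the first record, none after the second) to illustrate that case; it is not the asker's real data, and the sketch assumes every record has exactly three non-blank values.

```python
import pandas as pd

# Altered sample: two blank rows after Bob's record, none after Sue's.
data = {'Col1': ['Bob', '101', 'First Street', '', '', 'Sue', '102', 'Second Street',
                 'Alex', '200', 'Third Street', '']}
df = pd.DataFrame(data)

# Keep only rows that contain something other than whitespace.
values = df.loc[df['Col1'].str.strip() != '', 'Col1'].to_numpy()

# Assumption: after filtering, each record is three consecutive values.
if len(values) % 3 != 0:
    raise ValueError('expected a multiple of 3 non-blank rows')

df2 = pd.DataFrame(values.reshape(-1, 3), columns=['Name', 'Address', 'Street'])
print(df2)
```

If a record can be missing a field, this approach silently shifts later values into the wrong columns, so the length check is only a partial safeguard.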
uj5u.com user reply:
In Python, I believe this may help you.
import pandas as pd

data = {'Col1': ['Bob', '101', 'First Street', '', 'Sue', '102', 'Second Street', '', 'Alex' , '200', 'Third Street', '']}

# The values of the single column as a plain list
var = list(data.values())[0]
var2 = []
# Walk the list in blocks of 4 and keep the first 3 items of each block
for aux in range(len(var) // 4):
    var2.append(var[aux*4: aux*4 + 3])
data = pd.DataFrame(var2, columns=['Name', 'Address', 'Street'])
print(data)
uj5u.com user reply:
With pandas, we can build a key with cumcount and then pivot on it:
df.index=df.index//4
df['key'] = df.groupby(df.index).cumcount()
out = df.pivot(columns='key',values='Col1')
out.columns = ['Name', 'Address', 'Street', 'Empty']
out
Name Address Street Empty
0 Bob 101 First Street
1 Sue 102 Second Street
2 Alex 200 Third Street
uj5u.com user reply:
Another R solution. This one is based on the tidyverse package. The example data frame data comes from Park's post ( https://stackoverflow.com/a/69833814/7669809 ).
library(tidyverse)
data2 <- data %>%
  mutate(ID = cumsum(Col1 %in% "")) %>%
  filter(!Col1 %in% "") %>%
  group_by(ID) %>%
  mutate(Type = case_when(
    row_number() == 1L ~ "Name",
    row_number() == 2L ~ "Address",
    row_number() == 3L ~ "Street",
    TRUE               ~ NA_character_
  )) %>%
  pivot_wider(names_from = "Type", values_from = "Col1") %>%
  ungroup()
data2
# # A tibble: 3 x 4
# ID Name Address Street
# <int> <chr> <chr> <chr>
# 1 0 Bob 101 First Street
# 2 1 Sue 102 Second Street
# 3 2 Alex 200 Third Street
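For readers who want the same grouping idea in Python, here is a rough pandas rendering of the tidyverse pipeline above: a running count of blank rows labels each record, blanks are dropped, and the position within each record decides the column. The names `blank`, `tidy`, and `out` are illustrative, and the sketch assumes at least one blank row separates consecutive records.

```python
import pandas as pd

data = {'Col1': ['Bob', '101', 'First Street', '', 'Sue', '102', 'Second Street', '',
                 'Alex', '200', 'Third Street', '']}
df = pd.DataFrame(data)

blank = df['Col1'].eq('')

# A running count of blank rows gives every record its own ID; then drop the blanks.
tidy = df.assign(ID=blank.cumsum())[~blank].copy()

# Position inside each record: 0, 1, 2 -> Name, Address, Street.
tidy['Type'] = tidy.groupby('ID').cumcount().map({0: 'Name', 1: 'Address', 2: 'Street'})

# Spread the values into one row per record and restore the intended column order.
out = (tidy.pivot(index='ID', columns='Type', values='Col1')
           [['Name', 'Address', 'Street']]
           .reset_index(drop=True))
out.columns.name = None
print(out)
```

Extra consecutive blank rows only make the IDs skip numbers, so they do not break the grouping; records with no blank row between them would be merged into one group, which this sketch does not handle.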
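uj5u.com user reply:
The DataFrame's values are reshaped into 4 columns and however many rows that requires, the first 3 columns of the resulting array are taken by slicing and converted to a DataFrame, and finally the column names are set with set_axis.
result = pd.DataFrame(df.values.reshape(-1, 4)[:, :-1])\
    .set_axis(['Name', 'Address', 'Street'], axis=1)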
result
>>>
Name Address Street
0 Bob 101 First Street
1 Sue 102 Second Street
2 Alex 200 Third Street
