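I'm new to Python and I'm trying to split text data and convert it into Excel rows and columns. Suppose I have 100 records. Each record needs to be split so that characters 1-7 form the first column, character 8 the second, 9-10 the third, 11-18 the fourth, 19-24 the fifth, 25-124 the sixth, and 125-1000 the seventh. The sample records below are in text.txt. I want to convert them into an Excel file based on the character positions above. Any help would be greatly appreciated.
Sample text format: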
animals210 redwingsclearmist
animals220 redwingsclearmist
animals230 redwingsclearmist
animals240 redwingsclearmist
Sample output format:
         0    1    2      3          4
0  animals  210  red  wings  clearmist
1  animals  220  red  wings  clearmist
2  animals  230  red  wings  clearmist
3  animals  240  red  wings  clearmist
A uj5u.com user replied:
You can combine itertools.tee and zip_longest.
The split function:
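from itertools import tee, zip_longest

def split_by_index(s):
    # Start offset of each field; the last field runs to the end of the string
    indices = [0, 7, 10, 14, 20]
    # Two iterators over the same offsets, the second one shifted by one
    start, end = tee(indices)
    next(end)
    # zip_longest pads the final pair with None, so s[20:None] takes the remainder
    return " ".join([s[i:j] for i, j in zip_longest(start, end)])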
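To make the pairing step easier to follow, here is a minimal sketch that prints the slice bounds produced by tee and zip_longest before any slicing happens. The helper name slice_bounds is hypothetical and not part of the answer; the indices and the sample line are the ones used in this thread.

```python
from itertools import tee, zip_longest

def slice_bounds(indices):
    """Pair each offset with the next one; the last offset is paired with None."""
    start, end = tee(indices)   # two independent iterators over the same values
    next(end, None)             # advance the second iterator by one position
    return list(zip_longest(start, end))

bounds = slice_bounds([0, 7, 10, 14, 20])
print(bounds)
# [(0, 7), (7, 10), (10, 14), (14, 20), (20, None)]

line = "animals120 redlivinginjungle"
print([line[i:j] for i, j in bounds])
# ['animals', '120', ' red', 'living', 'injungle']
```

The third field keeps its leading space (' red'), which is why the answer joins the pieces with a space and lets a whitespace split clean them up afterwards.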
Your data:
import pandas as pd

df = pd.DataFrame()
df["sentence"] = ["animals120 redlivinginjungle",
                  "animals140 redlivinginjungle",
                  "animals160 redlivinginjungle"]

                       sentence
0  animals120 redlivinginjungle
1  animals140 redlivinginjungle
2  animals160 redlivinginjungle
Then apply the function to create a new DataFrame:
new_df = df["sentence"].apply(split_by_index).str.split(expand=True)
Output:
print(new_df)
         0    1    2       3         4
0  animals  120  red  living  injungle
1  animals  140  red  living  injungle
2  animals  160  red  living  injungle
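The indices in this answer are tuned to the sample rows, while the question states its layout as 1-based inclusive character ranges. A rough sketch of how those ranges could be mapped to Python slices follows. The function names and the padded record are made up for illustration; only the ranges come from the question.

```python
def ranges_to_slices(ranges):
    """Convert 1-based inclusive (first, last) ranges to Python slice objects."""
    return [slice(first - 1, last) for first, last in ranges]

def split_fixed_width(line, ranges):
    """Cut one record into fields and strip the padding around each field."""
    return [line[s].strip() for s in ranges_to_slices(ranges)]

# Column layout as stated in the question (1-based, inclusive).
QUESTION_RANGES = [(1, 7), (8, 8), (9, 10), (11, 18), (19, 24), (25, 124), (125, 1000)]

# Hypothetical record padded to that layout, for illustration only.
record = ("animals" + "2" + "10" + "red".ljust(8) + "wings".ljust(6)
          + "clearmist".ljust(100) + "note")
print(split_fixed_width(record, QUESTION_RANGES))
# ['animals', '2', '10', 'red', 'wings', 'clearmist', 'note']
```

A slice that runs past the end of a short line simply returns what is there, so records shorter than 1000 characters do not raise an error.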
A uj5u.com user replied:
Use the .str accessor:
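# Map each new column name to its [start, end) character offsets
column_splits = {'first': [0, 7], 'second': [7, 10]}
for column, limits in column_splits.items():
    start, end = limits
    # .str[start:end] slices every string in the column at once
    df[column] = df['your_column'].str[start: end]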
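The loop above covers only two columns. As a sketch under stated assumptions, it could be extended to all five fields of the sample rows as shown below. The column names, the helper split_columns, and the file name output.xlsx are hypothetical; the offsets are borrowed from the first answer's indices, and the source column is assumed to be called sentence.

```python
import pandas as pd

# Hypothetical column names; the offsets follow the first answer's indices.
column_splits = {
    "category": (0, 7),
    "code": (7, 10),
    "color": (10, 14),
    "kind": (14, 20),
    "detail": (20, None),   # None means "to the end of the string"
}

def split_columns(df, source="sentence"):
    """Return a new frame with one column per fixed-width slice."""
    out = pd.DataFrame(index=df.index)
    for column, (start, end) in column_splits.items():
        # .str[start:end] slices every row; .str.strip() drops the padding
        out[column] = df[source].str[start:end].str.strip()
    return out

df = pd.DataFrame({"sentence": ["animals120 redlivinginjungle",
                                "animals140 redlivinginjungle"]})
result = split_columns(df)
print(result)

# Writing the Excel file the question asks for (needs an Excel engine such as openpyxl):
# result.to_excel("output.xlsx", index=False)
```

Keeping the offsets in one dictionary means the layout can be changed without touching the loop.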
