Desperate plea for help: selecting one column with pandas and replacing its spaces and newlines raises an error

import codecs
import os
import regex
import pandas

# Strip single quotes from the text data before pandas reads it

# Process the data with pandas
def pandas_fomat(filepath):
    print('Start cleaning data============')

    read_table = pandas.read_table(filepath, engine='python', sep='|', header=None, error_bad_lines=False)
    # Process column 73: every row has many consecutive spaces and also contains a line break
    read_table[[73]]=read_table[[73]].apply(lambda x: fix_lines(x), axis = 1)
    # The lambda above is the part I am confused about; I would appreciate help from the experts
    read_table[[0,1,3,73]].to_csv('AppendTxt.txt', mode='a', header=True, index=None)


    print('============Data cleaning finished')


def fix_lines(x):# Clean the data in the list, to be updated into the SQL database
    result=''
    x=' '.join(x.split())
    #x = x.replace('                                ', ',')
    result = x.strip()
    return result

# Walk the folder
def walkFile(filespath):
    for root, dirs, files in os.walk(filespath):

        # root is the path of the folder currently being visited
        # dirs is the list of subdirectory names in that folder
        # files is the list of file names in that folder

        # Iterate over the files
        for f in files:

            fullpath=os.path.join(root, f)
            print(fullpath)
            pandas_fomat(fullpath)

        # Iterate over all subfolders
        for d in dirs:
            print(os.path.join(root, d))

def main():
    current_dir = os.path.join(os.path.abspath(os.path.dirname(__file__)),'TextOut\\')
   # print(current_dir)
   # print('yishagndizhi')
    walkFile(current_dir)

if __name__ == '__main__':
    main()

Data format:
1     2       2        3                                             5
6   6  4  f  4 4  s    er
The two lines above are a single row of data (it contains a line break). In the end I want to turn it into
1,2,2,3,5,6,6,4,f,4,4,s,er
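
The conversion described above (every run of spaces, plus the embedded line break, becomes one comma) can be sketched with plain string methods. This is only a minimal illustration, assuming the cell is an ordinary Python str; the function name `to_csv_tokens` is made up for this example and does not appear in the original post.

```python
def to_csv_tokens(cell: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one comma."""
    # str.split() with no argument splits on any whitespace run and drops
    # empty strings, so leading and trailing blanks disappear as well.
    return ",".join(cell.split())


raw = "1     2       2        3                                             5\n6   6  4  f  4 4  s    er"
print(to_csv_tokens(raw))
```

Because `split()` already treats the newline as whitespace, no separate `replace("\n", " ")` step is needed in this variant.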

A helpful uj5u.com user replied:

1 2 2 3 5
6 6 4 f 4 4 s er
Is this the content of column 73?

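A helpful uj5u.com user replied:

Quoting the reply by chuifengde on floor 1:

1 2 2 3 5
6 6 4 f 4 4 s er
Is this the content of column 73?

Yes, this is the content of one row of column 73.
Every row has this format. What I actually need is only the item after the second-to-last comma once the row has been split, that is, the s.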

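A helpful uj5u.com user replied:

def fix_lines(x):
    # Treat the embedded line break as an ordinary space
    c = x.replace("\n",' ')
    # Collapse runs of spaces; `in` also catches a run at index 0, which find(...) > 0 would miss
    while "  " in c:
        c = c.replace("  "," ")
    # Trim both ends, then turn each remaining single space into a comma
    c = c.strip().replace(" ",',')
    return c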

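A helpful uj5u.com user replied:

# If you only need the second-to-last item:
def fix_lines(x):
    c = x.replace("\n",' ')
    # Collapse runs of spaces (also when the run starts at index 0)
    while "  " in c:
        c = c.replace("  "," ")
    c = c.strip().replace(" ",',')
    # Return the second-to-last item, or the whole string when there is no comma
    return c.split(",")[-2] if "," in c else c

# -- Call it like this (with header=None the column labels are integers, hence 73 rather than '73'):
df[73].fillna(" ").apply(fix_lines)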
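
The same "second-to-last item" idea can also be expressed with pandas' vectorized string accessor instead of a Python-level function. This is a hedged sketch rather than the answerer's method: the Series below is invented toy data standing in for column 73.

```python
import pandas as pd

# Toy stand-in for column 73; the values are invented for this example.
col = pd.Series(["1   2  3\n4 s  er", None, "single"])

# .str.split() without arguments splits on any whitespace run;
# .str[-2] picks the second-to-last token and yields NaN when a row
# has fewer than two tokens.
second_last = col.fillna("").str.split().str[-2]
print(second_last.tolist())
```

One difference from the function above: rows with a single token come back as NaN here, whereas `fix_lines` returns the token itself.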

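A helpful uj5u.com user replied:

    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'find'

I get this error. I have shared the file through a Baidu Cloud link; could everyone please take a look?
https://pan.baidu.com/s/1ScKuEfyhFmILpp8FDQ9KDQ
Extraction code: xfou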

10|000010|B|可口可樂330ml4                          |Coke 330ml4                             |組(Pcs)   |6|    330.00|ml(毫升)  |4|   |G|M|0|0|N|6| |20200409|C|N| | |               |正品        |上海        |          |               |.00|9999999.99|.00|201107|      | |0|       |.00|0|       |315|S|4957|YVON|6|可口可樂330ml4    |Y|            |可口可樂330ml4    |1|0|360|D           |                    |                                        |585958|67|P           |Y|Y|Y|N|N|N|N|    | |N|N|N|6|1| P|99| 7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  COCACOLA                                COCACOLA                                可口可樂                                可口可樂L2                                                                                                                                                                                                           INV99901L20000                                                   10000810                   P                  99                   0                     1030307010000000000

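The above is one record.
--------------------------------------
After splitting this record on |, I need columns 1, 2, 4 and 73. From column 73 I only need the four characters 【可口可樂】, that is, the four characters that come right before 【可口可樂L2】.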
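
The AttributeError reported in this post is consistent with how the original call was written: `DataFrame.apply(..., axis=1)` passes each row to the function as a Series, while `Series.apply` passes each cell. The sketch below reproduces that difference on a tiny invented frame; it is an illustration of the likely cause, not a confirmed diagnosis of the shared file.

```python
import pandas as pd

# Tiny invented frame with a single column labelled 73.
df = pd.DataFrame({73: ["a   b\nc", "d  e"]})


def fix_lines(x):
    # Expects a plain str; str.split() collapses all whitespace runs.
    return " ".join(x.split())


# DataFrame.apply(..., axis=1) hands each ROW to the function as a Series,
# so str-only methods such as .split() or .find() are not available on it.
try:
    df[[73]].apply(fix_lines, axis=1)
    row_error = None
except AttributeError as exc:
    row_error = str(exc)

# Series.apply hands each CELL (a str here) to the function instead.
cleaned = df[73].apply(fix_lines)

print(row_error)
print(cleaned.tolist())
```

Selecting the column with single brackets (`df[73]`) and dropping `axis=1` is usually enough to make a per-cell string function work.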

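A helpful uj5u.com user replied:

import pandas as pd

def fix_lines(x):
    c = x.replace("\n", ' ')
    # Collapse runs of spaces into single spaces
    while "  " in c:
        c = c.replace("  ", ' ')
    c = c.strip().split(' ')
    # Return the second token; empty cells (filled with " " below) yield '' instead of an IndexError
    return c[1] if len(c) > 1 else ''

# Label the 74 fields 1..74 so that the last field is column 74
col = list(range(1, 75))
read_table = pd.read_csv(r"c:\31520200511ITM - 源.txt", sep='|', encoding='utf-8', dtype={2: str}, names=col)
print(read_table[74].fillna(" ").apply(fix_lines))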
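
For the sample record posted earlier, the requested 可口可樂 is the fourth whitespace-separated token of the last field, not the second. A minimal sketch of picking a token by position follows, assuming the brand name always sits at that position (the thread does not confirm this for every row); `pick_token` and the shortened `sample` string are hypothetical.

```python
def pick_token(cell, index=3, default=""):
    """Return the whitespace-separated token at `index`, or `default` if absent."""
    tokens = str(cell).split()
    return tokens[index] if -len(tokens) <= index < len(tokens) else default


# Shortened version of the last field of the sample record.
sample = " 7      COCACOLA      COCACOLA      可口可樂      可口可樂L2      INV99901L20000\n"
print(pick_token(sample))   # fourth token
print(pick_token("   "))    # a blank cell falls back to the default
```

It could be used as `read_table[74].fillna("").apply(pick_token)` in place of `fix_lines`.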

A helpful uj5u.com user replied:



s='''

10|000010|B|可口可樂330ml4                          |Coke 330ml4                             |組(Pcs)   |6|    330.00|ml(毫升)  |4|   |G|M|0|0|N|6| |20200409|C|N| | |               |正品        |上海        |          |               |.00|9999999.99|.00|201107|      | |0|       |.00|0|       |315|S|4957|YVON|6|可口可樂330ml4    |Y|            |可口可樂330ml4    |1|0|360|D           |                    |                                        |585958|67|P           |Y|Y|Y|N|N|N|N|    | |N|N|N|6|1| P|99| 7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  COCACOLA                                COCACOLA                                可口可樂                                可口可樂L2                                                                                                                                                                                                           INV99901L20000                                                   10000810                   P                  99                   0                     1030307010000000000

'''

s1=s.split("|")[73]
s2=[ss for ss in s1.split(" ") if ss!=""]
print(s2)

['7', 'COCACOLA', 'COCACOLA', '可口可樂', '可口可樂L2', 'INV99901L20000', '10000810', 'P', '99', '0', '1030307010000000000\n']
Without any further data to go on, this is as far as I can take it; extract what you need from the list yourself.
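
Putting the pieces of this thread together, one possible end-to-end flow is: read the pipe-delimited text, keep the wanted columns, and reduce the padded last field to a single token. The sketch below runs on invented four-field records (the real file has 74 fields), so the column positions are assumptions. It also uses `on_bad_lines="skip"`, which replaced `error_bad_lines=False` (deprecated in pandas 1.3 and removed in 2.0).

```python
import io

import pandas as pd

# Toy pipe-delimited data with only 4 fields per record; the last field
# plays the role of the long, space-padded column. Values are invented.
data = io.StringIO(
    "10|000010|B| 7     COCACOLA   COCACOLA   可口可樂   可口可樂L2   INV99901L20000\n"
    "11|000011|B| 8     PEPSI      PEPSI      百事可樂   百事可樂L2   INV99902L20000\n"
)

df = pd.read_csv(
    data,
    sep="|",
    header=None,          # no header row, so columns are labelled 0, 1, 2, ...
    dtype=str,            # keep codes such as 000010 as text
    on_bad_lines="skip",  # skip malformed records (pandas >= 1.3)
)

last = df.columns[-1]
# Split the padded field on whitespace and keep the 4th token (index 3).
df[last] = df[last].fillna("").str.split().str[3]

result = df[[0, 1, last]]
print(result)
```

From here `result.to_csv('AppendTxt.txt', mode='a', header=False, index=False)` would append the rows as in the original script.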

