str.extract() and regular expressions

I need to split one column into two using a regular expression and str.extract() (assuming that is the best approach):

    df = pd.DataFrame({
                        'Product': ['Truly Mix 2/12Pk Cans - 12Z',
                                    'Bud 16Z - LOOSE -  16Z',
                                    'Blue Moon (Case 12x - 22Z)',
                                    '2 for the show (6/4PK - 16Z)']
             })

I want a result like this:

    df_result = pd.DataFrame({
        'Product': ['Truly Mix', 'Bud', 'Blue Moon', '2 for the show'],
        'Packaging': ['2/12Pk Cans - 12Z',
                      '16Z - LOOSE -  16Z',
                      'Case 12x - 22Z',
                      '6/4PK - 16Z']
    })

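I have tried many things, but I am still struggling with regular expressions, even after a lot of online study.

This was my last attempt at getting the product: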

    pattern = r'(\D )[^\w][^(Case][^0-9]'

    df['Product'] = df['Product'].str.extract(pattern)

str.replace() should work fine for getting rid of the parentheses; I just can't get that far.

After 3 hours I'm not even close.

Answer:

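You can extract the two parts of each entry into two columns, and then remove the ( and ) at the start/end of the strings where they are present: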

    import pandas as pd

    df = pd.DataFrame({'Product': ['Truly Mix 2/12Pk Cans - 12Z',
                                   'Bud 16Z - LOOSE -  16Z',
                                   'Blue Moon (Case 12x - 22Z)',
                                   '2 for the show (6/4PK - 16Z)']})
    # Group 1: product name; group 2: packaging, which starts either at "("
    # or at a size/count token such as 2/12Pk or 16Z
    pattern = r'^(.*?)\s*((?:\((?:Case\b)?|\d+(?:/\d+)?[A-Za-z]+\b).*)'
    df[['Product', 'Packaging']] = df['Product'].str.extract(pattern, expand=True)
    # Remove a leading "(" and a trailing ")" when both are present
    df['Packaging'] = df['Packaging'].str.replace(r'^\((.*)\)$', r'\1', regex=True)
    # => >>> print(df['Packaging'])
    #    0     2/12Pk Cans - 12Z
    #    1    16Z - LOOSE -  16Z
    #    2        Case 12x - 22Z
    #    3           6/4PK - 16Z
    # => >>> print(df['Product'])
    #    0         Truly Mix
    #    1               Bud
    #    2         Blue Moon
    #    3    2 for the show

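Regex details:

- ^ - start of the string
- (.*?) - Group 1: zero or more characters other than line break characters, as few as possible
- \s* - zero or more whitespace characters
- ((?:\((?:Case\b)?|\d+(?:/\d+)?[A-Za-z]+\b).*) - Group 2:
  - (?:\((?:Case\b)?|\d+(?:/\d+)?[A-Za-z]+\b) - either
    - \((?:Case\b)? - a ( followed by an optional whole word Case
    - | - or
    - \d+(?:/\d+)?[A-Za-z]+\b - one or more digits, an optional sequence of / and one or more digits, then one or more letters (followed by a word boundary)
  - .* - zero or more characters other than line break characters, as many as possible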
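To check the breakdown above against the sample strings, here is a small sketch that compiles the same pattern with Python's re.VERBOSE flag, so each piece carries its own comment, and runs it with the standard re module instead of pandas. The names PATTERN, products and results are illustrative only.

```python
import re

# The answer's pattern, laid out with re.VERBOSE. In verbose mode unescaped
# whitespace and "#" comments are ignored; the pattern has no literal spaces,
# so its meaning is unchanged.
PATTERN = re.compile(r"""
    ^(.*?)                        # group 1: product name, as few chars as possible
    \s*                           # whitespace between name and packaging
    (                             # group 2: packaging
      (?:
        \((?:Case\b)?             # "(" optionally followed by the whole word Case
        |                         # or
        \d+(?:/\d+)?[A-Za-z]+\b   # a token such as 2/12Pk, 16Z or 6/4PK
      )
      .*                          # everything after that
    )
""", re.VERBOSE)

products = [
    "Truly Mix 2/12Pk Cans - 12Z",
    "Bud 16Z - LOOSE -  16Z",
    "Blue Moon (Case 12x - 22Z)",
    "2 for the show (6/4PK - 16Z)",
]

# Each match yields (product, packaging); parentheses are still attached here.
results = [PATTERN.match(s).groups() for s in products]
for name, packaging in results:
    print(f"{name!r} | {packaging!r}")
```

Note how "2 for the show" is not cut after the leading 2: the digit is not followed by letters, so the \d+...[A-Za-z]+ branch fails there and the lazy group 1 keeps growing until it reaches the "(".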

The .str.replace(r'^\((.*)\)$', r'\1', regex=True) part removes the ( and ) at the start and end of the string, but only when both are present.
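A possible variant, not part of the original answer: give the groups names. pandas' str.extract() uses the name of each (?P<name>...) group as the column name, so the result already has Product and Packaging columns and nothing depends on the positional df[[...]] = ... assignment. This is a sketch assuming the same sample data; the variable result is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Truly Mix 2/12Pk Cans - 12Z',
                               'Bud 16Z - LOOSE -  16Z',
                               'Blue Moon (Case 12x - 22Z)',
                               '2 for the show (6/4PK - 16Z)']})

# Same pattern as the answer, with named groups; str.extract() turns the
# group names into column names.
pattern = (r'^(?P<Product>.*?)\s*'
           r'(?P<Packaging>(?:\((?:Case\b)?|\d+(?:/\d+)?[A-Za-z]+\b).*)')

result = df['Product'].str.extract(pattern)

# Strip a leading "(" and trailing ")" only when both are present.
result['Packaging'] = result['Packaging'].str.replace(r'^\((.*)\)$', r'\1',
                                                      regex=True)
print(result)
```

Because the column names come from the pattern itself, renaming or adding a group only requires editing the regex.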
