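I managed to extract a list from a data source. The list elements are formatted like this (note that the first number is not an index):
0    cheese    100
1    cheddar cheese    1100
2    gorgonzola    1300
3    smoked cheese    200
etc.
This means that when printed, a line reads "0    cheese    100", spaces included.
What I want to do is parse each entry and split it into two lists. I don't need the first number; instead, I want the cheese type and the number that follows it.
For example: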
cheese
cheddar cheese
gorgonzola
smoked cheese
and:
100
1100
1300
200
The end goal is to assign these two lists to columns of a pd.DataFrame so that each can be processed on its own.
Any help is much appreciated.
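uj5u.com user reply:
If the goal is a DataFrame, why not build it directly instead of making two lists? If you turn your string into a Series, you can use pandas.Series.str.extract() to split it into the columns you want: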
import pandas as pd
s = '''0    cheese    100
1    cheddar cheese    1100
2    gorgonzola    1300
3    smoked cheese    200'''
# named groups (?P<type>...) and (?P<value>...) become the column names
pd.Series(s.split('\n')).str.extract(r'.*?\s+(?P<type>.*?)\s+(?P<value>\d+)')
This gives a DataFrame:
             type  value
0          cheese    100
1  cheddar cheese   1100
2      gorgonzola   1300
3   smoked cheese    200
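Note that str.extract returns every captured group as a string, so the value column holds text such as '100' rather than integers. A minimal sketch of converting it with pd.to_numeric, assuming the same four sample lines with several spaces between fields:

```python
import pandas as pd

s = '''0    cheese    100
1    cheddar cheese    1100
2    gorgonzola    1300
3    smoked cheese    200'''

# Named groups become column names; everything captured is a string.
df = pd.Series(s.split('\n')).str.extract(r'.*?\s+(?P<type>.*?)\s+(?P<value>\d+)')

# Convert the captured digits to integers so the column supports arithmetic.
df['value'] = pd.to_numeric(df['value'])

print(df.dtypes)
print(df['value'].sum())
```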
uj5u.com user reply:
IIUC your strings are elements of a list. You can use re.split to split wherever two or more spaces occur:
import re
import pandas as pd
your_list = [
    "0    cheese    100",
    "1    cheddar cheese    1100",
    "2    gorgonzola    1300",
    "3    smoked cheese    200",
]
# split on runs of 2+ whitespace characters and drop the leading number
df = pd.DataFrame([re.split(r'\s{2,}', s)[1:] for s in your_list], columns=["type", "value"])
Output:
             type  value
0          cheese    100
1  cheddar cheese   1100
2      gorgonzola   1300
3   smoked cheese    200
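The re.split approach rests on one assumption: a cheese name contains only single spaces, while the fields are separated by two or more. A small sketch checking that on two sample lines (the four-space padding and the trailing spaces on the second line are assumptions for illustration) also shows why stripping each line first is safer:

```python
import re

# Sample lines; the second one deliberately ends with trailing spaces.
lines = ["1    cheddar cheese    1100", "3    smoked cheese    200   "]

# Splitting on runs of 2+ whitespace keeps the single space inside a name,
# but trailing spaces leave an extra empty string at the end.
print(re.split(r'\s{2,}', lines[1]))  # ['3', 'smoked cheese', '200', '']

# Stripping each line first avoids that empty trailing field.
rows = [re.split(r'\s{2,}', line.strip())[1:] for line in lines]
print(rows)  # [['cheddar cheese', '1100'], ['smoked cheese', '200']]
```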
uj5u.com user reply:
I think something along these lines might work:
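import pandas as pd
import re
mylist = ['0 cheese 100', '1 cheddar cheese 200']
numbers = '[0-9]'
# the last whitespace-separated token is the number
list1 = [i.split()[-1] for i in mylist]
# remove every digit, then trim the leftover spaces to get the name
list2 = [re.sub(numbers, '', i).strip() for i in mylist]
your_df = pd.DataFrame({'name1': list1, 'name2': list2})
your_df  # displays the frame in a notebook; use print(your_df) in a script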
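One caveat with the re.sub approach above: it deletes every digit, so a name that itself contains digits would be mangled. A sketch of a variant that removes only the leading index and the trailing number with an anchored pattern (the entry 'aged 24 month gouda' is a made-up example, not from the question):

```python
import re

mylist = ['0 cheese 100', '1 cheddar cheese 200', '2 aged 24 month gouda 900']

# The last whitespace-separated token is the number, as in the answer above.
list1 = [i.split()[-1] for i in mylist]

# Remove only a leading number plus its spaces (^\d+\s+) and a trailing
# number plus its spaces (\s+\d+$); digits inside the name are kept.
list2 = [re.sub(r'^\d+\s+|\s+\d+$', '', i.strip()) for i in mylist]

print(list1)
print(list2)
```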
uj5u.com user reply:
I can suggest this simple solution:
lines = [
    "1 cheddar cheese 1100 ",
    "2 gorgonzola 1300 ",
    "3 smoked cheese 200",
]
for line in lines:
    words = line.strip().split()
    print(' '.join(words[1:-1]), words[-1])
Result:
cheddar cheese 1100
gorgonzola 1300
smoked cheese 200
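Since the end goal is a DataFrame rather than printed lines, the same split/join idea can fill the two lists directly. A minimal sketch; the int conversion and the column names type/value are choices made here, not part of the answer:

```python
import pandas as pd

lines = [
    "0    cheese    100",
    "1    cheddar cheese    1100 ",
    "2    gorgonzola    1300 ",
    "3    smoked cheese    200",
]

types, values = [], []
for line in lines:
    words = line.split()  # split() with no argument also drops leading/trailing whitespace
    types.append(' '.join(words[1:-1]))  # everything between the index and the number
    values.append(int(words[-1]))        # the trailing number

df = pd.DataFrame({'type': types, 'value': values})
print(df)
```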
uj5u.com user reply:
If you have:
text = '''0    cheese    100
1    cheddar cheese    1100
2    gorgonzola    1300
3    smoked cheese    200'''
# OR
your_list = [
    '0    cheese    100',
    '1    cheddar cheese    1100',
    '2    gorgonzola    1300',
    '3    smoked cheese    200'
]
text = '\n'.join(your_list)
then:
from io import StringIO
import pandas as pd
# the regex separator (2+ whitespace characters) requires engine='python'
df = pd.read_csv(StringIO(text), sep=r'\s\s+', names=['col1', 'col2'], engine='python')
print(df)
Output:
             col1  col2
0          cheese   100
1  cheddar cheese  1100
2      gorgonzola  1300
3   smoked cheese   200
- This treats the first number as the index, but you can reset it if needed with
df=df.reset_index(drop=True).
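Alternatively, naming all three fields and keeping only two with usecols avoids the implicit index altogether. A sketch, where the column names idx, type, and value are placeholders chosen here:

```python
from io import StringIO

import pandas as pd

text = '''0    cheese    100
1    cheddar cheese    1100
2    gorgonzola    1300
3    smoked cheese    200'''

# Name all three whitespace-separated fields, then keep only the two that matter.
# The regex separator (2+ whitespace characters) needs the python engine.
df = pd.read_csv(StringIO(text), sep=r'\s{2,}', engine='python',
                 names=['idx', 'type', 'value'], usecols=['type', 'value'])
print(df)
```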
uj5u.com user reply:
You can achieve this by using slicing:
inList = ['0    cheese    100', '1    cheddar cheese    1100', '2    gorgonzola    1300', '3    smoked cheese    200']
cheese = []
prices = []
for i in inList:
    temp = i.split(None, 1)[1][::-1] #Cuts out the first number and the spaces after it (for any padding width) and reverses the string
    counter = 0
    counter2 = 0
    for char in temp: #Temp is reversed, meaning the number e.g. '100' for 'cheese' is in front but reversed
        if char.isdigit():
            counter += 1
        else: #If the character is an empty space, we know the number is over
            prices.append((temp[:counter])[::-1]) #We know where the number begins (at position 0) and ends (at position counter), we flip it and store it in prices
            cheeseWithSpace = (temp[counter:]) #Since we cut out the number, the rest has to be the cheese name with some more spaces in front
            for char in cheeseWithSpace:
                if char == ' ': #We count how many spaces are in front
                    counter2 += 1
                else: #If we reach something other than an empty space, we know the cheese name begins.
                    cheese.append(cheeseWithSpace[counter2:][::-1]) #We know where the cheese name begins (at position counter2) cut everything else out, flip it and store it
                    break
            break
print(prices)
print(cheese)
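See the comments in the code for the approach. Basically, you flip the strings with [::-1] to make them easier to work with, then cut off each part one at a time.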
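The reversing is not strictly necessary: str.split and str.rsplit with maxsplit=1 peel off the first and last fields from either end directly. A sketch of that alternative (not the answer's method), assuming each line holds an index, a name, and a number separated by whitespace:

```python
in_list = ['0    cheese    100', '1    cheddar cheese    1100',
           '2    gorgonzola    1300', '3    smoked cheese    200']

cheese = []
prices = []
for line in in_list:
    # Drop the leading number: split once on the first run of whitespace.
    rest = line.split(maxsplit=1)[1]
    # Take the trailing number: split once on the last run of whitespace.
    name, price = rest.rsplit(maxsplit=1)
    cheese.append(name)
    prices.append(price)

print(prices)  # ['100', '1100', '1300', '200']
print(cheese)  # ['cheese', 'cheddar cheese', 'gorgonzola', 'smoked cheese']
```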
Please credit the source when reposting. Link to this article: https://www.uj5u.com/caozuo/525825.html
