I have a DataFrame:
import pandas as pd
d = {'domain': ['linkedin.com','aumniversal.tumblr.com','plasticdrea.ms','linkedin.com','s-lw.tumblr.com','newsonline.media','creshendo.co.vu','deadly-skz-gods-cb.tumblr.com','deo.progr.am']}
df = pd.DataFrame(d)
df
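I want to extract the word before the last one (for example, the word before .com, although my data contains more than just .com). So the result would be: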
   domain                         words
0  linkedin.com                   linkedin
1  aumniversal.tumblr.com         tumblr
2  plasticdrea.ms                 plasticdrea
3  linkedin.com                   linkedin
4  s-lw.tumblr.com                tumblr
5  newsonline.media               newsonline
6  creshendo.co.vu                co
7  deadly-skz-gods-cb.tumblr.com  tumblr
8  deo.progr.am                   progr
Reply from a uj5u.com user:
Use str.extract:
df['words'] = df['domain'].str.extract(r'([^.]+)\.[^.]*$')
Output:
   domain                         words
0  linkedin.com                   linkedin
1  aumniversal.tumblr.com         tumblr
2  plasticdrea.ms                 plasticdrea
3  linkedin.com                   linkedin
4  s-lw.tumblr.com                tumblr
5  newsonline.media               newsonline
6  creshendo.co.vu                co
7  deadly-skz-gods-cb.tumblr.com  tumblr
8  deo.progr.am                   progr
Regex demo:
([^.]+)   # capture word (one or more non-dot characters)
\.[^.]*   # followed by .xxx
$         # and end of line
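To see what the pattern captures outside pandas, here is a minimal sketch using the standard re module on a few of the sample domains. The helper name word_before_suffix is hypothetical, and returning None for a string without any dot is an assumption about the desired behaviour, not something stated in the question (in pandas, str.extract yields NaN for rows that do not match).

```python
import re

# Same pattern as in the answer: capture the label just before the final ".xxx".
PATTERN = re.compile(r'([^.]+)\.[^.]*$')

def word_before_suffix(domain):
    """Return the label before the last dot-separated part, or None if nothing matches."""
    match = PATTERN.search(domain)
    return match.group(1) if match else None

# 'localhost' is an extra, made-up input used to show the no-dot case.
samples = ['linkedin.com', 'aumniversal.tumblr.com', 'creshendo.co.vu', 'localhost']
for name in samples:
    print(name, '->', word_before_suffix(name))
```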
Reply from a uj5u.com user:
Use Series.str.split and select the second-to-last value by indexing:
df['words'] = df['domain'].str.split('.').str[-2]
print(df)
   domain                         words
0  linkedin.com                   linkedin
1  aumniversal.tumblr.com         tumblr
2  plasticdrea.ms                 plasticdrea
3  linkedin.com                   linkedin
4  s-lw.tumblr.com                tumblr
5  newsonline.media               newsonline
6  creshendo.co.vu                co
7  deadly-skz-gods-cb.tumblr.com  tumblr
8  deo.progr.am                   progr
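A value without any dot has no second-to-last part, which the split-and-index approach does not make explicit. The sketch below shows the same idea in plain Python with str.rsplit, which splits from the right and stops after two splits. The function name second_to_last_label and the choice to return None for dot-free input are assumptions made for illustration.

```python
def second_to_last_label(domain):
    """Return the second-to-last dot-separated label, or None if the input has no dot."""
    # rsplit with maxsplit=2 only peels off the last two labels,
    # so anything further to the left stays in one piece.
    parts = domain.rsplit('.', 2)
    return parts[-2] if len(parts) >= 2 else None

# 'localhost' is an extra, made-up input used to show the no-dot case.
for name in ['s-lw.tumblr.com', 'deo.progr.am', 'newsonline.media', 'localhost']:
    print(name, '->', second_to_last_label(name))
```

Applied to the DataFrame from the question, this could be written as df['domain'].map(second_to_last_label).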
Please credit the source when reposting. Link to this article: https://www.uj5u.com/caozuo/521241.html
Tags: Python, pandas, DataFrame
