我從抓取的資料中date使用逗號分隔和 day in獲得了 兩次date column。我的目標是洗掉這一December 13, 2021Mon,部分,并希望為其創建一個單獨的/新列days,我還想洗掉最后一列,表示Volumn column.
腳本
import requests
import pandas as pd
isins=['LU0526609390:EUR','IE00BHBX0Z19:EUR']
dfs = []
for isin in isins:
html = requests.get(f'https://markets.ft.com/data/funds/tearsheet/historical?s={isin}').content
dfs.extend(pd.read_html(html))
df = pd.concat(dfs)
print(df)
預期產出
Day Date Open High Low Close
Monday Dec 13, 2021 77.77 77.77 77.77 77.77
Friday Dec 10, 2021 77.61 77.61 77.61 77.61
Thursday Dec 09, 2021 77.60 77.60 77.60 77.60
Wednesday Dec 08, 2021 77.47 77.47 77.47 77.47
Tuesday Dec 07, 2021 77.64 77.64 77.64 77.64
電流輸出
Date Open High Low Close Volume
Monday, December 13, 2021Mon, Dec 13, 2021 77.77 77.77 77.77 77.77 00.00
Friday, December 10, 2021Fri, Dec 10, 2021 77.61 77.61 77.61 77.61 ----
Thursday, December 09, 2021Thu, Dec 09, 2021 77.60 77.60 77.60 77.60 ----
Wednesday, December 08, 2021Wed, Dec 08, 2021 77.47 77.47 77.47 77.47 ----
Tuesday, December 07, 2021Tue, Dec 07, 2021 77.64 77.64 77.64 77.64 ----
先謝謝了!
uj5u.com熱心網友回復:
我在您的代碼中添加了必要的步驟:
import requests
import pandas as pd
isins=['LU0526609390:EUR','IE00BHBX0Z19:EUR']
dfs = []
for isin in isins:
html = requests.get(f'https://markets.ft.com/data/funds/tearsheet/historical?s={isin}').content
dfs.extend(pd.read_html(html))
df = pd.concat(dfs)
# get the Day column
df.insert(0, 'Day', df['Date'].apply(lambda d: d[:d.find(',')]))
# reformat Date to the desired format
df['Date'] = df['Date'].apply(lambda d: d[-12:])
# remove the Volume column
df.pop('Volume')
print(df)
在這三個操作之后,df看起來像這樣:
Day Date Open High Low Close
0 Monday Dec 13, 2021 77.77 77.77 77.77 77.77
1 Friday Dec 10, 2021 77.61 77.61 77.61 77.61
2 Thursday Dec 09, 2021 77.60 77.60 77.60 77.60
3 Wednesday Dec 08, 2021 77.47 77.47 77.47 77.47
4 Tuesday Dec 07, 2021 77.64 77.64 77.64 77.64
5 Monday Dec 06, 2021 77.70 77.70 77.70 77.70
6 Friday Dec 03, 2021 77.72 77.72 77.72 77.72
...
uj5u.com熱心網友回復:
我會在這里使用正則運算式進行拆分。然后你可以組合它們并在之后以你喜歡的方式決議:
import requests
import pandas as pd
isins=['LU0526609390:EUR','IE00BHBX0Z19:EUR']
dfs = []
for isin in isins:
html = requests.get(f'https://markets.ft.com/data/funds/tearsheet/historical?s={isin}').content
dfs.extend(pd.read_html(html))
df = pd.concat(dfs)
print(df)
df[['Date_alpha', 'Date_beta']] = df['Date'].str.split(r'(\d{4})(\w{1,3})',expand=True)[[0,1]]
df['Date'] = df['Date_alpha'] df['Date_beta']
df = df.drop(['Date_alpha', 'Date_beta'], axis=1)
輸出:
print(df)
Date Open High Low Close Volume
0 Monday, December 13, 2021 77.77 77.77 77.77 77.77 ----
1 Friday, December 10, 2021 77.61 77.61 77.61 77.61 ----
2 Thursday, December 09, 2021 77.60 77.60 77.60 77.60 ----
3 Wednesday, December 08, 2021 77.47 77.47 77.47 77.47 ----
4 Tuesday, December 07, 2021 77.64 77.64 77.64 77.64 ----
5 Monday, December 06, 2021 77.70 77.70 77.70 77.70 ----
6 Friday, December 03, 2021 77.72 77.72 77.72 77.72 ----
7 Thursday, December 02, 2021 77.56 77.56 77.56 77.56 ----
8 Wednesday, December 01, 2021 77.51 77.51 77.51 77.51 ----
9 Tuesday, November 30, 2021 77.52 77.52 77.52 77.52 ----
10 Monday, November 29, 2021 77.37 77.37 77.37 77.37 ----
11 Friday, November 26, 2021 77.44 77.44 77.44 77.44 ----
12 Thursday, November 25, 2021 77.11 77.11 77.11 77.11 ----
13 Wednesday, November 24, 2021 77.10 77.10 77.10 77.10 ----
14 Tuesday, November 23, 2021 77.02 77.02 77.02 77.02 ----
15 Monday, November 22, 2021 77.32 77.32 77.32 77.32 ----
16 Friday, November 19, 2021 77.52 77.52 77.52 77.52 ----
17 Thursday, November 18, 2021 77.38 77.38 77.38 77.38 ----
18 Wednesday, November 17, 2021 77.26 77.26 77.26 77.26 ----
19 Tuesday, November 16, 2021 77.24 77.24 77.24 77.24 ----
20 Monday, November 15, 2021 77.30 77.30 77.30 77.30 ----
0 Monday, December 13, 2021 11.09 11.09 11.09 11.09 ----
1 Friday, December 10, 2021 11.08 11.08 11.08 11.08 ----
2 Thursday, December 09, 2021 11.08 11.08 11.08 11.08 ----
3 Wednesday, December 08, 2021 11.06 11.06 11.06 11.06 ----
4 Tuesday, December 07, 2021 11.08 11.08 11.08 11.08 ----
5 Monday, December 06, 2021 11.09 11.09 11.09 11.09 ----
6 Friday, December 03, 2021 11.08 11.08 11.08 11.08 ----
7 Thursday, December 02, 2021 11.08 11.08 11.08 11.08 ----
8 Wednesday, December 01, 2021 11.05 11.05 11.05 11.05 ----
9 Tuesday, November 30, 2021 11.07 11.07 11.07 11.07 ----
10 Monday, November 29, 2021 11.07 11.07 11.07 11.07 ----
11 Friday, November 26, 2021 11.08 11.08 11.08 11.08 ----
12 Thursday, November 25, 2021 11.04 11.04 11.04 11.04 ----
13 Wednesday, November 24, 2021 11.03 11.03 11.03 11.03 ----
14 Tuesday, November 23, 2021 11.04 11.04 11.04 11.04 ----
15 Monday, November 22, 2021 11.07 11.07 11.07 11.07 ----
16 Friday, November 19, 2021 11.09 11.09 11.09 11.09 ----
17 Thursday, November 18, 2021 11.06 11.06 11.06 11.06 ----
18 Wednesday, November 17, 2021 11.05 11.05 11.05 11.05 ----
19 Tuesday, November 16, 2021 11.05 11.05 11.05 11.05 ----
20 Monday, November 15, 2021 11.05 11.05 11.05 11.05 ----
轉載請註明出處,本文鏈接:https://www.uj5u.com/shujuku/381165.html
上一篇:如何在網頁中獲取日期-https://www.cricbuzz.com/cricket-series/2810/indian-premier-league-2019/matchesinR
下一篇:scrapy沒有通過鏈接爬行
