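I want to learn Python and picked a small private football data project for it. I have the following problem: I want to pull the data for the last 4 seasons. So far this works with the code below. But now I want to filter out, for each league, the teams that were not in that league for all 4 seasons (relegated teams should disappear). I don't know how to do this, because it has to be applied to each league individually. So it has to iterate over every season of each league, not over all seasons of all leagues together.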
import pandas as pd
import numpy as np
# leagues for England. E0 is Premier League, E1 is Championship and so on...
leagues = ["E0", "E1", "E2", "E3", "EC"]
seasons = ["2223", "2122", "2021", "1920"]
baseUrl = "https://www.football-data.co.uk/mmz4281/"
urls = []
for league in leagues:
    for season in seasons:
        # e.g. https://www.football-data.co.uk/mmz4281/2223/E0.csv
        url = baseUrl + season + "/" + league + ".csv"
        urls.append(url)
# load the data.
column_names = ["Div", "HomeTeam", "AwayTeam", "FTHG", "FTAG", "FTR"]
dfs = [pd.read_csv(url, encoding='cp1252', usecols=column_names)
       for url in urls]
df = pd.concat(dfs, ignore_index=True)
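As an example: if a team is relegated from E0 to E1 in season 2021, it will not appear in E0 in season 2122. If that is the case, all rows in which that team appears should be deleted from E0 in all 4 seasons.
How can I implement this?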
A uj5u.com user replied:
Your code is almost ready. You only need to add a small for-loop that filters out the teams that have played in more than one division:
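print(df.shape)
# (8264, 6)
for team in df.HomeTeam.unique():
    # Divisions in which this team hosted at least one match.
    played_divs = df[df.HomeTeam == team].Div.unique()
    if len(played_divs) > 1:
        # Drop every match involving a team seen in more than one division.
        df = df[(df.HomeTeam != team) & (df.AwayTeam != team)]
print(df.shape)
# (2948, 6) (5316 rows were filtered for me)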
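The loop above removes teams that show up in more than one division. A team that enters or leaves the covered divisions altogether would stay in, because it is only ever seen in one division. If that matters, one possible alternative is to count the seasons each team played per division. The sketch below assumes every per-file frame was tagged with a `Season` column while loading (for example with `.assign(Season=season)`), which the code in the question does not do. The function name `keep_ever_present_teams` and the demo data are made up for illustration.

```python
import pandas as pd


def keep_ever_present_teams(df: pd.DataFrame) -> pd.DataFrame:
    """Keep matches where both teams played in that division in every season.

    Assumes a "Season" column exists (hypothetical; not in the original data).
    """
    n_seasons = df["Season"].nunique()
    # Number of distinct seasons in which each team hosted a match, per division.
    seasons_played = df.groupby(["Div", "HomeTeam"])["Season"].nunique()
    # (Div, team) pairs that are present in all seasons.
    ever_present = set(seasons_played[seasons_played == n_seasons].index)
    # A match survives only if both sides are ever-present in its division.
    mask = [
        (div, home) in ever_present and (div, away) in ever_present
        for div, home, away in zip(df["Div"], df["HomeTeam"], df["AwayTeam"])
    ]
    return df.loc[mask]


# Tiny made-up example: C is relegated from E0 after 2021, D is promoted into it.
demo = pd.DataFrame(
    [
        ("E0", "2021", "A", "B"), ("E0", "2021", "B", "C"), ("E0", "2021", "C", "A"),
        ("E0", "2122", "A", "B"), ("E0", "2122", "B", "D"), ("E0", "2122", "D", "A"),
        ("E1", "2021", "D", "E"), ("E1", "2021", "E", "F"), ("E1", "2021", "F", "D"),
        ("E1", "2122", "C", "E"), ("E1", "2122", "E", "F"), ("E1", "2122", "F", "C"),
    ],
    columns=["Div", "Season", "HomeTeam", "AwayTeam"],
)
filtered = keep_ever_present_teams(demo)
print(filtered)
```

In the demo only the matches between A and B (in E0) and between E and F (in E1) remain, since C and D each miss one season in both divisions.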
Tags: Python, pandas, numpy
