目錄
- 專案來源
- 專案簡介
- 匯入模塊
- 將country表和league表連接起來
- 按隊名排序中的前十名
- 輸出spain主客隊比賽的資訊
- 統計各個國家的各個聯賽的各個賽季中stage大于10的球隊主客隊平均得分,主客隊平均分之和與差,以及總和
- 列出幾支球隊的各賽季平均得分趨勢圖
- 列出幾支球隊的各賽季平均主客隊分差趨勢圖
- 求各賽季各球隊的積分(贏球得3分,平局得1分,輸球不得分)
- 列出幾支球隊的各賽季積分趨勢圖
- 結束語
今日份分享!
專案來源
https://www.kaggle.com/hugomathien/soccer
專案簡介
利用SQL和pandas對11支球隊在7個賽季中的25979場比賽資料,分析各球隊在每個賽季的主客場得分情況以及聯賽積分情況,
PS: 專案本身提供的是sqlite檔案,為了更貼近現實中的作業場景,我將其中的表都輸出成csv檔案,再利用Navicat for MySQL 匯入到MySQL當中,(文末附資源鏈接)
本次專案同樣在jupyter上運行,
匯入模塊
import pymysql
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei'] # 設定漢字字體,優先使用黑體
plt.rcParams['font.size'] = 12 # 設定字體大小
plt.rcParams['axes.unicode_minus'] = False # 設定正常顯示負號
資料庫中的表格:
conn = pymysql.connect(
host = 'localhost',
user = 'root',
password = '',
db = 'data',
port = 3306
)
df = pd.read_sql("show tables",conn)
df

國家名單:
countries = pd.read_sql("""SELECT *
FROM Country;""", conn)
countries

將country表和league表連接起來
leagues = pd.read_sql("""SELECT *
FROM League
JOIN Country ON Country.id = League.country_id;""", conn)
leagues

按隊名排序中的前十名
teams = pd.read_sql("""SELECT *
FROM Team
ORDER BY team_long_name
LIMIT 10;""", conn)
teams

輸出spain主客隊比賽的資訊
detailed_matches = pd.read_sql("""SELECT Match.id,
Country.name AS country_name,
League.name AS league_name,
season,
stage,
date,
HT.team_long_name AS home_team,
AT.team_long_name AS away_team,
home_team_goal,
away_team_goal
FROM `match`
JOIN Country on Country.id = Match.country_id
JOIN League on League.id = Match.league_id
LEFT JOIN Team AS HT on HT.team_api_id = Match.home_team_api_id
LEFT JOIN Team AS AT on AT.team_api_id = Match.away_team_api_id
WHERE country.name = 'Spain'
ORDER by date
LIMIT 10;""", conn)
detailed_matches

統計各個國家的各個聯賽的各個賽季中stage大于10的球隊主客隊平均得分,主客隊平均分之和與差,以及總和
leages_by_season = pd.read_sql("""SELECT Country.name AS country_name,
League.name AS league_name,
season,
count(distinct stage) AS number_of_stages,
count(distinct HT.team_long_name) AS number_of_teams,
avg(home_team_goal) AS avg_home_team_scors,
avg(away_team_goal) AS avg_away_team_goals,
avg(home_team_goal-away_team_goal) AS avg_goal_dif,
avg(home_team_goal+away_team_goal) AS avg_goals,
sum(home_team_goal+away_team_goal) AS total_goals
FROM `match`
JOIN Country on Country.id = Match.country_id
JOIN League on League.id = Match.league_id
LEFT JOIN Team AS HT on HT.team_api_id = Match.home_team_api_id
LEFT JOIN Team AS AT on AT.team_api_id = Match.away_team_api_id
WHERE country.name in ('Spain', 'Germany', 'France', 'Italy', 'England')
GROUP BY Country.name, League.name, season
HAVING count(distinct stage) > 10
ORDER BY Country.name, League.name, season DESC
;""", conn)
leages_by_season.head(10)

列出幾支球隊的各賽季平均得分趨勢圖
df = pd.DataFrame(index=np.sort(leages_by_season['season'].unique()), columns=leages_by_season['country_name'].unique())
df.loc[:,'Germany'] = list(leages_by_season.loc[leages_by_season['country_name']=='Germany','avg_goals'])
df.loc[:,'Spain'] = list(leages_by_season.loc[leages_by_season['country_name']=='Spain','avg_goals'])
df.loc[:,'France'] = list(leages_by_season.loc[leages_by_season['country_name']=='France','avg_goals'])
df.loc[:,'Italy'] = list(leages_by_season.loc[leages_by_season['country_name']=='Italy','avg_goals'])
df.loc[:,'England'] = list(leages_by_season.loc[leages_by_season['country_name']=='England','avg_goals'])
df.plot(figsize=(12,5),title='各賽季平均得分趨勢圖')

列出幾支球隊的各賽季平均主客隊分差趨勢圖
df = pd.DataFrame(index=np.sort(leages_by_season['season'].unique()), columns=leages_by_season['country_name'].unique())
df.loc[:,'Germany'] = list(leages_by_season.loc[leages_by_season['country_name']=='Germany','avg_goal_dif'])
df.loc[:,'Spain'] = list(leages_by_season.loc[leages_by_season['country_name']=='Spain','avg_goal_dif'])
df.loc[:,'France'] = list(leages_by_season.loc[leages_by_season['country_name']=='France','avg_goal_dif'])
df.loc[:,'Italy'] = list(leages_by_season.loc[leages_by_season['country_name']=='Italy','avg_goal_dif'])
df.loc[:,'England'] = list(leages_by_season.loc[leages_by_season['country_name']=='England','avg_goal_dif'])
df.plot(figsize=(12,5),title='各賽季平均主客隊分差趨勢圖')

求各賽季各球隊的積分(贏球得3分,平局得1分,輸球不得分)
team_season_score = pd.read_sql("""SELECT
m.season,
t.team_long_name,
t.team_short_name,
t.team_api_id,
COUNT(*) AS 'matches_played',
SUM(
CASE
WHEN (t.team_api_id=m.home_team_api_id AND m.home_team_goal>m.away_team_goal) OR
(t.team_api_id=m.away_team_api_id AND m.home_team_goal<m.away_team_goal) THEN 3
WHEN (t.team_api_id=m.home_team_api_id AND m.home_team_goal=m.away_team_goal) OR
(t.team_api_id=m.away_team_api_id AND m.home_team_goal=m.away_team_goal) THEN 1
WHEN (t.team_api_id=m.home_team_api_id AND m.home_team_goal<m.away_team_goal) OR
(t.team_api_id=m.away_team_api_id AND m.home_team_goal>m.away_team_goal) THEN 0
END
) AS 'points',
SUM(IF(t.team_api_id=m.home_team_api_id, m.home_team_goal, m.away_team_goal)) AS 'goal_for',
SUM(IF(t.team_api_id=m.home_team_api_id, m.away_team_goal, m.home_team_goal)) AS 'goal_against',
SUM(IF(t.team_api_id=m.home_team_api_id, m.home_team_goal, m.away_team_goal))-SUM(IF(t.team_api_id=m.home_team_api_id, m.away_team_goal, m.home_team_goal)) AS 'goal_diff'
FROM Team t JOIN `match` m ON t.team_api_id=m.home_team_api_id OR t.team_api_id=m.away_team_api_id
WHERE t.team_api_id in (10260, 8634, 8650, 9904, 8636)
GROUP BY team_api_id, season
ORDER BY season, points DESC, goal_diff DESC, team_long_name;""", conn)
team_season_score.head(10)

列出幾支球隊的各賽季積分趨勢圖
df = pd.DataFrame(index=np.sort(team_season_score['season'].unique()), columns=team_season_score['team_short_name'].unique())
df.loc[:,'MUN'] = list(team_season_score.loc[team_season_score['team_short_name']=='MUN','points'])
df.loc[:,'BAR'] = list(team_season_score.loc[team_season_score['team_short_name']=='BAR','points'])
df.loc[:,'LIV'] = list(team_season_score.loc[team_season_score['team_short_name']=='LIV','points'])
df.loc[:,'HAN'] = list(team_season_score.loc[team_season_score['team_short_name']=='HAN','points'])
df.loc[:,'INT'] = list(team_season_score.loc[team_season_score['team_short_name']=='INT','points'])
df.plot(figsize=(12,5),title='各賽季積分趨勢圖')

結束語
此次專案的資料提供了不少資料,就連球員的EA SPORT游戲中的資料都有,不過這個專案我主要也是拿來練手,并沒有刻意去將全部的資料都拿來分析,大概感興趣的話也可以自己嘗試一下,
我把最初kaggle提供的sqlite檔案,后面我自己輸出的csv檔案和sql檔案以及本次完整代碼都放到了網盤上,需要的朋友自行下載:
鏈接:https://pan.baidu.com/s/1CbrdX0q18W9v-PgNyZfRPg
提取碼:1024
推薦關注的專欄
👨?👩?👦?👦 機器學習:分享機器學習實戰專案和常用模型講解
👨?👩?👦?👦 資料分析:分享資料分析實戰專案和常用技能整理
CSDN@報告,今天也有好好學習
轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/305209.html
標籤:python
上一篇:圖表可視化配色學習——自動提取圖片主色調與配色方案(python、pillow、haishoku和seaborn),并生成十六進制顏色編碼
