Python Machine Learning Practice (3): Supervised Learning, Part 2 (Linear Models: Classification) - 有解無憂

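Python machine learning study notes and practice
Environment: Windows 10 + Anaconda, Python 3.8
This post summarizes how to use various supervised learning algorithms in practice.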

1. Binary classification

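Linear models are also widely used for classification. Let us start with binary classification, where the prediction is made with the following formula:
ŷ = w[0] * x[0] + w[1] * x[1] + … + w[p] * x[p] + b > 0
In other words, instead of returning the weighted sum of the features directly, we threshold it at zero. If the value is less than 0, we predict class -1. If it is greater than 0, we predict class +1. This prediction rule is the same for all linear models used for classification.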
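The following minimal sketch makes the rule concrete. It assumes scikit-learn and NumPy are installed, computes the weighted sum by hand, and thresholds it at 0.

One detail the formula hides is that scikit-learn keeps the original labels (here 0 and 1) instead of -1/+1. A negative score therefore maps to clf.classes_[0] and a positive score to clf.classes_[1].

The toy data is the make_blobs call that make_forge starts from. It is used here only so the snippet does not depend on mglearn.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Two-class toy data (assumption: the make_blobs call used inside make_forge)
X, y = make_blobs(centers=2, random_state=4, n_samples=30)
clf = LogisticRegression().fit(X, y)

# Weighted sum w[0]*x[0] + w[1]*x[1] + b for every sample
scores = X @ clf.coef_[0] + clf.intercept_[0]

# Threshold at 0: negative -> classes_[0], positive -> classes_[1]
manual_pred = np.where(scores > 0, clf.classes_[1], clf.classes_[0])

print(np.allclose(scores, clf.decision_function(X)))  # True
print((manual_pred == clf.predict(X)).all())           # True
```

decision_function returns exactly this weighted sum, and predict is just its sign mapped onto classes_.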

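The two most common linear classification algorithms are logistic regression and the linear support vector machine (linear SVM). Logistic regression is implemented in scikit-learn as LogisticRegression in sklearn.linear_model; despite its name, it is a classification algorithm. The linear SVM is implemented as LinearSVC in sklearn.svm. Below we apply LogisticRegression and LinearSVC to the forge dataset and visualize the decision boundaries the linear models find: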

from sklearn.linear_model import LogisticRegression 
from sklearn.svm import LinearSVC 
import mglearn
import matplotlib.pyplot as plt
# generate the forge dataset (mglearn helper)
X, y = mglearn.datasets.make_forge() 
fig, axes = plt.subplots(1, 2, figsize=(10, 3)) 
# fit each model and draw its decision boundary in its own subplot
for model, ax in zip([LinearSVC(), LogisticRegression()], axes): 
    clf = model.fit(X, y) 
    mglearn.plots.plot_2d_separator(clf, X, fill=False, eps=0.5, ax=ax, alpha=.7) 
    mglearn.discrete_scatter(X[:, 0], X[:, 1], y, ax=ax) 
    ax.set_title("{}".format(clf.__class__.__name__)) 
    ax.set_xlabel("Feature 0") 
    ax.set_ylabel("Feature 1") 
axes[0].legend()
plt.show()

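Result:
[Figure: decision boundaries found by LinearSVC (left) and LogisticRegression (right) on the forge dataset; axes Feature 0 and Feature 1]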

(1) The make_forge() function (from mglearn's datasets module) is defined as follows:

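import numpy as np
from sklearn.datasets import make_blobs

def make_forge():
    # a carefully hand-designed dataset lol
    X, y = make_blobs(centers=2, random_state=4, n_samples=30)
    # force samples 7 and 27 into class 0
    y[np.array([7, 27])] = 0
    # drop four samples; plain bool works on every NumPy version
    # (the np.bool alias was removed in NumPy 1.24)
    mask = np.ones(len(X), dtype=bool)
    mask[np.array([0, 1, 5, 26])] = 0
    X, y = X[mask], y[mask]
    return X, y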

(2) A brief explanation of plt.subplots:

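fig, axes = plt.subplots(1, 2, figsize=(10, 3)) creates a figure with one row and two columns, i.e. two subplots. figsize is the figure's (width, height) in inches.
The function returns:
Returns:
fig: Figure
axes: axes.Axes or array of Axes
ax can be either a single Axes object or an array of Axes objects if more than one subplot was created.
Here two subplots are created in a single row, so axes is a 1-D NumPy array of two Axes objects. axes[0] is the left subplot and axes[1] is the right one.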
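The next snippet is a quick check of what plt.subplots returns. It also shows how zip pairs items with the subplots, as in the plotting loop above.

This is a small sketch under two assumptions. The "Agg" backend is chosen so the code runs without a display. The titles "left" and "right" are placeholders only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: nothing is shown on screen
import matplotlib.pyplot as plt

# 1 row x 2 columns -> axes is a 1-D NumPy array holding two Axes objects
fig, axes = plt.subplots(1, 2, figsize=(10, 3))
print(type(axes).__name__, axes.shape)  # ndarray (2,)
print(fig.get_size_inches())             # [10.  3.]

# zip pairs each title with one subplot, like zip([LinearSVC(), ...], axes) above
for title, ax in zip(["left", "right"], axes):
    ax.set_title(title)
print([ax.get_title() for ax in axes])   # ['left', 'right']

# With a single subplot, a lone Axes object is returned instead of an array
fig1, single_ax = plt.subplots()
print(type(single_ax).__name__)
```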

(3) Basic usage of zip():

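zip() takes iterables as arguments and packs their corresponding elements into tuples. In Python 3 it returns an iterator over these tuples (in Python 2 it returned a list), so wrap it in list() to see the contents.

If the iterables have different lengths, the result is as long as the shortest one. With the * operator, zip(*...) unpacks the tuples back into the original sequences (as tuples).
The following example shows how zip is used: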

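>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = [4, 5, 6, 7, 8]
>>> zipped = list(zip(a, b))   # pack into a list of tuples
>>> zipped
[(1, 4), (2, 5), (3, 6)]
>>> list(zip(a, c))            # length matches the shortest list
[(1, 4), (2, 5), (3, 6)]
>>> list(zip(*zipped))         # unzip with *
[(1, 2, 3), (4, 5, 6)]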

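(4) The two models produce similar decision boundaries. Note that both models misclassify two points. By default, both models use L2 regularization, just as Ridge does for regression. The trade-off parameter that determines the strength of the regularization is called C, and a larger C means weaker regularization. The results above use the default C=1. Below are the results for C=100 and C=0.01: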
C=100:
[Figure: decision boundary with C=100]
C=0.01:

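Comparing the results: the larger C is, the weaker the regularization constraint, and the harder the line tries to classify every training point correctly. The weights, and with them the slope of the line, can then become large, which makes overfitting more likely. The smaller C is, the more stable and conservative the line, and the less prone it is to overfitting.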
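Here is a minimal sketch of the same effect in numbers, assuming scikit-learn and NumPy are installed. It proceeds in three steps:

- It rebuilds the forge data with the make_forge steps shown above, without mglearn.
- It fits LogisticRegression for C = 0.01, 1 and 100.
- It prints the length of the weight vector w together with the training accuracy.

LinearSVC is expected to move in the same direction, but only LogisticRegression is checked here. max_iter=10000 is just a generous setting to avoid convergence warnings.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Rebuild the forge data without mglearn (same steps as make_forge above)
X, y = make_blobs(centers=2, random_state=4, n_samples=30)
y[np.array([7, 27])] = 0
mask = np.ones(len(X), dtype=bool)
mask[np.array([0, 1, 5, 26])] = False
X, y = X[mask], y[mask]

norms = []
for C in [0.01, 1, 100]:
    clf = LogisticRegression(C=C, max_iter=10000).fit(X, y)
    norm = np.linalg.norm(clf.coef_)  # length of the weight vector w
    norms.append(norm)
    print(f"C={C:<6} ||w||={norm:.3f}  train accuracy={clf.score(X, y):.2f}")
```

In scikit-learn, C multiplies the data-fitting term rather than the L2 penalty. Raising C therefore lets w grow, and the length of the optimal w does not shrink as C increases. This is the weaker regularization described above.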

2. Multiclass classification

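A common way to extend a binary classification algorithm to multiclass classification is the one-vs.-rest approach. Put simply, each class is separated from all the other classes by its own line, i.e. one binary model per class. At prediction time, every binary classifier is evaluated on the point. The classifier with the highest score for its own class "wins", and that class label is returned as the prediction.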
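The one-vs.-rest idea can be sketched by hand. This is an illustration, not how LinearSVC is implemented internally. The sketch works as follows:

- It trains one LogisticRegression per class on the labels "this class" (1) vs. "all other classes" (0).
- It stacks their decision_function scores.
- It picks the class whose classifier scores highest.

The result is compared with LogisticRegression wrapped in OneVsRestClassifier. The wrapper is needed because, in recent scikit-learn versions, LogisticRegression on its own fits a multinomial model for multiclass data rather than one-vs.-rest. LinearSVC, by contrast, uses one-vs.-rest by default. The variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = make_blobs(random_state=42)  # three Gaussian blobs, two features
classes = np.unique(y)

# One binary classifier per class: "this class" (1) vs. "all other classes" (0)
binary_models = [LogisticRegression().fit(X, (y == k).astype(int)) for k in classes]

# Score of every sample under every binary classifier, shape (n_samples, n_classes)
scores = np.column_stack([m.decision_function(X) for m in binary_models])

# The classifier with the highest score "wins"
manual_pred = classes[np.argmax(scores, axis=1)]

# Compare with scikit-learn's generic one-vs.-rest wrapper
ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)
print((manual_pred == ovr.predict(X)).all())  # True
```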
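We apply the one-vs.-rest approach to a simple three-class dataset: a two-dimensional dataset in which the data for each class is sampled from a Gaussian distribution.
The code is as follows: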

from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC 
import mglearn
import matplotlib.pyplot as plt
import numpy as np
# generate the dataset and take a look at it
X,y = make_blobs(random_state=42)
mglearn.discrete_scatter(X[:, 0], X[:, 1], y)
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")
plt.legend(["Class 0", "Class 1", "Class 2"])
plt.show()
# train the model (LinearSVC uses one-vs.-rest for multiclass data)
linear_svm = LinearSVC().fit(X, y)
print("Coefficient shape: ", linear_svm.coef_.shape)
print("Intercept shape: ", linear_svm.intercept_.shape)
# plot the data together with the line learned for each class
mglearn.discrete_scatter(X[:, 0], X[:, 1], y)
line = np.linspace(-15, 15)
for coef, intercept, color in zip(linear_svm.coef_, linear_svm.intercept_,['b', 'r', 'g']):
    plt.plot(line, -(line * coef[0] + intercept) / coef[1], c=color)
plt.ylim(-10, 15)
plt.xlim(-10, 8)
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")
plt.legend(['Class 0', 'Class 1', 'Class 2', 'Line class 0', 'Line class 1','Line class 2'], loc=(1.01, 0.3))
plt.show()

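Output:
The generated dataset:
[Figure: scatter plot of the three-class dataset (Class 0, Class 1, Class 2), Feature 0 vs. Feature 1]
Shapes of the coefficient and intercept arrays of the trained model:

Visualization: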

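(1) Both the shapes of the coefficient and intercept arrays and the plot make it clear that three lines were indeed produced. Each row of coef_ holds the two feature weights of one class's line, and intercept_ holds one intercept per class.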
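The three lines can also be read straight off the trained model. The following small sketch assumes the same make_blobs(random_state=42) data as above. Row k of coef_ and entry k of intercept_ give the score w_k · x + b_k for class k. predict returns the class with the highest of the three scores, which is the one-vs.-rest decision rule described earlier.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

X, y = make_blobs(random_state=42)
linear_svm = LinearSVC().fit(X, y)

# One row of weights and one intercept per class (one line per class)
print(linear_svm.coef_.shape, linear_svm.intercept_.shape)  # (3, 2) (3,)

# Score of each point under each of the three lines: w_k . x + b_k
scores = X @ linear_svm.coef_.T + linear_svm.intercept_

# The class whose line gives the highest score is the prediction
pred = linear_svm.classes_[np.argmax(scores, axis=1)]
print((pred == linear_svm.predict(X)).all())  # True
```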

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/237240.html
