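After studying this topic, I found that most of the code available online was either not concise or failed at runtime, and it also did not express things the way I wanted. So I rewrote my own version, building on existing code from the web, and it runs correctly.
The code and dataset have been uploaded to GitHub: https://github.com/zhurui-king/aaa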
# -*- coding:utf-8 -*-
# Author: 非魚子焉
# Creation_time: 2020.11.11
# Content: KNN algorithm implemented on the watermelon dataset
# Blog: https://zhu-rui.blog.csdn.net/
# GitHub: https://github.com/zhurui-king/aaa
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# KNN classifier
class KNN(object):
    def __init__(self, x, y, K):  # x: feature vectors (density, sugar content); y: labels; K: number of neighbours
        self.x = x
        self.y = y
        self.K = K
        self.n = len(x)

    # Euclidean distance between two points
    def distance(self, p1, p2):
        return np.linalg.norm(np.array(p1) - np.array(p2))

    # Find the K nearest neighbours of x
    def knn(self, x):
        distance = []
        for i in range(self.n):
            dist = self.distance(x, self.x[i])
            distance.append([self.x[i], self.y[i], dist])
        distance.sort(key=lambda item: item[2])
        neighbors = []
        neighbors_labels = []
        for k in range(self.K):
            neighbors.append(distance[k][0])  # neighbour's feature values
            neighbors_labels.append(distance[k][1])  # neighbour's label
        return neighbors, neighbors_labels

    # Majority vote among the neighbours
    def vote(self, x):
        neighbors, neighbors_labels = self.knn(x)
        votes = {}  # label -> number of votes
        for label in neighbors_labels:
            votes[label] = votes.get(label, 0) + 1
        sort_vote = sorted(votes.items(), key=lambda item: item[1], reverse=True)
        return sort_vote[0][0]  # label with the most votes

    # Predict a label for every training sample
    def fit(self):
        labels = []
        for sample in self.x:
            label = self.vote(sample)
            labels.append(label)
        return labels  # predicted labels of all samples

    # Fraction of training samples whose predicted label matches the real one
    def accuracy(self):
        predict_labels = self.fit()
        real_labels = self.y
        correct = 0
        for predict, real in zip(predict_labels, real_labels):
            if int(predict) == int(real):
                correct += 1
        return correct / self.n
# Read the data
def getdata(path):
    dataSet = pd.read_csv(path, delimiter=",")
    X = dataSet[['density', 'sugar_rate']].values
    Y = dataSet['label'].values  # plain NumPy array, like X
    return X, Y
# Draw the scatter plot
def drawpictures(x_positive, y_positive, x_negative, y_negative):
    plt.scatter(x_positive, y_positive, marker='o', color='red', label='1')
    plt.scatter(x_negative, y_negative, marker='o', color='blue', label='0')
    plt.xlabel('density')
    plt.ylabel('sugar content')
    plt.legend(loc='upper left')
    plt.show()
# Run KNN for each K and plot the predictions
def train(X, Y):
    for k in range(1, 9):
        print("***** Round %d *****" % k)
        print('K used in this round: {}'.format(k))
        knn = KNN(X, Y, k)
        predict = knn.fit()
        print('Accuracy in this round: {}'.format(knn.accuracy()))
        x_positive = []
        y_positive = []
        x_negative = []
        y_negative = []
        for i in range(len(X)):
            if int(predict[i]) == 1:
                x_positive.append(X[i][0])
                y_positive.append(X[i][1])
            else:
                x_negative.append(X[i][0])
                y_negative.append(X[i][1])
        drawpictures(x_positive, y_positive, x_negative, y_negative)
if __name__ == '__main__':
    X, Y = getdata('watermelon3_0a.csv')
    train(X, Y)
    print("************ Program finished ************")
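The script prints the accuracy for each chosen number of neighbours K and visualizes the predicted labels as a scatter plot. The loop runs over every K from 1 up to the configured upper bound minus 1 (here range(1, 9), i.e. K = 1 to 8).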
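One caveat about the accuracy reported above: it is measured on the same samples that serve as neighbours, so each sample usually finds itself at distance 0, and K = 1 then scores 1.0 almost by construction. The sketch below is not part of the original script. It shows one possible alternative, leave-one-out evaluation, with the distance computation vectorized in NumPy. The function names knn_predict and leave_one_out_accuracy and the toy data are made up for illustration.

```python
import numpy as np


def knn_predict(train_x, train_y, query, k):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every training point at once
    dists = np.linalg.norm(train_x - query, axis=1)
    # Indices of the k smallest distances (stable sort keeps input order on ties)
    nearest = np.argsort(dists, kind="stable")[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    # On a tied vote, argmax picks the first entry, i.e. the smallest label
    return labels[np.argmax(counts)]


def leave_one_out_accuracy(x, y, k):
    """Predict each sample from all the other samples and return the hit rate."""
    indices = np.arange(len(x))
    correct = 0
    for i in indices:
        mask = indices != i  # hide sample i from its own neighbour search
        if knn_predict(x[mask], y[mask], x[i], k) == y[i]:
            correct += 1
    return correct / len(x)


# Made-up toy data (NOT the watermelon dataset): two well-separated clusters
toy_x = np.array([[0.1, 0.1], [0.2, 0.1], [0.1, 0.2],
                  [0.9, 0.9], [0.8, 0.9], [0.9, 0.8]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

for k in (1, 3, 5):
    print("k =", k, "leave-one-out accuracy =", leave_one_out_accuracy(toy_x, toy_y, k))
```

On this toy set the accuracy drops to 0.0 at k = 5, because once a sample is held out its own class is outnumbered 2 to 3 among the remaining five points. To try the same idea on the real data, the X and Y arrays returned by getdata could be passed in, provided Y is a NumPy array.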
Reference blog post: https://blog.csdn.net/weixin_42152526/article/details/93528560
Reference book: Machine Learning (Zhou Zhihua), Tsinghua University Press
