I'm confused about the error part of polyfit. I have the following code:
def polyfit(df,columns, degree):
    coef=[]
    error=[]
    x = np.array(list(range(0,df.shape[0])))
    for skill in columns:
        y=df[skill]
        y=pd.to_numeric(y)
        coeffs = numpy.polyfit(x, y, degree)
        # Polynomial Coefficients
        coef.append(coeffs.tolist()[0])
        # r-squared
        p = numpy.poly1d(coeffs)
        # fit values, and mean
        yhat = p(x) # or [p(z) for z in x]
        ybar = numpy.sum(y)/len(y) # or sum(y)/len(y)
        ssreg = numpy.sum((yhat-ybar)**2) # or sum([ (yihat - ybar)**2 for yihat in yhat])
        sstot = numpy.sum((y - yhat)**2) # or sum([ (yi - ybar)**2 for yi in y])
        error.append(ssreg / sstot)
    results = pd.DataFrame({'skills':columns, 'coef': coef, 'error':error, 'error2':sstot})
    return results
where a sample of df is:
new_list  administrative coordination  administrative law  administrative support
0                           0.0465116           0.0232558               0.0581395
1                           0.0714286                   0               0.0285714
2                           0.0210526                   0               0.0421053
3                           0.0288462          0.00961538               0.0961538
4                           0.0714286           0.0238095                0.107143
5                          0.00952381                   0               0.0666667
6                           0.0285714          0.00952381               0.0666667
7                           0.0428571                   0               0.0428571
8                            0.111111           0.0277778                0.138889
9                                   0           0.0136986               0.0273973
The result is as follows:
polyfit(df,['administrative coordination', 'administrative law', 'administrative support'], 1)
                        skills       coef     error    error2
0  administrative coordination  -0.000573  0.002681  0.011538
1           administrative law   0.000511  0.020165  0.011538
2       administrative support   0.002245  0.036025  0.011538
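But why is error2 the same for every column? What mistake did I make when computing the error? I want to find the column with the smallest error. By error I mean the minimum distance from the fitted line to the data points.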
Answer from a uj5u.com user:
Your variable sstot is a scalar that gets overwritten on every iteration of the loop. This means that when you run the following line:
results = pd.DataFrame({'skills':columns, 'coef': coef, 'error':error, 'error2':sstot})
the error2 column is set to the scalar value that sstot had on the last iteration of the for loop, which is why error2 has the same value in every row.
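The broadcasting behaviour described above can be reproduced in isolation. This is a minimal sketch with made-up values (names, last_value and frame are hypothetical, not taken from the question): when one dictionary value passed to pd.DataFrame is a scalar and the others are lists, pandas repeats the scalar in every row.

```python
import pandas as pd

# Hypothetical values for illustration only.
names = ["a", "b", "c"]
last_value = 0.5  # stands in for a scalar left over from the final loop iteration

# The scalar is broadcast to match the length of the list-valued column.
frame = pd.DataFrame({"name": names, "value": last_value})
print(frame)
```

Every row of the value column holds 0.5, which mirrors what happens to error2 in the question.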
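I'm guessing you meant to keep track of sstot for each skill, so you can create a list named error2, append sstot to it on each iteration, and then set the column error2 equal to this list (like the lists you created for coef and error). For example: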
import numpy as np
import pandas as pd

def polyfit(df, columns, degree):
    coef = []
    error = []
    error2 = []
    # x is the row position 0, 1, ..., n-1
    x = np.arange(df.shape[0])
    for skill in columns:
        y = pd.to_numeric(df[skill])
        coeffs = np.polyfit(x, y, degree)
        # Leading (highest-degree) coefficient; the slope when degree is 1
        coef.append(coeffs.tolist()[0])
        p = np.poly1d(coeffs)
        # Fitted values and the mean of the data
        yhat = p(x)
        ybar = np.sum(y) / len(y)
        # Sum of squares of the fitted values about the mean
        ssreg = np.sum((yhat - ybar) ** 2)
        # Sum of squared residuals between data and fit (original name sstot kept)
        sstot = np.sum((y - yhat) ** 2)
        error.append(ssreg / sstot)
        # Store this column's value so it is not overwritten by the next iteration
        error2.append(sstot)
    results = pd.DataFrame({'skills': columns, 'coef': coef, 'error': error, 'error2': error2})
    return results
Result using your sample df:
>>> polyfit(df,['administrative coordination', 'administrative law', 'administrative support'], 1)
                        skills       coef     error    error2
0  administrative coordination  -0.000573  0.002681  0.010100
1           administrative law   0.000511  0.020165  0.001069
2       administrative support   0.002245  0.036025  0.011538
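Since the question also asks which column has the smallest error, one possible follow-up is to take the row where error2 is lowest. This is a sketch only: it rebuilds a results frame by hand from the numbers printed above, and best is a name introduced here for illustration.

```python
import pandas as pd

# Values copied from the result table above.
results = pd.DataFrame({
    "skills": ["administrative coordination", "administrative law", "administrative support"],
    "coef": [-0.000573, 0.000511, 0.002245],
    "error": [0.002681, 0.020165, 0.036025],
    "error2": [0.010100, 0.001069, 0.011538],
})

# idxmin returns the index label of the smallest error2; loc fetches that row.
best = results.loc[results["error2"].idxmin()]
print(best["skills"])  # administrative law
```

Here error2 is the sum of squared differences between the data and the fitted line, so a smaller value means the line lies closer to the points for that column.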
