How do I calculate the error between a polyfit line and the data?

I'm confused about the error part of polyfit. I have the following code:

import numpy as np
import pandas as pd

def polyfit(df,columns, degree):
    coef=[]
    error=[]
    x = np.array(list(range(0,df.shape[0])))
    for skill in columns:
        y=df[skill]
        y=pd.to_numeric(y)
        coeffs = np.polyfit(x, y, degree)

        # Polynomial Coefficients
        coef.append(coeffs.tolist()[0])

        # r-squared
        p = np.poly1d(coeffs)
        # fit values, and mean
        yhat = p(x)                      # or [p(z) for z in x]
        ybar = np.sum(y)/len(y)          # or sum(y)/len(y)
        ssreg = np.sum((yhat-ybar)**2)   # or sum([ (yihat - ybar)**2 for yihat in yhat])
        sstot = np.sum((y - yhat)**2)    # or sum([ (yi - yihat)**2 for yi, yihat in zip(y, yhat)])
        error.append(ssreg / sstot)

    results = pd.DataFrame({'skills':columns, 'coef': coef, 'error':error, 'error2':sstot})
    return results

where a sample of df is:

new_list  administrative coordination  administrative law  administrative support
0         0.0465116                    0.0232558           0.0581395
1         0.0714286                    0                   0.0285714
2         0.0210526                    0                   0.0421053
3         0.0288462                    0.00961538          0.0961538
4         0.0714286                    0.0238095           0.107143
5         0.00952381                   0                   0.0666667
6         0.0285714                    0.00952381          0.0666667
7         0.0428571                    0                   0.0428571
8         0.111111                     0.0277778           0.138889
9         0                            0.0136986           0.0273973

The result is:

polyfit(df,['administrative coordination',  'administrative law',   'administrative support'], 1)

                        skills      coef     error    error2
0  administrative coordination -0.000573  0.002681  0.011538
1           administrative law  0.000511  0.020165  0.011538
2       administrative support  0.002245  0.036025  0.011538

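But why is error2 the same for every column? What mistake am I making when computing the error? I want to find the column with the smallest error, where by error I mean the minimum distance from the fitted line to the data points.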

Answer:

Your variable sstot is a scalar that is overwritten on every iteration of the loop. This means that when you run the following line:

results = pd.DataFrame({'skills':columns, 'coef': coef, 'error':error, 'error2':sstot})

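the error2 column is filled with the single scalar value that sstot holds after the last iteration of the for loop, which is why every row of error2 has the same value (0.011538, the value for the last column, administrative support).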
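The broadcasting behaviour described above can be reproduced in isolation. This is a minimal sketch with made-up names and values (columns, per_column and last_value are illustrative and not taken from the question); it only shows that pandas repeats a scalar down a column while a list supplies one value per row.

```python
import pandas as pd

# Illustrative stand-ins, not the question's data
columns = ["a", "b", "c"]
per_column = [0.1, 0.2, 0.3]      # one value collected per loop iteration
last_value = per_column[-1]       # what a scalar holds after the loop ends

# A scalar in the dict is repeated for every row
broadcast = pd.DataFrame({"skills": columns, "error2": last_value})

# A list of the same length as the other columns gives one value per row
per_row = pd.DataFrame({"skills": columns, "error2": per_column})

print(broadcast)
print(per_row)
```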

I'm guessing you meant to keep track of sstot for each skill, so you can create a list called error2 and then set the column error2 equal to this list (like the lists you created for coef and error). For example:

import numpy as np
import pandas as pd

def polyfit(df,columns, degree):
    coef=[]
    error=[]
    error2=[]
    x = np.array(list(range(0,df.shape[0])))
    for skill in columns:
        y=df[skill]
        y=pd.to_numeric(y)
        coeffs = np.polyfit(x, y, degree)

        # Leading polynomial coefficient (the slope when degree is 1)
        coef.append(coeffs.tolist()[0])

        p = np.poly1d(coeffs)

        # fit values, and mean
        yhat = p(x)                      # or [p(z) for z in x]
        ybar = np.sum(y)/len(y)          # or sum(y)/len(y)
        ssreg = np.sum((yhat-ybar)**2)   # or sum([ (yihat - ybar)**2 for yihat in yhat])
        # Note: despite its name, sstot as written is the sum of squared residuals
        sstot = np.sum((y - yhat)**2)    # or sum([ (yi - yihat)**2 for yi, yihat in zip(y, yhat)])
        error.append(ssreg / sstot)
        error2.append(sstot)             # keep one value per column

    results = pd.DataFrame({'skills':columns, 'coef': coef, 'error':error, 'error2':error2})
    return results

Result using your sample df:

>>> polyfit(df,['administrative coordination',  'administrative law',   'administrative support'], 1)
                        skills      coef     error    error2
0  administrative coordination -0.000573  0.002681  0.010100
1           administrative law  0.000511  0.020165  0.001069
2       administrative support  0.002245  0.036025  0.011538
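Since the stated goal is to find the column with the smallest error, one possible shortcut is to let np.polyfit report the sum of squared residuals itself. The sketch below is not part of the original answer: rank_by_residual is a hypothetical helper name, and it assumes every column is numeric and that there are more rows than polynomial coefficients (otherwise the residuals array returned with full=True can be empty). It relies on np.polyfit accepting a 2-D y so that all columns are fitted in one call.

```python
import numpy as np
import pandas as pd

def rank_by_residual(df, columns, degree=1):
    """Fit each column against the row position and sort by squared residuals.

    Hypothetical helper for illustration, not the original author's code.
    """
    x = np.arange(len(df))
    # 2-D array of shape (n_rows, n_columns); polyfit fits each column separately
    y = df[columns].apply(pd.to_numeric).to_numpy(dtype=float)
    # With full=True, polyfit also returns the sum of squared residuals per column
    coeffs, residuals, rank, singular_values, rcond = np.polyfit(x, y, degree, full=True)
    results = pd.DataFrame({
        "skills": columns,
        "coef": coeffs[0],       # leading coefficient (slope when degree is 1)
        "error2": residuals,     # same quantity as np.sum((y - yhat)**2)
    })
    return results.sort_values("error2").reset_index(drop=True)

# The sample data from the question
df = pd.DataFrame({
    "administrative coordination": [0.0465116, 0.0714286, 0.0210526, 0.0288462,
                                    0.0714286, 0.00952381, 0.0285714, 0.0428571,
                                    0.111111, 0],
    "administrative law": [0.0232558, 0, 0, 0.00961538, 0.0238095,
                           0, 0.00952381, 0, 0.0277778, 0.0136986],
    "administrative support": [0.0581395, 0.0285714, 0.0421053, 0.0961538,
                               0.107143, 0.0666667, 0.0666667, 0.0428571,
                               0.138889, 0.0273973],
})

ranked = rank_by_residual(df, list(df.columns))
print(ranked)
print("Smallest error:", ranked.loc[0, "skills"])
```

The first row of the sorted frame is the column whose fitted line has the smallest sum of squared residuals, which agrees with the table above, where administrative law has the lowest error2.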
