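Adding a list to an existing CSV

Thanks for taking a look at this:

I am trying to add a list of computed values (means) as a new column to an existing CSV.

Here is my MWE: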

import csv
import re
import pandas as pd
import oseti
import numpy as np

# handle csv data
df = pd.read_csv('filepath/text.csv')
analyzer = oseti.Analyzer()
dtype_before = type(df["text"])
text_list = df["text"].tolist()

# create df for sentiment analysis
list_sa = (np.mean(list(map(analyzer.analyze,text_list))).tolist())
df_sa = pd.DataFrame (list_sa, columns = ['sa_mean'])
print (df_sa)

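This part works (although I get a warning:

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.

) and prints the values correctly (I am new to this, so I wanted to make sure it looks the way I want). The printed result looks something like this: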

    sa_mean
0   0.000000
1   0.000000
2   0.000000
3  -0.018519
4   0.037037

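However, if instead of printing I try to add it as a new column to the originally loaded CSV ('filepath/text.csv'), I am not sure how to go about it (is it necessary to make it a DataFrame or a Series?).

I tried this (in place of the last print line):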

df["new_column"] = df_sa
df.to_csv("text.csv", index=False)

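However, I get an error. The CSV is still created, but I would like to understand whether something is wrong:

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.

I am not sure why this happens or how to fix it.

Thanks in advance!

Edit:

print(list_sa) looks like this: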

[0.0, 0.0, 0.0, -0.018518518518518517, 0.037037037037037035, 0.037037037037037035, 0.0, 0.0, 0.0, 0.0, 0.0, -0.037037037037037035, 0.0, 0.037037037037037035, 0.0, 0.037037037037037035, 0.0, 0.0, 0.0, -0.037037037037037035, -0.012345679012345678, -0.037037037037037035, 0.0, 0.0, -0.037037037037037035, -0.037037037037037035, 0.0, 0.0, 0.0, -0.037037037037037035, -0.037037037037037035, 0.037037037037037035, 0.0, 0.0, 0.0, -0.037037037037037035, 0.0, 0.0, 0.0, -0.037037037037037035, -0.037037037037037035, 0.037037037037037035, 0.0, 0.0, -0.037037037037037035, -0.037037037037037035, 0.0, 0.037037037037037035, -0.037037037037037035, -0.037037037037037035, -0.037037037037037035, 0.037037037037037035, 0.037037037037037035, -0.037037037037037035, 0.037037037037037035, 0.037037037037037035, 0.0, 0.037037037037037035, -0.037037037037037035, 0.037037037037037035, 0.0, 0.0, -0.037037037037037035, 0.037037037037037035, 0.0, 0.037037037037037035, -0.037037037037037035, 0.0, 0.0, -0.037037037037037035, 0.0, 0.037037037037037035, 0.0, 0.0, -0.037037037037037035, -0.024691358024691357]

Reply from a uj5u.com user:

Use a list comprehension with np.mean and assign the result to a new column; df_sa is not necessary here:

df = pd.read_csv('filepath/text.csv')
analyzer = oseti.Analyzer()

df['new_column'] = [np.mean(analyzer.analyze(x)) for x in df['text']]

Or use a lambda function:

df['new_column'] = df['text'].apply(lambda x: np.mean(analyzer.analyze(x)))

df.to_csv("text.csv", index=False)
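For readers who do not have oseti installed, here is a minimal sketch of the same pattern. It assumes a hypothetical stand-in, fake_analyze, that returns one score per sentence (so the list lengths differ per row), and it uses in-memory buffers instead of real files. It also shows that a plain Python list is enough for the assignment; no intermediate DataFrame or Series is needed.

```python
import io

import numpy as np
import pandas as pd


def fake_analyze(text):
    # Hypothetical stand-in for analyzer.analyze: one score per sentence,
    # so the length of the returned list varies with the text.
    return [1.0 if "good" in s else -1.0 if "bad" in s else 0.0
            for s in text.split(".") if s.strip()]


# In-memory CSV standing in for 'filepath/text.csv'.
csv_in = io.StringIO("text\ngood. bad. good.\nbad.\nplain. good.\n")
df = pd.read_csv(csv_in)

# Take the mean of each row's scores separately; this never builds
# a ragged array, so the warning does not appear.
df["sa_mean"] = [np.mean(fake_analyze(t)) for t in df["text"]]

# Write the original column plus the new one back out.
csv_out = io.StringIO()
df.to_csv(csv_out, index=False)
print(csv_out.getvalue())
```

The list must have exactly one entry per row of df, otherwise pandas raises a length-mismatch error on assignment.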

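Reply from a uj5u.com user:

Can you tell which statement produces the warning? You may have to run the code line by line, or add prints between the statements (if you are running a script).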
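One alternative to running line by line, sketched here as a suggestion rather than something from the original thread, is to let the standard warnings module turn warnings into exceptions so that a full traceback is available. The function noisy_step below is a hypothetical stand-in for whatever call emits the warning.

```python
import traceback
import warnings


def noisy_step():
    # Hypothetical stand-in for a library call that emits a warning.
    warnings.warn("something here is deprecated", DeprecationWarning)
    return 1


def find_warning_source():
    # Inside this block every warning is raised as an exception,
    # which gives us a traceback pointing at the offending call.
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            noisy_step()
        except Warning as exc:
            return type(exc).__name__, traceback.format_exc()
    return None, ""


name, tb = find_warning_source()
print(name)
print(tb)
```

The same effect is available for a whole script by running it with python -W error, without changing the code.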

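I suspect it is

np.mean(list(map(analyzer.analyze, text_list)))

The warning means that you (or something your code calls) are trying to create an array from lists of different lengths. For example: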

In [245]: alist = [[1,2,3],[4,5],[6]]
In [246]: alist
Out[246]: [[1, 2, 3], [4, 5], [6]]
In [247]: np.array(alist)
<ipython-input-247-7512d762195a>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  np.array(alist)
Out[247]: array([list([1, 2, 3]), list([4, 5]), list([6])], dtype=object)

The result is a 1d array with object dtype. NumPy cannot create a 2d array from a list like this.
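If a 1d array of lists really is what you want, the warning text itself names the fix: pass dtype=object explicitly. The short sketch below does that; it should behave the same on newer NumPy releases, where (as far as I know, from 1.24 on) omitting dtype=object for ragged input raises a ValueError instead of a warning.

```python
import numpy as np

alist = [[1, 2, 3], [4, 5], [6]]

# Explicitly ask for an object array: each element is one of the
# original Python lists, and no warning is emitted.
arr = np.array(alist, dtype=object)

print(arr.shape)   # one entry per sublist
print(arr.dtype)   # object
print(arr[1])      # the second sublist, still a plain list
```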

Trying to take the mean of that list produces the same warning, because np.mean first has to create an array:

In [248]: np.mean(alist)
/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py:163: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  arr = asanyarray(a)
Out[248]: 
array([0.33333333, 0.66666667, 1.        , 1.33333333, 1.66666667,
       2.        ])

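A warning does not give a traceback the way an error does, but it does show the operation that raised it. The mean is also off: the list has been "flattened", but the divisor is 3!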
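The arithmetic behind that odd result can be reproduced in plain Python. This is only my reading of what happens with the object array, not NumPy's actual code: summing the three list objects with + concatenates them, and the result is then divided by the number of elements in the array, which is 3.

```python
alist = [[1, 2, 3], [4, 5], [6]]

# "Summing" list objects with + concatenates them.
concatenated = []
for sub in alist:
    concatenated = concatenated + sub

# The divisor is the number of sublists (3), not the number of values (6).
wrong_mean = [value / len(alist) for value in concatenated]

print(concatenated)
print(wrong_mean)
```

The six values match the array shown in Out[248], which is why the result is neither a per-sublist mean nor an overall mean.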

As jezrael suggested, we can get the means of the sublists with:

In [249]: [np.mean(x) for x in alist]
Out[249]: [2.0, 4.5, 6.0]


Tags: Python, pandas, numpy, CSV
