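I am trying to do sentiment analysis with word lists, counting the positive and negative words in a column of a PySpark DataFrame. With the same method I can successfully get the count of positive words, and that list has about 2k positive words. The negative list is roughly twice as long (about 4k words), and counting with it fails with the error below. What could be causing this, and how can I fix it?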
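I don't think it is the code, because the same code works for the positive words. I'm wondering whether the other list is simply too long, or whether I'm missing something. Below is an example (not the exact list):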
stories.show()
+--------------------+
|               words|
+--------------------+
|tom and jerry went t|
|she was angry when g|
|arnold became sad at|
+--------------------+
neg = ['angry','sad','sorrowful','angry']
#doing some counting manipulation here
df3.show()
Error:
spark-3.2.0-bin-hadoop3.2/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py in __call__(self, *args)
1308 answer = self.gateway_client.send_command(command)
1309 return_value = get_return_value(
-> 1310 answer, self.gateway_client, self.target_id, self.name)
1311
1312 for temp_arg in temp_args:
/content/spark-3.2.0-bin-hadoop3.2/python/pyspark/sql/utils.py in deco(*a, **kw)
115 # Hide where the exception came from that shows a non-Pythonic
116 # JVM exception message.
--> 117 raise converted from None
118 else:
119 raise
PythonException:
An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
File "<ipython-input-6-97710da0cedd>", line 17, in countNegatives
File "/usr/lib/python3.7/re.py", line 225, in findall
return _compile(pattern, flags).findall(string)
File "/usr/lib/python3.7/re.py", line 288, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/lib/python3.7/sre_compile.py", line 764, in compile
p = sre_parse.parse(p, flags)
File "/usr/lib/python3.7/sre_parse.py", line 932, in parse
p = _parse_sub(source, pattern, True, 0)
File "/usr/lib/python3.7/sre_parse.py", line 420, in _parse_sub
not nested and not items))
File "/usr/lib/python3.7/sre_parse.py", line 648, in _parse
source.tell() - here + len(this))
re.error: multiple repeat at position 5
Expected output:
+--------------------+--------+
|               words|Negative|
+--------------------+--------+
|tom and jerry went t|      45|
|she was angry when g|      12|
|arnold became sad at|      54|
+--------------------+--------+
Answer:
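Your neg list contains characters that have a special meaning in regular expressions, so the pattern built from it is not a valid regex. The "multiple repeat" error typically means a quantifier such as * directly follows another quantifier, e.g. a word containing "**". The length of the list is not the cause; the positive list simply happens not to contain such characters.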
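A minimal pure-Python sketch that reproduces the error without Spark, assuming the counting function joins the raw words with "|" (the actual countNegatives code is not shown in the question). The entry "d**n" is a hypothetical stand-in for a list word that contains regex metacharacters.

```python
import re

# Hypothetical word list; "d**n" contains two "*" quantifiers in a row.
neg = ["angry", "sad", "d**n"]

# Joining the raw words into one alternation, as a counting function might.
pattern = "|".join(neg)  # "angry|sad|d**n"

error_message = None
try:
    re.findall(pattern, "she was angry when she got sad")
except re.error as exc:
    # Same exception type as the re.error at the bottom of the traceback.
    error_message = str(exc)

# Prints a "multiple repeat at position ..." message.
print(error_message)
```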
Escape each word with re.escape() before joining the words into a pattern, so that every special character is matched literally.
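A sketch of the fix, assuming whole-word, case-insensitive counting is what the Negative column should hold; build_pattern and count_matches are hypothetical helper names, not the asker's code. The pattern is compiled once so that a ~4k-word alternation is not rebuilt for every row.

```python
import re


def build_pattern(words):
    """Compile one case-insensitive, whole-word pattern from a word list."""
    if not words:
        raise ValueError("word list must not be empty")
    # re.escape turns regex metacharacters into literals, so "d**n" matches
    # the characters d, *, *, n instead of being parsed as two repeats.
    alternation = "|".join(re.escape(w) for w in words)
    # (?<!\w) and (?!\w) keep "sad" from matching inside "sadness"; they are
    # used instead of \b because \b would fail after a word ending in "*".
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def count_matches(text, pattern):
    """Return how many times `pattern` matches in `text` (0 for null cells)."""
    if text is None:
        return 0
    return len(pattern.findall(text))


# Hypothetical list: "d**n" stands in for an entry with regex metacharacters.
neg = ["angry", "sad", "sorrowful", "angry", "d**n"]
neg_pattern = build_pattern(neg)  # compiled once, reused for every row

print(count_matches("she was angry, sad and angry again", neg_pattern))  # 3
print(count_matches("pure sadness", neg_pattern))  # 0
```

To fill the Negative column, one possible wiring (an untested assumption) is to wrap the helper with pyspark.sql.functions.udf, e.g. udf(lambda s: count_matches(s, neg_pattern), IntegerType()) with IntegerType from pyspark.sql.types, and apply it to the words column via stories.withColumn("Negative", ...).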
Source: https://www.uj5u.com/shujuku/441159.html
Tags: python apache-spark pyspark apache-spark-sql findall
