I am trying to install Spark (without Hadoop).
Java version: 1.8.0_202
Spark version: spark-3.3.1
Python version: 3.7.15
When I run spark-shell or pyspark, I get this error:
[spark@de ~]$ spark-shell
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
pyspark:
[spark@de ~]$ pyspark
Python 3.7.15 (default, Nov 1 2022, 23:18:36)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44.0.3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
Traceback (most recent call last):
File "/opt/spark-3.3.1/python/pyspark/shell.py", line 36, in <module>
SparkContext._ensure_initialized()
File "/opt/spark-3.3.1/python/pyspark/context.py", line 417, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/opt/spark-3.3.1/python/pyspark/java_gateway.py", line 106, in launch_gateway
raise RuntimeError("Java gateway process exited before sending its port number")
RuntimeError: Java gateway process exited before sending its port number
What is wrong? Do I need Hadoop? If so, why is there a download option named spark-3.3.1-bin-without-hadoop.tgz (without Hadoop)?
Answer:
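According to the Spark documentation:
Spark uses Hadoop's client libraries for HDFS and YARN. Downloads are pre-packaged for a handful of popular Hadoop versions. Users can also download a "Hadoop free" binary and run Spark with any Hadoop version by augmenting Spark's classpath.
You do not need to set up a Hadoop cluster, but you do need some Hadoop libraries. The NoClassDefFoundError for org/apache/hadoop/fs/FSDataInputStream in your output means those Hadoop classes are not on Spark's classpath.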
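One quick way to confirm this diagnosis is to check whether the Spark installation ships any Hadoop JARs at all. The following is a minimal sketch, not an official Spark tool: `find_hadoop_jars` is a hypothetical helper, and the `hadoop-*.jar` naming pattern is an assumption about how bundled Hadoop client libraries are typically named under the `jars` directory.

```python
from pathlib import Path


def find_hadoop_jars(spark_home):
    """Return the sorted file names of Hadoop JARs under <spark_home>/jars.

    Assumption: bundled Hadoop libraries are named hadoop-*.jar.
    An empty list suggests a "without Hadoop" build with no Hadoop
    classes on the default classpath.
    """
    jars_dir = Path(spark_home) / "jars"
    if not jars_dir.is_dir():
        # No jars directory at all: nothing to report.
        return []
    return sorted(p.name for p in jars_dir.glob("hadoop-*.jar"))
```

For the installation in the question, calling `find_hadoop_jars("/opt/spark-3.3.1")` and getting an empty list back would be consistent with the error shown above.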
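In your case, it looks like you have no Hadoop installation at all, so you should download one of the Spark packages that bundles Hadoop (for example, spark-3.3.1-bin-hadoop3.tgz) instead of the without-hadoop package.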
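If you do have a Hadoop distribution installed and want to keep the without-hadoop package, the classpath route mentioned in the documentation is to put the output of `hadoop classpath` into the `SPARK_DIST_CLASSPATH` environment variable (the Spark documentation does this in conf/spark-env.sh). The sketch below shows the same idea from Python, assuming the `hadoop` command is on your PATH; `hadoop_classpath` and `env_with_hadoop` are hypothetical helper names used only for illustration.

```python
import shutil
import subprocess


def hadoop_classpath(hadoop_cmd="hadoop"):
    """Return the output of `hadoop classpath`, or None if the CLI is not found."""
    exe = shutil.which(hadoop_cmd)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "classpath"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def env_with_hadoop(base_env, classpath):
    """Return a copy of base_env with SPARK_DIST_CLASSPATH set when a classpath is given."""
    env = dict(base_env)
    if classpath:
        env["SPARK_DIST_CLASSPATH"] = classpath
    return env
```

You could then launch the shell with something like `subprocess.run(["spark-shell"], env=env_with_hadoop(os.environ, hadoop_classpath()))` (this needs `import os`). If `hadoop_classpath()` returns None, there is no Hadoop CLI to borrow libraries from, and the package with Hadoop bundled is the simpler choice.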
Tags: apache-spark, pyspark
