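I get an error (java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities) when running a Spark application. I have looked through various other questions about this type of error, but I could not identify a solution. The Spark version is 3.1.2.
The PySpark script: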
import os
from pyspark.sql import SparkSession
from pyspark import SparkConf, SparkContext as sc
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:3.2.0 pyspark-shell'
spark = SparkSession.builder \
    .appName("s3reader") \
    .config('spark.sql.codegen.wholeStage', False) \
    .getOrCreate()
# S3A credentials and filesystem settings (values redacted)
spark._jsc.hadoopConfiguration().set("fs.s3a.access.key", "xxxxxxx")
spark._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "xxxxxx")
spark._jsc.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
spark._jsc.hadoopConfiguration().set("com.amazonaws.service.s3.enableV4", "true")
spark._jsc.hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider")
spark._jsc.hadoopConfiguration().set("https://x.x.x.x.x", "us-east-1")
# Read a text file from S3 through the s3a:// connector
df = spark.read.text("s3a://xxxx/xxx/test.txt")
print(df)
Here are my JAR versions:
cloud@spark-dev-master:/usr/local/spark/jars$ ls -ltr *aws*
-rw-rw-r-- 1 cloud cloud 126287 Aug 18 2016 hadoop-aws-2.7.3.jar
-rw-rw-r-- 1 cloud cloud 4479 Sep 17 02:36 aws-java-sdk-1.12.69.jar
Full error log:
cloud@spark-dev-master:~$ spark-submit --master spark://x.x.x:7077 sparks3test.py
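WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/local/spark/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/09/17 14:55:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/09/17 14:55:07 INFO SparkContext: Running Spark version 3.1.2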
21/09/17 14:55:07 INFO ResourceUtils: ==============================================================
21/09/17 14:55:07 INFO ResourceUtils: No custom resources configured for spark.driver.
21/09/17 14:55:07 INFO ResourceUtils: ==============================================================
21/09/17 14:55:07 INFO SparkContext: Submitted application: s3reader
21/09/17 14:55:07 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
21/09/17 14:55:07 INFO ResourceProfile: Limiting resource is cpu
21/09/17 14:55:07 INFO ResourceProfileManager: Added ResourceProfile id: 0
21/09/17 14:55:07 INFO SecurityManager: Changing view acls to: cloud
21/09/17 14:55:07 INFO SecurityManager: Changing modify acls to: cloud
21/09/17 14:55:07 INFO SecurityManager: Changing view acls groups to:
21/09/17 14:55:07 INFO SecurityManager: Changing modify acls groups to:
21/09/17 14:55:07 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloud); groups with view permissions: Set(); users with modify permissions: Set(cloud); groups with modify permissions: Set()
21/09/17 14:55:08 INFO Utils: Successfully started service 'sparkDriver' on port 42827.
21/09/17 14:55:08 INFO SparkEnv: Registering MapOutputTracker
21/09/17 14:55:08 INFO SparkEnv: Registering BlockManagerMaster
21/09/17 14:55:08 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/09/17 14:55:08 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/09/17 14:55:08 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
21/09/17 14:55:08 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-47a05523-9479-4917-8463-ccb6fd4a6df8
21/09/17 14:55:08 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
21/09/17 14:55:08 INFO SparkEnv: Registering OutputCommitCoordinator
21/09/17 14:55:08 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/09/17 14:55:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://master:4040
21/09/17 14:55:08 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://x.x.x.x.:7077...
21/09/17 14:55:08 INFO TransportClientFactory: Successfully created connection to /x.xx.x.x:7077 after 32 ms (0 ms spent in bootstraps)
21/09/17 14:55:09 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20210917145508-0001
21/09/17 14:55:09 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20210917145508-0001/0 on worker-20210722134039-144.8.108.2-37165 (x.x.x.x:37165) with 4 core(s)
21/09/17 14:55:09 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40855.
21/09/17 14:55:09 INFO NettyBlockTransferService: Server created on master:40855
21/09/17 14:55:09 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/09/17 14:55:09 INFO StandaloneSchedulerBackend: Granted executor ID app-20210917145508-0001/0 on hostPort x.x.x.x:37165 with 4 core(s), 1024.0 MiB RAM
21/09/17 14:55:09 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, master, 40855, None)
21/09/17 14:55:09 INFO BlockManagerMasterEndpoint: Registering block manager master:40855 with 434.4 MiB RAM, BlockManagerId(driver, master, 40855, None)
21/09/17 14:55:09 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, master, 40855, None)
21/09/17 14:55:09 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, master, 40855, None)
21/09/17 14:55:09 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20210917145508-0001/0 is now RUNNING
21/09/17 14:55:09 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Traceback (most recent call last):
File "/home/cloud/sparks3test.py", line 5, in <module>
spark = SparkSession.builder
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 233, in getOrCreate
File "/usr/local/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/usr/local/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o27.sessionState.
: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:576)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:398)
at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.nextProviderClass(ServiceLoader.java:1210)
at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1221)
at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1265)
at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1300)
at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1385)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2628)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2650)
at org.apache.hadoop.fs.FsUrlStreamHandlerFactory.<init>(FsUrlStreamHandlerFactory.java:62)
at org.apache.spark.sql.internal.SharedState$.liftedTree1$1(SharedState.scala:180)
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$setFsUrlStreamHandlerFactory(SharedState.scala:179)
at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:53)
at org.apache.spark.sql.SparkSession.$anonfun$sharedState$1(SparkSession.scala:138)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:138)
at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:137)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:335)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1145)
at org.apache.spark.sql.SparkSession.$anonfun$sessionState$2(SparkSession.scala:159)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:155)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:152)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.StreamCapabilities
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 44 more
21/09/17 14:55:09 INFO SparkUI: Stopped Spark web UI at http://master:4040
21/09/17 14:55:09 INFO StandaloneSchedulerBackend: Shutting down all executors
21/09/17 14:55:09 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
21/09/17 14:55:09 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/09/17 14:55:09 INFO MemoryStore: MemoryStore cleared
21/09/17 14:55:09 INFO BlockManager: BlockManager stopped
21/09/17 14:55:09 INFO BlockManagerMaster: BlockManagerMaster stopped
21/09/17 14:55:09 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/09/17 14:55:09 INFO SparkContext: Successfully stopped SparkContext
21/09/17 14:55:09 INFO ShutdownHookManager: Shutdown hook called
21/09/17 14:55:09 INFO ShutdownHookManager: Deleting directory /tmp/spark-a0c9a62b-2043-4dd3-891a-643a4343068d/pyspark-8fa17f27-3f76-42d6-a7fc-ba6388fca439
21/09/17 14:55:09 INFO ShutdownHookManager: Deleting directory /tmp/spark-a0c9a62b-2043-4dd3-891a-643a4343068d
21/09/17 14:55:09 INFO ShutdownHookManager: Deleting directory /tmp/spark-f637d2f7-6490-4cf3-b066-5ab90972d090
Answer from a uj5u.com user:
You need to use hadoop-aws version 3.2.0.
You can refer to my previous answer here.
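As a rough illustration of pinning that version, the sketch below builds the PYSPARK_SUBMIT_ARGS value from a single version string. The helper name build_submit_args is hypothetical and not part of PySpark. It also assumes that --packages resolves the AWS SDK as a transitive dependency of hadoop-aws, so no separate SDK coordinate is listed, and that 3.2.0 really is the Hadoop version your Spark build ships with.

```python
import os


def build_submit_args(hadoop_version):
    """Build a PYSPARK_SUBMIT_ARGS value that pins hadoop-aws to one version.

    Hypothetical helper for illustration; hadoop_version is assumed to be
    the Hadoop version that the Spark installation itself was built with.
    """
    package = f"org.apache.hadoop:hadoop-aws:{hadoop_version}"
    return f"--packages {package} pyspark-shell"


# Must be set before the SparkSession (and its JVM) is created.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args("3.2.0")
```

When the script is launched through spark-submit, as in the question, this environment variable is most likely not consulted, so the same coordinate would go on the command line instead (spark-submit --packages org.apache.hadoop:hadoop-aws:3.2.0 ...).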
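Answer from a uj5u.com user:
"I am getting an error (java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities)"
This is what you see when you mix hadoop-aws and hadoop-common JAR versions. They must match exactly, down to the point release (the same is true of the Spark JARs).
Do not try to work around this by any means other than syncing the JARs; you will only move the stack traces around.
See Hadoop's "Troubleshooting S3A" documentation.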
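One way to spot such a mismatch is to compare the version suffixes of the hadoop-*.jar files that Spark loads. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes the JARs follow the usual artifact-version.jar naming and that hadoop-common is present to serve as the reference version.

```python
import re

# Matches names such as hadoop-aws-2.7.3.jar or hadoop-common-3.2.0.jar.
HADOOP_JAR = re.compile(r"^(hadoop-[a-z0-9-]+?)-(\d+(?:\.\d+)+)\.jar$")


def hadoop_jar_versions(file_names):
    """Return {artifact: version} for every hadoop-*.jar name in file_names."""
    versions = {}
    for name in file_names:
        match = HADOOP_JAR.match(name)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def mismatched_jars(versions, reference="hadoop-common"):
    """List artifacts whose version differs from the reference artifact.

    Returns an empty list when the reference artifact is not present,
    because there is then nothing to compare against.
    """
    expected = versions.get(reference)
    if expected is None:
        return []
    return sorted(name for name, ver in versions.items() if ver != expected)
```

For the installation in the question, the input would be os.listdir("/usr/local/spark/jars"). Any artifact that mismatched_jars reports (hadoop-aws, for instance) should be replaced with the release that matches hadoop-common rather than patched around.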