我正在嘗試使用 s3a 從 S3 加載資料(據我所知,這是目前唯一的選擇)。我收到一個錯誤 (java.lang.NoClassDefFoundError: org/apache/hadoop/fs/statistics/IOStatisticsSource),我在網上找不到任何相關資訊。在配置事物以使用 s3 方面,我已經做了所有我能想到的事情,但是這個錯誤似乎非常罕見。
如果有人能指出我正確的方向,我將不勝感激。
這是堆疊跟蹤:
Traceback (most recent call last):
File "/home/hdoop/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 737, in csv
File "/home/hdoop/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/home/hdoop/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/home/hdoop/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o107.csv.
: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/statistics/IOStatisticsSource
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:576)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:398)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2532)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2497)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2593)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3269)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3301)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:46)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:377)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:795)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.statistics.IOStatisticsSource
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
uj5u.com熱心網友回復:
這在罐子版本和火花中接縫不匹配。您可以使用 aws-java-sdk-bundle 來擁有您可能需要的所有 jars 的相同版本。
這是鏈接https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-bundle
我正在使用 aws-java-sdk-bundle-1.11.874.jar 和 spark-3.1.2-bin-hadoop3.2 并且作業完美。
uj5u.com熱心網友回復:
事實證明,使用以下版本呼叫包引數對我有用:
--packages org.apache.hadoop:hadoop-aws:2.8.5,com.amazonaws:aws-java-sdk:1.11.659,org.apache.hadoop:hadoop-common:2.8.5
轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/332954.html
上一篇:在C#中使用EHLLAPI從IBM的PersonalCommunicationsiSeries獲取字串讓我得到字串后跟垃圾
下一篇:從查詢Doctrine回傳字串
