PySpark error: cannot resolve "timestamp"

I need to find the exact time at which most check-ins happen in the Yelp dataset, but for some reason I'm getting this error. Here is my code so far:

from pyspark.sql.functions import udf, explode
from pyspark.sql.types import ArrayType, StringType
from pyspark.sql import functions as F

# spark is an existing SparkSession
checkin = spark.read.json('yelp_academic_dataset_checkin.json.gz')
# split the comma-separated "date" field into an array, then one row per check-in
datesplit = udf(lambda x: x.split(','), ArrayType(StringType()))
dates = checkin.select('business_id', datesplit('date').alias('dates')).withColumn('checkin_date', explode('dates'))
dates = dates.select("checkin_date")
dates.withColumn("checkin_date", F.date_trunc('checkin_date',
                   F.to_timestamp("timestamp", "yyyy-MM-dd HH:mm:ss 'UTC'"))).show(truncate=0)

And the error:

Py4JJavaError: An error occurred while calling o1112.withColumn.
: org.apache.spark.sql.AnalysisException: cannot resolve '`timestamp`' given input columns: [checkin_date];;
'Project [date_trunc(checkin_date, to_timestamp('timestamp, Some(yyyy-MM-dd HH:mm:ss 'UTC')), Some(Etc/UTC)) AS checkin_date#190]
 - Project [checkin_date#176]
    - Project [business_id#6, dates#172, checkin_date#176]
       - Generate explode(dates#172), false, [checkin_date#176]
          - Project [business_id#6, <lambda>(date#7) AS dates#172]
             - Relation[business_id#6,date#7] json

dates is just a Spark DataFrame with a single column, "checkin_date", which contains only datetimes, so I'm not sure why this doesn't work.

Answer:

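The error simply means that in the following line you are trying to access a column named "timestamp", but no such column exists. As the "given input columns" part of the message shows, the only column in dates is checkin_date.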

dates.withColumn("checkin_date", F.date_trunc('checkin_date',
                   F.to_timestamp("timestamp", "yyyy-MM-dd HH:mm:ss 'UTC'")))

Here is the signature of the to_timestamp function:

pyspark.sql.functions.to_timestamp(col, format=None)
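The format argument has to describe each string exactly, including literal text: in yyyy-MM-dd HH:mm:ss 'UTC', the quoted 'UTC' means every value must end with a literal " UTC". As a rough plain-Python analogy (this is not Spark code), the same pattern corresponds to the strptime format %Y-%m-%d %H:%M:%S UTC. The helper parse_checkin and the sample value are hypothetical; the helper returns None on a mismatch, which loosely mirrors to_timestamp producing null for unparseable values (depending on the Spark version and ANSI settings, Spark may raise an error instead).

```python
from datetime import datetime
from typing import Optional

# Plain-Python analogue of Spark's pattern "yyyy-MM-dd HH:mm:ss 'UTC'":
# yyyy -> %Y, MM -> %m, dd -> %d, HH -> %H, mm -> %M, ss -> %S,
# and the quoted 'UTC' is literal text that must appear in the input.
WITH_UTC_SUFFIX = "%Y-%m-%d %H:%M:%S UTC"
WITHOUT_SUFFIX = "%Y-%m-%d %H:%M:%S"


def parse_checkin(value: str, fmt: str) -> Optional[datetime]:
    """Hypothetical helper: parse one check-in string, or return None on mismatch."""
    try:
        # strip() removes stray spaces, e.g. left over from splitting "a, b" on ","
        return datetime.strptime(value.strip(), fmt)
    except ValueError:
        return None


sample = "2016-04-26 19:49:16"  # made-up value without a " UTC" suffix
print(parse_checkin(sample, WITH_UTC_SUFFIX))  # None: the literal " UTC" is missing
print(parse_checkin(sample, WITHOUT_SUFFIX))   # 2016-04-26 19:49:16
```

If the check-in strings carry no " UTC" suffix, the Spark pattern should drop 'UTC' as well, i.e. "yyyy-MM-dd HH:mm:ss".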

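The first argument is the column and the second is the format, so "timestamp" is being looked up as a column name. I assume you are trying to parse the date and then truncate it. Note that date_trunc also expects its arguments in a specific order, date_trunc(format, timestamp), where format is the truncation unit such as 'year', 'month', 'day' or 'hour'; so 'checkin_date' does not belong in the first position either. Assuming you want to truncate the dates to the month level, the correct way is: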

dates.withColumn("checkin_date", F.date_trunc('month',
                   F.to_timestamp('checkin_date', "yyyy-MM-dd HH:mm:ss 'UTC'")))
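Since the original goal is to find when most check-ins happen, truncating to the month is only one possible grouping. The plain-Python sketch below mirrors the whole pipeline on a small made-up input so the logic can be checked without a cluster: split the comma-separated string, strip whitespace (if entries are separated by ", ", splitting on "," alone leaves a leading space that a strict pattern may reject), parse, truncate to the month the way date_trunc('month', ...) does, and count check-ins per hour of day. The function names, the sample rows and the assumed "%Y-%m-%d %H:%M:%S" format are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

FMT = "%Y-%m-%d %H:%M:%S"  # assumed format of each check-in (no " UTC" suffix)


def explode_dates(date_field: str) -> List[datetime]:
    """Split one comma-separated value and parse each entry (like split + explode + to_timestamp)."""
    return [datetime.strptime(part.strip(), FMT) for part in date_field.split(",") if part.strip()]


def trunc_month(ts: datetime) -> datetime:
    """Same result as date_trunc('month', ts): the first instant of that month."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def busiest_hours(date_fields: Iterable[str], top: int = 3) -> List[Tuple[int, int]]:
    """Count check-ins per hour of day across all rows; return the top (hour, count) pairs."""
    counts = Counter(ts.hour for field in date_fields for ts in explode_dates(field))
    return counts.most_common(top)


rows = [  # made-up sample rows, one string per business
    "2016-04-26 19:49:16, 2016-08-30 18:36:57, 2016-10-15 19:02:03",
    "2017-01-03 19:15:00, 2017-01-03 12:00:00",
]
print(trunc_month(explode_dates(rows[0])[0]))  # 2016-04-01 00:00:00
print(busiest_hours(rows))                     # [(19, 3), (18, 1), (12, 1)]
```

In Spark itself, the same counting could likely be done with built-in functions instead of a Python UDF: F.split and F.explode to get one row per check-in, F.trim before F.to_timestamp, then F.hour on the parsed column followed by groupBy(...).count() ordered by the count in descending order.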

Source: https://www.uj5u.com/net/347843.html
