How to create a count of nested JSON objects in a DataFrame row using Spark/Scala

I have a column full of JSON object strings that look like this:

"steps":{
    "step_1":{
        "conditions":{
            "complete_by":"2022-05-17",
            "requirement":100
        },
        "status":"eligible",
        "type":"buy"
    },
    "step_2":{
        "conditions":{
            "complete_by":"2022-05-27",
            "requirement":100
        },
        "status":"eligible",
        "type":"buy"
    }
}

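Within the steps object there can be any number of steps (within reason).

My question is: how would I create another DataFrame column that counts the number of steps in the JSON string of each row of that column?

I am using Spark/Scala, so I started creating a UDF with the following: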

def jsonCount (col):

val jsonCountUDF = udf(jsonCount)

val stepDF = stepData.withColumn("NumberOfSteps", jsonCountUDF(col("steps")))

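This is where I am stuck. I want to go through each row of the steps column and count the step objects inside the steps JSON string. Does anyone have experience with a similar task, or know of a function that simplifies it?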

A uj5u.com user replied:

# Note: this answer is written in PySpark (Python), not Scala.
import json

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

# Make some sample data (`spark` is assumed to be an existing SparkSession)
json_str = "{\"steps\":{ \"step_1\":{\"conditions\":{ \"complete_by\":\"2022-05-17\", \"requirement\":100} }  , \"step_2\":{  \"status\":\"eligible\", \"type\":\"buy\"   }  }}"

# Implement a function that returns the number of entries under "steps"
def jsonCount(jsonString):
    json_obj = json.loads(jsonString)
    return len(json_obj["steps"])

# Define the UDF
JSONCount = udf(jsonCount, IntegerType())

# Create a sample DataFrame
df = spark.createDataFrame([[json_str]], ["json"])

# Run the UDF on the DataFrame
df.select(df.json, JSONCount(df.json).alias("StepCount")).show()

+--------------------+---------+
|                json|StepCount|
+--------------------+---------+
|{"steps":{ "step_...|        2|
+--------------------+---------+
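
Since the question asks about Scala, the same UDF idea might look roughly like the sketch below. It is not part of the original answer. It assumes the Jackson library that ships with Spark is on the classpath, and the names `StepCountUdf`, `countSteps`, and the sample JSON are made up for illustration.

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical object name; everything here is an illustrative sketch.
object StepCountUdf {
  // Created lazily, once per JVM, so the mapper itself is never serialized.
  private lazy val mapper = new ObjectMapper()

  // Counts the fields of the top-level "steps" object.
  // Returns None (null in the DataFrame) for null or unparseable input,
  // and Some(0) when the JSON parses but has no "steps" object.
  def countSteps(json: String): Option[Int] =
    Option(json).flatMap { s =>
      scala.util.Try(mapper.readTree(s).path("steps").size()).toOption
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("step-count-udf")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data with two steps.
    val sample =
      """{"steps":{"step_1":{"status":"eligible"},"step_2":{"type":"buy"}}}"""
    val df = Seq(sample).toDF("json")

    val jsonCountUDF = udf(countSteps _)
    df.withColumn("NumberOfSteps", jsonCountUDF(col("json"))).show()

    spark.stop()
  }
}
```

Returning an `Option[Int]` lets a bad row become null instead of failing the whole job, which is usually the safer behaviour for a column of free-form JSON strings.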

A uj5u.com user replied:

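You can try selecting that sub-structure and then getting the number of columns.

  // Assumes "steps" is already a struct column (parsed JSON), not a raw string.
  val stepSize = df.select($"steps.*").columns.size

Then add it to your df:

  // lit(stepSize) puts the same schema-level count on every row.
  val df_steps = df.withColumn("NumberOfSteps", lit(stepSize))

Edit: don't use a UDF for this purpose ...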
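
The struct-based count above comes from the schema, so it yields one number for the whole DataFrame and needs `steps` to be a parsed struct. For a per-row count on a raw JSON string column, still without a UDF, one possible sketch uses the built-in functions `get_json_object`, `from_json`, and `size`. It assumes a Spark version whose `from_json` accepts a map schema, and the object name and sample rows are invented for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, get_json_object, size}
import org.apache.spark.sql.types.{MapType, StringType}

// Hypothetical object name; a sketch under the assumptions stated above.
object StepCountBuiltin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("step-count-builtin")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data: one row with two steps, one row with one step.
    val stepData = Seq(
      """{"steps":{"step_1":{"type":"buy"},"step_2":{"type":"buy"}}}""",
      """{"steps":{"step_1":{"type":"buy"}}}"""
    ).toDF("steps")

    // 1. Pull the "steps" object out of each row as a JSON string.
    // 2. Parse it as a map from step name to that step's raw JSON text.
    // 3. Count the map entries.
    val stepsMap = from_json(
      get_json_object(col("steps"), "$.steps"),
      MapType(StringType, StringType)
    )

    val stepDF = stepData.withColumn("NumberOfSteps", size(stepsMap))
    stepDF.show(false)

    spark.stop()
  }
}
```

Rows whose JSON is null or has no `steps` object produce a null map. Depending on the Spark configuration, `size` may then return -1 or null rather than 0, so it is worth checking that case against your own data.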

