I'm new to PySpark and can't figure out what is wrong with my code. I have 2 dataframes:
df1=
+---+--------------+
| id|No_of_Question|
+---+--------------+
|  1|            Q1|
|  2|            Q4|
|  3|           Q23|
|...|           ...|
+---+--------------+
df2 =
+---+---+---+---+---+-----+---+---+---+---+
| Q1| Q2| Q3| Q4| Q5| ... |Q22|Q23|Q24|Q25|
+---+---+---+---+---+-----+---+---+---+---+
|  1|  0|  1|  0|  0| ... |  1|  1|  1|  1|
+---+---+---+---+---+-----+---+---+---+---+
I want to create a new dataframe that contains only the columns of df2 whose names are listed in df1.No_of_Question.
Expected result:
df2 =
+---+---+---+
| Q1| Q4|Q24|
+---+---+---+
|  1|  0|  1|
+---+---+---+
I have already tried
df2 = df2.select(*F.collect_list(df1.No_of_Question)) #Error: Column is not iterable
or
df2 = df2.select(F.collect_list(df1.No_of_Question)) #Error: Resolved attribute(s) No_of_Question#1791 missing from Q1, Q2...
or
df2 = df2.select(*df1.No_of_Question)
or
df2 = df2.select([col for col in df2.columns if col in df1.No_of_Question])
but none of these worked. Could you please help me?
Answer from a uj5u.com user:
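You can collect the values of No_of_Question into a Python list and then pass it to df2.select(). The attempts above fail because F.collect_list(...) and df1.No_of_Question are Column expressions tied to df1, not Python lists of names, so they can neither be unpacked with * nor resolved against df2.
Try this: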
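from pyspark.sql import functions as F

# Collect the question names to the driver as plain Python values,
# then build one column expression per name.
questions = [
    F.col(r.No_of_Question).alias(r.No_of_Question)
    for r in df1.select("No_of_Question").collect()
]
df2 = df2.select(*questions)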
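If df1 might contain duplicate names, or names that are not columns of df2, the select above would return repeated columns or raise an error. Below is a minimal sketch of a more defensive variant. The helper names pick_existing and select_questions are hypothetical (not part of PySpark), and the sketch assumes df1 is small enough to collect to the driver and that unknown names should simply be skipped.

```python
def pick_existing(requested, available):
    """Return the requested names that exist in `available`,
    de-duplicated and in first-seen order."""
    available = set(available)
    seen = set()
    picked = []
    for name in requested:
        if name in available and name not in seen:
            seen.add(name)
            picked.append(name)
    return picked


def select_questions(df1, df2, name_col="No_of_Question"):
    """Select from df2 the columns named in df1[name_col].

    Hypothetical helper: df1 and df2 are expected to be PySpark DataFrames.
    collect() brings the names to the driver, so df1 should be small.
    """
    requested = [row[name_col] for row in df1.select(name_col).collect()]
    # select() accepts plain column-name strings, so F.col is not required.
    return df2.select(*pick_existing(requested, df2.columns))
```

With the dataframes from the question, this would be called as df2 = select_questions(df1, df2). Filtering against df2.columns happens in plain Python, so a missing name is dropped silently; raise an error there instead if a missing question should count as a bug.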
