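I have 2 DataFrames that share the same key, and I want to combine them into a single DataFrame that identifies which source each column came from. Is this possible?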
df1
+--------------------------------------+----------+
| ID                                   | CURRENCY |
+--------------------------------------+----------+
| 401148EE-9BA6-4BAA-B113-ED694B0F5BED | 100.00   |
| E90ED21E-C60F-412C-8305-DB5675DA7A5E | 1000.00  |
+--------------------------------------+----------+
df2
+--------------------------------------+----------+
| ID                                   | CURRENCY |
+--------------------------------------+----------+
| 401148EE-9BA6-4BAA-B113-ED694B0F5BED | 200.00   |
| E90ED21E-C60F-412C-8305-DB5675DA7A5E | 2000.00  |
+--------------------------------------+----------+
Result
+--------------------------------------+--------------+--------------+
| ID                                   | DF1.CURRENCY | DF2.CURRENCY |
+--------------------------------------+--------------+--------------+
| 401148EE-9BA6-4BAA-B113-ED694B0F5BED | 100.00       | 200.00       |
| E90ED21E-C60F-412C-8305-DB5675DA7A5E | 1000.00      | 2000.00      |
+--------------------------------------+--------------+--------------+
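A uj5u.com user replied:
Use a join for this. Rename the currency column in each DataFrame first so that its source stays identifiable after the join.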
Example:
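from pyspark.sql import SparkSession

# Reuse the active session (e.g. in the pyspark shell) or create one.
spark = SparkSession.builder.getOrCreate()

# Rename "currency" in each DataFrame so the source survives the join.
df1 = spark.createDataFrame(
    [('401148EE-9BA6-4BAA-B113-ED694B0F5BED', 100), ('E90ED21E-C60F-412C-8305-DB5675DA7A5E', 1000)],
    ['id', 'currency'],
).withColumnRenamed("currency", "df1.currency")
df2 = spark.createDataFrame(
    [('401148EE-9BA6-4BAA-B113-ED694B0F5BED', 200), ('E90ED21E-C60F-412C-8305-DB5675DA7A5E', 2000)],
    ['id', 'currency'],
).withColumnRenamed("currency", "df2.currency")

# Inner join on the shared key column.
df1.join(df2, ['id'], 'inner').show()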
#+--------------------+------------+------------+
#|                  id|df1.currency|df2.currency|
#+--------------------+------------+------------+
#|401148EE-9BA6-4BA...|         100|         200|
#|E90ED21E-C60F-412...|        1000|        2000|
#+--------------------+------------+------------+
A uj5u.com user replied:
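# Assumes an active SparkSession named `spark` (as in the pyspark shell).
df1 = spark.createDataFrame([('401148EE-9BA6-4BAA-B113-ED694B0F5BED', 100.00), ('E90ED21E-C60F-412C-8305-DB5675DA7A5E', 1000.00)], ['ID', 'CURRENCY'])
df2 = spark.createDataFrame([('401148EE-9BA6-4BAA-B113-ED694B0F5BED', 200.00), ('E90ED21E-C60F-412C-8305-DB5675DA7A5E', 2000.00)], ['ID', 'CURRENCY'])
# Rename CURRENCY on both sides, then do a full outer join on ID so that
# rows present in only one DataFrame are kept (with null on the other side).
(
    df1.withColumnRenamed("CURRENCY", "DF1.CURRENCY")
    .join(df2.withColumnRenamed("CURRENCY", "DF2.CURRENCY"), ['ID'], how='full')
    .show(truncate=False)
)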
+------------------------------------+------------+------------+
|ID                                  |DF1.CURRENCY|DF2.CURRENCY|
+------------------------------------+------------+------------+
|401148EE-9BA6-4BAA-B113-ED694B0F5BED|100.0       |200.0       |
|E90ED21E-C60F-412C-8305-DB5675DA7A5E|1000.0      |2000.0      |
+------------------------------------+------------+------------+
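The two answers differ in join type: the first uses 'inner' and the second uses 'full'. On the sample data both give the same rows, because every ID appears in both DataFrames. The difference only shows up when a key is missing on one side. The following plain-Python sketch (not Spark API; the function names `inner_join` and `full_join` and the keys "A", "B", "C" are hypothetical, made up for illustration) shows the semantics:

```python
def inner_join(left, right):
    """Keep only keys that are present in both inputs."""
    return {k: (left[k], right[k]) for k in left if k in right}


def full_join(left, right):
    """Keep every key from either input; a missing side becomes None."""
    keys = list(left) + [k for k in right if k not in left]
    return {k: (left.get(k), right.get(k)) for k in keys}


# Hypothetical data: "B" exists only on the left, "C" only on the right.
df1_rows = {"A": 100.0, "B": 1000.0}
df2_rows = {"A": 200.0, "C": 3000.0}

print(inner_join(df1_rows, df2_rows))
# {'A': (100.0, 200.0)}
print(full_join(df1_rows, df2_rows))
# {'A': (100.0, 200.0), 'B': (1000.0, None), 'C': (None, 3000.0)}
```

In Spark the missing side of a full outer join appears as null rather than None, so 'full' is the safer choice when the two DataFrames may not contain exactly the same IDs.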
