Is there a way to change a column based on the presence of two values among a set of values in a Databricks PySpark DataFrame?
Example:
df = (
[
('E1', 'A1',''),
('E2', 'A2',''),
('F1', 'A3',''),
('F2', 'B1',''),
('F3', 'B2',''),
('G1', 'B3',''),
('G2', 'C1',''),
('G3', 'C2',''),
('G4', 'C3',''),
('H1', 'C4',''),
('H2', 'D1',''),
],
['old_comp_id', 'db_id', 'comment']
)
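We check for the presence of the values E1 and C1 and, if both are present, mark both matching rows with a comment. The expected result is: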
df = (
[
('E1', 'A1','mark'),
('E2', 'A2',''),
('F1', 'A3',''),
('F2', 'B1',''),
('F3', 'B2',''),
('G1', 'B3',''),
('G2', 'C1','mark'),
('G3', 'C2',''),
('G4', 'C3',''),
('H1', 'C4',''),
('H2', 'D1',''),
],
['old_comp_id', 'db_id', 'comment']
)
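To be able to use multiple workers in Databricks, I think the solution should use only the PySpark framework, without converting to Pandas at any point.
Another expected behavior:
Suppose there is no row containing the element "C1". In that case the input DataFrame would be: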
df = (
[
('E1', 'A1',''),
('E2', 'A2',''),
('F1', 'A3',''),
('F2', 'B1',''),
('F3', 'B2',''),
('G1', 'B3',''),
('G3', 'C2',''),
('G4', 'C3',''),
('H1', 'C4',''),
('H2', 'D1',''),
],
['old_comp_id', 'db_id', 'comment']
)
And the output would be exactly equal to the input.
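To pin the rule down before moving to Spark, here is a minimal plain-Python reference for it on a list of tuples. It is only a sketch for checking small inputs by hand and is not meant for the distributed case. The function name `mark_rows` and the `mark` parameter are hypothetical, and it assumes the rule is "every key value must appear somewhere in `old_comp_id` or `db_id`".

```python
def mark_rows(rows, key_values, mark="mark"):
    """Reference for the rule on plain (old_comp_id, db_id, comment) tuples."""
    keys = set(key_values)
    # Every value that occurs in either of the two id columns.
    seen = {value for old_id, db_id, _ in rows for value in (old_id, db_id)}
    # Assumption: if any key value is missing, the input is returned unchanged.
    if not keys <= seen:
        return list(rows)
    # All key values exist: mark the rows that contain one of them.
    return [
        (old_id, db_id, mark if old_id in keys or db_id in keys else comment)
        for old_id, db_id, comment in rows
    ]
```

For the first example above, `mark_rows(rows, ["E1", "C1"])` would mark the `E1` and `C1` rows; with the `C1` row removed it would return the rows as they came in.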
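Answer from a uj5u.com user:
I think you have to do this in two steps. First, check whether the values C1 and E1 appear at least once in the two columns and, if so, apply the operation, similar to what @Steven suggested: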
from pyspark.sql.functions import col, when
df = spark.createDataFrame([
('E1', 'A1',''),
('E2', 'A2',''),
('F1', 'A3',''),
('F2', 'B1',''),
('F3', 'B2',''),
('G1', 'B3',''),
('G2', 'C1',''),
('G3', 'C2',''),
('G4', 'C3',''),
('H1', 'C4',''),
('H2', 'D1',''),
],
['old_comp_id', 'db_id', 'comment']
)
key_values = ["E1", "C1"]
df_old_comp_id_filtered = df.filter(col("old_comp_id").isin(key_values))
df_db_id_filtered = df.filter(col("db_id").isin(key_values))
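if df_old_comp_id_filtered.count() == 0 or df_db_id_filtered.count() == 0:
    # At least one column has no key value: leave the DataFrame unchanged
    df.show()  # And preferably return original DF
else:
    # Key values found in both columns: mark matching rows, keep other comments as they are
    df.withColumn(
        "comment",
        when(col("old_comp_id").isin(key_values) | col("db_id").isin(key_values), "mark")
        .otherwise(col("comment")),
    ).show()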
# If both key values exist:
+-----------+-----+-------+
|old_comp_id|db_id|comment|
+-----------+-----+-------+
|         E1|   A1|   mark|
|         E2|   A2|       |
|         F1|   A3|       |
|         F2|   B1|       |
|         F3|   B2|       |
|         G1|   B3|       |
|         G2|   C1|   mark|
|         G3|   C2|       |
|         G4|   C3|       |
|         H1|   C4|       |
|         H2|   D1|       |
+-----------+-----+-------+
# Else
+-----------+-----+-------+
|old_comp_id|db_id|comment|
+-----------+-----+-------+
|         E1|   A1|       |
|         E2|   A2|       |
|         F1|   A3|       |
|         F2|   B1|       |
|         F3|   B2|       |
|         G1|   B3|       |
|         G3|   C2|       |
|         G4|   C3|       |
|         H1|   C4|       |
|         H2|   D1|       |
+-----------+-----+-------+
