I am trying to read data from an S3 bucket and want to write/load it into a Postgres table. My code is:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Read Multiple CSV Files').getOrCreate()
path = ['C://Projects/Sandbox/file2.csv']
files = spark.read.csv(path, sep=',',inferSchema=True, header=True)
df1 = files.toPandas()
from pyspark.sql import DataFrameWriter
my_writer = DataFrameWriter(df1)
mode = "overwrite"
url = ""
properties = {"user": "","password": "","driver": "org.postgresql.Driver"}
my_writer.write.jdbc(url=url, table="test_result", mode=mode, properties=properties)
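When I create the writer with
my_writer = DataFrameWriter(files)
the my_writer.write.jdbc(...) call fails with:
AttributeError: 'DataFrameWriter' object has no attribute 'write'
And when the argument to DataFrameWriter() is passed as
my_writer = DataFrameWriter(df1)
it fails with:
AttributeError: 'DataFrame' object has no attribute 'sql_ctx'
Is there anything I am doing wrong?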
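Answer:
There is no need to create a new DataFrameWriter instance; a Spark DataFrame already exposes this interface through its write attribute. You can use that attribute to export the CSV data over a JDBC connection:
# Read the data from source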
files = spark.read.csv(path, sep=',', inferSchema=True, header=True)
# Write the data to destination using jdbc connection
files.write.jdbc(url=url, table="test_result", mode=mode, properties=properties)
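The question leaves url and the credentials blank, so here is a minimal sketch of how those values might be assembled and passed to files.write.jdbc. The helper names (postgres_jdbc_url, postgres_properties, write_csv_to_postgres) and the host, port, and database values in the usage comment are hypothetical, not part of the original post. It also assumes the PostgreSQL JDBC driver jar is available to Spark.

```python
from typing import Dict, List


def postgres_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a PostgreSQL JDBC URL: jdbc:postgresql://host:port/database."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def postgres_properties(user: str, password: str) -> Dict[str, str]:
    """Connection properties in the same shape the question uses."""
    return {"user": user, "password": password, "driver": "org.postgresql.Driver"}


def write_csv_to_postgres(spark, paths: List[str], url: str, table: str,
                          properties: Dict[str, str], mode: str = "overwrite"):
    """Read CSV files into a Spark DataFrame and write them over JDBC.

    `spark` is expected to be a SparkSession; the DataFrame's own
    .write attribute is used, so no DataFrameWriter is built by hand.
    """
    files = spark.read.csv(paths, sep=",", inferSchema=True, header=True)
    files.write.jdbc(url=url, table=table, mode=mode, properties=properties)
    return files


# Hypothetical usage (placeholder values, requires a running Spark session):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("Read Multiple CSV Files").getOrCreate()
# url = postgres_jdbc_url("localhost", 5432, "sandbox")
# props = postgres_properties("my_user", "my_password")
# write_csv_to_postgres(spark, ["C://Projects/Sandbox/file2.csv"], url, "test_result", props)
```

Keeping the URL and properties in small helpers makes it easier to see at a glance that neither is empty before Spark attempts the connection.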
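How to fix the existing code?
Create a new DataFrameWriter instance from files (the Spark DataFrame, not the pandas one), then call my_writer.jdbc to export the data over the JDBC connection: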
my_writer = DataFrameWriter(files)
my_writer.jdbc(url=url, table="test_result", mode=mode, properties=properties)
# ^^^^^^ No need to use .write attribute
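The 'sql_ctx' error in the question appears to come from passing df1, the result of toPandas(), which is a pandas DataFrame and not a Spark one. If the data has to pass through pandas for some reason, one option is to convert it back with spark.createDataFrame before writing. The sketch below uses a hypothetical helper, as_spark_dataframe, and relies on a duck-typing assumption (Spark DataFrames have a toPandas method, pandas DataFrames do not).

```python
def as_spark_dataframe(spark, df):
    """Return df unchanged if it looks like a Spark DataFrame, else convert it.

    Assumption: an object with a toPandas() method is treated as a Spark
    DataFrame; anything else is handed to spark.createDataFrame().
    """
    if hasattr(df, "toPandas"):
        return df
    return spark.createDataFrame(df)


# Hypothetical usage with the names from the question:
# spark_df = as_spark_dataframe(spark, df1)   # df1 came from files.toPandas()
# spark_df.write.jdbc(url=url, table="test_result", mode=mode, properties=properties)
```

In the question's case the simpler route is to skip toPandas() entirely and write files directly.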
Answer:
The following solution is correct:
from pyspark.sql import SparkSession, DataFrameWriter

spark = SparkSession.builder.appName('Read Multiple CSV Files').getOrCreate()
path = ['C://Projects/Sandbox/file2.csv']
# Keep the data as a Spark DataFrame; DataFrameWriter cannot wrap a pandas DataFrame
files = spark.read.csv(path, sep=',', inferSchema=True, header=True)
my_writer = DataFrameWriter(files)
mode = "overwrite"
url = ""
properties = {"user": "", "password": "", "driver": "org.postgresql.Driver"}
# Call jdbc() directly on the writer (it has no .write attribute)
my_writer.jdbc(url=url, table="test_result", mode=mode, properties=properties)
Tags: Python PostgreSQL amazon-s3 pyspark
