The gzipped CSV files are comma-delimited. To keep the data intact, fields that contain special characters are wrapped in double quotes.
CSV file sample:
doc_key,doc_name,doc_brand,brand_inc,etl_load_date
8a14eff760993c84d2306b3547fc6bea,VIRTUALLY CALIFORNIA,UNITED STATES OF AMERICA,UNKNOWN,2018-02-28
1fd23373942c9ffcdfc7223cd1114ee1,"ARIZONA \"9\" LLC",UNITED STATES OF AMERICA,UNKNOWN,2018-02-28
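The second row is where the quoting breaks. Its embedded quotes are escaped with a backslash (`\"`). RFC 4180 style CSV, which is what Redshift's CSV option follows, writes an embedded quote by doubling it (`""`). The sketch below reproduces the mismatch with Python's standard `csv` module, which stands in for Redshift's parser here purely for illustration. A strict doubled-quote parser rejects the row, and a parser told that backslash is the escape character recovers the intended value. `SAMPLE`, `parse_rfc4180_strict` and `parse_backslash_escaped` are hypothetical names.

```python
import csv
import io

# Sample row from the question: embedded quotes are written as \" (backslash
# escape) rather than "" (doubled), which is what RFC 4180 CSV expects.
SAMPLE = '1fd23373942c9ffcdfc7223cd1114ee1,"ARIZONA \\"9\\" LLC",UNITED STATES OF AMERICA,UNKNOWN,2018-02-28\n'


def parse_rfc4180_strict(line: str) -> list[str]:
    """Parse with doubled-quote rules only; raises csv.Error on a \\" escape."""
    return next(csv.reader(io.StringIO(line), strict=True))


def parse_backslash_escaped(line: str) -> list[str]:
    """Parse treating backslash as the escape character for quotes."""
    reader = csv.reader(io.StringIO(line), escapechar="\\", doublequote=False)
    return next(reader)


try:
    parse_rfc4180_strict(SAMPLE)
except csv.Error as exc:
    # The quote after the backslash is read as the end of the quoted field,
    # so the following '9' is unexpected.
    print("strict RFC 4180 parse failed:", exc)

print(parse_backslash_escaped(SAMPLE)[1])  # → ARIZONA "9" LLC
```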
copy stg.train_ticket from 's3://{path}/'
CREDENTIALS 'aws_access_key_id={aws_access_key_id};aws_secret_access_key={aws_secret_access_key}'
TRIMBLANKS TRUNCATECOLUMNS ACCEPTINVCHARS dateformat as 'auto'
GZIP csv quote as '"' delimiter ',' IGNOREHEADER as 1;
I ran the COPY command above in SQL Workbench and it failed with:
An error occurred when executing the SQL command:
copy stg.train_ticket from 's3://{path}/'
CREDENTIALS 'aws_access_key_id={aws_access_key_id};aws_secret_access...
[Amazon](500310) Invalid operation: Load into table 'train_ticket ' failed. Check 'stl_load_errors' system table for details.;
1 statement failed.
Querying the stl_load_errors system table gives this error information:
column: doc_name
err_reason: Invalid quote formatting for CSV
raw_field_value: ARIZONA \\
err_code: 1214
How should I handle a value like "ARIZONA \"9\" LLC"?
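One possible workaround, assuming the files can be preprocessed before loading, is to rewrite them so that embedded quotes are doubled (`"ARIZONA ""9"" LLC"`). You would then upload the converted files and rerun the original COPY unchanged. The sketch below uses Python's `csv` and `gzip` modules. `convert_backslash_quotes` and `convert_gzip_file` are hypothetical helper names, and UTF-8 encoding is assumed.

Important assumption: a backslash in the source only ever escapes a quote. Because the reader treats every `\` as an escape character, a literal backslash elsewhere in the data (for example `C:\temp`) would be dropped.

```python
import csv
import gzip
import io
from typing import TextIO


def convert_backslash_quotes(src: TextIO, dst: TextIO) -> int:
    """Re-emit CSV rows so embedded quotes are doubled (RFC 4180 style).

    Assumption: backslash is used only to escape quotes in the source data.
    Returns the number of rows written (header included).
    """
    # Read: backslash escapes the next character, "" is NOT treated as a quote.
    reader = csv.reader(src, escapechar="\\", doublequote=False)
    # Write: default dialect doubles quotes and quotes only fields that need it.
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for row in reader:
        writer.writerow(row)
        count += 1
    return count


def convert_gzip_file(src_path: str, dst_path: str) -> int:
    """Hypothetical helper: convert one .csv.gz file into another .csv.gz file."""
    with gzip.open(src_path, "rt", encoding="utf-8", newline="") as src, \
         gzip.open(dst_path, "wt", encoding="utf-8", newline="") as dst:
        return convert_backslash_quotes(src, dst)


if __name__ == "__main__":
    sample = (
        "doc_key,doc_name,doc_brand,brand_inc,etl_load_date\n"
        '1fd23373942c9ffcdfc7223cd1114ee1,"ARIZONA \\"9\\" LLC",UNITED STATES OF AMERICA,UNKNOWN,2018-02-28\n'
    )
    out = io.StringIO()
    convert_backslash_quotes(io.StringIO(sample), out)
    print(out.getvalue())
```

For a real file, something like `convert_gzip_file("part-0000.csv.gz", "part-0000.fixed.csv.gz")` would write a converted copy. The file names are placeholders. The design keeps the COPY options (`csv quote as '"'`) as they are and moves the fix into the data, so the quoting matches what the CSV option already expects.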
