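I've been trying to figure this out. I have the following df, created from output. I loop over it to get table_name, schema_name, column_name and comment. I want to check that a comment exists and, if it does, build a query string and append it to query_list.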
import pandas as pd

output = [['table_name', 'schema_name', 'column_name', 'data_type', 'null?', 'default', 'kind', 'expression', 'comment', 'database_name', 'autoincrement'], ['ACCOUNT', 'SFO', '_LOAD_DATETIME', '{"type":"TIMESTAMP_LTZ","precision":0,"scale":9,"nullable":true}', 'TRUE', '', 'COLUMN', '', 'DATE of Account', 'VE'], ['ACCOUNT', 'SFO', '_LOAD_FILENAME', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false}', 'TRUE', '', 'COLUMN', '', 'file name', 'VE'], ['ACCOUNT', 'SFO', '_LOAD_FILE_TIMESTAMP', '{"type":"TIMESTAMP_NTZ","precision":0,"scale":9,"nullable":true}', 'TRUE', '', 'COLUMN', '', '', 'VE'], ['CUSTOMER', 'SFO', 'SUBSCRIPTIONSLIST', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false}', 'TRUE', '', 'COLUMN', '', '', 'VE'], ['CUSTOMER', 'SFO', 'CONTACTROLESLIST', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false}', 'TRUE', '', 'COLUMN', '', 'list of contract', 'VE'], ['DATA', 'SFO', 'OPPORTUNITY_NAME', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false}', 'TRUE', '', 'COLUMN', '', '', 'VE']]
output = list(filter(bool, output))  # drop empty rows
df = pd.DataFrame(output)
df.columns = df.iloc[0]  # first row holds the header
df = df[1:]
query_list = []
grouped_comments = ''
for index, row in df.iterrows():
    if row['comment'] is not None and row['comment'] != '':
        # note: len() here is the length of the table-name string,
        # not the number of rows for that table
        if len(row['table_name']) > 1:
            # the below doesn't work, it groups all table comments together
            sql = f"(COLUMN {row['column_name']} COMMENT '{row['comment']}')"
            grouped_comments = grouped_comments + sql
        elif len(row['table_name']) == 1:
            sql = f"ALTER TABLE {row['schema_name']}.{row['table_name']} ALTER COLUMN {row['column_name']} COMMENT '{row['comment']}';"
            query_list.append(sql)
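A quick check of why the first branch always fires: len(row['table_name']) is the number of characters in the name, not the number of rows for that table. The sketch below uses a trimmed copy of the question's data (only the columns the loop reads) and counts commented columns per table with value_counts(). Reading "appears more than once" as "has more than one commented column" is an assumption based on the expected output further down.

```python
import pandas as pd

# Trimmed copy of the question's data: only the columns the loop reads
df = pd.DataFrame({
    "table_name": ["ACCOUNT", "ACCOUNT", "ACCOUNT", "CUSTOMER", "CUSTOMER", "DATA"],
    "column_name": ["_LOAD_DATETIME", "_LOAD_FILENAME", "_LOAD_FILE_TIMESTAMP",
                    "SUBSCRIPTIONSLIST", "CONTACTROLESLIST", "OPPORTUNITY_NAME"],
    "comment": ["DATE of Account", "file name", "", "", "list of contract", ""],
})

# len() on a name counts characters, so the "> 1" branch is taken for every table
print(len("ACCOUNT"))  # 7

# Count commented columns per table instead
commented = df[df["comment"].notna() & (df["comment"] != "")]
counts = commented["table_name"].value_counts()
print(counts.to_dict())  # {'ACCOUNT': 2, 'CUSTOMER': 1}
```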
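Where I'm stuck: if a comment exists and the table_name appears more than once, it should build a single string like the one below, i.e. collect every column_name and comment for that table_name into one statement: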
"ALTER TABLE VE.ACCOUNT ALTER (COLUMN _LOAD_DATETIME COMMENT 'DATE of Account', COLUMN _LOAD_FILENAME COMMENT 'file name');"
The elif branch works, because when a table_name appears only once it produces the correct string:
"ALTER TABLE VE.CUSTOMER ALTER COLUMN CONTACTROLESLIST COMMENT 'list of contract';"
So in the end, with the two strings above, my query_list should look like this:
query_list = ["ALTER TABLE VE.ACCOUNT ALTER (COLUMN _LOAD_DATETIME COMMENT 'DATE of Account', COLUMN _LOAD_FILENAME COMMENT 'file name');",
"ALTER TABLE VE.CUSTOMER ALTER COLUMN CONTACTROLESLIST COMMENT 'list of contract';"]
Answer:
First, filter out the rows you don't need (those with a missing or empty comment).
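# .copy() makes the filtered frame independent, so adding a column below doesn't trigger a SettingWithCopyWarning
df = df[df.comment.notnull() & (df.comment.str.len() > 0)].copy()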
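To see what this filter keeps, here is a minimal sketch on a trimmed copy of the question's data (only the columns the answer uses). The .copy() call is not part of the original answer; it only makes the result an independent frame.

```python
import pandas as pd

# Trimmed copy of the question's data
df = pd.DataFrame({
    "database_name": ["VE"] * 6,
    "table_name": ["ACCOUNT", "ACCOUNT", "ACCOUNT", "CUSTOMER", "CUSTOMER", "DATA"],
    "column_name": ["_LOAD_DATETIME", "_LOAD_FILENAME", "_LOAD_FILE_TIMESTAMP",
                    "SUBSCRIPTIONSLIST", "CONTACTROLESLIST", "OPPORTUNITY_NAME"],
    "comment": ["DATE of Account", "file name", "", "", "list of contract", ""],
})

# notnull() drops missing comments; str.len() > 0 drops empty strings
df = df[df.comment.notnull() & (df.comment.str.len() > 0)].copy()
print(df.column_name.tolist())  # ['_LOAD_DATETIME', '_LOAD_FILENAME', 'CONTACTROLESLIST']
```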
Then build the clause for each column.
df['column_prop'] = 'COLUMN ' + df.column_name + ' COMMENT \'' + df.comment + '\''
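The + operator between plain strings and a string Series concatenates element-wise, so each row gets its own clause. A minimal sketch, assuming the three commented rows that survive the filter step:

```python
import pandas as pd

# The three commented rows left after the filter step
df = pd.DataFrame({
    "database_name": ["VE", "VE", "VE"],
    "table_name": ["ACCOUNT", "ACCOUNT", "CUSTOMER"],
    "column_name": ["_LOAD_DATETIME", "_LOAD_FILENAME", "CONTACTROLESLIST"],
    "comment": ["DATE of Account", "file name", "list of contract"],
})

# Plain strings are broadcast to every row; the Series are combined row by row
df["column_prop"] = "COLUMN " + df.column_name + " COMMENT '" + df.comment + "'"
for prop in df.column_prop:
    print(prop)
# COLUMN _LOAD_DATETIME COMMENT 'DATE of Account'
# COLUMN _LOAD_FILENAME COMMENT 'file name'
# COLUMN CONTACTROLESLIST COMMENT 'list of contract'
```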
Now aggregate column_prop by joining the strings, grouped by table.
df = df.groupby(['database_name', 'table_name']).agg({'column_prop': lambda x: ', '.join(x)}).reset_index()
This gives you the following:
0 database_name table_name column_prop
0 VE ACCOUNT COLUMN _LOAD_DATETIME COMMENT 'DATE of Account', COLUMN _LOAD_FILENAME COMMENT 'file name'
1 VE CUSTOMER COLUMN CONTACTROLESLIST COMMENT 'list of contract'
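From here, you can concatenate each column with the remaining string pieces to get the statement you want.
If your DataFrame is small, you can simply concatenate them like this:
df['sql'] = 'ALTER TABLE ' + df.database_name + '.' ...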
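The answer leaves the final concatenation elided. One possible way to finish it is sketched below, assuming the parentheses should appear only when a table has more than one commented column (as in the question's expected output). The n_cols column and the use of named aggregation instead of the answer's dict form are assumptions of this sketch, not part of the answer.

```python
import pandas as pd

# Filtered rows with column_prop already built (as in the earlier steps)
df = pd.DataFrame({
    "database_name": ["VE", "VE", "VE"],
    "table_name": ["ACCOUNT", "ACCOUNT", "CUSTOMER"],
    "column_prop": [
        "COLUMN _LOAD_DATETIME COMMENT 'DATE of Account'",
        "COLUMN _LOAD_FILENAME COMMENT 'file name'",
        "COLUMN CONTACTROLESLIST COMMENT 'list of contract'",
    ],
})

# Join the clauses per table and also record how many there are
grouped = (
    df.groupby(["database_name", "table_name"], sort=False)
    .agg(column_prop=("column_prop", lambda s: ", ".join(s)),
         n_cols=("column_prop", "count"))
    .reset_index()
)

# where() keeps a single clause as-is and swaps in the parenthesised form otherwise
body = grouped.column_prop.where(grouped.n_cols == 1, "(" + grouped.column_prop + ")")
grouped["sql"] = ("ALTER TABLE " + grouped.database_name + "." + grouped.table_name
                  + " ALTER " + body + ";")
query_list = grouped["sql"].tolist()
```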
Or see https://stackoverflow.com/a/54298586/2956135 for better ways to concatenate strings.
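If you would rather keep the question's row-by-row loop, grouping the clauses in a dict keyed by (database_name, table_name) gives the same result without a groupby. This is a sketch over a trimmed copy of the question's rows; build_queries is a hypothetical helper. Using database_name (VE) as the prefix follows the question's expected output rather than its elif branch, which uses schema_name (SFO).

```python
from collections import defaultdict

# Trimmed rows from the question's output: (database_name, table_name, column_name, comment)
rows = [
    ("VE", "ACCOUNT", "_LOAD_DATETIME", "DATE of Account"),
    ("VE", "ACCOUNT", "_LOAD_FILENAME", "file name"),
    ("VE", "ACCOUNT", "_LOAD_FILE_TIMESTAMP", ""),
    ("VE", "CUSTOMER", "SUBSCRIPTIONSLIST", ""),
    ("VE", "CUSTOMER", "CONTACTROLESLIST", "list of contract"),
    ("VE", "DATA", "OPPORTUNITY_NAME", ""),
]


def build_queries(rows):
    # Collect clauses per (database, table); dicts keep first-seen order
    grouped = defaultdict(list)
    for db, table, column, comment in rows:
        if comment:  # skips both None and ''
            grouped[(db, table)].append(f"COLUMN {column} COMMENT '{comment}'")

    query_list = []
    for (db, table), clauses in grouped.items():
        if len(clauses) > 1:
            # Several commented columns: one ALTER with a parenthesised list
            query_list.append(f"ALTER TABLE {db}.{table} ALTER ({', '.join(clauses)});")
        else:
            query_list.append(f"ALTER TABLE {db}.{table} ALTER {clauses[0]};")
    return query_list


query_list = build_queries(rows)
```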
