Converting normal JSON to the newline-delimited JSON that BigQuery requires

I have a list of dictionaries with more than 100,000 elements.

How can I convert it to JSON and write it to a file as newline-delimited JSON, the format BigQuery requires? I need this:

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}
{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}

instead of this:

[{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}, {"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}]

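Note the difference between the two: the first is newline-delimited, with one object per line, while the second is a single comma-separated array (the normal JSON dump in Python). I need the first one.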

What I did before was this, in the last part of my loop:

import json

while condition:
    # The file is re-opened in append mode on every iteration.
    with open('cache/name.json', 'a') as a:
        json_data = json.dumps(store)
        a.write(json_data + '\n')

Done this way, the file is opened and written once for every element in the list of dictionaries, which makes the loop slow.

How can I write the data in the format BigQuery requires, but faster?

Answer:

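This format is called NEWLINE_DELIMITED_JSON, and the BigQuery client library has built-in support for loading it. Assuming the JSON file is already in a GCS bucket, you can use the following: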

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
table_id = "your-project.your_dataset.your_table_name"  # placeholder, replace with a real table ID

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.json"

load_job = client.load_table_from_uri(
    uri,
    table_id,
    location="US",  # Must match the destination dataset location.
    job_config=job_config,
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))
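
The question describes a local file rather than one already in a bucket, so here is a minimal sketch of that case. It assumes Python 3 and the same google-cloud-bigquery client as above. The helper names `count_ndjson_rows` and `load_local_ndjson` are hypothetical, and `autodetect=True` is an assumption made for brevity; an explicit schema like the one above works as well.

```python
import json


def count_ndjson_rows(path):
    """Return the row count, raising ValueError if a line is not a JSON object."""
    rows = 0
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # this check skips blank lines
            try:
                value = json.loads(line)
            except ValueError as exc:
                raise ValueError("line %d is not valid JSON: %s" % (line_number, exc))
            if not isinstance(value, dict):
                raise ValueError("line %d is not a JSON object" % line_number)
            rows += 1
    return rows


def load_local_ndjson(path, table_id):
    """Load a local newline-delimited JSON file into table_id (hypothetical helper)."""
    # Imported here so the check above can be used without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,  # assumption: let BigQuery infer the schema
    )
    with open(path, "rb") as source_file:  # the file is uploaded as bytes
        load_job = client.load_table_from_file(
            source_file, table_id, job_config=job_config
        )
    load_job.result()  # Waits for the job to complete.
    return client.get_table(table_id).num_rows
```

Running `count_ndjson_rows` first is optional; it only catches a file that was accidentally written as a single JSON array before a load job is started.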

Answer:

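Consider opening the JSON file in write mode first and then looping over the list of dictionaries. That way the file is opened and closed only once, after every dictionary has been written to it. The code below runs faster than the version that opens the file inside the while loop, where the file is closed on every iteration.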

import json

list_dict = [{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}, 
{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}] 

list_dict = list_dict * 55000  ## Fills the list with 110,000 elements

# Open the file once, then write one JSON object per line.
with open("sample-json-data.json", "w") as jsonwrite:
    for item in list_dict:
        jsonwrite.write(json.dumps(item) + '\n')
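
To check the speed claim on your own machine, a rough timing sketch along these lines may help. It assumes Python 3; the function names `write_reopening`, `write_once` and `time_writer` and the sample records are made up for illustration, and the timings will vary with the disk and operating system.

```python
import json
import os
import tempfile
import time


def write_reopening(records, path):
    """Append each record after re-opening the file (the question's approach)."""
    for record in records:
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")


def write_once(records, path):
    """Open the file a single time and write every record (the approach above)."""
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def time_writer(writer, records):
    """Return (seconds elapsed, file contents) for one writer on a fresh temp file."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        start = time.perf_counter()
        writer(records, path)
        elapsed = time.perf_counter() - start
        with open(path) as f:
            return elapsed, f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    sample = [{"id": str(i), "first_name": "John"} for i in range(20000)]
    for writer in (write_reopening, write_once):
        seconds, _ = time_writer(writer, sample)
        print("%s: %.3f s" % (writer.__name__, seconds))
```

Both writers produce the same file contents, so any difference in the printed times comes from how often the file is opened and closed.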


Tags: python json python-2.7 google-bigquery
