Below is the code I use to import a pipe-delimited CSV file into MongoDB.
import csv
import json
from pymongo import MongoClient
url = "mongodb://localhost:27017"
client = MongoClient(url)
db = client.Office
customer = db.Customer
jsonArray = []
with open("Names.txt", "r") as csv_file:
    csv_reader = csv.DictReader(csv_file, dialect='excel', delimiter='|', quoting=csv.QUOTE_NONE)
    for row in csv_reader:
        jsonArray.append(row)
    jsonString = json.dumps(jsonArray, indent=1, separators=(",", ":"))
    jsonfile = json.loads(jsonString)
    customer.insert_many(jsonfile)
This is the error I get when I run the code above:
Traceback (most recent call last):
  File "E:\Anaconda Projects\Mongo Projects\Office Tool\csvtojson.py", line 16, in <module>
    jsonString = json.dumps(jsonArray, indent=1, separators=(",", ":"))
  File "C:\Users\Predator\anaconda3\lib\json\__init__.py", line 234, in dumps
    return cls(
  File "C:\Users\Predator\anaconda3\lib\json\encoder.py", line 201, in encode
    chunks = list(chunks)
MemoryError
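If I change the indentation so that the last three lines sit inside the for loop, MongoDB keeps importing the same data over and over without stopping: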
import csv
import json
from pymongo import MongoClient
url = "mongodb://localhost:27017"
client = MongoClient(url)
db = client.Office
customer = db.Customer
jsonArray = []
with open("Names.txt", "r") as csv_file:
    csv_reader = csv.DictReader(csv_file, dialect='excel', delimiter='|', quoting=csv.QUOTE_NONE)
    for row in csv_reader:
        jsonArray.append(row)
        jsonString = json.dumps(jsonArray, indent=1, separators=(",", ":"))
        jsonfile = json.loads(jsonString)
        customer.insert_many(jsonfile)
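The duplicates in this second version come from the position of `insert_many`. Inside the loop, every iteration sends the whole list accumulated so far. For n rows that is 1 + 2 + ... + n = n(n+1)/2 documents. In the question's code, `json.dumps`/`json.loads` builds fresh dicts on every pass, so each copy gets a new `_id` and MongoDB accepts it.

The sketch below makes the count visible. It swaps MongoDB for a hypothetical in-memory `FakeCollection`, which exists for illustration only and is not part of pymongo:

```python
import csv
import io


class FakeCollection:
    """Hypothetical stand-in for a pymongo Collection that only stores what it receives."""

    def __init__(self):
        self.docs = []

    def insert_many(self, documents):
        # Store copies, mirroring the fresh dicts the question's json round trip produces
        self.docs.extend(dict(d) for d in documents)


def import_with_insert_inside_loop(text, collection):
    reader = csv.DictReader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        rows.append(row)
        # Same pattern as the second version: the whole list so far is inserted each time
        collection.insert_many(rows)


fake = FakeCollection()
import_with_insert_inside_loop("name|city\nAnn|Oslo\nBob|Rome\nCid|Lima\n", fake)
print(len(fake.docs))  # 6, i.e. 1 + 2 + 3 documents for 3 rows
```

With a file of many rows, this quadratic growth looks like an import that never stops.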
Answer:
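I suggest using pandas. It offers a "chunked" mode through the chunksize parameter, which you can tune to your memory limits. Inserting each chunk with insert_many() is also more efficient than inserting row by row.
The code also becomes simpler: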
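import pandas as pd
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017").Office
filename = "Names.txt"
# read_csv yields DataFrames of at most 1000 rows; using it as a context manager needs pandas 1.2+
with pd.read_csv(filename, chunksize=1000, delimiter='|') as reader:
    for chunk in reader:
        db.mycollection.insert_many(chunk.to_dict('records'))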
If you post a sample of the file, I can update this to match.
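If pandas is not an option, the same chunking idea can be sketched with the standard library alone. The sketch reads rows lazily from `csv.DictReader` and sends them in fixed-size batches, so at most `batch_size` rows are held in memory at once. This is not the answer's code. The helper name `insert_in_batches`, the default of 1000, and the `BatchRecorder` demo class are all hypothetical. `collection` can be anything with an `insert_many` method, such as the question's `customer` collection.

```python
import csv
import io
from itertools import islice


def insert_in_batches(file_obj, collection, batch_size=1000):
    """Hypothetical helper: stream pipe-delimited rows into collection.insert_many in batches."""
    reader = csv.DictReader(file_obj, delimiter="|", quoting=csv.QUOTE_NONE)
    total = 0
    while True:
        # islice pulls at most batch_size rows; the rest of the file is not read yet
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        collection.insert_many(batch)
        total += len(batch)
    return total


# With pymongo, assuming the question's `customer` collection:
# with open("Names.txt", "r", newline="") as f:
#     insert_in_batches(f, customer)


class BatchRecorder:
    """Hypothetical stand-in for a collection that records the size of each batch."""

    def __init__(self):
        self.batch_sizes = []

    def insert_many(self, documents):
        self.batch_sizes.append(len(documents))


recorder = BatchRecorder()
sample = io.StringIO("name|city\n" + "".join(f"user{i}|town{i}\n" for i in range(5)))
inserted = insert_in_batches(sample, recorder, batch_size=2)
print(inserted, recorder.batch_sizes)  # 5 [2, 2, 1]
```

Each DictReader row is already a plain dict, so no JSON conversion is needed before handing the batch to `insert_many`.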
Answer:
Inserting one record at a time solves the memory problem.
import csv
import json
from pymongo import MongoClient
url_mongo = "mongodb://localhost:27017"
client = MongoClient(url_mongo)
db = client.Office
customer = db.Customer
jsonArray = []
file_txt = "Text.txt"
rowcount = 0
with open(file_txt, "r") as txt_file:
    csv_reader = csv.DictReader(txt_file, dialect="excel", delimiter="|", quoting=csv.QUOTE_NONE)
    for row in csv_reader:
        rowcount += 1  # count every row read, so the loop below inserts all of them
        jsonArray.append(row)
# Insert the rows one document at a time instead of one huge insert_many call
for i in range(rowcount):
    jsonString = json.dumps(jsonArray[i], indent=1, separators=(",", ":"))
    jsonfile = json.loads(jsonString)
    customer.insert_one(jsonfile)
print("Finished")
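The version above still appends every row to jsonArray before inserting anything, so the whole file is held in memory as dicts. A variant can insert each row as soon as it is read. The sketch below assumes the same pipe-delimited layout. The helper name `insert_rows_one_by_one` and the `InsertRecorder` demo class are hypothetical. `collection` can be anything with `insert_one`, such as the `customer` collection. The `json.dumps`/`json.loads` round trip is dropped because each DictReader row is already a plain dict.

```python
import csv
import io


def insert_rows_one_by_one(file_obj, collection):
    """Hypothetical helper: insert each pipe-delimited row as soon as it is read."""
    reader = csv.DictReader(file_obj, delimiter="|", quoting=csv.QUOTE_NONE)
    count = 0
    for row in reader:
        # row is already a dict, so it can be passed straight to insert_one
        collection.insert_one(row)
        count += 1
    return count


class InsertRecorder:
    """Hypothetical stand-in for a pymongo Collection that records insert_one calls."""

    def __init__(self):
        self.docs = []

    def insert_one(self, document):
        self.docs.append(document)


recorder = InsertRecorder()
sample = io.StringIO("name|city\nAnn|Oslo\nBob|Rome\n")
count = insert_rows_one_by_one(sample, recorder)
print(count)  # 2
```

Memory use then stays flat regardless of file size. The trade-off is one server round trip per row, which is why the batched insert_many approach is usually faster on large files.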
Thanks, everyone, for the ideas.
Please credit the source when reposting. Link to this article: https://www.uj5u.com/yidong/414864.html