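I'm having trouble reading a CSV file into a MySQL database. I've tried several fixes, but the error keeps changing and the code still doesn't run. The same code worked with another CSV file, so I suspect I'm doing something wrong with this particular file.
Here is my code: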
from database_access import *
from builtins import bytes, int, str
import codecs
import csv
import requests
from urllib.parse import urlparse, urljoin
from bs4 import BeautifulSoup
import re
import cgi
import MySQLdb
import chardet
# from database_access import *
import MySQLdb
import simplejson

if __name__ == '__main__':
    with open("SIMRA.csv",'r') as file:
        reader = csv.reader(file)
        #reader = csv.reader(text)
        next(reader, None)
        print ("project running")
        #print (row[7])
        #rowlist = []
        all_links = []
        all_project_ids = []
        for row in reader:
            if row[7] != "" and row[16] != "":
                country = row[2]
                city = row[8]
                description = row[11] + '' + row[12]
                title = row[7].replace("'", "''")
                link = row[16]
                #date_start = row[9]
                #print a check here
                print(title,description,country, city, link)
                db = MySQLdb.connect(host, username, password, database, charset='utf8')
                cursor = db.cursor()
                new_project = True
                proj_check = "SELECT * from Projects where ProjectName like '%" + title + "%'"
                #proj_check = "SELECT * from Projects where ProjectName like %s",(title,)
                #cur.execute("SELECT * FROM records WHERE email LIKE %s", (search,))
                cursor.execute(proj_check)
                num_rows = cursor.rowcount
                if num_rows != 0:
                    new_project = False
                url_compare = "SELECT * from Projects where ProjectWebpage like '" + link + "'"
                #url_compare = "SELECT * from Projects where ProjectWebpage like %s",(link,)
                cursor.execute(url_compare)
                num_rows = cursor.rowcount
                if num_rows != 0:
                    new_project = False
                if new_project:
                    project_insert = "Insert into Projects (ProjectName,ProjectWebpage,FirstDataSource,DataSources_idDataSources) VALUES (%s,%s,%s,%s)"
                    cursor.execute(project_insert, (title, link,'SIMRA', 5))
                    projectid = cursor.lastrowid
                    print(projectid)
                    #ashoka_projectids.append(projectid)
                    db.commit()
                    ins_desc = "Insert into AdditionalProjectData (FieldName,Value,Projects_idProjects,DateObtained) VALUES (%s,%s,%s,NOW())"
                    cursor.executemany(ins_desc, ("Description", description, str(projectid)))
                    db.commit()
                    ins_location = "Insert into ProjectLocation (Type,Country,City,Projects_idProjects) VALUES (%s,%s,%s,%s)"
                    cursor.execute(ins_location, ("Main", country,city, str(projectid)))
                    db.commit()
                else:
                    print('Project already exists!')
                    print(title)
                all_links.append(link)
    #print out SIMRA's links to a file for crawling later
    with open('simra_links', 'w', newline='') as f:
        write = csv.writer(f)
        for row in all_links:
            columns = [c.strip() for c in row.strip(', ').split(',')]
            write.writerow(columns)
When I run it, I get the following error:
File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 898: invalid start byte
I did some research and tried to handle the encoding error by adding different encodings, as suggested in "UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte" and "Python MySQLdb TypeError: not all arguments converted during string formatting". I added this to the csv open call:
with open("SIMRA.csv", 'r', encoding="cp437", errors='ignore') as file:
Running the code with these different encoding options produces a different error:
MySQLdb._exceptions.ProgrammingError: not all arguments converted during bytes formatting
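The file encoding is probably not the only factor here. Under the DB-API, `executemany` expects a sequence of parameter sets, while the code above passes one flat tuple of three strings. A driver that runs the statement once per item would then treat each string as its own parameter sequence, one argument per character, which is one plausible source of this message (an assumption about the cause, not something confirmed in this thread). The sketch below reproduces only the Python-level formatting error, using a shortened, hypothetical stand-in for the INSERT text and made-up values; no database is involved.

```python
# Shortened, hypothetical stand-in for the INSERT statement's VALUES clause.
ins_desc = b"VALUES (%s,%s,%s,NOW())"

# What the question passes to executemany: one flat tuple of three values.
flat = ("Description", "some text", "42")

# Iterating the flat tuple yields three separate strings, not one row.
items_seen = [item for item in flat]

# A string treated as a parameter sequence becomes one argument per character.
per_char = tuple(ch.encode() for ch in items_seen[0])
try:
    ins_desc % per_char  # 11 arguments for 3 placeholders
    message = ""
except TypeError as exc:
    message = str(exc)

# Wrapping the row in a list gives executemany exactly one parameter set.
one_row = [flat]
```

For a single row, calling `cursor.execute(ins_desc, flat)` is the simpler option; `executemany` is only needed when inserting several rows at once.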
Further research suggested using a tuple or list to fix this, so I added these to the SELECT statements in the code, as suggested in "Python MySQLdb TypeError: not all arguments converted during string formatting" and in the Python MySQLdb documentation.
So the SELECT queries became:
proj_check = "SELECT * from Projects where ProjectName like %s",(title,)
cursor.execute(proj_check)
num_rows = cursor.rowcount
if num_rows != 0:
    new_project = False
url_compare = "SELECT * from Projects where ProjectWebpage like %s",(link,)
cursor.execute(url_compare)
num_rows = cursor.rowcount
if num_rows != 0:
    new_project = False
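When I run the code now, I get this AssertionError, and I no longer know what to do.
File "/home/ros/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 205, in execute
    assert isinstance(query, (bytes, bytearray))
AssertionError
I've run out of ideas. I'm probably missing something small, but I can't figure it out right now, and I've been fighting with this for two days.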
Can anyone help point out what I'm missing? It will be greatly appreciated. This code ran perfectly with another csv file. I am running this with Python 3.8 btw.
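One detail that may explain the AssertionError in the question: in Python, `"...", (title,)` builds a 2-tuple, so `cursor.execute(proj_check)` receives a tuple where the driver expects the SQL text. The DB-API call shape is `execute(sql, params)`, with the two passed as separate arguments. The sketch below illustrates this with a hypothetical `RecordingCursor` stand-in (not part of MySQLdb) and a made-up title, so that it runs without a database.

```python
class RecordingCursor:
    """Hypothetical stand-in for a DB-API cursor; not part of MySQLdb."""

    def __init__(self):
        self.calls = []

    def execute(self, query, args=None):
        # Like a real driver, insist that the first argument is the SQL text.
        if not isinstance(query, (str, bytes, bytearray)):
            raise TypeError("query must be text, got " + type(query).__name__)
        self.calls.append((query, args))


title = "Rural Hub"  # made-up sample value

# The trailing ", (title,)" turns the right-hand side into a 2-tuple.
proj_check = "SELECT * FROM Projects WHERE ProjectName LIKE %s", (title,)
query_is_tuple = isinstance(proj_check, tuple)

cursor = RecordingCursor()
try:
    cursor.execute(proj_check)  # the call shape used in the question
    rejected = False
except TypeError:
    rejected = True

# Intended shape: SQL text first, parameters second.
sql = "SELECT * FROM Projects WHERE ProjectName LIKE %s"
cursor.execute(sql, ("%" + title + "%",))
```

With a real MySQLdb cursor the same two-argument call applies. The `%` wildcards go into the parameter value so the driver can escape the title itself, which should also make the manual `replace("'", "''")` unnecessary.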
uj5u.com user reply:
This is now solved. I had to use a different encoding with the original code, and that fixed the problem. I changed the csv open call to:
with open("SIMRA.csv",'r', encoding="ISO-8859-1") as file:
    reader = csv.reader(file)
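A minimal sketch of why this change helps, assuming the file contains a byte such as 0xA3 that is valid in ISO-8859-1 but not in UTF-8. The CSV content and the temporary file are made up for illustration; only the `open(..., encoding="ISO-8859-1")` call mirrors the fix above.

```python
import csv
import os
import tempfile

# Made-up two-line CSV; 0xA3 is the pound sign in ISO-8859-1.
raw = b"title,cost\nVillage hall,\xa3500\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Reading as UTF-8 fails because 0xA3 cannot start a UTF-8 sequence.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False

    # Reading as ISO-8859-1 maps every byte to a character, so it succeeds.
    with open(path, "r", encoding="ISO-8859-1", newline="") as f:
        rows = list(csv.reader(f))
finally:
    os.remove(path)
```

Note that ISO-8859-1 never raises a decode error, so it only gives the right text if the file really was saved in that encoding (or a close relative).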
uj5u.com user reply:
Are you expecting a £? You need to specify what the file's encoding is. It is probably "latin1". See the syntax of LOAD DATA for how to specify CHARACTER SET latin1.
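To check whether the offending byte really is a pound sign, it can help to decode it under a few candidate encodings. The sketch below uses a made-up byte string; it shows that 0xA3 is `£` in latin1 but a different character in cp437 (the encoding tried in the question), and that `errors='ignore'` with UTF-8 silently drops the byte.

```python
raw = b"\xa3500"  # made-up sample: byte 0xA3 followed by ASCII digits

as_latin1 = raw.decode("latin1")                # '£500'
as_cp437 = raw.decode("cp437")                  # 'ú500'
dropped = raw.decode("utf-8", errors="ignore")  # '500'
```

Because a wrong guess decodes without any error, comparing the result against what the data is expected to contain is a more reliable check than the absence of an exception.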
