對于我的國際象棋引擎,我使用統計資料來選擇最佳移動。我從數百萬個游戲中收集了它們。我對當前的移動、下一步以及在當前移動的情況下下一個移動的次數感興趣。
對于使用 Python 字典并將其與 pickle 一起存盤,檔案太大,并且很難用新游戲進行更新。所以我決定使用 SQLite。
我創建了一個類MovesDatabase:
class MovesDatabase:
def __init__(self, work_dir):
self.con = sqlite3.connect(os.path.join(work_dir, "moves.db"))
self.con.execute('PRAGMA temp_store = MEMORY')
self.con.execute('PRAGMA synchronous = NORMAL')
self.con.execute('PRAGMA journal_mode = WAL')
self.cur = self.con.cursor()
self.cur.execute("CREATE TABLE IF NOT EXISTS moves("
"move TEXT,"
"next TEXT,"
"count INTEGER DEFAULT 1);")
move并next以字串格式表示棋盤的狀態:FEN。例子:
- rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR
- r1b1k1nr/p2p1pNp/n2B4/1p1NP2P/6P1/3P1Q2/P1P1K3/q5b1
- 8/8/8/4p1K1/2k1P3/8/8/8 b
下面的方法負責獲取一個游戲檔案,如果這對 ( move, next) 是新的,則提取移動并插入,或者如果 ( move, next) 已經存在于資料庫中,則更新:
def insert_moves_from_file(self, file: str):
print("Extracting moves to database from " file)
count = 0
with open(file) as games_file:
game = chess.pgn.read_game(games_file)
while game is not None:
batch = []
board = game.board()
state_one = board.fen().split(' ')[0] ' ' board.fen().split(' ')[1]
for move in game.mainline_moves():
board.push(move)
fen = board.fen().split(' ')
state_two = fen[0] ' ' fen[1]
res = self.cur.execute("SELECT * FROM moves WHERE move=? AND next=?",
(state_one, state_two))
res = res.fetchall()
if len(res) != 0:
self.cur.execute("UPDATE moves SET count=count 1 WHERE move=? AND next=?",
(state_one, state_two))
else:
batch.append((state_one, state_two))
state_one = state_two
self.cur.executemany("INSERT INTO moves(move, next) VALUES"
"(?, ?)", batch)
count = 1
print('\r' "%d games was add to the database.." % (count 1), end='')
game = chess.pgn.read_game(games_file)
self.con.commit()
print("\n Finished!")
這對夫婦 ( move, next) 是獨一無二的。
我用一個包含大約 400 萬個 ( move, next) 的檔案進行了測驗。它開始以 3.000 行/秒的速度插入/更新,但在 50K 行時它會減慢到 100 行/秒并繼續下降。我設計這種方法來處理多個游戲檔案,這就是為什么我首先選擇 SQL 資料庫。
uj5u.com熱心網友回復:
這里不是INSERT很慢。
您的move和next列沒有被索引,因此任何SELECT或UPDATE涉及這些列都需要全表掃描。
如果(move, next)始終是唯一的,您將需要在其UNIQUE上添加索引。它還將自動使查詢move/next對的查詢更快(但不一定是那些僅查詢這兩列之一的查詢)。
要在現有表上創建該索引,
CREATE UNIQUE INDEX ix_move_next ON moves (move, next);
最后,一旦你有了那個索引,你也可以用 upsert 擺脫整個SELECT/UPDATE東西:
INSERT INTO moves (move, next) VALUES (?, ?) ON CONFLICT (move, next) DO UPDATE SET count = count 1;
這是一個輕微的重構,在我的機器上插入了大約 6200 次移動/秒。(它需要tqdm一個漂亮的進度條庫和一個pgns/包含 PGN 檔案的目錄。)
import glob
import sqlite3
import chess.pgn
import tqdm
from chess import WHITE
def board_to_state(board):
# These were extracted from the implementation of `board.fen()`
# so as to avoid doing extra work we don't need.
bfen = board.board_fen(promoted=False)
turn = ("w" if board.turn == WHITE else "b")
return f'{bfen} {turn}'
def insert_game(cur, game):
batch = []
board = game.board()
state_one = board_to_state(board)
for move in game.mainline_moves():
board.push(move)
state_two = board_to_state(board)
batch.append((state_one, state_two))
state_one = state_two
cur.executemany("INSERT INTO moves (move, next) VALUES (?, ?) ON CONFLICT (move, next) DO UPDATE SET count = count 1", batch)
n_moves = len(batch)
return n_moves
def main():
con = sqlite3.connect("moves.db")
con.execute('PRAGMA temp_store = MEMORY')
con.execute('PRAGMA synchronous = NORMAL')
con.execute('PRAGMA journal_mode = WAL')
con.execute('CREATE TABLE IF NOT EXISTS moves(move TEXT,next TEXT,count INTEGER DEFAULT 1);')
con.execute('CREATE UNIQUE INDEX IF NOT EXISTS ix_move_next ON moves (move, next);')
cur = con.cursor()
for pgn_file in sorted(glob.glob("pgns/*.pgn")):
with open(pgn_file) as games_file:
n_games = 0
with tqdm.tqdm(desc=pgn_file, unit="moves") as pbar:
while (game := chess.pgn.read_game(games_file)):
n_moves = insert_game(cur, game)
n_games = 1
pbar.set_description(f"{pgn_file} ({n_games} games)", refresh=False)
pbar.update(n_moves)
con.commit()
if __name__ == '__main__':
main()
轉載請註明出處,本文鏈接:https://www.uj5u.com/qukuanlian/510188.html
下一篇:如何查看臨時表?
