I have used the wikipedia package to get a list of image URLs from any Wikipedia page:
import wikipedia
et_page = wikipedia.page("Summer")
images = et_page.images
Now I want to save all the images in the images variable to MongoDB, in a collection named images.
import pymongo
from PIL import Image
import io
client = pymongo.MongoClient("mongodb+srv://<>:<>@cluster0.lfrg6.mongodb.net/myFirstDatabase?retryWrites=true&w=majority")
database_name = 'test'
database = client[database_name]
collection = 'images'
image_collection = database[collection]
Is there a way to do this? Since there are multiple images, can they be saved in list format?
Answer from a uj5u.com user:
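It is best not to use MongoDB as arbitrary blob storage, especially for large images. Thumbnails and small infographics are fine. But since the OP wants to understand how it can be done, the best approach is GridFS. The gridfs module ships with pymongo, so if you can import pymongo you can import gridfs. Here is a working example: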
import wikipedia
import pymongo
import gridfs
from urllib.request import urlopen
connstr = "mongodb://yourInfoHere"
client = pymongo.MongoClient(connstr)
database = client.testX
# This will create two collections that are under control
# of the gridfs object, images.chunks and images.files. Do
# not go to these collections directly; use the gridfs
# methods instead. The choice of "images" is arbitrary; you
# can use any name you wish. gridfs will add .chunks and .files
# to the real collection names.
# Docs are here
# https://pymongo.readthedocs.io/en/stable/api/gridfs/index.html#module-gridfs
gfs = gridfs.GridFS(database, collection="images")
page_name = "Summer"
print("capturing URLs to images on page",page_name)
et_page = wikipedia.page(page_name)
images = et_page.images
n = 0
for ii in images:
    print("processing", ii)
    f = urlopen(ii)
    # put() "inserts" the file-like object into the gfs subsystem
    # and returns an ID.
    file_id = gfs.put(f)
    # Make up a name and capture it AND the gridfs ID in a
    # regular collection, called imageMeta here but it is
    # any name you like. It is not strictly necessary to do this
    # and it is completely separate from gridFS but you will almost
    # always have a need to capture some metadata around the pix.
    name = "IMAGE_" + str(n)
    database.imageMeta.insert_one({"name": name, "fileId": file_id})
    n += 1
# Here is an alternate solution where only 1 imageMeta doc is written
# but with arrays of image info. You STILL need to push each image
# individually into gridfs:
n = 0
info = []
for ii in images:
    print("processing", ii)
    f = urlopen(ii)
    # put() "inserts" the file-like object into the gfs subsystem
    # and returns an ID.
    file_id = gfs.put(f)
    # Make up a name and capture it AND the gridfs ID in a
    # list that is written once, after the loop, to a regular
    # collection, called imageMeta here but it is any name you like.
    name = "IMAGE_" + str(n)
    info.append({"name": name, "fileId": file_id})
    n += 1
database.imageMeta.insert_one({"page": page_name, "imageInfo": info})
# Here is how you can get your images out. Let's pick
# IMAGE_0 for example but obviously any query criteria on the
# imageMeta docs is valid:
doc = database.imageMeta.find_one({"name": "IMAGE_0"})
gg = gfs.get(doc['fileId'])
with open('foo.jpg', 'wb') as wf:
    wf.write(gg.read())  # Nice read/write slurp
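The lookup above matches the one-document-per-image layout. With the alternate layout, where a single imageMeta document holds an imageInfo array, the name sits inside the array, so the file ID has to be picked out of the matching entry. Below is a minimal sketch of that step. The helper name find_file_id and the placeholder IDs are assumptions made for illustration; they are not part of pymongo or gridfs.

```python
from typing import Any, Optional


def find_file_id(meta_doc: dict, name: str) -> Optional[Any]:
    """Return the GridFS file ID stored under `name`, or None if absent.

    Handles both layouts used above: one metadata document per image
    ({"name": ..., "fileId": ...}) and a single document carrying an
    "imageInfo" array of such entries.
    """
    # Layout 1: the document itself describes one image.
    if meta_doc.get("name") == name:
        return meta_doc.get("fileId")
    # Layout 2: scan the embedded array for the matching entry.
    for entry in meta_doc.get("imageInfo", []):
        if entry.get("name") == name:
            return entry.get("fileId")
    return None


# Stand-in for a document in the array layout. The fileId values are
# placeholder strings, not real ObjectIds returned by gfs.put().
page_doc = {
    "page": "Summer",
    "imageInfo": [
        {"name": "IMAGE_0", "fileId": "id-0"},
        {"name": "IMAGE_1", "fileId": "id-1"},
    ],
}
print(find_file_id(page_doc, "IMAGE_1"))
```

With a live connection, the document could be fetched with MongoDB dot notation, for example database.imageMeta.find_one({"imageInfo.name": "IMAGE_1"}), and the value returned by find_file_id passed to gfs.get() exactly as in the IMAGE_0 example.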
