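I'm trying to build a web-scraping Flask application that creates a new WebDriver instance for each user session, so that different users can scrape content from different pages. This would be simpler if driver.get() and the data collection happened in the same API call, but because of the nature of the scraping I need to do, they can't. Here is what I have so far: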
from flask import Flask, session
from flask_session import Session
from selenium import webdriver

app = Flask(__name__)
SESSION_TYPE = 'filesystem'
app.config.from_object(__name__)
Session(app)

options = webdriver.ChromeOptions()  # options were not shown; configure as needed

@app.route('/get_site/<link>', methods=['GET'])
def get_site(link):
    # storing the live driver in the session is the step that fails
    session['driver'] = webdriver.Chrome(options=options)
    session['driver'].get(link)
    return 'link opened!'  # confirmation message

@app.route('/scrape_current_site', methods=['GET'])
def scrape_current_site():
    return session['driver'].title  # collecting arbitrary data from page

app.run()
However, this doesn't work:
AttributeError: Can't pickle local object '_createenviron.<locals>.encode'
Conceptually, a Flask session feels like exactly what I'm looking for (a new, unique object for each session), but I can't figure out how to make it work.
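For context, the traceback shows the session backend trying to pickle whatever is stored in the session, and a live WebDriver owns resources (sockets, a subprocess, locally defined functions) that cannot be pickled. The following is a minimal standard-library sketch of that failure mode; `FakeDriver` and `is_picklable` are hypothetical names for illustration and are not part of Flask or Selenium.

```python
import pickle
import threading


class FakeDriver:
    """Hypothetical stand-in for a WebDriver: it owns a live resource."""

    def __init__(self):
        # A lock, like a socket or a subprocess handle, cannot be pickled.
        self._lock = threading.Lock()


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


print(is_picklable("a-plain-session-id"))  # a plain string serializes fine
print(is_picklable(FakeDriver()))          # an object holding a lock does not
```

This suggests keeping only small, serializable values (such as an id string) in the session and holding the driver itself somewhere in process memory.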
Reply from a uj5u.com user:
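Store the browser instances in a global dictionary,
and use the session to store a user_id, which is the key into that dict.
Also keep in mind that if we launch a browser for each user,
we should close it again once the user has stopped sending requests for a while.
I adapted another answer to build a background timer routine that clears out unused browsers.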
import atexit
import threading
import time
import uuid

from flask import Flask, session
from selenium import webdriver

POOL_TIME = 5  # seconds between cleanup passes
MAX_OPEN_BROWSER_TIME = 60  # seconds a browser may stay idle before it is closed

# stores browser instance as value and user_id as key between requests
userBrowsers = {}
# user_id as key and unix time of last browser usage as value
userLastuse = {}
dataLock = threading.Lock()
timerThread = None

options = webdriver.ChromeOptions()  # options were not shown; configure as needed

def create_app():
    app = Flask(__name__)
    # the session cookie is signed with this key, so set it on the app that is returned
    app.secret_key = 'any random string'

    def interrupt():
        # cancel the pending timer so that no further cleanup pass is scheduled
        if timerThread is not None:
            timerThread.cancel()

    def deleteUnusedBrowsers():
        with dataLock:
            # iterate over a copy because entries are removed inside the loop
            for userId, lastuse in list(userLastuse.items()):
                if time.time() - lastuse > MAX_OPEN_BROWSER_TIME:
                    browser = userBrowsers.pop(userId, None)
                    del userLastuse[userId]
                    if browser is not None:
                        browser.quit()  # actually close the Chrome process
        # runs this function again after a delay
        startTimer()

    def startTimer():
        global timerThread
        timerThread = threading.Timer(POOL_TIME, deleteUnusedBrowsers, ())
        timerThread.daemon = True  # do not keep the process alive just for the timer
        timerThread.start()

    startTimer()
    # When you kill Flask (SIGTERM), clear the trigger for the next thread
    atexit.register(interrupt)
    return app

app = create_app()

@app.route('/open_site/<url>', methods=['GET'])
def open_site(url):
    browser = get_browser()
    browser.get(url)
    return 'site opened!'

def get_browser():
    # check if user has an id in session or assign an id to user
    user_id = session.get("session-id")
    if user_id is None:
        user_id = str(uuid.uuid4())  # a plain string is safe to store in the cookie
        session["session-id"] = user_id
    with dataLock:
        # check if user has a browser instance or create one for user
        browser = userBrowsers.get(user_id)
        if browser is None:
            browser = webdriver.Chrome(options=options)
            userBrowsers[user_id] = browser
        # updates last use time of browser
        userLastuse[user_id] = time.time()
    return browser
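The same bookkeeping can also be gathered into one small class, which makes the idle-eviction rule easy to exercise without launching Chrome. This is only a sketch under assumptions, not the answer's code: `BrowserRegistry`, `factory`, and `FakeBrowser` are hypothetical names, and the only driver method relied on is `quit()`.

```python
import threading
import time


class BrowserRegistry:
    """Maps a session id to a browser and closes browsers that sit idle."""

    def __init__(self, factory, max_idle_seconds=60):
        self._factory = factory            # callable that creates a browser
        self._max_idle = max_idle_seconds
        self._browsers = {}                # session id -> browser
        self._last_use = {}                # session id -> time of last use
        self._lock = threading.Lock()

    def get(self, session_id, now=None):
        """Return the browser for session_id, creating it on first use."""
        now = time.time() if now is None else now
        with self._lock:
            browser = self._browsers.get(session_id)
            if browser is None:
                browser = self._factory()
                self._browsers[session_id] = browser
            self._last_use[session_id] = now
            return browser

    def evict_idle(self, now=None):
        """Quit and forget every browser idle for longer than the limit."""
        now = time.time() if now is None else now
        with self._lock:
            stale = [sid for sid, used in self._last_use.items()
                     if now - used > self._max_idle]
            closed = [self._browsers.pop(sid) for sid in stale]
            for sid in stale:
                del self._last_use[sid]
        for browser in closed:
            browser.quit()  # close outside the lock; quitting can be slow
        return stale


class FakeBrowser:
    """Hypothetical stand-in for webdriver.Chrome, used only for the demo."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


registry = BrowserRegistry(FakeBrowser, max_idle_seconds=60)
first = registry.get("user-a", now=0)
registry.get("user-b", now=50)
print(registry.evict_idle(now=70))  # only user-a has been idle for more than 60 s
```

In the Flask app, `factory` would be something like `lambda: webdriver.Chrome(options=options)`, and the background timer would call `evict_idle()` every `POOL_TIME` seconds.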
Reply from a uj5u.com user:
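You can try the g object, although note that g is tied to the application context and is reset for every request, so it only helps while the driver is used within a single request: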
from flask import g
and attach the driver to g, e.g.:
g._driver = webdriver.Chrome(options=options)
g._driver.get(link)
