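I am trying to scrape names and import them into an Excel sheet for later use. The problem is that I need them in 3 separate cells: first, last, and initial. The script currently looks for the keyword "est of" and prints the whole line, which contains the full name followed by "EST OF". I need it to:
- Remove "EST OF" from the end.
- Split the full name into 3 parts so they can be exported to a worksheet.
Here is the code: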
#!python
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import NoSuchElementException
from random import randint
import pickle
import datetime
import os
import time
import sys
import openpyxl
from openpyxl import Workbook
import re
url = 'https://www.miamidade.gov/global/home.page'
current_time = datetime.datetime.now()
current_time.strftime("%m/%d/%Y")
options = webdriver.ChromeOptions()
options.headless = True
chromedriver = "chromedriver.exe"
number = "2080"
driver = webdriver.Chrome(chromedriver) #chromedriver
driver.get(url)
pickle.dump(driver.get_cookies() , open("cookies.pkl","wb"))
time.sleep(3)
nav1 = driver.find_element_by_xpath('/html/body/div[2]/div/div[1]/div/header/div[2]/nav/div/div[1]/div/div[1]/a').click()
time.sleep(1)
nav2 = driver.find_element_by_xpath('/html/body/div[2]/div/div[1]/div/header/div[2]/div[2]/div/div/div/ul/li[1]/button').click()
propsrch1 = driver.find_element_by_xpath('/html/body/div[2]/div/div[1]/div/header/div[2]/div[2]/div/div/div/ul/li[1]/ul/li[2]/ul/li[5]/a').click()
time.sleep(2)
propsrch2 = driver.find_element_by_xpath('/html/body/div[2]/div/main/div[2]/div/div[2]/div/div[1]/div[1]/ul/li[1]/span/a').click()
time.sleep(5)
subdivision = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/ul/li[3]/a').click()
searchbar = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/div[2]/div[2]/div/div[3]/div/input')
time.sleep(2)
searchbar.send_keys("RICHMOND HGTS")
search = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/div[2]/div[2]/div/div[3]/div/span/button/span').click()
time.sleep(10)
table = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/div[2]/div[4]/a').click()
main_window_handle = None
while not main_window_handle:
    main_window_handle = driver.current_window_handle
#driver.find_element_by_xpath(u'//a[text()="click here"]').click()
signin_window_handle = None
while not signin_window_handle:
    for handle in driver.window_handles:
        if handle != main_window_handle:
            signin_window_handle = handle
            break
driver.switch_to.window(signin_window_handle)
time.sleep(20)
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
keyword = 'est of'
#keywords = soup.find(keyword)
counts = soup.find_all(text=re.compile("EST OF"))
for count in counts:
    print(count)
Right now it prints to cmd so I can see that it works. The output looks like this:
GRACE K ROLLE EST OF
ETHEL H FIFE EST OF
BARBARA J BROUSSARD EST OF
CLEMENTINA D RAHMING EST OF
CHARLES B CAMBRIDGE JR EST OF
EMILY STATEN EST OF
HATTIE S KING EST OF
What is the best way to split the names?
A uj5u.com user replied:
You can use the split method to split the line on spaces:
for count in counts:
    count = count.split(' ')
    First_name = count[0]
    mid_name = count[1]
    Last_name = count[2]
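To see where fixed indexes hold up, here is a small check against a few lines from the sample output in the question. The helper name `split_fixed` is made up for this illustration; it only mirrors the indexing shown above.

```python
# Lines copied from the sample output in the question.
SAMPLE = [
    "GRACE K ROLLE EST OF",
    "EMILY STATEN EST OF",
    "CHARLES B CAMBRIDGE JR EST OF",
]


def split_fixed(line):
    """Hypothetical helper: take the first three space-separated words."""
    parts = line.split(' ')
    return parts[0], parts[1], parts[2]


for line in SAMPLE:
    print(split_fixed(line))
# ('GRACE', 'K', 'ROLLE')
# ('EMILY', 'STATEN', 'EST')
# ('CHARLES', 'B', 'CAMBRIDGE')
```

The second line has no middle initial, so "EST" ends up in the last-name slot. Fixed indexes are only safe when every row really has exactly three name words.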
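A uj5u.com user replied:
If you know it is always 3 words separated by spaces, you can use count.split(' ')[:3].
If you do not know the length of the name, remove the trailing marker first and then split. Note that count.rstrip('EST OF') is not reliable for this: rstrip treats its argument as a set of characters, so 'GRACE K ROLLE EST OF' becomes 'GRACE K ROLL'. On Python 3.9 or later, count.strip().removesuffix('EST OF').split() removes the marker as a whole phrase.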
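Putting both steps together, one possible sketch is below. It assumes the marker is always the literal text "EST OF" at the end of the line, that the first word is the first name and the last remaining word is the last name, and that a trailing suffix such as JR can be dropped. The names `parse_estate_name`, `save_names`, `SUFFIXES` and the file name `names.xlsx` are made up for this example and are not part of the original script.

```python
import re

# Assumed list of generational suffixes to discard; extend as needed.
SUFFIXES = {"JR", "SR", "II", "III", "IV"}


def parse_estate_name(line):
    """Return (first, initial, last) for a line like 'GRACE K ROLLE EST OF'."""
    # Remove the trailing marker as a whole phrase, not character by character.
    name = re.sub(r"\s*EST OF\s*$", "", line.strip())
    parts = name.split()
    # Drop a trailing suffix such as JR so it is not taken for the last name.
    if len(parts) > 2 and parts[-1] in SUFFIXES:
        parts = parts[:-1]
    if not parts:
        return ("", "", "")
    first = parts[0]
    last = parts[-1] if len(parts) > 1 else ""
    # Everything between first and last is treated as the middle initial(s).
    initial = " ".join(parts[1:-1])
    return (first, initial, last)


def save_names(lines, path="names.xlsx"):
    """Write one row per name to a new workbook (requires openpyxl)."""
    from openpyxl import Workbook  # third-party; imported only when saving

    wb = Workbook()
    ws = wb.active
    ws.append(["First", "Initial", "Last"])
    for line in lines:
        ws.append(list(parse_estate_name(line)))
    wb.save(path)


print(parse_estate_name("GRACE K ROLLE EST OF"))
print(parse_estate_name("EMILY STATEN EST OF"))
print(parse_estate_name("CHARLES B CAMBRIDGE JR EST OF"))
# ('GRACE', 'K', 'ROLLE')
# ('EMILY', '', 'STATEN')
# ('CHARLES', 'B', 'CAMBRIDGE')
```

In the original script this would be called as save_names(counts) after the find_all step. Names with more than one middle word, or multi-word last names, would need their own rule.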
