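I know there are other solutions to similar questions on Stack Overflow, but they don't work for my particular case.
I have some strings; here are a few examples.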
string_with_dates = "random non-date text, 22 May 1945 and 11 June 2004"
string2 = "random non-date text, 01/01/1999 & 11 June 2004"
string3 = "random non-date text, 01/01/1990, June 23 2010"
string4 = "01/2/2010 and 25th of July 2020"
string5 = "random non-date text, 01/02/1990"
string6 = "random non-date text, 01/02/2010 June 10 2010"
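I need a parser that can determine how many date-like objects a string contains and then parse them into a list of actual dates. I couldn't find any solution out there. This is the desired output: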
['05/22/1945','06/11/2004']
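Or as actual datetime objects. Any ideas?
I have already tried the solutions listed here, but they don't work: How to parse multiple dates from a block of text in Python (or another language)
Here is what happens when I try the solutions suggested in that link: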
import itertools
from dateutil import parser
jumpwords = set(parser.parserinfo.JUMP)
keywords = set(kw.lower() for kw in itertools.chain(
    parser.parserinfo.UTCZONE,
    parser.parserinfo.PERTAIN,
    (x for s in parser.parserinfo.WEEKDAYS for x in s),
    (x for s in parser.parserinfo.MONTHS for x in s),
    (x for s in parser.parserinfo.HMS for x in s),
    (x for s in parser.parserinfo.AMPM for x in s),
))

def parse_multiple(s):
    def is_valid_kw(s):
        try:  # is it a number?
            float(s)
            return True
        except ValueError:
            return s.lower() in keywords

    def _split(s):
        kw_found = False
        tokens = parser._timelex.split(s)
        for i in range(len(tokens)):  # range, not xrange, on Python 3
            if tokens[i] in jumpwords:
                continue
            if not kw_found and is_valid_kw(tokens[i]):
                kw_found = True
                start = i
            elif kw_found and not is_valid_kw(tokens[i]):
                kw_found = False
                yield "".join(tokens[start:i])
        # handle date at end of input str
        if kw_found:
            yield "".join(tokens[start:])

    return [parser.parse(x) for x in _split(s)]

parse_multiple(string_with_dates)
Output:
ParserError: Unknown string format: 22 May 1945 and 11 June 2004
Another approach:
from dateutil.parser import _timelex, parser
a = "I like peas on 2011-04-23, and I also like them on easter and my birthday, the 29th of July, 1928"
p = parser()
info = p.info

def timetoken(token):
    try:
        float(token)
        return True
    except ValueError:
        pass
    return any(f(token) for f in (info.jump, info.weekday, info.month, info.hms, info.ampm, info.pertain, info.utczone, info.tzoffset))

def timesplit(input_string):
    batch = []
    for token in _timelex(input_string):
        if timetoken(token):
            if info.jump(token):
                continue
            batch.append(token)
        else:
            if batch:
                yield " ".join(batch)
                batch = []
    if batch:
        yield " ".join(batch)

for item in timesplit(string_with_dates):
    print("Found:", item)
    print("Parsed:", p.parse(item))
Output:
ParserError: Unknown string format: 22 May 1945 11 June 2004
Any ideas?
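Reply from a uj5u.com user:
Well, apologies to anyone who spent time on this, but I was able to answer my own question. I'm leaving it here in case someone else has the same problem.
This package works perfectly: https://pypi.org/project/datefinder/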
import datefinder

def DatesToList(x):
    # find_dates yields the dates it finds; collect them into a list
    dates = datefinder.find_dates(x)
    lists = []
    for date in dates:
        lists.append(date)
    return lists

dates = DatesToList(string_with_dates)
Output:
[datetime.datetime(1945, 5, 22, 0, 0), datetime.datetime(2004, 6, 11, 0, 0)]
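The question also asked for strings such as '05/22/1945'. Here is a minimal sketch of that last step, assuming the list holds plain datetime objects like the ones printed above. The helper name format_dates is made up for illustration and is not part of datefinder.

```python
from datetime import datetime

def format_dates(dates, fmt="%m/%d/%Y"):
    """Render each datetime as a string in the given strftime format."""
    return [d.strftime(fmt) for d in dates]

# Same values as the datefinder output shown above, written out by hand
dates = [datetime(1945, 5, 22, 0, 0), datetime(2004, 6, 11, 0, 0)]
print(format_dates(dates))  # ['05/22/1945', '06/11/2004']
```

Keeping the datetime objects and formatting only when displaying them means the same list can be printed in another format by passing a different fmt.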
