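I have a Series of datetime values with a time zone.
I use np.select to:
- return the next day if the hour is after 12
- return the same day if the hour is before 11
- otherwise return np.nan
With time-zone-aware datetime values this works. However, if I use np.select after removing the time zone, it gives me the following error: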
TypeError: Choicelists and default value do not have a common dtype: The DType <class 'numpy.dtype[datetime64]'> could not be promoted by <class 'numpy.dtype[float64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtype[datetime64]'>, <class 'numpy.dtype[float64]'>)
Here is my code:
import pandas as pd
import numpy as np
from datetime import timedelta
import datetime
datetime_series = pd.Series(['2022-09-24 22:00:00+02:00','2022-09-04 11:30:00+02:00', '2022-11-11 02:20:30+02:00', '2022-11-12 03:20:30+02:00'])
#make datetime
datetime_series = pd.to_datetime(datetime_series, errors='coerce')
#remove timezone
datetime_series_no_timezone = datetime_series.dt.tz_localize(None)
print ('datetime_series dtype: ', datetime_series.dtype)
print ('datetime_series_no_timezone dtype: ', datetime_series_no_timezone.dtype)
# with timezone it works
conditions = [
datetime_series.dt.hour > 12,
datetime_series.dt.hour < 11]
choiches = [
(datetime_series + datetime.timedelta(days=1)),
datetime_series ]
print (np.select(conditions, choiches, default=np.nan))
# without timezone it doesn't
conditions = [
datetime_series_no_timezone.dt.hour > 12,
datetime_series_no_timezone.dt.hour < 11]
choiches = [
(datetime_series_no_timezone + datetime.timedelta(days=1)),
datetime_series_no_timezone ]
print (np.select(conditions, choiches, default=np.nan))
Out:
datetime_series dtype: datetime64[ns, pytz.FixedOffset(120)]
datetime_series_no_timezone dtype: datetime64[ns]
[Timestamp('2022-09-25 22:00:00+0200', tz='pytz.FixedOffset(120)') nan
Timestamp('2022-11-11 02:20:30+0200', tz='pytz.FixedOffset(120)')
Timestamp('2022-11-12 03:20:30+0200', tz='pytz.FixedOffset(120)')]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-63a5d209c9ba> in <module>
30 (datetime_series_no_timezone + datetime.timedelta(days=1)),
31 datetime_series_no_timezone ]
---> 32 print (np.select(conditions, choiches, default=np.nan))
<__array_function__ internals> in select(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/numpy/lib/function_base.py in select(condlist, choicelist, default)
687 except TypeError as e:
688 msg = f'Choicelists and default value do not have a common dtype: {e}'
--> 689 raise TypeError(msg) from None
690
691 # Convert conditions to arrays and broadcast conditions and choices
Answer:
Use a suitable default value to avoid the error; here you can use pandas' not-a-time value, pd.NaT:
import pandas as pd
import numpy as np
datetime_series = pd.Series(
[
"2022-09-24 22:00:00+02:00",
"2022-09-04 11:30:00+02:00",
"2022-11-11 02:20:30+02:00",
"2022-11-12 03:20:30+02:00",
]
)
datetime_series = pd.to_datetime(datetime_series, errors="coerce")
datetime_series_no_timezone = datetime_series.dt.tz_localize(None)
conditions = [
datetime_series_no_timezone.dt.hour > 12,
datetime_series_no_timezone.dt.hour < 11,
]
choiches = [
(datetime_series_no_timezone + pd.Timedelta(days=1)),
datetime_series_no_timezone,
]
s = np.select(conditions, choiches, default=pd.NaT)
print(s)
# [1664143200000000000 NaT 1668133230000000000 1668223230000000000]
Note that in the process the datetime values are converted to integer nanoseconds since the Unix epoch.
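If the integer output is not what you want, one possible variation (not part of the original answer) is to pass NumPy's own not-a-time value as the default, so that the choices and the default share a datetime64 dtype. The sketch below assumes naive timestamps like the ones in the question; the names `naive`, `choices`, `result` and `as_series` are illustrative only.

```python
import numpy as np
import pandas as pd

# Naive (time-zone-free) timestamps, same wall-clock values as in the question.
naive = pd.Series(pd.to_datetime([
    "2022-09-24 22:00:00",
    "2022-09-04 11:30:00",
    "2022-11-11 02:20:30",
    "2022-11-12 03:20:30",
]))

conditions = [naive.dt.hour > 12, naive.dt.hour < 11]
choices = [naive + pd.Timedelta(days=1), naive]

# np.datetime64("NaT") is itself a datetime64 value, so NumPy can promote it
# together with the datetime64 choices instead of falling back to object.
result = np.select(conditions, choices, default=np.datetime64("NaT"))

# Wrap the array back into a Series to keep the original index.
as_series = pd.Series(result, index=naive.index)
print(result.dtype)
print(as_series)
```

With this default the result stays a datetime64 array, and the row that matches neither condition (11:30) comes back as NaT rather than nan.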
As for why this happens, look at the numpy representation of each series:
datetime_series.to_numpy()
Out[26]:
array([Timestamp('2022-09-24 22:00:00+0200', tz='pytz.FixedOffset(120)'),
Timestamp('2022-09-04 11:30:00+0200', tz='pytz.FixedOffset(120)'),
Timestamp('2022-11-11 02:20:30+0200', tz='pytz.FixedOffset(120)'),
Timestamp('2022-11-12 03:20:30+0200', tz='pytz.FixedOffset(120)')],
dtype=object)
datetime_series_no_timezone.to_numpy()
Out[27]:
array(['2022-09-24T22:00:00.000000000', '2022-09-04T11:30:00.000000000',
'2022-11-11T02:20:30.000000000', '2022-11-12T03:20:30.000000000'],
dtype='datetime64[ns]')
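You will see that in the first case, because of the pytz fixed offset, numpy cannot find a suitable dtype and falls back to "object". In the second case, numpy determines the dtype to be datetime. I assume that when these arrays are passed to np.select, no attempt is made in the first case to force a common dtype, since it is object (anything goes!). In the second case such an attempt is made but fails: np.nan has dtype float, while the datetime values are datetime (or int, if converted to Unix-time nanoseconds).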
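The promotion behaviour described here can be checked directly with np.result_type, which reports the dtype NumPy would pick for a combination of inputs. This is only a small sketch of the idea; `common_dtype_or_none` is a hypothetical helper written for this illustration, not something np.select exposes.

```python
import numpy as np


def common_dtype_or_none(*dtypes):
    """Return the dtype NumPy promotes the inputs to, or None if it cannot."""
    try:
        return np.result_type(*dtypes)
    except TypeError:
        # Raised when no common dtype exists, e.g. datetime64 with float64.
        return None


naive_dt = np.dtype("datetime64[ns]")   # tz-naive series as a NumPy array
object_dt = np.dtype("object")          # tz-aware series as a NumPy array
float_dt = np.dtype("float64")          # dtype of np.nan

datetime_with_float = common_dtype_or_none(naive_dt, float_dt)
object_with_float = common_dtype_or_none(object_dt, float_dt)

print(datetime_with_float)  # None: no common dtype, hence the TypeError
print(object_with_float)    # object: anything can be stored in an object array
```

This matches the two cases in the question: the object array combines with a float default without complaint, while the datetime64 array does not.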
