How to iterate over and match two date columns and extract the associated codes into a new list

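I have a Pandas df with a codes column and a dates column (dtype: object) with 289k entries. Each date has several codes, so e.g. 10 rows have the same date with different codes in the next column, then 20 rows have a new date with new codes, and so on. I also have an ndarray containing 103 dates (type: str). I want to match all the dates in the ndarray against the df, and every time I find a match I want to extract all the associated codes for that particular date into a new list. I have tried a lot of different things, without much success.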

filtered_codes = []
for j in raw_data.Dates:
    for q in reb_dates:
        if j == q:
            filtered_codes.append(raw_data.codes)

filtered_codes = []
for j in reb_dates:
    for q in raw_data.Dates:
        if raw_data['Dates'][q] == reb_dates[j]:
            filtered_codes.append(raw_data.codes[q])

Running the first one gives me a list of Series. All the entries in the list are identical, and each Series contains as many entries as my raw_data.

If I run the second example, I get this error:

Traceback (most recent call last):
  File "C:\Users\xxx\venv\lib\site-packages\pandas\core\indexes\base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index_class_helper.pxi", line 105, in pandas._libs.index.Int64Engine._check_type
  File "pandas\_libs\index_class_helper.pxi", line 105, in pandas._libs.index.Int64Engine._check_type
KeyError: '2013-02-20'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "C:\Users\xxx.py", line 73, in <module>
    if raw_data['Dates'][q] == reb_dates[j]:
  File "C:\Users\xxx\venv\lib\site-packages\pandas\core\series.py", line 942, in __getitem__
    return self._get_value(key)
  File "C:\Usersxxx\venv\lib\site-packages\pandas\core\series.py", line 1051, in _get_value
    loc = self.index.get_loc(label)
  File "C:\Users\xxx\venv\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: '2013-02-20'

Process finished with exit code 1

Is it the loop or the data types that is causing the problem? I am not that experienced with Python. Any help is appreciated.

Answer:

The rule is to avoid any Python-level loop over a dataframe whenever possible.

But let's first look at your current code:

filtered_codes = []
for j in raw_data.Dates:
    for q in reb_dates:
        if j == q:
            filtered_codes.append(raw_data.codes) # Oops !!

You append the full column raw_data.codes to the list, and you do so every time a date from the Dates column also occurs in reb_dates. Not what you want...

filtered_codes = []
for j in reb_dates:
    for q in raw_data.Dates:
        if raw_data['Dates'][q] == reb_dates[j]:         # Oops
            filtered_codes.append(raw_data.codes[q])     # Oops again...

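The first Oops line should be if q == j: because q and j are the actual date values, not their indices in the containers. Using a date string such as '2013-02-20' as a label on the Series is what raises the KeyError in your traceback. And since q is the string value of a date rather than an index, raw_data.codes[q] on the next line is an error as well.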
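If you do want to keep an explicit loop, a minimal corrected sketch could look like the following. The sample frame and the two dates in reb_dates are made up for illustration, and the column names Dates and codes are assumed from the question. It walks the dates and codes in lockstep, so no index lookup is needed at all:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real 289k-row frame.
raw_data = pd.DataFrame({
    "Dates": ["2013-02-20", "2013-02-20", "2013-02-21", "2013-02-22"],
    "codes": ["A1", "B2", "C3", "D4"],
})
reb_dates = np.array(["2013-02-20", "2013-02-22"])

# A set gives a fast membership test, replacing the inner loop over reb_dates.
wanted = set(reb_dates.tolist())

filtered_codes = []
# zip pairs each date with the code on the same row.
for date, code in zip(raw_data["Dates"], raw_data["codes"]):
    if date in wanted:
        filtered_codes.append(code)

print(filtered_codes)  # ['A1', 'B2', 'D4']
```

This still loops at the Python level, so it is mostly useful for understanding what went wrong in the original attempts.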

You can easily build a Series where the dates are the index (of str type, hence object dtype) and the values are the set of codes for that date:

codes_per_date = raw_data[raw_data['Dates'].isin(reb_dates)].groupby(
    'Dates')['codes'].agg(set)
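As a rough illustration of that approach, here is a small self-contained sketch. The sample data is invented, and the code column is assumed to be named codes, as in the question's raw_data.codes. It builds the per-date Series and also shows one way to get a single flat list, if that is what you need:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real frame has 289k rows and reb_dates has 103 dates.
raw_data = pd.DataFrame({
    "Dates": ["2013-02-20", "2013-02-20", "2013-02-21", "2013-02-22"],
    "codes": ["A1", "B2", "C3", "D4"],
})
reb_dates = np.array(["2013-02-20", "2013-02-22"])

# Boolean mask: True for rows whose date appears in reb_dates.
mask = raw_data["Dates"].isin(reb_dates)

# One set of codes per matching date, indexed by the date string.
codes_per_date = raw_data[mask].groupby("Dates")["codes"].agg(set)

# Alternatively, all matching codes in one flat list, in row order.
filtered_codes = raw_data.loc[mask, "codes"].tolist()

print(codes_per_date)
print(filtered_codes)  # ['A1', 'B2', 'D4']
```

The codes for a single date can then be read with codes_per_date['2013-02-20']. Use agg(list) instead of agg(set) if duplicates or the original order matter.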


Tags: Python, pandas, numpy
