在R中，如果有多個TRUE答案，則選擇第一個TRUE答案-有解無憂

我有一個資料框，它代表一條河流的兩年每日溫度時間序列。對于這條河，我想知道一年中的哪一天（doy）：

溫度持續大于或等于 10 度

持續是指在一年中的最高溫度之后（例如秋季或冬季）之前不再有低于 10 度的下降

溫度持續低于或等于 10 度

持續是指直到第二年沒有超過 10 的峰值

當我嘗試計算 2 時遇到錯誤，因為TRUE代碼有多個答案可供選擇。TRUE如果有多個答案，我想知道如何使代碼與第一個答案一致TRUE。

示例資料集

library(ggplot2)
library(lubridate)
library(dplyr)
library(dataRetrieval)

siteNumber <- "01417500"
parameterCd <- "00010" # water temperature
statCd <- "00003" # mean
startDate <- "2015-01-01"
endDate <- "2016-12-31"

dat <- readNWISdv(siteNumber, parameterCd, startDate, endDate, statCd=statCd)
dat <- dat[,c(2:4)]
colnames(dat)[3] <- "temperature"

# Visually inspect the time series
ggplot(data = dat, aes(x = Date, y = temperature))  
  geom_point()  
  theme_bw()

1 和 2 的代碼，其中 2 有問題，因為有多個TRUE陳述句可供選擇

dat %>%
  mutate(year = year(Date),
         doy = yday(Date)) %>%
  group_by(year) %>%
  mutate(gt_10 = temperature >= 10, # greater than or equal to 10 degrees
         lt_10 = temperature <= 10, # less than or equal to 10 degrees
         peak_doy = doy[which.max(temperature)], # what doy is max temperature
         below_peak = doy < peak_doy, # is the observed doy less than the peak temperature doy
         after_peak = doy > peak_doy, # is the observed doy greater than the peak temperature doy
         test_above = ave(gt_10, cumsum(!gt_10), FUN = cumsum), # counts number of days above 10 degree threshold
         test_below = ave(lt_10, cumsum(!lt_10), FUN = cumsum)) %>% # counts number of days below 10 degree threshold
  summarise(first_above_10_sustained = doy[below_peak == T & test_above == 14]-13, # answer to 1 
            first_below_10_sustained = doy[after_peak == T & test_below == 14]-13) # answer to 2

要回答 2，代碼正在查看溫度在年度峰值溫度之后（即after_peak == T）以及溫度連續 14 天低于 10 閾值的時間（即test_below == 14）。這test_below == 14是錯誤所在，因為這種情況發生多次。是的，您可以將連續天數的閾值更改為 > 14 的某個值，但這不是重點。TRUE如果有多個答案，我怎樣才能讓代碼接受第一個TRUE答案？

我在這里有一個類似的 SO 問題，但我的答案僅在沒有多個TRUE答案可供選擇時才有效。

uj5u.com熱心網友回復：

我會在這里使用一些技巧：

創建一個指定溫度是高于還是低于 10 度的列。
獲取rleid此列的，它將對高于或低于 10 度閾值的所有連續天進行分組。
找出每年的最高溫度，并將其存盤在一個列中。
根據rleid您的定義，包含最高溫度的日期將是該年溫度持續 > 10 度的日期
用最低溫度做同樣的事情，但是在計算那一年的最低溫度時，在計算分組的年份時減去 6 個月。這不會對最終結果產生影響，但可以計算冬季期間的最低溫度：

df <- dat %>%
  mutate(year = year(Date)) %>%
  group_by(year) %>%
  mutate(max_temp = max(temperature)) %>%
  ungroup() %>%
  mutate(above_ten = temperature >= 10,
         run = factor(data.table::rleid(above_ten))) %>%
  group_by(run) %>%
  mutate(sustained_hi = max(temperature) == max(max_temp)) %>%
  ungroup() %>%
  mutate(year = year(Date - months(6))) %>%
  group_by(year) %>%
  mutate(min_temp = min(temperature)) %>%
  group_by(run) %>%
  mutate(sustained_lo = min(temperature) == min(min_temp)) %>%
  mutate(group = ifelse(sustained_hi, 'High', 
                        ifelse(sustained_lo, 'Low', 
                               'Unsustained'))) %>%
  select(site_no, Date, temperature, group, run)

這導致：

df
#> # A tibble: 731 x 5
#> # Groups:   run [27]
#>    site_no  Date       temperature group run  
#>    <chr>    <date>           <dbl> <chr> <fct>
#>  1 01417500 2015-01-01         0.7 Low   1    
#>  2 01417500 2015-01-02         1.1 Low   1    
#>  3 01417500 2015-01-03         1   Low   1    
#>  4 01417500 2015-01-04         2.5 Low   1    
#>  5 01417500 2015-01-05         2   Low   1    
#>  6 01417500 2015-01-06         0.3 Low   1    
#>  7 01417500 2015-01-07         0.2 Low   1    
#>  8 01417500 2015-01-08         0.2 Low   1    
#>  9 01417500 2015-01-09         0.3 Low   1    
#> 10 01417500 2015-01-10         0.3 Low   1    
#> # ... with 721 more rows
#> # i Use `print(n = ...)` to see more rows

我們可以通過這樣的繪圖來查看結果：

ggplot(df, aes(x = Date, y = temperature, color = group))  
  geom_point()  
  scale_color_manual(limits = c('High', 'Unsustained', 'Low'),
                     values = c('orange', 'gray', 'steelblue'))  
  theme_bw()

在 R 中，如果有多個 TRUE 答案，則選擇第一個 TRUE 答案

我們可以通過執行以下操作獲得一個關于我們持續高溫/低溫的開始和結束日期的漂亮小資料框：

df %>% 
  filter(group != 'Unsustained') %>%
  group_by(run) %>%
  summarize(Date = c(first(Date), last(Date)),
            Event = paste('Sustained', first(group), c('Start', 'End'))) %>%
  ungroup() %>%
  select(-run)
#> # A tibble: 10 x 2
#>    Date       Event               
#>    <date>     <chr>               
#>  1 2015-01-01 Sustained Low Start 
#>  2 2015-04-28 Sustained Low End   
#>  3 2015-04-29 Sustained High Start
#>  4 2015-07-16 Sustained High End  
#>  5 2015-11-08 Sustained Low Start 
#>  6 2016-03-31 Sustained Low End   
#>  7 2016-05-18 Sustained High Start
#>  8 2016-10-09 Sustained High End  
#>  9 2016-10-23 Sustained Low Start 
#> 10 2016-12-31 Sustained Low End

轉載請註明出處，本文鏈接：https://www.uj5u.com/shujuku/521645.html

標籤：r

上一篇：獲取條件下列中每個向量的連續TRUE（或FALSE）值的數量

下一篇：更改R中可反應表中的搜索欄文本