我目前正在使用以下格式從電子表格中抓取資料:

每個財政年度按以下方式分隔:

我想做的是在左側創建一個名為“Financial_Year”的附加列,該列從相關單元格中獲取日期。
所以我希望df看起來如下:

到目前為止,我的代碼如下:
library(tidyverse)
library(dplyr)
library(XLConnect)
tmp = tempfile(fileext = ".xls")
download.file(url = "https://dmo.gov.uk/umbraco/surface/DataExport/GetDataExport?reportCode=D4L&exportFormatValue=xls¶meters=&Financial Year=(All)", destfile = tmp, mode="wb")
holds <- readWorksheetFromFile(file = tmp, sheet=1) %>%
filter(across(everything(), ~!is.na(.)))
有人對如何實作結果有任何建議嗎?(檔案很大)。
編輯到目前為止,我一直在做的是單獨下載每年的 s/s,將它們編織在一起,重命名列并洗掉各種列。
uj5u.com熱心網友回復:
首先,我們加載第一列,以便我們可以對表進行分組并迭代加載它們。我們假設第一列 with"Financial"是一個清晰的標簽;從那開始,我們跳過每行的前幾行(僅限于一年內的每月資料),然后加載到list:
# library(readxl)
# library(cellranger) # imported by readxl
col1 <- readxl::read_xls(tmp, range = cellranger::cell_cols(1))
tablenames <- sub(".* - ", "", grep("Financial", col1[[1]], value = TRUE))
tablenames
# [1] "2005-06" "2006-07" "2007-08" "2008-09" "2009-10" "2010-11" "2011-12" "2012-13" "2013-14" "2014-15" "2015-16" "2016-17"
# [13] "2017-18" "2018-19" "2019-20" "2020-21" "2021-22"
tablerows <- split(seq_along(col1[[1]]), cumsum(grepl("Financial", col1[[1]])))[-1]
alldat <- lapply(setNames(tablerows, nm = tablenames), function(R) {
readxl::read_xls("~/Downloads/quux.xls", range = cellranger::cell_rows(R[-(1:5)]), .name_repair = "universal")
})
# New names:
### ...snip...
alldat[["2005-06"]]
# # A tibble: 56 x 14
# ...1 ...2 Apr..2005 May..2005 Jun..2005 Jul..2005 Aug..2005 Sep..2005 Oct..2005 Nov..2005 Dec..2005 Jan..2006 Feb..2006 Mar..2006
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 10?% Exchequer Stock 2005 GB0003270005 0.00395 0.00395 0.00842 0.00842 0.00842 0 0 0 0 0 0 0
# 2 8?% Treasury Stock 2005 GB0008880808 309. 309. 301. 301. 301. 301. 301. 545. 0 0 0 0
# 3 7?% Treasury Stock 2006 GB0008916024 549. 549. 549. 549. 549. 549. 549. 549. 549. 548. 548. 548.
# 4 9?% Conversion Stock 2006 GB0009021956 0 0 0.002 0.00505 0.00505 0.00505 0.00505 0.00505 0.00505 0.00505 0.00505 0.00505
# 5 7?% Treasury Stock 2006 GB0009998302 560. 602. 602. 602. 601. 601. 862. 862. 862. 862. 862. 862.
# 6 4?% Treasury Stock 2007 GB0034040740 344. 344. 344. 344. 342. 342. 596. 596. 596. 596. 596. 596.
# 7 8?% Treasury Loan 2007 GB0009126557 498. 498. 498. 498. 498. 498. 601. 601. 601. 601. 601. 601.
# 8 7?% Treasury Stock 2007 GB0009997114 550. 550. 550. 550. 549. 549. 795. 795. 795. 795. 795. 795.
# 9 5% Treasury Stock 2008 GB0031734154 557. 558. 558. 558. 558. 558. 873. 873. 873. 872. 872. 872.
# 10 9% Treasury Loan 2008 GB0009128371 1.59 2.12 11.8 12.0 12.4 12.8 13.0 13.0 13.1 13.9 13.9 0.0442
# # ... with 46 more rows
alldat[["2006-07"]]
# # A tibble: 59 x 14
# ...1 ...2 Apr..2006 May..2006 Jun..2006 Jul..2006 Aug..2006 Sep..2006 Oct..2006 Nov..2006 Dec..2006 Jan..2007 Feb..2007 Mar..2007
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 7?% Treasury Stock 2006 GB0008916024 549. 549. 629. 629. 740. 0 0 0 0 0 0 0
# 2 9?% Conversion Stock 2006 GB0009021956 0.00505 0.00505 0.00505 0.00505 0.00505 0.00705 0.00705 0 0 0 0 0
# 3 7?% Treasury Stock 2006 GB0009998302 862. 862. 1175. 1175. 1175. 1388. 1664. 5159. 0 0 0 0
# 4 4?% Treasury Stock 2007 GB0034040740 596. 596. 596. 596. 596. 596. 588. 668. 668. 668. 1597. 0
# 5 8?% Treasury Loan 2007 GB0009126557 601. 601. 601. 601. 601. 601. 602. 602. 602. 602. 600. 600.
# 6 7?% Treasury Stock 2007 GB0009997114 795. 795. 795. 795. 795. 795. 795. 795. 795. 795. 795. 795.
# 7 5% Treasury Stock 2008 GB0031734154 873. 873. 873. 873. 873. 872. 864. 864. 864. 864. 865. 865.
# 8 9% Treasury Loan 2008 GB0009128371 0.485 0.596 1.63 1.65 4.72 82.0 82.2 82.3 82.3 103. 104. 0.261
# 9 4% Treasury Stock 2009 GB0032785924 754. 754. 754. 754. 754. 754. 746. 745. 745. 745. 746. 746.
# 10 8% Treasury Stock 2009 GB0009125369 2.12 4.24 4.31 4.48 4.64 4.75 4.83 5.88 5.98 6.76 6.88 0.189
# # ... with 49 more rows
注意:tablenames如果您更喜歡將其用作寬格式的幀串列,alldat[["2005-06"]]我會使用它,因此參考是有意義的。但是,您可以通過先旋轉將其組合到一個框架中(因為列名都不同)。試試這個作為起點:
library(dplyr)
library(tidyr) # pivot_longer
combined <- lapply(
alldat,
function(z) separate(pivot_longer(z, -(1:2)), name, c("month", "year"))
) %>%
bind_rows()
combined
# # A tibble: 14,644 x 5
# ...1 ...2 month year value
# <chr> <chr> <chr> <chr> <dbl>
# 1 10?% Exchequer Stock 2005 GB0003270005 Apr 2005 0.00395
# 2 10?% Exchequer Stock 2005 GB0003270005 May 2005 0.00395
# 3 10?% Exchequer Stock 2005 GB0003270005 Jun 2005 0.00842
# 4 10?% Exchequer Stock 2005 GB0003270005 Jul 2005 0.00842
# 5 10?% Exchequer Stock 2005 GB0003270005 Aug 2005 0.00842
# 6 10?% Exchequer Stock 2005 GB0003270005 Sep 2005 0
# 7 10?% Exchequer Stock 2005 GB0003270005 Oct 2005 0
# 8 10?% Exchequer Stock 2005 GB0003270005 Nov 2005 0
# 9 10?% Exchequer Stock 2005 GB0003270005 Dec 2005 0
# 10 10?% Exchequer Stock 2005 GB0003270005 Jan 2006 0
# # ... with 14,634 more rows
combined[10000:10010,]
# # A tibble: 11 x 5
# ...1 ...2 month year value
# <chr> <chr> <chr> <chr> <dbl>
# 1 3?% Treasury Gilt 2020 GB00B582JV65 Jul 2017 1423.
# 2 3?% Treasury Gilt 2020 GB00B582JV65 Aug 2017 1423.
# 3 3?% Treasury Gilt 2020 GB00B582JV65 Sep 2017 1423.
# 4 3?% Treasury Gilt 2020 GB00B582JV65 Oct 2017 1423.
# 5 3?% Treasury Gilt 2020 GB00B582JV65 Nov 2017 1423.
# 6 3?% Treasury Gilt 2020 GB00B582JV65 Dec 2017 1423.
# 7 3?% Treasury Gilt 2020 GB00B582JV65 Jan 2018 1423.
# 8 3?% Treasury Gilt 2020 GB00B582JV65 Feb 2018 1423.
# 9 3?% Treasury Gilt 2020 GB00B582JV65 Mar 2018 1423.
# 10 1?% Treasury Gilt 2021 GB00BYY5F581 Apr 2017 349.
# 11 1?% Treasury Gilt 2021 GB00BYY5F581 May 2017 349.
You'll likely want to convert month and year into either a Date (mutate(date = as.Date(paste(year, month, "01"), format = "%Y %b %d"))) or at least into something sortable.
轉載請註明出處,本文鏈接:https://www.uj5u.com/shujuku/454418.html
上一篇:mapply用于單個串列
下一篇:在r中創建具有第三個屬性的時間線
