I have the following sample data frame:
dat <- data.frame(date= c("Sep2020", "Oct2020", "Nov2020", "Dec2020"),
txt= c("1.1 What is the Constitution? 1.2 The original charter, which replaced the Articles of Confederation 1.3 hat all States would be equal. ",
"4.4 What is the Bill of Rights? 4.5 The 9th and 10th amendments are general ",
"5.1 in criminal prosecution to a speedy and public 5.2 War, three amendments were ratified (1865 5.3 13. The most recent amendment, the 27th, was",
"6.2 the case of the proposed equal rights amendment, the Congress exten 6.3 but the proposed Amendment was never ratifie 6.4 tification deadline. The 38th State, Michig"))
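I want to split the data frame so that each number.number label (1.1, 1.2, ...) starts a new row. The final data frame should look like this: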
dat2 <-data.frame(date= c("Sep2020", "Sep2020", "Sep2020", "Oct2020", "Oct2020", "Nov2020", "Nov2020", "Nov2020", "Dec2020", "Dec2020", "Dec2020"),
txt= c("1.1 What is the Constitution?","1.2 The original charter, which replaced the Articles of Confederation","1.3 hat all States would be equal. ",
"4.4 What is the Bill of Rights?", "4.5 The 9th and 10th amendments are general ",
"5.1 in criminal prosecution to a speedy and public", "5.2 War, three amendments were ratified (1865", "5.3 13. The most recent amendment, the 27th, was",
"6.2 the case of the proposed equal rights amendment, the Congress exten", "6.3 but the proposed Amendment was never ratifie", "6.4 tification deadline. The 38th State, Michig"))
Here's what I have so far:
library(dplyr)
library(tidyr)
library(stringr)
dat <- dat %>%
mutate(parsed = str_extract_all(txt, "(\\d{1}\\.\\d{1,2})")) %>%
unnest(parsed)
I can get the numbers, but not the text between them. I'm a beginner with regular expressions and can't figure out how to say "everything between 1.1 and 1.2".
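One way to express "everything from 1.1 up to 1.2" is a lazy match (.*?) that stops at a lookahead for the next label. Below is a minimal sketch in base R, using regmatches()/gregexpr() as the base counterpart of str_extract_all(). It assumes every label has the form digits.digits and is preceded by whitespace; the pattern is illustrative and is not taken from the answers.

```r
# Sample data from the question.
dat <- data.frame(
  date = c("Sep2020", "Oct2020", "Nov2020", "Dec2020"),
  txt = c("1.1 What is the Constitution? 1.2 The original charter, which replaced the Articles of Confederation 1.3 hat all States would be equal. ",
          "4.4 What is the Bill of Rights? 4.5 The 9th and 10th amendments are general ",
          "5.1 in criminal prosecution to a speedy and public 5.2 War, three amendments were ratified (1865 5.3 13. The most recent amendment, the 27th, was",
          "6.2 the case of the proposed equal rights amendment, the Congress exten 6.3 but the proposed Amendment was never ratifie 6.4 tification deadline. The 38th State, Michig"),
  stringsAsFactors = FALSE  # only matters on R < 4.0
)

# A label ("digits.digits"), then as few characters as possible, stopping
# just before whitespace + the next label, or before any trailing
# whitespace at the end of the string. perl = TRUE enables (?=...).
pattern <- "\\d+\\.\\d+.*?(?=\\s+\\d+\\.\\d+|\\s*$)"
pieces <- regmatches(dat$txt, gregexpr(pattern, dat$txt, perl = TRUE))

# One row per piece, repeating each date once per piece it produced.
dat2 <- data.frame(
  date = rep(dat$date, lengths(pieces)),
  txt = unlist(pieces),
  stringsAsFactors = FALSE
)
nrow(dat2)  # 11
```

Because the lookahead consumes nothing, the next label is left in place for the following match. Note that "13. The" is not treated as a label, since \\d+ must follow the dot.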
Thanks!
Answer:
We could use separate_rows from tidyr, splitting on the whitespace that precedes each number.number label:
library(tidyr)
library(dplyr)
# split on whitespace that is followed by a "digits.digits" label
dat %>%
separate_rows(txt, sep = "\\s+(?=\\d+\\.\\d+)")
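Without tidyr, the same split can be sketched in base R with strsplit(). This is an illustrative equivalent, not part of the original answer. perl = TRUE is needed for the lookahead. The pieces should match the separate_rows() output, including the trailing space left on the last piece of some rows.

```r
# Sample data from the question.
dat <- data.frame(
  date = c("Sep2020", "Oct2020", "Nov2020", "Dec2020"),
  txt = c("1.1 What is the Constitution? 1.2 The original charter, which replaced the Articles of Confederation 1.3 hat all States would be equal. ",
          "4.4 What is the Bill of Rights? 4.5 The 9th and 10th amendments are general ",
          "5.1 in criminal prosecution to a speedy and public 5.2 War, three amendments were ratified (1865 5.3 13. The most recent amendment, the 27th, was",
          "6.2 the case of the proposed equal rights amendment, the Congress exten 6.3 but the proposed Amendment was never ratifie 6.4 tification deadline. The 38th State, Michig"),
  stringsAsFactors = FALSE  # only matters on R < 4.0
)

# Split on the whitespace right before a "digits.digits" label; the
# lookahead only checks for the label, so the label stays in its piece.
pieces <- strsplit(dat$txt, "\\s+(?=\\d+\\.\\d+)", perl = TRUE)

# Rebuild a long data frame: each date repeated once per piece.
dat2 <- data.frame(
  date = rep(dat$date, lengths(pieces)),
  txt = unlist(pieces),
  stringsAsFactors = FALSE
)
nrow(dat2)  # 11
```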
Answer:
library(dplyr)
library(tidyr)
library(stringr)
dat %>%
mutate(parsed = stringr::str_extract_all(txt, ".*?[^0-9](?=$|[0-9]{1}\\.[0-9]{1})")) %>%
select(-txt) %>%
unnest(parsed) %>%
mutate(parsed = trimws(parsed))
# # A tibble: 11 x 2
# date parsed
# <chr> <chr>
# 1 Sep2020 1.1 What is the Constitution?
# 2 Sep2020 1.2 The original charter, which replaced the Articles of Confederation
# 3 Sep2020 1.3 hat all States would be equal.
# 4 Oct2020 4.4 What is the Bill of Rights?
# 5 Oct2020 4.5 The 9th and 10th amendments are general
# 6 Nov2020 5.1 in criminal prosecution to a speedy and public
# 7 Nov2020 5.2 War, three amendments were ratified (1865
# 8 Nov2020 5.3 13. The most recent amendment, the 27th, was
# 9 Dec2020 6.2 the case of the proposed equal rights amendment, the Congress exten
# 10 Dec2020 6.3 but the proposed Amendment was never ratifie
# 11 Dec2020 6.4 tification deadline. The 38th State, Michig
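I'm using ".*?[^0-9](?=$|[0-9]{1}\\.[0-9]{1})" because the {1} is closest to what you used, but I suspect it is over-constrained. Unless you know you will never see a label with more than one digit before the dot (e.g. 10.1), I would generally prefer ".*?[^0-9](?=$|[0-9]+\\.[0-9]+)".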
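To see where the {1} version is too strict, here is a quick check on a made-up string with a two-digit number before the dot. The string and the helper split_with() are hypothetical. Base R's gregexpr(perl = TRUE) stands in for str_extract_all(), which should behave the same for these two patterns.

```r
# Hypothetical test string: the second label has two digits before the dot.
s <- "9.9 end of nine 10.1 start of ten"

strict <- ".*?[^0-9](?=$|[0-9]{1}\\.[0-9]{1})"  # single digit on each side
loose  <- ".*?[^0-9](?=$|[0-9]+\\.[0-9]+)"      # one or more digits

# Hypothetical helper: extract all matches from one string and trim them.
split_with <- function(pattern, x) {
  trimws(regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]])
}

split_with(strict, s)  # returns "9.9 end of nine 10.1 start of ten" (no split)
split_with(loose, s)   # returns c("9.9 end of nine", "10.1 start of ten")
```

With the strict pattern, the only non-digit in front of a digit.digit sequence is the space before "10.1", and the lookahead fails there because "10" is two digits. "0.1" does not help either, since the character before it is the digit "1".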
