我正在努力正確閱讀 .txt。為此,我正在使用:
r_dat <- read_csv(ff,quote = ",", skip=5)
資料如下:
dput(head(r_dat))
structure(list(`--------------------------------------------------------` =
c("240 19790111_00 7 7 0.86587346 0.75074303 1.35784054
-0.45948577 -1.18579698 -1.07059395 -0.34373909",
"243 19790111_03 0 7 0.85441613 0.72267860 1.31580353 -0.44945070
-1.16703987 -1.03862977 -0.32952571",
"246 19790111_06 7 7 0.83927369 0.69352823 1.27102554 -0.43822104
-1.14390016 -1.00381625 -0.31613618",
"249 19790111_09 0 7 0.82096398 0.66378951 1.22433603 -0.42610571
-1.11709785 -0.96681082 -0.30378893",
"252 19790111_12 7 7 0.79906243 0.63312817 1.17505550
-0.41290590 -1.08599937 -0.92708772 -0.29231820",
"255 19790111_15 0 7 0.77413946 0.60201460 1.12398231 -0.39892274
-1.05132735 -0.88528168 -0.28191039"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
因此,它會生成一個包含時間、小時和一些數值的單個字符列,每個值都應進入不同的列。
為了正確處理資料,我需要將列拆分為 11 列。因此,在閱讀檔案后,我嘗試使用單獨的列:
r_dat%>%separate(1, into=paste("X",seq(1,11,by=1), sep=""), sep=" ")
但它不起作用,看來我需要添加更多列。此外,當添加超過 11 列并使用整個資料集時,我收到警告“警告訊息:1:預期 27 件。在 57140 行中丟棄了其他件”..
我嘗試了幾種“參考”模式(例如“”、“\t”、),因為我認為問題出在那里......但它不起作用。任何建議將不勝感激。
謝謝
uj5u.com熱心網友回復:
查看您的資料,您似乎想拆分為 12 個元素,假設您想在出現一個或多個空格的任何地方拆分:
r_dat %>%
separate(1, into = paste0("X", 1:12), sep = " {1,}")
或者
r_dat %>%
mutate(new_cols = str_split(.[[1]], " {1,}")) %>%
unnest_wider(new_cols)
第一個選擇將給出:
# A tibble: 6 x 12
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 240 19790111_00 7 7 0.86587346 0.75074303 1.35784054 "\n" "-0.45948577" -1.18579698 -1.07059395 -0.34373909
2 243 19790111_03 0 7 0.85441613 0.72267860 1.31580353 "-0.44945070" "\n" -1.16703987 -1.03862977 -0.32952571
3 246 19790111_06 7 7 0.83927369 0.69352823 1.27102554 "-0.43822104" "\n" -1.14390016 -1.00381625 -0.31613618
4 249 19790111_09 0 7 0.82096398 0.66378951 1.22433603 "-0.42610571" "\n" -1.11709785 -0.96681082 -0.30378893
5 252 19790111_12 7 7 0.79906243 0.63312817 1.17505550 "\n" "-0.41290590" -1.08599937 -0.92708772 -0.29231820
6 255 19790111_15 0 7 0.77413946 0.60201460 1.12398231 "-0.39892274" "\n" -1.05132735 -0.88528168 -0.28191039
進入轉換為數字的第二點,您可以嘗試:
r_dat %>%
separate(1, into = paste0("X", 1:12), sep = " {1,}") %>%
mutate(across(everything(), ~type.convert(.)))
這使:
# A tibble: 6 x 12
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
<int> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 240 19790111_00 7 7 0.866 0.751 1.36 NA -0.459 -1.19 -1.07 -0.344
2 243 19790111_03 0 7 0.854 0.723 1.32 -0.449 NA -1.17 -1.04 -0.330
3 246 19790111_06 7 7 0.839 0.694 1.27 -0.438 NA -1.14 -1.00 -0.316
4 249 19790111_09 0 7 0.821 0.664 1.22 -0.426 NA -1.12 -0.967 -0.304
5 252 19790111_12 7 7 0.799 0.633 1.18 NA -0.413 -1.09 -0.927 -0.292
6 255 19790111_15 0 7 0.774 0.602 1.12 -0.399 NA -1.05 -0.885 -0.282
轉載請註明出處,本文鏈接:https://www.uj5u.com/ruanti/364200.html
標籤:r
上一篇:只保存tsdiag中的一個圖
