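Question
I am using the R package mRpostman to access my mail account. When I fetch a mail that I sent from my computer via Thunderbird to a dedicated mail address, everything works fine. But when I do the same from my Android phone, the text comes back strangely encoded and is no longer readable. How can I fix this? I tried base64enc::base64decode(), but I could not get it to work. Changing the encoding with Encoding() failed as well.
Reprex
I sent two mails. One came from my computer using Thunderbird, and its text is just "Sent from thunderbird on computer". The other was sent from my Android phone using the default mail app. It contains only the text "Sent from Android".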
library(mRpostman) # for email communication
# Connect to mail server
imap_mail <- 'imaps://imap.gmail.com' # IMAP server URL
user_mail <- keyring::key_get('dataviz-mail')
password_mail <- keyring::key_get('dataviz-mail-password')
# Establish connection to imap server
con <- configure_imap(
  url = imap_mail,
  user = user_mail,
  password = password_mail
)
# Switch to Inbox
con$select_folder('Inbox')
# Fetch Thunderbird mail
con$fetch_text(11)
#> $text11
#> [1] "Sent from thunderbird on computer\r\n\r\n"
# Fetch Android mail
con$fetch_text(12)
#> $text12
#> [1] "----_com.samsung.android.email_7640956728775490\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Transfer-Encoding: base64\r\n\r\nVGhpcyBtYWlsIGlzIHNlbnQgZnJvbSBBbmRyb2lk\r\n\r\n----_com.samsung.android.email_7640956728775490\r\nContent-Type: text/html; charset=utf-8\r\nContent-Transfer-Encoding: base64\r\n\r\nPGh0bWw+PGhlYWQ+PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0\r\nL2h0bWw7IGNoYXJzZXQ9VVRGLTgiPjwvaGVhZD48Ym9keSBkaXI9ImF1dG8iPlRoaXMgbWFpbCBp\r\ncyBzZW50IGZyb20gQW5kcm9pZDwvYm9keT48L2h0bWw+\r\n\r\n----_com.samsung.android.email_7640956728775490--\r\n\r\n"
Created on 2022-04-06 by the reprex package (v2.0.0)
Update
Allan Cameron's solution works, but it drops the line breaks.
library(tidyverse)
text_that_should_contain_line_breaks <- "----_com.samsung.android.email_6729645824359240\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Transfer-Encoding: base64\r\n\r\naHR0cHM6Ly90d2l0dGVyLmNvbS9jX2dlYmhhcmQvc3RhdHVzLzE1MTA4NjcwMDkxMTM5MjM1ODg/\r\ncz0yMCZ0PWR0X3dvVkV2a3dPSjBfRGZUc2ttZUFIYW5kZHJhd24gZm9udCBoZWFkaW5nVm9uIG1l\r\naW5lbS9tZWluZXIgR2FsYXh5IGdlc2VuZGV0\r\n\r\n----_com.samsung.android.email_6729645824359240\r\nContent-Type: text/html; charset=utf-8\r\nContent-Transfer-Encoding: base64\r\n\r\nPGh0bWw+PGhlYWQ+PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0\r\nL2h0bWw7IGNoYXJzZXQ9VVRGLTgiPjwvaGVhZD48Ym9keSBkaXI9ImF1dG8iPmh0dHBzOi8vdHdp\r\ndHRlci5jb20vY19nZWJoYXJkL3N0YXR1cy8xNTEwODY3MDA5MTEzOTIzNTg4P3M9MjAmYW1wO3Q9\r\nZHRfd29WRXZrd09KMF9EZlRza21lQTxkaXYgZGlyPSJhdXRvIj48YnI+PC9kaXY+PGRpdiBkaXI9\r\nImF1dG8iPkhhbmRkcmF3biBmb250IGhlYWRpbmc8L2Rpdj48ZGl2IGRpcj0iYXV0byI+PGJyPjwv\r\nZGl2PjxkaXYgaWQ9ImNvbXBvc2VyX3NpZ25hdHVyZSIgZGlyPSJhdXRvIj48ZGl2IHN0eWxlPSJm\r\nb250LXNpemU6MTJweDtjb2xvcjojNTc1NzU3IiBkaXI9ImF1dG8iPlZvbiBtZWluZW0vbWVpbmVy\r\nIEdhbGF4eSBnZXNlbmRldDwvZGl2PjwvZGl2PjxkaXYgZGlyPSJhdXRvIj48YnI+PC9kaXY+PC9i\r\nb2R5PjwvaHRtbD4=\r\n\r\n----_com.samsung.android.email_6729645824359240--\r\n\r\n"
decoded <- text_that_should_contain_line_breaks %>%
  str_match('base64\\r\\n\\r\\n([[:alpha:][:digit:]/\\r\\n]*)----') %>%
  .[, 2] %>%
  base64enc::base64decode() %>%
  rawToChar()
decoded
#> [1] "https://twitter.com/c_gebhard/status/1510867009113923588?s=20&t=dt_woVEvkwOJ0_DfTskmeAHanddrawn font headingVon meinem/meiner Galaxy gesendet"
# But should be
cat("https://twitter.com/c_gebhard/status/1510867009113923588?s=20&t=dt_woVEvkwOJ0_DfTskmeA\nHanddrawn font heading\nVon meinem/meiner Galaxy gesendet")
#> https://twitter.com/c_gebhard/status/1510867009113923588?s=20&t=dt_woVEvkwOJ0_DfTskmeA
#> Handdrawn font heading
#> Von meinem/meiner Galaxy gesendet
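Created on 2022-04-11 by the reprex package (v2.0.0)
Reply from a uj5u.com community member:
The Android string does contain a base64-encoded message, but it is embedded in other text that is not base64-encoded, so you have to extract it first.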
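This is also why changing Encoding() made no difference: it only changes the encoding mark that R attaches to a string and leaves the characters themselves untouched, whereas base64 is a transfer encoding that has to be decoded back into bytes. A minimal base R sketch, using the plain-text payload from the question as the example value (the variable names are illustrative):

```r
# Base64 payload of the plain-text part, copied from the question
payload <- "VGhpcyBtYWlsIGlzIHNlbnQgZnJvbSBBbmRyb2lk"

# Re-marking the string does not decode anything: the payload is plain ASCII,
# so its characters stay exactly as they were
relabelled <- payload
Encoding(relabelled) <- "UTF-8"
identical(relabelled, payload)
#> [1] TRUE
```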
If we take the string from your question:
text12 <- "----_com.samsung.android.email_7640956728775490\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Transfer-Encoding: base64\r\n\r\nVGhpcyBtYWlsIGlzIHNlbnQgZnJvbSBBbmRyb2lk\r\n\r\n----_com.samsung.android.email_7640956728775490\r\nContent-Type: text/html; charset=utf-8\r\nContent-Transfer-Encoding: base64\r\n\r\nPGh0bWw+PGhlYWQ+PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0\r\nL2h0bWw7IGNoYXJzZXQ9VVRGLTgiPjwvaGVhZD48Ym9keSBkaXI9ImF1dG8iPlRoaXMgbWFpbCBp\r\ncyBzZW50IGZyb20gQW5kcm9pZDwvYm9keT48L2h0bWw+\r\n\r\n----_com.samsung.android.email_7640956728775490--\r\n\r\n"
Then we can extract the base64 string, decode it to bytes, and convert those to characters like this:
library(dplyr)
library(purrr)
library(base64enc)
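text12 %>%
  strsplit("base64\r\n\r\n") %>%   # split after each transfer-encoding header
  pluck(1, 2) %>%                  # second piece starts with the plain-text payload
  strsplit("----") %>%             # cut at the next boundary
  pluck(1, 1) %>%
  gsub(pattern = "[\r\n]+", replacement = "", .) %>%  # strip line breaks
  base64decode() %>%
  rawToChar()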
#> [1] "This mail is sent from Android"
Created on 2022-04-06 by the reprex package (v2.0.1)
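The same idea can be wrapped into a small function that walks over every part of the message instead of picking pieces by position. The following is only a sketch: decode_mime_parts is a hypothetical helper, not something provided by mRpostman or base64enc, and it assumes that the text starts with the boundary line and that each part declares charset=utf-8, as in the question. The example message is made up for illustration.

```r
library(base64enc)

# Hypothetical helper, not part of mRpostman or base64enc: split a multipart
# body such as the one returned by fetch_text() and decode its base64 parts.
decode_mime_parts <- function(txt) {
  # Work with "\n" line endings only
  txt <- gsub("\r\n", "\n", txt, fixed = TRUE)

  # Assumption: the text starts with the boundary line, as in the question
  boundary <- strsplit(txt, "\n", fixed = TRUE)[[1]][1]
  parts <- strsplit(txt, boundary, fixed = TRUE)[[1]]

  # Keep only the chunks that carry part headers, then drop leading newlines
  parts <- parts[grepl("Content-Type:", parts, fixed = TRUE)]
  parts <- sub("^\n+", "", parts)

  decode_part <- function(part) {
    # Headers and payload are separated by the first blank line
    pieces <- regmatches(part, regexpr("\n\n", part, fixed = TRUE),
                         invert = TRUE)[[1]]
    headers <- pieces[1]
    payload <- pieces[2]
    if (grepl("Content-Transfer-Encoding: base64", headers, fixed = TRUE)) {
      payload <- rawToChar(base64decode(gsub("\\s", "", payload)))
      # Assumption: the part declares charset=utf-8
      Encoding(payload) <- "UTF-8"
    }
    payload
  }

  content_type <- function(part) {
    m <- regmatches(part, regexpr("Content-Type: [^;\n]+", part))
    sub("Content-Type: ", "", m, fixed = TRUE)
  }

  decoded <- lapply(parts, decode_part)
  names(decoded) <- vapply(parts, content_type, character(1), USE.NAMES = FALSE)
  decoded
}

# Small made-up message in the same layout as the Android mail
example_text <- paste0(
  "--boundary\r\n",
  "Content-Type: text/plain; charset=utf-8\r\n",
  "Content-Transfer-Encoding: base64\r\n\r\n",
  base64encode(charToRaw("Hello from a phone")), "\r\n\r\n",
  "--boundary\r\n",
  "Content-Type: text/html; charset=utf-8\r\n",
  "Content-Transfer-Encoding: base64\r\n\r\n",
  base64encode(charToRaw("<p>Hello from a phone</p>")), "\r\n\r\n",
  "--boundary--\r\n\r\n"
)

parts <- decode_mime_parts(example_text)
parts[["text/plain"]]
#> [1] "Hello from a phone"
```

Naming the result by content type makes it possible to ask for the "text/plain" or "text/html" part directly. The same call can be applied to a fetched string such as text12, provided its base64 payload is intact.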
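Update
The message appears to be stored twice: once as plain text and once as HTML. The plain-text version contains no actual line breaks, while the HTML version has line breaks only by virtue of its <br> tags. The easiest way to get the text with its line breaks preserved is to parse the HTML.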
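parsed_content <- text_that_should_contain_line_breaks %>%
  strsplit("base64\r\n\r\n") %>%
  pluck(1, 3) %>%          # third piece starts with the HTML payload
  strsplit("----") %>%     # cut at the closing boundary
  pluck(1, 1) %>%
  base64decode() %>%
  rawToChar() %>%
  rvest::read_html() %>%   # parse the decoded HTML
  rvest::html_text2()      # extract text, turning <br> into line breaks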
For example:
cat(parsed_content)
#> https://twitter.com/c_gebhard/status/1510867009113923588?s=20&t=dt_woVEvkwOJ0_DfTskmeA
#>
#>
#> Handdrawn font heading
#>
#>
#> Von meinem/meiner Galaxy gesendet
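The parsed text carries extra blank lines, presumably because each <br> sits inside its own <div>. If single line breaks are wanted, as in the expected output from the question, one option is to collapse runs of newlines afterwards. A minimal base R sketch: parsed_example is a stand-in assumed to match the printed parsed_content, and collapse_breaks is a hypothetical name.

```r
# Stand-in for parsed_content, assumed to match the output printed above
parsed_example <- paste0(
  "https://twitter.com/c_gebhard/status/1510867009113923588?s=20&t=dt_woVEvkwOJ0_DfTskmeA",
  "\n\n\nHanddrawn font heading",
  "\n\n\nVon meinem/meiner Galaxy gesendet"
)

# Hypothetical helper: replace every run of two or more newlines with one
collapse_breaks <- function(x) gsub("\n{2,}", "\n", x)

cat(collapse_breaks(parsed_example))
#> https://twitter.com/c_gebhard/status/1510867009113923588?s=20&t=dt_woVEvkwOJ0_DfTskmeA
#> Handdrawn font heading
#> Von meinem/meiner Galaxy gesendet
```

Note that this also removes blank lines the sender typed on purpose, so it only fits mails where paragraph spacing does not matter.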
