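I am making some word clouds for a project on Kaggle, but this line of code is not working. I am trying to remove all apostrophes from the column that contains the text. In my corpus, "'s" and "'re" are the two most frequent "words". While the data is still in data frame form, I have been using this line: df$col <- gsub("\'","", df$col).
Below is some sample data. In my Kaggle project, the text data sits in one column of a data frame. Am I missing something? I have also tried str_replace_all and sub.
Edit: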
dput(head(df))
structure(list(X1 = c(0, 1, 2, 3, 4, 5), Character = c("Michael",
"Jim", "Michael", "Jim", "Michael", "Michael"), Line = c("All right Jim. Your quarterlies look very good. How are things at the library?",
"Oh, I told you. I couldn’t close it. So…", "So you’ve come to the master for guidance? Is this what you’re saying, grasshopper?",
"Actually, you called me in here, but yeah.", "All right. Well, let me show you how it’s done.",
"[on the phone] Yes, I’d like to speak to your office manager, please. Yes, hello. This is Michael Scott. I am the Regional Manager of Dunder Mifflin Paper Products. Just wanted to talk to you manager-a-manger. [quick cut scene] All right. Done deal. Thank you very much, sir. You’re a gentleman and a scholar. Oh, I’m sorry. OK. I’m sorry. My mistake. [hangs up] That was a woman I was talking to, so… She had a very low voice. Probably a smoker, so… [Clears throat] So that’s the way it’s done."
), Season = c(1, 1, 1, 1, 1, 1), Episode_Number = c(1, 1, 1,
1, 1, 1)), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
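Edit 2: Earlier I said that df$col <- gsub("\'","", df$col) worked in RStudio. That was only true for toy data. I tried it on the dput data above and it did not work, so I am back to square one.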
Answer:
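Your input contains "fancy" (curly, typographic) quotes rather than standard ASCII quotes, so a pattern that only matches ' never finds them. This should get rid of all fancy single and double quotes as well as all plain single quotes: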
gsub("['‘’”“]", "", df$Line)
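To see why the ASCII-only pattern appears to do nothing, here is a minimal sketch on made-up strings (the vector `lines` and the other variable names are illustrative and not taken from the question's data). It writes the curly quotes as Unicode escapes so that the difference between U+0027 and U+2019 is visible in the source, and it assigns the result, which is what would be needed to update a column such as df$Line.

```r
# Hypothetical sample lines; \u2019 is the right single quotation mark,
# \u201c and \u201d are the curly double quotes.
lines <- c("I couldn\u2019t close it.", "It's done.", "\u201cQuoted\u201d text")

# Inspect the code point of the "apostrophe" in the first string:
# 8217 (U+2019) here, whereas a plain ASCII apostrophe would be 39.
apostrophe_code <- utf8ToInt(substr(lines[1], 10, 10))

# The pattern from the question only matches the ASCII apostrophe.
ascii_only <- gsub("'", "", lines, fixed = TRUE)

# A character class covering the ASCII apostrophe plus curly single
# and double quotes, equivalent to the pattern in the answer above.
quote_pattern <- "['\u2018\u2019\u201c\u201d]"
cleaned <- gsub(quote_pattern, "", lines)

print(ascii_only)
print(cleaned)
```

With the ASCII-only pattern the first string is left unchanged, while the character class strips the quotes from all three. Applied to the data frame in the question, this would presumably look like df$Line <- gsub(quote_pattern, "", df$Line).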
