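I'm trying to remove conjunctions and punctuation from a txt file. The punctuation is removed successfully, but some of the conjunctions are still there. Here is my code: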
public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }

    private void Form1_Load(object sender, EventArgs e)
    {
        // Read the whole file and lower-case it.
        string words = File.ReadAllText(@"C:\Users\...\Desktop\data_protection_law.txt").ToLower(new CultureInfo("en-US", false));
        string[] punctuation = { ".", "!", "?", "–", "-", "-", "/", "_", ",", ";", ":", "(", ")", "[", "]", "“", "”", "\"", "1", "2", "3", "4", "5", "6", "7", "8", "9" };
        string[] con_art = { "the", "a", "an", "for", "and", "or", "nor", "but", "yet", "so", "of", "to", "in", "are", "is", "on", "be", "by", "we", "he", "that", "he", "that", "because", "as", "it", "about", "were", "i", "our", "they", "with", "these", "there", "then", "them" };

        // Strip punctuation and digits.
        foreach (string s in punctuation)
        {
            words = words.Replace(s, "");
        }

        // Replace each conjunction/article (with a space on both sides) by a single space.
        foreach (string s in con_art)
        {
            words = words.Replace(" " + s + " ", " ");
        }

        richTextBox1.Text = words;
    }
}
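To be sure, I printed the words in a richTextBox. I checked against the original text and found that some conjunctions were removed, but not all of them. Here is proof of the remaining conjunctions
Original text file
I'm going crazy; I have been trying to find the error myself for days, but I can't.
So where is the mistake in this code? By the way, I'm just a beginner, so please don't be angry if I've made a big mistake :)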
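Reply from a uj5u.com user:
I think you need to change your search-and-replace approach completely; using a regular expression is the easiest option here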
var rex = string.Join("|", con_art.Select(w => $@"\b{w}\b"));
words = Regex.Replace(words, rex, "", RegexOptions.IgnoreCase);
The first line of code turns your word list into a string such as
\bthe\b|\ba\b|\ban\b|\bfor\b|\band\b|\bor\b|...
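When used by the regex engine, \b means "a boundary between a non-word character (space, punctuation, newline, etc.) and a word character (letter, digit, etc.)"; it also matches at the very start and end of the text. This effectively makes the search for the, a, an, for, and, etc. behave as "whole word only", which is what you were trying to achieve with your spaces (that doesn't work, because sometimes your words aren't surrounded by spaces).
The vertical bar | means "or"; by supplying a list of "whole word 'the' or whole word 'a' or whole word 'an'...", you don't have to call Replace() over and over in a loop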
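The regex approach above can be sketched as a small self-contained helper. This is only an illustration under a few assumptions: the class name StopWordFilter and the method RemoveWholeWords are hypothetical, Regex.Escape is added as a precaution in case a word ever contains a regex metacharacter, and the final whitespace collapse is an extra step (replacing a matched word with "" leaves the spaces on both sides of it behind).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original question or answer.
public static class StopWordFilter
{
    public static string RemoveWholeWords(string text, IEnumerable<string> stopWords)
    {
        // Builds \b(?:the|a|an|...)\b so each entry only matches as a whole word.
        string alternatives = string.Join("|", stopWords.Select(w => Regex.Escape(w)));
        string pattern = @"\b(?:" + alternatives + @")\b";

        // Remove the words wherever they occur, including at line starts and ends.
        string stripped = Regex.Replace(text, pattern, "", RegexOptions.IgnoreCase);

        // Assumption: runs of spaces/tabs left behind should shrink to one space.
        // Line breaks are left untouched.
        return Regex.Replace(stripped, @"[ \t]{2,}", " ");
    }
}
```

In the question's code this would be called as words = StopWordFilter.RemoveWholeWords(words, con_art); in place of the second foreach loop. A leading or trailing space can remain where a removed word started or ended a line; whether to Trim() those is a design choice.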
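Reply from a uj5u.com user:
Because sometimes the word does not have a space on both sides.
The ones you failed to replace are all at the start or end of a line, which means they are next to a line break rather than a space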
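The effect described above can be reproduced with a short sketch. The class SpacePaddingDemo and the method RemoveSpacePadded are hypothetical names; the method applies the same space-padded Replace as the loop in the question, for a single word.

```csharp
using System;

// Hypothetical demo for illustration; mirrors the question's Replace call for one word.
public static class SpacePaddingDemo
{
    public static string RemoveSpacePadded(string text, string word)
    {
        // Only matches when the word has a literal space on BOTH sides.
        return text.Replace(" " + word + " ", " ");
    }
}
```

For the input "we and they\nand them" and the word "and", the first "and" is removed, but the second one is preceded by a line break instead of a space, so it stays: the result is "we they\nand them". On Windows, lines read with File.ReadAllText typically end in "\r\n", which causes the same mismatch.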
