Edge case in a C program that finds and replaces words in a text file

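I'm new to C and would really appreciate help fixing a bug in my program.

I've identified an edge case, but I'm not sure how to solve it.

Currently, the function finds and replaces both whole words and words inside other words in a given text file. For example, changing 'water' to 'snow' changes the string 'waterfall' to 'snowfall'. This is the expected result.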

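However, when I enter 'waterfalls' to replace the word 'waterfall', the program seems to get stuck in an infinite loop. I'm not sure why, but I'd be grateful if someone could point me in the right direction.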

Here is my code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>

#define BUFFER_SIZE 20

void replaceWord(char *str, const char *oldWord, const char *newWord)
{
    char *position, buffer[BUFFER_SIZE];
    int index, oldWordLength;
    oldWordLength = (long)strlen(oldWord);
    while ((position = strstr(str, oldWord)) != NULL)
    {
        strcpy(buffer, str);
        index = position - str;
        str[index] = '\0';
        strcat(str, newWord);
        strcat(str, buffer + index + oldWordLength);
    }
}

int main()
{
    char msg[100] = "This is some text with the word snowfall to replace";
    puts(msg);
    replaceWord(msg, "snowfall", "snowfalls");
    puts(msg);
    return 0;
}

A uj5u.com user replied:

OK. First, your buffer is seriously undersized. This:

char buffer[BUFFER_SIZE];

is the destination for what ends up being a complete copy of the original message string. But in main, the original message:

char msg[100] = "This is some text with the word snowfall to replace";

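is 51 characters wide (not including the terminator). That cannot work, and running the program under AddressSanitizer (or, ideally, a regular debugger) shows it: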

==1==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd578054d4 at pc 0x7f7457e4c846 bp 0x7ffd57805460 sp 0x7ffd57804c10
WRITE of size 52 at 0x7ffd578054d4 thread T0
    #0 0x7f7457e4c845  (/opt/compiler-explorer/gcc-11.2.0/lib64/libasan.so.6+0x55845)
    #1 0x4012a7 in replaceWord /app/example.c:15
    #2 0x401592 in main /app/example.c:27
    #3 0x7f7457c2c0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #4 0x40112d in _start (/app/output.s+0x40112d)

Address 0x7ffd578054d4 is located in stack of thread T0 at offset 52 in frame
    #0 0x4011f5 in replaceWord /app/example.c:9

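So that's obviously no good. Increasing the buffer size will "work" around this, but to do it properly, the full writable width of the source/destination buffer (they are the same buffer in your function), including space for the terminator, should be supplied as an argument. You should absolutely do that.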

Second, the code that does this:

while ((position = strstr(str, oldWord)) != NULL)

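always searches for oldWord starting from the beginning of the input string. That is wrong (well, it is right exactly once, on the first pass; after that, it is wrong). Consider this: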

i love c

Suppose I'm looking for i and I want to replace it with is. I would find it here:

i love c  
^

After the replacement, the new string I'm building looks like this:

is love c

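So where do you start the next search? You start at the position where the match was found in the original string, plus the length of the replacement string. The match was at pos 0. The replacement's length is 2, so we start the next search at pos 2.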

is love c  
  ^

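Note that this gets more complicated when you do everything in place (i.e. without an intermediate buffer), but that doesn't seem to be your goal right now, and it probably isn't on your radar. So an inefficient but workable approach is: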

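1. Start at the beginning of the string (src = str).
2. Search for the old word starting at src.
3. If found, copy the original string up to, but not including, the old word into the buffer.
4. Append the replacement string to the buffer.
5. Append the rest of the original string, past the old word, to the buffer.
6. Copy the buffer back to the source string.
7. Reposition src to the length of the replacement word past the original position of the old word found in (3).
8. Loop back to (2) until the old word is no longer found.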

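Like I said: not efficient, but fairly easy to understand. In code it looks like this. Note that the buffer size is increased significantly, and that it is now used to declare both the temporary buffer and the message in main. This still isn't good, but it's what you brought, so I'm sticking with it. I urge you to consider implementing this algorithm with dynamic memory management, or with a size limit passed as an additional argument: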

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>

#define BUFFER_SIZE 100

void replaceWord(char *str, const char *oldWord, const char *newWord)
{
    char buffer[ BUFFER_SIZE ];
    char *src = str;
    char *oldpos = NULL;

    size_t lenOldWord = strlen(oldWord);
    size_t lenNewWord = strlen(newWord);

    while ((oldpos = strstr(src, oldWord)) != NULL)
    {
        // 1. copy everything up to the old word.
        // 2. append the new word
        // 3. copy the remainder of source string *past* the old word
        // 4. copy back to the original string.
        memcpy(buffer, str, (size_t)(oldpos - str));
        memcpy(buffer + (oldpos - str), newWord, lenNewWord);
        strcpy(buffer + (oldpos - str) + lenNewWord, oldpos + lenOldWord);
        strcpy(str, buffer);

        // the new starting point will be the previous discovery
        //  location plus the length of the new word.
        src = oldpos + lenNewWord;
    }
}

int main()
{
    char msg[BUFFER_SIZE] = "This is some text with the word snowfall to replace";
    puts(msg);
    replaceWord(msg, "snowfall", "snowfalls");
    puts(msg);
    return 0;
}

Output

This is some text with the word snowfall to replace
This is some text with the word snowfalls to replace

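I strongly recommend running this in a debugger and stepping through it to watch how it works. It will help you see where you were missing the start-search-here logic. As an exercise, I further urge you to address the obvious holes in this algorithm. Think about ways it can easily invoke undefined behavior (hint: a short, common old word and a very long replacement word), and what should be done to address them.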
