Counting the letters in a file with C

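I am trying to write a program that reads a file and counts the occurrences of each alphabetic character in it. Below is what I have so far, but the counts it returns (stored in the counters array) are higher than expected.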

  void count_letters(const char *filename, int counters[26]) {
    FILE* in_file = fopen(filename, "r");
    const char ALPHABET[] = "abcdefghijklmnopqrstuvwxyz";
    if(in_file == NULL){
        printf("Error(count_letters): Could not open file %s\n",filename);
        return;
    }
    char line[200];
    while(fgets(line, sizeof(line),in_file) != NULL){ //keep reading lines until there's nothing left to read
        for(int pos = 0; pos < sizeof(line); pos++){//iterate through each character in line...
            if(isalpha(line[pos])){//skip checking and increment position if current char is not alphabetical
                for(int i = 0; i < 26; i++){//...for each character in the alphabet
                    if(tolower(line[pos]) == tolower(ALPHABET[i]))//upper case and lower case are counted as same
                        counters[i]++;    // increment the current element in counters for each match in the line
                }
            }
        }
    }
    fclose(in_file);
    return;
}

A uj5u.com user replied:

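In for(int pos = 0; pos < sizeof(line); pos++), sizeof(line) evaluates to the size of the whole array line, not to the portion filled by the most recent fgets call. So after a long line has been read, the loop recounts the stale characters left in the array each time a shorter line is read.

Change the loop so that it iterates only over the part of line that the latest fgets call filled. You can do that by exiting the loop when you see the null character.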
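
The fix described above can be sketched as follows. This is only an illustration: the helper name count_line is made up for this example, it expects the caller to have zeroed counters, and the cast to unsigned char is an extra precaution that the original code does not have.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: tally the letters of one NUL-terminated line
 * into counters[0..25]. The caller zeroes counters beforehand. */
void count_line(const char *line, int counters[26])
{
    const char ALPHABET[] = "abcdefghijklmnopqrstuvwxyz";

    /* Stop at the terminating '\0' that fgets writes, not at sizeof(line). */
    for (size_t pos = 0; line[pos] != '\0'; pos++) {
        /* isalpha/tolower need a value representable as unsigned char */
        unsigned char ch = (unsigned char)line[pos];
        if (!isalpha(ch))
            continue;
        for (int i = 0; i < 26; i++) {
            if (tolower(ch) == ALPHABET[i])
                counters[i]++;
        }
    }
}
```

Inside the question's while loop, the body would then reduce to a single call such as count_line(line, counters).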

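A uj5u.com user replied:

My 2 cents for a slightly simpler solution (you have a lot of loops ;) ). Reading input line by line is preferred in most cases, but since you are only counting characters here, I don't think this is one of them, and it just adds complexity. This answer also assumes ASCII character encoding, which, as noted in the comments on another answer, the C standard does not guarantee. You can modify it to use your char ALPHABET array if you want full portability.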

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>  // for exit()

#define NUM_LETTERS 26

int main(void)
{
    FILE* in_file = fopen("/path/to/my/file.txt", "r");
    if (in_file == NULL) exit(-1);

    unsigned charCounts[NUM_LETTERS] = {0};
    int curChar;
    // rather than reading line-by-line, read one character at a time
    while ((curChar = fgetc(in_file)) != EOF)
    {
        // only proceed if it is a letter
        if (isalpha(curChar))
        {
            // this is bad if not using ASCII, instead you'd need another
            // loop to check your ALPHABET, but increment the count here
            (charCounts[tolower(curChar) - 'a'])++;
        }
    }

    // print out the results
    for (int i=0; i<NUM_LETTERS; i++)
    {
        // 'A' + i also assumes ASCII encoding
        printf("%c: %u\n", 'A' + i, charCounts[i]);
    }
}

The demo uses stdin instead of a file.
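
For the portability caveat mentioned in this answer, one possible variant looks each letter up in an alphabet string instead of subtracting 'a'. This is a sketch under assumptions: the function name count_letters_portable and the use of strchr are choices made for this example, not part of the original answer.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static const char ALPHABET[] = "abcdefghijklmnopqrstuvwxyz";

/* Hypothetical helper: count letters from an open stream without assuming
 * that 'a'..'z' have consecutive codes. counts must hold 26 elements and
 * should be zeroed by the caller. */
void count_letters_portable(FILE *in, unsigned counts[26])
{
    int ch;
    while ((ch = fgetc(in)) != EOF) {
        if (!isalpha(ch))
            continue;
        /* Find the letter's position in ALPHABET rather than using ch - 'a'. */
        const char *hit = strchr(ALPHABET, tolower(ch));
        if (hit != NULL && *hit != '\0')
            counts[hit - ALPHABET]++;
    }
}
```

Letters that isalpha accepts in the current locale but that are not in ALPHABET are simply skipped, so the index can never leave the 26-element array.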

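A uj5u.com user replied:

You have a bug in for(int pos = 0; pos < sizeof(line); .... You assume that all 200 positions in the array hold valid characters, but that is true only for text in which every line is 200 characters long. You should count only the characters in the initialized part of the string. Its length varies from line to line: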

for(int pos = 0; pos < strlen(line); ...

Also, you do not need the innermost loop, because the alphabetic characters most likely have consecutive ASCII codes:

if(isalpha(line[pos]))
    counters[tolower(line[pos]) - 'a']++;

I assume counters has already been initialized to 0. If not, you must initialize the array before counting.
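
Putting the suggestions from this answer together, the question's function might look like the sketch below. The name count_letters_fixed and the int return value are assumptions made for this example (the original returns void), and the function zeroes counters itself rather than relying on the caller. Like the answer, it assumes 'a' to 'z' have consecutive codes.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the question's function with the suggestions applied.
 * Returns 0 on success, -1 if the file cannot be opened (an assumption;
 * the original returned void). Assumes 'a'..'z' are contiguous (ASCII). */
int count_letters_fixed(const char *filename, int counters[26])
{
    FILE *in_file = fopen(filename, "r");
    if (in_file == NULL)
        return -1;

    memset(counters, 0, 26 * sizeof counters[0]);  /* start from zero */

    char line[200];
    while (fgets(line, sizeof line, in_file) != NULL) {
        size_t len = strlen(line);  /* only the part fgets filled */
        for (size_t pos = 0; pos < len; pos++) {
            unsigned char ch = (unsigned char)line[pos];
            if (isalpha(ch))
                counters[tolower(ch) - 'a']++;
        }
    }
    fclose(in_file);
    return 0;
}
```

Calling strlen once per line, instead of in the loop condition, avoids rescanning the string on every iteration.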

A uj5u.com user replied:

You do not need to use fgets: the character functions work just as fast, because the file stream does its own buffering.

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define NLETTERS    ('z' - 'a' + 1)

int countLetters(FILE *fi, size_t *counter)
{
    int ch;
    if(fi && counter)
    {
        memset(counter, 0, sizeof(*counter) * NLETTERS); // zero all NLETTERS counters
        while((ch = fgetc(fi)) != EOF)
        {
            if(isalpha(ch))
            {
                counter[tolower(ch) - 'a']++;
            }
        }
        return 0;
    }
    return 1;
}

