Reading a file in C++ and comparing between lines

Suppose file.txt contains random file names like the following:

a.cpp
b.txt
c.java
d.cpp
...

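The idea is that I want to take the file extension from each file name as a substring, and then compare the extensions with each other to find duplicates.
Here is my code: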

#include<iostream>
#include<fstream>
#include<string>
using namespace std;

    int main()
    {

    ifstream infile;   
    infile.open("file.txt"); 

    string str,sub;
    int count,pos=0;

    while(infile>>str)  
    {
    pos=str.find(".");
    sub=str.substr(pos+1);
    
    
    if(sub==?)
        // I stopped here
        count++;
    
    }
    cout<<count;    
    return 0;
    }

I am new to C++, so I do not know which function to use to move on to the next line. I searched a lot to figure it out, but found nothing.

A uj5u.com user replied:

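You can use the following program to print the count for each extension in the input file. The program uses std::map to keep track of the counts.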


#include <iostream>
#include <map>
#include <fstream>
#include <string>

int main()
{
   
    std::ifstream inputFile("input.txt");
    
    std::map<std::string, int> countExtOccurence; //this will count how many time each extension occurred
    
    std::string name, extension;
    if(inputFile)
    {
        while(std::getline(inputFile, name, '.')) //this will read up to the next '.'
        {
            std::getline(inputFile, extension, '\n'); //the rest of the line is the extension
            countExtOccurence[extension]++; //increase the count corresponding to a given extension
        }
    }
    else 
    {
        std::cout<<"input file cannot be opened"<<std::endl;
    }
    inputFile.close();
    
    //let's print out how many times each extension occurred in the file
    for(const std::pair<std::string, int> &pairElem: countExtOccurence)
    {
        std::cout<<pairElem.first<<" occurred: "<<pairElem.second<<" time"<<std::endl;
    }
    return 0;
}

The output of the above program can be seen here.

A uj5u.com user replied:

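OK, you want to read file names that are stored in a file and then get the count of the extensions.

This looks simple, but it is not. The reason is that file names today can contain all kinds of special characters. There may be spaces in them, and there may be multiple dots ('.'). Depending on the file system, there may be slashes "/" (as in Unix/Linux) or backslashes "\" (as in Windows systems) as separators. There are also file names without an extension and special files that begin with a period (like ".profile"). So basically it is not that easy.

Even if you have only the file name, the minimum you should do is search from the right end of the string for the dot "." that (probably) marks the file extension. Never from the left. So, in your case, you should use rfind rather than find.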

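Now, to your question of how to read the next line. Your approach of using the formatted input function works for the file names shown in your example source file, but it will not work if a file name contains spaces. Your statement infile>>str will stop the conversion at the first whitespace character.

Example: if the file name is "Hello World.txt", then "str" will contain only "Hello", and the next read will yield "World.txt". Therefore, you should read a complete line with the dedicated function std::getline. Please read its documentation.

With that, you can read line by line: while(std::getline(inputFile,str)).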

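Then, later, you can split off the extension and count it.

For splitting off the extension, I have already given you a hint and some warnings. But, very nicely, C++ offers a ready-to-use solution: the filesystem library. It has everything you need, ready to use.

Especially useful is the path type, which has a function extension. It handles all the details for you.

Because it does that, I strongly recommend using it.

Now life becomes easy. See the following example: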

#include <iostream>
#include <string>
#include <filesystem>

// Namespace alias to save a lot of typing work . . .
namespace fs = std::filesystem;

int main() {
    // Read any kind of filename from the user
    std::string line{};
    std::getline(std::cin, line);

    // Print the extension
    std::cout << fs::path{ line }.extension().string();
}

So there is no need to worry about the operating system or about any kind of file name. It simply does all the groundwork for you.

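Next, counting.

There is a more or less standard approach for counting something in a container or given by input, and then possibly also obtaining and showing its rank, that is, sorting by frequency of occurrence.

For the counting part, we can use an associative container like a std::map or a std::unordered_map. Here we associate a "key", in this case the extension, with a value, in this case the count of that specific extension.

Fortunately, and this is basically the reason for choosing this approach, both maps have a very nice index operator[]. It looks up the given key and, if found, returns a reference to the value. If not found, it creates a new entry with the key (the extension) and a value-initialized value (zero for an int), and returns a reference to that new value. So in both cases we get a reference to the value used for counting. And then we can simply write: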

std::unordered_map<std::string, int> counter{};
counter[extension]++;

This looks really intuitive.

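After this operation, you already have the frequency table. It is either sorted by key (here the extension), when using a std::map, or unsorted but faster to access, when using a std::unordered_map.

In your case you are only interested in the count, so a std::unordered_map is preferable: there is no need to have the data sorted by key as in a std::map, and that ordering is not used later.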

Then, maybe you want to sort according to the frequency/count. If you do not want to do that, skip the following:

Sorting a map by its values is unfortunately not possible, because a major property of the map container family is that elements are organized by their key and not by their value or count.

Therefore we need to use a second container, like a std::vector, which we can then sort using std::sort with any given predicate. Or we can copy the values into a container like a std::multiset, which implicitly orders its elements. And because this is just a one-liner, this is the recommended solution.

Additionally, because writing out all these long names for the std containers is tedious, we create alias names with the using keyword.

After we have the rank of the extensions, so, the list of extensions sorted by their count, we can use iterators and loops to access the data and output them.

Because you want to read from a file, I would also like to give some additional information regarding opening and closing a file (a stream).

If you read about the ifstream, then you will see that it has a constructor, which takes a filename as input, and a destructor, which will automatically close the file for you. Opening a file via the constructor gives you a file stream variable which has a state. This is, by the way, true for any stream.

The background is that its bool operator is overloaded and will return the state of the stream. The not operator ! is overloaded as well and can be used. Because of those overloaded operators you can write something like if (filestream) to see if a file could be opened.

Additionally, since C++17, we have an extended if statement, where you can put an initializer statement in front of the condition. This is important because it allows us to define a variable that will be checked later, but with a scope limited to the if statement, including its else branch. In most cases this is very much recommended. Example:

// Open a file and check, if it could be opened
if (std::ifstream infile("file.txt"); infile) {

   // ....   Do things with the file stream

} // Here the file will be closed automatically by the destructor

Much better than unnecessary open and close statements.

And now, after having thought about the design, we can start to write code. Not before.

So, we now get:

#include <iostream>
#include <fstream>
#include <string>
#include <filesystem>
#include <unordered_map>
#include <set>
#include <type_traits>
#include <utility>

// ------------------------------------------------------------
// Create aliases. Save typing work and make code more readable
using Pair = std::pair<std::string, unsigned int>;

// Standard approach for counter
using Counter = std::unordered_map<Pair::first_type, Pair::second_type>;

// Sorted values will be stored in a multiset
struct Comp { bool operator ()(const Pair& p1, const Pair& p2) const { return (p1.second == p2.second) ? p1.first<p2.first : p1.second>p2.second; } };
using Sorter = std::multiset<Pair, Comp>;

// Namespace alias
namespace fs = std::filesystem;
// ------------------------------------------------------------


int main() {

    // Open the source file and check, if it could be opened
    if (std::ifstream inFileStream{ "r:\\file.txt" }; inFileStream) {

        // Here we will count the extensions of the file names
        Counter counter{};

        // Read source file strings and count the extensions
        std::string line{};
        // Read all lines from file
        while (std::getline(inFileStream, line))

            // Get extensions and count them
            counter[ fs::path{ line }.extension().string() ]++;

        // Show result to the user. 
        for (const Pair& p : counter) std::cout << p.first << " --> " << p.second << '\n';

    } // File will be closed here
    else {
        // file could not be opened
        std::cerr << "\n\n*** Error: Input file could not be opened\n\n";
    }
}

With only 8 statements in function main, we can do the whole task, including all kinds of path formats and error handling.

Further optimization is possible.

As I mentioned, it is always a good idea to keep the scope of a variable narrow. In the above code, the variable "line" is defined in the scope surrounding the while loop. That is not necessary. And because a for loop and a while loop are basically the same, we can better use a for loop, because it has an initializer part.

Instead of

        std::string line{};
        // Read all lines from file
        while (std::getline(inFileStream, line))

we can write

        for (std::string line{}; std::getline(inFileStream, line); )

We could even exaggerate a little and do the counting in the iteration expression of the for loop, so that the whole reading and counting happens in just one for statement:

        for (std::string line{}; std::getline(inFileStream, line); counter[fs::path{ line }.extension().string()]++)
            ;

So, do the complete reading of the file and the complete counting of all kinds of extensions in one statement in one line of code. Wow!

But readability is a little bit lower and we will not use that.

In the output statement, we could make things more readable. Basically, Pair with .first and .second is not that nice and understandable. C++ also has a solution for that. It is called structured binding.

With all the above, we come now to the final implementation, including output sorted by the frequency:

#include <iostream>
#include <fstream>
#include <string>
#include <filesystem>
#include <unordered_map>
#include <set>
#include <type_traits>
#include <utility>

// ------------------------------------------------------------
// Create aliases. Save typing work and make code more readable
using Pair = std::pair<std::string, unsigned int>;

// Standard approach for counter
using Counter = std::unordered_map<Pair::first_type, Pair::second_type>;

// Sorted values will be stored in a multiset
struct Comp { bool operator ()(const Pair& p1, const Pair& p2) const { return (p1.second == p2.second) ? p1.first<p2.first : p1.second>p2.second; } };
using Sorter = std::multiset<Pair, Comp>;

// Namespace alias
namespace fs = std::filesystem;
// ------------------------------------------------------------


int main() {

    // Open the source file and check, if it could be opened
    if (std::ifstream inFileStream{ "r:\\file.txt" }; inFileStream) {

        // Here we will count the extensions of the file names
        Counter counter{};

        // Read all lines from file
        for (std::string line{}; std::getline(inFileStream, line); )

            // Get extensions and count them
            counter[ fs::path{ line }.extension().string() ]++;

        Sorter sorter(counter.begin(), counter.end());

        // Show result to the user. 
        for (const auto& [extension, count] : sorter) std::cout << extension << " --> " << count << '\n';

    } // File will be closed here
    else {
        // file could not be opened
        std::cerr << "\n\n*** Error: Input file could not be opened\n\n";
    }
}
