引入

現在有這樣一個問題，給出\(n\)個單詞和\(m\)個詢問，每次詢問一個單詞，回答這個單詞是否在單詞表中出現過，

好像還行，用 map<string,bool> ，幾行就完事了，

那如果n的范圍是 \(10^5\) 呢？再用 \(map\) 妥妥的超時，說不定還會超記憶體，

這時候就需要一種強大的資料結構——字典樹

基本性質

字典樹，又叫Trie樹、前綴樹，用于統計，排序和保存大量的字串，經常被搜索引擎系統用于文本詞頻統計，

基本思想： 利用字串的公共前綴來減少查詢時間，最大限度地減少無謂的字串比較，

假設所有單詞都只由小寫字母構成，由（abd,abcd,b,bcd,efg,hil）構成的字典樹如下，（百科的圖最后少了一個字母，姑且認為它是'l'吧）

可以看出字典樹具有以下特點：

用邊表示字母
具有相同前綴的單詞共用前綴節點
每個節點最多有26個子節點（在單詞只包含小寫字母的情況下）
樹的根節點是空的

基本操作

資料結構定義

用pass記錄有多少字串經過該節點，就是多少單詞以根結點到該結點的邊組成的字串為前綴，

用end記錄有多少字串以該節點結尾，就是多少單詞是以根結點到該結點的邊組成的字串，

typedef struct node{
    int pass;//有多少單詞經過該結點
    int end;//有多少單詞以該結點結尾
    struct node* next[26];
}*trieTree;

插入

向字典樹中插入字串 \(S\)

void insert(trieTree T,string s) {
    node *n = T;
    for (int i = 0; i < s.length(); i++) {
        int index = s[i] - 'a';
        if (T->next[index] == NULL) {
            node *t = new node();
            T->next[index] = t;
        }
        T = T->next[index];
        T->pass++;
    }
    T->end++;
}

查找

查找文章中有多少單詞以字串 \(S\) 為前綴，

如果要查找字串 \(s\) 在文章中出現了多少次，則回傳值改成 T->end ，

int find(trieTree T, string s) {
    node *n = T;
    for (int i = 0; i < s.length(); i++) {
        int index = s[i] - 'a';
        if (T->next[index] == NULL) {
            return NULL;
        }
        T = T->next[index];
    }
    return T->pass;
}

完整實作

#include <iostream>
#include <string>
using namespace std;

typedef struct node{
    int pass;
    int end;
    struct node* next[26];
}*trieTree;

void insert(trieTree T,string s) {
    node *n = T;
    for (int i = 0; i < s.length(); i++) {
        int index = s[i] - 'a';
        if (T->next[index] == NULL) {
            node *t = new node();
            T->next[index] = t;
        }
        T = T->next[index];
        T->pass++;
    }
    T->end++;
}
int find(trieTree T, string s) {
    node *n = T;
    for (int i = 0; i < s.length(); i++) {
        int index = s[i] - 'a';
        if (T->next[index] == NULL) {
            return NULL;
        }
        T = T->next[index];
    }
    return T->pass;
}

map實作

用 node* next[26] 會浪費很多空間，因為不可能每個結點都用掉 26 個 next

#include <iostream>
#include <map>
#include <string>
using namespace std;
typedef struct node{
public:
    int pass;
    int end;
    map<char,struct node *>m;
}* trieTree;

void insert(trieTree T,string s) {
    for (int i = 0; i < s.length(); i++) {
        if (T->m.find(s[i]) == T->m.end()) {
            node *t = new node();
            T->m.insert(make_pair(s[i], t));
        }
        T = T->m[s[i]];
        T->pass++;
    }
    T->end++;
}
int find(trieTree T, string s) {
    node *n = T;
    for (int i = 0; i < s.length(); i++) {
        if (T->m.find(s[i]) == T->m.end()) {
            return NULL;
        }
        T = T->m[s[i]];
    }
    return T->pass;
}

適用例題

前綴匹配、字串檢索、詞頻統計，這些差不多都是一類題目，具體實作有一點點不同，

比如前綴匹配，我們只需要pass就行了，用不到end；詞頻統計的話，我們又只用得到end了；如果只是字串檢索的話，那更方便了，end定義成bool變數就行了，具體用啥，怎么用要變通，

題目鏈接： http://acm.hdu.edu.cn/showproblem.php?pid=1251

這題有點小坑，用 node* next[26] 交G++會超記憶體，交C++就不會，但確實用陣列會浪費很多空間，推薦使用map實作，

#include <iostream>
#include <map>
#include <string>
using namespace std;

typedef struct node{
    int pass;
    map<char,struct node *>m;
}*trieTree;

void insert(trieTree T,string s) {
    for (int i = 0; i < s.length(); i++) {
        if (T->m.find(s[i]) == T->m.end()) {
            node *t = new node();
            T->m.insert(make_pair(s[i], t));
        }
        T = T->m[s[i]];
        T->pass++;
    }
}

int find(trieTree T, string s) {
    node *n = T;
    for (int i = 0; i < s.length(); i++) {
        if (T->m.find(s[i]) == T->m.end()) {
            return NULL;
        }
        T = T->m[s[i]];
    }
    return T->pass;
}
int main() {
    trieTree T = new node();
    string s;
    while (getline(cin,s)) {
        if (s.empty()) break;
        insert(T, s);
    }

    while (getline(cin,s)) {
        cout << find(T, s) << endl;
    }
    return 0;
}

此外，還適用于字串排序，字典樹是一棵多叉樹，只要先序遍歷整棵樹，輸出相應的字串便是按字典序排序的結果，

轉載請註明出處，本文鏈接：https://www.uj5u.com/qita/127159.html

標籤：其他

上一篇：Codeforces Round #599 (Div. 2)

下一篇：AtCoder - 2037 （dp）

資料結構篇——字典樹（trie樹）

引入