Redis Source Code Analysis: AOF, Worry-Free Persistence

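Picking up where we left off: last time we covered Redis's RDB mechanism in detail. RDB solves only part of Redis's persistence problem. Why only part? Because an RDB file is a snapshot of Redis at a single moment, so any data changed after that snapshot is not persisted until the next RDB backup. Generating an RDB is expensive, however, so it cannot be done very often. To make up for this gap, Redis provides AOF.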

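AOF stands for Append Only File, and its source lives in aof.c. The key word is Append. The core idea is simple: after executing a command (SET, DEL, EXPIRE, ...) that changes data, Redis records that operation as a log entry in the AOF file. If the server crashes, Redis reloads the AOF file and replays all the logged write commands to restore the data. As long as the log has been fully flushed to disk, no data is lost.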
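To make the append-and-replay idea concrete, here is a minimal sketch in C. It is a toy, not Redis code: the `Store` type, the `exec_and_log()`/`replay()` helpers and the one-line-per-command text format are all hypothetical, and the log lives in a `tmpfile()` instead of appendonly.aof. Every write is applied in memory and then appended to the log; recovery starts from an empty store and re-applies the log from the top.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy key/value store with a few fixed-size slots. */
#define MAX_KEYS 16
#define MAX_LEN  32

typedef struct {
    char key[MAX_LEN];
    char val[MAX_LEN];
    int used;
} Slot;

typedef struct {
    Slot slots[MAX_KEYS];
} Store;

/* Return the slot holding key, or NULL if the key does not exist. */
Slot *toy_find(Store *s, const char *key) {
    for (int i = 0; i < MAX_KEYS; i++)
        if (s->slots[i].used && strcmp(s->slots[i].key, key) == 0)
            return &s->slots[i];
    return NULL;
}

/* Apply one write command ("SET" or "DEL") to the in-memory store. */
void toy_apply(Store *s, const char *op, const char *key, const char *val) {
    Slot *slot = toy_find(s, key);
    if (strcmp(op, "SET") == 0) {
        for (int i = 0; i < MAX_KEYS && slot == NULL; i++)
            if (!s->slots[i].used) slot = &s->slots[i];
        assert(slot != NULL);               /* the toy store is full */
        slot->used = 1;
        snprintf(slot->key, MAX_LEN, "%s", key);
        snprintf(slot->val, MAX_LEN, "%s", val);
    } else if (strcmp(op, "DEL") == 0 && slot != NULL) {
        slot->used = 0;
    }
}

/* Execute a write: change memory first, then append the command to the log. */
void exec_and_log(Store *s, FILE *aof, const char *op,
                  const char *key, const char *val) {
    toy_apply(s, op, key, val);
    fprintf(aof, "%s %s %s\n", op, key, val != NULL ? val : "-");
}

/* Recover: start from an empty store and replay every logged command. */
void replay(Store *s, FILE *aof) {
    char op[8], key[MAX_LEN], val[MAX_LEN];
    memset(s, 0, sizeof *s);
    rewind(aof);                            /* also flushes pending output */
    while (fscanf(aof, "%7s %31s %31s", op, key, val) == 3)
        toy_apply(s, op, key, val);
}

/* Look up a key; returns NULL if it does not exist. */
const char *toy_get(Store *s, const char *key) {
    Slot *slot = toy_find(s, key);
    return slot != NULL ? slot->val : NULL;
}
```

Because replay re-executes the commands in their original order, the recovered store ends up in the same state as before the "crash", which is the guarantee AOF relies on. Real Redis writes the commands in RESP format rather than this ad-hoc text format.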

Configuration

AOF has only a few configuration options:

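appendonly no  # AOF switch, off by default
appendfilename "appendonly.aof"  # name of the AOF file, appendonly.aof by default
# There are three policies for flushing data to disk
appendfsync always  # fsync whenever data changes: safest, but slowest
appendfsync everysec  # fsync once per second: a compromise between safety and performance, and Redis's default and recommended setting
appendfsync no  # never fsync explicitly; the OS decides when data reaches disk (roughly every 30 seconds on most Linux systems): fastest, but least safe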
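The three appendfsync options differ only in when fsync() is called after the buffered AOF data has been written to the file. A minimal sketch of that decision, assuming a hypothetical `FsyncPolicy` enum and `should_fsync()` helper (neither is Redis's own code; Redis additionally hands the everysec fsync to a background thread so the main thread does not block):

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical mirror of the three appendfsync settings. */
typedef enum { FSYNC_ALWAYS, FSYNC_EVERYSEC, FSYNC_NO } FsyncPolicy;

/* Decide whether to fsync the AOF now, given the time of the last fsync. */
bool should_fsync(FsyncPolicy policy, time_t now, time_t last_fsync) {
    switch (policy) {
    case FSYNC_ALWAYS:
        return true;                     /* after every write to the AOF */
    case FSYNC_EVERYSEC:
        return now - last_fsync >= 1;    /* at most about once per second */
    case FSYNC_NO:
        return false;                    /* leave it to the operating system */
    }
    assert(0 && "unknown appendfsync policy");
    return false;
}
```

The trade-off follows directly: with everysec a crash can lose roughly the last second of writes, while with no it can lose whatever the OS had not yet written back.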

Source Code

How AOF Is Triggered

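How is AOF implemented, and how is it triggered? Let's look at the source in detail.
In server.c, void call(client *c, int flags) is the entry point Redis uses to process a request after receiving it from a client. It checks whether the command changed any data in Redis and, if so, calls propagate():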

    dirty = server.dirty;
    prev_err_count = server.stat_total_error_replies;
    updateCachedTime(0);
    elapsedStart(&call_timer);
    c->cmd->proc(c); // execute the command
    const long duration = elapsedUs(call_timer);
    c->duration = duration;
    dirty = server.dirty-dirty;
    if (dirty < 0) dirty = 0;

void propagate(struct redisCommand *cmd, int dbid, robj **argv, int argc,
               int flags)
{
    if (server.in_exec && !server.propagate_in_transaction)
        execCommandPropagateMulti(dbid);

    /* This needs to be unreachable since the dataset should be fixed during 
     * client pause, otherwise data may be lost during a failover. */
    serverAssert(!(areClientsPaused() && !server.client_pause_in_transaction));

    if (server.aof_state != AOF_OFF && flags & PROPAGATE_AOF)
        feedAppendOnlyFile(cmd,dbid,argv,argc); // if AOF is enabled, feed the command to the AOF
    if (flags & PROPAGATE_REPL)
        replicationFeedSlaves(server.slaves,dbid,argv,argc);
}

propagate() sends commands that changed data to the replicas (slaves) and to the AOF. Here we only care about AOF, so let's take a closer look at feedAppendOnlyFile().

Generating AOF Data

void feedAppendOnlyFile(struct redisCommand *cmd, int dictid, robj **argv, int argc) {
    sds buf = sdsempty();
    /* The DB this command was targeting is not the same as the last command
     * we appended. To issue a SELECT command is needed. */
    if (dictid != server.aof_selected_db) {
        char seldb[64];

        snprintf(seldb,sizeof(seldb),"%d",dictid);
        buf = sdscatprintf(buf,"*2\r\n$6\r\nSELECT\r\n$%lu\r\n%s\r\n",
            (unsigned long)strlen(seldb),seldb);
        server.aof_selected_db = dictid;
    }

    if (cmd->proc == expireCommand || cmd->proc == pexpireCommand ||
        cmd->proc == expireatCommand) {
        /* Translate EXPIRE/PEXPIRE/EXPIREAT into PEXPIREAT */
        buf = catAppendOnlyExpireAtCommand(buf,cmd,argv[1],argv[2]);
    } else if (cmd->proc == setCommand && argc > 3) {
        robj *pxarg = NULL;
        /* When SET is used with EX/PX argument setGenericCommand propagates them with PX millisecond argument.
         * So since the command arguments are re-written there, we can rely here on the index of PX being 3. */
        if (!strcasecmp(argv[3]->ptr, "px")) {
            pxarg = argv[4];
        }
        /* Convert the relative expire time of SET's PX argument into an absolute time (ms). */
        if (pxarg) {
            robj *millisecond = getDecodedObject(pxarg);
            long long when = strtoll(millisecond->ptr,NULL,10);
            when += mstime();

            decrRefCount(millisecond);

            robj *newargs[5];
            newargs[0] = argv[0];
            newargs[1] = argv[1];
            newargs[2] = argv[2];
            newargs[3] = shared.pxat;
            newargs[4] = createStringObjectFromLongLong(when);
            buf = catAppendOnlyGenericCommand(buf,5,newargs);
            decrRefCount(newargs[4]);
        } else {
            buf = catAppendOnlyGenericCommand(buf,argc,argv);
        }
    } else {
        /* All other commands need no translation */
        buf = catAppendOnlyGenericCommand(buf,argc,argv);
    }

    /* Append to the AOF buffer. It will be flushed to disk just before
     * re-entering the event loop, so before the client gets a positive reply. */
    if (server.aof_state == AOF_ON)
        server.aof_buf = sdscatlen(server.aof_buf,buf,sdslen(buf));

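    /* If a background AOF rewrite is in progress, we want to accumulate the
     * differences between the child DB and the current one in a buffer, so
     * that when the child finishes its work we can append them to the new AOF file. */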
    if (server.child_type == CHILD_TYPE_AOF)
        aofRewriteBufferAppend((unsigned char*)buf,sdslen(buf));

    sdsfree(buf);
}
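Why rewrite relative expire times? If the AOF stored `EXPIRE key 10` and the file were replayed an hour later, the key would get a fresh 10-second TTL. Storing an absolute Unix time in milliseconds (PEXPIREAT, or PXAT for SET) keeps the original deadline no matter when the file is loaded. A minimal sketch of the arithmetic; the `ExpireKind` enum and `to_abs_ms()` are hypothetical, and the current time is passed in as `now_ms` so the results are deterministic:

```c
#include <assert.h>

/* Hypothetical tags for the commands feedAppendOnlyFile() rewrites. */
typedef enum { CMD_EXPIRE, CMD_PEXPIRE, CMD_EXPIREAT, CMD_SET_PX } ExpireKind;

/* Turn an expire argument into an absolute Unix time in milliseconds. */
long long to_abs_ms(ExpireKind kind, long long arg, long long now_ms) {
    switch (kind) {
    case CMD_EXPIRE:   return now_ms + arg * 1000;  /* relative seconds */
    case CMD_PEXPIRE:                               /* relative milliseconds */
    case CMD_SET_PX:   return now_ms + arg;         /* same as PEXPIRE */
    case CMD_EXPIREAT: return arg * 1000;           /* absolute seconds */
    }
    assert(0 && "unknown expire command");
    return -1;
}
```

For example, a `SET ... PX 5000` logged at `now_ms = 1000000` is stored with deadline 1005000; replayed 3 seconds later, the key has 2000 ms left instead of a fresh 5000 ms.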

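There is no complicated logic here. Apart from emitting a SELECT when the target DB changes and turning relative expire times into absolute ones, the function converts the command into a string in RESP protocol format (RESP will be covered in detail in a later post) and appends it to server.aof_buf. At this point the AOF data is still only in the buffer and has not been written to disk. So when does the data in the buffer get written to disk?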
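The RESP encoding used for each AOF entry is simple: an array header `*<argc>\r\n`, then each argument as a bulk string `$<length>\r\n<bytes>\r\n`. The SELECT prefix built in feedAppendOnlyFile() follows the same shape. Here is a hedged sketch of the kind of output catAppendOnlyGenericCommand() produces; `resp_encode()` is a hypothetical helper that assumes NUL-terminated text arguments, whereas Redis's own encoding is binary-safe:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Encode argv as a RESP array of bulk strings, e.g. SET k v becomes
 * "*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n".
 * Returns a malloc'd, NUL-terminated string that the caller must free. */
char *resp_encode(int argc, const char **argv) {
    size_t cap = 32;                        /* "*<argc>\r\n" header */
    for (int i = 0; i < argc; i++)
        cap += strlen(argv[i]) + 32;        /* "$<len>\r\n" + data + "\r\n" */
    char *out = malloc(cap);
    assert(out != NULL);
    size_t n = (size_t)snprintf(out, cap, "*%d\r\n", argc);
    for (int i = 0; i < argc; i++)
        n += (size_t)snprintf(out + n, cap - n, "$%zu\r\n%s\r\n",
                              strlen(argv[i]), argv[i]);
    return out;
}
```

Encoding `SELECT 1` this way gives `*2\r\n$6\r\nSELECT\r\n$1\r\n1\r\n`, the same bytes the sdscatprintf() call in feedAppendOnlyFile() builds for DB 1.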

Flushing Data

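The core flushing logic lives in flushAppendOnlyFile(), which is called from serverCron, beforeSleep and prepareForShutdown. It writes the contents of server.aof_buf to the AOF file and, depending on the appendfsync setting, fsyncs it to disk. The function is long and complicated, but most of it is error handling and performance monitoring; once you skip those parts it is fairly easy to follow, so it is not listed here. See aof.c for the details.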
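Stripped of error handling and metrics, the heart of a flush is "write the whole buffer, then maybe fsync". A simplified POSIX sketch of that step; `flush_buffer()` is hypothetical and is not flushAppendOnlyFile(), which handles short writes and errors more carefully and hands everysec fsyncs to a background thread:

```c
#define _XOPEN_SOURCE 700   /* expose fsync() under strict C modes */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying short writes and EINTR,
 * then fsync() if requested. Returns 0 on success, -1 on error. */
int flush_buffer(int fd, const char *buf, size_t len, bool do_fsync) {
    assert(buf != NULL || len == 0);
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;               /* real error: let the caller decide */
        }
        done += (size_t)n;           /* short write: loop for the rest */
    }
    if (do_fsync && fsync(fd) != 0)
        return -1;
    return 0;
}
```

The write() alone only hands the bytes to the kernel's page cache; it is the fsync() that forces them onto the disk, which is exactly the step the appendfsync policy controls.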

RDB vs AOF

Finally, let's compare RDB and AOF: their respective strengths and weaknesses, and how to choose between them.

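Advantages of RDB

RDB is a compact, compressed data format, which makes it well suited for backups.
For the same data set, an RDB file is much smaller, which makes it well suited for transfer and data recovery.
RDB has little impact on Redis's read/write performance: to generate an RDB, the Redis main process forks a child process that does the work, so the main process keeps serving reads and writes.
RDB data loads faster, so recovery is faster.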

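Disadvantages of RDB

RDB is a periodic backup: if Redis crashes before the next backup, the changes made since the last snapshot may be lost.
Generating an RDB relies on Linux fork(); with a large data set, forking can noticeably affect server performance.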

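Advantages of AOF

AOF is a continuous backup, so it keeps data loss as small as possible.
When the AOF grows too large, Redis can automatically rewrite it in the background. The rewrite is completely safe: while Redis keeps appending to the old file, it produces a brand-new file containing the minimal set of operations needed to recreate the current data set. Once the second file is ready, Redis switches the two files and starts appending to the new one.
The AOF format is simple and easy to parse.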
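Why is the rewritten file smaller? Many logged commands overwrite or delete the same keys, and only the final state matters. Redis's rewrite obtains that final state by having a child process walk the in-memory data set rather than reading the old file; the hypothetical `rewrite_log()` below reaches the same end result from the log itself, purely to illustrate the size reduction:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical log entry: "SET key val" or "DEL key" (val unused). */
typedef struct {
    const char *op;
    const char *key;
    const char *val;
} Entry;

/* Compact a command log into the smallest equivalent one: a single SET per
 * key that still exists, carrying its final value. out must have room for
 * n entries; returns how many entries were written. */
size_t rewrite_log(const Entry *entries, size_t n, Entry *out) {
    assert(n == 0 || (entries != NULL && out != NULL));
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;                       /* find the key among kept entries */
        while (j < m && strcmp(out[j].key, entries[i].key) != 0)
            j++;
        if (strcmp(entries[i].op, "SET") == 0) {
            out[j] = entries[i];            /* new key, or newer value wins */
            if (j == m)
                m++;
        } else if (strcmp(entries[i].op, "DEL") == 0 && j < m) {
            out[j] = out[--m];              /* drop key: move last entry here */
        }
    }
    return m;
}
```

With the log `SET a 1, SET b 2, SET a 3, DEL b, SET c 4`, five commands collapse into `SET a 3, SET c 4`, and replaying those two yields the same data set.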

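Disadvantages of AOF

For the same data set, an AOF file is usually larger than the equivalent RDB file.
Depending on the fsync policy, AOF can be slower than RDB.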

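Choosing Between RDB and AOF

If you need maximum performance and don't care about data recovery, you can turn both off. If you care about performance and data availability but don't need complete data integrity, RDB is enough. If data integrity and crash recovery matter a lot to you, enable both RDB and AOF.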


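This post is part of my Redis source code analysis series, which comes with a matching copy of the Redis source annotated with Chinese comments. If you want to study Redis in depth, feel free to star and follow it.
Annotated Redis repository (Chinese comments): https://github.com/xindoo/Redis
Redis source code analysis column: https://zxs.io/s/1h
If you found this article useful, please like, favorite, and share it.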

