Author: Wu Fan, member of the QingCloud database team

Mainly responsible for maintaining and developing the MySQL and ClickHouse products; skilled in troubleshooting and performance optimization.

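In a multi-replica distributed ClickHouse cluster, data is usually written and read through Distributed tables. The Distributed table engine stores no data itself. It acts as a transparent proxy layer over the distributed data and automatically handles writing, distribution, querying, and routing inside the cluster.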
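The routing step can be made concrete with a minimal Python sketch of how a Distributed table might pick a shard for a row. It assumes the weight-based rule from the ClickHouse documentation: the remainder of the sharding key divided by the total shard weight selects the shard whose weight interval contains it. `pick_shard` is a hypothetical helper for illustration, not a ClickHouse API.

```python
def pick_shard(sharding_key: int, weights: list[int]) -> int:
    """Return the 0-based index of the shard that should receive a row.

    Simplified model (assumption): shard i owns the half-open interval
    [sum of previous weights, sum of previous weights + weights[i]), and the
    row goes to the shard whose interval contains sharding_key % total weight.
    """
    total = sum(weights)
    if total <= 0 or any(weight < 0 for weight in weights):
        raise ValueError("weights must be non-negative and add up to a positive number")
    remainder = sharding_key % total
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        if remainder < upper:
            return index
    raise AssertionError("unreachable: the remainder is always below the total weight")


# The clusters in this article have a single shard, so every CounterID goes to shard 0.
print([pick_shard(counter_id, [1]) for counter_id in (1, 2, 3)])  # [0, 0, 0]

# A hypothetical two-shard cluster with weights 1 and 2.
print([pick_shard(counter_id, [1, 2]) for counter_id in (1, 2, 3)])  # [1, 1, 0]
```

With a single shard the sharding key has no effect on placement. It only matters once the cluster has several shards.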

There are two schemes for replicating data through Distributed tables:

Distributed + MergeTree
Distributed + ReplicatedMergeTree

| Distributed + MergeTree

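With this scheme, internal_replication must be set to false. When data is inserted into the Distributed table, the Distributed table itself writes it to every replica of the target shard. The Distributed node is therefore responsible for writing data to all shards and all of their replicas.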
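A minimal Python model of this fan-out follows. It represents a cluster as a list of shards, each holding a list of replica host names. `write_all_replicas` and the in-memory `storage` dict are hypothetical stand-ins for the Distributed node and for the replicas' local MergeTree tables.

```python
def write_all_replicas(cluster, shard_index, rows, storage):
    """Model of internal_replication = false.

    cluster: list of shards, each a list of replica host names.
    storage: dict mapping host name -> rows stored on that host (hypothetical).
    The Distributed node sends the same rows to every replica of the shard
    itself and returns how many copies it had to send.
    """
    copies_sent = 0
    for host in cluster[shard_index]:
        storage.setdefault(host, []).extend(rows)
        copies_sent += 1
    return copies_sent


cluster = [["shard1-repl1", "shard1-repl2"]]
rows = [("2019-01-16 00:00:00", 1, 1), ("2019-02-10 00:00:00", 2, 2), ("2019-03-10 00:00:00", 3, 3)]
storage = {}
print(write_all_replicas(cluster, 0, rows, storage))        # 2
print(storage["shard1-repl1"] == storage["shard1-repl2"])  # True
```

Each replica receives its own independent write. This is why the replicas agree on content but not necessarily on how that content is laid out on disk.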


1. Cluster configuration

<!-- Defined inside the <remote_servers> section of the server configuration -->
<logical_consistency_cluster>
    <shard>
        <internal_replication>false</internal_replication>
        <replica>
            <host>shard1-repl1</host>
            <port>9000</port>
        </replica>
        <replica>
            <host>shard1-repl2</host>
            <port>9000</port>
        </replica>
    </shard>
</logical_consistency_cluster>
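Before deploying, it can help to confirm which mode each shard is in. The Python sketch below reads a cluster fragment like the one above with the standard library's `xml.etree.ElementTree`. `describe_cluster` is a hypothetical helper. A missing `internal_replication` is treated as `false`, which follows ClickHouse's documented default. Accepting `"1"` as true is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

# The cluster fragment from step 1 (normally nested inside <remote_servers>).
CONFIG = """
<logical_consistency_cluster>
    <shard>
        <internal_replication>false</internal_replication>
        <replica><host>shard1-repl1</host><port>9000</port></replica>
        <replica><host>shard1-repl2</host><port>9000</port></replica>
    </shard>
</logical_consistency_cluster>
"""


def describe_cluster(xml_text):
    """Summarize one cluster definition: its name and, per shard, the
    internal_replication flag (false when omitted) and its host:port replicas."""
    root = ET.fromstring(xml_text.strip())
    shards = []
    for shard in root.findall("shard"):
        # "true" or "1" is treated as true here (assumption of this sketch).
        flag = (shard.findtext("internal_replication") or "false").strip().lower()
        replicas = [
            f"{replica.findtext('host')}:{replica.findtext('port')}"
            for replica in shard.findall("replica")
        ]
        shards.append({"internal_replication": flag in ("true", "1"), "replicas": replicas})
    return {"cluster": root.tag, "shards": shards}


print(describe_cluster(CONFIG))
```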

2. Writing data

CREATE TABLE test.t_local  on cluster logical_consistency_cluster
(
    EventDate DateTime,
    CounterID UInt32,
    UserID UInt32
) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate);

CREATE TABLE test.t_logical_Distributed on cluster logical_consistency_cluster
(
    EventDate DateTime,
    CounterID UInt32,
    UserID UInt32
)
ENGINE = Distributed(logical_consistency_cluster, test, t_local, CounterID) ;

INSERT INTO test.t_logical_Distributed VALUES ('2019-01-16 00:00:00', 1, 1),('2019-02-10 00:00:00',2, 2),('2019-03-10 00:00:00',3, 3)

3. Querying data

# shard1-repl1

SELECT *
FROM test.t_local

Query id: bd031554-b1e0-4fda-9ff8-1145ffae5b02

┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-03-10 00:00:00 │         3 │      3 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-02-10 00:00:00 │         2 │      2 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-01-16 00:00:00 │         1 │      1 │
└─────────────────────┴───────────┴────────┘

3 rows in set. Elapsed: 0.004 sec. 

------------------------------------------

# shard1-repl2

SELECT *
FROM test.t_local

Query id: 636f7580-02e0-4279-bc9b-1f153c0473dc

┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-01-16 00:00:00 │         1 │      1 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-03-10 00:00:00 │         3 │      3 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-02-10 00:00:00 │         2 │      2 │
└─────────────────────┴───────────┴────────┘

3 rows in set. Elapsed: 0.005 sec.

The write test shows that the data on each replica is identical.

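Replication works even though the local table does not use the ReplicatedMergeTree engine. However, the Distributed table writes each replica's data independently, so the on-disk storage format is not guaranteed to be identical across replicas. This approach can be thought of as logical consistency.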
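"Logical consistency" can be pinned down with a small check. Two replicas are logically consistent when they hold the same multiset of rows, regardless of how those rows are split into parts. The Python sketch below models each replica as a list of parts, assumed here to be one part per result block in the query output above, and each part as a list of row tuples. `logically_consistent` is a hypothetical helper.

```python
from collections import Counter

# Rows as (EventDate, CounterID, UserID); one inner list per part (assumed layout).
repl1 = [[("2019-03-10 00:00:00", 3, 3)], [("2019-02-10 00:00:00", 2, 2)], [("2019-01-16 00:00:00", 1, 1)]]
repl2 = [[("2019-01-16 00:00:00", 1, 1)], [("2019-03-10 00:00:00", 3, 3)], [("2019-02-10 00:00:00", 2, 2)]]


def logically_consistent(replica_a, replica_b):
    """True when both replicas hold the same multiset of rows,
    no matter how the rows are split into parts or in what order."""
    rows_a = Counter(row for part in replica_a for row in part)
    rows_b = Counter(row for part in replica_b for row in part)
    return rows_a == rows_b


print(logically_consistent(repl1, repl2))      # True
print(logically_consistent(repl1, repl2[:2]))  # False: one part is missing
```

A Counter is used instead of a set so that duplicate rows are counted. A replica holding a row twice is not logically equal to one holding it once.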

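Because the Distributed table must handle writes for both shards and replicas, the single write entry point can easily become a system performance bottleneck. This motivates the second scheme.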
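The bottleneck is easy to quantify. Assume an inserted block contains rows for every shard. When `internal_replication` is false, the Distributed node sends one copy per replica. When it is true, the node sends only one copy per shard. A short Python sketch of that arithmetic follows; `copies_sent_per_insert` is a hypothetical name.

```python
def copies_sent_per_insert(replicas_per_shard, internal_replication):
    """How many copies of an inserted block the Distributed node sends,
    assuming the block contains rows for every shard.

    replicas_per_shard: replica count of each shard, e.g. [2, 2, 2].
    """
    if internal_replication:
        return len(replicas_per_shard)  # one chosen replica per shard
    return sum(replicas_per_shard)      # every replica of every shard


for layout in ([2], [2, 2, 2]):
    print(layout, copies_sent_per_insert(layout, False), copies_sent_per_insert(layout, True))
# [2] 2 1
# [2, 2, 2] 6 3
```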

| Distributed + ReplicatedMergeTree

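With this scheme, internal_replication must be set to true. When data is inserted into the Distributed table, the Distributed table selects one suitable replica in each shard and writes the data only to that replica.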

ReplicatedMergeTree then handles replication among the replicas within a shard; the Distributed table is no longer responsible for it.
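Here is a simplified Python model of this write path. It assumes the replica with the fewest recorded errors is chosen, with ties broken by configuration order. That only resembles ClickHouse's `load_balancing = in_order` behaviour, because the real choice also depends on the `load_balancing` setting. `choose_replica` and `write_one_replica_per_shard` are hypothetical helpers.

```python
def choose_replica(replicas, error_counts):
    """Pick the replica with the fewest recorded errors; ties keep configuration order.

    Simplified assumption: min() returns the first of several equal minima,
    which mimics an in-order preference among equally healthy replicas.
    """
    return min(replicas, key=lambda host: error_counts.get(host, 0))


def write_one_replica_per_shard(cluster, rows_by_shard, error_counts, storage):
    """Model of internal_replication = true: one replica per shard receives the rows;
    copying them to the other replicas is left to ReplicatedMergeTree."""
    targets = []
    for shard_index, rows in rows_by_shard.items():
        host = choose_replica(cluster[shard_index], error_counts)
        storage.setdefault(host, []).extend(rows)
        targets.append(host)
    return targets


cluster = [["shard1-repl1", "shard1-repl2"]]
storage = {}
print(write_one_replica_per_shard(cluster, {0: [("2019-01-16 00:00:00", 1, 1)]}, {}, storage))
# ['shard1-repl1']
print(choose_replica(cluster[0], {"shard1-repl1": 3}))  # shard1-repl2
```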


1. Configuration file

<!-- Defined inside the <remote_servers> section of the server configuration -->
<physical_consistency_cluster>
    <shard>
        <internal_replication>true</internal_replication>
        <replica>
            <host>shard1-repl1</host>
            <port>9000</port>
        </replica>
        <replica>
            <host>shard1-repl2</host>
            <port>9000</port>
        </replica>
    </shard>
</physical_consistency_cluster>

2. Writing data

CREATE TABLE test.t_local on cluster  physical_consistency_cluster 
(
    EventDate DateTime,
    CounterID UInt32,
    UserID UInt32
)
ENGINE = ReplicatedMergeTree('{namespace}/test/t_local', '{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID);
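The `'{namespace}/test/t_local'` and `'{replica}'` arguments are expanded from each server's `<macros>` configuration. Replicas of the same table must end up with the same ZooKeeper path and different replica names. The Python sketch below illustrates that substitution. The macro values are hypothetical, and `expand_macros` only mimics the idea, not ClickHouse's exact expansion rules.

```python
import re


def expand_macros(template, macros):
    """Replace {name} placeholders with this server's macro values."""
    def substitute(match):
        name = match.group(1)
        if name not in macros:
            raise KeyError(f"macro {{{name}}} is not defined on this server")
        return macros[name]
    return re.sub(r"\{(\w+)\}", substitute, template)


# Hypothetical <macros> sections of the two servers; only {replica} differs.
servers = {
    "shard1-repl1": {"namespace": "/clickhouse/demo", "replica": "shard1-repl1"},
    "shard1-repl2": {"namespace": "/clickhouse/demo", "replica": "shard1-repl2"},
}
for host, macros in servers.items():
    print(host, expand_macros("{namespace}/test/t_local", macros), expand_macros("{replica}", macros))
# shard1-repl1 /clickhouse/demo/test/t_local shard1-repl1
# shard1-repl2 /clickhouse/demo/test/t_local shard1-repl2
```

If the expanded paths differed between the two servers, they would register as two unrelated tables instead of two replicas of one table.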



CREATE TABLE test.t_physical_Distributed on cluster physical_consistency_cluster
(
    EventDate DateTime,
    CounterID UInt32,
    UserID UInt32
)
ENGINE = Distributed(physical_consistency_cluster, test, t_local, CounterID);

INSERT INTO test.t_physical_Distributed VALUES ('2019-01-16 00:00:00', 1, 1),('2019-02-10 00:00:00',2, 2),('2019-03-10 00:00:00',3, 3)

3. Querying data

# shard1-repl1

SELECT *
FROM test.t_local

Query id: d2bafd2d-d0a8-41b4-8d79-ece37e8159e5

┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-03-10 00:00:00 │         3 │      3 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-02-10 00:00:00 │         2 │      2 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-01-16 00:00:00 │         1 │      1 │
└─────────────────────┴───────────┴────────┘

3 rows in set. Elapsed: 0.004 sec. 

------------------------------------------

# shard1-repl2

SELECT *
FROM test.t_local

Query id: b5f0dc80-f73f-427e-b04e-e5b787876462

┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-01-16 00:00:00 │         1 │      1 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-03-10 00:00:00 │         3 │      3 │
└─────────────────────┴───────────┴────────┘
┌───────────EventDate─┬─CounterID─┬─UserID─┐
│ 2019-02-10 00:00:00 │         2 │      2 │
└─────────────────────┴───────────┴────────┘

3 rows in set. Elapsed: 0.005 sec.

ReplicatedMergeTree relies on ZooKeeper's watch (event notification) mechanism to coordinate its replicas. The core coordination flows are INSERT, MERGE, MUTATION, and ALTER.
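The INSERT flow can be sketched as a shared log that replicas replay. In this simplified Python model, a list stands in for the replication log kept in ZooKeeper. The receiving replica publishes a `GET_PART` entry, a name borrowed from ClickHouse's log entry types purely as a label, and the other replicas copy the part from the source. MERGE, MUTATION, and ALTER entries are omitted, and the part names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    parts: dict = field(default_factory=dict)  # part name -> tuple of rows
    log_pointer: int = 0                       # next shared-log entry to apply


def insert(shared_log, replica, part_name, rows):
    """INSERT flow: the receiving replica writes the part locally,
    then publishes a log entry telling the others to fetch it."""
    replica.parts[part_name] = tuple(rows)
    shared_log.append(("GET_PART", part_name, replica.name))


def pull(shared_log, replica, replicas_by_name):
    """Replay every unseen entry; GET_PART copies the part from its source replica."""
    while replica.log_pointer < len(shared_log):
        action, part_name, source = shared_log[replica.log_pointer]
        if action == "GET_PART" and part_name not in replica.parts:
            replica.parts[part_name] = replicas_by_name[source].parts[part_name]
        replica.log_pointer += 1


shared_log = []  # stands in for the replication log in ZooKeeper
r1, r2 = Replica("shard1-repl1"), Replica("shard1-repl2")
replicas_by_name = {r.name: r for r in (r1, r2)}
insert(shared_log, r1, "201901_0_0_0", [("2019-01-16 00:00:00", 1, 1)])
pull(shared_log, r2, replicas_by_name)
print(r1.parts == r2.parts)  # True
```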

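The write test shows that the data on each replica is again identical. The replicas synchronize metadata through ZooKeeper, which keeps the on-disk storage format fully identical. This approach can be thought of as physical consistency.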
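Physical consistency is stricter than logical consistency. Beyond holding the same rows, the replicas must hold the same parts with identical contents. The Python sketch below compares per-part digests. SHA-256 over the serialized rows is a stand-in for ClickHouse's own part checksums, and the part names and rows are hypothetical.

```python
import hashlib


def part_checksums(parts):
    """Map each part name to a SHA-256 digest of its serialized rows.

    Stand-in assumption: ClickHouse keeps its own per-part checksums in a
    different format; only the idea of comparing digests carries over.
    """
    return {
        name: hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()
        for name, rows in parts.items()
    }


def physically_consistent(parts_a, parts_b):
    """True when both replicas hold the same part names with identical contents."""
    return part_checksums(parts_a) == part_checksums(parts_b)


# Hypothetical January rows and part names.
rows = (("2019-01-16 00:00:00", 1, 1), ("2019-01-20 00:00:00", 4, 4))
fetched = {"part_1": rows}                            # replica that fetched the same part
rewritten = {"part_1": rows[:1], "part_2": rows[1:]}  # same rows, written as two parts
print(physically_consistent(fetched, dict(fetched)))  # True
print(physically_consistent(fetched, rewritten))      # False, although the rows match
```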

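ReplicatedMergeTree is the most commonly used scheme in distributed clusters, but its data synchronization depends on ZooKeeper. In workloads with frequent DDL, ZooKeeper often becomes the system's performance bottleneck and can even make the service unavailable.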

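We therefore need to reduce the load on ZooKeeper. One option is to use the first scheme and spread writes across the Distributed nodes with round-robin load balancing, which lowers the write pressure on any single node.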
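On the client side, round-robin can be as simple as cycling through the hosts that expose the Distributed table. This Python sketch shows one way a writer might spread INSERTs, which is an assumption of the sketch. In production the same role is often played by a load balancer or proxy in front of the nodes. `RoundRobinWriter` is a hypothetical class.

```python
import itertools


class RoundRobinWriter:
    """Hand out entry hosts in turn so that no single node receives every INSERT."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts = itertools.cycle(list(hosts))

    def next_host(self):
        """Return the host that should receive the next INSERT."""
        return next(self._hosts)


writer = RoundRobinWriter(["shard1-repl1", "shard1-repl2"])
print([writer.next_host() for _ in range(4)])
# ['shard1-repl1', 'shard1-repl2', 'shard1-repl1', 'shard1-repl2']
```

Every node in this setup has the Distributed table (it was created `on cluster`), so any of them can accept the INSERT and perform the fan-out.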

Summary

internal_replication = false

Distributed + MergeTree provides logically consistent distribution.

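The data content is identical when every write succeeds, but the storage format is not fully identical. Data synchronization does not depend on ZooKeeper. Because nothing checks consistency between replicas, however, they can drift apart over time, for example after a failed write to one replica. Write pressure on the single entry node is high.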

internal_replication = true

Distributed + ReplicatedMergeTree provides physically consistent distribution.

The data content is identical and the storage format is fully identical. Data synchronization depends on ZooKeeper, which can become the system bottleneck.

Please credit the source when reposting. Link to this article: https://www.uj5u.com/shujuku/303231.html

Design | Implementing Data Synchronization with ClickHouse Distributed Tables
