
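Table design
- Column families: use 1-2 at most; if one is enough, do not use two.
- Versions: if the project does not need historical versions, keep the default VERSIONS=1. If it needs to retain a change history, set VERSIONS to a value greater than 1, at the cost of more storage.
- Compression: when creating a table, you can specify a compression codec per column family (GZ, SNAPPY, LZO). GZ gives the highest compression ratio (data shrinks to roughly 13% of its original size), but it compresses and decompresses more slowly than the others.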
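Key techniques for avoiding hotspots
- Pre-splitting
  - Configure a split policy when creating the table so that the table starts with multiple regions, distributed across different HRegionServers.
  - HBase also splits automatically: when a region grows too large, HBase splits it into two, partitioning horizontally by rowkey.
- Rowkey design
  - Reversing: for keys such as phone numbers or timestamps, store the value reversed so that the fast-changing trailing digits come first.
  - Salting: prefix the rowkey with a random number. Once a random prefix is added, a row can no longer be fetched directly by its original key, because HBase has no secondary indexes by default.
  - Hashing: hash some part of the rowkey and use the result as a prefix. Because a hash yields the same value every time, the prefix can be recomputed and the row can still be fetched.
  - All of these strategies aim to spread data evenly across every RegionServer in the cluster, so their core idea is to scatter rowkeys across the cluster nodes. The data is therefore no longer stored in its natural order, which makes scans less efficient.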
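The three rowkey strategies above can be sketched in plain Java. This is only an illustration: the class name, the two-digit salt format and the bucket count are assumptions, and CRC32 is swapped in for the hash purely for brevity (the project itself uses MD5, as shown later).

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ThreadLocalRandom;
import java.util.zip.CRC32;

// Hypothetical helper class, not part of the momo_chat_app project.
final class RowkeyStrategies {

    // Reversing: the fast-changing trailing characters become the prefix.
    static String reverse(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    // Salting: a random bucket prefix. It cannot be recomputed from the key,
    // so a reader has to try every bucket to find the row again.
    static String salt(String key, int buckets) {
        int bucket = ThreadLocalRandom.current().nextInt(buckets);
        return String.format("%02d_%s", bucket, key);
    }

    // Hashing: a deterministic prefix, so the full rowkey can be rebuilt
    // from the original key and used for a direct lookup.
    static String hashPrefix(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return String.format("%08x_%s", crc.getValue(), key);
    }

    public static void main(String[] args) {
        String phone = "13514684105";
        System.out.println(reverse(phone));    // reversed phone number
        System.out.println(salt(phone, 6));    // differs from run to run
        System.out.println(hashPrefix(phone)); // identical on every run
    }
}
```

Only the salted key changes between runs, which is why salting makes point lookups harder while reversing and hashing do not.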
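- Pre-splitting: there are two strategies
  - Explicit split points: specify the startKey/endKey boundaries yourself, for example [10, 40, 50].
  - Region count: specify only the number of regions and let HBase generate the startKey/endKey values; you must also specify the algorithm used to generate the keys.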
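To make the second strategy more concrete, here is a rough sketch of how evenly spaced 8-digit hexadecimal split points can be derived from a region count. It only approximates what a hex-string split algorithm such as HexStringSplit produces and is not HBase's own implementation; the class and method names are made up for this example.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: evenly spaced split points over 00000000..ffffffff.
final class HexSplitSketch {

    static List<String> splitPoints(int numRegions) {
        BigInteger first = BigInteger.ZERO;
        BigInteger last = new BigInteger("FFFFFFFF", 16);
        // Size of the whole key space, divided evenly among the regions.
        BigInteger range = last.subtract(first).add(BigInteger.ONE);
        BigInteger step = range.divide(BigInteger.valueOf(numRegions));

        List<String> points = new ArrayList<>();
        // n regions need n - 1 boundaries between them.
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = first.add(step.multiply(BigInteger.valueOf(i)));
            points.add(String.format("%08x", point));
        }
        return points;
    }

    public static void main(String[] args) {
        // Six regions need five split points.
        System.out.println(splitPoints(6));
    }
}
```

A rowkey whose prefix is a uniformly distributed hex string then falls into each of these ranges with roughly equal probability.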

HBase stores all of its data in HDFS:
- /hbase/data/<namespace>/<table>/<region>/<column family>/<StoreFiles>
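Table creation commands
# I. Namespaces
# 1. Create a namespace
create_namespace 'MOMO_CHAT'
# 2. List the namespaces
list_namespace
# 3. Drop the namespace created earlier (the namespace must be empty)
drop_namespace 'MOMO_CHAT'
# 4. Describe a specific namespace
describe_namespace 'MOMO_CHAT'
describe_namespace 'default'
# 5. In the MOMO_CHAT namespace, create a table named MSG with one column family named C1
# Note: for a table inside a namespace, join the namespace and the table name with a colon
create "MOMO_CHAT:MSG", "C1"
# 6. Change the compression codec of a column family of a table
alter "MOMO_CHAT:MSG", {NAME => "C1", COMPRESSION => "GZ"}
# 7. Drop the table created earlier (it must be disabled first)
disable "MOMO_CHAT:MSG"
drop "MOMO_CHAT:MSG"
# 8. Create the table again, this time with pre-splitting
create 'MOMO_CHAT:MSG', {NAME => "C1", COMPRESSION => "GZ"}, { NUMREGIONS => 6, SPLITALGO => 'HexStringSplit'}
After this command, the table has six regions.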

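Randomly generating a message
- Use the ExcelReader utility class to read data from an Excel file into a Map:
  - key: the field name
  - value: a List holding the values available for that field
- Create a getOneMessage method that builds a random Msg entity object from the data read from Excel.
- Call ExcelReader.randomColumn to pick a random value for a column.
- Note: the message time is the current system time, formatted as year-month-day hour:minute:second (yyyy-MM-dd HH:mm:ss).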
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.Map;

public class MoMoMsgGen {
    public static void main(String[] args) {
        // Read the data from the Excel file
        Map<String, List<String>> resultMap =
                ExcelReader.readXlsx("D:\\課程研發\\51.V8.0_NoSQL_MQ\\2.HBase\\3.代碼\\momo_chat_app\\data\\測驗資料集.xlsx", "陌陌資料");
        System.out.println(getOneMessage(resultMap));
    }

    /**
     * Randomly generates a Msg object from the data read from the Excel sheet.
     * @param resultMap the data read from Excel (Map structure)
     * @return a Msg object
     */
    public static Msg getOneMessage(Map<String, List<String>> resultMap) {
        // 1. Build the Msg entity object
        Msg msg = new Msg();
        // Use the current system time as the message time, stored as "yyyy-MM-dd HH:mm:ss"
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        // Get the system time
        Date now = new Date();
        msg.setMsg_time(simpleDateFormat.format(now));
        // 2. Call ExcelReader.randomColumn to pick a random value for each column,
        //    e.g. sender_nickyname gets a random value from the sender_nickyname column
        msg.setSender_nickyname(ExcelReader.randomColumn(resultMap, "sender_nickyname"));
        msg.setSender_account(ExcelReader.randomColumn(resultMap, "sender_account"));
        msg.setSender_sex(ExcelReader.randomColumn(resultMap, "sender_sex"));
        msg.setSender_ip(ExcelReader.randomColumn(resultMap, "sender_ip"));
        msg.setSender_os(ExcelReader.randomColumn(resultMap, "sender_os"));
        msg.setSender_phone_type(ExcelReader.randomColumn(resultMap, "sender_phone_type"));
        msg.setSender_network(ExcelReader.randomColumn(resultMap, "sender_network"));
        msg.setSender_gps(ExcelReader.randomColumn(resultMap, "sender_gps"));
        msg.setReceiver_nickyname(ExcelReader.randomColumn(resultMap, "receiver_nickyname"));
        msg.setReceiver_ip(ExcelReader.randomColumn(resultMap, "receiver_ip"));
        msg.setReceiver_account(ExcelReader.randomColumn(resultMap, "receiver_account"));
        msg.setReceiver_os(ExcelReader.randomColumn(resultMap, "receiver_os"));
        msg.setReceiver_phone_type(ExcelReader.randomColumn(resultMap, "receiver_phone_type"));
        msg.setReceiver_network(ExcelReader.randomColumn(resultMap, "receiver_network"));
        msg.setReceiver_gps(ExcelReader.randomColumn(resultMap, "receiver_gps"));
        msg.setReceiver_sex(ExcelReader.randomColumn(resultMap, "receiver_sex"));
        msg.setMsg_type(ExcelReader.randomColumn(resultMap, "msg_type"));
        msg.setDistance(ExcelReader.randomColumn(resultMap, "distance"));
        msg.setMessage(ExcelReader.randomColumn(resultMap, "message"));
        // 3. Note: the message time is the current system time (set above)
        return msg;
    }
}
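The ExcelReader class itself is not listed in this post. As a rough idea of what a randomColumn-style helper could look like, here is a minimal sketch working on the same Map structure; the class name, the error handling and the sample values are assumptions, and the project's real implementation may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-in for ExcelReader.randomColumn.
final class RandomColumnSketch {

    // Returns one randomly chosen value from the list stored under columnName.
    static String randomColumn(Map<String, List<String>> resultMap, String columnName) {
        List<String> values = resultMap.get(columnName);
        if (values == null || values.isEmpty()) {
            throw new IllegalArgumentException("No data for column: " + columnName);
        }
        int index = ThreadLocalRandom.current().nextInt(values.size());
        return values.get(index);
    }

    public static void main(String[] args) {
        // Made-up sample data in the "field name -> list of values" shape.
        Map<String, List<String>> data = new HashMap<>();
        data.put("sender_os", Arrays.asList("Android 8.0", "IOS 10.0"));
        System.out.println(randomColumn(data, "sender_os"));
    }
}
```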
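Generating the rowkey
- ROWKEY = MD5Hash_senderAccount_receiverAccount_messageTimestamp
- MD5Hash.getMD5AsHex generates the MD5 value; to keep the rowkey short, only its first 8 hex characters are used.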
// Generate the rowkey from a Msg entity object
public static byte[] getRowkey(Msg msg) throws ParseException {
    // ROWKEY = MD5Hash_senderAccount_receiverAccount_messageTimestamp

    // Use a StringBuilder to join the sender account, receiver account and
    // message timestamp with underscores (_)
    StringBuilder builder = new StringBuilder();
    builder.append(msg.getSender_account());
    builder.append("_");
    builder.append(msg.getReceiver_account());
    builder.append("_");
    // Get the message timestamp (epoch milliseconds)
    String msgDateTime = msg.getMsg_time();
    SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    Date msgDate = simpleDateFormat.parse(msgDateTime);
    long timestamp = msgDate.getTime();
    builder.append(timestamp);
    // Convert the joined string to a byte[] with Bytes.toBytes, then use
    // MD5Hash.getMD5AsHex to compute the MD5 value and keep its first 8 hex characters
    String md5AsHex = MD5Hash.getMD5AsHex(Bytes.toBytes(builder.toString()));
    String md5Hex8bit = md5AsHex.substring(0, 8);
    // Join the MD5 prefix and the string built above with an underscore,
    // then convert the result to a byte array
    String rowkeyString = md5Hex8bit + "_" + builder.toString();
    System.out.println(rowkeyString);
    return Bytes.toBytes(rowkeyString);
}
Writing the randomly generated data to HBase
public static void main(String[] args) throws ParseException, IOException {
    // Read the data from the Excel file
    Map<String, List<String>> resultMap =
            ExcelReader.readXlsx("D:\\課程研發\\51.V8.0_NoSQL_MQ\\2.HBase\\3.代碼\\momo_chat_app\\data\\測驗資料集.xlsx", "陌陌資料");
    // Generate data and write it to HBase
    // 1. Get an HBase connection
    Configuration config = HBaseConfiguration.create();
    Connection connection = ConnectionFactory.createConnection(config);
    // 2. Get the HBase table MOMO_CHAT:MSG
    Table table = connection.getTable(TableName.valueOf("MOMO_CHAT:MSG"));
    int i = 0;
    int MAX = 100000;
    while (i < MAX) {
        Msg msg = getOneMessage(resultMap);
        // 3. Initialize the variables needed for the HBase operation (column family, column names)
        byte[] rowkey = getRowkey(msg);
        String cf = "C1";
        String colMsg_time = "msg_time";
        String colSender_nickyname = "sender_nickyname";
        String colSender_account = "sender_account";
        String colSender_sex = "sender_sex";
        String colSender_ip = "sender_ip";
        String colSender_os = "sender_os";
        String colSender_phone_type = "sender_phone_type";
        String colSender_network = "sender_network";
        String colSender_gps = "sender_gps";
        String colReceiver_nickyname = "receiver_nickyname";
        String colReceiver_ip = "receiver_ip";
        String colReceiver_account = "receiver_account";
        String colReceiver_os = "receiver_os";
        String colReceiver_phone_type = "receiver_phone_type";
        String colReceiver_network = "receiver_network";
        String colReceiver_gps = "receiver_gps";
        String colReceiver_sex = "receiver_sex";
        String colMsg_type = "msg_type";
        String colDistance = "distance";
        String colMessage = "message";
        // 4. Build the Put request
        Put put = new Put(rowkey);
        // 5. Add every column of the message one by one
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colMsg_time), Bytes.toBytes(msg.getMsg_time()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_nickyname), Bytes.toBytes(msg.getSender_nickyname()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_account), Bytes.toBytes(msg.getSender_account()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_sex), Bytes.toBytes(msg.getSender_sex()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_ip), Bytes.toBytes(msg.getSender_ip()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_os), Bytes.toBytes(msg.getSender_os()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_phone_type), Bytes.toBytes(msg.getSender_phone_type()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_network), Bytes.toBytes(msg.getSender_network()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_gps), Bytes.toBytes(msg.getSender_gps()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_nickyname), Bytes.toBytes(msg.getReceiver_nickyname()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_ip), Bytes.toBytes(msg.getReceiver_ip()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_account), Bytes.toBytes(msg.getReceiver_account()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_os), Bytes.toBytes(msg.getReceiver_os()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_phone_type), Bytes.toBytes(msg.getReceiver_phone_type()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_network), Bytes.toBytes(msg.getReceiver_network()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_gps), Bytes.toBytes(msg.getReceiver_gps()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_sex), Bytes.toBytes(msg.getReceiver_sex()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colMsg_type), Bytes.toBytes(msg.getMsg_type()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colDistance), Bytes.toBytes(msg.getDistance()));
        put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colMessage), Bytes.toBytes(msg.getMessage()));
        // 6. Send the Put request
        table.put(put);
        // Show progress
        ++i;
        System.out.println(i + " / " + MAX);
    }
    table.close();
    connection.close();
}
Here 100,000 rows are written, and the write requests are spread evenly across the regions.
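The even spread can also be checked offline with a small simulation. The sketch below rebuilds rowkeys in the MD5Hash_sender_receiver_timestamp format using the JDK's MessageDigest (standing in for HBase's MD5Hash) and counts how many fall into each of six equal-width ranges over 00000000..ffffffff. The equal-width ranges, the class name and the sample accounts and timestamps are assumptions for illustration; real region boundaries come from the cluster.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical offline simulation; it does not talk to HBase.
final class RegionSpreadSketch {
    static final int NUM_REGIONS = 6;

    // First 8 hex characters of the MD5 digest of s.
    static String md5Prefix(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // ROWKEY = MD5Hash_sender_receiver_timestamp
    static String rowkey(String sender, String receiver, long timestamp) {
        String body = sender + "_" + receiver + "_" + timestamp;
        return md5Prefix(body) + "_" + body;
    }

    // Index of the equal-width range that the 8-character hex prefix falls into.
    static int regionOf(String rowkey) {
        long prefix = Long.parseLong(rowkey.substring(0, 8), 16);
        long width = (1L << 32) / NUM_REGIONS;
        return (int) Math.min(NUM_REGIONS - 1, prefix / width);
    }

    // Counts per region for `rows` keys that differ only in their timestamp.
    static int[] spread(int rows) {
        int[] counts = new int[NUM_REGIONS];
        for (int i = 0; i < rows; i++) {
            String key = rowkey("13514684105", "13647128512", 1601856000000L + i);
            counts[regionOf(key)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(spread(100_000)));
    }
}
```

Even though every simulated key shares the same sender and receiver, the hashed prefix scatters them, which is the effect the rowkey design is aiming for.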

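Implementing the getMessage data service interface
It is implemented with scan + filter:
- Build a Scan object
- Build 4 filters (start date, end date, sender, receiver)
- Build a list of Msg objects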
public List<Msg> getMessage(String date, String sender, String receiver) throws Exception {
    // 1. Build the Scan object
    Scan scan = new Scan();
    // 2. Build the time range for the query as two date strings with hours, minutes and seconds,
    //    e.g. 2020-10-05 00:00:00 to 2020-10-05 23:59:59
    String startDateStr = date + " 00:00:00";
    String endDateStr = date + " 23:59:59";
    // 3. Build the two date filters (greater-or-equal, less-or-equal); to filter
    //    on a single column, SingleColumnValueFilter is enough
    SingleColumnValueFilter startDateFilter = new SingleColumnValueFilter(Bytes.toBytes("C1")
            , Bytes.toBytes("msg_time")
            , CompareFilter.CompareOp.GREATER_OR_EQUAL
            , new BinaryComparator(Bytes.toBytes(startDateStr)));
    SingleColumnValueFilter endDateFilter = new SingleColumnValueFilter(Bytes.toBytes("C1")
            , Bytes.toBytes("msg_time")
            , CompareFilter.CompareOp.LESS_OR_EQUAL
            , new BinaryComparator(Bytes.toBytes(endDateStr)));
    // 4. Build the sender filter
    SingleColumnValueFilter senderFilter = new SingleColumnValueFilter(Bytes.toBytes("C1")
            , Bytes.toBytes("sender_account")
            , CompareFilter.CompareOp.EQUAL
            , new BinaryComparator(Bytes.toBytes(sender)));
    // 5. Build the receiver filter
    SingleColumnValueFilter receiverFilter = new SingleColumnValueFilter(Bytes.toBytes("C1")
            , Bytes.toBytes("receiver_account")
            , CompareFilter.CompareOp.EQUAL
            , new BinaryComparator(Bytes.toBytes(receiver)));
    // 6. Combine all filters with a FilterList (a row must pass every filter)
    FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL
            , startDateFilter
            , endDateFilter
            , senderFilter
            , receiverFilter);
    // 7. Set the filter on the Scan object
    scan.setFilter(filterList);
    // 8. Get the Table object and run the scan with getScanner
    Table table = connection.getTable(TableName.valueOf("MOMO_CHAT:MSG"));
    ResultScanner resultScanner = table.getScanner(scan);
    // 9. Get the iterator, then iterate over every row and every cell in it
    Iterator<Result> iterator = resultScanner.iterator();
    // Create a list to hold the messages returned by the query
    ArrayList<Msg> msgList = new ArrayList<>();
    while (iterator.hasNext()) {
        // Each row returned by the query becomes one Msg object
        Result result = iterator.next();
        Msg msg = new Msg();
        // Get the rowkey
        String rowkey = Bytes.toString(result.getRow());
        // List of cells in this row
        List<Cell> cellList = result.listCells();
        for (Cell cell : cellList) {
            // Use the column name of the current cell to decide which field to set
            String columnName = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
            String value = Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
            switch (columnName) {
                case "msg_time": msg.setMsg_time(value); break;
                case "sender_nickyname": msg.setSender_nickyname(value); break;
                case "sender_account": msg.setSender_account(value); break;
                case "sender_sex": msg.setSender_sex(value); break;
                case "sender_ip": msg.setSender_ip(value); break;
                case "sender_os": msg.setSender_os(value); break;
                case "sender_phone_type": msg.setSender_phone_type(value); break;
                case "sender_network": msg.setSender_network(value); break;
                case "sender_gps": msg.setSender_gps(value); break;
                case "receiver_nickyname": msg.setReceiver_nickyname(value); break;
                case "receiver_ip": msg.setReceiver_ip(value); break;
                case "receiver_account": msg.setReceiver_account(value); break;
                case "receiver_os": msg.setReceiver_os(value); break;
                case "receiver_phone_type": msg.setReceiver_phone_type(value); break;
                case "receiver_network": msg.setReceiver_network(value); break;
                case "receiver_gps": msg.setReceiver_gps(value); break;
                case "receiver_sex": msg.setReceiver_sex(value); break;
                case "msg_type": msg.setMsg_type(value); break;
                case "distance": msg.setDistance(value); break;
                case "message": msg.setMessage(value); break;
                default: break;
            }
        }
        msgList.add(msg);
    }
    // Release resources
    resultScanner.close();
    table.close();
    return msgList;
}
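The MUST_PASS_ALL combination used above behaves like a logical AND over four conditions, and the date filters work because "yyyy-MM-dd HH:mm:ss" strings sort in the same order as the times they represent. The in-memory sketch below mimics that logic with plain Java predicates. It is not HBase code: the Row class and the sample data are made up, and String.compareTo stands in for the byte-wise BinaryComparator, which gives the same order for these ASCII strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory analogue of scan + FilterList(MUST_PASS_ALL).
final class ScanFilterSketch {

    static final class Row {
        final String msgTime;
        final String sender;
        final String receiver;

        Row(String msgTime, String sender, String receiver) {
            this.msgTime = msgTime;
            this.sender = sender;
            this.receiver = receiver;
        }
    }

    static List<Row> getMessage(List<Row> table, String date, String sender, String receiver) {
        String start = date + " 00:00:00";
        String end = date + " 23:59:59";
        // Four conditions combined with AND, like MUST_PASS_ALL.
        Predicate<Row> afterStart = r -> r.msgTime.compareTo(start) >= 0;
        Predicate<Row> all = afterStart
                .and(r -> r.msgTime.compareTo(end) <= 0)
                .and(r -> r.sender.equals(sender))
                .and(r -> r.receiver.equals(receiver));

        List<Row> hits = new ArrayList<>();
        // Every row is visited; the filters only decide what is returned.
        for (Row row : table) {
            if (all.test(row)) {
                hits.add(row);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Row> table = Arrays.asList(
                new Row("2020-10-05 08:30:00", "13514684105", "13647128512"),
                new Row("2020-10-06 00:00:00", "13514684105", "13647128512"),
                new Row("2020-10-05 23:59:59", "13514684105", "99999999999"));
        System.out.println(getMessage(table, "2020-10-05", "13514684105", "13647128512").size());
    }
}
```

As in the sketch, a Scan without a start or stop row still has to read every row of the table; the filters are evaluated on the server side and only reduce what is sent back to the client.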
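Before calling the Java method, you can run the equivalent scan in the HBase shell to see what the result should look like: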
scan 'MOMO_CHAT:MSG' , {COLUMNS => ['C1:sender_account', 'C1:receiver_account', 'C1:msg_time'], FILTER => "SingleColumnValueFilter('C1', 'sender_account', =, 'binary:13514684105') AND SingleColumnValueFilter('C1', 'receiver_account', = , 'binary:13647128512')"}


Finally, the code is available at: https://github.com/fafeidou/momo_chat_app
