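Writing to HBase from C++ through thrift2 is very slow. Is there something like Java's BufferedMutator?
In Java, the efficient buffered interface BufferedMutator keeps each mutate() call in memory and only writes to HBase once the write.buffer threshold is reached, or when you call flush() yourself.
From C++ I am calling the thrift2 interface. How can I make inserts efficient, or turn off auto-flush for the table?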
void HBaseDriver::putMultiple(string& tableName, string& rowKey, unordered_map<string, string>& qualifierVal, string& family) {
    currentPutCount += 1;
    // `puts` is assumed to be a member std::vector<TPut> that accumulates rows across calls.
    TPut put;
    std::vector<TColumnValue> cvs;
    put.__set_row(rowKey);
    if (BOOL_TEST) {  // start the timer on the first call only
        start = clock();
        BOOL_TEST = false;
    }
    try {
        // Collect every qualifier/value pair of this row into one column list.
        for (unordered_map<string, string>::iterator it = qualifierVal.begin(); it != qualifierVal.end(); ++it) {
            TColumnValue cv;
            cv.__set_family(family);
            cv.__set_qualifier(it->first);
            cv.__set_value(it->second);
            /*if(it->first == "feature"){ //if the key is 'feature'
                assert(it->second.length() == BASE64_FEATURE_LEN);
            }*/
            cvs.push_back(cv);
        }
        // One TPut per row: append it once, after all columns are collected.
        put.__set_durability(TDurability::SKIP_WAL);  // skip the write-ahead log for speed, at the cost of durability
        put.__set_columnValues(cvs);
        puts.push_back(put);
        if (this->currentPutCount % this->PUT_BATCH_SIZE == 0) {
            put_client_->putMultiple(tableName, puts);  // one RPC for the whole batch
            finish = clock();
            // Cast before dividing so the result is not truncated to whole seconds.
            // On POSIX, clock() counts CPU time, so time spent blocked in the RPC is not included.
            duration = static_cast<double>(finish - start) / CLOCKS_PER_SEC;
            printf("===================================== %f seconds cost for hbase put.\n", duration);
            //long end = CURRENT_TIME();
            //LOG(string("Time Cost :"+to_string(end-start)+" ms"));
            puts.clear();
        }
    } catch (const TException& tex) {  // catch by const reference to avoid slicing
        LOG(string(tex.what()));
    }
}
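The buffering behaviour the question describes for BufferedMutator can be approximated on the client side. Below is a minimal sketch, not part of the thrift2 API: `Row`, `BufferedWriter`, `mutate`, `flush` and `pending` are hypothetical names chosen for illustration, and `Row` stands in for a `TPut`. Rows are held in memory and handed to a caller-supplied sink in one batch once the buffered size reaches a threshold, or when `flush()` is called.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Thrift2 TPut: a row key plus its serialized cells.
struct Row {
    std::string key;
    std::string payload;
    std::size_t bytes() const { return key.size() + payload.size(); }
};

// Hypothetical client-side write buffer. mutate() only stores the row; the
// whole batch goes to the sink when the buffered size reaches maxBytes or
// when flush() is called explicitly.
class BufferedWriter {
public:
    using Sink = std::function<void(const std::vector<Row>&)>;

    BufferedWriter(std::size_t maxBytes, Sink sink)
        : maxBytes_(maxBytes), sink_(std::move(sink)) {
        assert(maxBytes_ > 0);
    }

    // Buffer one row and flush automatically once the threshold is reached.
    void mutate(Row row) {
        bufferedBytes_ += row.bytes();
        rows_.push_back(std::move(row));
        if (bufferedBytes_ >= maxBytes_) {
            flush();
        }
    }

    // Send everything buffered so far in a single call to the sink.
    // If the sink throws, the rows stay buffered and can be retried.
    void flush() {
        if (rows_.empty()) {
            return;
        }
        sink_(rows_);
        rows_.clear();
        bufferedBytes_ = 0;
    }

    // Number of rows waiting to be flushed.
    std::size_t pending() const { return rows_.size(); }

private:
    std::size_t maxBytes_;
    Sink sink_;
    std::vector<Row> rows_;
    std::size_t bufferedBytes_ = 0;
};
```

In the driver above, the sink would presumably wrap the existing `put_client_->putMultiple(tableName, puts)` call, with `Row` replaced by `TPut`. The sketch has no destructor flush, so the caller has to call `flush()` before shutting down, otherwise the last partial batch is lost.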
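Reply from a uj5u.com user:
Could you avoid writing from C++ at all? Push the records to a message queue and let a Java consumer do the writes. If the latency requirement is loose (minute-level), you could use bulkload instead.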
