C++ Character Encoding Conversion
Overview
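Character encoding has long been a troublesome part of software development. Most projects today use UTF-8, and Linux defaults to UTF-8, but Windows on Simplified Chinese systems defaults to GBK (code page 936). Software that has to run correctly on Windows therefore needs to handle character sets explicitly.
C++11 added a number of localization facilities, including character encoding conversion, mainly through std::wstring_convert combined with the codecvt facets. The concrete approach is described below for study (or copy and paste). Note that std::wstring_convert and the <codecvt> header have been deprecated since C++17, so newer compilers may warn about this code.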
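Windows: GBK narrow encoding; wchar_t is 16 bits wide and holds UTF-16, so std::wstring has the same layout as std::u16string (they remain distinct types)
Linux: UTF-8 narrow encoding; wchar_t is 32 bits wide and holds UTF-32, so std::wstring has the same layout as std::u32string (they remain distinct types)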
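The platform difference above can be checked directly. The following is a minimal sketch, not part of the original code: the helper names `wide_encoding_name`, `units_for_non_bmp` and `wide_width_demo` are made up for illustration, and judging the encoding purely by `sizeof(wchar_t)` is an assumption that happens to hold on mainstream Windows and Linux toolchains.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: names the encoding wchar_t usually carries,
// judged only by its width (an assumption, not a standard guarantee).
inline const char* wide_encoding_name() {
    return sizeof(wchar_t) == 2 ? "UTF-16" : "UTF-32";
}

// Hypothetical helper: number of wchar_t units needed for U+1F600,
// a code point outside the Basic Multilingual Plane.
inline std::size_t units_for_non_bmp() {
    const std::wstring s = L"\U0001F600";
    return s.size();
}

// With 16-bit wchar_t the code point becomes a surrogate pair (2 units);
// with 32-bit wchar_t it fits in a single unit.
inline void wide_width_demo() {
    assert(units_for_non_bmp() == (sizeof(wchar_t) == 2 ? 2u : 1u));
}
```

This is why code that indexes a std::wstring character by character behaves differently on the two platforms once text leaves the Basic Multilingual Plane.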
Encoding conversion
- Required headers:
#include <codecvt>
#include <locale>
- Conversion functions:
coding.h
#ifndef TE_TEST_CODING_H
#define TE_TEST_CODING_H

#include <string>

namespace coding {

    // Name of the GBK locale (inline variables require C++17)
#ifdef _WIN32
    // GBK locale name on Windows
    inline constexpr const char* GBK_LOCALE_NAME = ".936";
#else
    inline constexpr const char* GBK_LOCALE_NAME = "zh_CN.GBK";
#endif

    /**
     * utf-8 --> wchar
     * @param _utf8 std::string that must be UTF-8 encoded
     * @return wide string
     */
    std::wstring utf8_to_wstr(const std::string& _utf8);

    /**
     * wchar --> utf-8
     * @param _wstr wide string
     * @return UTF-8 encoded string
     */
    std::string wstr_to_utf8(const std::wstring& _wstr);

    /**
     * utf-8 --> gbk
     * @param _utf8 UTF-8 encoded string
     * @return GBK encoded string
     */
    std::string utf8_to_gbk(const std::string& _utf8);

    /**
     * gbk --> utf-8
     * @param _gbk GBK encoded string
     * @return UTF-8 encoded string
     */
    std::string gbk_to_utf8(const std::string& _gbk);

    /**
     * gbk --> std::wstring
     * @param _gbk GBK encoded string
     * @return wide string
     */
    std::wstring gbk_to_wstr(const std::string& _gbk);

    /**
     * std::wstring --> gbk
     * @param _wstr wide string
     * @return GBK encoded string
     */
    std::string wstr_to_gbk(const std::wstring& _wstr);

}

#endif //TE_TEST_CODING_H
coding.cpp
#include "coding.h"

#include <codecvt>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <utility>  // std::forward

// Adapter that lets a locale-bound facet be used with wstring_convert / wbuffer_convert
// by giving it a public destructor
template<class Facet>
struct deletable_facet : Facet
{
    template<class ...Args>
    explicit deletable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() override = default;
};

// Note: with a 16-bit wchar_t (Windows), codecvt_utf8<wchar_t> converts to UCS-2,
// so code points outside the Basic Multilingual Plane cannot be converted.
std::wstring coding::utf8_to_wstr(const std::string &_utf8)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(_utf8);
}

std::string coding::wstr_to_utf8(const std::wstring &_wstr)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> convert;
    return convert.to_bytes(_wstr);
}

// utf-8 --> wide string --> gbk
std::string coding::utf8_to_gbk(const std::string &_utf8)
{
    std::wstring tmp_wstr = utf8_to_wstr(_utf8);
    return wstr_to_gbk(tmp_wstr);
}

// gbk --> wide string --> utf-8
std::string coding::gbk_to_utf8(const std::string &_gbk)
{
    std::wstring tmp_wstr = gbk_to_wstr(_gbk);
    return wstr_to_utf8(tmp_wstr);
}

// The facet constructor throws std::runtime_error if the GBK locale is not installed
std::wstring coding::gbk_to_wstr(const std::string &_gbk)
{
    using codecvt = deletable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;
    std::wstring_convert<codecvt> convert(new codecvt(GBK_LOCALE_NAME));
    return convert.from_bytes(_gbk);
}

std::string coding::wstr_to_gbk(const std::wstring& _wstr)
{
    using codecvt = deletable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;
    std::wstring_convert<codecvt> convert(new codecvt(GBK_LOCALE_NAME));
    return convert.to_bytes(_wstr);
}
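Two details of the UTF-8 functions in coding.cpp are worth spelling out: std::wstring_convert throws std::range_error when the input cannot be converted, and std::codecvt_utf8<wchar_t> only covers UCS-2 when wchar_t is 16 bits wide. The following is one possible sketch of a variant that addresses both. It is not the original author's code; the names `wide_utf8_facet`, `try_utf8_to_wstr`, `wide_to_utf8` and `utf8_round_trip_demo` are made up for illustration.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical alias: use the UTF-16 aware facet when wchar_t is 16 bits wide,
// otherwise the UTF-32 capable codecvt_utf8.
using wide_utf8_facet = std::conditional<
    sizeof(wchar_t) == 2,
    std::codecvt_utf8_utf16<wchar_t>,
    std::codecvt_utf8<wchar_t>>::type;

// Hypothetical helper: reports invalid UTF-8 through the return value
// instead of letting std::range_error escape.
inline bool try_utf8_to_wstr(const std::string& utf8, std::wstring& out) {
    std::wstring_convert<wide_utf8_facet> converter;
    try {
        out = converter.from_bytes(utf8);
        return true;
    } catch (const std::range_error&) {
        return false;
    }
}

// Hypothetical helper: wide string back to UTF-8 (may still throw std::range_error).
inline std::string wide_to_utf8(const std::wstring& wide) {
    std::wstring_convert<wide_utf8_facet> converter;
    return converter.to_bytes(wide);
}

// Round trip of U+1F600, which lies outside the Basic Multilingual Plane.
inline void utf8_round_trip_demo() {
    const std::string emoji = "\xF0\x9F\x98\x80";
    std::wstring wide;
    assert(try_utf8_to_wstr(emoji, wide));
    assert(wide_to_utf8(wide) == emoji);
}
```

Whether to return a flag or let the exception propagate is a design choice; the flag is convenient when malformed input is expected, for example when reading files of unknown origin.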
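Additional notes
The deletable_facet struct exists to make the destructor of the codecvt_byname class template public; that destructor is protected by default. Some standard library implementations accept a facet whose destructor is protected, but others (such as GNU libstdc++) require a user-defined class that inherits from Facet and has a public destructor. Without it, compilation fails with errors like the following: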
In file included from /usr/include/c++/6.2.1/bits/locale_conv.h:41:0,
from /usr/include/c++/6.2.1/locale:43,
from main.cpp:3:
/usr/include/c++/6.2.1/bits/unique_ptr.h: In instantiation of ‘void std::default_delete<_Tp>::operator()(_Tp*) const [with _Tp = std::codecvt<wchar_t, char, __mbstate_t>]’:
/usr/include/c++/6.2.1/bits/unique_ptr.h:236:17: required from ‘std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = std::codecvt<wchar_t, char, __mbstate_t>; _Dp = std::default_delete<std::codecvt<wchar_t, char, __mbstate_t> >]’
/usr/include/c++/6.2.1/bits/locale_conv.h:218:7: required from here
/usr/include/c++/6.2.1/bits/unique_ptr.h:76:2: error: ‘virtual std::codecvt<wchar_t, char, __mbstate_t>::~codecvt()’ is protected within this context
delete __ptr;
^~~~~~
In file included from /usr/include/c++/6.2.1/codecvt:41:0,
from main.cpp:1:
/usr/include/c++/6.2.1/bits/codecvt.h:426:7: note: declared protected here
~codecvt();
^
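For reference, here is a small self-contained sketch of the adapter in use, which also handles a locale that is not installed: the std::codecvt_byname constructor throws std::runtime_error when the requested locale name is unavailable. The helper names `byname_facet`, `try_locale_to_wstr` and `c_locale_demo` are hypothetical and not part of the original code. The demo uses the "C" locale because it is always present, whereas the GBK locale named by GBK_LOCALE_NAME may have to be generated first on Linux.

```cpp
#include <cassert>
#include <codecvt>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <utility>

// Same adapter as in coding.cpp: derives from the facet and exposes a public destructor.
template <class Facet>
struct deletable_facet : Facet {
    template <class... Args>
    explicit deletable_facet(Args&&... args) : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() override = default;
};

using byname_facet =
    deletable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;

// Hypothetical helper: converts bytes in the named locale's encoding to a wide string.
// Returns false if the locale is unavailable or the input is invalid.
inline bool try_locale_to_wstr(const std::string& bytes,
                               const char* locale_name,
                               std::wstring& out) {
    try {
        // wstring_convert takes ownership of the facet and deletes it,
        // which is why the public destructor is needed.
        std::wstring_convert<byname_facet> convert(new byname_facet(locale_name));
        out = convert.from_bytes(bytes);
        return true;
    } catch (const std::runtime_error&) {
        // Covers both a missing locale (std::runtime_error) and
        // a failed conversion (std::range_error, derived from it).
        return false;
    }
}

// The "C" locale maps plain ASCII to the same code points.
inline void c_locale_demo() {
    std::wstring out;
    assert(try_locale_to_wstr("abc", "C", out) && out == L"abc");
}
```

Calling `try_locale_to_wstr(gbk_bytes, coding::GBK_LOCALE_NAME, out)` would follow the same pattern for GBK input, assuming the locale exists on the target machine.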
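See the official documentation and notes for details.
This article draws on an earlier blog post, extending it and fixing several problems.
This article comes from cnblogs (博客園), author: _哲思. Please include the original link when reposting: https://www.cnblogs.com/zhe-si/p/16011000.html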
