How can I fix the way printf handles UTF-8 strings with %s in C?

I am trying to print a string containing non-ASCII characters with printf in C. This is the program:

#include <stdio.h>
int main(void){
  printf("<0123456789> BOTH %s\n","<%5s>");
  printf("<%5s>\n"," w ");
  printf("<%5s>\n"," δ ");
}

I get

<0123456789> BOTH <%5s>
<   w >
<  δ >

So the printed width of the strings is wrong. How can I get both strings printed at the same width?

A uj5u.com user replied:

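Your lowercase delta character is not an 8-bit value. In UTF-8 it is encoded as two bytes, and the width in %5s counts bytes rather than characters, so the string fills only 4 visible columns. You will see the same problem with other Greek letters.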

You can see this for yourself by printing the result of strlen(" δ "), which is 4.

A uj5u.com user replied:

To work with Unicode text, you should use the wide-character functions such as fwprintf instead of printf.

See also section 7.24.2, "Formatted wide character input/output functions", of the C standard.

A uj5u.com user replied:

OK, I found a way to count the number of printed characters in a string. There are simpler ones too...

#include <stddef.h>
#include <stdint.h>
#define ONEMASK ((size_t)(-1) / 0xFF)
static size_t
cp_strlen_utf8(const char * _s){
//http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html

const char * s;
size_t count = 0;
size_t u;
unsigned char b;

/* Handle any initial misaligned bytes. */
for (s = _s; (uintptr_t)(s) & (sizeof(size_t) - 1); s++) {
    b = *s;

    /* Exit if we hit a zero byte. */
    if (b == '\0')
        goto done;

    /* Is this byte NOT the first byte of a character? */
    count += (b >> 7) & ((~b) >> 6);
}

/* Handle complete blocks. */
for (; ; s += sizeof(size_t)) {
    /* Prefetch 256 bytes ahead. */
    __builtin_prefetch(&s[256], 0, 0);

    /* Grab 4 or 8 bytes of UTF-8 data. */
    u = *(size_t *)(s);

    /* Exit the loop if there are any zero bytes. */
    if ((u - ONEMASK) & (~u) & (ONEMASK * 0x80))
        break;

    /* Count bytes which are NOT the first byte of a character. */
    u = ((u & (ONEMASK * 0x80)) >> 7) & ((~u) >> 6);
    count += (u * ONEMASK) >> ((sizeof(size_t) - 1) * 8);
}

/* Take care of any left-over bytes. */
for (; ; s++) {
    b = *s;

    /* Exit if we hit a zero byte. */
    if (b == '\0')
        break;

    /* Is this byte NOT the first byte of a character? */
    count += (b >> 7) & ((~b) >> 6);
}

done:
    return ((s - _s) - count);
}

Using these functions, I can print the complementary number of spaces needed to align the next table cell.

printf() counts bytes, not printed characters, so its padding is off for UTF-8 text. Maybe printf() needs to be fixed.

I don't know whether these are general solutions or whether they only work for the strings I am using right now.

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qianduan/337003.html

