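When using cURL to fetch a character stream from the Internet, at what point is the data stream converted from a multibyte data type to a single-byte character array?
I've written a program here that appears to work with ASCII in the callback function.
However, I wrote another program that uses UTF-8 and the wchar_t data type, and it also seems to work. Even though wchar_t is 4 bytes on my machine and char is 1 byte, the data stream doesn't seem to distinguish between the two data types.
My guess is that some kind of conversion is happening transparently in this program, but I don't know where. (I believe that in UTF-8 an ASCII character still occupies 1 byte of memory, but when the program uses the wchar_t data type the system pads regular ASCII characters with zeros, turning them into 4 bytes; however, that isn't something the programmer implemented...)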
#include "multicurl.h"
#define MAX_WAIT_MSECS 5*1000 /* Wait max. 5 seconds */
/* The largest difference between the ASCII and UTF-8 variations of this program is that this callback function is now dealing with an array of wchar_t blocks rather than chars which are always 1 byte long, but it still works the same basic way. */
static size_t write_callback(wchar_t *ptr, size_t size, size_t nmemb, void *userdata){// cURL callback function [read in datastream to memory]
// This prototype is provided by cURL, with an argument at the end for our data structure.
// This function is repeatedly called by cURL until there is no more data in the data stream; *ptr [it is assumed cURL handles memory management for this pointer].
size_t realsize = nmemb * size;// The number of bytes in the datastream [there is no NULL char]
MemType *mem = (MemType *)userdata;
wchar_t *tmp = realloc(mem->memory, mem->size + realsize + sizeof(wchar_t) );// We add 1 wchar_t unit for the NULL character.
if (tmp == NULL){
printf("Not Enough Memory, realloc returned NULL.\n");
exit(EXIT_FAILURE);
}
mem->memory = tmp;
memcpy(&(mem->memory[ mem->size / sizeof(wchar_t) ]), ptr, realsize );// Starting at the last element copy in datastream [it overwrites the last element]
mem->size += realsize;// The actual size, in bytes, is realsize + ( 1 * sizeof(wchar_t) ), however realsize gives us the location of the last element.
mem->memory[ mem->size / sizeof(wchar_t) ] = 0;// The datastream doesn't include a NULL character, so we zeroize the last element.
// We overwrite the NULL character {the zeroized element} on the next callback iteration, if any.
return (size * nmemb);// cURL crosschecks the datastream with this return value.
}
void *SetUpCurlHandle(CURLM * mh, wchar_t *utf8_url, MemType *output){
// Take in a multi handle pointer address, a URL and a struct pointer address, set up the curl easy handle and add it to the multi handle.
/* Convert our UTF-8 URL string to a regular ASCII URL string. */
char* url = (char*) malloc ( wcslen( utf8_url ) + 1 );
wcstombs(url, utf8_url, wcslen( utf8_url ) + 1 );// Limit the output to the size of the buffer allocated above.
CURL *hnd = NULL;
output->memory = malloc( sizeof( wchar_t ) ); // Initialize the memory component of the structure.
output->size = 0; // Initialize the size component of the structure.
// Initialize the cURL handle.
hnd = curl_easy_init();
if(hnd){
// Setup the cURL options.
curl_easy_setopt(hnd, CURLOPT_BUFFERSIZE, 102400L);
curl_easy_setopt(hnd, CURLOPT_URL, url);// Set the request URL
curl_easy_setopt(hnd, CURLOPT_NOPROGRESS, 1L);
curl_easy_setopt(hnd, CURLOPT_USERAGENT, "curl/7.80.0");
curl_easy_setopt(hnd, CURLOPT_MAXREDIRS, 50L);
curl_easy_setopt(hnd, CURLOPT_HTTP_VERSION, (long)CURL_HTTP_VERSION_2TLS);
curl_easy_setopt(hnd, CURLOPT_FTP_SKIP_PASV_IP, 1L);
curl_easy_setopt(hnd, CURLOPT_TCP_KEEPALIVE, 1L);
curl_easy_setopt(hnd, CURLOPT_WRITEFUNCTION, write_callback);// The callback function to write data to.
curl_easy_setopt(hnd, CURLOPT_WRITEDATA, (void *)output);// Send the address of the data struct to callback func.
//curl_easy_setopt(hnd, CURLOPT_VERBOSE, 1);
curl_multi_add_handle(mh, hnd);
}else{
output->memory[0] = '\0';
}
return NULL;// The output struct was passed by reference no need to return anything.
}
CURLM *SetUpMultiCurlHandle(){
curl_global_init(CURL_GLOBAL_ALL);
CURLM * mh = curl_multi_init();
return mh;
}
void *PerformMultiCurl(CURLM * mh)
/*Take in a preset multi handle, request data from the remote server asynchronously {it's assumed cURL is using threads transparent to the calling program}.
Remove the handles from memory.*/
{
CURLMsg *msg=NULL;
CURL *hnd = NULL;
CURLcode return_code = 0;
int still_running = 0;
int msgs_left = 0;
curl_multi_perform(mh, &still_running);// Perform the requests.
do {
int numfds=0;
int res = curl_multi_wait(mh, NULL, 0, MAX_WAIT_MSECS, &numfds);
if(res != CURLM_OK) {
fprintf(stderr, "error: curl_multi_wait() returned %d\n", res);
return NULL;
}
curl_multi_perform(mh, &still_running);
/* Without this loop the program will proceed to the next statement, most likely before the messages are retrieved from the server.
The easy handle requests are conducted asynchronously, but one multi handle request is obviously conducted sequentially (can use pthreads to make asynchronous multi requests).*/
} while(still_running);
/* This portion of the code will clean up and remove the handles from memory, you could change this to make them more persistent */
while ((msg = curl_multi_info_read(mh, &msgs_left))) {
if (msg->msg == CURLMSG_DONE) {
hnd = msg->easy_handle;
return_code = msg->data.result;
if(return_code!=CURLE_OK) {
fprintf(stderr, "CURL error code: %d\n", msg->data.result);
continue;
}
curl_multi_remove_handle(mh, hnd);
curl_easy_cleanup(hnd);
hnd = NULL;
}
else {
fprintf(stderr, "error: after curl_multi_info_read(), CURLMsg=%d\n", msg->msg);
}
}
curl_multi_cleanup(mh);
curl_global_cleanup();
return NULL;
}
The complete UTF-8 variant of this program can be found here.
Answer:
As you suspected, it doesn't work. libcurl has no way of knowing that your function expects a wchar_t* rather than a char*.
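If you inspect MyOutputStruct1.memory[0], you'll find it doesn't contain what it should. For example, when requesting https://stackoverflow.com, it contains 0x4f44213c. That's clearly wrong, because it lies far outside the range of valid code points (the largest is 0x10FFFF). It's actually the first four code points (<!DO) squeezed into a single wchar_t (in little-endian order).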
It only appears to work because of a second bug: when printing a wide string, you need to use %ls, not %s.
wprintf(L"Output:\n%s\n", MyOutputStruct1.memory);
should be
printf("Output:\n%ls\n", MyOutputStruct1.memory);
// -or-
wprintf(L"Output:\n%ls\n", MyOutputStruct1.memory);
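Basically, the code needs a char* throughout. The pointer's type is wchar_t*, but it's used as a char* everywhere, so the two bugs mostly "cancel out" in the program in question. (I didn't check, but I expect it breaks when the length of the input isn't divisible by sizeof(wchar_t).) If the pointer were actually used as a wchar_t* (for example, if its elements were inspected, or if it were passed to one of the w functions), the problem would be obvious.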
