Winsock sendto returns error 10049 (WSAEADDRNOTAVAIL) for the broadcast address after a network adapter is disabled or physically disconnected

This question had a bounty worth 300 reputation. The asker wanted to draw more attention to the question:

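Explain why this happens (whether it is a Windows bug or expected behavior, and if so, why) and, if possible, give a solution that detects and handles the network interface disconnect without ambiguity (not just a heuristic based on observed behavior that "should" work most of the time; I can do that myself).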

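I am developing a p2p application, and to simplify testing I currently use UDP broadcast to discover peers in my local network. Each peer binds one UDP socket to port 29292 on the IP address of each local network interface (discovered via GetAdaptersInfo), and each socket periodically sends a packet to the broadcast address of its network interface/local address. The sockets are set to allow port reuse (via setsockopt SO_REUSEADDR), which lets me run multiple peers on the same local machine without any conflicts. In this case there is only a single peer on the entire network, though.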
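For illustration, the per-interface broadcast address mentioned above can be derived from the interface IP and subnet mask. This is a minimal sketch, not the asker's code: `directed_broadcast` and `to_dotted` are hypothetical helper names, and both take IPv4 values in host byte order.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not from the original post): derive the directed
// broadcast address of an interface. Inputs are IPv4 values in host byte order.
std::uint32_t directed_broadcast(std::uint32_t ip, std::uint32_t subnet_mask) {
    // Keep the network bits of the address and set every host bit to 1.
    return (ip & subnet_mask) | ~subnet_mask;
}

// Hypothetical helper: format a host-order IPv4 address as dotted decimal.
std::string to_dotted(std::uint32_t ip) {
    return std::to_string((ip >> 24) & 0xFF) + "." +
           std::to_string((ip >> 16) & 0xFF) + "." +
           std::to_string((ip >> 8) & 0xFF) + "." +
           std::to_string(ip & 0xFF);
}
```

With the interface from the logs further down (192.168.178.35 with subnet 255.255.255.0), this yields 192.168.178.255, which matches the target address that sendto receives.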

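This all works fine (tested with 2 peers on 1 machine and with 2 peers on 2 machines) until a network interface is disconnected. When I deactivate the network adapter of either my wifi or a USB-to-LAN adapter in the Windows dialog, or just unplug the USB cable of the adapter, the next call to sendto fails with return code 10049. It does not matter whether the other adapter is still connected, or was connected at the beginning; it fails either way. The only thing that does not make it fail is deactivating wifi through the fancy Win10 dialog in the taskbar, but that is not really a surprise, because that does not deactivate or remove the adapter itself.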

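I initially thought that this made sense, since when the NIC is gone, how should the system route the packet? But the fact that the packet cannot reach its target has absolutely nothing to do with the address itself being invalid (which is what the error means), so I suspect I am missing something here. I looked for any information I could use to detect this case and distinguish it from simply trying to sendto INADDR_ANY, but I could not find anything. I started logging every bit of information that I suspected could have changed, but it is all the same for a successful sendto and for the one that crashes (retrieved via getsockopt):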

250   16.24746[886] [debug|debug] local address: 192.168.178.35
251   16.24812[886] [debug|debug] no remote address
252   16.25333[886] [debug|debug] type: SOCK_DGRAM
253   16.25457[886] [debug|debug] protocol: IPPROTO_UDP
254   16.25673[886] [debug|debug] broadcast: 1, dontroute: 0, max_msg_size: 65507, rcv_buffer: 65536, rcv_timeout: 0, reuse_addr: 1, snd_buffer: 65536, sdn_timeout: 0
255   16.25806[886] [debug|debug] Last WSA error on socket was WSA Error Code 0: The operation completed successfully.

256   16.25916[886] [debug|debug] target address windows formatted: 192.168.178.255
257   16.25976[886] [debug|debug] target address 192.168.178.255:29292
258   16.26138[886] [debug|assert] ASSERT FAILED at D:\Workspaces\spaced\source\platform\win32_platform.cpp:4141: sendto failed with (unhandled) WSA Error Code 10049: The requested address is not valid in its context.

The NIC that was removed is this one:

   1.07254[0] [platform|info] Discovered Network Interface "Realtek USB GbE Family Controller" with IP 192.168.178.35 and Subnet 255.255.255.0

This is the code that does the sending (dlog_socket_information_and_last_wsaerror generates all the output that is gathered using getsockopt):

void send_slice_over_udp_socket(Socket_Handle handle, Slice<d_byte> buffer, u32 remote_ip, u16 remote_port){
    PROFILE_FUNCTION();

    auto socket = (UDP_Socket*) sockets[handle.handle];
    ASSERT_VALID_UDP_SOCKET(socket);
    dlog_socket_information_and_last_wsaerror(socket);

    if(socket->is_dummy)
        return;

    if(buffer.size == 0)
        return;

    DASSERT(socket->state == Socket_State::created);

    u64 bytes_left = buffer.size;

    sockaddr_in target_socket_address = create_socket_address(remote_ip, remote_port);

    #pragma warning(push)
    #pragma warning(disable: 4996)
    dlog("target address windows formatted: %s", inet_ntoa(target_socket_address.sin_addr));
    #pragma warning(pop)
    unsigned char* parts = (unsigned char*)&remote_ip;
    dlog("target address %hhu.%hhu.%hhu.%hhu:%hu", parts[3], parts[2], parts[1], parts[0], remote_port);

    int sent_bytes = sendto(socket->handle, (char*) buffer.data, bytes_left > (u64) INT32_MAX ? INT32_MAX : (int) bytes_left, 0, (sockaddr*)&target_socket_address, sizeof(target_socket_address));

    if(sent_bytes == SOCKET_ERROR){
        #define LOG_WARNING(message) log_nonreproducible(message, Category::platform_network, Severity::warning, socket->handle); return;
        switch(WSAGetLastError()){
            //@TODO handle all (more? I guess many should just be asserted since they should never happen) cases
            case WSAEHOSTUNREACH: LOG_WARNING("socket %lld, send failed: The remote host can't be reached at this time.");
            case WSAECONNRESET: LOG_WARNING("socket %lld, send failed: Multiple UDP packet deliveries failed. According to documentation we should close the socket. Not sure if this makes sense, this is a UDP port after all. Closing the socket wont change anything, right?");
            case WSAENETUNREACH: LOG_WARNING("socket %lld, send failed: the network cannot be reached from this host at this time.");
            case WSAETIMEDOUT: LOG_WARNING("socket %lld, send failed: The connection has been dropped, because of a network failure or because the system on the other end went down without notice.");

            case WSAEADDRNOTAVAIL:

            case WSAENETRESET:
            case WSAEACCES:
            case WSAEWOULDBLOCK: //can this even happen on a udp port? I expect this to be fire-and-forget-style.
            case WSAEMSGSIZE:
            case WSANOTINITIALISED:
            case WSAENETDOWN:
            case WSAEINVAL:
            case WSAEINTR:
            case WSAEINPROGRESS:
            case WSAEFAULT:
            case WSAENOBUFS:
            case WSAENOTCONN:
            case WSAENOTSOCK:
            case WSAEOPNOTSUPP:
            case WSAESHUTDOWN:
            case WSAECONNABORTED:
            case WSAEAFNOSUPPORT:
            case WSAEDESTADDRREQ:
                ASSERT(false, tprint_last_wsa_error_as_formatted_message("sendto failed with (unhandled) ")); break;
            default: ASSERT(false, tprint_last_wsa_error_as_formatted_message("sendto failed with (undocumented) ")); //The switch case above should have been exhaustive. This is a bug. We either forgot a case, or maybe the docs were lying? (That happened to me on android. Fun times. Well. Not really.)
        }
        #undef LOG_WARNING
    }

    DASSERT(sent_bytes >= 0);
    total_bytes_sent += (u64) sent_bytes;
    bytes_left -= (u64) sent_bytes;
    DASSERT(bytes_left == 0);
}

The code that generates the address from ip and port looks like this:

sockaddr_in create_socket_address(u32 ip, u16 port){
    sockaddr_in address_info;
    address_info.sin_family = AF_INET;
    address_info.sin_port = htons(port);
    address_info.sin_addr.s_addr = htonl(ip);
    memset(address_info.sin_zero, 0, 8);
    return address_info;
}

The error seems to be a little flaky. It reproduces 100% of the time until it decides not to anymore. After a restart it's usually back.

I am looking for a solution to handle this case correctly. I could of course just re-do the network interface discovery when the error occurs, because I "know" that I don't give any broken IPs to sendto, but that would just be a heuristic. I want to solve the actual problem.

I also don't quite understand when exactly error 10049 is supposed to fire anyway. Is it only if I pass an IPv6 address to an IPv4 socket, or send to 0.0.0.0? There is no flat-out "illegal" IPv4 address after all, just ones that don't make sense in context.

If you know what I am missing here, please let me know!

Reply from a uj5u.com user:

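This is a problem people keep running into, and reading the Microsoft thread on the issue linked below is recommended. By the way, I don't know whether it is the same problem, but the returned error code is the same, which is why I attached the link.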

https://docs.microsoft.com/en-us/answers/questions/537493/binding-winsock-shortly-after-boot-results-in-erro.html

Reply from a uj5u.com user:

I found a solution (workaround?)

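I used NotifyAddrChange to receive changes to the NICs, and thought that for some reason it did not fire when I disabled a NIC. It turns out it does; I was just being stupid and stopped debugging too early. There was a bug in the code that compares the result of GetAdaptersInfo with the last known state to find the differences, so the application missed the NIC disconnect. Now that it observes the disconnect, it can kill the socket before it tries to send on the disabled NIC, which prevents the error from happening. This is not really a solution though, since there is a race condition here (the NIC can be disabled after the check for changes and before the send), so I will still have to handle error 10049.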

The bug was this:

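My expectation was that, when I disable a NIC, iterating over all existing NICs would show the disabled NIC as disabled. That is not the case. What happens is that the NIC is no longer in the list of existing NICs, even though the Windows dialog still shows it (as disabled). That was a bit surprising to me, but not all that unreasonable I guess.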

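Before, I had these checks to detect changes in the NICs:

Did the NIC exist before, was enabled, but is now disabled -> disable notification
Did the NIC exist before, was disabled, but is now enabled -> enable notification
Did the NIC not exist before, and is now enabled -> enable notification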

The fix was to add a fourth one:

Is there a previously known NIC that is no longer in the list of NICs -> disable notification
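The four checks can be sketched as a comparison of two snapshots. This is only an illustration under stated assumptions: the post does not show its own data structures, so `Nic_Snapshot`, `Nic_Change`, `Nic_Event`, and `diff_snapshots` are hypothetical names, and NICs are keyed by their IPv4 address (host byte order) for simplicity. Filling a snapshot from GetAdaptersInfo is left out.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical types for illustration; the original post does not show its own.
enum class Nic_Event { enabled, disabled };

struct Nic_Change {
    std::uint32_t ip;   // IPv4 address of the interface, host byte order
    Nic_Event event;
};

// One enumeration result: interface IP -> "is enabled" flag.
using Nic_Snapshot = std::map<std::uint32_t, bool>;

std::vector<Nic_Change> diff_snapshots(const Nic_Snapshot& before,
                                       const Nic_Snapshot& after) {
    std::vector<Nic_Change> changes;
    for (const auto& old_entry : before) {
        const std::uint32_t ip = old_entry.first;
        const bool was_enabled = old_entry.second;
        const auto it = after.find(ip);
        if (it == after.end()) {
            // Check 4: a known NIC vanished from the list. Assumption: only
            // report it if it was enabled, to avoid a duplicate notification.
            if (was_enabled) changes.push_back({ip, Nic_Event::disabled});
        } else if (was_enabled && !it->second) {
            changes.push_back({ip, Nic_Event::disabled});  // check 1
        } else if (!was_enabled && it->second) {
            changes.push_back({ip, Nic_Event::enabled});   // check 2
        }
    }
    for (const auto& new_entry : after) {
        // Check 3: a NIC that was not known before and is enabled now.
        if (new_entry.second && before.find(new_entry.first) == before.end())
            changes.push_back({new_entry.first, Nic_Event::enabled});
    }
    return changes;
}
```

Without the branch for check 4, a NIC that simply disappears from the enumeration produces no change at all, which is the bug described above.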

I am still not 100% happy that some ambiguous errors remain possible because of the race condition, but I might call it a day here.
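One way to narrow the remaining ambiguity, sketched here as an assumption and not as something the post confirms, is to re-enumerate the interfaces when sendto fails with 10049 and check whether the socket's bound local address still exists. `classify_send_failure` and `Send_Failure` are hypothetical names; the list of current local IPs would come from a fresh GetAdaptersInfo call. The check is itself racy (the NIC could come back between the failure and the re-enumeration), so it reduces the ambiguity without removing it.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Numeric value of WSAEADDRNOTAVAIL, spelled out so the sketch is portable.
constexpr int kWsaAddrNotAvail = 10049;

// Hypothetical classification for illustration.
enum class Send_Failure { interface_gone, invalid_target, other };

Send_Failure classify_send_failure(int wsa_error,
                                   std::uint32_t bound_local_ip,
                                   const std::vector<std::uint32_t>& current_local_ips) {
    if (wsa_error != kWsaAddrNotAvail) return Send_Failure::other;

    const bool still_present =
        std::find(current_local_ips.begin(), current_local_ips.end(),
                  bound_local_ip) != current_local_ips.end();

    // If the address the socket is bound to is no longer enumerated, the NIC
    // most likely went away; otherwise the target address itself is suspect.
    return still_present ? Send_Failure::invalid_target
                         : Send_Failure::interface_gone;
}
```

A caller could close the socket on `interface_gone` and keep the assertion only for `invalid_target`.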

Tags: c windows sockets udp winsock