mwaitx instruction does not block

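According to AMD's official documentation, the mwaitx instruction can be used together with the monitorx instruction to monitor an address range and detect whether it has been modified. My code seems to return immediately, as if it did nothing.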

The code in question:

push rbx
push r9
mov r9, rcx ;save parameter for later
mov eax, 80000001h ;check cpuid flag for monitorx, which is required regardless of whether or not we look at it
cpuid
mov rax, r9 ;the address to monitor
xor rcx, rcx ;no extensions
xor rdx, rdx ;no hints
monitorx rax, rcx, rdx
xor rax, rax ;no hints
mov rcx, 10b ;timeout specified in rbx, no interrupts
xor rbx, rbx ;infinite timeout
mwaitx rax, rcx, rbx ;not waiting; passes immediately
pop r9
pop rbx
ret

C++ calling code:

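#include <windows.h>
#include <iostream>

// The asm routine above; under the Windows x64 convention the pointer arrives in RCX.
extern "C" void monitorAddress(void* p);

int main()
{
    const SIZE_T size = 4096;  // assumption: one page (the original snippet left `size` undefined)
    void* data = VirtualAlloc(nullptr, size, MEM_COMMIT | MEM_RESERVE, PAGE_READONLY);
    if (data == nullptr) return 1;  // allocation failed

    std::cout << data << '\n';
    monitorAddress(data);  // expected to block until something writes to *data
    std::cout << data << '\n';

    VirtualFree(data, 0, MEM_RELEASE);
    return 0;
}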

Answer:

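The documentation in AMD's manual (vol. 3, rev. 3.33) does not say that ECX[0] = 0 will mask interrupts even when IF=1 in E/RFLAGS. It would be insane for user-space to be able to do that without an IO privilege level that already lets it run cli (IOPL=3 for CPL=3 code), and the wording doesn't really imply it.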

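In user-space, there should be no way to get the CPU stuck in a state the kernel would have trouble waking it from! If you want to spin-wait for a while before asking the OS to put this thread to sleep (e.g. with a Linux futex to wake you back up on a memory change), you can use monitorx / mwaitx in a loop, the same way you'd use pause or something similar in a spin-wait loop. From the OS's point of view it's the same either way: this thread is occupying the CPU the whole time.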
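A rough C++ sketch of that spin-then-sleep pattern, assuming an x86 build with GCC, Clang or MSVC. It substitutes _mm_pause for monitorx / mwaitx and a std::condition_variable for a raw futex, so it illustrates the shape of the loop rather than the exact mechanism; SpinThenBlockFlag and its methods are hypothetical names.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <immintrin.h>  // _mm_pause (x86 only)
#include <mutex>
#include <thread>

// Hypothetical helper: a 32-bit flag that a waiter first spins on, then sleeps on.
struct SpinThenBlockFlag {
    std::atomic<uint32_t> value{0};
    std::mutex m;
    std::condition_variable cv;

    // Writer: publish the new value under the lock, so a waiter can't miss it
    // between its last check and cv.wait(), then wake any sleeping waiters.
    void store_and_notify(uint32_t v) {
        {
            std::lock_guard<std::mutex> lock(m);
            value.store(v, std::memory_order_release);
        }
        cv.notify_all();
    }

    // Waiter: spin for up to spin_iters checks, then ask the OS to block the
    // thread. Returns the first value seen that differs from `old`.
    uint32_t wait_for_change(uint32_t old, unsigned spin_iters) {
        for (unsigned i = 0; i < spin_iters; ++i) {
            uint32_t v = value.load(std::memory_order_acquire);
            if (v != old) return v;
            _mm_pause();  // spin-wait hint; a monitorx/mwaitx pair could go here instead
        }
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return value.load(std::memory_order_acquire) != old; });
        return value.load(std::memory_order_acquire);
    }
};

// Usage example: another thread changes the flag after 10 ms, most likely after
// the spin phase has given up, so the waiter is woken by the condition variable.
uint32_t demo_spin_then_block() {
    SpinThenBlockFlag flag;
    std::thread writer([&flag] {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        flag.store_and_notify(7);
    });
    uint32_t seen = flag.wait_for_change(0, 1000);
    writer.join();
    return seen;
}
```

The spin budget is the knob the paragraph above is talking about: during the spin phase the thread still occupies a core as far as the OS is concerned, and only the blocking phase actually gives the CPU back.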

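Most likely your code really does arm the monitor and enter an optimized sleep state, but wakes up on the next timer interrupt, at most a few milliseconds later. Check how long it slept with rdtsc, because a human watching screen output can't tell that apart from not sleeping at all.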
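A minimal way to make that measurement from C++, assuming GCC or Clang on x86-64 (<x86intrin.h>; MSVC provides __rdtsc and __rdtscp via <intrin.h>). tsc_ticks is a hypothetical helper that brackets any callable with fenced TSC reads:

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtsc, __rdtscp, _mm_lfence (GCC/Clang)

// Hypothetical helper: how many TSC reference ticks did f() take?
template <class F>
uint64_t tsc_ticks(F&& f) {
    _mm_lfence();                   // let earlier instructions finish before reading the clock
    const uint64_t start = __rdtsc();
    _mm_lfence();                   // don't start f() until the clock has been read
    f();
    unsigned aux;                   // receives IA32_TSC_AUX (OS-defined core id); unused here
    const uint64_t end = __rdtscp(&aux);  // rdtscp waits for f()'s instructions to execute
    _mm_lfence();                   // keep later instructions from starting early
    return end - start;
}
```

For example, tsc_ticks([&] { monitorAddress(data); }) around the call from the question, divided by the TSC frequency, gives the real sleep time. A result on the order of the OS timer-interrupt period, rather than a few hundred ticks, would suggest mwaitx did sleep and was woken by that interrupt.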

What the documentation actually does say about the extension flags supported in ECX:

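Bit 0: When set, allows interrupts to wake MWAITX, even when eFLAGS.IF = 0. Support for this extension is indicated by a feature flag returned by the CPUID instruction.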

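So as an extension, you can override the fact that interrupts are disabled in eFLAGS, to make sure you don't enter a sleep that lasts until an NMI. Otherwise, with ECX[0] = 0, everything earlier in the documentation applies, including: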

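Events that cause an exit from the monitor event pending state include:

- A store from another processor that matches an address range established by the MONITORX instruction.
- Timer expiration.
- Any unmasked interrupt, including INTR, NMI, SMI, INIT.
- RESET.
- Any far control transfer that occurs between the MONITORX and the MWAITX.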

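If you really, truly want the CPU to sleep without a pending interrupt ending it, you can use cli before monitorx / mwaitx. Or, if you're in kernel mode proper, use the traditional monitor / mwait. "Kernel mode proper" means CPL=0, not user-space after a Linux iopl() system call or some other way of getting IOPL=3 with CPL=3 (current privilege level): that only lets you run the specific instructions the IO privilege level allows, like in/out and cli/sti, not privileged instructions in general.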

Unfortunately:

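After exiting MWAITX, there is no indication of why the processor exited or whether the timer expired. It is up to software to check whether the awaited store has occurred and, if not, to determine how much time has elapsed if it wants to re-establish the MONITORX with a new timer value.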
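Because the wakeup reason is lost, a timed wait has to re-arm, re-check and recompute the remaining budget on every pass. A hardware-independent C++ sketch of that loop follows; the three hooks in WaitHooks are hypothetical stand-ins (arm for monitorx, wait for mwaitx with ECX[1]=1 and the timeout in EBX, now for rdtsc), which lets the logic be exercised with a fake clock.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical hooks standing in for the real instructions:
//   arm(p)      -> monitorx on p
//   wait(ticks) -> mwaitx with ECX[1] = 1 and EBX = ticks (always > 0 here)
//   now()       -> rdtsc
struct WaitHooks {
    std::function<void(const volatile uint32_t*)> arm;
    std::function<void(uint32_t)> wait;
    std::function<uint64_t()> now;
};

// Wait until *p != old or `budget` TSC ticks have elapsed.
// Returns true if the value changed, false on timeout. mwaitx doesn't say
// why it woke up, so every wakeup re-checks both conditions.
bool wait_for_change(const volatile uint32_t* p, uint32_t old,
                     uint64_t budget, const WaitHooks& h) {
    const uint64_t start = h.now();
    for (;;) {
        h.arm(p);                        // arm *before* checking the value
        if (*p != old) return true;      // the store we were waiting for
        const uint64_t elapsed = h.now() - start;
        if (elapsed >= budget) return false;
        uint64_t remaining = budget - elapsed;
        if (remaining > UINT32_MAX) remaining = UINT32_MAX;  // EBX is only 32 bits
        h.wait(static_cast<uint32_t>(remaining));  // woke for an unknown reason: loop
    }
}
```

Arming before the check matters: a store that lands between the check and the mwaitx still triggers the armed monitor, so the wait falls through immediately instead of missing that store.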

By the way, if you don't want the timer to be a possible exit condition, you can just leave ECX[1] = 0:

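Bit 1: When set, EBX contains the maximum wait time expressed in software P0 clocks, the same clocks counted by the TSC. Setting bit 1 but passing in a value of zero in EBX is equivalent to setting bit 1 to zero: the timer will not be an exit condition.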
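Putting the two ECX bits together, a small C++ helper (hypothetical, not part of any compiler or OS API) that picks ECX and EBX for a desired timeout, following the manual text quoted above:

```cpp
#include <cstdint>

// MWAITX ECX extension bits, as described in the quoted AMD text.
constexpr uint32_t kMwaitxIntrWakeOverride = 1u << 0;  // bit 0: interrupts wake MWAITX even if eFLAGS.IF = 0
constexpr uint32_t kMwaitxTimerInEbx       = 1u << 1;  // bit 1: EBX holds the max wait in TSC (P0) ticks

// Hypothetical helper: the ECX/EBX pair to hand to MWAITX.
struct MwaitxOperands {
    uint32_t ecx;
    uint32_t ebx;
};

// timeout_ticks == 0 means "no timer": bit 1 stays clear, which the manual says
// is equivalent to setting bit 1 with EBX = 0. Longer timeouts are clamped,
// because EBX is a 32-bit input.
inline MwaitxOperands mwaitx_operands(uint64_t timeout_ticks, bool wake_with_if_clear) {
    MwaitxOperands ops{0, 0};
    if (wake_with_if_clear) ops.ecx |= kMwaitxIntrWakeOverride;
    if (timeout_ticks != 0) {
        ops.ecx |= kMwaitxTimerInEbx;
        ops.ebx = timeout_ticks > UINT32_MAX ? UINT32_MAX
                                             : static_cast<uint32_t>(timeout_ticks);
    }
    return ops;
}
```

Read this way, the question's ECX = 10b with EBX = 0 is the same as ECX = 0: no timer, and no override of eFLAGS.IF.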

And BTW, EAX=0 isn't "no hints"; EAX[7:4] is always the desired C-state level, encoded as C-state - 1. So EAX=0 hints that you want the C1 state. (To hint that you want C0, a less deep sleep that's faster to wake from, you'd set EAX = 0xf0, because 0xF + 1 wraps around to 0 in a 4-bit field.)
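The encoding is easy to get backwards, so here it is as a compile-time C++ function; mwaitx_hint is a hypothetical name, and it only fills in bits 7:4, leaving the rest of EAX at 0:

```cpp
#include <cstdint>

// MWAITX hint in EAX: bits 7:4 hold (target C-state - 1), wrapping mod 16.
constexpr uint32_t mwaitx_hint(unsigned cstate) {
    return ((cstate - 1u) & 0xFu) << 4;
}

static_assert(mwaitx_hint(1) == 0x00, "C1 is encoded as 0, i.e. EAX = 0");
static_assert(mwaitx_hint(0) == 0xF0, "C0 wraps around to 0xF in bits 7:4");
```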

It's also pointless to do xor rax,rax instead of xor eax,eax; writing a 32-bit register implicitly zeroes the upper bits of the full 64-bit register, so there's no false dependency. And there's no need to tempt the assembler into wasting a REX prefix to actually encode it as written. The MWAITX implicit input registers are all 32-bit anyway, so xor ecx, ecx would also be appropriate.

Also, r9 is call-clobbered (aka volatile) in the Windows x64 calling convention; you can just use it without saving/restoring, along with r8..r11.

And no, you don't have to run cpuid every time you want to do monitorx / mwaitx! AMD's documentation says you need to check once per program / library init, but there's no way the CPU can actually enforce that. It's not going to track across context switches which user-space process has actually run CPUID.
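A sketch of checking once and caching the result, assuming GCC or Clang (__get_cpuid from <cpuid.h>; MSVC would use __cpuid from <intrin.h> instead). The bit tested is CPUID Fn8000_0001 ECX[29], the MONITORX/MWAITX feature flag; cpu_has_monitorx is a hypothetical helper:

```cpp
#include <cpuid.h>  // __get_cpuid (GCC/Clang)

// CPUID Fn8000_0001 ECX bit 29 reports MONITORX/MWAITX support.
// Query it once and cache the answer instead of rerunning CPUID before
// every monitorx/mwaitx.
inline bool cpu_has_monitorx() {
    static const bool supported = [] {
        unsigned eax, ebx, ecx, edx;
        // __get_cpuid returns 0 if the extended leaf isn't available.
        if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) return false;
        return (ecx & (1u << 29)) != 0;
    }();
    return supported;
}
```

The function-local static is initialized exactly once, even if several threads make the first call concurrently, so every later call is just a load.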

;; uint32_t waitx(void *p)
;; returns TSC ticks actually slept for
waitx:
    mov   rax, rcx    ;the address to monitor
    xor   ecx, ecx    ;no extensions
    xor   edx, edx    ;no hints
    monitorx rax, ecx, edx    ; or just monitorx if your assembler complains

    lfence
    rdtsc
    lfence            ; make sure we're done reading the clock before executing later instructions
    mov   r8d, eax    ; low half of start time.   We ignore the high half in EDX, assuming the time is less than 2^32, i.e. less than about 1 second.

    xor   eax, eax    ; hint = EAX[7:4] = 0 = C1 sleep state
                      ; ECX still 0: no TSC timeout in EBX, no overriding IF to 1
    mwaitx  eax, ecx

    rdtscp            ; EAX = low half of end time, EDX = high, ECX = core #
    sub     eax, r8d  ; retval = end - start
    ret

(LFENCE serializes execution on AMD CPUs if the OS has enabled the Spectre mitigation feature bit, giving lfence that guarantee like on Intel CPUs. Otherwise it's a NOP on AMD, IIRC.)
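For comparison, roughly the same routine written with the GCC/Clang intrinsics from <x86intrin.h> (_mm_monitorx and _mm_mwaitx) instead of a separate asm file. This is a sketch assuming one of those compilers: the target("mwaitx") attribute lets this one function use the intrinsics without building the whole file with -mmwaitx, and waitx_intrinsics is a hypothetical name. As with the asm version, call it only after a once-per-process CPUID check has confirmed MONITORX support, since the instructions fault on CPUs without it.

```cpp
#include <cstdint>
#include <x86intrin.h>  // _mm_monitorx, _mm_mwaitx, __rdtsc, __rdtscp, _mm_lfence (GCC/Clang)

// Returns the TSC ticks actually slept for.
__attribute__((target("mwaitx")))
uint64_t waitx_intrinsics(void* p) {
    _mm_monitorx(p, 0, 0);   // arm the monitor on p: no extensions, no hints

    _mm_lfence();
    const uint64_t start = __rdtsc();
    _mm_lfence();            // finish reading the clock before continuing

    // Argument order is (ECX extensions, EAX hints, EBX clock):
    // ECX = 0 (no timer, no IF override), EAX = 0 (C1 hint), EBX unused.
    _mm_mwaitx(0, 0, 0);

    unsigned aux;            // IA32_TSC_AUX from rdtscp; unused
    const uint64_t end = __rdtscp(&aux);
    return end - start;      // full 64-bit difference, unlike the asm's low 32 bits
}
```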

Original link: https://www.uj5u.com/shujuku/437338.html
