When msvc::musttail attribute silently fails
11 Jan 2026
Python developers recently reported a 15% speedup when using the new MSVC musttail attribute to create a threaded interpreter. Unfortunately, I found that MSVC does not always generate a tail call when you use this attribute. In an interpreter, every missed tail call leaves an extra stack frame behind, so a long-running program can overflow the stack.
Background
The musttail attribute, also supported by clang, forces the compiler to generate a tail call for a return statement that calls another function. So instead of a CALL instruction to call this function followed by a RET to return from the current function, it emits a JMP instruction to jump to the next function without creating a new stack frame. MSVC recently added support for this technique, which is useful for p-code interpreters.
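As a minimal sketch of what the attribute does, here is a tail-recursive sum. `MUSTTAIL` and `sum_to` are hypothetical names introduced for this example: the macro expands to `[[clang::musttail]]` or `[[msvc::musttail]]` when the compiler reports one of them through `__has_cpp_attribute` (assuming it supports that query for these names), and to nothing otherwise, in which case the call is an ordinary one that the optimizer may or may not turn into a jump.

```cpp
#include <cstdint>

// Hypothetical portability macro: expands to the compiler's musttail
// attribute when one is advertised, otherwise to nothing.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define MUSTTAIL [[clang::musttail]]
#  elif __has_cpp_attribute(msvc::musttail)
#    define MUSTTAIL [[msvc::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

// Adds n + (n - 1) + ... + 1 to acc. The recursive call is the last thing
// the function does, so with the attribute it becomes a JMP that reuses
// the current stack frame instead of a CALL that pushes a new one.
std::uint64_t sum_to(std::uint64_t n, std::uint64_t acc) {
    if (n == 0) {
        return acc;
    }
    MUSTTAIL return sum_to(n - 1, acc + n);
}
```

With the attribute honored, `sum_to(1000, 0)` returns 500500 using a single stack frame, however large `n` gets.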
P-code interpreters are traditionally coded as a giant switch statement inside a loop:
for (each instruction in pcode) {
switch (opcode) {
case ADD:
// do the addition
break;
case MUL:
// do the multiplication
break;
...
}
}
The switch statement compiles to an indirect jump. All p-code instructions go through this single jump, which makes it hard for the processor to predict where it will go next.
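A minimal runnable version of the loop above might look like this. The instruction format (`Instr` with an opcode and one operand), the `HALT` opcode and the single accumulator are assumptions made for this sketch; a real interpreter would have an operand stack and many more opcodes.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical p-code: each instruction is an opcode plus one operand.
enum Opcode : std::uint8_t { ADD, MUL, HALT };

struct Instr {
    Opcode opcode;
    std::int64_t operand;
};

// Classic switch-in-a-loop interpreter with a single accumulator.
// Every instruction is dispatched through the same switch, i.e. through
// one shared jump (or compare chain) in the compiled code.
std::int64_t run_switch(const std::vector<Instr>& pcode) {
    std::int64_t acc = 0;
    for (const Instr* instr = pcode.data();; ++instr) {
        switch (instr->opcode) {
        case ADD:
            acc += instr->operand;  // do the addition
            break;
        case MUL:
            acc *= instr->operand;  // do the multiplication
            break;
        case HALT:
            return acc;             // end of program
        }
    }
}
```

For example, running ADD 2, MUL 5, ADD 1, HALT returns 11, and every one of those four instructions passes through the same dispatch point.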
If musttail is supported by your compiler, you can create a function for each p-code instruction and link these functions via a dispatch table:
typedef void (*Handler)(INSTR * instr);
extern Handler dispatch_table[]; // indexed by opcode, defined below
void DoAdd(INSTR * instr) {
// Do the addition
// ...
// Move to the next p-code instruction
instr++;
[[msvc::musttail]] // or [[clang::musttail]]
return dispatch_table[GetOpcode(instr)](instr);
}
void DoMul(INSTR * instr) {
// Do the multiplication
// ...
// Move to the next p-code instruction
instr++;
[[msvc::musttail]]
return dispatch_table[GetOpcode(instr)](instr);
}
Handler dispatch_table[] = {
DoAdd, // ADD
DoMul, // MUL
// ...
};
This compiles to an indirect jump in the epilogue of each function, so every p-code instruction gets its own jump. When running your interpreter, the branch predictor keeps a separate history for each of these jumps: for example, if your p-code usually runs MUL after ADD, the processor will learn this. That's why threaded code is usually faster.
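Here is a runnable sketch of the threaded version, assuming a made-up `Instr` format (opcode plus one operand) and an accumulator as the interpreter state. `MUSTTAIL` is a hypothetical macro that picks `[[clang::musttail]]` or `[[msvc::musttail]]` when `__has_cpp_attribute` reports one, and expands to nothing otherwise; in that case each instruction may cost a stack frame, which is exactly the failure mode this post is about.

```cpp
#include <cstdint>

// Hypothetical portability macro for the musttail attribute.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define MUSTTAIL [[clang::musttail]]
#  elif __has_cpp_attribute(msvc::musttail)
#    define MUSTTAIL [[msvc::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

enum Opcode : std::uint8_t { ADD, MUL, HALT, NUM_OPCODES };

struct Instr {
    Opcode opcode;
    std::int64_t operand;
};

// Every handler has the same signature, so any handler can tail-call any other.
using Handler = std::int64_t (*)(const Instr* instr, std::int64_t acc);

std::int64_t DoAdd(const Instr* instr, std::int64_t acc);
std::int64_t DoMul(const Instr* instr, std::int64_t acc);
std::int64_t DoHalt(const Instr* instr, std::int64_t acc);

// Dispatch table indexed by opcode.
const Handler dispatch_table[NUM_OPCODES] = {DoAdd, DoMul, DoHalt};

std::int64_t DoAdd(const Instr* instr, std::int64_t acc) {
    acc += instr->operand;  // do the addition
    ++instr;                // move to the next p-code instruction
    // Each handler ends with its own indirect jump to the next handler.
    MUSTTAIL return dispatch_table[instr->opcode](instr, acc);
}

std::int64_t DoMul(const Instr* instr, std::int64_t acc) {
    acc *= instr->operand;  // do the multiplication
    ++instr;
    MUSTTAIL return dispatch_table[instr->opcode](instr, acc);
}

std::int64_t DoHalt(const Instr*, std::int64_t acc) {
    return acc;  // end of program: hand the accumulator back to run_threaded
}

std::int64_t run_threaded(const Instr* pcode) {
    return dispatch_table[pcode->opcode](pcode, 0);
}
```

Passing the state (`instr`, `acc`) as arguments rather than globals tends to keep it in argument registers across the jumps, and the fixed `Handler` signature is what lets the compiler replace each call with a jump.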
The MSVC problem
The newly added [[msvc::musttail]] attribute is ignored when the function is moderately complex (so that it saves non-volatile registers on the stack) and has multiple return statements, some of which are not tail calls:
#include <windows.h> // GetTickCount64, DWORD64
#include <stdio.h>   // printf
void __declspec(noinline) increment(int x) {
printf("%d\n", x + 1);
}
void incrementIfPositive(int x) {
DWORD64 a = GetTickCount64();
DWORD64 b = GetTickCount64();
DWORD64 c = GetTickCount64();
if (c == 0) {
return;
}
[[msvc::musttail]]
return increment(x + (int)(b - a + c / 2));
}
This is a made-up example, but a similar early return happens in a real interpreter when handling an exception. Assembly output:
; 18 : if (a == 0) {
test rax, rax
je SHORT $LN1@incrementIfPositive
...
; 25 : [[msvc::musttail]]
; 26 : return increment(x + (int)(b - a + c / 2));
shr rax, 1
lea ecx, DWORD PTR [rbx+42]
sub eax, edi
add ecx, eax
call ?increment@@YAXH@Z
mov rbx, QWORD PTR [rsp+48]
$LN1@incrementIfPositive:
; 27 : }
add rsp, 32 ; 00000020H
pop rdi
ret 0
?incrementIfPositive@@YAXH@Z ENDP
Despite the [[msvc::musttail]] attribute, the Visual C++ compiler generates a call to the increment function. I think this happens because the function epilogue is fairly long (add rsp, 32 and pop rdi), so the compiler does not want to duplicate it for the if (c == 0) case. Instead, it generates a conditional jump to the shared epilogue at $LN1@incrementIfPositive when c == 0. A tail call, however, needs its own copy of the epilogue to run before the jmp, so sharing one epilogue between both paths prevents the musttail optimization.
Visual C++ also does not report a compilation error, even though the documentation says it should; it silently ignores the musttail attribute and generates a call instruction instead of jmp / rex_jmp.
A workaround I found is to create a dummy handle_exception function and call it instead of returning early:
int g_x;
void __declspec(noinline) handle_exception(int x) {
// Do something to avoid optimizing out this function
g_x = x;
}
void incrementIfPositive(int x) {
DWORD64 a = GetTickCount64();
if (a == 0) {
return handle_exception(x);
}
// The rest of the code is the same
// ...
Assembly output in this case:
; 18 : if (a == 0) {
test rax, rax
jne SHORT $LN2@incrementIfPositive
; 27 : }
add rsp, 32
pop rdi
; 19 : return handle_exception(x);
jmp ?handle_exception@@YAXH@Z
...
; 24 :
; 25 : [[msvc::musttail]]
; 26 : return increment(x + (int)(b - a + c / 2));
shr rax, 1
lea ecx, DWORD PTR [rbx+42]
sub eax, edi
add ecx, eax
mov rbx, QWORD PTR [rsp+48]
; 27 : }
add rsp, 32
pop rdi
; 24 :
; 25 : [[msvc::musttail]]
; 26 : return increment(x + (int)(b - a + c / 2));
jmp ?increment@@YAXH@Z
?incrementIfPositive@@YAXH@Z ENDP
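Here, the tail call is generated correctly: the compiler duplicates the epilogue (add rsp, 32 and pop rdi) on both paths and ends each of them with a jmp. Unfortunately, this workaround does not help if the function has more than one early return; even creating a separate dummy function for each of them does not work.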
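The same pattern can be applied to an interpreter handler with a single error exit. In the sketch below, the `Instr` format, the accumulator, `DoDiv`, `HandleError` and `g_error_instr` are all hypothetical; the division-by-zero path calls `HandleError` instead of returning directly, mirroring handle_exception above, so every path out of `DoDiv` ends in a call in tail position. This only illustrates the shape of the code: whether a given MSVC version emits a jmp for it has not been checked here, and per the limitation above it is not expected to help once a handler has more than one early exit.

```cpp
#include <cstdint>

// Hypothetical portability macro for the musttail attribute.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define MUSTTAIL [[clang::musttail]]
#  elif __has_cpp_attribute(msvc::musttail)
#    define MUSTTAIL [[msvc::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

enum Opcode : std::uint8_t { ADD, DIV, HALT, NUM_OPCODES };

struct Instr {
    Opcode opcode;
    std::int64_t operand;
};

using Handler = std::int64_t (*)(const Instr* instr, std::int64_t acc);

std::int64_t DoAdd(const Instr* instr, std::int64_t acc);
std::int64_t DoDiv(const Instr* instr, std::int64_t acc);
std::int64_t DoHalt(const Instr* instr, std::int64_t acc);

const Handler dispatch_table[NUM_OPCODES] = {DoAdd, DoDiv, DoHalt};

// Set by HandleError so the caller can tell a failed run from a result.
const Instr* g_error_instr = nullptr;

// Hypothetical error handler: records which instruction failed.
std::int64_t HandleError(const Instr* instr, std::int64_t acc) {
    g_error_instr = instr;
    return acc;
}

std::int64_t DoAdd(const Instr* instr, std::int64_t acc) {
    acc += instr->operand;
    ++instr;
    MUSTTAIL return dispatch_table[instr->opcode](instr, acc);
}

std::int64_t DoDiv(const Instr* instr, std::int64_t acc) {
    if (instr->operand == 0) {
        // Early exit: call an error handler instead of a bare "return",
        // as in the handle_exception workaround.
        return HandleError(instr, acc);
    }
    acc /= instr->operand;
    ++instr;
    MUSTTAIL return dispatch_table[instr->opcode](instr, acc);
}

std::int64_t DoHalt(const Instr*, std::int64_t acc) {
    return acc;  // end of program
}

std::int64_t run(const Instr* pcode) {
    g_error_instr = nullptr;
    return dispatch_table[pcode->opcode](pcode, 0);
}
```

Running ADD 10, DIV 2, HALT returns 5 with no error recorded, while ADD 10, DIV 0, HALT stops at the DIV instruction and records it in `g_error_instr`.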
Conclusion
I encountered this problem when trying to apply the musttail optimization to the regular expression engine in Aba Search and Replace. I don't know whether it affects the Python interpreter, but I have reported the bug to Microsoft, and it is now under their consideration.
This is a blog about Aba Search and Replace, a tool for replacing text in multiple files.
