Questions tagged [assembly]
Assembly language questions. Please tag the processor and/or instruction set you are using, as well as the assembler; a valid set of tags looks like this: ([assembly] [x86] [gnu-assembler] or [att]). Use the [.net-assembly] tag instead for .NET assemblies, [cil] for .NET assembly language, [wasm] for WebAssembly, and [java-bytecode-asm] for Java bytecode.
assembly
6,304
questions
116
votes
3
answers
34k
views
How to remove "noise" from GCC/clang assembly output?
I want to inspect the assembly output of applying boost::variant in my code in order to see which intermediate calls are optimized away.
When I compile the following example (with GCC 5.3 using g++ -...
75
votes
1
answer
18k
views
What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
int 0x80 on Linux always invokes the 32-bit ABI, regardless of what mode it's called from: args in ebx, ecx, ... and syscall numbers from /usr/include/asm/unistd_32.h. (Or crashes on 64-bit kernels ...
202
votes
4
answers
188k
views
What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
The following links explain x86-32 system call conventions for both UNIX (BSD flavor) & Linux:
http://www.int80h.org/bsdasm/#system-calls
http://www.freebsd.org/doc/en/books/developers-handbook/x86-...
14
votes
1
answer
4k
views
Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?
I am disassembling this code on llvm clang Apple LLVM version 8.0.0 (clang-800.0.42.1):
#include <stdio.h>  /* needed for printf */
int main() {
    float a=0.151234;
    float b=0.2;
    float c=a+b;
    printf("%f", c);
}
I compiled with no ...
189
votes
1
answer
88k
views
What is the best way to set a register to zero in x86 assembly: xor, mov or and?
All the following instructions do the same thing: set %eax to zero. Which way is optimal (requiring fewest machine cycles)?
xorl %eax, %eax
mov $0, %eax
andl $0, %eax
29
votes
5
answers
77k
views
How do I print an integer in Assembly Level Programming without printf from the c library? (itoa, integer to decimal ASCII string)
Can anyone tell me the pure assembly code for displaying the value in a register in decimal format? Please don't suggest using the printf hack and then compiling with gcc.
Description:
Well, I did ...
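A common way to do the conversion this question asks about is repeated division by 10, storing digits from the end of a buffer backwards. The following is a minimal C sketch of that algorithm, not code from the question; the name `u32_to_decimal` and the 11-byte buffer convention are assumptions made for illustration. Each loop iteration corresponds to one divide (or multiply-by-reciprocal) step in an assembly version.

```c
#include <stdint.h>

/* Hypothetical helper: writes the decimal digits of `value` into the
 * tail of `buf` (at least 11 bytes: 10 digits plus the terminator)
 * and returns a pointer to the first digit. */
char *u32_to_decimal(uint32_t value, char buf[11])
{
    char *p = buf + 10;      /* start at the end of the buffer */
    *p = '\0';
    do {
        /* the remainder is the lowest digit; store it and move left */
        *--p = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);    /* do...while so that 0 prints as "0" */
    return p;
}
```

The returned pointer and the length `buf + 10 - p` are what a `write` system call would need in a version without libc.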
179
votes
4
answers
50k
views
Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?
In the x86-64 Tour of Intel Manuals, I read
Perhaps the most surprising fact is that an instruction such as MOV EAX, EBX automatically zeroes upper 32 bits of RAX register.
The Intel documentation (...
100
votes
3
answers
15k
views
Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?
LOOP (Intel ref manual entry) decrements ecx / rcx, and then jumps if non-zero. It's slow, but couldn't Intel have cheaply made it fast? dec/jnz already macro-fuses into a single uop on Sandybridge-...
90
votes
5
answers
58k
views
Fastest way to do horizontal SSE vector sum (or other reduction)
Given a vector of three (or four) floats, what is the fastest way to sum them?
Is SSE (movaps, shuffle, add, movd) always faster than x87? Are the horizontal-add instructions in SSE3 worth it?
What'...
190
votes
13
answers
26k
views
Is incrementing an int effectively atomic in specific cases?
In general, for int num, num++ (or ++num), as a read-modify-write operation, is not atomic. But I often see compilers, for example GCC, generate the following code for it (try here):
void f()
{
int ...
63
votes
4
answers
8k
views
Micro fusion and addressing modes
I have found something unexpected (to me) using the Intel® Architecture Code Analyzer (IACA).
The following instruction using [base+index] addressing
addps xmm1, xmmword ptr [rsi+rax*1]
does not ...
12
votes
2
answers
26k
views
Referencing the contents of a memory location. (x86 addressing modes)
I have a memory location that contains a character that I want to compare with another character (and it's not at the top of the stack so I can't just pop it). How do I reference the contents of a ...
42
votes
1
answer
7k
views
Why are loops always compiled into "do...while" style (tail jump)?
When trying to understand assembly (with compiler optimization on), I see this behavior:
A very basic loop like this
outside_loop;
while (condition) {
    statements;
}
is often compiled into (...
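The transformation this question describes is usually called loop inversion or loop rotation. As a rough illustration in C (the function names are hypothetical, and this is a source-level analogy rather than actual compiler output), a top-tested `while` loop can be rewritten as one up-front check followed by a `do...while`, so that the loop body ends in a single conditional branch:

```c
/* Source-level loop: the condition is tested at the top. */
int sum_while(const int *a, int n)
{
    int sum = 0, i = 0;
    while (i < n) {
        sum += a[i];
        i++;
    }
    return sum;
}

/* Equivalent rotated form: test once before entering, then loop with
 * the condition at the bottom (one conditional jump per iteration). */
int sum_rotated(const int *a, int n)
{
    int sum = 0, i = 0;
    if (i < n) {
        do {
            sum += a[i];
            i++;
        } while (i < n);
    }
    return sum;
}
```

Both functions return the same result for every `n`, including `n <= 0`, where the up-front check skips the loop entirely.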
276
votes
5
answers
39k
views
Why does GCC use multiplication by a strange number in implementing integer division?
I've been reading about div and mul assembly operations, and I decided to see them in action by writing a simple program in C:
File division.c
#include <stdlib.h>
#include <stdio.h>
int ...
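The "strange number" in the title is typically a fixed-point reciprocal of the divisor. As a sketch under stated assumptions (unsigned 32-bit operands and a divisor of 5, chosen here for illustration and not taken from the truncated code above), division can be replaced by a widening multiply and a right shift:

```c
#include <stdint.h>

/* Hypothetical example: x / 5 for unsigned 32-bit x without a divide.
 * 0xCCCCCCCD is ceil(2^34 / 5), so multiplying and shifting right by
 * 34 yields the exact quotient for every 32-bit input. */
uint32_t div5_by_multiply(uint32_t x)
{
    uint64_t product = (uint64_t)x * 0xCCCCCCCDu;  /* 32x32 -> 64-bit multiply */
    return (uint32_t)(product >> 34);
}
```

Other divisors and signed operands use different constants and shift counts, and some need an extra fix-up step, so the numbers above should not be reused for other cases.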
65
votes
1
answer
7k
views
Why does mulss take only 3 cycles on Haswell, different from Agner's instruction tables? (Unrolling FP loops with multiple accumulators)
I'm a newbie at instruction optimization.
I did a simple analysis on a simple function dotp which is used to get the dot product of two float arrays.
The C code is as follows:
float dotp( ...
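The "multiple accumulators" idea from the title can be sketched in C as follows. This is an illustrative version, not the asker's code: the name `dotp_4acc`, its signature, and the unroll factor of 4 are assumptions. Keeping several independent partial sums gives the CPU several dependency chains to overlap, instead of one chain limited by floating-point add latency.

```c
#include <stddef.h>

/* Hypothetical dot product with four independent accumulators. */
float dotp_4acc(const float *x, const float *y, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    /* Main loop: four independent dependency chains per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    /* Cleanup loop for the remaining 0 to 3 elements. */
    for (; i < n; i++)
        s0 += x[i] * y[i];

    return (s0 + s1) + (s2 + s3);
}
```

Because floating-point addition is not associative, this version can round differently from a single-accumulator loop, which is why compilers generally do not make this change on their own without options such as `-ffast-math`.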