Questions tagged [assembly]

Assembly language questions. Please tag the processor and/or instruction set you are using, as well as the assembler. A valid set looks like this: ([assembly] [x86] [gnu-assembler] or [att]). Use the [.net-assembly] tag instead for .NET assemblies, [cil] for .NET assembly language, [wasm] for WebAssembly, and [java-bytecode-asm] for Java bytecode.

116 votes
3 answers
34k views

How to remove "noise" from GCC/clang assembly output?

I want to inspect the assembly output of applying boost::variant in my code in order to see which intermediate calls are optimized away. When I compile the following example (with GCC 5.3 using g++ -...
m.s. (16.2k reputation)
75 votes
1 answer
18k views

What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?

int 0x80 on Linux always invokes the 32-bit ABI, regardless of what mode it's called from: args in ebx, ecx, ... and syscall numbers from /usr/include/asm/unistd_32.h. (Or crashes on 64-bit kernels ...
Peter Cordes
202 votes
4 answers
188k views

What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64

The following links explain x86-32 system call conventions for both UNIX (BSD flavor) & Linux: http://www.int80h.org/bsdasm/#system-calls http://www.freebsd.org/doc/en/books/developers-handbook/x86-...
claws (53.6k reputation)
14 votes
1 answer
4k views

Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?

I am disassembling this code on llvm clang Apple LLVM version 8.0.0 (clang-800.0.42.1): int main() { float a=0.151234; float b=0.2; float c=a+b; printf("%f", c); } I compiled with no ...
Stefano Borini
189 votes
1 answer
88k views

What is the best way to set a register to zero in x86 assembly: xor, mov or and?

All the following instructions do the same thing: set %eax to zero. Which way is optimal (requiring the fewest machine cycles)? xorl %eax, %eax mov $0, %eax andl $0, %eax
balajimc55 (2,265 reputation)
29 votes
5 answers
77k views

How do I print an integer in Assembly Level Programming without printf from the C library? (itoa, integer to decimal ASCII string)

Can anyone tell me the pure assembly code for displaying the value in a register in decimal format? Please don't suggest using the printf hack and then compiling with gcc. Description: Well, I did ...
Kaustav Majumder
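A common way to do the conversion that question asks about is repeated division by 10, storing each remainder as an ASCII digit from the end of a buffer. The sketch below shows that algorithm in C rather than assembly; the name utoa_dec and the 21-byte buffer size are assumptions of this example. An assembly version would run the same loop (using div or a multiplicative inverse) and then pass the buffer to a write system call.

```c
/* Hypothetical helper: convert an unsigned value to decimal ASCII without printf.
   Digits come out least-significant first, so the buffer is filled from the end.
   buf must hold at least 21 bytes (20 digits for 2^64-1, plus the NUL).
   Returns a pointer to the first digit inside buf. */
static char *utoa_dec(unsigned long long value, char buf[21])
{
    char *p = buf + 20;
    *p = '\0';
    do {
        *--p = (char)('0' + value % 10);  /* lowest remaining digit */
        value /= 10;
    } while (value != 0);                 /* do/while so that 0 produces "0" */
    return p;
}
```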
179 votes
4 answers
50k views

Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?

In the x86-64 Tour of Intel Manuals, I read: "Perhaps the most surprising fact is that an instruction such as MOV EAX, EBX automatically zeroes upper 32 bits of RAX register." The Intel documentation (...
Nubok (3,622 reputation)
100 votes
3 answers
15k views

Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?

LOOP (Intel ref manual entry) decrements ecx / rcx, and then jumps if non-zero. It's slow, but couldn't Intel have cheaply made it fast? dec/jnz already macro-fuses into a single uop on Sandybridge-...
Peter Cordes
90 votes
5 answers
58k views

Fastest way to do horizontal SSE vector sum (or other reduction)

Given a vector of three (or four) floats, what is the fastest way to sum them? Is SSE (movaps, shuffle, add, movd) always faster than x87? Are the horizontal-add instructions in SSE3 worth it? What'...
FeepingCreature
190 votes
13 answers
26k views

Is incrementing an int effectively atomic in specific cases?

In general, for int num, num++ (or ++num), as a read-modify-write operation, is not atomic. But I often see compilers, for example GCC, generate the following code for it (try here): void f() { int ...
Leo Heinsaar (4,007 reputation)
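A minimal C11 sketch of the distinction behind that question, assuming a compiler that provides <stdatomic.h>. A plain num++ may compile to a single add on a memory operand, but without a lock prefix that instruction is not atomic with respect to other cores, and the C standard treats unsynchronized concurrent access as a data race anyway. atomic_fetch_add_explicit requests a real atomic read-modify-write. The variable and function names here are hypothetical.

```c
#include <stdatomic.h>

static atomic_int counter = 0;  /* hypothetical shared counter */
static int plain_counter = 0;   /* hypothetical non-atomic counter */

/* Atomic read-modify-write. On x86, compilers typically emit a lock-prefixed
   instruction (e.g. lock xadd) for this. Relaxed ordering is enough for a
   plain counter that does not publish other data. Returns the new value. */
static int increment_atomic(void)
{
    return atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed) + 1;
}

/* Plain increment: may become a single `add` on a memory operand, but
   without a lock prefix other cores can interleave between its load and
   store, so concurrent calls can lose updates. */
static void increment_plain(void)
{
    plain_counter++;
}
```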
63 votes
4 answers
8k views

Micro fusion and addressing modes

I have found something unexpected (to me) using the Intel® Architecture Code Analyzer (IACA). The following instruction using [base+index] addressing addps xmm1, xmmword ptr [rsi+rax*1] does not ...
Z boson (33.2k reputation)
12 votes
2 answers
26k views

Referencing the contents of a memory location. (x86 addressing modes)

I have a memory location that contains a character that I want to compare with another character (and it's not at the top of the stack so I can't just pop it). How do I reference the contents of a ...
DrakeJacks
42 votes
1 answer
7k views

Why are loops always compiled into "do...while" style (tail jump)?

When trying to understand assembly (with compiler optimization on), I see this behavior: a very basic loop like outside_loop; while (condition) { statements; } is often compiled into (...
iBug (36.8k reputation)
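The transformation that question describes is often called loop inversion or loop rotation. Below is a minimal C sketch, with hypothetical function names, of the source form and roughly the shape compilers emit: one guard check before the loop, then a do...while with the conditional branch at the bottom, so each iteration executes a single conditional jump.

```c
/* Source form: the condition is tested at the top of every iteration. */
static int sum_while(const int *a, int n)
{
    int s = 0, i = 0;
    while (i < n) {
        s += a[i];
        i++;
    }
    return s;
}

/* Rotated form: same behavior, structured the way optimized asm usually is. */
static int sum_rotated(const int *a, int n)
{
    int s = 0, i = 0;
    if (i < n) {            /* guard: skip the loop entirely if it runs 0 times */
        do {
            s += a[i];
            i++;
        } while (i < n);    /* the only conditional branch inside the loop */
    }
    return s;
}
```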
276 votes
5 answers
39k views

Why does GCC use multiplication by a strange number in implementing integer division?

I've been reading about div and mul assembly operations, and I decided to see them in action by writing a simple program in C: File division.c #include <stdlib.h> #include <stdio.h> int ...
qiubit (4,806 reputation)
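A minimal C sketch of the technique that question asks about, assuming unsigned 32-bit operands: x / 10 is replaced by a multiply with the precomputed constant 0xCCCCCCCD (that is, ceil(2^35 / 10)) followed by a right shift by 35. The function name div10 is hypothetical, and the exact instructions a given GCC version emits may differ.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 32-bit division by 10 without a div instruction.
   0xCCCCCCCD = ceil(2^35 / 10). Its rounding error (0xCCCCCCCD * 10 - 2^35 = 2)
   is small enough that the 64-bit product shifted right by 35 equals
   floor(x / 10) for every uint32_t x. */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

A multiply and a shift take a few cycles, while a 32-bit div on x86 is typically much slower, which is why compilers prefer this sequence when the divisor is a compile-time constant.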
65 votes
1 answer
7k views

Why does mulss take only 3 cycles on Haswell, different from Agner's instruction tables? (Unrolling FP loops with multiple accumulators)

I'm a newbie at instruction optimization. I did a simple analysis on a simple function dotp which is used to get the dot product of two float arrays. The C code is as follows: float dotp( ...
Forward (935 reputation)
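A minimal C sketch of the multiple-accumulator unrolling named in that title, assuming a dotp-style loop over float arrays (the function names are hypothetical). With one accumulator, each addition must wait for the previous one, so the loop is bounded by the latency of the add; independent accumulators let several additions be in flight at once.

```c
#include <stddef.h>

/* One accumulator: a single loop-carried dependency chain through s. */
static float dotp_1acc(const float *x, const float *y, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

/* Four independent accumulators: four dependency chains that can overlap. */
static float dotp_4acc(const float *x, const float *y, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)          /* leftover elements when n is not a multiple of 4 */
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);
}
```

Because floating-point addition is not associative, compilers generally won't make this transformation themselves without flags such as -ffast-math, and the two versions can differ in the last bits for some inputs.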
