Questions tagged [assembly]


Assembly language questions. Please tag the processor and/or instruction set you are using, as well as the assembler; a valid set looks like this: [assembly] [x86] [gnu-assembler] or [att]. Use the [.net-assembly] tag instead for .NET assemblies, [cil] for .NET assembly language, [wasm] for WebAssembly, and [java-bytecode-asm] for Java bytecode.

6,304 questions

116 votes · 3 answers · 34k views

How to remove "noise" from GCC/clang assembly output?

I want to inspect the assembly output of applying boost::variant in my code in order to see which intermediate calls are optimized away. When I compile the following example (with GCC 5.3 using g++ -...

asked Jul 24, 2016 at 12:39 by m.s. (16.2k reputation)

75 votes · 1 answer · 18k views

What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?

int 0x80 on Linux always invokes the 32-bit ABI, regardless of what mode it's called from: args in ebx, ecx, ... and syscall numbers from /usr/include/asm/unistd_32.h. (Or crashes on 64-bit kernels ...

asked Sep 7, 2017 at 4:20 by Peter Cordes (352k reputation)

202 votes · 4 answers · 188k views

What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64

Following links explain x86-32 system call conventions for both UNIX (BSD flavor) & Linux: http://www.int80h.org/bsdasm/#system-calls http://www.freebsd.org/doc/en/books/developers-handbook/x86-...

asked Mar 29, 2010 at 5:48 by claws (53.6k reputation)

14 votes · 1 answer · 4k views

Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?

I am disassembling this code on llvm clang Apple LLVM version 8.0.0 (clang-800.0.42.1): int main() { float a=0.151234; float b=0.2; float c=a+b; printf("%f", c); } I compiled with no ...

asked Nov 18, 2018 at 23:16 by Stefano Borini (142k reputation)

189 votes · 1 answer · 88k views

What is the best way to set a register to zero in x86 assembly: xor, mov or and?

All the following instructions do the same thing: set %eax to zero. Which way is optimal (requiring fewest machine cycles)? xorl %eax, %eax mov $0, %eax andl $0, %eax

asked Nov 12, 2015 at 7:55 by balajimc55 (2,265 reputation)

29 votes · 5 answers · 77k views

How do I print an integer in Assembly Level Programming without printf from the c library? (itoa, integer to decimal ASCII string)

Can anyone tell me the purely assembly code for displaying the value in a register in decimal format? Please don't suggest using the printf hack and then compile with gcc. Description: Well, I did ...

asked Oct 31, 2012 at 19:20 by Kaustav Majumder
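The question asks for pure assembly. The usual algorithm is easiest to see in C first: repeatedly divide by 10, and each remainder plus '0' is one digit, produced least-significant first. A minimal sketch is shown below. The function name `u32_to_dec` is hypothetical, and it assumes an unsigned 32-bit input and a caller-supplied buffer of at least 11 bytes. Each step maps onto a `div` (or a multiply-by-inverse) plus a store in a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write `value` as decimal ASCII into `out`
 * (needs >= 11 bytes: 10 digits + NUL) and return the digit count. */
size_t u32_to_dec(uint32_t value, char *out)
{
    char tmp[10];                   /* UINT32_MAX has 10 decimal digits */
    char *p = tmp + sizeof tmp;     /* fill backwards from the end */

    assert(out != NULL);
    do {
        *--p = (char)('0' + value % 10);   /* lowest digit first */
        value /= 10;
    } while (value != 0);           /* do/while so 0 still yields "0" */

    size_t len = (size_t)(tmp + sizeof tmp - p);
    memcpy(out, p, len);            /* move digits to the front */
    out[len] = '\0';
    return len;
}
```

Filling the buffer from the end avoids a separate string-reversal pass. The same trick is common in hand-written asm versions.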

179 votes · 4 answers · 50k views

Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?

In the x86-64 Tour of Intel Manuals, I read Perhaps the most surprising fact is that an instruction such as MOV EAX, EBX automatically zeroes upper 32 bits of RAX register. The Intel documentation (...

asked Jun 24, 2012 at 11:40 by Nubok (3,622 reputation)
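One visible consequence of this rule shows up in compiler output. Zero-extending a 32-bit value to 64 bits needs no extra instruction, because any write to a 32-bit register already clears bits 63:32. A small sketch follows; the function names are hypothetical. GCC and Clang at -O1 and above typically compile both functions to a single `mov eax, edi`, though exact output depends on compiler and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: widening u32 -> u64. On x86-64 this is
 * typically just `mov eax, edi`, since writing EAX zeroes RAX[63:32]. */
uint64_t widen_u32(uint32_t x)
{
    return x;                  /* implicit zero-extension */
}

/* Hypothetical example: clearing the upper half of a 64-bit value
 * by truncating to 32 bits; also typically a single 32-bit mov. */
uint64_t clear_upper32(uint64_t r)
{
    return (uint32_t)r;
}
```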

100 votes · 3 answers · 15k views

Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?

LOOP (Intel ref manual entry) decrements ecx / rcx, and then jumps if non-zero. It's slow, but couldn't Intel have cheaply made it fast? dec/jnz already macro-fuses into a single uop on Sandybridge-...

asked Mar 2, 2016 at 9:01 by Peter Cordes (352k reputation)
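The control flow that `loop` implements can be modeled in C. The sketch below is only a model of the semantics described in the excerpt (decrement RCX, then jump if non-zero); it is not how any CPU implements it, and the function name is hypothetical. It shows that the body runs RCX times, and that a starting RCX of 0 would wrap around to 2^64 - 1 iterations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of `loop target`: RCX is decremented (LOOP leaves
 * FLAGS untouched) and the branch is taken while the new RCX != 0.
 * `dec rcx; jnz target` has the same control flow but also writes FLAGS. */
uint64_t count_loop_iterations(uint64_t rcx)
{
    uint64_t iterations = 0;
    assert(rcx != 0);           /* rcx == 0 would run 2^64 times */
    do {
        iterations++;           /* loop body */
    } while (--rcx != 0);       /* loop: rcx--, jump if rcx != 0 */
    return iterations;
}
```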

90 votes · 5 answers · 58k views

Fastest way to do horizontal SSE vector sum (or other reduction)

Given a vector of three (or four) floats. What is the fastest way to sum them? Is SSE (movaps, shuffle, add, movd) always faster than x87? Are the horizontal-add instructions in SSE3 worth it? What'...

asked Aug 9, 2011 at 13:16 by FeepingCreature (3,748 reputation)
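A common SSE1-only pattern for summing the four floats of a vector uses two shuffle+add steps that form a reduction tree, instead of `haddps`. The sketch below uses x86 intrinsics from `<xmmintrin.h>` and therefore only builds for x86/x86-64 (SSE is baseline on x86-64). The function name is hypothetical. For a three-float vector, this sketch assumes the fourth lane is set to 0.0f.

```c
#include <assert.h>
#include <xmmintrin.h>   /* SSE1 intrinsics */

/* Hypothetical helper: horizontal sum of v = [a b c d] (element 0 first). */
float hsum_ps_sse1(__m128 v)
{
    /* swap adjacent pairs: [b a d c] */
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(v, shuf);   /* [a+b, b+a, c+d, d+c] */
    shuf = _mm_movehl_ps(shuf, sums);    /* low lanes = high half of sums */
    sums = _mm_add_ss(sums, shuf);       /* lane 0: (a+b) + (c+d) */
    return _mm_cvtss_f32(sums);
}
```

The design choice is to avoid `haddps`, which decodes to multiple shuffle uops on most Intel CPUs. Two explicit shuffles plus two adds are typically no slower.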

190 votes · 13 answers · 26k views

Is incrementing an int effectively atomic in specific cases?

In general, for int num, num++ (or ++num), as a read-modify-write operation, is not atomic. But I often see compilers, for example GCC, generate the following code for it (try here): void f() { int ...

asked Sep 8, 2016 at 14:39 by Leo Heinsaar (4,007 reputation)
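A plain `add dword ptr [mem], 1` is still a separate read and write as far as other cores are concerned, so it is not atomic across cores without a `lock` prefix. In C the portable fix is a C11 atomic read-modify-write. On x86, compilers typically emit `lock add` or `lock xadd` for it. A minimal single-counter sketch follows; the names are hypothetical, and relaxed ordering is assumed to be sufficient for a standalone counter.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared counter; safe to increment from many threads. */
static atomic_int counter = 0;

/* Atomic num++: returns the value before the increment. */
int increment_counter(void)
{
    return atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
}

int read_counter(void)
{
    return atomic_load_explicit(&counter, memory_order_relaxed);
}
```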

63 votes · 4 answers · 8k views

Micro fusion and addressing modes

I have found something unexpected (to me) using the Intel® Architecture Code Analyzer (IACA). The following instruction using [base+index] addressing addps xmm1, xmmword ptr [rsi+rax*1] does not ...

asked Sep 25, 2014 at 19:33 by Z boson (33.2k reputation)

12 votes · 2 answers · 26k views

Referencing the contents of a memory location. (x86 addressing modes)

I have a memory location that contains a character that I want to compare with another character (and it's not at the top of the stack so I can't just pop it). How do I reference the contents of a ...

asked Dec 3, 2015 at 4:50 by DrakeJacks

42 votes · 1 answer · 7k views

Why are loops always compiled into "do...while" style (tail jump)?

When trying to understand assembly (with compiler optimization on), I see this behavior: A very basic loop like this outside_loop; while (condition) { statements; } Is often compiled into (...

asked Dec 13, 2017 at 0:51 by iBug (36.8k reputation)
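The transformation in question is usually called loop inversion or loop rotation. The sketch below writes both shapes in C; the function names are hypothetical. The first is the source-level `while`. The second is roughly the shape optimizing compilers emit: one guard test before entering, then a `do...while` with its conditional branch at the bottom. With the rotated shape, each iteration executes a single (usually taken) branch, instead of a conditional branch at the top plus an unconditional `jmp` back.

```c
#include <assert.h>

/* Source form: condition tested at the top of every iteration. */
int sum_while(int n)
{
    int i = 0, sum = 0;
    while (i < n) {
        sum += i;
        i++;
    }
    return sum;
}

/* Rotated form: guard once, then test at the bottom (tail jump). */
int sum_rotated(int n)
{
    int i = 0, sum = 0;
    if (i < n) {                /* skip entirely if zero iterations */
        do {
            sum += i;
            i++;
        } while (i < n);        /* one conditional branch per iteration */
    }
    return sum;
}
```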

276 votes · 5 answers · 39k views

Why does GCC use multiplication by a strange number in implementing integer division?

I've been reading about div and mul assembly operations, and I decided to see them in action by writing a simple program in C: File division.c #include <stdlib.h> #include <stdio.h> int ...

asked Dec 16, 2016 at 11:59 by qiubit (4,806 reputation)
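The "strange number" is a fixed-point reciprocal of the divisor. The divisor in the question's own code is cut off in the excerpt. The sketch below uses 10 purely as an assumed example, with the hypothetical name `div10`. The constant m = ceil(2^35 / 10) = 0xCCCCCCCD overshoots 2^35 / 10 by so little that (x * m) >> 35 equals x / 10 for every 32-bit x. For `unsigned x / 10`, GCC on x86-64 typically emits the same multiply and shift by 35.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: unsigned x / 10 without a div instruction.
 * m = ceil(2^35 / 10) = 0xCCCCCCCD; m * 10 - 2^35 = 2, an error small
 * enough that the high bits of x * m never round past x / 10. */
uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```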

65 votes · 1 answer · 7k views

Why does mulss take only 3 cycles on Haswell, different from Agner's instruction tables? (Unrolling FP loops with multiple accumulators)

I'm a newbie at instruction optimization. I did a simple analysis on a simple function dotp which is used to get the dot product of two float arrays. The C code is as follows: float dotp( ...

asked Jul 15, 2017 at 1:14 by Forward
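The "multiple accumulators" in the title refers to splitting the running sum. With one accumulator, every `addss` depends on the previous one, so the loop is bound by FP add latency rather than throughput. The question's own `dotp` signature is cut off in the excerpt. The sketch below assumes plain `float` arrays and a length, under the hypothetical name `dotp_4acc`. Because it reorders FP additions, results can differ in the last bits from a sequential sum, which is why compilers do not make this change themselves without options such as `-ffast-math`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dot product with four independent sum chains, so up to
 * four FP adds can be in flight at once instead of one. */
float dotp_4acc(const float *x, const float *y, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)              /* leftover elements */
        s0 += x[i] * y[i];

    return (s0 + s1) + (s2 + s3);   /* combine the chains at the end */
}
```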
