Questions tagged [assembly]


Assembly language questions. Please tag the processor and/or the instruction set you are using, as well as the assembler. A valid set looks like this: [assembly] [x86] [gnu-assembler] or [att]. Use the [.net-assembly] tag instead for .NET assemblies, [cil] for .NET assembly language, [wasm] for WebAssembly, and [java-bytecode-asm] for Java bytecode.

10,531 questions with no upvoted or accepted answers

13 votes

0 answers

423 views

Why does a NOP (as a 5th uop) speed up a 4 uop loop on Ice Lake?

All benchmarks are done on: Ice Lake: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz (ark). Edit: I was not able to reproduce this on Broadwell, and @PeterCordes was unable to reproduce it on Skylake. I was ...

Noah

1,759

asked Feb 12, 2021 at 1:22

10 votes

0 answers

2k views

Difference between VMOVDQA and VMOVAPS?

I have read the ISA reference, and I understand that the two instructions differ in the type of value they load (integer vs. single-precision float). What I don't understand is that the effect of the load is ...

Anmol Sahoo

asked Feb 22, 2021 at 0:59

9 votes

0 answers

204 views

Are there processors on which VPMASKMOVD generates faults for the masked-out elements?

Are there processors on which VPMASKMOVD generates faults for the masked-out elements? Going by the Intel Software Developer's Manual, the answer is plainly "no": Faults occur only due to ...

user555045

63.9k

asked Jan 28 at 15:16

9 votes

0 answers

300 views

Why does gcc -O3 produce wildly different assembly for the same function?

I have a loop in a physics engine which detects collisions like this: // now check for collisions // we only allow 1 collision per 2 particles per frame so the // one with the lower index will ...

brenzo

asked Dec 31, 2021 at 6:15

9 votes

1 answer

460 views

Best way to do a packed 16 element blend using SSE

I would like to implement the following function using SSE. It blends elements from a with packed elements from b, where elements are only present if they are used. void packedBlend16(uint8_t mask, ...

Nick

asked May 16, 2020 at 19:52

9 votes

0 answers

751 views

What's up with the "half fence" behavior of rdtscp?

For many years, x86 CPUs have supported the rdtsc instruction, which reads the "time stamp counter" of the current CPU. The exact definition of this counter has changed over time, but on recent CPUs it is a ...

BeeOnRope

63.1k

asked Sep 4, 2018 at 3:53

9 votes

0 answers

561 views

Intellisense warning that it can't find function definition for assembly function

In my MSVC 2015 project I have a function, int foo(int, int) which is implemented in an .asm file. When I extern "C" declare this function in a .cpp file in the same project, Intellisense complains ...

BeeOnRope

63.1k

asked Jun 12, 2016 at 3:44

9 votes

0 answers

5k views

ld: Undefined symbols for architecture x86_64

I have made a nasm assembly hello world program like this: global start section .text start: mov rax, 0x20000004 mov rdi, 1 lea rsi, [rel msg] mov rdx, msg.len syscall mov ...

Jerfov2

5,455

asked Jul 4, 2015 at 16:25

8 votes

0 answers

272 views

Golang goroutine preemption

I was wondering how Go preempts goroutines since version 1.14, when the scheduler became non-cooperative. I studied the source code, but it seems my knowledge is not enough to comprehend ...

toozyfuzzy

1,198

asked Apr 19 at 11:23

8 votes

1 answer

153 views

Why does GCC fail to reduce a loop that increments two locations of the same buffer?

Here is a bounded loop that increments two locations of the same buffer. unsigned int getid(); void foo(unsigned int *counter, unsigned int n) { unsigned int A = getid(); unsigned int ...

AceSrc

asked Dec 7, 2023 at 7:37

8 votes

0 answers

165 views

Why is newer Clang generating one more instruction than just popcntl to count the bits of an int on Haswell?

While watching this talk by Matt Godbolt, I was astonished to see that Clang, if instructed to compile for the Haswell¹ architecture, works out that the following code int foo(int a) { int count = ...

Enlico

26.7k

asked Aug 28, 2023 at 7:47

8 votes

0 answers

826 views

Optimizing cumulative sum

I need some help understanding how an optimization I tried even works. The cumsum function takes a vector and writes a vector with the accumulated sum. I tried the following to optimize this: ...

user13963867

asked Aug 31, 2021 at 17:20

8 votes

0 answers

556 views

gdb tui -- turn off printing function parameters for asm layout

GDB in TUI mode, in the asm layout, prints something like: <address+0> <namespace:func(int, int, ..... many many many parameters)+0> instruction1 <address+4> <namespace:func(int, int, ....

JenyaKh

2,268

asked Dec 8, 2020 at 12:15

8 votes

0 answers

296 views

Why is x/10 optimized with an unnecessary shift when x has a restricted range?

I have this function: long long int divideBy10(long long int a){ return a / 10; } It's compiled to: mov rax, rdi movabs rcx, 7378697629483820647 imul rcx ...

Slei

asked Jun 10, 2020 at 17:07

8 votes

0 answers

1k views

Usage of instruction pxor before SSE instruction cvtsi2ss

I am currently writing various implementations of a color to black/white image converter. I would like to write: a simple C++ implementation, a self-made ASM implementation, a self-made ASM implementation ...

Sydney Hauke

asked Feb 16, 2017 at 23:36
