Questions tagged [cuda]

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units). CUDA provides an interface to NVIDIA GPUs through a variety of programming languages, libraries, and APIs.

14,514 questions

0 votes

1 answer

8 views

How to partition data in a warp based on a predicate so all keep items are consecutive

I have a warp full of data, some of which I want to keep and some I want to discard. I want to store the keep items in contiguous memory. For example, say I only want to keep prime numbers input ...

Johan

75.5k

asked 1 hour ago
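
One common way to approach the warp partitioning question above is a ballot followed by a population count: every lane votes on whether it keeps its item, and each keeper counts the keepers in lower lanes to find its output slot. The sketch below is only an illustration under stated assumptions, not the asker's code or the posted answer. The names `is_prime`, `warp_compact`, and `compact_on_gpu` are hypothetical, the kernel assumes exactly one full warp of 32 threads with 32 inputs, and error checking is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical predicate for illustration: trial-division primality test.
__device__ bool is_prime(unsigned int n) {
    if (n < 2) return false;
    for (unsigned int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Assumes a single warp of 32 threads. Kept items are written
// contiguously to out, preserving lane order.
__global__ void warp_compact(const unsigned int* in, unsigned int* out,
                             int* kept_count) {
    const unsigned int full_mask = 0xffffffffu;
    const unsigned int lane = threadIdx.x & 31u;
    const unsigned int value = in[lane];
    const bool keep = is_prime(value);

    // Bit i of ballot is set when lane i wants to keep its item.
    const unsigned int ballot = __ballot_sync(full_mask, keep);

    // Number of keepers in lanes below this one = this lane's output slot.
    const unsigned int lower_lanes = (1u << lane) - 1u;
    const unsigned int slot = __popc(ballot & lower_lanes);

    if (keep) out[slot] = value;
    if (lane == 0) *kept_count = __popc(ballot);
}

// Hypothetical host helper: host_in must hold exactly 32 values.
// CUDA error checking is omitted to keep the sketch short.
std::vector<unsigned int> compact_on_gpu(const std::vector<unsigned int>& host_in) {
    const size_t bytes = 32 * sizeof(unsigned int);
    unsigned int* d_in = nullptr;
    unsigned int* d_out = nullptr;
    int* d_count = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_in, host_in.data(), bytes, cudaMemcpyHostToDevice);

    warp_compact<<<1, 32>>>(d_in, d_out, d_count);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    int count = 0;
    cudaMemcpy(&count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    std::vector<unsigned int> host_out(count);
    cudaMemcpy(host_out.data(), d_out, count * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_count);
    return host_out;
}
```

With the inputs 1 through 32, the kept values would be the eleven primes from 2 to 31, stored in lane order. Extending this to more than one warp would need an additional step to combine per-warp counts, which the sketch does not attempt.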

0 votes

1 answer

13 views

cuobjdump emits no PTX arithmetic instruction

Why doesn't cuobjdump emit the PTX mul instruction below? Has nvcc optimized the cubin output itself? Is the result calculated at compile-time? If so, for this simplest case nvcc can reasonably ...

sof

9,511

asked 1 hour ago

0 votes

1 answer

35 views

Build issue with MatX concerning initialisation of shared variables

I'm attempting to build and install MatX onto my Linux machine. I'm following the instructions found here. Except when I run the make -j command, I get the following trace: /home/<me>/Documents/...

Hugo Phibbs

asked yesterday

0 votes

1 answer

30 views

Calculate network from cugraph

I have been playing around with cugraph and nx_cugraph in Python, but I am struggling to calculate the number of connected components from the graph. I have been getting a lot of errors. To calculate ...

Tan Linh

asked yesterday

0 votes

1 answer

55 views

Problems evaluating CUDNN for SGEMM

I used cudnn to test sgemm for C[stride x stride] = A[stride x stride] x B[stride x stride] below, Configuration GPU: T1000/SM_75 cuda-12.0.1/driver-535 installed (via the multiverse repos on ubuntu-...

sof

9,511

asked yesterday

0 votes

0 answers

30 views

CUDA Thrust Sort Error C2338: ‘unimplemented for this system’ in Visual Studio 2022 after Git Pull [closed]

I'm facing an issue with a CUDA project that was previously compiling and running successfully. After pulling the latest code from GitLab, I'm now encountering a static_assert error from the Thrust ...

Tang SuKai

asked yesterday

2 votes

1 answer

32 views

CUDA: Nth set bit indexes using all threads in a warp in O(1) time

I have a 32-bit bit mask holding a set of valid items. From that bit mask I want to extract the indices of valid entries as a list. Let's say I obtained the bit mask using a ballot, and I want to know ...

Johan

75.5k

asked yesterday

-4 votes

0 answers

29 views

How to build and use Nvidia cuCollections on Windows? [closed]

Is there a way to make this work on windows? https://github.com/NVIDIA/cuCollections I am unable to compile it and unable to use the .cuh files as a part of my project. The bottom line is that the lib ...

realPro

1,760

asked yesterday

-2 votes

0 answers

20 views

Want to run a Local LLM on Nvidia Jetson AGX Orin over GPU

I am looking to run a local LLM (Large Language Model) on an Nvidia Jetson AGX Orin using the GPU's CUDA cores. Could anyone provide guidance or share resources on how to achieve this? Thank you in ...

Mausam Jain

asked 2 days ago

0 votes

0 answers

14 views

What is the meaning of each member variable of the CUDA_BATCH_MEM_OP_NODE_PARAMS structure?

typedef struct CUDA_BATCH_MEM_OP_NODE_PARAMS_st { CUcontext ctx; unsigned int count; CUstreamBatchMemOpParams *paramArray; unsigned int flags; } CUDA_BATCH_MEM_OP_NODE_PARAMS; I want ...

zhe ming

asked 2 days ago

0 votes

0 answers

29 views

CUDA device class data modification fails for a large number of threads [duplicate]

I instantiate device-only classes with large data members. Subsequently, the data members are modified by all CUDA threads via class pointers. The program works for a small number of threads (for ...

minsuk ji

asked 2 days ago

1 vote

0 answers

45 views

Weird behaviour of CUDA recursion

In the following minimal reproducible example, when the recursion in device_func is active, the __syncthreads() barrier is ignored, and when debugged, breakpoint 2 occurs before breakpoint 1. If the ...

larrycaverga

asked 2 days ago

2 votes

2 answers

55 views

How to correctly simulate `atomicAdd` on `u64` by using two `u32` buffers?

I'm trying to do atomic operations on u64. But since that's not supported, the number is stored in two u32 buffers. The issue is that I'm not sure how to do atomicAdd correctly to simulate the effect it ...

RRR

asked 2 days ago

0 votes

0 answers

39 views

I want to use version 11.7 of CUDA (but my driver wants 12.2) [closed]

I am a beginner in artificial intelligence. In order to test specific artificial intelligence, version 11.7 of CUDA is required. The recommended CUDA version of the driver is 12.2, but I want to use ...

황수현

asked 2 days ago

0 votes

0 answers

23 views

Gromacs with PTX

I want to generate the final GROMACS library so that it contains PTX files. I have found that there are some options to (probably) achieve that, but I am not sure how I can use them. For instance, the file ...

MANOS

asked 2 days ago
