Questions tagged [cuda]
CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units). CUDA provides an interface to NVIDIA GPUs through a variety of programming languages, libraries, and APIs.
14,514 questions
0 votes, 1 answer, 8 views
How to partition data in a warp based on a predicate so all kept items are consecutive
I have a warp full of data, some of which I want to keep and some I want to discard.
I want to store the kept items in contiguous memory.
For example, say I only want to keep prime numbers:
input ...
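A common way to do this is a ballot followed by a population count. This is a minimal sketch, assuming a full 32-lane warp: every lane votes on the predicate, and a kept lane's destination is the number of kept lanes below it. The C++ below emulates the warp on the host so the index arithmetic can be checked without a GPU; `warp_partition`, `popc` and `kWarpSize` are hypothetical names. On a device, the loop over `lane` would be one thread per lane, the ballot would come from `__ballot_sync(0xffffffff, pred)`, and `popc` would be `__popc`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kWarpSize = 32;

// Portable stand-in for CUDA's __popc(): counts set bits.
int popc(uint32_t x) {
    int n = 0;
    while (x) { x &= x - 1; ++n; }
    return n;
}

// Host emulation of a stable warp partition. Kept items land in
// out[0..kept), discarded items follow in out[kept..32). Returns kept.
template <typename T, typename Pred>
int warp_partition(const std::array<T, kWarpSize>& in,
                   std::array<T, kWarpSize>& out, Pred keep) {
    // Step 1: every lane evaluates the predicate (the ballot on a GPU).
    uint32_t ballot = 0;
    for (unsigned lane = 0; lane < kWarpSize; ++lane)
        if (keep(in[lane])) ballot |= 1u << lane;
    const int kept = popc(ballot);

    // Step 2: each lane computes its destination independently.
    for (unsigned lane = 0; lane < kWarpSize; ++lane) {
        const uint32_t lanemask_lt = (1u << lane) - 1u;  // bits of lower lanes
        int dst;
        if (ballot & (1u << lane))
            dst = popc(ballot & lanemask_lt);            // rank among kept lanes
        else
            dst = kept + popc(~ballot & lanemask_lt);    // rank among discarded lanes
        out[dst] = in[lane];
    }
    return kept;
}
```

Packing the discarded items after the kept ones makes this a stable partition; if discarded items are not needed, lanes that fail the predicate can simply skip the write.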
0 votes, 1 answer, 13 views
cuobjdump emits no PTX arithmetic instruction
Why doesn't cuobjdump emit the PTX mul instruction below? Has nvcc optimized the cubin output itself? Is the result calculated at compile time? If so, for this simplest case nvcc can reasonably ...
0 votes, 1 answer, 35 views
Build issue with MatX concerning initialisation of shared variables
I'm attempting to build and install MatX onto my Linux machine.
I'm following the instructions found here.
Except when I run the make -j command, I get the following trace:
/home/<me>/Documents/...
0 votes, 1 answer, 30 views
Calculate network from cugraph
I have been playing around with cugraph and nx_cugraph in python, but I am struggling to calculate the number of connected components from the graph. I have been getting a lot of errors.
To calculate ...
0 votes, 1 answer, 55 views
Problems evaluating cuDNN for SGEMM
I used cuDNN to test SGEMM for C[stride x stride] = A[stride x stride] x B[stride x stride] below:
Configuration
GPU: T1000/SM_75
cuda-12.0.1/driver-535 installed (via the multiverse repos on ubuntu-...
0 votes, 0 answers, 30 views
CUDA Thrust Sort Error C2338: ‘unimplemented for this system’ in Visual Studio 2022 after Git Pull [closed]
I'm facing an issue with a CUDA project that was previously compiling and running successfully. After pulling the latest code from GitLab, I'm now encountering a static_assert error from the Thrust ...
2 votes, 1 answer, 32 views
CUDA: Nth set bit indexes using all threads in a warp in O(1) time
I have a 32-bit bit mask holding a set of valid items.
From that bit mask I want to extract the indices of valid entries as a list.
Let's say I obtained the bit mask using a ballot, and I want to know ...
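Assuming the goal is a compacted list of the set-bit positions, each lane can compute its slot independently: if bit `lane` is set, its position in the list equals the number of set bits below it, so every lane does one mask, one popcount and at most one store, regardless of how many bits are set. This is a minimal sketch that emulates the 32 lanes with a host loop in C++; `set_bit_indices` and `popc` are hypothetical names, and on the GPU `popc` would be `__popc`. For the variant where lane n wants the n-th set index, CUDA also offers an `__fns` intrinsic that finds the n-th set bit directly; its `base`/`offset` semantics should be checked in the CUDA Math API documentation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Portable stand-in for CUDA's __popc(): counts set bits.
int popc(uint32_t x) {
    int n = 0;
    while (x) { x &= x - 1; ++n; }
    return n;
}

// Host emulation: each "lane" runs the same constant-time body.
// Writes the set-bit positions of mask, ascending, to indices[0..count)
// and returns count.
int set_bit_indices(uint32_t mask, std::array<int, 32>& indices) {
    for (unsigned lane = 0; lane < 32; ++lane) {
        if (mask & (1u << lane)) {
            // Set bits strictly below this lane give this bit's rank.
            const int rank = popc(mask & ((1u << lane) - 1u));
            assert(rank < 32);
            indices[rank] = static_cast<int>(lane);
        }
    }
    return popc(mask);
}
```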
-4 votes, 0 answers, 29 views
How to build and use Nvidia cuCollections on Windows? [closed]
Is there a way to make this work on windows?
https://github.com/NVIDIA/cuCollections
I am unable to compile it and unable to use the .cuh files as a part of my project.
The bottom line is that the lib ...
-2 votes, 0 answers, 20 views
Want to run a Local LLM on Nvidia Jetson AGX Orin over GPU
I am looking to run a local LLM (Large Language Model) on an Nvidia Jetson AGX Orin using the GPU's CUDA cores. Could anyone provide guidance or share resources on how to achieve this?
Thank you in ...
0 votes, 0 answers, 14 views
What is the meaning of each member variable of the CUDA_BATCH_MEM_OP_NODE_PARAMS structure?
typedef struct CUDA_BATCH_MEM_OP_NODE_PARAMS_st {
CUcontext ctx;
unsigned int count;
CUstreamBatchMemOpParams *paramArray;
unsigned int flags;
} CUDA_BATCH_MEM_OP_NODE_PARAMS;
I want ...
0 votes, 0 answers, 29 views
cuda device class data modification fails for large number of threads [duplicate]
I instantiate device-only classes with large data members. Subsequently, the data members are modified by all CUDA threads via class pointers.
The program works for a small number of threads (for ...
1 vote, 0 answers, 45 views
Weird behaviour of CUDA recursion
In the following minimal reproducible example, when the recursion in device_func is active, the __syncthreads() barrier is ignored, and when debugged, breakpoint 2 occurs before breakpoint 1. If the ...
2 votes, 2 answers, 55 views
How to correctly simulate `atomicAdd` on `u64` by using two `u32` buffers?
I'm trying to do atomic operations on u64, but since that's not supported, the number is stored in two u32 buffers.
The issue is that I'm not sure how to do atomicAdd correctly to simulate the effect it ...
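CUDA itself provides a native 64-bit overload, `atomicAdd(unsigned long long int*, unsigned long long int)`, so the split is only needed where 64-bit atomics are unavailable. Where they are not, a minimal sketch, assuming only a 32-bit fetch-and-add that returns the old value: add the low half atomically, detect whether that particular addition wrapped, and add the high half plus that carry to the high word. The C++ below uses `std::atomic<uint32_t>` on the host as a stand-in for the two u32 buffers; `SplitU64`, `split_atomic_add` and `split_load` are hypothetical names, and on the GPU each `fetch_add` would be a 32-bit `atomicAdd`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two u32 buffers holding one u64.
struct SplitU64 {
    std::atomic<uint32_t> lo{0};
    std::atomic<uint32_t> hi{0};
};

void split_atomic_add(SplitU64& x, uint64_t v) {
    const uint32_t v_lo = static_cast<uint32_t>(v);
    const uint32_t v_hi = static_cast<uint32_t>(v >> 32);
    // Atomically add the low half; fetch_add returns the value before this add.
    const uint32_t old_lo = x.lo.fetch_add(v_lo);
    // This specific addition wrapped iff the 32-bit sum is smaller than old_lo.
    const uint32_t carry = static_cast<uint32_t>(old_lo + v_lo) < old_lo ? 1u : 0u;
    // Add the high half plus our own carry (wraps mod 2^32, like the top of a u64).
    const uint32_t hi_add = v_hi + carry;
    if (hi_add != 0) x.hi.fetch_add(hi_add);
}

// Combines the halves; only consistent once no adds are in flight.
uint64_t split_load(const SplitU64& x) {
    return (static_cast<uint64_t>(x.hi.load()) << 32) | x.lo.load();
}
```

The final total is exact modulo 2^64 under any interleaving, because each adder propagates exactly the carry its own low-word addition produced. A reader running concurrently with adders can still observe a torn value (low word updated, high word not yet), so the pair should only be read after all additions complete, or protected by a lock if consistent intermediate reads are required.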
0 votes, 0 answers, 39 views
I want to use 11.7 version of cuda (but my driver wants 12.2) [closed]
I am a beginner in artificial intelligence. In order to test specific artificial intelligence, version 11.7 of CUDA is required.
The recommended CUDA version of the driver is 12.2, but I want to use ...
0 votes, 0 answers, 23 views
Gromacs with PTX
I want to generate the final gromacs library to contain ptx files. I have found that there are some options to (probably) achieve that, but I am not sure how I can use them. For instance, the file ...