Questions tagged [cuda]

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units). CUDA provides an interface to NVIDIA GPUs through a variety of programming languages, libraries, and APIs.

0 votes
1 answer
8 views

How to partition data in a warp based on a predicate so all keep items are consecutive

I have a warp full of data, some of which I want to keep and some I want to discard. I want to store the keep items in contiguous memory. For example, say I only want to keep prime numbers input ...
Johan • 75.5k
0 votes
1 answer
13 views

cuobjdump emits no PTX arithmetic instruction

Why doesn't cuobjdump emit the PTX mul instruction below? Has nvcc optimized the cubin output itself? Is the result calculated at compile time? If so, for this simplest case nvcc can reasonably ...
sof • 9,511
0 votes
1 answer
35 views

Build issue with MatX concerning initialisation of shared variables

I'm attempting to build and install MatX on my Linux machine. I'm following the instructions found here, but when I run the make -j command, I get the following trace: /home/<me>/Documents/...
Hugo Phibbs
0 votes
1 answer
30 views

Calculate network from cugraph

I have been playing around with cugraph and nx_cugraph in Python, but I am struggling to calculate the number of connected components from the graph. I have been getting a lot of errors. To calculate ...
Tan Linh
0 votes
1 answer
55 views

Problems evaluating CUDNN for SGEMM

I used cuDNN to test SGEMM for C[stride x stride] = A[stride x stride] x B[stride x stride], as shown below. Configuration: GPU: T1000/SM_75, cuda-12.0.1/driver-535 installed (via the multiverse repos on ubuntu-...
sof • 9,511
0 votes
0 answers
30 views

CUDA Thrust Sort Error C2338: ‘unimplemented for this system’ in Visual Studio 2022 after Git Pull [closed]

I'm facing an issue with a CUDA project that was previously compiling and running successfully. After pulling the latest code from GitLab, I'm now encountering a static_assert error from the Thrust ...
Tang SuKai
2 votes
1 answer
32 views

CUDA: Nth set bit indexes using all threads in a warp in O(1) time

I have a 32-bit bit mask holding a set of valid items. From that bit mask I want to extract the indices of valid entries as a list. Let's say I obtained the bit mask using a ballot, and I want to know ...
Johan • 75.5k
-4 votes
0 answers
29 views

How to build and use Nvidia cuCollections on Windows? [closed]

Is there a way to make this work on Windows? https://github.com/NVIDIA/cuCollections I am unable to compile it and unable to use the .cuh files as part of my project. The bottom line is that the lib ...
realPro • 1,760
-2 votes
0 answers
20 views

Want to run a Local LLM on Nvidia Jetson AGX Orin over GPU

I am looking to run a local LLM (Large Language Model) on an Nvidia Jetson AGX Orin using the GPU's CUDA cores. Could anyone provide guidance or share resources on how to achieve this? Thank you in ...
Mausam Jain
0 votes
0 answers
14 views

What is the meaning of each member variable of the CUDA_BATCH_MEM_OP_NODE_PARAMS structure?

typedef struct CUDA_BATCH_MEM_OP_NODE_PARAMS_st { CUcontext ctx; unsigned int count; CUstreamBatchMemOpParams *paramArray; unsigned int flags; } CUDA_BATCH_MEM_OP_NODE_PARAMS; I want ...
zhe ming
0 votes
0 answers
29 views

cuda device class data modification fails for large number of threads [duplicate]

I instantiate device-only classes with large data members. Subsequently, the data members are modified by all CUDA threads via class pointers. The program works for a small number of threads (for ...
minsuk ji
1 vote
0 answers
45 views

Weird behaviour of CUDA recursion

In the following minimal reproducible example, when the recursion in device_func is active, the __syncthreads() barrier is ignored, and when debugged, breakpoint 2 is hit before breakpoint 1. If the ...
larrycaverga
2 votes
2 answers
55 views

How to correctly simulate `atomicAdd` on `u64` by using two `u32` buffers?

I'm trying to do atomic operations on u64. But since it's not supported, the number is stored in two u32 buffers. The issue is that I'm not sure how to do atomicAdd correctly to simulate the effect it ...
RRR • 497
0 votes
0 answers
39 views

I want to use 11.7 version of cuda (but my driver wants 12.2) [closed]

I am a beginner in artificial intelligence. In order to test specific artificial intelligence, version 11.7 of CUDA is required. The recommended CUDA version of the driver is 12.2, but I want to use ...
황수현
0 votes
0 answers
23 views

Gromacs with PTX

I want the final GROMACS library to contain PTX files. I have found that there are some options to (probably) achieve that, but I am not sure how I can use them. For instance, the file ...
MANOS • 31
