
Questions tagged [cuda]

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units). CUDA provides an interface to NVIDIA GPUs through a variety of programming languages, libraries, and APIs.

0 votes
1 answer
15 views

Hide warnings and/or errors when importing Keras

My script imports the following Keras modules: from keras.models import Sequential from keras.layers import Dense, Input from keras.utils import to_categorical and every time the same warnings/errors ...
Gabriel's user avatar
  • 42.1k
-3 votes
0 answers
14 views

I'm getting a KeyError in Numba, but the key exists [duplicate]

I'm getting a KeyError in this code: from numba import jit, cuda import numpy as np from timeit import default_timer as timer @jit(target_backend='cuda') def func2(a): for ...
LiogamerYT's user avatar
-1 votes
0 answers
27 views

Contradictory specs on tensor cores for my GPU [duplicate]

My GPU is a Quadro T1000 Mobile (SM_75). I've found contradictory device specs for its tensor cores. The GPU has 14 SMs, and the chapter on compute capability 7.x plainly lists 8 tensor cores per SM. If so, ...
sof's user avatar
  • 9,509
-2 votes
0 answers
12 views

CUDA issues with Kubeflow

At our company we have Kubeflow running with GPUs available. I'm using the standard Docker image jupyter-pytorch-cuda-full:v1.8.0 as the base image. torch.version = 2.1.0+cu121 is installed, the GPU is ...
Romero Azzalini's user avatar
0 votes
0 answers
22 views

Compiling CUDA programs with clang takes over an hour [closed]

I am using clang-18 to compile CUDA programs, and the compilation process does not report any errors, but it takes a very long time (even over an hour). The program can be compiled very quickly using ...
putong's user avatar
  • 1
0 votes
0 answers
21 views

Cannot open source file "crtdefs.h" in VSC (CUDA script), but CUDA compilation works

My CUDA script (.cu) compiles without error, but the #include <stdio.h> line raises an error in VSC: #include errors detected. Please update your includePath. Squiggles are disabled for this ...
TaihouKai's user avatar
  • 301
0 votes
1 answer
24 views

How to partition data in a warp based on a predicate so all keep items are consecutive

I have a warp full of data, some of which I want to keep and some I want to discard. I want to store the keep items in contiguous memory. For example, say I only want to keep prime numbers input ...
Johan's user avatar
  • 75.5k
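
A common approach to this kind of warp-level partitioning is ballot-based compaction: every lane votes on its predicate, and each keeping lane counts the set bits below its own position to get its output slot. The sketch below simulates that logic on the host in plain Python; it is not CUDA code and not taken from the question or its answer. The names `ballot`, `compact_warp`, and `is_prime` are hypothetical; on the device, the corresponding steps would typically use the `__ballot_sync` and `__popc` intrinsics.

```python
WARP_SIZE = 32


def ballot(predicates):
    """Pack one boolean per lane into a 32-bit mask (bit i = lane i).

    Host-side stand-in for what __ballot_sync returns on the GPU.
    """
    mask = 0
    for lane, keep in enumerate(predicates):
        if keep:
            mask |= 1 << lane
    return mask


def compact_warp(values, keep):
    """Return the kept values in lane order, stored contiguously."""
    assert len(values) == WARP_SIZE
    mask = ballot([keep(v) for v in values])
    out = [None] * bin(mask).count("1")
    for lane, value in enumerate(values):
        if (mask >> lane) & 1:
            # Output slot = number of keeping lanes below this one.
            lower_lanes = mask & ((1 << lane) - 1)
            out[bin(lower_lanes).count("1")] = value
    return out


def is_prime(n):
    """Trial-division primality check, used only as a sample predicate."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


kept = compact_warp(list(range(WARP_SIZE)), is_prime)
```

Each lane's work is independent of the others once the mask is known, which is why this maps onto a warp without a loop over lanes on the device.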
0 votes
1 answer
31 views

cuobjdump emits no PTX arithmetic instruction

Why doesn't cuobjdump emit the PTX mul instruction below? Has nvcc optimized the cubin output itself? Is the result calculated at compile-time? If so, for this simplest case nvcc can reasonably ...
sof's user avatar
  • 9,509
0 votes
1 answer
36 views

Build issue with MatX concerning initialisation of shared variables

I'm attempting to build and install MatX onto my Linux machine. I'm following the instructions found here. Except when I run the make -j command, I get the following trace: /home/<me>/Documents/...
Hugo Phibbs's user avatar
0 votes
1 answer
32 views

Calculate network from cugraph

I have been playing around with cugraph and nx_cugraph in Python, but I am struggling to calculate the number of connected components from the graph. I have been getting a lot of errors. To calculate ...
Tan Linh's user avatar
0 votes
1 answer
59 views

Problems evaluating CUDNN for SGEMM

I used cudnn to test sgemm for C[stride x stride] = A[stride x stride] x B[stride x stride] below, Configuration GPU: T1000/SM_75 cuda-12.0.1/driver-535 installed (via the multiverse repos on ubuntu-...
sof's user avatar
  • 9,509
0 votes
0 answers
32 views

CUDA Thrust Sort Error C2338: ‘unimplemented for this system’ in Visual Studio 2022 after Git Pull [closed]

I'm facing an issue with a CUDA project that was previously compiling and running successfully. After pulling the latest code from GitLab, I'm now encountering a static_assert error from the Thrust ...
Tang SuKai's user avatar
2 votes
1 answer
32 views

CUDA: Nth set bit indexes using all threads in a warp in O(1) time

I have a 32-bit bit mask holding a set of valid items. From that bit mask I want to extract the indices of valid entries as a list. Let's say I obtained the bit mask using a ballot, and I want to know ...
Johan's user avatar
  • 75.5k
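
One way to think about extracting the index of the Nth set bit is as a scatter: every lane whose bit is set computes its rank (the count of set bits below it) and writes its own lane index into slot `rank`, so slot N ends up holding the index of the Nth set bit. The following is a host-side Python simulation of that idea, assuming 0-based N; `nth_set_bit_indices` is a hypothetical name, the `slots` list stands in for shared memory, and none of this is claimed to be the approach from the accepted answer.

```python
def nth_set_bit_indices(mask):
    """Return a list whose entry n is the index of the n-th set bit of mask.

    Simulates 32 lanes; each lane does one popcount and at most one store.
    """
    assert 0 <= mask <= 0xFFFFFFFF
    slots = [None] * 32  # stand-in for a 32-entry shared-memory array
    for lane in range(32):
        if (mask >> lane) & 1:
            # Rank of this lane among the set bits (0-based).
            rank = bin(mask & ((1 << lane) - 1)).count("1")
            slots[rank] = lane
    return slots[: bin(mask).count("1")]


indices = nth_set_bit_indices(0b10110010)
```

On the device, each lane would do a constant amount of work, with a warp-level synchronization between the stores and the subsequent loads of `slots[n]`.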
-4 votes
0 answers
30 views

How to build and use Nvidia cuCollections on Windows? [closed]

Is there a way to make this work on Windows? https://github.com/NVIDIA/cuCollections I am unable to compile it and unable to use the .cuh files as part of my project. The bottom line is that the lib ...
realPro's user avatar
  • 1,759
-2 votes
0 answers
24 views

Want to run a local LLM on the GPU of an Nvidia Jetson AGX Orin

I am looking to run a local LLM (Large Language Model) on an Nvidia Jetson AGX Orin using the GPU's CUDA cores. Could anyone provide guidance or share resources on how to achieve this? Thank you in ...
Mausam Jain's user avatar
