This is a companion discussion topic for the original entry at https://linuxconfig.org/how-to-install-cuda-on-ubuntu-20-04-focal-fossa-linux
Unfortunately, it did not work for me as documented. I get the following error:
dpkg: error processing archive /var/cache/apt/archives/libcublas10_10.2.2.89-1_amd64.deb (--unpack):
trying to overwrite '/usr/lib/x86_64-linux-gnu/libnvblas.so.10', which is also in package libnvblas10:amd64 10.1.243-3
Errors were encountered while processing:
E: Sub-process /usr/bin/dpkg returned an error code (1)
Did I miss a step?
I had this happen as well. It may be related to an older version of CUDA being installed, but possibly not. Either way I was able to get past the error by running
$ sudo dpkg -i --force-overwrite /var/cache/apt/archives/libcublas10_10.2.2.89-1_amd64.deb
$ sudo apt --fix-broken install
You will get some warnings after running the first command, but apt --fix-broken install clears them right up.
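The dpkg message above already names the package that owns the conflicting file (libnvblas10 at 10.1.243-3), which fits the guess that an older CUDA install is involved. As a rough sketch only, the helper below pulls that package name out of saved apt/dpkg output; the function name conflicting_packages is made up for illustration and is not part of dpkg or apt.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread or from dpkg itself):
# read apt/dpkg error text on stdin and print the package named in each
# "trying to overwrite ..., which is also in package NAME VERSION" line.
conflicting_packages() {
    sed -n 's/.*which is also in package \([^ ]*\).*/\1/p'
}

# Typical use (commented out so the sketch has no side effects):
#   conflicting_packages < saved-apt-output.txt
#   dpkg -S /usr/lib/x86_64-linux-gnu/libnvblas.so.10   # confirm the owner
```

Knowing the owner makes it easier to decide whether to remove the older package or to use --force-overwrite as described above. Only the latter was reported to work in this thread.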
sudo apt install nvidia-cuda-toolkit
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
nvidia-cuda-toolkit : Depends: nvidia-cuda-dev (= 10.1.243-3) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
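One possible cause of an unmet dependency like this is a mix of CUDA packages from different sources or versions (for example 10.1.243-3 next to the 10.2.2.89-1 package seen earlier in the thread), but that is an assumption and not something confirmed here. A minimal sketch for listing what is installed so mixed versions stand out; cuda_related is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: filter "name version" lines (one package per line)
# down to CUDA/NVIDIA-related packages and sort them by name.
cuda_related() {
    grep -E '^(lib)?(cuda|cublas|nvblas|nvidia)' | sort
}

# Typical use on a dpkg-based system (commented out, no side effects):
#   dpkg-query -W -f='${Package} ${Version}\n' | cuda_related
#   apt-mark showhold    # lists packages marked as held, if any
```

If the list shows two different CUDA versions side by side, that would at least be consistent with the error above.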
Hi, the installation worked well.
But when I run the sample hello.cu code, I get the output:
Max error: 2.000000
I suppose it should be 0.000000. Is this fine? If so, what is the reason for the difference?
I’m not sure what the difference is, but when I was getting Max error: 2.000000, something was wrong with my nvidia-driver (maybe due to Secure Boot). Running sudo dpkg --configure -a and then sudo apt install nvidia-cuda-toolkit fixed the driver issue, and after that I got Max error: 0.000000.
These instructions install the latest cuda version, which at this time is cuda 11. There is a serious version mismatch between the 440 driver and cuda 11. Before installing, nvidia-smi indicated that it used cuda 10.2. After installing cuda, nvidia-smi returns
Failed to initialize NVML: Driver/library version mismatch
I read somewhere that Cuda 11 was required when running the 5.3 kernel, so it’s not clear that I could downgrade to cuda 10.2. Maybe I need a version 440 driver written with cuda 11, if such a thing exists.
sudo apt remove cuda
does not remove the cuda installation so nvidia-smi still errors.
Nvidia has created a terrible mess with this.
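The "Driver/library version mismatch" message from nvidia-smi generally means the NVIDIA kernel module that is currently loaded has a different version than the user-space libraries on disk, which can happen when an install replaces the driver without a reboot. A minimal sketch for checking that, assuming the module is named nvidia on your system; loaded_driver_version is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read the text of /proc/driver/nvidia/version on stdin
# and print the version of the kernel module that is currently loaded.
loaded_driver_version() {
    awk '/^NVRM version:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
    }'
}

# Typical use (commented out, no side effects). The module name "nvidia"
# is an assumption and may differ on some setups:
#   loaded=$(loaded_driver_version < /proc/driver/nvidia/version)
#   on_disk=$(modinfo -F version nvidia)
#   [ "$loaded" = "$on_disk" ] || echo "loaded $loaded, on disk $on_disk"
```

If the two versions differ, a reboot (or reloading the module) is usually what brings them back in line.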
Here’s what I did to get things to work.
After sudo apt remove cuda, I had to run sudo apt autoremove to actually remove the cuda installation. After rebooting, I checked the Additional Drivers tab in Software & Updates and noticed that the 440 driver was gone and my system was now using the xorg nouveau driver. Apparently the cuda installation overwrites your driver, and removing cuda removes the overwritten driver, leaving you running the xorg driver.
I then ran sudo apt install cuda and rebooted. The Additional Drivers tab in Software & Updates now shows a manually installed driver in use, and nvidia-smi shows driver version 450 and cuda version 11.0. My other cuda binaries now run too.
It looks like the one thing that needs to be added to the installation instructions is to be sure you are running the xorg nouveau driver when you install cuda. If you are running an Nvidia driver it will be overwritten by the cuda install leaving you with driver and library version mismatches.
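For anyone who wants to check which driver is active before installing cuda, here is a minimal sketch that reads lsmod output. The helper name active_gpu_driver is made up for illustration, and it only looks at the two module names discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `lsmod` output on stdin and report which GPU
# kernel driver is loaded: "nvidia", "nouveau", or "none".
# If both modules appear, "nvidia" takes precedence.
active_gpu_driver() {
    awk '$1 == "nvidia"                 { found = "nvidia" }
         $1 == "nouveau" && found == "" { found = "nouveau" }
         END { print (found == "" ? "none" : found) }'
}

# Typical use (commented out, no side effects):
#   lsmod | active_gpu_driver
```

Going by the experience above, seeing "nouveau" here before running sudo apt install cuda should avoid having an existing Nvidia driver overwritten.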