There's a --device flag you can pass. I've been trying to get `--device cuda` to work on my Windows machine and it's saying that torch wasn't compiled with CUDA. Trying to figure out what's going on there.
And on the M1, supposedly PyTorch has support for hardware acceleration using MPS (Metal Performance Shaders, announced here https://pytorch.org/blog/introducing-accelerated-pytorch-tra...) but when I tried `--device mps` it blew up with an error "input types 'tensor<1x1280x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible".
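As a first sanity check, this at least shows what the installed torch build claims to support (a minimal sketch; `torch.backends.mps` only exists in PyTorch 1.12+):

    import torch

    print(torch.__version__)                  # a "+cpu" suffix means a build without CUDA support
    print(torch.cuda.is_available())          # False on CPU-only builds or without a visible NVIDIA GPU
    print(torch.backends.mps.is_available())  # True on Apple Silicon with an MPS-enabled build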
> I've been trying to get `--device cuda` to work on my Windows machine and it's saying that torch wasn't compiled with CUDA.
I struggled with the same. Here's what worked for me:
Use pip to uninstall PyTorch first, which should be "pip uninstall torch" or similar.
Find the CUDA version you have installed[1]. Go to the PyTorch Get Started page[2] and use their guide/wizard to generate the pip string, then run that. I had to change pip3 to pip FWIW, and with CUDA 11.6 installed I ended up with "pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116".
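Once reinstalled, you can verify that the CUDA build took before running whisper again (these are standard torch calls):

    import torch

    print(torch.version.cuda)             # e.g. "11.6" for the cu116 wheel
    print(torch.cuda.is_available())      # should now be True
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 2080 Ti"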
After that I could use --device cuda, and the difference was immense. On my 2080 Ti, transcribing a minute of audio with the large model went from roughly an hour to 10-20 seconds.
Yep, same for me: on M1, after enabling MPS (with `model.to("mps")`), it just SIGSEGVs or SIGABRTs every time on that line. The extremely unclean nature of the abort is making it hard to debug :(
I noticed the size seems to correspond to the model. With the large model the error is tensor<1x1280x3000xf16>; with tiny it's tensor<1x384x3000xf16>, and with medium it's tensor<1x1024x3000xf16>. It also seems like a bad sign that those are f16s while the "expected" data is f32.
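The middle dimension is just each model's audio-encoder width, and (if I'm reading Whisper's audio code right) 3000 is the number of mel frames in its 30-second window (30 s of 16 kHz audio with a hop length of 160), so the shapes line up exactly:

    # Encoder widths, as read off the error messages above
    dims = {"tiny": 384, "medium": 1024, "large": 1280}
    for name, width in dims.items():
        print(f"{name}: tensor<1x{width}x3000xf16>")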
I'm giving up for the night, but https://github.com/Smaug123/whisper/pull/1/files at least contains the setup instructions that may help others get to this point. Got it working on the GPU, but it's… much, much slower than the CPU? Presumably due to the 'aten::repeat_interleave.self_int' CPU fallback.
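For reference, that CPU fallback is opt-in via an environment variable; I believe it has to be set before torch is first imported, e.g.:

    import os

    # Opt into CPU fallback for ops the MPS backend doesn't implement yet.
    # Must be set before the first `import torch` to take effect.
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

    import torch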
Also hitting a nice little PyTorch bug:
> File "/Users/patrick/Documents/GitHub/whisper/whisper/decoding.py", line 388, in apply
>     logits[:, self.tokenizer.encode(" ") + [self.tokenizer.eot]] = -np.inf
> RuntimeError: dst_.nbytes() >= dst_byte_offset INTERNAL ASSERT FAILED at "/Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Copy.mm":200, please report a bug to PyTorch.
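A possible (unverified) workaround would be to do that fancy-index assignment on the CPU and copy the result back; here `mask_tokens` is a hypothetical stand-in for the `self.tokenizer.encode(" ") + [self.tokenizer.eot]` list from decoding.py:

    import numpy as np
    import torch

    def suppress_tokens(logits: torch.Tensor, mask_tokens: list) -> torch.Tensor:
        # Hypothetical workaround: the indexed assignment trips an internal
        # assert in the MPS Copy path, so do it on CPU and move it back.
        logits_cpu = logits.cpu()
        logits_cpu[:, mask_tokens] = -np.inf
        return logits_cpu.to(logits.device)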