raytopia
I love how many Python-to-native/GPU code projects there are now. It's nice to see a lot of competition in the space. An alternative to this one could be Taichi Lang [0]; it can use your GPU through Vulkan, so you don't have to own NVIDIA hardware (see the sketch after the links). Numba [1] is another popular alternative. I'm still waiting on a Python project that compiles to pure C (unlike Cython [2], which is hard to port) so you can write homebrew games or other embedded applications.

[0] https://www.taichi-lang.org/

[1] http://numba.pydata.org/

[2] https://cython.readthedocs.io/en/stable/
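As a taste of the Taichi route, a kernel targeting Vulkan looks roughly like this (untested sketch, following the style of the Taichi docs):

    import taichi as ti

    ti.init(arch=ti.vulkan)  # any Vulkan-capable GPU, no CUDA required

    pixels = ti.field(dtype=ti.f32, shape=(512, 512))

    @ti.kernel
    def fill():
        for i, j in pixels:  # outermost loop is parallelized on the GPU
            pixels[i, j] = (i + j) % 2

    fill()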

eigenvalue
I really like how NVIDIA has started doing more normal open source and not locking stuff behind a login to their website. It makes things so much easier now that you can just pip install all the CUDA stuff for torch and other libraries without authenticating, downloading from websites, and other nonsense. I guess they realized it was dramatically reducing engagement with their work. If it's open source anyway, you should make it as accessible as possible.
w-m
I was playing around with Taichi a little for a project. Taichi lives in a similar space but supports more backends than just NVIDIA. Its development has stalled, though, so I'm considering switching to Warp now.

It's quite frustrating that there's seemingly no long-lived framework that lets me write simple Numba-like kernels (sketch below) and try them out on both NVIDIA and Apple GPUs. Even with Taichi, the Metal backend was definitely B-tier or lower: no 64-bit ints, and it would randomly crash or fail to compile things.

Here's hoping we solve the GPU programming space in the next couple of years, but after ~15 years of waiting, I'm no longer holding my breath.

https://github.com/taichi-dev/taichi
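For reference, the Numba-like kernel style I mean looks roughly like this in Warp (untested sketch, using the README's `import warp as wp` convention):

    import warp as wp

    wp.init()

    @wp.kernel
    def saxpy(a: float, x: wp.array(dtype=float), y: wp.array(dtype=float)):
        i = wp.tid()  # one thread per element, CUDA/Numba style
        y[i] = a * x[i] + y[i]

    n = 1024
    x = wp.zeros(n, dtype=float)
    y = wp.zeros(n, dtype=float)
    wp.launch(saxpy, dim=n, inputs=[2.0, x, y])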

VyseofArcadia
Aren't warps already architectural elements of NVIDIA graphics cards? This name collision is going to muddy search results.
marmaduke
I've dredged through Julia, Numba, JAX, and Futhark looking for a way to get good CPU performance in the absence of a GPU, and I'm not really happy with any of them, especially given how many of them want you to lug LLVM along.

A recent simulation code, pushed with GCC's OpenMP SIMD, matched a 13900K against jax.jit on an RTX 4090. This case worked because the overall computation could be structured into pieces that fit in L1/L2 cache, but I had to spend a ton of time writing the C code, whereas jax.jit was too easy.
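The "too easy" part: on the JAX side the whole optimization is basically a decorator. Illustrative sketch, not my actual code:

    import jax
    import jax.numpy as jnp

    @jax.jit  # trace once, then run the fused/compiled version
    def step(state, dt):
        # hypothetical update rule, just to show the shape of the code
        return state + dt * jnp.tanh(state)

    state = jnp.zeros((1024, 1024))
    state = step(state, 1e-3)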

So I'd still like to see something like this but which really works for CPU as well.

TNWin
Slightly related

What's this community's take on Triton? https://openai.com/index/triton/

Are there better alternatives?
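For anyone who hasn't looked at it: Triton kernels are written in Python but compiled per block. A minimal vector add is roughly this (sketch from memory, following the official tutorial's names, so double-check against it):

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
        pid = tl.program_id(axis=0)
        offs = pid * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n  # guard the tail of the array
        x = tl.load(x_ptr + offs, mask=mask)
        y = tl.load(y_ptr + offs, mask=mask)
        tl.store(out_ptr + offs, x + y, mask=mask)

    n = 4096
    x = torch.randn(n, device="cuda")
    y = torch.randn(n, device="cuda")
    out = torch.empty_like(x)
    add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK=1024)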

owenpalmer
> Warp is designed for spatial computing

What does this mean? I've mainly heard the term "spatial computing" in the context of the Vision Pro release. It doesn't seem like this was intended for AR/VR.

dudus
Gotta keep digging that CUDA moat as hard and as fast as possible.
wallscratch
Can anyone comment on how efficient the Warp code is compared to manually written / fine-tuned CUDA?
jarmitage
> What's Taichi's take on NVIDIA's Warp?

> Overall the biggest distinction as of now is that Taichi operates at a slightly higher level, e.g. implicit loop parallelization, high-level spatial data structures, direct interop with torch, etc.

> We are trying to implement support for lower-level programming styles to accommodate things such as native intrinsics, but we do think of those as more advanced optimization techniques. At the same time, we strive for easier entry and usage for beginners or people not so used to CUDA's programming model.

https://github.com/taichi-dev/taichi/discussions/8184
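Concretely, "implicit loop parallelization" means Taichi parallelizes the outermost loop of a kernel for you; there's no explicit thread index or launch dimension, unlike Warp's wp.tid(). Minimal sketch:

    import taichi as ti

    ti.init(arch=ti.gpu)

    x = ti.field(dtype=ti.f32, shape=1_000_000)

    @ti.kernel
    def scale(a: ti.f32):
        for i in x:  # outermost struct-for is auto-parallelized
            x[i] *= a

    scale(2.0)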

jkbbwr
I really wish Python would stop being the go-to language for GPU orchestration and machine learning. Having worked with it again recently for some proofs of concept, it's been a massive pain in the ass.
arvinsim
As someone who is not in the simulation and graphics space, what does this library bring that current libraries do not?
jorlow
Does this compete at all with OpenAI's Triton (which is sort of a higher-level CUDA without the vendor lock-in)?
bytesandbits
How is this different from Triton?
beebmam
Why Python? I really don't understand this choice of language other than accessibility.
BenoitP
This should be seen in light of the Great Differentiable Convergence™:

NeRFs backpropagating pixel colors into the volume, but also semantic information from the image label, embedded from an LLM reading a multimedia document.

Or something like this. Anyway, wanna buy an NVIDIA GPU ;)?

nurettin
How is this different from Taichi? Even the decorators look similar.
paulluuk
While this is really cool, I have to say...

> import warp as wp

Can we please not copy this convention over from numpy? In the example script, you use 17 characters to write this just to save 18 characters later in the script. Just import the warp functions you use, or if you really want, "import warp", but please don't rename imported libraries.
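The explicit version would be something like this (untested; assumes these names are re-exported at Warp's top level):

    from warp import array, kernel, launch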

jokoon
Funny that some software is now hardware-dependent.

OpenCL seems like it's just obsolete

water-your-self
> GPU support requires a CUDA-capable NVIDIA GPU and driver (minimum GeForce GTX 9xx).

Very tactful of NVIDIA. I have a lovely AMD GPU, and this library is worthless for it.