Two ways:
You’re probably seeing something like this:
Traceback (most recent call last):
  File "fail.py", line 32, in <module>
    cuda.memcpy_dtoh(a_doubled, a_gpu)
RuntimeError: cuMemcpyDtoH failed: launch failed
terminate called after throwing an instance of 'std::runtime_error'
  what():  cuMemFree failed: launch failed
zsh: abort      python fail.py
What’s going on here? First of all, recall that kernel launches in CUDA are asynchronous, so a launch failure is also reported asynchronously. The traceback therefore does not point to the failed kernel launch; it points to the next CUDA call after the failed kernel.
Next, as far as I can tell, a CUDA context becomes invalid after a launch failure, and all following CUDA calls in that context fail. Now, that includes cleanup (see the cuMemFree in the traceback?) that PyCUDA tries to perform automatically. Here, a bit of PyCUDA’s C++ heritage shows through. While performing cleanup, we are processing an exception (the launch failure reported by cuMemcpyDtoH). If another exception occurs during exception processing, C++ gives up and aborts the program with a message.
In principle, this could be handled better. If you’re willing to dedicate time to this, I’ll likely take your patch.
No. I would be more than happy to make them available, but that would be mostly an either-or choice against the rest of PyCUDA, because of the following passage in the CUDA programming guide:
[CUDA] is composed of two APIs:
- A low-level API called the CUDA driver API,
- A higher-level API called the CUDA runtime API that is implemented on top of the CUDA driver API.
These APIs are mutually exclusive: An application should use either one or the other.
PyCUDA is based on the driver API. CUBLAS uses the runtime API. One can violate this rule without crashing immediately, but sketchy things do happen. Instead, for BLAS-1 operations, PyCUDA comes with a class called pycuda.gpuarray.GPUArray that essentially reimplements that part of CUBLAS.
If you dig into the history of PyCUDA, you’ll find that, at one point, I did have rudimentary CUBLAS wrappers. I removed them because of the above issue. If you would like to make CUBLAS wrappers, feel free to use these rudiments as a starting point. That said, Arno Pähler’s python-cuda has complete ctypes-based wrappers for CUBLAS. I don’t think they interact natively with numpy, though.
Of course you can. But don’t come whining if it breaks or goes away in a future release. Since PyCUDA is open-source, neither of these should be a show-stopper anyway, and we welcome fixes for any functionality, documented or not.
The rule is that if something is documented, we will in general make every effort to keep future versions backward-compatible with the present interface. If it isn’t documented, there is no such guarantee.
Try adding:
CXXFLAGS = ['-DBOOST_PYTHON_NO_PY_SIGNATURES']
to your pycuda/siteconf.py or $HOME/.aksetup-defaults.py.
No. PyCUDA does know which context each object belongs to, and it does implicitly activate contexts for cleanup purposes. But since I’m not entirely sure how costly context activation is supposed to be, PyCUDA will not juggle contexts for you if you’re talking to an object from a context that isn’t currently active. Here’s a rule of thumb: as long as you have control over invocation order, you have to manage contexts yourself. Since you mostly don’t have control over cleanup, PyCUDA manages contexts for you in that case. To make this transparent to you, the user, PyCUDA will automatically restore the previous context once it’s done cleaning up.
As of version 0.93, PyCUDA supports threading. There is an example of how this can be done in examples/multiple_threads.py in the PyCUDA distribution. When you use threading in PyCUDA, you should be aware of one peculiarity, though. Contexts in CUDA are a per-thread affair, and as such all contexts associated with a thread, as well as the GPU memory, arrays, and other resources in those contexts, will be automatically freed when the thread exits. PyCUDA will notice this and will not try to free the corresponding resource; it’s already gone, after all.
There is another, less intended consequence, though: if Python’s garbage collector finds a PyCUDA object it wishes to dispose of, and PyCUDA, upon trying to free it, determines that the object was allocated outside of the current thread of execution, then that object is quietly leaked. This properly handles the above situation, but it mishandles a situation in which all of the following hold:
- You use reference cycles in a GPU driver thread, necessitating the GC (over just regular reference counts).
- You require cleanup to be performed before thread exit.
- You rely on PyCUDA to perform this cleanup.
To entirely avoid the problem, do one of the following:
- Use multiprocessing instead of threading.
- Explicitly call free() on the objects you want cleaned up.
Note
Version 0.93 is currently in release candidate status. If you’d like to try a snapshot, you may access PyCUDA’s source control archive via the PyCUDA homepage.
Warning
Version 0.93 makes some changes to the PyCUDA programming interface. In all cases where documented features were changed, the old usage continues to work, but results in a warning. It is recommended that you update your code to remove the warning.
Note
If you’re upgrading from prior versions, you may delete the directory $HOME/.pycuda-compiler-cache to recover now-unused disk space.
Note
During this release time frame, I had the honor of giving a talk on PyCUDA for a class that a group around Nicolas Pinto was teaching at MIT. If you’re interested, the slides for it are available.
PyCUDA is licensed to you under the MIT/X Consortium license:
Copyright (c) 2009 Andreas Klöckner and Contributors.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.