Numba, a NumPy computation accelerator: official tutorial, GPU CUDA configuration
Official site: http://numba.pydata.org/
Official tutorial: http://numba.pydata.org/numba-doc/latest/user/5minguide.html
Because under my Python 3.7 (or possibly due to some other factor) numba.autojit could no longer be found, I went to the official site to see what had happened.
Example
The following code speeds up nicely:
from numba import jit
import numpy as np

x = np.arange(100).reshape(10, 10)

@jit(nopython=True)  # Set "nopython" mode for best performance, equivalent to @njit
def go_fast(a):  # Function is compiled to machine code when called the first time
    trace = 0.0
    for i in range(a.shape[0]):  # Numba likes loops
        trace += np.tanh(a[i, i])  # Numba likes NumPy functions
    return a + trace  # Numba likes NumPy broadcasting

print(go_fast(x))
The following code does not speed up (the function cannot benefit from Numba acceleration):
from numba import jit
import pandas as pd

x = {'a': [1, 2, 3], 'b': [20, 30, 40]}

@jit
def use_pandas(a):  # Function will not benefit from Numba jit
    df = pd.DataFrame.from_dict(a)  # Numba doesn't know about pd.DataFrame
    df += 1  # Numba doesn't understand what this is
    return df.cov()  # or this!

print(use_pandas(x))
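A common fix, sketched below with my own function names rather than code from the tutorial, is to keep the pandas work in plain Python and hand the jitted function only NumPy arrays, which Numba does understand:

from numba import njit
import numpy as np
import pandas as pd

@njit
def shifted_cov(a, b):  # works on plain NumPy arrays, so it compiles in nopython mode
    a = a + 1.0  # the "df += 1" step, done on arrays
    b = b + 1.0
    ma, mb = a.mean(), b.mean()
    return ((a - ma) * (b - mb)).sum() / (a.size - 1)  # sample covariance of the two columns

df = pd.DataFrame({'a': [1, 2, 3], 'b': [20, 30, 40]})
print(shifted_cov(df['a'].to_numpy(np.float64), df['b'].to_numpy(np.float64)))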
Note that Numba accelerates functions via decorators: the first time a decorated function is called it is compiled to machine code, which takes some time; every later call runs that machine code directly, and that is where the speedup comes from.
So the most commonly used form is @njit, or equivalently @jit(nopython=True).
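To see this one-time compilation cost, here is a small timing sketch of my own (the exact numbers will depend on your machine):

import time
import numpy as np
from numba import njit

x = np.arange(100).reshape(10, 10)

@njit
def go_fast(a):
    trace = 0.0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace

start = time.perf_counter()
go_fast(x)  # first call: includes compilation to machine code
print("first call: ", time.perf_counter() - start)

start = time.perf_counter()
go_fast(x)  # second call: runs the cached machine code
print("second call:", time.perf_counter() - start)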
Other features
Numba has quite a few decorators, we’ve seen @jit, but there’s also:
@njit - this is an alias for @jit(nopython=True) as it is so commonly used!
@vectorize - produces NumPy ufuncs (with all the ufunc methods supported); see the sketch after this list. Docs are here.
@guvectorize - produces NumPy generalized ufuncs. Docs are here.
@stencil - declare a function as a kernel for a stencil like operation. Docs are here.
@jitclass - for jit aware classes. Docs are here.
@cfunc - declare a function for use as a native call back (to be called from C/C++ etc). Docs are here.
@overload - register your own implementation of a function for use in nopython mode, e.g. @overload(scipy.special.j0). Docs are here.
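As an illustration of @vectorize, a minimal sketch of my own (the signature and function name are assumptions, not from the tutorial); because the signature is given up front, the result is a real NumPy ufunc, so ufunc methods such as reduce work too:

from numba import vectorize
import numpy as np

@vectorize(['float64(float64, float64)'])  # eagerly compiled into a NumPy ufunc
def scaled_add(x, y):
    return 2.0 * x + y

a = np.arange(5, dtype=np.float64)
print(scaled_add(a, a))      # elementwise, like any ufunc
print(scaled_add.reduce(a))  # ufunc methods such as reduce are available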
Extra options available in some decorators:
parallel = True - enable the automatic parallelization of the function.
fastmath = True - enable fast-math behaviour for the function.
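A small sketch of my own combining the two options above with prange, which marks a loop as explicitly parallelizable:

from numba import njit, prange
import numpy as np

@njit(parallel=True, fastmath=True)  # automatic parallelization plus fast-math
def row_sums(a):
    out = np.empty(a.shape[0])
    for i in prange(a.shape[0]):  # iterations of this loop may run in parallel
        out[i] = a[i, :].sum()
    return out

print(row_sums(np.arange(100.0).reshape(10, 10)))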
ctypes/cffi/cython interoperability:
cffi - The calling of CFFI functions is supported in nopython mode.
ctypes - The calling of ctypes wrapped functions is supported in nopython mode.
Cython exported functions are callable.
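For instance, a ctypes sketch of my own (it assumes a Unix-like system where the C math library can be located):

import ctypes
import ctypes.util
from numba import njit

libm = ctypes.CDLL(ctypes.util.find_library('m'))  # load the C math library
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]
c_cos = libm.cos

@njit
def call_c_cos(x):
    return c_cos(x)  # ctypes-wrapped function called from nopython code

print(call_c_cos(0.0))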
GPU targets:
Numba can target Nvidia CUDA and (experimentally) AMD ROC GPUs. You can write a kernel in pure Python and have Numba handle the computation and data movement (or do this explicitly). Click for Numba documentation on CUDA or ROC.
http://numba.pydata.org/numba-doc/latest/cuda/index.html#numba-for-cuda-gpus
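As a taste of the CUDA target, a minimal kernel sketch of my own (it needs an NVIDIA GPU with a working CUDA driver; the array size and launch configuration are arbitrary):

from numba import cuda
import numpy as np

@cuda.jit
def add_one(arr):
    i = cuda.grid(1)      # absolute index of this thread
    if i < arr.size:      # guard threads that fall outside the array
        arr[i] += 1.0

data = np.zeros(32, dtype=np.float64)
d_data = cuda.to_device(data)                # explicit host-to-device copy
threads_per_block = 32
blocks = (data.size + threads_per_block - 1) // threads_per_block
add_one[blocks, threads_per_block](d_data)   # kernel launch
print(d_data.copy_to_host())                 # copy the result back to the host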
That is quite a lot, but @autojit is nowhere to be seen. Has it been removed? Indeed it has: autojit was deprecated and later dropped, and a bare @jit now provides the same lazy, signature-free compilation.
