

1st iteration

C:                   too fast to measure
pure-python:         68 s
numba:               2 s
numba-cuda:          0.29 s    0.22 s
numba-cuda-smem:     0.47 s    0.23 s



100 iteration
a
C:                   5 s
pure-python:         impossible to wait that long
numba:               215 s
numba-cuda:          18 s      13 s
numba-cuda-smem:     17 s      13 s



Total 1000 iteration:

C:                   51 s
pure-python:         impossible to wait that long
numba:               impossible to wait that long
numba-cuda:          181 s     133 s
numba-cuda-smem:     170 s     137 s
