Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cumprod() with quaternions crashes #225

Open
jordens opened this issue Jan 22, 2024 · 5 comments
Open

cumprod() with quaternions crashes #225

jordens opened this issue Jan 22, 2024 · 5 comments

Comments

@jordens
Copy link

jordens commented Jan 22, 2024

Describe the bug

Calling numpy.cumprod() on an array of quaternions segfaults.

To Reproduce

import numpy as np
import quaternion

np.cumprod(quaternion.as_quat_array(np.zeros((1000, 4), dtype=np.float64)))

Result:

Segmentation fault (core dumped)

Expected behavior
Runs without crashing

Environment (please complete the following information):

  • Ubuntu 23.10
  • Conda installation Python 3.9.18
  • Numpy 1.26.3
  • Quaternion 2023.0.2

Additional context

gdb backtrace
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
PyErr_Occurred () at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_pyerrors.h:14
14      /usr/local/src/conda/python-3.9.18/Include/internal/pycore_pyerrors.h: No such file or directory.
─── Assembly ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
~
~
~
~
 0x00000000004d9500  PyErr_Occurred+0  mov    0x26fa51(%rip),%rax        # 0x748f58 <_PyRuntime+568>
 0x00000000004d9507  PyErr_Occurred+7  mov    0x58(%rax),%rax
 0x00000000004d950b  PyErr_Occurred+11 ret
~
~
~
─── Breakpoints ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─── Expressions ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─── History ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─── Memory ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─── Registers ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      rax 0x0000000000000000       rbx 0x00007ffff495feb0       rcx 0x0000000001499b1f           rdx 0x000000000148a100           rsi 0x0000000000000020       rdi 0x0000000000007cc0      rbp 0x0000000000000000
      rsp 0x00007fffffffc6e8        r8 0x0000000000000020        r9 0x0000000000000020           r10 0x0000000001499b20           r11 0x00000000000003e7       r12 0x0000000000000000      r13 0x00007ffff3ba5b10
      r14 0x00007ffff7e2e4f0       r15 0x0000000000000000       rip 0x00000000004d9507        eflags [ IF RF ]                     cs 0x00000033                ss 0x0000002b               ds 0x00000000
       es 0x00000000                fs 0x00000000                gs 0x00000000               fs_base 0x00007ffff7e9a500       gs_base 0x0000000000000000
─── Source ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Cannot display "pycore_pyerrors.h"
─── Stack ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
[0] from 0x00000000004d9507 in PyErr_Occurred+7 at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_pyerrors.h:14
[1] from 0x00007ffff6de8956 in generic_wrapped_legacy_loop
[2] from 0x00007ffff6df1773 in PyUFunc_GenericReduction
[3] from 0x00000000004ef64b in cfunction_vectorcall_FASTCALL_KEYWORDS+75 at /usr/local/src/conda/python-3.9.18/Objects/methodobject.c:446
[4] from 0x00007ffff6d8412f in PyArray_GenericAccumulateFunction
[5] from 0x00007ffff6d1669f in PyArray_CumProd
[6] from 0x00007ffff6d63746 in array_cumprod
[7] from 0x0000000000507387 in cfunction_call+55 at /usr/local/src/conda/python-3.9.18/Objects/methodobject.c:543
[8] from 0x0000000000505878 in _PyObject_Call+302 at /usr/local/src/conda/python-3.9.18/Objects/call.c:281
[9] from 0x0000000000505878 in PyObject_Call+344 at /usr/local/src/conda/python-3.9.18/Objects/call.c:293
[+]
─── Threads ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
[1] id 2469270 name python from 0x00000000004d9507 in PyErr_Occurred+7 at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_pyerrors.h:14
─── Variables ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
loc tstate = 0x0: Cannot access memory at address 0x0
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
>>> bt
#0  PyErr_Occurred () at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_pyerrors.h:14
#1  0x00007ffff6de8956 in generic_wrapped_legacy_loop () from /home/rj/src/conda/envs/sci/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so
#2  0x00007ffff6df1773 in PyUFunc_GenericReduction () from /home/rj/src/conda/envs/sci/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so
#3  0x00000000004ef64b in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0x7ffff7537950, args=0x7ffff3ba6958, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.18/Objects/methodobject.c:446
#4  0x00007ffff6d8412f in PyArray_GenericAccumulateFunction () from /home/rj/src/conda/envs/sci/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so
#5  0x00007ffff6d1669f in PyArray_CumProd () from /home/rj/src/conda/envs/sci/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so
#6  0x00007ffff6d63746 in array_cumprod () from /home/rj/src/conda/envs/sci/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so
#7  0x0000000000507387 in cfunction_call (func=0x7ffff7537b30, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.9.18/Objects/methodobject.c:543
#8  0x0000000000505878 in _PyObject_Call (kwargs=<optimized out>, args=0x7ffff7e5b040, callable=0x7ffff7537b30, tstate=0x76fb70) at /usr/local/src/conda/python-3.9.18/Objects/call.c:281
#9  PyObject_Call (callable=0x7ffff7537b30, args=0x7ffff7e5b040, kwargs=<optimized out>) at /usr/local/src/conda/python-3.9.18/Objects/call.c:293
#10 0x00000000004ed746 in do_call_core (kwdict=0x7ffff75b84c0, callargs=0x7ffff7e5b040, func=0x7ffff7537b30, tstate=<optimized out>) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:5097
#11 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7fff74418dd0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:3582
#12 0x00000000004e6b2a in _PyEval_EvalFrame (throwflag=0, f=0x7fff74418dd0, tstate=0x76fb70) at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_ceval.h:40
#13 _PyEval_EvalCode (tstate=tstate@entry=0x76fb70, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x7ffff6b87c58,
    kwargs=0x7fff744d17b0, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x7ffff6b87bb0, qualname=0x7ffff6b87bb0)
    at /usr/local/src/conda/python-3.9.18/Python/ceval.c:4329
#14 0x00000000004f7e54 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.9.18/Objects/call.c:396
#15 0x00000000004e8c61 in _PyObject_VectorcallTstate (kwnames=0x7ffff6b87c40, nargsf=<optimized out>, args=<optimized out>, callable=0x7ffff6bf7b80, tstate=0x76fb70)
    at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:118
#16 PyObject_Vectorcall (kwnames=0x7ffff6b87c40, nargsf=<optimized out>, args=<optimized out>, callable=0x7ffff6bf7b80) at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:127
#17 call_function (kwnames=0x7ffff6b87c40, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=<optimized out>) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:5077
#18 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7fff744d1610, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:3537
#19 0x00000000004e6b2a in _PyEval_EvalFrame (throwflag=0, f=0x7fff744d1610, tstate=0x76fb70) at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_ceval.h:40
#20 _PyEval_EvalCode (tstate=tstate@entry=0x76fb70, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x7cb0d0,
    kwcount=<optimized out>, kwstep=1, defs=0x7ffff6b95318, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x7ffff74297f0, qualname=0x7ffff74297f0) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:4329
#21 0x00000000004f7e54 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.9.18/Objects/call.c:396
#22 0x00007ffff6d135e8 in dispatcher_vectorcall () from /home/rj/src/conda/envs/sci/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so
#23 0x00000000004ec764 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7cb0c8, callable=0x7ffff6baaab0, tstate=0x76fb70)
    at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:118
#24 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7cb0c8, callable=0x7ffff6baaab0) at /usr/local/src/conda/python-3.9.18/Include/cpython/abstract.h:127
#25 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x76fb70) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:5077
#26 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7caf50, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:3489
#27 0x00000000004e6b2a in _PyEval_EvalFrame (throwflag=0, f=0x7caf50, tstate=0x76fb70) at /usr/local/src/conda/python-3.9.18/Include/internal/pycore_ceval.h:40
#28 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=argcount@entry=0, kwnames=0x0, kwargs=0x0,
    kwcount=<optimized out>, kwstep=2, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:4329
#29 0x00000000004e67b7 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=argcount@entry=0, kwnames=<optimized out>, kwargs=0x0,
    kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:4361
#30 0x00000000004e6769 in PyEval_EvalCodeEx (_co=_co@entry=0x7ffff75bb9d0, globals=globals@entry=0x7ffff75b8280, locals=locals@entry=0x7ffff75b8280, args=args@entry=0x0, argcount=argcount@entry=0,
    kws=kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:4377
#31 0x000000000059466b in PyEval_EvalCode (co=co@entry=0x7ffff75bb9d0, globals=globals@entry=0x7ffff75b8280, locals=locals@entry=0x7ffff75b8280) at /usr/local/src/conda/python-3.9.18/Python/ceval.c:828
#32 0x00000000005c1dc7 in run_eval_code_obj (tstate=tstate@entry=0x76fb70, co=co@entry=0x7ffff75bb9d0, globals=globals@entry=0x7ffff75b8280, locals=locals@entry=0x7ffff75b8280)
    at /usr/local/src/conda/python-3.9.18/Python/pythonrun.c:1221
#33 0x00000000005bddd0 in run_mod (mod=mod@entry=0x7eeb78, filename=filename@entry=0x7ffff75b6a30, globals=globals@entry=0x7ffff75b8280, locals=locals@entry=0x7ffff75b8280, flags=flags@entry=0x7fffffffd9d8,
    arena=arena@entry=0x7ffff7e19950) at /usr/local/src/conda/python-3.9.18/Python/pythonrun.c:1242
#34 0x000000000045674e in pyrun_file (fp=fp@entry=0x76d420, filename=filename@entry=0x7ffff75b6a30, start=start@entry=257, globals=globals@entry=0x7ffff75b8280, locals=locals@entry=0x7ffff75b8280,
    closeit=closeit@entry=1, flags=0x7fffffffd9d8) at /usr/local/src/conda/python-3.9.18/Python/pythonrun.c:1140
#35 0x00000000005b7ab2 in pyrun_simple_file (flags=0x7fffffffd9d8, closeit=1, filename=0x7ffff75b6a30, fp=0x76d420) at /usr/local/src/conda/python-3.9.18/Python/pythonrun.c:450
#36 PyRun_SimpleFileExFlags (fp=0x76d420, filename=<optimized out>, closeit=1, flags=0x7fffffffd9d8) at /usr/local/src/conda/python-3.9.18/Python/pythonrun.c:483
#37 0x00000000005b502e in pymain_run_file (cf=0x7fffffffd9d8, config=0x76e340) at /usr/local/src/conda/python-3.9.18/Modules/main.c:379
#38 pymain_run_python (exitcode=0x7fffffffd9d0) at /usr/local/src/conda/python-3.9.18/Modules/main.c:604
#39 Py_RunMain () at /usr/local/src/conda/python-3.9.18/Modules/main.c:683
#40 0x0000000000588719 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/local/src/conda/python-3.9.18/Modules/main.c:1129
#41 0x00007ffff7c280d0 in __libc_start_call_main (main=main@entry=0x5886d0 <main>, argc=argc@entry=2, argv=argv@entry=0x7fffffffdc08) at ../sysdeps/nptl/libc_start_call_main.h:58
#42 0x00007ffff7c28189 in __libc_start_main_impl (main=0x5886d0 <main>, argc=2, argv=0x7fffffffdc08, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdbf8)
    at ../csu/libc-start.c:360
#43 0x00000000005885ce in _start ()
@moble
Copy link
Owner

moble commented Jan 22, 2024

Thanks for the bug report. This is pretty weird. I can consistently get the segfault with a variety of combinations of numpy and/or quaternion — but only with python 3.9 or later, whether I fix the numpy and quaternion versions across python versions, or update to the latest in each case. Also, it consistently works with 501 or fewer quaternions, and fails with 502 or more, on both macOS ARM, and Linux x86_64. Also worth noting that this behavior seems to happen with cumsum as well.

It may take some time for me to figure out this bug, and I imagine I'll need some insight from numpy devs.

Meanwhile, a couple remarks. First, it's not entirely obvious to me that cumprod is guaranteed to always respect the ordering of the input array — which won't matter for the usual number types, but will matter for quaternions. I can imagine optimizations that might move across the input array out of order, like vectorization (though specifically vectorization presumably won't happen automatically). Anyway, I'd just be wary of cumsum for that reason; you can roll your own just as easily, and it probably won't make a huge difference in your overall time. Second, I guess you're probably using cumsum to build up a rotation from small rotations. You might be better off reformulating your problem, e.g., as a differential equation.

@jordens
Copy link
Author

jordens commented Jan 22, 2024

Thanks for the analysis and the nice library.

I'd be surprised if this is an issue with cumsum/cumprod. They probably hold up their end of the contract around dtypes. My guess is that poking around with GDB (potentially with more debugging symbols and less optimization) or valgrind will point to the issue.

A manual cumsum implementation is orders of magnitude slower than the numpy one. That does matter in my case.

Yes, this is an integration. No, I'm not doing first order. It's an implementation of a high order geometric integrator based the Magnus expansion. That systematically beats the RK of integrate_angular_velocity().

@moble
Copy link
Owner

moble commented Jan 22, 2024

I'd be surprised if this is an issue with cumsum/cumprod. They probably hold up their end of the contract around dtypes.

To be clear, I'm not blaming numpy for being buggy, but there is no contract; there's just a minimally documented array finterface that's been going through huge changes these past few years.

A manual cumsum implementation is orders of magnitude slower than the numpy one.

Have you tried it? For small arrays, they're almost identical; for ~1,000 elements and up it's one order of magnitude. If that really matters, you might want to take a look at numba and quaternionic, which avoid many of the performance penalties inherent in numpy. (Of course, you'd have to roll your own there because np.cumsum doesn't work with quaternionic either — apparently because of some raveling that cumsum does somewhere under the covers.)

It's an implementation of a high order geometric integrator based the Magnus expansion.

Ooh! I'd be interested to see it when you're done.

@jordens
Copy link
Author

jordens commented Jan 23, 2024

To be clear, I'm not blaming numpy for being buggy, but there is no contract; there's just a minimally documented array finterface that's been going through huge changes these past few years.

Argh.

Have you tried it? For small arrays, they're almost identical; for ~1,000 elements and up it's one order of magnitude. If that really matters, you might want to take a look at numba and quaternionic, which avoid many of the performance penalties inherent in numpy. (Of course, you'd have to roll your own there because np.cumsum doesn't work with quaternionic either — apparently because of some raveling that cumsum does somewhere under the covers.)

True. The manual impl vs np.cumsum is 85 µs vs 8 µs for 500 elements on my machine. I can live with that for now.

It's an implementation of a high order geometric integrator based the Magnus expansion.

Ooh! I'd be interested to see it when you're done.

Will certainly publish once done.

Basically, the core part is this.

def magnus6(w):
    """exponential map for 6th order magnus expansion for integration of `w`

    `q'(t) = q(t)*w(t)`
    `w` (3, n): pure quaternion samples (given as vector so that `np.cross` works)
      of w at the 3 nodes `_magnus6_nodes` in the n sample intervals `i`
      i.e. supply `w[j, i] = h*w(h*(i + magnus6_nodes[j]))`
      
    returns the optimal pure quaternion g[:n] such that
    q[i + 1] = q[i]*exp(g[i])
    
    Note the right multiplication for naturally ordered arrays where increasing index
    is increasing time
    
    Implementation of:
    S. Blanes, F. Casas, and J. Ros. High order optimized geometric integrators
    for linear differential equations. BIT, 42:262–284, 2002.
    """
    a1 = w[1]
    a2 = np.sqrt(5/3)*(w[0] - w[2])
    a3 = 5/3*(w[0] - 2*w[1] + w[2])
    s1 = np.cross(a1, a2)
    s2 = 1/15*np.cross(a1, 2*a3 + s1)
    g = a1 + 1/6*a3 + 1/60*np.cross(-10*a1 - a3 + s1, a2 - s2)
    return g

# nodes
_magnus6_nodes = 1/2 + np.array([-1, 0, 1])*np.sqrt(3/5)/2

def vec2q(p):
    """Convert R3 vector to pure quaternion"""
    shape = p.shape[:-1]
    q = np.empty(shape + (4,), np.float64)
    q[..., 0] = 0.
    q[..., 1:] = p
    return q.view(np.quaternion).reshape(shape)

# integrating

w0 = 1.  # splitting
f0 = 1.  # drive frequency
g0 = .8  # drive strength

# standard qubit drive (angular velocity)
def w(t):
    return np.array([
        g0*np.cos(f0*t),
        g0*np.sin(f0*t),
        w0/2*np.ones_like(t),
    ]).T

# t: time steps
# h: step size
h = 1e-2
wp = np.sqrt((w0 + f0)**2 + 4*g0**2)
t1 = 1000*2*np.pi/wp
t = np.arange(0, t1, h)

np.prod(np.exp(vec2q(magnus6(h*w((t[:, None] + h*(_magnus6_nodes - 1)).ravel()).reshape((t.shape[0], 3, 3)).swapaxes(0, 1)))))

# vs

ode = integrate.ode(lambda t, y: (quaternion.quaternion(*y)*quaternion.quaternion(*w(t))).components)
ode.set_initial_value([1, 0, 0, 0.], 0.)
ode.set_integrator("dop853", ...)

On the toy problem of Blanes et al. it's about two orders of magnitude faster to get to machine precision than dop853. I don't need adaptive step size in my use case and I haven't done any optimization or tweaking.

@kohlerjl
Copy link

I've encountered this issue as well, running with Python 3.11.7 and numpy 1.26.3.

Have you been able to make any progress locating the issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants