Segfaults are really hard to debug, as they tend to happen somewhere in C code, and often in an external module (e.g. sklearn).
Looking at your recipe, there are a bunch of non-standard recipe modules (localisations.ClusterTimeRange) which appear to have been added to the PYME source. This makes it very hard to say anything with certainty, as we have no idea what is in those modules or whether any of the standard modules have been altered. As a general rule, it's not good practice to hack custom modules into the core recipe modules within a source version of PYME. A much better approach is to use the plugin interface. That said, my debugging advice is as follows:
1. Identify all modules which might use C code. In terms of the core modules, that would be MergeClumps. Speculating that DBSCANClustering2 is a modified copy of the core DBSCANClustering, this would also call C code in sklearn.
2. Modify the recipe by removing suspect modules until the segfault goes away. Once you have identified the culprit, you have a few options:
- If it involves a call to a library, e.g. sklearn, try upgrading or downgrading that library and/or looking at the release notes/issues to see if it is a known issue.
- If it is not a library call, and/or you want to debug it yourself, construct a minimal test case that reproduces the error outside the recipe context (e.g. in an IPython session).
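One way to make the "remove suspects until it stops crashing" step less painful is to run each candidate in its own interpreter, so a segfault kills the child process rather than your session. The following is a minimal sketch, not anything PYME provides: the candidate names and snippets are hypothetical stand-ins (one of them simply sends itself SIGSEGV so the example has something to detect), and the return-code check assumes a POSIX system. The -X faulthandler flag is a standard CPython option that prints a Python-level traceback to stderr on a fatal signal.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=600):
    """Run a Python snippet in a fresh interpreter so a crash cannot take down this session."""
    return subprocess.run(
        # -X faulthandler: ask the child to dump a Python traceback on a fatal signal
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def segfaulted(result):
    """On POSIX, a child killed by a signal reports returncode == -signal_number."""
    return result.returncode == -signal.SIGSEGV


# Hypothetical stand-ins for "load the data and run only this one suspect module".
# Replace each snippet with code that exercises a single module from your recipe.
candidates = {
    "well_behaved_step": "print('ok')",
    "crashing_step": "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
}

culprits = []
for name, snippet in candidates.items():
    result = run_isolated(snippet)
    if segfaulted(result):
        culprits.append(name)
        # The tail of stderr holds whatever traceback faulthandler managed to write
        print(name, "segfaulted:", result.stderr[-500:])

print("suspects:", culprits)
```

Once a single snippet reliably crashes, that snippet is effectively your minimal test case.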
In general, you are likely to get segfaults in C extension code in two circumstances:
a) out of bounds array indexing
b) reference counting errors (e.g. missing INCREFs or extra DECREFs)
The latter can be quite tricky to debug, as they can manifest after the actual erroneous code, at a non-deterministic time when the garbage collector gets around to dealing with the array with the incorrect reference count.
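If you suspect a reference counting problem in a particular function, a crude check is to compare the reference count of an input object before and after calling it many times. This is only a rough sketch using the standard sys.getrefcount; the helper name refcount_delta and the toy functions are made up for illustration. A non-zero result is a hint rather than proof, since a function may legitimately keep a reference (e.g. caching), and an extra DECREF may well crash before the loop finishes.

```python
import sys


def refcount_delta(func, obj, repeats=100):
    """Return how much obj's reference count changed after calling func(obj) repeatedly."""
    before = sys.getrefcount(obj)
    for _ in range(repeats):
        func(obj)  # result is discarded, so a well-behaved call leaves no reference behind
    after = sys.getrefcount(obj)
    return after - before


# Toy functions for illustration only (hypothetical, not from PYME or sklearn)
kept = []


def hoards_references(x):
    kept.append(x)  # keeps one extra reference per call, like a missing DECREF would


data = [1.0, 2.0, 3.0]
print("len:", refcount_delta(len, data))
print("hoards_references:", refcount_delta(hoards_references, data))
```

A steadily growing count points towards a leak (missing DECREF); a shrinking count points towards the more dangerous case of an extra DECREF or missing INCREF.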
Other debugging strategies include:
- making debug builds of Python and all libraries, and running them through a debugger (generally prohibitive once you count Python plus all the libraries such as numpy etc.)
- turning on core dumps and examining these. In the absence of debug symbols these will only tell you which shared library (.so) the segfault happened in, but that can still be a useful clue
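For the core dump route, the soft limit on core file size is often 0, which suppresses dumps entirely. As a sketch (Unix only, using the standard resource module), you can raise the soft limit to the hard limit from within Python before running the crashing code; child processes inherit the setting. Where the core file ends up is system dependent (for example the kernel core pattern on Linux), so treat that part as something to check on your machine.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit for this process and its children."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Raising the soft limit up to the hard limit does not need special privileges
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


soft, hard = enable_core_dumps()
print("core limit (soft, hard):", soft, hard)
```

If the hard limit itself is 0, this will not help and the limit needs to be changed at the shell or system level. Once you have a core file, opening it in gdb together with the python executable and asking for a backtrace should show at least which shared library the crash was in.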
An untested but potentially interesting option would be to run your code through the PYME.util.fProfile profiler (or add your own profiler hook). This will record all function calls and returns to a file on disk, and looking at the tail of the profile file might give a pseudo-traceback for the segfault (modulo file IO buffer flushing).
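For the "add your own profiler hook" variant, here is a rough sketch built on the standard sys.setprofile. It is not how PYME.util.fProfile is implemented, and the function names are made up for this example. It flushes after every event to sidestep the buffering caveat, which makes it slow, and it only sees the thread it was installed in.

```python
import math
import os
import sys
import tempfile


def install_call_logger(path):
    """Log every Python and C function call/return in this thread to path, flushing each line."""
    log = open(path, "w")

    def hook(frame, event, arg):
        if event in ("call", "return"):
            code = frame.f_code
            log.write(f"{event} {code.co_filename}:{frame.f_lineno} {code.co_name}\n")
        elif event in ("c_call", "c_return", "c_exception"):
            # For C events, arg is the C function object being called
            module = getattr(arg, "__module__", None)
            name = getattr(arg, "__name__", repr(arg))
            log.write(f"{event} {module}.{name}\n")
        log.flush()  # push each line out of Python's buffer so it survives a hard crash

    sys.setprofile(hook)
    return log


def remove_call_logger(log):
    """Uninstall the hook first, then close the log file."""
    sys.setprofile(None)
    log.close()


def demo():
    return math.sqrt(4.0)


# Small demonstration writing to a temporary file
log_path = os.path.join(tempfile.mkdtemp(), "calls.log")
log = install_call_logger(log_path)
demo()
remove_call_logger(log)

with open(log_path) as f:
    logged = f.read()
print(logged)
```

After a segfault, the last c_call line without a matching c_return is a reasonable first guess at the C function that crashed.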