[SciPy-Dev] Enhancements to scipy.spatial.cKDTree

Sturla Molden sturla@molden...
Mon Jul 16 07:34:34 CDT 2012


Den 16.07.2012 02:00, skrev Patrick Varilly:

> Wow, I must confess I wasn't quite expecting the Spanish Inquisition :-D

Sorry, don't take it personally. It is important that code that goes 
into SciPy can be maintained for years to come, and preferably not 
just by you.


>
> The original reason I got into this project was to bring a fast 
> query_ball_point to cKDTree (which I really needed, the KDTree 
> implementation is so slow as to be unusable for analysing a long 
> molecular dynamics trajectory).  When I started, I thought I was only 
> porting over one method of KDTree, so I strove to keep the code 
> changes to a minimum (hence not abstracting out the incremental 
> updates to the minimum/maximum distances between the query point and a 
> node's bounding hyperrectangle) and to keep a very similar style to 
> the existing code (hence the int's, double's, malloc/free, etc.).  I 
> only went about porting over the rest of the methods essentially to 
> submit a complete enhancement to SciPy instead of just the juicy bit I 
> cared about (after parsing Anne's code, I figured 90% of the effort 
> was done already).  I also agree that Anne's code is really 
> transparent, and should stay no matter what, but speed does matter a 
> fair bit to users of kd-trees.  And in my defense with respect to the 
> code being "full of bugs", I tried hard to make all the unit tests 
> pass on my machine before submitting and was completely unaware of the 
> nightmares of 32 vs 64-bit ints in Windows 64.

Yes, Linux and Apple cheated and made a C long 64 bits. AMD 
designed AMD64 (aka x64) to use a 64-bit pointer with a 32-bit offset, 
and Windows followed that pattern and uses a 32-bit long. On IA64, 
Intel specified a 64-bit offset, so Windows uses a 64-bit long 
there. It is actually Windows that does this correctly (pedantically 
speaking), but the 64-bit long on Apple and Linux obviously makes life 
easier. On the bright side, this makes Windows 64 an excellent platform 
for tracking down integer size bugs :)


>
> That said, now that the initial effort is complete, it's obvious that 
> a clear pattern recurs in the kd-tree walking algorithms: keeping 
> incremental track of the minimum/maximum distances between two 
> hyperrectangles as they're successively split (alternatively, a 
> hyperrectangle and a point).  If that primitive is abstracted, the 
> code becomes a lot more readable again.  I've had a first go at what 
> such an abstraction might look like, and updated the code to 
> query_ball_tree.  Does this look like a way forward for enhancing the 
> code's maintainability?

I'll look at it, but this is a very important issue.

I think this rewrite cannot go into SciPy unless it is so 
readable and understandable that several SciPy developers can understand and 
change the code. At least, that is my main objection to it right now.

Also observe that Cython has closures now, and you can nest Python 
functions in Cython (and declaring them cpdef makes the call almost as 
efficient as a cdef).


>
> Here are another couple of issues / questions:
>
> * Is there any difference between 
> <np.float64_t*>np.PyArray_DATA(my_array) and &my_array[0] ?  If not, 
> the second choice is clearly preferable.
>

I checked Dag Sverre's Cython code again.

<np.float64_t*>np.PyArray_DATA(my_array) and &my_array[0]  are equivalent.

my_array.data is wrong.

Go ahead and change my code.

&my_array[0] will also make it easier to update to memoryviews when 
Python 2.4 support is dropped.



> * I get the issues with int vs np.int <http://np.int> vs np.npy_intp, 
> but is there really any SciPy-supported platform in which double 
> doesn't mean np.float64_t?

Do not use C int, ever. (Except where we know it does not overflow, such 
as the void functions I changed to propagate errors.)

np.int maps to a C long (or is it a C int?), whose size you don't know 
in advance, and it might not correspond to the indexer we want (npy_intp or 
Py_ssize_t). That is why you should use either np.int32 or np.int64, and 
check at runtime which one is applicable.
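A minimal Python-side sketch of that runtime check (index_dtype is a name I am making up here):

```python
import numpy as np

# At runtime, pick the NumPy integer dtype whose size matches the
# platform's indexing type npy_intp (exposed in Python as np.intp):
# 4 bytes -> int32 (32-bit platforms), 8 bytes -> int64.
if np.dtype(np.intp).itemsize == 4:
    index_dtype = np.int32
else:
    index_dtype = np.int64

indices = np.zeros(5, dtype=index_dtype)
```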

Also do not use dtype='i' or anything of that sort.



>
> * As for memory management, it's probably easier to let NumPy & Python 
> handle most things, i.e. make Rectangle store two NumPy arrays.  This 
> needs a bit of surgery to do right, but I've tried to show what I mean 
> in the query_ball_tree version I'm submitting now.
>

I thought the same thing. You can often use NumPy instead of malloc in 
your code.  Then you could also remove the try/finally blocks I added to 
your code (but leave those I added to Anne's cKDTree code).
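As a sketch of the idea (count_within is a made-up helper, not part of the patch): when NumPy owns the buffer, the explicit free() and the try/finally cleanup that malloc requires both disappear.

```python
import numpy as np

def count_within(points, x, r):
    # NumPy allocates and owns the temporary buffers; they are freed
    # automatically when they go out of scope, so no free() call or
    # try/finally block is needed on the error path.
    diff = points - x                       # temporary (n, d) buffer
    d2 = np.einsum('ij,ij->i', diff, diff)  # squared distances
    return int((d2 <= r * r).sum())
```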



> * I fixed a small glitch with calls to realloc(): Sturla's changes for 
> handling memory errors ended up setting the first argument to realloc 
> to NULL, which effectively wipes out the contents of the data that 
> were there before the resize.

Thanks.



> * There seems to be something funny about raw_mins and raw_maxes: they 
> are member variables of cKDTree, but the data they point to seems to 
> me like it should go out of scope at the end of cKDTree.__init__.  Am 
> I misunderstanding something here?

Yes you do :) See below:


cdef class cKDTree:

    cdef readonly object maxes
    cdef np.float64_t *raw_maxes

    def __init__(cKDTree self):
        cdef np.ndarray[np.float64_t, ndim=1] inner_maxes
        self.maxes = np.ascontiguousarray(np.amax(self.data, axis=0),
                                          dtype=np.float64)
        inner_maxes = self.maxes
        self.raw_maxes = <np.float64_t*>np.PyArray_DATA(inner_maxes)


"maxes" stores the contiguous array as an attribute of the cKDTree 
extension object. It will be alive until the object of type cKDTree is 
reclaimed.

"inner_maxes" unboxes the array struct to a collection of local 
variables for fast indexing access inside cKDTree. It also tells Cython 
that the type is np.ndarray (which extends NumPy's PyArrayObject).

Then finally "raw_maxes" gets the C pointer to the contiguous buffer 
stored in "maxes", by using the unboxed "inner_maxes".


We could also have done this (at least for Python 2.x):

self.raw_maxes = <np.float64_t*> \
    cpython.PyLong_AsVoidPtr(self.maxes.__array_interface__['data'][0])

or this

self.raw_maxes = &inner_maxes[0]

But the current "inner_maxes.data" should not be used.
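The same ownership idea can be shown in pure Python (Tree is a made-up stand-in for cKDTree, and ndarray.ctypes.data plays the role of PyArray_DATA):

```python
import numpy as np

class Tree:
    def __init__(self, data):
        # Keeping the contiguous array as an attribute ties the buffer's
        # lifetime to the Tree object, so a raw pointer into it cannot
        # dangle while the Tree is alive. If 'maxes' were only a local
        # variable, the buffer could be reclaimed after __init__.
        self.maxes = np.ascontiguousarray(np.amax(data, axis=0),
                                          dtype=np.float64)
        # Address of the first element of the contiguous buffer.
        self.raw_maxes = self.maxes.ctypes.data

t = Tree(np.array([[1.0, 2.0], [3.0, 0.5]]))
```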


Another issue is this:

Where ever you use list.append and set.add to save a tuple of indexes 
(i,j), do this:


if sizeof(long) < sizeof(np.npy_intp):
   # Win 64
   if self.raw_indices[i] < self.raw_indices[j]:
      results.add((int(self.raw_indices[i]), int(self.raw_indices[j])))
   else:
      results.add((int(self.raw_indices[j]), int(self.raw_indices[i])))

else:
   # Other platforms
   if self.raw_indices[i] < self.raw_indices[j]:
      results.add((self.raw_indices[i], self.raw_indices[j]))
   else:
      results.add((self.raw_indices[j], self.raw_indices[i]))


sizeof(long) < sizeof(np.npy_intp) is a compile time constant so it will 
be optimized by the compiler. Then you can take out the stuff I added to 
fix the set for Win64 before the return of query_pairs.

You might want to put this into two inline functions (for list.append 
and set.add) to avoid repeating yourself.


cdef inline int set_add_pair(set results, np.npy_intp i, np.npy_intp j) except -1:
    if sizeof(long) < sizeof(np.npy_intp):
        # Win 64
        if i < j:
            results.add((int(i), int(j)))
        else:
            results.add((int(j), int(i)))
    else:
        # Other platforms
        if i < j:
            results.add((i, j))
        else:
            results.add((j, i))
    return 0


The difference is that on Win 64 a np.npy_intp is wider than a C long 
(i.e. a Python int), so Cython automatically converts it to a Python 
long. The int() call converts that back to a Python int whenever the 
value fits, which is what we want. On other platforms we don't want 
this extra overhead.

I think Cython will optimize set.add; otherwise you should put the 
bound method in a local variable instead of paying for an attribute 
lookup on results.add every call (I know list.append is optimized).

The int return value for set_add_pair is used to propagate Python 
exceptions, which can result from results.add() or int().
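Leaving the Win64 int/long normalization aside, the pair-ordering part of set_add_pair can be sketched in plain Python:

```python
def set_add_pair(results, i, j):
    # Normalize so the smaller index comes first; (i, j) and (j, i)
    # then map to the same tuple and the set stores each pair once.
    if i < j:
        results.add((i, j))
    else:
        results.add((j, i))

pairs = set()
set_add_pair(pairs, 5, 2)
set_add_pair(pairs, 2, 5)  # duplicate in the other order
```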



Sturla