[Numpy-discussion] numpy.ndarrays as C++ arrays (wrapped with boost)

Alexander Schmolck a.schmolck@gmx....
Wed Sep 12 05:31:39 CDT 2007


Christopher Barker <Chris.Barker@noaa.gov> writes:

> Alexander Schmolck wrote:
>>  I just saw a closely related question posted one
>> week ago here (albeit mostly from a swig context).
>
> SWIG, Boost, whatever, the issues are similar. I guess what I'd love to 
> find is an array implementation that plays well with modern C++, and 
> also numpy.
>
>
>> The code currently mostly just uses plain C double arrays passed around by
>> pointers and I'd like to encapsulate this at least with something like
>> std::vector (or maybe valarray), but I've been wondering whether it might
>> not make sense to use (slightly wrapped) numpy ndarrays --
>
> Well, you can go back and forth between pointers to data blocks and 
> numpy arrays pretty easily. Were you thinking of doing this at the 
> python-C++ interface, or were you looking for something you could use 
> throughout your code?

The latter -- I'd ideally like something that I can more or less transparently
pass and return data between python and C++ and I want to use numpy arrays on
the python side. It'd also be nice to have reference semantics and reference
counting working fairly painlessly between both sides.
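
To make "reference semantics and reference counting" a bit more concrete,
here is a minimal sketch of the C++ side only, using nothing but the standard
library. SharedArray is a hypothetical name of mine, not an existing class,
and the sketch says nothing about how the count would be tied to Python's
own reference counting.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical handle type: copying it shares the buffer instead of
// duplicating it, loosely like binding a second Python name to one ndarray.
class SharedArray {
public:
    explicit SharedArray(std::size_t n, double fill = 0.0)
        : buf_(std::make_shared<std::vector<double>>(n, fill)) {}

    double& operator[](std::size_t i) {
        assert(i < buf_->size());
        return (*buf_)[i];
    }
    double operator[](std::size_t i) const {
        assert(i < buf_->size());
        return (*buf_)[i];
    }
    std::size_t size() const { return buf_->size(); }

    // Number of handles currently sharing the buffer.
    long use_count() const { return buf_.use_count(); }

    // Explicit deep copy, for when value semantics are really wanted.
    SharedArray copy() const {
        SharedArray out(0);
        out.buf_ = std::make_shared<std::vector<double>>(*buf_);
        return out;
    }

private:
    std::shared_ptr<std::vector<double>> buf_;
};
```

With this, `SharedArray b = a;` makes b an alias of a (a write through b is
visible through a), and the buffer is freed when the last handle goes away.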

> If the latter, then I expect you don't want to use a Python Object (unless
> you're using your code only from Python).

Yup; that would be somewhat perverse -- although as I said I expect that most
data I deal with will be pretty large, so overheads from creating python
objects aren't likely to matter that much.

> Our case is such: We want to have a nice array-like container that we 
> can use in C++ code that makes sense both for pure C++, and interacts 
> well with numpy arrays, as the code may be used in a pure C++ app, but 
> we also want to test it, script it, etc from Python.

Yes, that's exactly what I'm after. What's your current solution for this?

>> Also, ndarrays
>> provide fairly rich functionality even at the C-API-level
>
> Yes, the more I look into this, the more I'm impressed with numpy's design.
>
>
>> but there doesn't seem to be one obvious choice, as
>> there is for python. 
>
> Though there may be more than one good choice -- did you check out 
> boost::multiarray ? I didn't see that on your list.

No, I hadn't looked at that -- thanks. It looks like a raw, stripped down
version of a multidimensional array -- no numeric functionality. Since I'm mostly going to use
matrices (and vectors, here and there), maybe ublas, which does provide useful
numeric functionality, is a better choice. I must say I find it fairly painful
to figure out how to do things I consider quite basic with the matrix/array
classes I come across in C++ (I'm not exactly a C++ expert, but still); I
also can't seem to find a way to construct a ublas matrix or vector from
existing C-array data. 
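
What I have in mind by "constructing from existing C-array data" is roughly
the following: a thin, non-owning wrapper that adds shape information to a
plain double* without copying. This is only a sketch in standard C++;
MatrixView and trace are hypothetical names, row-major layout is an
assumption, and it is not how ublas or any other library does it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning, row-major matrix view over an existing buffer.
// It never allocates or frees; the caller must keep `data` alive.
class MatrixView {
public:
    MatrixView(double* data, std::size_t rows, std::size_t cols)
        : data_(data), rows_(rows), cols_(cols) {}

    double& operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    double* data() const { return data_; }

private:
    double* data_;
    std::size_t rows_, cols_;
};

// Example kernel written against the view rather than a raw pointer.
inline double trace(const MatrixView& m) {
    assert(m.rows() == m.cols());
    double t = 0.0;
    for (std::size_t i = 0; i < m.rows(); ++i) t += m(i, i);
    return t;
}
```

Existing code that passes double* around could keep doing so at its edges
(via data()) while new code takes a MatrixView.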

>> Things that would eventually come in handy, although they're not needed yet,
>> are basic linear algebra and maybe two or three LAPACK-level functions (I can
>> think of cholesky decomposition and SVD right now)
>
> It would be nice to just have that (is MTL viable?)

No idea -- as far as I can tell the webpage is broken, so I can't look at the
examples (http://osl.iu.edu/research/mtl/examples.php3). It doesn't seem to
provide SVD out of the box either though -- and since I've already got a boost
dependency my first instinct would be to use something from there. What's the
advantage of MTL over ublas?

> but writing connection code to LAPACK for a few functions is not too bad.

>
>> I think I could get all these things (and more) from scipy
>> (and kin) without too much fuss (although I haven't tried wavelet support yet)
>> and it seems like picking together the same functionality from different C++
>> libs would require considerably more work.
>
> True -- do-able, but you'd have to do it!
>
>> So my question is: might it make sense to use (a slightly wrapped)
>> numpy.ndarray,
>
> I guess what I'd like is a C++ array that was essentially an ndarray 
> without the pyobject stuff -- it could then be useful for C++, but also 
> easy to go back and forth between numpy and C++.

Indeed.

> Ideally, there'd be something that already fits that bill. I see a 
> couple design issues that are key:
>
> "View" semantics: numpy arrays have the idea of "views" of data built in 
> to them -- a given array can have its own data block, or be a view 
> onto another. This is quite powerful and flexible, and can save a lot of 
> data copying. The STL containers don't seem to have that concept at all. 

Yes. C++ copying semantics seem completely braindamaged to me.
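
For what it's worth, the "view" idea itself is not hard to express in C++ if
one is willing to manage lifetimes by hand. Below is a minimal sketch of a
1-D strided view (start pointer, length, step), which is roughly the
information a 1-D numpy slice carries. StridedView and view_of are
hypothetical names, and only positive steps are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D strided view: start pointer, length, and step in elements.
struct StridedView {
    double* start;
    std::size_t size;
    std::ptrdiff_t stride;

    double& operator[](std::size_t i) const {
        assert(i < size);
        return start[static_cast<std::ptrdiff_t>(i) * stride];
    }

    // View of a view: elements begin, begin+step, ... (like v[begin::step]).
    StridedView slice(std::size_t begin, std::size_t step) const {
        assert(begin <= size && step > 0);
        std::size_t n = (size - begin + step - 1) / step;
        return StridedView{start + static_cast<std::ptrdiff_t>(begin) * stride, n,
                           stride * static_cast<std::ptrdiff_t>(step)};
    }
};

// View over a whole std::vector; it dangles if the vector reallocates.
inline StridedView view_of(std::vector<double>& v) {
    return StridedView{v.data(), v.size(), 1};
}
```

No data is copied by slice(); writing through the resulting view changes the
original vector, which is exactly the behaviour the STL containers lack.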

> std::valarray has utility classes that are views of a valarray, but they 
> are really only useful as temporaries - they are not full-blown valarrays.
>
> It looks like boost::multiarrays have a similar concept though
> """
> The MultiArray concept defines an interface to hierarchically nested 
> containers. It specifies operations for accessing elements, traversing 
> containers, and creating views of array data.
> """
>
> Another issue is dynamic typing. Templates provide a way to do generic 
> programming, but it's only generic at the code level. At compile time, 
> types are fixed, so you have a valarray<double>, for instance. 
> numpy arrays, on the other hand, are of only one type - with the data type
> specified as meta-data essentially. I don't know what mismatch this may
> cause, but it's a pretty different way to structure things. (Side note: I
> used this feature once to re-type an array in place, using the same data
> block -- it was a nifty hack used to unpack an odd binary format). Would it
> make sense to use this approach in C++? I suspect not -- all your
> computational code would have to deal with it.

Any solution that just works fine for doubles as element type would perfectly
suffice for me, but yes, I'm sure compile time vs. run-time
element-type-specification causes impedance mismatch.
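
To illustrate where the mismatch shows up, here is a small sketch of the
"element type as metadata" approach in C++: an untyped byte buffer carries a
run-time tag, and a single switch turns that tag into a template argument.
All names (DType, DynArray, sum_as, sum, make_array) are hypothetical, and
this is not meant to mirror numpy's actual C structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical run-time element-type tag, playing the role of a dtype.
enum class DType { Float64, Int32 };

inline std::size_t item_size(DType t) {
    return t == DType::Float64 ? sizeof(double) : sizeof(std::int32_t);
}

// Untyped byte buffer plus metadata describing how to interpret it.
struct DynArray {
    std::vector<unsigned char> bytes;
    DType dtype;

    std::size_t size() const { return bytes.size() / item_size(dtype); }
};

// Compile-time-typed kernel; memcpy avoids aliasing and alignment problems.
template <typename T>
double sum_as(const DynArray& a) {
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        T x;
        std::memcpy(&x, a.bytes.data() + i * sizeof(T), sizeof(T));
        total += static_cast<double>(x);
    }
    return total;
}

// The run-time tag is turned into a template argument in exactly one place.
inline double sum(const DynArray& a) {
    switch (a.dtype) {
        case DType::Float64: return sum_as<double>(a);
        case DType::Int32:   return sum_as<std::int32_t>(a);
    }
    assert(false && "unknown dtype");
    return 0.0;
}

// Build a DynArray from typed values (copies the bytes).
template <typename T>
DynArray make_array(const std::vector<T>& values, DType dtype) {
    assert(sizeof(T) == item_size(dtype));
    DynArray a{std::vector<unsigned char>(values.size() * sizeof(T)), dtype};
    if (!values.empty()) std::memcpy(a.bytes.data(), values.data(), a.bytes.size());
    return a;
}
```

Re-typing in place, as in the hack mentioned above, then amounts to assigning
a new value to dtype: the bytes stay put and only size() and the
interpretation change. The price is that every kernel needs a dispatching
wrapper like sum().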

>
> There is also the re-sizing issue. It's pretty handy to be able to 
> re-size arrays -- but then the data pointer can change, making it pretty 
> impossible to share the data. Maybe it would be helpful to have a 
> pointer-to-a-pointer instead, so that the shared pointer wouldn't 
> change. However, there could be ugliness with the pointer changing while 
> some other view is working with it.
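
One way to read the pointer-to-a-pointer idea, sketched in standard C++:
views keep a handle to the owning storage plus an offset and length, and
look the data pointer up on every access instead of caching it. Storage and
StableView are hypothetical names; the sketch costs an extra indirection per
access and does nothing about another thread resizing mid-operation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Storage = std::shared_ptr<std::vector<double>>;

// Hypothetical view that stores (storage, offset, length) instead of a raw
// double*, so it survives reallocation of the underlying vector.
class StableView {
public:
    StableView(Storage s, std::size_t offset, std::size_t length)
        : storage_(std::move(s)), offset_(offset), length_(length) {}

    // False if the storage has shrunk below the range this view covers.
    bool valid() const { return offset_ + length_ <= storage_->size(); }

    double& operator[](std::size_t i) const {
        assert(i < length_ && valid());
        return (*storage_)[offset_ + i];  // pointer looked up on every access
    }
    std::size_t size() const { return length_; }

private:
    Storage storage_;
    std::size_t offset_, length_;
};
```

Growing the vector may move the data, but the view still reaches the right
elements; shrinking it below the viewed range is detected by valid() rather
than silently reading freed memory.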
>
>> <http://thread.gmane.org/gmane.comp.python.c++/11559/focus=11560>
>
> That does look promising -- and it used boost::multiarrays

Yes (and also ublas vectors and matrices). Unfortunately, the author just
noted on the c++-sig that he's unlikely to work on the code again -- but
it might still make a good starting point for someone looking into creating
nice, seamless integration between numpy and a decent C++ matrix/array type.
Unfortunately I haven't time for this; I might start out just using multiarray
or ublas matrices/vectors and use some primitive explicit hack to convert.

> The more I look at boost::multiarray, the better I like it (and the more 
> it looks like numpy) -- does anyone here have experience (good or bad) 
> with it?

I'd be interested to hear about that too.

cheers,

'as

