[Numpy-discussion] numpy.ndarrays as C++ arrays (wrapped with boost)

Christopher Barker Chris.Barker@noaa....
Wed Sep 12 11:01:38 CDT 2007

David Cournapeau wrote:
> Maybe I am naive, but I think a worthy goal would be a minimal C++ 
> library which wraps ndarray, without thinking about SWIG, boost and co 
> first.

That's exactly what I had in mind. If you have something that works well 
with ndarray -- then SWIG et al. can work with it. In principle, if you 
can do the transition nicely with hand-written wrappers, then you can do 
it with the automated tools too.

> I don't know what other people are looking for, but for me, the
> interesting things with using C++ for ndarrays would be (in this order 
> of importance):
>     1 much less error prone memory management
Less than what? std::valarray, etc. all help with this.

>     2 a bit more high level than plain C ndarrays (syntactic sugar 
> mostly: keyword args, overloaded methods and so on)
>     3 more high level things for views
I think views are key.
>     4 actual computation (linear algebra, SVD, etc...)

This is last on my list -- key is the core data type. I may be an 
unusual user, but what I expect is that in a given pile of code, we need 
one or two linear algebra routines, so I don't mind hand-wrapping 
LAPACK. Not that it wouldn't be nice to have it built in, but it's not a 
deal breaker. In any case, it should be separate: a core set of array 
objects, and a linear algebra (or whatever else) package built on top of it.

> 3 
> would be a pain to do right without e.g boost::multiarray;

Yes, it sure would be nice to build it on an existing code base, and 
boost::multiarray seems to fit.

> One huge advantage of being independent of external libraries would be 
> that the wrapper could then be included in numpy, and you could expect 
> it everywhere.

That would be nice, but may be too much work.

I'm really a C++ newbie, but it seems like the key here is the view 
semantics -- and perhaps the core solution is to have a "data block" 
class -- all it would have is a pointer to a block of data, and a 
reference counter. Then each array object would have a view of one of 
those -- each new array object that used a given instance would increase 
the ref count, and decrease it on deletion. The data block would destroy 
itself when its refcount went to zero. (Is this how numpy works now?)
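
A minimal sketch of that "data block" idea, assuming std::shared_ptr 
supplies the reference count. DataBlock, ArrayView, slice and 
block_refcount are hypothetical names for illustration, not part of numpy 
or any existing library, and there is no bounds checking:

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical "data block": just the raw storage. std::shared_ptr keeps the
// reference count, so the block frees itself when its last user goes away.
using DataBlock = std::vector<double>;

// Hypothetical array object: an (offset, size) window onto a shared block.
class ArrayView {
public:
    // Allocate a fresh block of n zeros and view all of it.
    explicit ArrayView(std::size_t n)
        : block_(std::make_shared<DataBlock>(n, 0.0)), offset_(0), size_(n) {}

    // View elements [start, start + n) of the same block; nothing is copied.
    ArrayView slice(std::size_t start, std::size_t n) const {
        return ArrayView(block_, offset_ + start, n);
    }

    double& operator[](std::size_t i) { return (*block_)[offset_ + i]; }
    const double& operator[](std::size_t i) const { return (*block_)[offset_ + i]; }
    std::size_t size() const { return size_; }

    // Number of array objects currently sharing the block.
    long block_refcount() const { return block_.use_count(); }

private:
    ArrayView(std::shared_ptr<DataBlock> block, std::size_t offset, std::size_t n)
        : block_(std::move(block)), offset_(offset), size_(n) {}

    std::shared_ptr<DataBlock> block_;
    std::size_t offset_;
    std::size_t size_;
};
```

Each slice() bumps the count by one, and each destroyed ArrayView drops it 
by one; the storage is released only when the count reaches zero.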

Even if this makes sense, I have no idea how compatible it would be with 
numpy and/or python.

boost::multiarray does not seem to take this approach. Rather, it has two 
classes: a multi_array, which is responsible for its own data block, and a 
multi_array_ref, which uses a view on another multiarray's data block. 
This is getting close, but it means that when you create a 
multi_array_ref, the original multi_array needs to stay around. I'd 
rather have a much more flexible system, where you could create an array, 
create a view of that array, then destroy the original, then have the 
data block go away when you destroy the view. This could cause little 
complications if you started with a huge array, made a view into a tiny 
piece of it, then the whole data block would stick around -- but that 
would be up to the user to think about.
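
The lifetime scenario above can be sketched as follows. This is only an 
illustration under assumptions: View, compact and view_outlives_original 
are hypothetical names, shared ownership comes from std::shared_ptr, and a 
std::weak_ptr is used purely to observe when the big block is freed:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

using Block = std::vector<double>;

// Hypothetical view: shared ownership of a block plus a window into it.
struct View {
    std::shared_ptr<Block> block;
    std::size_t offset;
    std::size_t size;
};

// Hypothetical helper: copy the viewed elements into a fresh, right-sized
// block so the view no longer pins the (possibly huge) original block.
View compact(const View& v) {
    auto first = v.block->begin() + static_cast<std::ptrdiff_t>(v.offset);
    auto last = first + static_cast<std::ptrdiff_t>(v.size);
    return View{std::make_shared<Block>(first, last), 0, v.size};
}

// Create an array, take a tiny view, destroy the original, then compact.
// Returns true if the big block survived the original and died on compact.
bool view_outlives_original() {
    std::weak_ptr<Block> watcher;  // observes the big block without owning it
    View tiny{nullptr, 0, 0};
    {
        View original{std::make_shared<Block>(1000000, 1.0), 0, 1000000};
        watcher = original.block;
        tiny = View{original.block, 10, 5};
    }  // original is destroyed here; tiny still holds the block

    bool alive_with_view = !watcher.expired();
    tiny = compact(tiny);  // drops the last reference to the big block
    bool freed_after_compact = watcher.expired();

    return alive_with_view && freed_after_compact && tiny.block->size() == 5;
}
```

The explicit compact() step is one way to leave the "tiny view of a huge 
array" trade-off in the user's hands.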

>> Would it make sense to use this approach in C++? I suspect not -- all 
>> your computational code would have to deal with it.
> Why not making one non template class, and having all the work done 
> inside the class instead ?
> class ndarray {
> private:
>      ndarray_imp<double> a;
> };

Hm. That could work (as far as my limited C++ knowledge tells me), but 
it's still static at run time -- which may be OK -- and is C++-ish anyway.
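
For comparison, one way to make a non-template class whose element type is 
chosen at run time is to hide the templated implementation behind an 
abstract base. A rough sketch, assuming only two element types and 
double-valued accessors; NdArray, the dtype strings and get/set are 
hypothetical names, not anything proposed in this thread:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

class NdArray {
public:
    // dtype is a hypothetical runtime tag: "float64" or "int32".
    NdArray(const std::string& dtype, std::size_t n)
        : impl_(make_impl(dtype, n)) {}

    std::size_t size() const { return impl_->size(); }
    std::size_t itemsize() const { return impl_->itemsize(); }
    double get(std::size_t i) const { return impl_->get(i); }
    void set(std::size_t i, double v) { impl_->set(i, v); }

private:
    // Type-independent interface that the outer class talks to.
    struct ImplBase {
        virtual ~ImplBase() {}
        virtual std::size_t size() const = 0;
        virtual std::size_t itemsize() const = 0;
        virtual double get(std::size_t i) const = 0;
        virtual void set(std::size_t i, double v) = 0;
    };

    // The templated part, playing the role of ndarray_imp<T>.
    template <typename T>
    struct Impl : ImplBase {
        explicit Impl(std::size_t n) : data(n) {}
        std::size_t size() const override { return data.size(); }
        std::size_t itemsize() const override { return sizeof(T); }
        double get(std::size_t i) const override {
            return static_cast<double>(data[i]);
        }
        void set(std::size_t i, double v) override {
            data[i] = static_cast<T>(v);
        }
        std::vector<T> data;
    };

    // Pick the concrete element type from the runtime tag.
    static std::unique_ptr<ImplBase> make_impl(const std::string& dtype,
                                               std::size_t n) {
        if (dtype == "float64")
            return std::unique_ptr<ImplBase>(new Impl<double>(n));
        if (dtype == "int32")
            return std::unique_ptr<ImplBase>(new Impl<std::int32_t>(n));
        throw std::invalid_argument("unsupported dtype: " + dtype);
    }

    std::unique_ptr<ImplBase> impl_;
};
```

The cost is a virtual call per element access, so real computational code 
would still want to get at the typed storage in bulk.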

> If you have an array with several views on it, why not just enforcing 
> that the block data address cannot change as long as you have a view ? 

Maybe I'm missing what you're suggesting, but this would lock in the 
original array once any views were on it -- that would greatly restrict 
flexibility. My suggestion above may help, but I think maybe I could 
just live without re-sizing.

> This should not be too complicated, right ? I don't use views that much 
> myself in numpy (other than implicitly, of course), so I may be missing 
> something important here

Implicitly, we're all using them all the time -- which is why I think 
views are key.

Alexander Schmolck wrote:
> I'd ideally like something that I can more or less transparently
> pass and return data between python and C++ and I want to use numpy arrays on
> the python side. It'd also be nice to have reference semantics and reference
> counting working fairly painlessly between both sides.

Can the python gurus here comment on how possible that is?

> as I said I expect that most
> data I deal with will be pretty large, so overheads from creating python
> objects aren't likely to matter that much.

I'm not so much worried about the overhead as the dependency -- to use 
your words, it would feel perverse to be including python.h for a 
program that wasn't using python at all.

>> Our case is such: We want to have a nice array-like container that we 
>> can use in C++ code that makes sense both for pure C++, and interacts 
>> well with numpy arrays, as the code may be used in pure C++ app, but 
>> also want to test it, script it, etc from Python.
> Yes, that's exactly what I'm after. What's your current solution for this?

We're trying to build it now. The old code used Mac-OS Handles, those 
have been converted to std::valarrays, and we're working on wrapping 
those for use with numpy arrays -- which, at the moment, looks like copying 
the data back and forth -- fine for testing code, but maybe not OK for 
production work.
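
The copy-based approach can be sketched as below, assuming the numpy side 
is reduced to a plain buffer of doubles. copy_out and copy_in are 
hypothetical helper names; no Python or numpy API is used here:

```cpp
#include <algorithm>
#include <cstddef>
#include <valarray>

// Copy a valarray's contents into an external buffer that must hold at
// least v.size() doubles (e.g. memory owned by an array on the Python side).
void copy_out(const std::valarray<double>& v, double* dst) {
    std::copy(std::begin(v), std::end(v), dst);
}

// Build a new valarray from an external buffer. The pointer constructor of
// std::valarray copies the n elements; it does not adopt the memory.
std::valarray<double> copy_in(const double* src, std::size_t n) {
    return std::valarray<double>(src, n);
}
```

Both directions touch every element, which is why this is fine for tests 
but costs time and memory on large arrays.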

>> did you check out 
>> boost::multiarray ? I didn't see that on your list.

>  Since I'm mostly going to use
> matrices (and vectors, here and there), maybe ublas, which does provide useful
> numeric functionality is a better choice.

Well, one of the lessons I learned from numpy is that I'm much happier 
with a general purpose n-d array than with a "matrix" and "vector". The 
latter can be built on top of the former if you want (like it is in 
numpy). How compatible are multiarray and ublas matrices? It kind of 
looks like boost isn't really a single project, so things that could be 
related may not be.

Hmmm -- if my concept above works, then all you need is for your n-d 
arrays and your matrices and vectors to all share the same "data block".

> I must say I find it fairly painful
> to figure out how to do things I consider quite basic with the matrix/array
> classes I come across in C++ (I'm not exactly a C++ expert, but still);

Neither am I -- but I think it's the nature of C++!

> I
> also can't seem to find a way to construct an ublas matrix or vector from
> existing C-array data. 

This functionality seems to be missing from many (most) of these C++ 
containers. I suspect that it's the memory management issue. One of the 
points of these containers is to take care of memory management for you 
-- if you pass in a pointer to an existing data block -- it's not 
managing your memory any more.
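
One possible way around this, sketched under assumptions: let the deleter 
of a std::shared_ptr decide whether the container owns the memory or only 
borrows it. Buffer, make_owned and wrap_existing are hypothetical names; 
this is not how ublas or boost::multiarray work:

```cpp
#include <cstddef>
#include <memory>

// Hypothetical buffer: shared pointer to the first element plus a length.
struct Buffer {
    std::shared_ptr<double> data;
    std::size_t size;
};

// Owning case: allocate n zero-initialized doubles; the array deleter
// releases them when the last Buffer copy goes away.
Buffer make_owned(std::size_t n) {
    return Buffer{
        std::shared_ptr<double>(new double[n](), std::default_delete<double[]>()),
        n};
}

// Borrowing case: wrap an existing C array. The no-op deleter means the
// caller stays responsible for keeping p alive and for freeing it.
Buffer wrap_existing(double* p, std::size_t n) {
    return Buffer{std::shared_ptr<double>(p, [](double*) {}), n};
}
```

Code that receives a Buffer does not need to know which case it has; the 
price is that a borrowed buffer can dangle if the original C array is 
freed first.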

>> It would be nice to just have that (is MTL viable?)
> No idea -- as far as I can tell the webpage is broken, so I can't look at the
> examples (http://osl.iu.edu/research/mtl/examples.php3).

Too many dead or sleeping projects....

> Yes. C++ copying semantics seem completely braindamaged to me.

It's the memory management issue again -- C++ doesn't have it built in 
-- so it's built in to each class instead.

>>> <http://thread.gmane.org/gmane.comp.python.c++/11559/focus=11560>
>> That does look promising -- and it used boost::multiarrays
> Yes (and also ublas vectors and matrices). Unfortunately, the author just
> wrote in the c++-sig noted that he's unlikely to work on the code again --


> it might still make a good starting point for someone

The advantage of open source!

Full Disclosure: I have neither the skills nor the time to actually 
implement any of these ideas. If no one else does, then I guess we're 
just blabbing -- not that there is anything wrong with blabbing!

