[Numpy-discussion] Graphs in numarray?
perry at stsci.edu
Wed Apr 17 12:10:32 CDT 2002
On Behalf Of Magnus Lie Hetland
> I'm looking at various ways of implementing graphs in Python (beyond
> simple dict-based stuff -- more performance is needed). kjbuckets
> looks like a nice alternative, as does the Boost Graph Library (not
> sure how easy it is to use with Boost.Python) but if numarray is to
> become a part of the standard library, it could be beneficial to use
> For dense graphs, it makes sense to use an adjacency matrix directly
> in numarray, I should think. (I haven't implemented many graph
> algorithms with ufuncs yet, but it seems doable...) For sparse graphs
> I guess some sort of sparse array implementation would be useful,
> although the archives indicate that creating such a thing isn't a core
> part of the numarray project.
First of all, it may make sense, but I should say a few words about
what scale sizes make sense. Currently numarray is implemented mostly
in Python (excepting the very low level, very simple C functions
that do the computational and indexing loops. This means it currently
has a pretty sizable overhead to set up an array operation (I'm
guessing an order of magnitude slower than Numeric). Once set up,
it generally is pretty fast. So it is pretty good for very large
data sets. Very lousy for very small ones. We haven't measured
efficiency lately (we are deferring optimization until we have all
the major functionality present first), but I wouldn't be at all
surprised to find that the set up time can be equal to the time
to actually process ~10,000-20,000 elements (i.e., the time spent per
element for a 10K array is roughly half that for much larger arrays.
So if you are working with much smaller arrays than 10K, you won't
see total execution time decrease much (it was already spending
half its time in setup, which doesn't change). We would like
to reduce this size threshhold in the future, either by optimizing the
Python code, or moving some of it into C. This optimization wouldn't
be for at least a couple more months; we have more urgent features
to deal with. I doubt that we will ever surpass the current Numeric
in its performance on small arrays (though who knows, perhaps we
can come close).
> What do you think -- is it reasonable to use numarray for graph
> algorithms? Perhaps an additional module with standard graph
> algorithms would be interesting? (I'm sure I could contribute some if
> there is any interest...)
Before I go further, I need to find out if the preceeding has made
you gasp in horror or if the timescale is too slow for you to
accept. (This particular issue also makes me wonder if numarray would
ever be a suitable substitute for the existing array module).
What size graphs are you most concerned about as far as speed goes?
> And -- is there any chance of getting sparse matrices in numarray?
Since talk is cheap, yes :-). But I doubt it would be in the "core"
and some thought would have to be given to how best to represent them.
In one sense, since the underlying storage is different than numarray
assumes for all its arrays, sparse arrays don't really share the
same underlying C machinery very well. While it certainly would be
possible to devise a class with the same interface as numarray objects,
the implementation may have to be completely different.
On the other hand, since numarray has much better support for index
arrays, i.e., an array of indices that may be used to index another
array of values, index array(s), value array pair may itself serve
as a storage model for sparse arrays. One still needs to implement
ufuncs and other functions (including simple things like indexing)
using different machinery. It is something that would be nice to have,
but I can't say when we would get around to it and don't want to
raise hopes about how quickly it would appear.
More information about the Numpy-discussion