[Numpy-discussion] Speeding up numarray -- questions on its design
perry at stsci.edu
Tue Jan 18 17:52:46 CST 2005
> Thanks for the comments that have been made. One of my reasons for
> commenting is to get an understanding of which design issues of Numarray
> are felt to be important and which can change. There seems to be this
> idea that small arrays are not worth supporting. I hope this is just
> due to time-constraints and not some fundamental idea that small arrays
> should never be considered with Numarray. Otherwise, there will
> always be two different array implementations developing at their
> own pace.
I wouldn't say that we are "hostile" to small arrays. We do only have
limited resources and can't do everything we would like. More on this
below.
> I really want to gauge how willing developers of numarray are to
> changing things.
Without going into all the details below, I think I can address this
point. I suppose it all depends on what you mean by "how willing
developers of numarray are to changing things." If you mean, are we
open to changes to numarray that speed up small arrays (and address
other noted shortcomings), then yes, certainly (so long as they don't
hurt large array performance significantly). If you mean, will we drop
everything and address all these issues immediately ourselves, then no;
we have other things to do regarding numarray that have higher
priority before we can address these things. I would have a very
hard time justifying the effort when there are other things needed
by STScI more. We would love it if others could address them sooner
though. More on related issues below.
> >> 1) Are there plans to move the nd array entirely into C?
> I do not think it would be difficult at this point to move it all to C
> and then make future changes there (you can always call pure Python code
> from C). With the structure in place and some experience behind you,
> now seems like as good a time as any. Especially, because now is a
> better time for me than any... I like what numarray is doing by not
> always defaulting to ints with the maybelong type. It is a good idea.
I hope that is true, but we've found moving things to C a bigger
effort than we would like. I'd like to be proved wrong by someone who
can tackle it sooner than we can.
> >> 2) Why is the ND array C-structure so large? Why are the dimensions
> >> and strides array static? Why can't the extra stuff that the fancy
> >> arrays need be another structure and the numarray C structure just
> >> extended with a pointer to the extra stuff?
> > When Todd moved NDArray into C, he tried to keep it simple. As such, it
> > has no "moving parts." We think making dimensions and strides malloc'ed
> > rather than static would be fairly easy. Making the "extra stuff"
> > variable is something we can look at.
> But allocating dimensions and strides when needed is not difficult and
> it reduces the overhead of the ndarray object. Currently, that overhead
> seems extreme. I could be over-reacting here, but it just seems like it
> would have made more sense to expand the array object as little as
> possible to handle the complexity that you were searching for. It seems
> like more modifications were needed in the ufunc than in the arrayobject.
I'm not convinced that this is a big issue, but we have no objection to
someone making this change. But it falls well below small array
performance in priority for us.
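To make the storage question concrete, here is a minimal sketch of the two layouts under discussion. It is not numarray's actual structure: the type names, the helper functions, and the rank limit `SKETCH_MAXDIM` are all assumptions made up for illustration, and plain `long` stands in for whatever index type the real code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAXDIM 40  /* assumed rank limit, chosen only for illustration */

/* Layout A: dimensions and strides embedded at a fixed maximum size. */
typedef struct {
    char *data;
    int nd;
    long dimensions[SKETCH_MAXDIM];
    long strides[SKETCH_MAXDIM];
} StaticArray;

/* Layout B: dimensions and strides allocated only for the rank in use. */
typedef struct {
    char *data;
    int nd;
    long *dimensions;
    long *strides;
} DynamicArray;

/* Bytes of per-array bookkeeping under each layout. */
size_t static_overhead(void)
{
    return sizeof(StaticArray);
}

size_t dynamic_overhead(int nd)
{
    return sizeof(DynamicArray) + 2 * (size_t)nd * sizeof(long);
}

/* Fill in shape and C-contiguous strides; returns 0 on success, -1 on failure. */
int dynamic_init(DynamicArray *a, int nd, const long *dims, long itemsize)
{
    assert(a != NULL);
    a->data = NULL;
    a->nd = 0;
    a->dimensions = NULL;
    a->strides = NULL;
    if (nd < 0)
        return -1;
    if (nd == 0)
        return 0;  /* rank-0 array: nothing to allocate */

    /* One allocation holds both arrays, so there is a single malloc/free pair. */
    a->dimensions = malloc(2 * (size_t)nd * sizeof(long));
    if (a->dimensions == NULL)
        return -1;
    a->strides = a->dimensions + nd;
    a->nd = nd;

    long stride = itemsize;
    for (int i = nd - 1; i >= 0; i--) {
        a->dimensions[i] = dims[i];
        a->strides[i] = stride;
        stride *= dims[i];
    }
    return 0;
}

void dynamic_free(DynamicArray *a)
{
    free(a->dimensions);  /* also releases strides, which shares the block */
    a->dimensions = NULL;
    a->strides = NULL;
    a->nd = 0;
}
```

The trade-off in this sketch is the one being argued above: Layout A needs no allocation or cleanup logic but pays for the maximum rank in every array, while Layout B costs one extra malloc per array and keeps the per-array footprint proportional to the actual rank.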
> > The bottom line is that adding the variability adds complexity and we're
> > not sure we understand the storage economics of why we would be doing it.
> > Numarray was designed, first and foremost, for large arrays.
> Small arrays are never going to disappear (Fernando Perez has an
> excellent example) and there are others. A design where a single
> pointer not being NULL is all that is needed to distinguish "simple"
> Numeric-like arrays from "fancy" numarray-like arrays seems like a great
> way to make sure that
I won't quarrel with that (but I'm not sure what you are suggesting
in the bigger picture).
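One possible reading of the "single pointer" suggestion, sketched under assumptions: the core structure carries only what a Numeric-like array needs, plus one pointer that is NULL unless the array needs additional state. The names below and the fields inside `ArrayExtras` are hypothetical placeholders, not taken from either code base.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical "extra stuff": state that only fancy arrays would need. */
typedef struct {
    int byteswapped;
    int misaligned;
    long byteoffset;
} ArrayExtras;

/* Core structure shared by simple and fancy arrays. */
typedef struct {
    char *data;
    int nd;
    long *dimensions;
    long *strides;
    ArrayExtras *extras;  /* NULL for simple, Numeric-like arrays */
} SketchArray;

/* The fast-path test is a single pointer comparison. */
int array_is_simple(const SketchArray *a)
{
    return a->extras == NULL;
}

/* Attach extra state on demand; returns 0 on success, -1 on failure. */
int array_make_fancy(SketchArray *a, int byteswapped, int misaligned,
                     long byteoffset)
{
    assert(a != NULL);
    if (a->extras == NULL) {
        a->extras = calloc(1, sizeof(ArrayExtras));
        if (a->extras == NULL)
            return -1;
    }
    a->extras->byteswapped = byteswapped;
    a->extras->misaligned = misaligned;
    a->extras->byteoffset = byteoffset;
    return 0;
}

/* Drop the extra state, turning the array back into a simple one. */
void array_drop_extras(SketchArray *a)
{
    free(a->extras);
    a->extras = NULL;
}
```

In a design like this, code that only handles simple arrays could check `array_is_simple()` once and take a fast path, and only the fancy path would pay for the extra allocation.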
> On another fundamental note, numarray is being sold as a replacement for
> Numeric. But, then, on closer inspection many things that Numeric does
> well, numarray is ignoring or not doing very well. I think this
> presents a certain amount of false advertising to new users, who don't
> understand the history. Most of them would probably never need the
> fanciness that numarray provides and would be quite satisfied with
> Numeric. They just want to know what others are using. I think it is
> a disservice to call numarray a replacement for Numeric until it
> actually is. It should currently be called an "alternative
> implementation" focused on large arrays. This (unintentional) sleight of
> hand that has been occurring over the past year has been my biggest
> complaint with numarray. Making numarray a replacement for Numeric
> means that it has to support small arrays, object arrays, and ufuncs at
> least as well as but preferably better than Numeric. It should also be
> faster than Numeric whenever possible, because Numeric has lots of
> potential optimizations that have never been applied. If numarray does
> not do these things, then in my mind it cannot be a replacement for
> Numeric and should stop being called that on the numpy web site.
It distresses me to be accused of false advertising. We were pretty
up front at the beginning of the process of writing numarray that the
approach we would be taking would likely mean slower small array
performance. There were those (like you and Eric) who expressed
concern about that, but it wasn't at all clear what the consensus
was regarding how much it could change and be acceptable. (I recall
at one point when IDL was ported from Fortran to C which resulted
in a factor of 2 overall slowdown in speed. People didn't accuse
RSI of providing something that wasn't a replacement for IDL.)
The fact was that at the time we started, several thought that
backward compatibility wasn't that important. We didn't even try
at the beginning to make the C-API the same. At the start, there
was no claim that numarray would be an exact replacement for
Numeric. (And I didn't hear huge objections at the time on the
point and some that actually encouraged a break with how Numeric
did things.) Many of the attempts to provide backward compatibility
have come well after the first implementations. We have striven to
provide the full functionality of what Numeric had as we went to
version 1.0. Sure, there are some holes for object arrays.
So the issue of whether numarray is a replacement or not seems
to be arguing over what the intent of the project was. Paul Dubois
wrote the numpy page that makes that reference, and sure, I didn't
object to it (But why didn't you at the time? It's been there a
long time, and the goals and direction of numarray have been
quite visible for a long time. This wasn't some dark, secret
project. Many of the things you are complaining about have been
true for some time.) If people want to call numarray an alternative
implementation, I'm fine with that. It was a replacement in our
case. If we didn't develop it, we likely wouldn't be using Python
in the full sense that we are now. Numeric wasn't really an option.
At the time, many supported the idea of a reimplementation so it
seemed like a good opportunity to add what we needed and do that.
Obviously, we misread the importance of small array performance
for a significant part of the community. (But I keep saying,
if small array performance is really that important, it would seem
to me that much bigger wins are available, as Fernando mentioned.)
It's been clear for a better part of a year that it would be a
long time before there was any sort of unification between the
two. That distressed me as I'm sure it did you. So some sort of
useful sharing of libraries and packages seemed like the obvious
way to go. In more specialized areas, there would be some
divergence (e.g., we have dependencies on record arrays that
we just can't provide in Numeric). I can no longer justify
sinking many more months of work into numarray for issues of
no value to STScI (other than the hope that it would convince
others to switch, which isn't clear at all that it would). We
need to move towards providing a lot of the tools that are available
for Numeric. I can justify that work.
The current situation is far from ideal (Paul called it "insane"
at scipy if you prefer more colorful language). What we have are
two camps that cannot afford to give up the capabilities that are
unique to each version. But with most of the C-API compatible, and
a way of coding most libraries (except for Ufuncs) to be compatible
with both, we certainly can improve the situation.
If you can help remove the biggest obstacle, small array
performance, so that we could unify the two I would be
thrilled, but most of the effort can't come from us, at
least not in the near term (next year). We can help
at some level.
> I never really understood the "code is too complicated" argument
You lost me on this one. You mean the complaint that it was too
complicated in Numeric way back?
> anyway. I was just wondering if there is some support for reducing the
> number of source code files, or reorganizing them a bit.
Yes, I'd say that this has relatively high priority. It would be nice to
have feedback and advice on how to do this best.
> >> 4) Object arrays must be supported. This was a bad oversight and an
> >> important feature of Numeric arrays.
> > The current implementation does support them (though in a different
> > way, and generally not as efficiently, though Todd is more up on the
> > details here). What aspect of object arrays are you finding lacking?
> > C-api?
> I did not see such support when I looked at it, but given the previous
> comment, I could easily have missed where that support is provided. I'm
> mainly following up on Konrad's comment that his Automatic
> differentiation does not work with Numarray because of the missing
> support for object arrays. There are other applications for object
> arrays as well. Most of the support needs to come from the ufunc side.
I think Robert Kern pointed to the issue in a subsequent message.
> >> Again, thanks to the work that has been done. I'm really interested to
> >> see if some of these modifications can be done as in my mind it will
> >> help the process of unifying the two camps.
> > I'm glad to see that you are taking a look at it and welcome the
> > comments and any offers of help in improving speed.
> I would be interested in helping if there is support for really making
> numarray a real replacement for Numeric, by addressing the concerns that
> I've outlined. As stated at the beginning, I'm really just looking
> for how receptive numarray developers would be to the kinds of changes
> I'm talking about: (1) reducing the size of the array structure, (2)
> moving the ndarray entirely into C, (3) improving support for object
> arrays, (4) improving ufunc API support.
I'm not exactly sure what you mean by 4). If you mean having a compatible
API to Numeric, that seems like a lot of work since the way ufuncs work
in numarray is quite different. But you may mean something else.