[Numpy-discussion] Proposed Roadmap Overview

Travis Oliphant travis@continuum...
Sat Feb 18 22:38:48 CST 2012

>> The decision will not be made until NumPy 2.0 work is farther along. The most likely outcome is that Mark will develop something quite nice in C++, which he is already toying with, and we will either choose to use it in NumPy to build 2.0 on, or not. I'm interested in sponsoring Mark and working as closely as I can with him and Chuck to see what emerges.
> Would it be fair to say, then, that you are expecting the discussion
> about C++ will mainly arise after Mark has written the code?   I
> can see that it will be easier to be specific at that point, but there
> must be a serious risk that it will be too late to seriously consider
> an alternative approach.

We will need to see examples of what Mark is talking about and clarify some of the compiler issues. Certainly there is some risk that, once code is written, it will be tempting to just use it. Other approaches are certainly worth exploring in the meantime, but C++ has some strong arguments for it.

>>> Can you say a little more about your impression of the previous Cython
>>> refactor and why it was not successful?
>> Sure. This list actually deserves a long writeup about that. First, there wasn't a "Cython-refactor" of NumPy. There was a Cython-refactor of SciPy. I'm not sure of its current status. I'm still very supportive of that sort of thing.
> I think I missed that - is it on git somewhere?

I thought so, but I can't find it either.  We should ask Jason McCampbell of Enthought where the code is located.   Here are the distributed eggs:   http://www.enthought.com/repo/.iron/


>> Another factor: the decision to make an extra layer of indirection makes small arrays that much slower. I agree with Mark that in a core library we need to go the other way, with small arrays being completely allocated in the data-structure itself (reducing the number of pointer de-references).
> Does that imply there was a review of the refactor at some point to do
> things like benchmarking?   Are there any sources to get started
> trying to understand the nature of the Numpy refactor and where it ran
> into trouble?  Was it just the small arrays?

The main trouble was just the pace of development of NumPy and the divergence of the trees, so that the refactor branch did not keep up. Its changes were quite extensive, and so were some of Mark's, which made merging the two difficult. Mark's review of the refactor was that small-array support was going to get worse. I'm not sure if we ever did any benchmarking in that direction.
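To make the small-array point concrete, here is a minimal C sketch of the inline-storage idea, assuming a hypothetical `small_array` type and an arbitrary inline capacity `SMALL_CAP` of 4 doubles (neither comes from NumPy or from the refactor). Arrays at or below that capacity keep their elements inside the struct itself, so creating one needs no separate heap allocation and the elements sit right next to the header in memory; larger arrays fall back to `malloc`.

```c
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical inline capacity, chosen only for illustration. */
#define SMALL_CAP 4

/* Hypothetical array header: `data` points either at `inline_buf`
 * (small arrays, no extra allocation) or at a separate heap block. */
typedef struct {
    size_t len;
    double *data;
    double inline_buf[SMALL_CAP];
} small_array;

/* Zero-initialize an array of n doubles. Returns 0 on success, -1 if
 * the heap allocation for a large array fails. */
static int small_array_init(small_array *a, size_t n) {
    a->len = n;
    if (n <= SMALL_CAP) {
        a->data = a->inline_buf;              /* elements live in the struct */
    } else {
        a->data = malloc(n * sizeof(double)); /* separate block for big arrays */
        if (a->data == NULL) {
            a->len = 0;
            return -1;
        }
    }
    memset(a->data, 0, n * sizeof(double));
    return 0;
}

/* True when the elements are stored inside the struct itself. */
static int small_array_is_inline(const small_array *a) {
    return a->data == a->inline_buf;
}

/* Release heap storage (if any) and reset the header. */
static void small_array_free(small_array *a) {
    if (a->data != NULL && !small_array_is_inline(a)) {
        free(a->data);
    }
    a->data = NULL;
    a->len = 0;
}
```

One design consequence: because `data` may point into the struct itself, such a struct cannot safely be copied by value without re-pointing `data`, which is the kind of bookkeeping that C++ copy constructors can automate.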

>> So, Cython did not play a major role on the NumPy side of things.   It played a very nice role on the SciPy side of things.
> I guess Cython was attractive because the desire was to make a
> stand-alone library?   If that is still the goal, presumably that
> excludes Cython from serious consideration?  What are the primary
> advantages of making the standalone library?  Are there any serious
> disbenefits?

From my perspective, having a standalone core NumPy is still a goal. The primary advantages of having a NumPy library (call it NumLib for the sake of argument) are:

	1) Ability for projects like PyPy, IronPython, and Jython to use it more easily.
	2) Ability for Ruby, Perl, Node.JS, and other new languages to use the code for their technical computing projects.
	3) Increasing the number of users who can help make it more solid.
	4) Building the user base (and, with it, performance improvements from having engineers at Intel, NVidia, AMD, Microsoft, Google, etc. looking at the code).

The disadvantages I can think of: 
	1) More users also means we might risk "lowest-common-denominator" problems, i.e. trying to be too much to too many may make it not useful for anyone. Also, more users means more people with opinions that might be difficult to reconcile.
	2) The work of doing the rewrite is not small: probably at least 6 person-months.
	3) Not being able to rely on Python objects (dictionaries, lists, and tuples are currently used in the code-base quite a bit --- though the re-factor did show some examples of how to remove this usage).
	4) Handling of "Object" arrays requires some re-design.

I'm sure there are other factors that could be added to both lists. 
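As a rough illustration of what "not relying on Python objects" could mean in practice, here is a minimal C sketch of a standalone array API, assuming a hypothetical `numlib_` prefix (NumLib is only the placeholder name used above; none of these functions exist). The shape is a plain `size_t` array rather than a Python tuple, and callers only hold a pointer to a handle, which is the kind of interface that PyPy, Ruby, or Node.js bindings could reach through an FFI.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical NumLib array handle. In a real library this definition
 * would live in the .c file so callers see only an opaque typedef. */
typedef struct numlib_array {
    int ndim;
    size_t *shape; /* plain C array in place of a Python tuple */
    size_t size;   /* total number of elements */
    double *data;  /* float64 only, to keep the sketch small */
} numlib_array;

/* Create a zero-filled array; returns NULL on bad input or allocation
 * failure. No overflow check on the element count (sketch only). */
numlib_array *numlib_zeros(int ndim, const size_t *shape) {
    if (ndim < 1 || shape == NULL) return NULL;
    numlib_array *a = malloc(sizeof *a);
    if (a == NULL) return NULL;
    a->ndim = ndim;
    a->size = 1;
    a->shape = malloc((size_t)ndim * sizeof *a->shape);
    for (int i = 0; i < ndim && a->shape != NULL; i++) {
        a->shape[i] = shape[i];
        a->size *= shape[i];
    }
    /* calloc zero-fills; request at least one element to avoid calloc(0). */
    a->data = calloc(a->size ? a->size : 1, sizeof *a->data);
    if (a->shape == NULL || a->data == NULL) {
        free(a->shape);
        free(a->data);
        free(a);
        return NULL;
    }
    return a;
}

size_t numlib_size(const numlib_array *a) { return a->size; }

/* Flat-index accessors; return -1 when the index is out of range. */
int numlib_set_flat(numlib_array *a, size_t i, double v) {
    if (i >= a->size) return -1;
    a->data[i] = v;
    return 0;
}

int numlib_get_flat(const numlib_array *a, size_t i, double *out) {
    if (i >= a->size) return -1;
    *out = a->data[i];
    return 0;
}

void numlib_free(numlib_array *a) {
    if (a == NULL) return;
    free(a->shape);
    free(a->data);
    free(a);
}
```

The sketch deliberately leaves out dtypes, strides, and object arrays; point 4 above is exactly where a design like this gets harder.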


> Thanks a lot for the reply,
> Matthew
