[Numpy-discussion] Numpy 1.6 schedule (was: Numpy 2.0 schedule)
Charles R Harris
Sun Mar 6 07:54:39 CST 2011
On Sat, Mar 5, 2011 at 11:11 PM, Mark Wiebe <firstname.lastname@example.org> wrote:
> On Sat, Mar 5, 2011 at 8:13 PM, Travis Oliphant <email@example.com> wrote:
>> On Mar 5, 2011, at 5:10 PM, Mark Wiebe wrote:
>> On Thu, Mar 3, 2011 at 10:54 PM, Ralf Gommers <firstname.lastname@example.org> wrote:
>>> >>> I've had a look at the bug tracker; here's a list of tickets for
>>> >>> #1748 (blocker: regression for astype('str'))
>>> >>> #1619 (issue with dtypes, with patch)
>>> >>> #1749 (distutils, py 3.2)
>>> >>> #1601 (distutils, py 3.2)
>>> >>> #1622 (Solaris segfault, with patch)
>>> >>> #1713 (Solaris segfault)
>>> >>> #1631 (Solaris segfault)
>>> The distutils tickets are resolved.
>>> >>> Proposed schedule:
>>> >>> March 15: beta 1
>>> >>> March 28: rc 1
>>> >>> April 17: rc 2 (if needed)
>>> >>> April 24: final release
>>> Any comments on the schedule or tickets?
>> That all looks fine to me. There are a few things I've changed in the
>> core that could stand some discussion before being finalized in 1.6, mostly
>> due to what was required to make things work without depending on the data
>> type enumeration order. The combination of the numpy and scipy tests was
>> pretty effective, but, as Travis mentioned, my changes are fairly invasive.
>> * When copying array to array, structured types now copy based on field
>> names instead of positions, effectively behaving like a 'dict' instead of a
>> 'labeled tuple'. This behaviour is more intuitive to me, and several of the
>> bugs I fixed, such as dtype comparison completely ignoring the structured
>> type data, suggest that this change touches an area of numpy that has seen
>> relatively limited use. It might be worthwhile to introduce a tuple-style
>> flag in a future version which causes data to be copied by position instead
>> of by name, as it is likely useful in some contexts.
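To make the two semantics concrete, here is a minimal sketch that spells out both styles of copy with explicit per-field assignment, so it does not depend on whichever default a particular NumPy version uses for `dst[...] = src`. The helpers `copy_by_name` and `copy_by_position` are hypothetical; the by-position one only illustrates the tuple-style behaviour that the proposed flag would select.

```python
import numpy as np

def copy_by_name(dst, src):
    """Hypothetical helper: 'dict'-style copy that matches fields by name.

    Every field of dst must also exist in src; declaration order is ignored.
    """
    for name in dst.dtype.names:
        dst[name] = src[name]

def copy_by_position(dst, src):
    """Hypothetical helper: 'labeled tuple'-style copy that matches fields by position.

    Field i of src goes into field i of dst, whatever the names are.
    """
    if len(dst.dtype.names) != len(src.dtype.names):
        raise ValueError("field counts differ")
    for dst_name, src_name in zip(dst.dtype.names, src.dtype.names):
        dst[dst_name] = src[src_name]

src = np.array([(1, 2.5), (3, 4.5)], dtype=[('a', 'i4'), ('b', 'f8')])

# Destination has the same field names, declared in the opposite order.
by_name = np.zeros(2, dtype=[('b', 'f8'), ('a', 'i4')])
copy_by_name(by_name, src)
print(by_name['a'], by_name['b'])   # a -> 1, 3 and b -> 2.5, 4.5

by_pos = np.zeros(2, dtype=[('b', 'f8'), ('a', 'i4')])
copy_by_position(by_pos, src)
print(by_pos['a'], by_pos['b'])     # a -> 2, 4 (floats truncated) and b -> 1.0, 3.0
```

With by-name semantics, reordering the fields in the destination changes nothing; with by-position semantics, the values land in whichever field occupies the same slot, with an implicit cast if the types differ.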
>> This is a semantic change that does make me a tiny bit nervous.
>> Structured arrays are actually used quite a bit in the wild, and so this
>> could raise some errors. What I don't know is how often sub-parts of
>> structured arrays get copied into other structured arrays whose fields are
>> in a different order. From what I gather, Mark's changes would allow this
>> case and do an arguably useful thing. Previously, a copy was only allowed
>> if the structured array contained the same fields in the same order. It
>> seems like this is a relaxation of a rule and should not raise any errors
>> (unless extant code was relying on the previous errors for some reason).
> Another important factor is that previously the performance was poor,
> because each copy involved converting the array element to a Python tuple,
> then copying the tuple into the destination array. The new code directly
> copies the elements with no Python overhead. I haven't directly benchmarked
> this, but if someone wants to confirm this with some numbers that would be
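For anyone wanting to put numbers on this, a rough sketch follows. It emulates the old per-element path by round-tripping each element through a Python tuple (via `.item()`) and compares that with a single array-level assignment. The helper names, the dtype and the array size are arbitrary choices, and on a current NumPy the second path is whatever structured assignment does today rather than the 1.5 code itself, so treat the printed timings as indicative only.

```python
import timeit
import numpy as np

DTYPE = [('a', 'i4'), ('b', 'f8')]

def make_source(n):
    # Structured test array with predictable contents.
    src = np.zeros(n, dtype=DTYPE)
    src['a'] = np.arange(n)
    src['b'] = src['a'] * 0.5
    return src

def copy_via_tuples(src):
    # Emulates the old path: every element becomes a Python tuple first.
    dst = np.empty_like(src)
    for i in range(src.size):
        dst[i] = src[i].item()
    return dst

def copy_direct(src):
    # One array-level assignment, with no per-element Python objects.
    dst = np.empty_like(src)
    dst[...] = src
    return dst

if __name__ == "__main__":
    src = make_source(100_000)
    slow = timeit.timeit(lambda: copy_via_tuples(src), number=1)
    fast = timeit.timeit(lambda: copy_direct(src), number=1)
    print(f"tuple path: {slow:.4f} s, direct path: {fast:.4f} s")
```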
>> * Array memory layouts are preserved in many cases. This means that if a
>> and b are Fortran-ordered, a+b will be as well. It could be made more
>> pervasive; for example, ndarray.copy defaults to C order, and that could be
>> changed to 'K' to preserve the memory layout by default. Any comments about that?
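A quick way to see what "preserving the memory layout" means in practice is to check the contiguity flags. This sketch assumes a NumPy where ufuncs default to `order='K'` (true of current releases) and where `ndarray.copy` defaults to `order='C'`, so the copy keeps the Fortran layout only when `'K'` is passed explicitly.

```python
import numpy as np

a = np.asfortranarray(np.arange(6.0).reshape(2, 3))
b = np.asfortranarray(np.ones((2, 3)))

c = a + b                    # ufunc output follows the inputs' (Fortran) layout
c_default = a.copy()         # ndarray.copy defaults to order='C'
c_keep = a.copy(order='K')   # 'K' keeps the existing layout

for name, arr in [("a + b", c), ("a.copy()", c_default), ("a.copy(order='K')", c_keep)]:
    print(name, "C:", arr.flags['C_CONTIGUOUS'], "F:", arr.flags['F_CONTIGUOUS'])
```

For a 2x3 array the two flags are mutually exclusive, so each line shows which layout the result ended up in.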
>> I like this change quite a bit, but it has similar potential "expectation"
>> issues. I think the default should be changed to 'K' in NumPy 2.0, but
>> perhaps we should preserve C-order for now to avoid the subtle breakages
>> that might occur based on changed expectations. What are others' thoughts?
> I suspect defaulting to 'C' might be desirable, but I initially set it to
> 'K' to see how it would work out. Defaulting it to 'C' unfortunately kills
> most of the performance benefits of the new code, so it might be nice to
> leave it as 'K' if no issues arise that are traced back to here.
I suppose this might cause a problem with lazy/quick C extensions that
expected elements in a certain order, so some breakage could occur. The
strict rule for backward compatibility would be no breakage, and if there
were no performance gain I would opt for that. But in this case there is a
real gain in breaking compatibility in a small way that is unlikely to be
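The kind of breakage described here comes from code that walks the raw buffer assuming C order. This sketch uses `np.ravel(..., order='K')`, which returns elements in memory order, as a stand-in for what such an extension would see, and shows the usual defensive fix of calling `np.ascontiguousarray` before handing the array over. `first_row_naive` is a hypothetical stand-in for the extension's logic, not a real API.

```python
import numpy as np

def first_row_naive(arr):
    # Hypothetical "quick" extension logic: it reads elements in memory order
    # and assumes the first ncols of them form row 0 (i.e. C order).
    _, ncols = arr.shape
    return np.ravel(arr, order='K')[:ncols]

c_arr = np.arange(6).reshape(2, 3)   # C order: [[0 1 2], [3 4 5]]
f_arr = np.asfortranarray(c_arr)     # same values, Fortran memory layout

print(first_row_naive(c_arr))        # correct: 0 1 2
print(first_row_naive(f_arr))        # wrong: 0 3 1, memory is column-major

# Defensive fix: guarantee C-contiguity before the C-order assumption is made.
safe = np.ascontiguousarray(f_arr)
print(first_row_naive(safe))         # correct again: 0 1 2
```

The copy made by `np.ascontiguousarray` costs one pass over the data, but only when the input is not already C-contiguous.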