[Numpy-discussion] Created NumPy 1.7.x branch
Charles R Harris
Mon Jun 25 11:20:23 CDT 2012
On Sun, Jun 24, 2012 at 10:09 PM, Travis Oliphant <email@example.com> wrote:
> What has been done in the past is that an intent to fork is announced some
> two weeks in advance so that people can weigh in on what needs to be done
> before the fork. The immediate fork was a bit hasty. Likewise, when I
> suggested going to the github issue tracking, I opened a discussion on
> needed tags, but voila, there it was with an incomplete set and no
> discussion. That too seemed hasty.
> My style is just different. I like to do things and then if discussion
> requires an alteration, we alter. It is just a different style. The
> labels can be altered, they are not set in stone. I prefer to have
> something to talk about and a starting point to alter from --- especially
> on potential bike-shedding discussions. There are several people that can
> make changes to the labels. If we have difficulty agreeing then we can
> go from that point.
>>> There is time before the first Release candidate to make changes on the
>>> 1.7.x branch. If you want to make the changes on master, and just
>>> indicate the Pull requests, Ondrej can make sure they are added to the
>>> 1.7.x. branch by Monday. We can also delay the first Release Candidate
>>> by a few days to next Wednesday and then bump everything 3 days if that
>>> will help. There will be a follow-on 1.8 release before the end of the
>>> year --- so there is time to make changes for that release as well. The
>>> next release will not take a year to get out, so we shouldn't feel
>>> pressured to get *everything* in this release.
>> What are we going to do for 1.8?
>> Let's get 1.7 out the door first.
> Mark proposed a schedule for the next several releases, I'd like to know
> if we are going to follow it.
> We should discuss it again. I don't recall the specifics and I believe it
> was just a proposal. I do not recall much feedback on it.
>> Yes, the functions will give warnings otherwise.
>> I think this needs to be revisited. I don't think these changes are
>> necessary for *every* use of macros. It can cause a lot of effort for
>> people downstream without concrete benefit.
> The idea is to slowly move towards hiding the innards of the array type.
> This has been under discussion since 1.3 came out. It is certainly the case
> that not all macros need to go away.
> I know it's been under discussion, but it looks like a lot of changes were
> made just last year (and I am just starting to understand the implications
> of all those changes). I think there are many NumPy users that will be
> in the same position over the coming years. This is a bit more than
> just hiding the array innards. The array innards have been "hidden" by
> using the macros since NumPy 1.0. There was a specific intent to create
> macros for all array access and encourage use of those macros --- precisely
> so that the array object could change. The requirement of ABI
> compatibility was not pre-envisioned in NumPy 1.0
> Neither was NumPy 1.0 trying to provide type-safety in all cases. I
> don't recall a discussion on the value of having macros that can be
> interpreted at least as both PyObject * and PyArrayObject *. Perhaps
> this is possible, and I just need to be educated. But, my opinion is that
> it's not always useful to require type-casting especially between those
>>> That's not as nice to type.
>> So? The point is to have correctness, not ease of typing.
>> I'm not sure if a pun was intended there or not. C is not a safe and
>> fully-typed system. That is one of its weaknesses according to many.
>> But, I would submit that not being forced to give everything a "type" (and
>> recognizing the tradeoffs that implies) is also one reason it gets used.
> C was famous for bugs due to the lack of function prototypes. This was
> fixed with C99 and the stricter typing was a great help.
> Bugs are not "due to lack of function prototypes". Bugs are due to
> mistakes that programmers make (and I know all about mistakes programmers
> make). Function prototypes can help detect some kinds of mistakes which is
> helpful. But, this doesn't help the question of how to transition a
> weakly-typed program or whether or not that is even a useful exercise.
Oh, come on. Writing correct C code used to be a guru exercise. A friend of
mine, a Putnam fellow, was the Weitek guru for drivers. To say bugs are
programmer mistakes is information free, the question is how to minimize
>>> Is that assuming that PyArray_NDIM will become a function and need a
>>> specific object type for its argument (and everything else cast....).
>>> That's one clear disadvantage of inline functions versus macros in my mind:
>>> no automatic polymorphism.
>> That's a disadvantage of Python. The virtue of inline functions is
>> precisely type checking.
>> Right, but we need to be more conscientious about this. Not every use
>> of Macros should be replaced by inline function calls and the requisite
>> *forced* type-checking. Type-checking is not *universally* a virtue ---
>> if it were, nobody would use Python.
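
To make the trade-off in this exchange concrete, here is a minimal sketch in C. The `Object` and `ArrayObject` structs and both accessors are hypothetical stand-ins invented for illustration; they are not NumPy's actual `PyObject`, `PyArrayObject`, or `PyArray_NDIM` definitions. The macro casts its argument and so accepts either pointer type, while the inline function only accepts the array type without a cast.

```c
#include <assert.h>

/* Hypothetical stand-ins for a generic object and an array object. */
typedef struct { int refcount; } Object;
typedef struct { Object base; int nd; } ArrayObject;

/* Macro style: casts whatever pointer it is given, so callers may pass
   either an Object * or an ArrayObject * ("automatic polymorphism"). */
#define ARRAY_NDIM_MACRO(obj) (((ArrayObject *)(obj))->nd)

/* Inline-function style: the compiler checks the argument type, so a
   caller holding an Object * has to add an explicit cast. */
static inline int array_ndim_inline(const ArrayObject *arr)
{
    return arr->nd;
}

/* Reads the dimension count through both access paths. */
static int ndim_both_ways(ArrayObject *arr)
{
    Object *as_object = (Object *)arr;   /* generic handle to the same array */
    int via_macro = ARRAY_NDIM_MACRO(as_object);   /* no cast needed */
    int via_inline = array_ndim_inline((ArrayObject *)as_object);
    assert(via_macro == via_inline);
    return via_inline;
}
```

Passing `as_object` straight to `array_ndim_inline` would draw an incompatible-pointer diagnostic from the compiler. Whether that diagnostic is a benefit or a burden for existing extension code is the point under dispute here.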
>>> I don't think type safety is a big win for macros like these. We
>>> need to be more judicious about which macros are scheduled for function
>>> inlining. Some just don't benefit from the type-safety implications as
>>> much as others do, and you end up requiring everyone to change their code
>>> downstream for no real reason.
>>> These sorts of changes really feel to me like unnecessary spelling
>>> changes that require work from extension writers who now have to modify
>>> their code with no real gain. There seems to be a lot of that going on in
>>> the code base and I'm not really convinced that it's useful for end-users.
>> Good style and type checking are useful. Numpy needs more of both.
>> You can assert it, but it doesn't make it so. "Good style" depends on
>> what you are trying to accomplish and on your point of view. NumPy's style
>> is not the product of one person, it's been adapted from multiple styles
>> and inherits quite a bit from Python's style. I don't make any claims for
>> it other than it allowed me to write it with the time and experience I had
>> 7 years ago. We obviously disagree about this point. I'm sorry about
>> that. I'm pretty flexible usually --- that's probably one of your big
>> criticisms of my "style".
> Curiously, my criticism would be more that you are inflexible, slow to
> change old habits.
> I don't mind changing old habits at all. In fact, I don't think you know
> me very well if that's your take. You have a very narrow window into my
> activity and behavior. Of course habits are always hard to change
> (that's why they call them habits). Mostly, I need to be convinced of
> the value of changing old patterns --- just like everyone else (including
> existing NumPy users). On the type-question, I'm just not convinced that
> the most pressing matter in NumPy and SciPy is to re-write existing code to
> be more strictly typed. I'm quite open to other view points on that
> --- as long as backward compatibility is preserved, or a clear upgrade
> story is provided to existing users.
>> But, one of the things I feel quite strongly about is how hard we make it
>> for NumPy users to upgrade. There are two specific things I disagree
>> with pretty strongly:
>> 1) Changing defined macros that should work the same on PyArrayObjects or
>> PyObjects to now *require* types --- if we want to introduce new macros
>> that require types then we can --- as long as it just provides warnings but
>> still compiles then I suppose I could find this acceptable.
>> 2) Changing MACROS to require semicolons when they were previously not
>> needed. I'm going to be very hard-nosed about this one.
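
The semicolon issue in point 2 can be sketched as follows. The macro names are hypothetical and are not taken from the NumPy headers; the sketch only shows how the same call site can be legal under one definition style and a syntax error under another.

```c
#include <assert.h>

/* Hypothetical macros for illustration only. */

/* Old style: the trailing semicolon is part of the expansion, so callers
   may write the macro with or without their own semicolon. */
#define COUNT_CALL_OLD(counter) (counter)++;

/* New style: do/while(0) expands to a single statement that is only
   complete once the caller supplies the semicolon. */
#define COUNT_CALL_NEW(counter) do { (counter)++; } while (0)

static int count_three_calls(void)
{
    int calls = 0;
    COUNT_CALL_OLD(calls)     /* compiles only with the old definition */
    COUNT_CALL_OLD(calls);    /* also fine: the extra ';' is an empty statement */
    COUNT_CALL_NEW(calls);    /* here the semicolon is mandatory */
    assert(calls == 3);
    return calls;
}
```

If `COUNT_CALL_OLD` were redefined in the new style, the first call site would stop compiling even though nothing in the calling code changed. Keeping the old macro and adding a new one alongside it leaves existing call sites untouched.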
>>> I'm going to be a lot more resistant to that sort of change in the code
>>> base when I see it.
>> Numpy is a team effort. There are people out there who write better code
>> than you do, you should learn from them.
>> Exactly! It's a team effort. I'm part of that team as well, and while
>> I don't always have strong opinions about things, when I do, I'm going to
>> voice them.
>> I've learned long ago there are people that write better code than me.
>> There are people that write better code than you.
> Of course. Writing code is not my profession, and even if it were, there
> are people out there who would be immeasurably better. I have tried to
> improve my style over the years by reading books and browsing code by
> people who are better than me. I also recognize common bad habits naive
> coders tend to pick up when they start out, not least because I have at one
> time or another had many of the same bad habits.
> We are really quite alike here. I have done and continue to do exactly
> the same thing. My priorities are just different. I don't believe it is
> universally useful to alter patterns in existing code. I have typically
> adapted my style to the code I'm working with. Numeric had a style which
> I adapted to. Python has a style which I adapted to. I think that
> people reading code and seeing multiple styles will find the code harder to
> read. Such changes of style take work, and quite often the transition
> is not worth the effort. I have not been, nor will I be, in
> opposition to changes that improve things (under any developer's notion of
> "improvement"). The big exception to that is when it seems to me that the
> changes will make it more difficult for existing users to use their code.
> I know you are trying to make it easier for NumPy developers as you
> understand it. I really admire you for doing what you feel strongly about.
> I think we are both in our way trying to encourage more NumPy developers
> (you by making the code easier to get in to) and me by trying to figure out
> acceptable ways to fund them.
> I just think that we must recognize the users out there who have written
> to the existing NumPy interface. Any change that requires effort from
> users should be met with skepticism. We can write new interfaces and
> encourage new users to use those new interfaces. We can even re-write
> NumPy internals to use those interfaces. But, we can't just change
> documented interfaces (and even be careful about undocumented but implied
> interfaces -- I agree that this gets difficult to really adhere to, but we
> can and should try at least for heavily used code-paths). One thing I'm
> deeply aware of is the limited audience of this list compared to the user
> base of NumPy and the inertia of old NumPy releases. Discussions on
> this list are just not visible to the wider user base. My recent activity
> and interest is in protecting that user-base from the difficulties that
> recent changes are going to impose on people upgrading from 1.5.
> My failing last year was to encourage and pay for (through Enthought)
> Mark's full-time activity on this list but not have the time to provide
> enough guidance to him about my understanding of the implications of his
> changes and think hard enough about those to understand them in the time.
I thought Mark's activities actually declined once he entered the Enthought
black hole. To be more specific, Mark did things that interested Enthought.
I'd like to know what Mark himself would have liked to do. When an original
thinker with impressive skills comes along it is worth letting them have a
fair amount of freedom to move things, it's the only way to avoid
> That is not the question here at all. The question here is not
>> requiring a *re-write* of code in order to get their extensions to compile
>> using NumPy headers. We should not be making people change their code to
>> get their extensions to compile in NumPy 1.X
> I think a bit of rewrite here and there along the way is more palatable
> than a big change coming in as one big lump, especially if the changes are
> done with a long term goal in mind. We are working towards a Numpy 2, but
> we can't just go off for a year or two and write it, we have to get there
> step by step. And that requires a plan.
> We see things a little differently on that front, I think. A bit of
> re-write here and there for down-stream users is exactly the wrong approach
> in my view. I think it depends on the user. For one who is tracking
> every NumPy release and has time to make any and all changes needed, I
> think you are right --- that approach will work for them. However, there
> are people out there who are using NumPy in ways (either significantly or
> only indirectly) where having to change *any* code from one release to
> another will make them seriously annoyed and we will start losing users.
Remember the lessons of 2.0, and of Python 3.0 for that matter.
>>> One particularly glaring example to my lens on the world: I think it
>>> would have been better to define new macros which require semicolons than
>>> changing the macros that don't require semicolons to now require
>>> That feels like a gratuitous style change that will force users of those
>>> macros to re-write their code.
>> It doesn't seem to be much of a problem.
>> Unfortunately, I don't trust your judgment on that. My experience and
>> understanding tells a much different story. I'm sorry if you disagree
>> with me.
> I'm sorry I made you sorry ;) The problem here is that you don't come
> forth with specifics. People tell you things, but you don't say who or what
> their specific problem was. Part of working with a team is keeping folks
> informed, it isn't that useful to appeal to authority. I watch the list,
> which is admittedly a small window into the community, and I haven't seen
> show stoppers. Bugs, sure, but that isn't the same thing.
> I came up with a very specific thing. I'm not sure what you are talking
> about. If you are talking about discussions with people off list, then
> I can't speak for them unless they have allowed me to. I encourage them to
> speak up here as often as they can. Yes, you will have to trust that a
> little bit of concern might just be an iceberg waiting to sink the ship.
>>> Sure, it's a simple change, but it's a simple change that doesn't do
>>> anything for you as an end user. I think I'm going to back this change
>>> out, in fact. I can't see requiring people to change their C-code like
>>> this will require without a clear benefit to them. I'm quite sure there
>>> is code out there that uses these documented APIs (without the semicolon).
>>> If we want to define new macros that require semicolons, then we do that, but
>>> we can't get rid of the old ones --- especially in a 1.x release.
>>> Our policy should not be to allow gratuitous style changes just because
>>> we think something is prettier another way. The NumPy code base has come
>>> from multiple sources and reflects several styles. It also follows an
>>> older style of C-programming (that is quite common in the Python code
>>> base). It can be changed, but those changes shouldn't be painful for a
>>> library user without some specific gain for them that the change allows.
>> You use that word 'gratuitous' a lot, I don't think it means what you
>> think it means. For instance, the new polynomial coefficient order wasn't
>> gratuitous, it was doing things in a way many found more intuitive and
>> generalized better to different polynomial bases. People
>> have different ideas, that doesn't make them gratuitous.
>> That's a slightly different issue. At least you created a new object
>> and api which is a *little* better. My complaint about the choice there
>> is now there *must* be two interfaces and added confusion as people will
>> have to figure out which assumption is being used. I don't really care
>> about the coefficient order --- really I don't. Either one is fine in my
>> mind. I recognize the reasons. The problem is *changing* it without a
>> *really* good reason. Now, we have to have two different APIs. I would
>> have much preferred to have poly1d disappear and just use your much nicer
>> polynomial classes. Now, it can't and we are faced with a user-story
>> that is either difficult for someone transitioning from MATLAB
> Most folks aren't going to transition from MATLAB or IDL. Engineers tend
> to stick with the tools they learned in school, they aren't interested in
> the tool itself as long as they can get their job done. And getting the job
> done is what they are paid for. That said, I doubt they would have much
> problem making the adjustment if they were inclined to switch tools.
> I don't share your pessimism. You really think that "most folks aren't
> going to transition". It's happening now. It's been happening for
> several years.
I still haven't seen it. Once upon a time code for optical design was a new
thing and many folks wrote their own, myself for one. These days they reach
for Code V or Zemax. When they make the schematics they use something like
Solidworks. When it comes time for thermal analysis they run the Solidworks
design into another commercial program. When it comes time to manufacture
the parts another package takes the Solidworks data and produces nc
instructions to drive the tools. The thing is, there is a whole ecosystem
built around a few standard design tools. Similar considerations hold in
civil engineering, architecture, and many other areas.
Another example would be Linux on the desktop. That never really took off,
Microsoft is still the dominant presence there. Where Linux succeeded was
in embedded devices and smart phones, markets that hadn't yet developed a
large ecosystem and where pennies count.
Now to Matlab, suppose you want to analyse thermal effects on an orbiting
satellite. Do you sit down and start writing new code in Python or do you
buy a package for Matlab that deals with orbital calculations and knows all
about shading and illumination? Suppose further that you have a few weeks
to pull it off and have used the Matlab tools in the past. Matlab wins in
this situation, Python isn't even a consideration.
There are certainly places for Python out there. HPC is one, because last I
looked Matlab licenses were still based around the number of cpu cores, so
there are significant cost savings. Research that needs innovative software
is another area where Python has an advantage. First, because in research
it is expected that time will be spent exploring new things, and second
because it is easier to write Python than Matlab scripts and there are more
tools available at no cost. On the other hand, if you need sophisticated
mathematics, Mathematica is the easy way to go.
Engineering is a big area, and only a small part of it offers opportunity
for Python to make inroads.
> or a "why did you do that?" puzzled look from a new user as to why we
>> support both coefficient orders. Of course, that could be our story ---
>> hey we support all kinds of orders, it doesn't really matter, you just have
>> to tell us what you mean when passing in an unadorned array of
>> coefficients. But, this is a different issue.
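
As an illustration of why an unadorned array of coefficients is ambiguous: `poly1d` reads coefficients from the highest degree down, while the newer `numpy.polynomial` classes read them from the lowest degree up. The two C functions below are hypothetical helpers written for this sketch, not part of any NumPy API; each evaluates the same array under one of the two conventions using Horner's rule.

```c
#include <assert.h>
#include <stddef.h>

/* Coefficients ordered highest degree first (the poly1d convention):
   c[0]*x^(n-1) + ... + c[n-1]. */
static double eval_high_first(const double *c, size_t n, double x)
{
    assert(n > 0);
    double acc = c[0];
    for (size_t i = 1; i < n; i++) {
        acc = acc * x + c[i];
    }
    return acc;
}

/* Coefficients ordered lowest degree first (the numpy.polynomial
   convention): c[0] + c[1]*x + ... + c[n-1]*x^(n-1). */
static double eval_low_first(const double *c, size_t n, double x)
{
    assert(n > 0);
    double acc = c[n - 1];
    for (size_t i = n - 1; i-- > 0; ) {
        acc = acc * x + c[i];
    }
    return acc;
}
```

With the array `{1, 2, 3}` and `x = 2`, the first convention means x^2 + 2x + 3 and the second means 1 + 2x + 3x^2, so the same data yields two different polynomials unless the caller states which order is intended.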
>> I'm using the word 'gratuitous' to mean that it is "uncalled for and
>> lacks a good reason". There needs to be much better reasons given for
>> code changes that require someone to re-write working code than "it's
>> better style" or even "it will help new programmers avoid errors". Let's
>> write another interface that new programmers can use that fits the world
>> the way you see it, don't change what's already working just because you
>> don't like it or wish a different choice had been made.
> Well, and that was exactly what you meant when you called the coefficient
> order 'gratuitous' in your first post to me about it. The problem was that
> you didn't understand why I made the change until I explained it, but
> rather made the charge sans explanation. It might be that some of the other
> things you call gratuitous are less so than you think. These are hasty
> judgements I think.
> I'm sure we all have our share of hasty judgments to go around. Even
> after your explanation, I still disagree with it. But, I appreciate the
> reminder to give you the benefit of the doubt when I encounter something
> that makes me raise my eyebrows. I hope you will do the same.
>>> There are significant users of NumPy out there still on 1.4. Even the
>>> policy of deprecation that has been discussed will not help people trying
>>> to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple
>>> times. The easier we can make this process for users the better. I
>>> remain convinced that it's better and am much more comfortable with making
>>> a release that requires a re-compile (that will succeed without further
>>> code changes --- because of backward compatibility efforts) than to have
>>> supposed ABI compatibility with subtle semantic changes and required C-code
>>> changes when you do happen to re-compile.
>> Cleanups need to be made bit by bit. I don't think we have done anything
>> that will cause undue trouble.
>> I disagree substantially on the impact of these changes. You can
>> disagree about my awareness of NumPy users, but I think I understand a
>> large number of them and why NumPy has been successful in getting users.
>> I agree that we have been unsuccessful at getting serious developers and
>> I'm convinced by you and Mark as to why that is. But, we can't sacrifice
>> users for the sake of getting developers who will spend their free time
>> trying to get around the organic pile that NumPy is at this point.
>> Because of this viewpoint, I think there is some adaptation and cleanup
>> right now, needed, so that significant users of NumPy can upgrade based on
>> the changes that have occurred without causing them annoying errors (even
>> simple changes can be a pain in the neck to fix).
>> I do agree changes can be made. I realize you've worked hard to keep
>> the code-base in a state that you find more adequate. I think you go
>> overboard on that front, but I acknowledge that there are people that
>> appreciate this. I do feel very strongly that we should not require
>> users to have to re-write working C-code in order to use a new minor
>> version number in NumPy, regardless of how the code "looks" or how much
>> "better" it is according to some idealized standard.
>> The macro changes are border-line (at least I believe code will still
>> compile --- just raise warnings, but I need to be sure about this). The
>> changes that require semi-colons are not acceptable at all.
> I was tempted to back them out myself, but I don't think the upshot will
> be earth shaking.
> I think it's important that code using NumPy headers that compiled with
> 1.5 will compile with 1.7.
>> Look Charles, I believe we can continue to work productively together and
>> our differences can be a strength to the community. I hope you feel the
>> same way. I will continue to respect and listen to your perspective ---
>> especially when I disagree with it.
> Sounds like a threat to me. Who are you to judge? If you are going to be
> the dictator, let's put that out there and make it official.
> Wow, Charles! I think you should re-read what I wrote. It was not a
> threat at all. It was an appeal to work more closely together, and a
> commitment on my end to listen to your point of view and try to sift from
> any of my own opposition the chaff from the wheat.
> I am just not thinking in those terms at all. I do not think it is
> appropriate to talk about a dictator in this context. I have no control
> over what you do, and you have no control over what I do. We can only
> work cooperatively or independently for the benefit of NumPy.
> Perhaps there are things I've said and done that really bother you, or
> have offended you. I'm sorry for anything I've said that might have grated
> on you personally. I do appreciate your voice, ability, perspective, and
> skill. I suspect there are others in the NumPy community that feel the
> same way.