[Numpy-discussion] Created NumPy 1.7.x branch

Travis Oliphant travis@continuum...
Sun Jun 24 23:09:17 CDT 2012


> 
> What has been done in the past is that an intent to fork is announced some two weeks in advance so that people can weigh in on what needs to be done before the fork. The immediate fork was a bit hasty. Likewise, when I suggested going to the github issue tracking, I opened a discussion on needed tags, but voila, there it was with an incomplete set and no discussion. That too seemed hasty. 

My style is just different.   I like to do things and then if discussion requires an alteration, we alter.    It is just a different style.    The labels can be altered, they are not set in stone.   I prefer to have something to talk about and a starting point to alter from --- especially on potential bike-shedding discussions.   There are several people that can make changes to the labels.   If we have difficulty agreeing then    we can go from that point.  

>>  
>> There is time before the first Release Candidate to make changes on the 1.7.x branch.   If you want to make the changes on master, and just indicate the Pull requests, Ondrej can make sure they are added to the 1.7.x branch by Monday.    We can also delay the first Release Candidate by a few days to next Wednesday and then bump everything 3 days if that will help.     There will be a follow-on 1.8 release before the end of the year --- so there is time to make changes for that release as well.    The next release will not take a year to get out, so we shouldn't feel pressured to get *everything* in this release.
>> 
>> What are we going to do for 1.8?
> 
> Let's get 1.7 out the door first. 
> 
> Mark proposed a schedule for the next several releases, I'd like to know if we are going to follow it.

We should discuss it again.  I don't recall the specifics and I believe it was just a proposal.    I do not recall much feedback on it. 
 
> 
>> 
>> Yes, the functions will give warnings otherwise.
> 
> I think this needs to be revisited.  I don't think these changes are necessary for *every* use of macros.   It can cause a lot of effort for people downstream without concrete benefit. 
> 
> The idea is to slowly move towards hiding the innards of the array type. This has been under discussion since 1.3 came out. It is certainly the case that not all macros need to go away.

I know it's been under discussion, but it looks like a lot of changes were made just last year (and I am just starting to understand the implications of all those changes).    I think there are many NumPy users that will be in the same position over the coming years.     This is a bit more than just hiding the array innards.   The array innards have been "hidden" by using the macros since NumPy 1.0.   There was a specific intent to create macros for all array access and encourage use of those macros --- precisely so that the array object could change.      The requirement of ABI compatibility was not pre-envisioned in NumPy 1.0

Neither was NumPy 1.0 trying to provide type-safety in all cases.   I don't recall a discussion on the value of having macros that can be interpreted at least as both PyObject * and PyArrayObject *.     Perhaps this is possible, and I just need to be educated.  But, my opinion is that it's not always useful to require type-casting especially between those two.   
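
A minimal sketch of the trade-off being discussed, assuming hypothetical stand-in types (DemoObject, DemoArray) and hypothetical accessor names rather than the real PyObject, PyArrayObject, or PyArray_NDIM definitions. The macro carries its own cast and so accepts either pointer type; the inline function is checked by the compiler and so pushes the cast onto the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PyObject and PyArrayObject. These are NOT the
   real CPython/NumPy definitions, only enough structure to show the idea. */
typedef struct {
    long refcount;
} DemoObject;

typedef struct {
    DemoObject base;   /* first member, so a DemoArray* can be viewed as a DemoObject* */
    int nd;            /* number of dimensions */
} DemoArray;

/* Macro style: the cast lives inside the macro, so callers may pass either
   pointer type without writing a cast themselves. */
#define DEMO_NDIM(obj) (((DemoArray *)(obj))->nd)

/* Inline-function style: the parameter type is checked by the compiler, so a
   caller holding a DemoObject* must cast explicitly. */
static inline int demo_ndim(const DemoArray *arr)
{
    assert(arr != NULL);
    return arr->nd;
}

/* Reads the dimension count three ways and returns it if all three agree,
   or -1 otherwise. */
static int demo_ndim_both_ways(DemoArray *arr)
{
    DemoObject *as_object = (DemoObject *)arr;   /* generic handle */

    int via_macro_generic = DEMO_NDIM(as_object);               /* no cast at the call site */
    int via_macro_typed   = DEMO_NDIM(arr);                     /* also accepted */
    int via_function      = demo_ndim((DemoArray *)as_object);  /* cast required */

    return (via_macro_generic == via_macro_typed &&
            via_macro_typed == via_function) ? via_function : -1;
}
```

Passing as_object to demo_ndim without the cast would draw an incompatible-pointer diagnostic from the compiler. That is the type checking one side of this thread wants, and the extra downstream edit the other side is worried about.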
 
> 
>>  
>>   That's not as nice to type.
>> 
>> So? The point is to have correctness, not ease of typing.
> 
> I'm not sure if a pun was intended there or not.    C is not a safe and fully-typed system.    That is one of its weaknesses according to many.  But, I would submit that not being forced to give everything a "type" (and recognizing the tradeoffs that implies) is also one reason it gets used.
> 
> C was famous for bugs due to the lack of function prototypes. This was fixed with C99 and the stricter typing was a great help.

Bugs are not "due to lack of function prototypes".  Bugs are due to mistakes that programmers make (and I know all about mistakes programmers make).  Function prototypes can help detect some kinds of mistakes which is helpful.   But, this doesn't help the question of how to transition a weakly-typed program or whether or not that is even a useful exercise.
 
> 
> 
>>  
>>  Is that assuming that PyArray_NDIM will become a function and need a specific object type for its argument (and everything else cast....).   That's one clear disadvantage of inline functions versus macros in my mind:  no automatic polymorphism.
>> 
>> That's a disadvantage of Python. The virtue of inline functions is precisely type checking.
> 
> Right, but we need to be more conscientious about this.   Not every use of macros should be replaced by inline function calls and the requisite *forced* type-checking.   Type-checking is not *universally* a virtue --- if it were, nobody would use Python. 
> 
>>  
>> I don't think type safety is a big win for macros like these.     We need to be more judicious about which macros are scheduled for function inlining.  Some just don't benefit from the type-safety implications as much as others do, and you end up requiring everyone to change their code downstream for no real reason.  
>> 
>> These sorts of changes really feel to me like unnecessary spelling changes that require work from extension writers who now have to modify their code with no real gain.   There seems to be a lot of that going on in the code base and I'm not really convinced that it's useful for end-users.
>> 
>> Good style and type checking are useful. Numpy needs more of both.
> 
> You can assert it, but it doesn't make it so.   "Good style" depends on what you are trying to accomplish and on your point of view.  NumPy's style is not the product of one person, it's been adapted from multiple styles and inherits quite a bit from Python's style.   I don't make any claims for it other than it allowed me to write it with the time and experience I had 7 years ago.    We obviously disagree about this point.  I'm sorry about that.  I'm pretty flexible usually --- that's probably one of your big criticisms of my "style". 
> 
> Curiously, my criticism would be more that you are inflexible, slow to change old habits.

I don't mind changing old habits at all.  In fact, I don't think you know me very well if that's your take.   You have a very narrow window into my activity and behavior.     Of course habits are always hard to change (that's why they call them habits).    Mostly, I need to be convinced of the value of changing old patterns --- just like everyone else (including existing NumPy users).    On the type-question, I'm just not convinced that the most pressing matter in NumPy and SciPy is to re-write existing code to be more strictly typed.      I'm quite open to other view points on that --- as long as backward compatibility is preserved, or a clear upgrade story is provided to existing users. 


>  
> 
> But, one of the things I feel quite strongly about is how hard we make it for NumPy users to upgrade.    There are two specific things I disagree with pretty strongly: 
> 
> 	1) Changing defined macros that should work the same on PyArrayObjects or PyObjects to now *require* types --- if we want to introduce new macros that require types then we can --- as long as it just provides warnings but still compiles then I suppose I could find this acceptable.
> 
> 	2) Changing MACROS to require semicolons when they were previously not needed.    I'm going to be very hard-nosed about this one. 
> 
>>   
>> I'm going to be a lot more resistant to that sort of change in the code base when I see it.
>> 
>> Numpy is a team effort. There are people out there who write better code than you do, you should learn from them.
> 
> Exactly!  It's a team effort.   I'm part of that team as well, and while I don't always have strong opinions about things, when I do, I'm going to voice them.    
> 
> I've learned long ago there are people that write better code than me.    There are people that write better code than you.
> 
> Of course. Writing code is not my profession, and even if it were, there are people out there who would be immeasurably better. I have tried to improve my style over the years by reading books and browsing code by people who are better than me. I also recognize common bad habits naive coders tend to pick up when they start out, not least because I have at one time or another had many of the same bad habits.

We are really quite alike here.   I have done and continue to do exactly the same thing.   My priorities are just different.   I don't believe it is universally useful to alter patterns in existing code.  I have typically adapted my style to the code I'm working with.   Numeric had a style which I adapted to.   Python has a style which I adapted to.   I think that people reading code and seeing multiple styles will find the code harder to read.      Such changes of style take work, and quite often the transition is not worth the effort.     I have not been, nor will I be, in opposition to changes that improve things (under any developer's notion of "improvement").   The big exception to that is when it seems to me that the changes will make it more difficult for existing users to use their code.

I know you are trying to make it easier for NumPy developers as you understand it.  I really admire you for doing what you feel strongly about.   I think we are both in our way trying to encourage more NumPy developers (you by making the code easier to get into, and me by trying to figure out acceptable ways to fund them).   

I just think that we must recognize the users out there who have written to the existing NumPy interface.  Any change that requires effort from users should be met with skepticism.   We can write new interfaces and encourage new users to use those new interfaces.  We can even re-write NumPy internals to use those interfaces.  But, we can't just change documented interfaces (and we should even be careful about undocumented but implied interfaces -- I agree that this gets difficult to really adhere to, but we can and should try at least for heavily used code-paths).     One thing I'm deeply aware of is the limited audience of this list compared to the user base of NumPy and the inertia of old NumPy releases.    Discussions on this list are just not visible to the wider user base.   My recent activity and interest is in protecting that user-base from the difficulties that recent changes are going to cause for people upgrading from 1.5.  

My failing last year was to encourage and pay for (through Enthought) Mark's full-time activity on this list without having the time to give him enough guidance about my understanding of the implications of his changes, or to think hard enough about those changes to understand them at the time. 

> 
> That is not the question here at all.     The question here is not requiring a *re-write* of code in order to get their extensions to compile using NumPy headers.    We should not be making people change their code to get their extensions to compile in NumPy 1.X
> 
> I think a bit of rewrite here and there along the way is more palatable than a big change coming in as one big lump, especially if the changes are done with a long term goal in mind. We are working towards a Numpy 2, but we can't just go off for a year or two and write it, we have to get there step by step. And that requires a plan.

We see things a little differently on that front, I think.    A bit of re-write here and there for down-stream users is exactly the wrong approach in my view.    I think it depends on the user.    For one who is tracking every NumPy release and has time to make any and all changes needed, I think you are right --- that approach will work for them.    However, there are people out there who are using NumPy in ways (either significantly or only indirectly) where having to change *any* code from one release to another will make them seriously annoyed and we will start losing users. 
>  
> 
>>  
>> 
>> One particularly glaring example to my lens on the world:   I think it would have been better to define new macros which require semicolons than changing the macros that don't require semicolons to now require semicolons:  
>> 
>>     NPY_BEGIN_THREADS_DEF
>>     NPY_BEGIN_THREADS
>>     NPY_ALLOW_C_API
>>     NPY_ALLOW_C_API_DEF
>>     NPY_DISABLE_C_API
>> 
>> That feels like a gratuitous style change that will force users of those macros to re-write their code.
>> 
>> It doesn't seem to be much of a problem.
> 
> Unfortunately, I don't trust your judgment on that.   My experience and understanding tells a much different story.   I'm sorry if you disagree with me. 
> 
> 
> I'm sorry I made you sorry ;) The problem here is that you don't come forth with specifics. People tell you things, but you don't say who or what their specific problem was. Part of working with a team is keeping folks informed, it isn't that useful to appeal to authority. I watch the list, which is admittedly a small window into the community, and I haven't seen show stoppers. Bugs, sure, but that isn't the same thing. 

I came up with a very specific thing.  I'm not sure what you are talking about.     If you are talking about discussions with people off list, then I can't speak for them unless they have allowed me to.  I encourage them to speak up here as often as they can.  Yes, you will have to trust that a little bit of concern might just be an iceberg waiting to sink the ship.  

>>  
>> Sure, it's a simple change, but it's a simple change that doesn't do anything for you as an end user.   I think I'm going to back this change out, in fact.   I can't see requiring people to change their C-code like this without a clear benefit to them.    I'm quite sure there is code out there that uses these documented APIs (without the semicolon).   If we want to define new macros that require semicolons, then we do that, but we can't get rid of the old ones --- especially in a 1.x release. 
>> 
>> Our policy should not be to allow gratuitous style changes just because we think something is prettier another way.   The NumPy code base has come from multiple sources and reflects several styles.   It also follows an older style of C-programming (that is quite common in the Python code base).    It can be changed, but those changes shouldn't be painful for a library user without some specific gain for them that the change allows. 
>> 
>> 
>> You use that word 'gratuitous' a lot, I don't think it means what you think it means. For instance, the new polynomial coefficient order wasn't gratuitous, it was doing things in a way many found more intuitive and generalized better to different polynomial bases. People have different ideas, that doesn't make them gratuitous.
> 
> That's a slightly different issue.    At least you created a new object and api which is a *little* better.     My complaint about the choice there is now there *must* be two interfaces and added confusion as people will have to figure out which assumption is being used.   I don't really care about the coefficient order --- really I don't.   Either one is fine in my mind.   I recognize the reasons.    The problem is *changing* it without a *really* good reason.   Now, we have to have two different APIs.   I would have much preferred to have poly1d disappear and just use your much nicer polynomial classes.    Now, it can't and we are faced with a user-story that is either difficult for someone transitioning from MATLAB
> 
> Most folks aren't going to transition from MATLAB or IDL. Engineers tend to stick with the tools they learned in school, they aren't interested in the tool itself as long as they can get their job done. And getting the job done is what they are paid for. That said, I doubt they would have much problem making the adjustment if they were inclined to switch tools.

I don't share your pessimism.  You really think that "most folks aren't going to transition".   It's happening now.  It's been happening for several years.   

> 
> or a "why did you do that?" puzzled look from a new user as to why we support both coefficient orders.  Of course, that could be our story --- hey we support all kinds of orders, it doesn't really matter, you just have to tell us what you mean when passing in an unadorned array of coefficients.   But, this is a different issue. 
> 
> I'm using the word 'gratuitous' to mean that it is "uncalled for and lacks a good reason".     There needs to be much better reasons given for code changes that require someone to re-write working code than "it's better style" or even "it will help new programmers avoid errors".   Let's write another interface that new programmers can use that fits the world the way you see it, don't change what's already working just because you don't like it or wish a different choice had been made.
> 
> Well, and that was exactly what you meant when you called the coefficient order 'gratuitous' in your first post to me about it. The problem was that you didn't understand why I made the change until I explained it, but rather made the charge sans explanation. It might be that some of the other things you call gratuitous are less so than you think. These are hasty judgements I think.

I'm sure we all have our share of hasty judgments to go around.    Even after your explanation, I still disagree with it.   But, I appreciate the reminder to give you the benefit of the doubt when I encounter something that makes me raise my eyebrows.    I hope you will do the same.  
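
For readers who have not hit the coefficient-order question: poly1d reads an unadorned coefficient array highest degree first, while the newer numpy.polynomial classes read it lowest degree first. The sketch below shows how the same array then means two different polynomials. The function names are hypothetical and this is plain C, not NumPy code.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluate a polynomial whose coefficients are stored highest degree first:
   c[0]*x^(n-1) + ... + c[n-2]*x + c[n-1]. Uses Horner's rule. */
static double demo_eval_high_first(const double *c, size_t n, double x)
{
    assert(c != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc = acc * x + c[i];
    }
    return acc;
}

/* Evaluate a polynomial whose coefficients are stored lowest degree first:
   c[0] + c[1]*x + ... + c[n-1]*x^(n-1). Same Horner loop, walked backwards. */
static double demo_eval_low_first(const double *c, size_t n, double x)
{
    assert(c != NULL);
    double acc = 0.0;
    for (size_t i = n; i > 0; i--) {
        acc = acc * x + c[i - 1];
    }
    return acc;
}
```

With the coefficients {1, 2, 3} and x = 2, the first convention evaluates x^2 + 2x + 3 and the second evaluates 1 + 2x + 3x^2, so the caller has to say which one is meant.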

>  
> 
>>  
>> There are significant users of NumPy out there still on 1.4.    Even the policy of deprecation that has been discussed will not help people trying to upgrade from 1.4 to 1.8.   They will be forced to upgrade multiple times.    The easier we can make this process for users the better.    I remain convinced that it's better and am much more comfortable with making a release that requires a re-compile (that will succeed without further code changes --- because of backward compatibility efforts) than to have supposed ABI compatibility with subtle semantic changes and required C-code changes when you do happen to re-compile.  
>> 
>> 
>> Cleanups need to be made bit by bit. I don't think we have done anything that will cause undue trouble.
> 
> I disagree substantially on the impact of these changes.  You can disagree about my awareness of NumPy users, but I think I understand a large number of them and why NumPy has been successful in getting users.    I agree that we have been unsuccessful at getting serious developers and I'm convinced by you and Mark as to why that is.    But, we can't sacrifice users for the sake of getting developers who will spend their free time trying to get around the organic pile that NumPy is at this point.  
> 
> Because of this viewpoint, I think some adaptation and cleanup is needed right now, so that significant users of NumPy can upgrade based on the changes that have occurred without causing them annoying errors (even simple changes can be a pain in the neck to fix).  
> 
> I do agree changes can be made.    I realize you've worked hard to keep the code-base in a state that you find more adequate.   I think you go overboard on that front, but I acknowledge that there are people that appreciate this.    I do feel very strongly that we should not require users to have to re-write working C-code in order to use a new minor version number in NumPy, regardless of how the code "looks" or how much "better" it is according to some idealized standard.  
> 
> The macro changes are border-line (at least I believe code will still compile --- just raise warnings, but I need to be sure about this).    The changes that require semi-colons are not acceptable at all.
> 
> I was tempted to back them out myself, but I don't think the upshot will be earth shaking.

I think it's important that code using NumPy headers that compiled with 1.5 will compile with 1.7.     
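
To make the semicolon concern concrete, here is a minimal sketch using hypothetical macros (DEMO_BEGIN_OLD, DEMO_BEGIN_NEW) and a plain counter in place of any real thread state. These are not the NumPy definitions. They only illustrate how a call site written for a macro that supplies its own semicolon differs from one written for a macro that does not.

```c
#include <assert.h>

/* Hypothetical counter standing in for "thread state"; not NumPy's API. */
static int demo_sections_entered = 0;

/* Old style: the macro supplies its own terminating semicolon, so call
   sites are written WITHOUT one. */
#define DEMO_BEGIN_OLD  demo_sections_entered++;

/* New style: the macro expands to one statement with no terminator, so
   call sites MUST add the semicolon. */
#define DEMO_BEGIN_NEW  do { demo_sections_entered++; } while (0)

/* Legacy call site: compiles only while DEMO_BEGIN_OLD keeps its built-in
   semicolon. Redefining it in the new style would turn the next two lines
   into "do { ... } while (0) return ...;", which is a syntax error. */
static int demo_legacy_call_site(void)
{
    DEMO_BEGIN_OLD
    return demo_sections_entered;
}

/* Updated call site: written against the new-style macro. */
static int demo_updated_call_site(void)
{
    DEMO_BEGIN_NEW;
    return demo_sections_entered;
}
```

Keeping the old name with its old expansion and adding the stricter form under a new name lets both call sites compile unchanged, which is the kind of compatibility asked for above.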

>  
> 
> Look Charles, I believe we can continue to work productively together and our differences can be a strength to the community.  I hope you feel the same way.  I will continue to respect and listen to your perspective --- especially when I disagree with it.
> 
> Sounds like a threat to me. Who are you to judge? If you are going to be the dictator, let's put that out there and make it official.

Wow, Charles!   I think you should re-read what I wrote.  It was not a threat at all.   It was an appeal to work more closely together, and a commitment on my end to listen to your point of view and try to sift from any of my own opposition the chaff from the wheat.     
 
I am just not thinking in those terms at all.   I do not think it is appropriate to talk about a dictator in this context.  I have no control over what you do, and you have no control over what I do.   We can only work cooperatively or independently for the benefit of NumPy. 

Perhaps there are things I've said and done that really bother you, or have offended you.  I'm sorry for anything I've said that might have grated on you personally.   I do appreciate your voice, ability, perspective, and skill.    I suspect there are others in the NumPy community that feel the same way. 

Best regards,

-Travis



