[Numpy-discussion] .T Transpose shortcut for arrays again
tim.hochberg at cox.net
Fri Jul 7 02:24:46 CDT 2006
Bill Baxter wrote:
> On 7/7/06, Robert Kern <robert.kern at gmail.com> wrote:
> > Bill Baxter wrote:
> > > I am also curious, given the number of times I've heard this
> > > argument of "there are lots of kinds of numerical computing that
> > > don't involve linear algebra", that no one ever seems to name any
> > > of the "lots of kinds". Statistics, maybe? But you can find lots
> > > of algebra in statistics.
> > That's because I'm not waving my hands at general fields of
> > application. I'm talking about how people actually use array objects
> > on a line-by-line basis. If I represent a dataset as an array and fit
> > a nonlinear function to that dataset, am I using linear algebra at
> > some level? Sure! Does having a .T attribute on that array help me at
> > all? No. Arguing about how fundamental linear algebra is to numerical
> > endeavors is entirely beside the point.
> Ok. If line-by-line usage is what everyone really means, then I'll
> get off the linear algebra soap box, but that's not what it sounded
> like to me.
> So, if you want to talk line-by-line, I really can't talk about much
> besides my own code. But I just grepped through it and out of 2445
> non-empty lines of code:
> 927 lines contain '='
> 390 lines contain a '['
> 75 lines contain matrix,asmatrix, or mat
> ==> 47 lines contain a '.T' or '.transpose' of some sort. <==
> 33 lines contain array, or asarray, or asanyarray
> 24 lines contain 'rand(' --- I use it for generating bogus test data
> a lot
> 17 lines contain 'newaxis' or 'NewAxis'
> 16 lines contain 'zeros('
> 13 lines contain 'dot('
> 12 lines contain 'empty('
> 8 lines contain 'ones('
> 7 lines contain 'inv('
In my main project there's about 26 KLOC (including blank lines), 700 or
so of which use numeric (I prefix everything with np. so it's easy to
count). Of those lines, 29 use transpose, and of those 29 lines at most 9
could use a T attribute. It's probably far fewer than that since I didn't
check the dimensionality of the arrays involved. Somewhere between 0 and
5 seems likely.
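
As a rough illustration of the kind of line count being compared here, a
minimal Python sketch follows. The function name count_matching_lines, the
sample lines, and the regular expressions are all hypothetical; neither
count above says exactly which grep patterns were used, so treat this as one
possible way to do it rather than a reconstruction.

```python
import re

def count_matching_lines(lines, patterns):
    """Count non-empty lines, and how many of them match each regex.

    `patterns` maps a label to a regular expression (hypothetical choices).
    Returns (number_of_non_empty_lines, {label: match_count}).
    """
    non_empty = [ln for ln in lines if ln.strip()]
    counts = {
        label: sum(1 for ln in non_empty if re.search(rx, ln))
        for label, rx in patterns.items()
    }
    return len(non_empty), counts

# Made-up sample standing in for a real source file.
sample = [
    "import numpy as np",
    "",
    "a = np.zeros((3, 2))",
    "b = a.T",
    "c = np.transpose(a)",
    "d = np.dot(a.T, a)",
]

# Assumed patterns: '.T' as a whole attribute name, or any 'transpose'.
patterns = {
    "transpose": r"\.T\b|transpose",
    "zeros": r"zeros\(",
    "dot": r"dot\(",
}

total, counts = count_matching_lines(sample, patterns)
print(total, counts)
```

A count like this only says that a line mentions transpose. As noted above,
it does not say whether the array involved is 2D, so it gives an upper bound
on where a T attribute could apply.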
> I'm pretty new to numpy, so that's all the code I got right now. I'm
> sure I've written many more lines of emails about numpy than I have
> lines of actual numpy code. :-/
> But from that, I can say that -- at least in my code -- transpose is
> pretty common. If someone can point me to some larger codebases
> written in numpy or numeric, I'd be happy to do a similar analysis of
> > I'm not saying that people who do use arrays for linear algebra are
> > rare or unimportant. It's that syntactical convenience for one set of
> > conventional ways to use an array object, by itself, is not a good
> > enough reason to add stuff to the core array object.
> I wish I had a way to magically find out the distribution of array
> dimensions used by all numpy and numeric code out there. My guess is
> it would be something like 1-d: 50%, 2-d: 30%, 3-d: 10%, everything
> else: 10%. I can't think of a good way to even get an estimate on
> that. But in any event, I'm positive ndims==2 is a significant
> percentage of all usages. It seems like the opponents to this idea
> are suggesting the distribution is more flat than that. But whatever
> the distribution is, it has to have a fairly light tail since memory
> usage is exponential in ndim. If ndim == 20, then it takes 8
> megabytes just to store the smallest possible non-degenerate array of
> float64s (i.e. a 2x2x2x2x...).
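
The 8 megabyte figure can be checked with a few lines of arithmetic. This is
only a sketch: min_nondegenerate_bytes is a made-up helper name, and it
assumes 8-byte float64 items and length 2 along every axis.

```python
import numpy as np

# Size in bytes of one float64 element.
ITEMSIZE = np.dtype(np.float64).itemsize

def min_nondegenerate_bytes(ndim):
    """Bytes needed for a float64 array with length 2 along each of ndim axes."""
    return (2 ** ndim) * ITEMSIZE

for ndim in (1, 2, 3, 10, 20):
    print(ndim, min_nondegenerate_bytes(ndim))

# Cross-check the formula against a real (small) array.
small = np.empty((2,) * 3, dtype=np.float64)
print(small.nbytes == min_nondegenerate_bytes(3))
```

For ndim == 20 this gives 2**20 * 8 bytes, which is 8 MiB, matching the
estimate in the quoted text.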
I would guess that it falls off fast after n=3, but that's just a guess.
Personally, the majority of my code deals in 3D arrays (2x2xN and 4x4xN
for the most part). These are arrays of vectors holding scattering data
at N different frequency or time points. The 2D arrays that I do use are
for rendering images (the actual rendering is done in C since Python
wasn't fast enough and numpy wasn't really suitable for it). So, you see
that for me at least, a T attribute is complete cruft. Useless for the
3D arrays, not needed for the 2D arrays, and again useless for the 1D
arrays. I suspect that in general, the image processing types, who use a
lot of 2D arrays, are probably not heavy users of transpose, but I'm not
certain of that.
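
To make the 1D and 3D cases concrete, here is a small sketch using the
behavior of ndarray.T in NumPy releases that provide it (it reverses the
order of all axes). The array names and the value of N are hypothetical, and
the 2x2xN layout is assumed from the description above.

```python
import numpy as np

N = 5  # hypothetical number of frequency or time points

vec = np.arange(3)                              # 1D array
stack = np.arange(2 * 2 * N).reshape(2, 2, N)   # 2x2xN stack of 2x2 blocks

# On a 1D array, .T changes nothing: there is only one axis to reverse.
print(vec.T.shape)     # (3,)

# On a 3D array, .T reverses all axes, giving Nx2x2 rather than
# transposing each 2x2 block.
print(stack.T.shape)   # (5, 2, 2)

# Transposing each 2x2 block needs an explicit axis order instead.
per_block_t = stack.transpose(1, 0, 2)
print(per_block_t.shape)                        # (2, 2, 5)
print(per_block_t[0, 1, 3] == stack[1, 0, 3])   # True
```

So for data laid out as 2x2xN, the per-block transpose still has to be
spelled out with transpose() or swapaxes(), whether or not a T attribute
exists.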
> It seems crazy to even be arguing this. Transposing is not some
> specialized esoteric operation. It's important enough that R and S
> give it a one letter function, and Matlab, Scilab, K all give it a
> single-character operator. [*] Whoever designed the numpy.matrix
> class also thought it was worthy of a shortcut, and I think came up
> with a pretty good syntax for it. And the people who invented math
> decided it was worth assigning a 1-character exponent to it.
> So I think there's a clear argument for having a .T attribute. But
> ok, let's say you're right, and a lot of people won't use it. Fine.
> IT WILL DO THEM ABSOLUTELY NO HARM. They don't have to use it if they
> don't like it! Just ignore it. Unlike a t() function, .T doesn't
> pollute any namespace users can define symbols in, so you really can
> just ignore it if you're not interested in using it. It won't get in
> your way.
This is a completely bogus argument. All features cost -- good and ill
alike. There's implementation cost and maintenance cost, both likely
small in this case, but not zero. There are cognitive costs associated
with trying to hold all of the various numpy methods, attributes and
functions in one's head at once. There are pedagogical costs in trying to
explain how things fit together. There are community costs in that people
who are allegedly coding with core numpy end up using mutually
incomprehensible dialects. TANSTAAFL.
The ndarray object has far too many methods and attributes already IMO,
and you have not made a case that I find convincing that this is
important enough to further cruftify it.
> For the argument that ndarray should be pure like the driven snow,
> just a raw container for n-dimensional data,
Did anyone make that argument? No? I didn't think so.
> I think that's what the basearray thing that goes into Python itself
> should be. ndarray is part of numpy and numpy is for numerical
> [*] Full disclosure: I did find two counter-examples -- Maple and
> Mathematica. Maple has only a transpose() function and Mathematica
> has only Transpose (but you can use [esc]tr[esc] as a shortcut).
> However, both of those packages are primarily known for their
> _symbolic_ math capabilities, not their number crunching, so they are
> less similar to numpy than R, S, K, Matlab and Scilab in that regard.