[Numpy-discussion] www.numpy.org home page

Paul Ivanov pivanov314@gmail....
Mon Dec 17 20:19:29 CST 2012


On Mon, Dec 17, 2012 at 5:50 PM, Paul Ivanov <pivanov314@gmail.com> wrote:
> On Mon, Dec 17, 2012 at 12:30 PM, Chris Barker - NOAA Federal
> <chris.barker@noaa.gov> wrote:
>> Interesting -- I was asked to review the Numpy 1.5 Beginner's Guide,
>> and I did read through the whole thing, and make notes, but never
>> wrote up a full review. One reason is that I found it hard to motivate
>> myself to write what would have been a bad review.
>
> This was also my experience. I would go so far as to say that it would
> be a disservice to our community to link to that book. Our
> documentation is better.

I dug up the skeleton of the review that I had written up until I lost
steam and interest in going further. I think it may shed more light on
my negative opinion of this book.

Packt publishing approached me about doing a review of one of their
newest Python books: _NumPy 1.5 Beginner's Guide_

I think it's great that publishers are making it easier for folks to get
started in this hot area of computing, though obviously, being a vested member
of the Scientific Python community, I'm not exactly unbiased in my
opinions on the topic.

I received a complimentary e-book copy.

Here are my thoughts:

It correctly mentions that NumPy can use a LAPACK implementation if
one is available on your system, and also correctly mentions that
NumPy provides its own implementation if it can't find one - but
neglects to state the important fact that this will be very slow
relative to a traditional LAPACK implementation.

No mention of CPython, even though there are so many other flavors of
Python out there these days, and NumPy doesn't work on most of them.

Mentions that NumPy is "open source" and "free as in beer", but
neglects to specifically state its license.

NumPy 1.5 is in the book title - but NumPy 2.0.0.dev20100915 is listed
under "Here is a list of software used to develop and test the code
examples" in the Preface.


- the need to register for an account in order to download the sample
code is annoying


The author of the book uses a 64-bit machine. How do I know that? The
first code example provided in the book does not work for the second
set of inputs.

    20:32@ch1code$ python vectorsum.py 2000
    The last 2 elements of the sum [7980015996L, 7992002000L]
    PythonSum elapsed time in microseconds 4943
    Warning: invalid value encountered in power
    The last 2 elements of the sum [-2143491644 -2143487647]
    NumPySum elapsed time in microseconds 722
    20:32@ch1code$

Indeed, the vectorsum example doesn't work for values >= 1291 (int
overflow on my 32-bit machine):

    20:32@ch1code$ python vectorsum.py 1291
    The last 2 elements of the sum [2143362090, 2148353100L]
    PythonSum elapsed time in microseconds 1771
    The last 2 elements of the sum [ 2143362090 -2146614196]
    NumPySum elapsed time in microseconds 374

So though the answer is attained way faster using NumPy, it's wrong
in this case!
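
For illustration, here is a minimal sketch of that overflow. It is not
the book's vectorsum.py; it assumes the script computes the sum of
squares and cubes of arange(n), which is what the outputs above are
consistent with, and it forces int32 so the effect shows up on any
machine.

```python
import numpy as np

n = 1291

# Force 32-bit integers to mimic the default integer of a 32-bit build.
i32 = np.arange(n, dtype=np.int32)
wrapped = i32 ** 2 + i32 ** 3

# Requesting 64-bit integers explicitly avoids the wraparound here.
i64 = np.arange(n, dtype=np.int64)
correct = i64 ** 2 + i64 ** 3

print(wrapped[-2:])  # [ 2143362090 -2146614196]
print(correct[-2:])  # [2143362090 2148353100]
```

Integer array arithmetic wraps silently, so passing an explicit dtype
is the portable way to get the same answer on 32-bit and 64-bit
machines.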

"What just happened?" section a bit annoying - just space filler without
utility - reminiscent of closing curly brackets or line-ending semicolons of
those other programming languages :) Ditto with the "time for action" headings.

import numpy as np would have been nice - since that's the convention
used in numpy's own doc strings.

"What does arange(5) do" - namespaces are one honking good idea...

Re: IPython: "The Pylab switch imports all the Scipy, NumPy, and
Matplotlib packages. Without this switch, we would have to import
every package we need ourselves." - biggest use of ``--pylab`` is to
get a separate event loop for plots.

It's confusing to have a NumPy book that, in the first chapter, dives
into IPython! The book is unfocused in its presentation. A simple
mention of %quickref would have sufficed for pointing out features of
IPython.

TypeError for ints and floats is a property of the Python programming
language - it's not specific to Numpy.

Reshape function should have mentioned that it changes only the
metadata, so reshaping a really large array takes the same amount of
time as a small one.
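
A small sketch of the metadata-only behavior, assuming a contiguous
array (when the memory layout does not allow a view, reshape has to
copy instead). The names big and grid are just for illustration.

```python
import numpy as np

big = np.arange(1000000)
grid = big.reshape(1000, 1000)

# No data was copied: both names refer to the same buffer.
print(np.shares_memory(big, grid))  # True

# Writing through the reshaped view is visible in the original.
grid[0, 0] = -1
print(big[0])  # -1
```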

More style problems: "This selects the first floor"

Mention of ravel() and flatten() without mention of flat (which is
mentioned several pages later).

Mention of a.transpose() without a.T - which is mentioned seven pages later.
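
For reference, a short sketch of how these relatives compare. The
array a is made up for the example.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

r = a.ravel()    # a view when the memory layout allows it
f = a.flatten()  # always a copy
print(np.shares_memory(a, r))  # True
print(np.shares_memory(a, f))  # False

# a.flat is an iterator over the elements in row-major order.
print([int(v) for v in a.flat])  # [0, 1, 2, 3, 4, 5]

# a.T is shorthand for a.transpose().
print(a.T.shape)                           # (3, 2)
print(np.array_equal(a.T, a.transpose()))  # True
```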

Something I didn't know: "The flat attribute is settable. Setting the
value of the flat attribute leads to overwriting the values of the
whole array" (p 46). And then I figured out why I didn't know this
(it's slow):

In [31]: timeit a.flat=1
10 loops, best of 3: 80.4 ms per loop

In [32]: timeit a.fill(1)
100 loops, best of 3: 5.62 ms per loop

Unfortunately, the fill() method is *not* mentioned here, it's
mentioned 25 pages later.
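
A quick sketch showing that the two spellings give the same result,
so the only difference is speed. The array shapes are arbitrary.

```python
import numpy as np

a = np.zeros((3, 4))
a.flat = 1   # assign through the flat attribute

b = np.zeros((3, 4))
b.fill(1)    # dedicated method for the same job

print(np.array_equal(a, b))  # True
```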

VWA-what? too simplistic explanation to be useful.

Mean - cheeky language, but the fact that it's also a method on numpy array is
only mentioned in passing four pages later. That you can specify ``axis``
keyword to get means across rows or columns, etc., is also not mentioned here.
The same thing for min, max, ptp.
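
A minimal sketch of the ``axis`` keyword on a made-up 2x3 array:

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(a.mean())           # 3.5, over all elements
print(a.mean(axis=0))     # [2.5 3.5 4.5], one mean per column
print(a.mean(axis=1))     # [2. 5.], one mean per row
print(a.min(axis=0))      # [1. 2. 3.]
print(a.max(axis=1))      # [3. 6.]
print(np.ptp(a, axis=0))  # [3. 3. 3.], max minus min per column
```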

Uses the memory-copying numpy.msort() to sort an array, instead of the
in-place a.sort(), or talking about the more general numpy.sort()
function.
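
To make the distinction concrete, a small sketch with a made-up
array. It leaves out numpy.msort() itself and only contrasts the
copying function with the in-place method.

```python
import numpy as np

a = np.array([3, 1, 2])

s = np.sort(a)  # returns a sorted copy
print(a)        # [3 1 2], original untouched
print(s)        # [1 2 3]

a.sort()        # sorts in place and returns None
print(a)        # [1 2 3]
```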

The explanation of numpy.var includes the important point about how "some books
tell us to divide by the number of elements in the array minus one," but then
fails to mention the ddof keyword argument to var and std. A shout-out to
numpy.std would have been nice here, too.
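
A minimal sketch of what ddof does, on made-up data: the divisor is
N - ddof, so ddof=1 gives the "minus one" version.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Sum of squared deviations from the mean (2.5) is 5.0.
print(np.var(x))          # 1.25, divides by N = 4
print(np.var(x, ddof=1))  # 5/3, divides by N - 1 = 3
print(np.std(x, ddof=1))  # square root of the line above
```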


I don't care for the finance-focus of the example in this book,
but I do care about being talked down to:

    In academic literature it is more common to base analysis on
    stock returns and log returns of the close price. Simple returns
    are just the rate of change from one value to the next.
    Logarithmic returns or log returns are determined by taking the
    log of all the prices and calculating the differences between
    them. In high school, we learned that the difference between the
    log of "a" and the log of "b" is equal to the log of "a divided
    by b". Log return, therefore, also measures rate of change.

Indexing with masks (e.g. all positive values of an array as a[a>0]),
not mentioned until chapter X, whereas it had a natural fit in either
the indexing section of Chapter 2, or alongside the use of
numpy.where() in Chapter 3.
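
For comparison, a short sketch of mask indexing next to
numpy.where(), on a made-up array:

```python
import numpy as np

a = np.array([-2, 5, 0, 3, -1])

# A boolean mask selects the elements where the condition holds.
mask = a > 0
print(a[mask])  # [5 3]

# np.where with one argument returns the matching indices instead.
idx = np.where(a > 0)
print(idx[0])   # [1 3]
print(a[idx])   # [5 3]
```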

The book talks about how it's useful for scientists and engineers, and
then has a heavy focus on stock-market-related finance one-off
examples. My eyes glazed over with the acronyms and, what to me, are
meaningless sets of quantities. Image: ATR - if it's not important,
don't tell me about it! Bollinger Bands: gimme a break!

np.piecewise  - I've never used it - looks quite useful
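
A minimal sketch of np.piecewise, here used to build an absolute
value function from two branches. The input x is made up.

```python
import numpy as np

x = np.linspace(-2, 2, 5)  # [-2. -1.  0.  1.  2.]

# Each condition is paired with the function applied where it is true.
y = np.piecewise(x, [x < 0, x >= 0], [lambda v: -v, lambda v: v])
print(y)  # [2. 1. 0. 1. 2.]
```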

As a newcomer to NumPy, I would have been too distracted by the
financial focus of all of the examples to get a general picture of what
goodies NumPy offers.

inconsistencies in style: numpy.arange followed by plot() plot() plot() show()


no discussion of broadcasting.
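
Even a tiny example would have helped. A sketch with made-up arrays,
where a (3, 1) column and a length-4 row combine into a (3, 4) table
without any explicit loop or tiling:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)  # shape (3, 1)
row = np.arange(4)                # shape (4,)

# Size-1 dimensions are stretched to match the other operand.
table = col * 10 + row
print(table.shape)  # (3, 4)
print(table[2])     # [20 21 22 23]
```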


go ahead and time it graphic: no mention of IPython's timeit magic, or
even just the Python standard library timeit module.
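
For illustration, a sketch of the standard library route, reusing the
fill versus flat comparison. The array size and repeat count are
arbitrary, and the timings depend entirely on the machine, so no
numbers are claimed here.

```python
import timeit

import numpy as np

a = np.empty(100000)


def use_fill():
    a.fill(1)


def use_flat():
    a.flat = 1


# timeit.timeit returns the total time in seconds for `number` calls.
t_fill = timeit.timeit(use_fill, number=20)
t_flat = timeit.timeit(use_flat, number=20)

print("fill: %.6f s for 20 calls" % t_fill)
print("flat: %.6f s for 20 calls" % t_flat)
```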


dot function described independent of matrix class multiplication.
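
A sketch of the connection that could have been made. It uses plain
arrays and the ``@`` operator, which is an assumption about a newer
Python than the book targets; the matrix class gives ``*`` this same
matrix-product meaning, while on plain arrays ``*`` is elementwise.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

print(np.dot(a, b))  # matrix product: [[2 1], [4 3]]
print(a @ b)         # same matrix product
print(a * b)         # elementwise product: [[0 2], [3 0]]
```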


Typos / Errata
--------------
numpy.arange - "the arange function was imported, that's why it is
prefixed with numpy" - should be "the arange function was *not*
imported..."
page 17

hstack and vstack visuals on page 39 are identical in the resulting
array, and should not be.
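
For the record, a sketch of how the two results differ, using made-up
2x2 arrays rather than the book's figure:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

h = np.hstack((a, b))  # side by side
v = np.vstack((a, b))  # one on top of the other

print(h.shape)  # (2, 4)
print(v.shape)  # (4, 2)
```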

poly = numpy.polyfit(t, bhp - vale, int(sys.argv[1])) on page 85
should make reference to how the script must be passed an argument for
the degree of the polynomial, or the int(sys.argv[1]) should be
changed to just 3 to suit the result.
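
A third option would be a default. This is a hypothetical sketch, not
the book's script: parse_degree is a helper invented for the example,
and t and y are synthetic stand-ins for the book's price data.

```python
import numpy as np


def parse_degree(argv, default=3):
    """Return the polynomial degree from argv, or a default if absent."""
    if len(argv) > 1:
        return int(argv[1])
    return default


# Synthetic stand-in data: an exact cubic.
t = np.arange(10.0)
y = 2.0 * t ** 3 - t + 1.0

# A real script would pass sys.argv; no argument falls back to 3.
degree = parse_degree(["script.py"])
poly = np.polyfit(t, y, degree)
print(len(poly))  # 4 coefficients for a cubic
```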

pg 86 - "extremums"  should read "extrema"


--
Paul Ivanov
314 address only used for lists,  off-list direct email at:
http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7

