mysql -> record array

Erin Sheldon erin.sheldon at
Tue Nov 14 20:26:48 CST 2006

On 11/14/06, Tim Hochberg <tim.hochberg at> wrote:
> Tim Hochberg wrote:
> > John Hunter wrote:
> >
> >>>>>>> "Erin" == Erin Sheldon <erin.sheldon at> writes:
> >>>>>>>
> >>>>>>>
> >>     Erin> The question I have been asking myself is "what is the
> >>     Erin> advantage of such an approach?".  It would be faster, but by
> >>
> >> In the use case that prompted this message, the pull from mysql took
> >> almost 3 seconds, and the conversion from lists to numpy arrays took
> >> more than 4 seconds.  We have a list of about 500000 2-tuples of
> >> floats.
> >>
> >>
> > I'm no database user, but a glance at the docs seems to indicate
> > that you can get your data via an iterator (by iterating over the cursor
> > or some such db mumbo jumbo) rather than slurping the whole list up
> > at once. If so, then you'll save a lot of memory by passing the iterator
> > straight to fromiter. It may even be faster, who knows.
> >
> > Accessing the db via the iterator could be a performance killer, but
> > it's almost certainly worth trying as it could save a few megabytes of
> > storage and that in turn might speed things up.
> >
> >
> Assuming that I didn't mess this up too badly, it appears that using the
> iterator directly with fromiter is significantly faster than the next
> best solution (about 45%). The fromiter wrapping a list solution comes in
> second, followed by numarray.array and, finally, way in the back,
> numpy.array. Here are the numbers:
>     retrieve1 took 0.902922857514 seconds
>     retrieve2 took 1.31245870634 seconds
>     retrieve3 took 1.51207569677 seconds
>     retrieve4 took 8.71539930354 seconds

Interesting results, Tim.  From Pierre's results
we saw that fromiter is the fastest way to get data
into arrays.  Your results show that there is also a
difference between iterating over the cursor and
doing a fetchall().  Surprisingly, iterating over
the cursor is faster.
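
For anyone following along, here is a minimal sketch of the two access
patterns (sqlite3 shown for concreteness; the test.db file and the
points table are hypothetical stand-ins, not Tim's actual setup):

    import sqlite3
    import numpy as np

    conn = sqlite3.connect('test.db')          # hypothetical database file
    cur = conn.cursor()
    dt = np.dtype([('x', 'f8'), ('y', 'f8')])  # one record per row

    def retrieve_iter(cur):
        # Pass the cursor itself to fromiter: rows are pulled one at a
        # time, so the full list of row tuples is never built in memory.
        cur.execute('SELECT x, y FROM points')
        return np.fromiter(cur, dtype=dt)

    def retrieve_fetchall(cur):
        # fetchall() materializes every row as a Python tuple first;
        # fromiter then copies that list into the record array.
        cur.execute('SELECT x, y FROM points')
        return np.fromiter(cur.fetchall(), dtype=dt)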

This must come not from the data retrieval rate but
from creating the intermediate copies in memory.  But
just in case, I think there is one more thing to check.
I haven't used sqlite, but with other databases I have
used there is often a large variance in times from
one select to the next.  Can you repeat these tests
with timeit's repeat() and give the spread of times?
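
Something along these lines should do it (a sketch only, reusing the
hypothetical retrieve_iter and cur from above):

    import timeit

    # repeat() runs the whole trial several times; number=1 makes each
    # measurement a single select.  The minimum is the least-disturbed
    # run, and the spread shows the select-to-select variance.
    t = timeit.Timer('retrieve_iter(cur)',
                     setup='from __main__ import retrieve_iter, cur')
    times = t.repeat(repeat=5, number=1)
    print("min %.3f  max %.3f" % (min(times), max(times)))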

As an aside, your database is running on a local disk,
right, so the overhead of retrieving the data is
minimized here?  For my tests I think I am data-retrieval
limited, because I get exactly the same time for the
equivalents of retrieve1 and retrieve2.

