[Numpy-discussion] mysql -> record array

Francesc Altet faltet at carabos.com
Fri Nov 17 07:21:20 CST 2006


On Thursday 16 November 2006 22:28, Erin Sheldon wrote:
> Hi Francesc -
>
> Unless I missed something, I think what you have
> shown is that the combination of
>       (getting data from database into python lists) +
>       (converting to arrays)
> is what is taking time.   I would guess the first takes
> significantly longer than the second.

Honestly, I don't think I have demonstrated anything really solid in
that regard with so little evidence. But we can try looking for
more :)

For example, I'd split the times in:

    t1 (getting data from database) +
    t2 (python lists) +
    t3 (converting to arrays) =
    tt (total time)

We don't know t1, but we do know tt. Now, we can try to estimate t2
and t3. Perhaps I'm wrong, but the following could be good
estimates.

For t2 (creating the python list of tuples):
In [44]: Timer("[(x,x) for x in np.arange(500000, dtype='float64')]", "import numpy as np").repeat(3,1)
Out[44]: [0.55968594551086426, 0.48462891578674316, 0.4855189323425293]

For t3 (converting to recarrays):
In [49]: Timer("np.fromiter(lot, dtype=dtype)", "import numpy as np; lot=[(x,x) for x in np.arange(500000, dtype='float64')]; dtype=np.dtype([('x', 'float64'), ('y', 'float64')])").repeat(3,1)
Out[49]: [0.50310707092285156, 0.50920987129211426, 0.50304579734802246]

So, it seems that t2 and t3 are similar and they take approximately 0.5
seconds each.

Now, let me recall the timings for reading the databases on my
laptop at work (a Pentium4 @ 2 GHz):

setup SQLite took 23.5661110878 seconds
retrieve SQLite took 3.26717996597 seconds
setup PyTables took 0.139157056808 seconds
retrieve PyTables took 0.13444685936 seconds

So, in our case, tt for SQLite3 was 3.27 seconds. With that, we can
derive its t1 (getting data from database):

t1 = tt - t2 - t3 =~ 2.27 seconds
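For reference, the subtraction can be spelled out with the figures quoted in this message, following the split tt = t1 + t2 + t3 given earlier. The 0.5 second values for the list and array steps are the rounded estimates from the Timer runs, so the result is only as good as those guesses.

```python
# Figures taken from the timings quoted in this message.
tt_sqlite = 3.26717996597     # "retrieve SQLite" total time, in seconds
tt_pytables = 0.13444685936   # "retrieve PyTables" total time, in seconds

# Rounded estimates from the two Timer runs (assumed to apply unchanged
# to the SQLite case, which is exactly the hypothesis being tested).
t2 = 0.5   # building the Python list of tuples
t3 = 0.5   # converting the list to a record array

# What is left over is attributed to getting the data out of the database.
t1 = tt_sqlite - t2 - t3
ratio = t1 / tt_pytables

print(f"t1 =~ {t1:.2f} s")
print(f"t1 / tt(PyTables) =~ {ratio:.1f}")
```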

However, this is still far more than tt for PyTables (~ 0.13 sec), so
I'm not completely sure what's going on. Honestly, I don't think that
HDF5 (the underlying library for doing I/O in PyTables) would be
almost 20x faster than SQLite3 for reading purposes. So my guess is
that there are more factors contributing to tt for SQLite3 that
I've not taken into account. Can anyone find them?
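One way to look for the missing factors would be to time the fetch and the conversion separately instead of deriving t1 by subtraction. The sketch below is not the benchmark whose results are quoted above: the table name, the two REAL columns and the in-memory database are all assumptions chosen just to keep it self-contained. Note that fetchall() already returns a list of tuples, so the fetch time measured this way includes the list-building step.

```python
import sqlite3
import time

import numpy as np

DTYPE = np.dtype([('x', 'float64'), ('y', 'float64')])


def time_sqlite_steps(n=500000):
    """Return (fetch seconds, convert seconds, resulting array) for n rows."""
    # Hypothetical schema: an in-memory table with two REAL columns.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (x REAL, y REAL)")
    con.executemany(
        "INSERT INTO data VALUES (?, ?)",
        ((float(i), float(i)) for i in range(n)),
    )
    con.commit()

    # Fetch: the database read plus the creation of the Python tuples.
    start = time.perf_counter()
    rows = con.execute("SELECT x, y FROM data").fetchall()
    t_fetch = time.perf_counter() - start

    # Convert: list of tuples -> structured array.
    start = time.perf_counter()
    arr = np.fromiter(rows, dtype=DTYPE, count=len(rows))
    t_convert = time.perf_counter() - start

    con.close()
    return t_fetch, t_convert, arr


if __name__ == "__main__":
    t_fetch, t_convert, arr = time_sqlite_steps()
    print("fetch (db read + list of tuples):", t_fetch)
    print("convert (list -> record array):  ", t_convert)
```

Running the same measurement against an on-disk database file would additionally show how much of the time is real disk I/O.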

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"


