[Numpy-discussion] mysql -> record array

Erin Sheldon erin.sheldon at gmail.com
Sat Nov 18 09:35:06 CST 2006


On 11/16/06, Travis Oliphant <oliphant at ee.byu.edu> wrote:
> John Hunter wrote:
>
> >>>>>> "Erin" == Erin Sheldon <erin.sheldon at gmail.com> writes:
> >
> >    Erin> The question I have been asking myself is "what is the
> >    Erin> advantage of such an approach?".  It would be faster, but by
> >
> > In the use case that prompted this message, the pull from mysql took
> > almost 3 seconds, and the conversion from lists to numpy arrays took
> > more than 4 seconds.  We have a list of about 500000 2-tuples of
> > floats.
> >
> > Digging in a little bit, we found that numpy is about 3x slower than
> > Numeric here
> >
> >  peds-pc311:~> python test.py
> >  with dtype: 4.25 elapsed seconds
> >  w/o dtype 5.79 elapsed seconds
> >  Numeric  1.58 elapsed seconds
> >  24.0b2
> >  1.0.1.dev3432
> >
> > Hmm... So maybe the question is -- is there some low-hanging fruit
> > here to speed numpy up?
> >
> > import time
> > import numpy
> > import numpy.random
> > rand = numpy.random.rand
> >
> > x = [(rand(), rand()) for i in xrange(500000)]
> > tnow = time.time()
> > y = numpy.array(x, dtype=numpy.float_)
> > tdone = time.time()
> > print 'with dtype: %1.2f elapsed seconds'%(tdone - tnow)
> >
> > tnow = time.time()
> > y = numpy.array(x)
> > tdone = time.time()
> > print 'w/o dtype %1.2f elapsed seconds'%(tdone - tnow)
> >
> > import Numeric
> > tnow = time.time()
> > y = Numeric.array(x, Numeric.Float)
> > tdone = time.time()
> > print 'Numeric  %1.2f elapsed seconds'%(tdone - tnow)
> >
> > print Numeric.__version__
> > print numpy.__version__
> >
> >
>
> I just adapted Numarray's version of array (using the fromlist method)
> to NumPy.  This new change needs some testing, as array is called in
> many, many ways.  But I think it should be right (all tests of numpy
> and scipy pass with it).
> With the change I get:
>
> with dtype: 0.22 elapsed seconds
> w/o dtype 5.02 elapsed seconds
> Numeric  7.38 elapsed seconds
> numarray  0.55 elapsed seconds
> 24.2
> 1.0.1.dev3437
> 1.5.1
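The quoted benchmarks repeat a `time.time()` pair around each conversion. Here is a minimal Python 3 sketch that wraps that pattern in a helper built on `time.perf_counter()`, a monotonic clock meant for short intervals. It then uses the helper to time packing the same 500000 float pairs into one contiguous buffer of C doubles via `array.array('d')`. It uses only the standard library, so it runs without numpy or Numeric installed. `timed` and `rows_to_double_buffer` are hypothetical helpers, not part of numpy, numarray, or any MySQL driver, and this is not the fromlist code path described above. With numpy available, the first quoted case would read `timed('with dtype:', lambda: numpy.array(x, dtype=float))`. In current NumPy the packed buffer can be wrapped without copying via `numpy.frombuffer(buf, dtype=float).reshape(n_rows, n_cols)`.

```python
import array
import itertools
import random
import time


def timed(label, fn, *args):
    """Call fn(*args) once, print the elapsed wall-clock time in the same
    format as the quoted benchmark, and return (result, seconds)."""
    start = time.perf_counter()  # monotonic clock meant for short intervals
    result = fn(*args)
    elapsed = time.perf_counter() - start
    print('%s %1.2f elapsed seconds' % (label, elapsed))
    return result, elapsed


def rows_to_double_buffer(rows):
    """Pack equal-length numeric rows into one row-major buffer of C doubles.
    Returns (buffer, n_rows, n_cols)."""
    if not rows:
        return array.array('d'), 0, 0
    n_cols = len(rows[0])
    # Reject ragged input up front instead of producing a misaligned buffer.
    if any(len(row) != n_cols for row in rows):
        raise ValueError("ragged rows: all rows must have the same length")
    # chain.from_iterable turns ((x0, y0), (x1, y1), ...) into x0, y0, x1, ...
    # and array('d') stores each value as a raw C double.
    return array.array('d', itertools.chain.from_iterable(rows)), len(rows), n_cols


# Same shape of input as the quoted benchmark: 500000 pairs of floats.
x = [(random.random(), random.random()) for _ in range(500000)]
(buf, n_rows, n_cols), secs = timed('array.array pack:', rows_to_double_buffer, x)
```

For steadier numbers than a single run, the standard library's `timeit.repeat` runs each case several times.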

Hi Travis -

That is an impressive speed increase.  Why is the w/o dtype case taking
so much longer?  Is this just from determining element sizes and
counts?
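The question of whether the no-dtype slowdown comes just from determining element sizes and counts can be made concrete with a toy model. The sketch below is stdlib-only Python 3 and does not reproduce numpy's actual conversion code. `discover`, `shape_from_first`, `flatten`, and `to_flat_array` are hypothetical names, and the bool < int < float promotion order is a simplifying assumption. When a type code is supplied, the model reads the shape from the first element at each level and goes straight to the fill pass. When it is not, every scalar is visited once just to pick the result type (and check the shape) before any memory is allocated.

```python
import array
from math import prod

# Simplified type-promotion order for the "no type given" case (assumption).
_RANK = {bool: 0, int: 1, float: 2}
_TYPECODE = {bool: 'b', int: 'q', float: 'd'}


def discover(obj):
    """Full walk: return (shape, widest scalar type) of a nested list/tuple."""
    if isinstance(obj, (list, tuple)):
        if not obj:
            return (0,), bool
        shapes, widest = set(), bool
        for item in obj:
            shape, kind = discover(item)
            shapes.add(shape)
            if _RANK[kind] > _RANK[widest]:
                widest = kind
        if len(shapes) != 1:
            raise ValueError("ragged nested sequence")
        return (len(obj),) + shapes.pop(), widest
    kind = type(obj)
    if kind not in _RANK:
        raise TypeError("unsupported element type: %s" % kind.__name__)
    return (), kind


def shape_from_first(obj):
    """Cheap shape guess: follow only the first element at each level."""
    shape = []
    while isinstance(obj, (list, tuple)):
        shape.append(len(obj))
        if not obj:
            break
        obj = obj[0]
    return tuple(shape)


def flatten(obj):
    """Yield the scalars of a nested list/tuple in row-major order."""
    if isinstance(obj, (list, tuple)):
        for item in obj:
            yield from flatten(item)
    else:
        yield obj


def to_flat_array(nested, typecode=None):
    """Return (array.array, shape) for a nested sequence of numbers."""
    if typecode is None:
        # No type given: visit every scalar once just to pick a type.
        shape, widest = discover(nested)
        typecode = _TYPECODE[widest]
    else:
        # Type given: skip the type-inference walk entirely.
        shape = shape_from_first(nested)
    buf = array.array(typecode, flatten(nested))  # the fill pass
    # Simplified consistency check: catches mismatched totals, not every
    # ragged layout, when the shape was only guessed from first elements.
    if len(buf) != prod(shape):
        raise ValueError("element count does not match shape %r" % (shape,))
    return buf, shape


rows = [(0.5, 1.5), (2, 3.25)]
buf, shape = to_flat_array(rows)         # no type given: infers 'd'
print(buf.typecode, shape, list(buf))    # d (2, 2) [0.5, 1.5, 2.0, 3.25]
```

In this model, omitting the type costs one extra walk over all 1,000,000 scalars of the benchmark's 500000-by-2 list. Whether that walk is what separates 0.22 s from 5.02 s in numpy is exactly the open question; profiling `numpy.array` itself would be needed to answer it.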

Erin

