[Numpy-discussion] fromstring, tostring slow?

Mark Janikas mjanikas@esri....
Tue Feb 13 14:36:01 CST 2007


Good call, Stefan,

I decoupled the timing from the application (duh!) and got better results:

from numpy import *
import numpy.random as RAND
import time as TIME

x = RAND.random(1000)
xl = x.tolist()

# Time the ASCII write: convert each value to a string, join, and write.
t1 = TIME.clock()
xStringOut = [ str(i) for i in xl ]
xStringOut = " ".join(xStringOut)
f = file('blah.dat','w'); f.write(xStringOut)
t2 = TIME.clock()
total = t2 - t1

# Time the binary write: dump the raw array bytes in one call.
t1 = TIME.clock()
f = file('blah.bwt','wb')
xBinaryOut = x.tostring()
f.write(xBinaryOut)
t2 = TIME.clock()
total1 = t2 - t1

>>> total
0.00661
>>> total1
0.00229

Printing x directly to a string took REALLY long: f.write(str(x)) = 0.0258
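
For anyone rerunning this on a current Python 3 / NumPy install, here is a rough sketch of the same comparison. It is not the code that produced the numbers above: it assumes time.perf_counter() in place of time.clock(), tobytes() in place of tostring(), and a temporary directory for the output files. The helper names time_text_write and time_binary_write are made up for illustration.

```python
import os
import tempfile
import time

import numpy as np


def time_text_write(x, path):
    """Write x as space-separated text and return the elapsed seconds."""
    start = time.perf_counter()
    with open(path, "w") as f:
        f.write(" ".join(str(v) for v in x.tolist()))
    return time.perf_counter() - start


def time_binary_write(x, path):
    """Write the raw bytes of x and return the elapsed seconds."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(x.tobytes())
    return time.perf_counter() - start


x = np.random.random(1000)

# Assumption: a throwaway directory instead of the current working directory.
tmpdir = tempfile.mkdtemp()
text_path = os.path.join(tmpdir, "blah.dat")
binary_path = os.path.join(tmpdir, "blah.bwt")

text_seconds = time_text_write(x, text_path)
binary_seconds = time_binary_write(x, binary_path)

print("text write:   %.6f s" % text_seconds)
print("binary write: %.6f s" % binary_seconds)
```

Closing the file inside the timed region (the with block) means the flush to disk is counted as well, which the original snippet does not do, so the absolute numbers will not be directly comparable.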

The problem, therefore, must be in the way I am appending values to the empty arrays.  I am currently using the append function: 

myArray = append(myArray, newValue)

Or would it be faster to concat or use a list append then convert?

But to be sure, I'll have to profile it.  It seems a bit odd in that there are far fewer loops and conversions in my current implementation for the binary, yet it is still running slower. 
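
On the append question: numpy.append returns a new array each time, so calling it once per value copies everything accumulated so far on every call. A rough sketch of the three options follows, assuming the values arrive one at a time as Python floats. The function names are hypothetical, and the timings are left for the reader to measure rather than quoted here.

```python
import timeit

import numpy as np


def grow_with_numpy_append(values):
    """Append one value at a time; each call allocates and copies a new array."""
    arr = np.empty(0)
    for v in values:
        arr = np.append(arr, v)
    return arr


def grow_with_list(values):
    """Collect in a Python list, then convert to an array once at the end."""
    buf = []
    for v in values:
        buf.append(v)
    return np.array(buf)


def grow_preallocated(values):
    """Fill a preallocated array; only possible when the length is known."""
    arr = np.empty(len(values))
    for i, v in enumerate(values):
        arr[i] = v
    return arr


if __name__ == "__main__":
    data = np.random.random(10000).tolist()
    for func in (grow_with_numpy_append, grow_with_list, grow_preallocated):
        seconds = timeit.timeit(lambda: func(data), number=3) / 3
        print("%-24s %.6f s per run" % (func.__name__, seconds))
```

If the final length is known up front, preallocating avoids the intermediate list; otherwise the list-then-convert version is the usual choice.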


-----Original Message-----
From: numpy-discussion-bounces@scipy.org [mailto:numpy-discussion-bounces@scipy.org] On Behalf Of Stefan van der Walt
Sent: Tuesday, February 13, 2007 12:03 PM
To: numpy-discussion@scipy.org
Subject: Re: [Numpy-discussion] fromstring, tostring slow?

On Tue, Feb 13, 2007 at 11:42:35AM -0800, Mark Janikas wrote:
> I am finding that directly packing numpy arrays into binary using the tostring
> and fromstring methods do not provide a speed improvement over writing the same
> arrays to ascii files.  Obviously, the size of the resulting files is far
> smaller, but I was hoping to get an improvement in the speed of writing.  I got
> that speed improvement using the struct module directly, or by using generic
> python arrays.  Let me further describe my methodological issue as it may
> directly relate to any solution you might have.

Hi Mark

Can you post a benchmark code snippet to demonstrate your results?
Here, using 1.0.2.dev3545, I see:

In [26]: x = N.random.random(100)

In [27]: timeit f = file('/tmp/blah.dat','w'); f.write(str(x))
100 loops, best of 3: 1.77 ms per loop

In [28]: timeit f = file('/tmp/blah','w'); x.tofile(f)
10000 loops, best of 3: 100 µs per loop

(I see the same results for heterogeneous arrays)

Cheers
Stéfan
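
A small round-trip sketch for the tofile approach above, under the assumption of a current NumPy and a temporary file path. tofile writes only the raw bytes, so the dtype and shape have to be supplied again when reading back with fromfile.

```python
import os
import tempfile

import numpy as np

x = np.random.random((10, 4))

# Assumption: a throwaway path; any writable location would do.
path = os.path.join(tempfile.mkdtemp(), "blah")

# Raw binary dump: no header, no dtype, no shape information is stored.
x.tofile(path)

# Reading back therefore needs the dtype and shape from somewhere else.
y = np.fromfile(path, dtype=x.dtype).reshape(x.shape)

print(np.array_equal(x, y))
```

For files that need to carry their own dtype and shape, numpy.save and numpy.load store that metadata in a header, at the cost of a slightly larger file.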