[Numpy-discussion] Question about improving genfromtxt errors
Tue Sep 29 15:36:59 CDT 2009
On 09/29/2009 01:30 PM, Pierre GM wrote:
> On Sep 29, 2009, at 1:57 PM, Bruce Southey wrote:
>> On 09/29/2009 11:37 AM, Christopher Barker wrote:
>>> Pierre GM wrote:
>> Probably more of a concern than memory is the execution time involved
>> in printing these problem rows.
> The rows with problems will be printed outside the loop (with at least
> an associated warning, or possibly by raising an exception). My concern
> is whether to store only the tuples (index of the row, nb of columns)
> for the invalid rows, or just create a list of the nb of columns for
> every row that I'd parse afterwards. The first solution requires an
> extra test in the loop, the second may waste some memory space.
> Bah, I'll figure it out. Please send me some test cases so that I can
> time/test the best option.
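To make the two bookkeeping options concrete, here is a minimal sketch of each. This is not the code in io.py; the function names, the `expected` argument and the plain `str.split` parsing are assumptions made only for illustration.

```python
def invalid_rows_as_tuples(lines, expected, delimiter=","):
    """Option 1: test inside the loop, keep (row index, nb of columns)
    only for the rows whose column count is wrong."""
    invalid = []
    for i, line in enumerate(lines):
        nbcols = len(line.split(delimiter))
        if nbcols != expected:  # the extra test paid on every row
            invalid.append((i, nbcols))
    return invalid


def invalid_rows_from_counts(lines, expected, delimiter=","):
    """Option 2: record the column count of every row, then scan the
    list afterwards. No test in the loop, but one stored int per row."""
    counts = [len(line.split(delimiter)) for line in lines]
    return [(i, n) for i, n in enumerate(counts) if n != expected]


rows = ["1,2,3", "4,5", "6,7,8", "9"]
print(invalid_rows_as_tuples(rows, expected=3))
print(invalid_rows_from_counts(rows, expected=3))
```

Both return the same list of invalid rows, so the choice comes down to the per-row test versus the extra list, which is what the timings would have to settle.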
The first case just has to handle a missing delimiter - actually I
expect that most of my cases would relate to this. So here is simple
Python code to generate an arbitrarily large list with the occasional
missing delimiter. I set it up so that it reads the desired number of
rows and the frequency of bad rows from the Linux command line.
$ time python tbig.py 1000000 100000
If I comment out the extra prints that I put in io.py, it takes about
22 seconds to finish when the delimiters are all correct. With the
missing delimiter, it takes 20.5 seconds to crash.
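Since the attachment is scrubbed from the archive, here is a rough sketch of what a generator of this kind could look like. It is not the contents of tbig.py: the name `make_rows`, the five integer columns, the comma delimiter, and reading the second argument as "every Nth row is bad" are all assumptions.

```python
import sys


def make_rows(nrows, bad_every, ncols=5, delimiter=","):
    """Yield nrows delimited text rows; every bad_every-th row has its
    first two fields run together, i.e. one delimiter is missing.
    Assumes ncols >= 2; bad_every=0 produces no bad rows."""
    for i in range(1, nrows + 1):
        fields = [str(i + j) for j in range(ncols)]
        if bad_every and i % bad_every == 0:
            # Merge the first two fields to simulate the missing delimiter.
            fields[0:2] = [fields[0] + fields[1]]
        yield delimiter.join(fields)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical command line: number of rows, then bad-row frequency.
    for row in make_rows(int(sys.argv[1]), int(sys.argv[2])):
        print(row)
```

The generated rows can be written to a file or collected in a list and passed to genfromtxt with delimiter="," to reproduce the kind of failure timed above.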
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 530 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/numpy-discussion/attachments/20090929/5572dd90/attachment.py