[Numpy-discussion] Numpy 2D array from a list error

Bruce Southey bsouthey@gmail....
Wed Sep 23 10:34:52 CDT 2009


On 09/23/2009 10:00 AM, Dave Wood wrote:
> "If the text file has 'numbers and strings' how is numpy meant to know
> what dtype to use?
> Please try genfromtxt especially if columns contain both numbers and
> strings."
> Well, I suppose they are all considered to be strings here. I haven't 
> tried to convert the numbers to floats yet.
> "What happens if you read a file instead of using stdin?"
> Same problem
>
> "It is possible that one or more rows have multiple sequential delimiters.
> Please check the row lengths of your 'data' variable after doing:"
> Already done, they all have the same number of rows.
> The fact that the script works with the first 40k lines, and also with 
> the last 40k lines suggests to me that there is no problem with the file.
> (I calculate column means and standard deviations later in the script 
> - it's only the first two columns which can't be cast to floating 
> point numbers)
>
> "Really without the input or system, it is hard to say anything.
> If you really know your data I would suggest preallocating the array 
> and updating the array one line at a time to avoid the large multiple 
> intermediate objects."
> I'm running on linux. My machine is redhat with 2GB RAM, but when 
> memory became an issue I tried running on other Linux machines with 
> much greater RAM capacities. I don't know which distros.
> I just tried preallocating the array and updating it one line at a 
> time, and that works fine. Thanks very much for the suggestion. :)
> This doesn't seem like the expected behaviour though and the error 
> message seems wrong.
> Many thanks,
> Dave
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org <mailto:NumPy-Discussion@scipy.org>
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Glad you got a solution.

While I am far from an expert, with 2GB of RAM you do not have much free 
memory once the OS and other overheads are accounted for. With your code, 
all the data has to be read in at least once, and storage also has to be 
allocated for the result and any intermediate objects. So it is easy to 
exhaust memory.
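
For the archive, here is a rough sketch of the preallocate-and-fill 
approach that worked. It is not the original script, which was never 
posted. The function name, the tab delimiter, and the assumption that the 
row count is known in advance are all made up for illustration. The only 
detail taken from the thread is that the first two columns are strings 
and the rest are numbers.

```python
import io
import numpy as np

def load_numeric_columns(lines, n_rows, n_label_cols=2, delimiter="\t"):
    """Fill a preallocated float array one line at a time.

    Hypothetical helper: the delimiter and n_rows are assumptions.
    Only one parsed row is held in memory besides the result array.
    """
    labels = []   # the leading string columns, kept as plain lists
    data = None   # allocated once the column count is known
    for i, line in enumerate(lines):
        fields = line.rstrip("\n").split(delimiter)
        if data is None:
            n_numeric = len(fields) - n_label_cols
            data = np.empty((n_rows, n_numeric), dtype=float)
        labels.append(fields[:n_label_cols])
        # A row of the wrong length raises ValueError here, which
        # points at the offending line instead of failing later.
        data[i, :] = [float(x) for x in fields[n_label_cols:]]
    return labels, data

# Tiny in-memory stand-in for the real file or stdin.
sample = "a\tx\t1.0\t2.0\nb\ty\t3.0\t4.0\n"
labels, data = load_numeric_columns(io.StringIO(sample), n_rows=2)
col_means = data.mean(axis=0)
col_stds = data.std(axis=0)
```

If the input has fewer lines than n_rows, the remaining rows of the array 
are left uninitialised, so the row count should be checked first (for 
example with a cheap first pass over the file).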

I agree that the error message is too vague, so you could file a ticket.

Use PyTables if memory is a problem for you.
For example, see the recent 'np.memmap and memory usage' thread on numpy 
discussion:
http://www.mail-archive.com/numpy-discussion@scipy.org/msg18863.html
Especially the post by Francesc Alted:
http://www.mail-archive.com/numpy-discussion@scipy.org/msg18868.html
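
As a rough illustration of keeping the array on disk, the sketch below 
uses np.memmap (the subject of the thread linked above) rather than 
PyTables, purely because it needs nothing beyond numpy. The file name, 
array shape, and fill values are placeholders and have nothing to do 
with your data.

```python
import os
import tempfile
import numpy as np

# Placeholder dimensions; a real script would use the known row/column counts.
n_rows, n_cols = 1000, 4
path = os.path.join(tempfile.mkdtemp(), "data.dat")

# mode="w+" creates (or overwrites) the backing file on disk.
arr = np.memmap(path, dtype="float64", mode="w+", shape=(n_rows, n_cols))
for i in range(n_rows):
    arr[i, :] = i          # stand-in for the values parsed from one line
arr.flush()                # push pending changes out to the file
del arr

# Reopen read-only; pages are loaded from disk as they are touched.
ro = np.memmap(path, dtype="float64", mode="r", shape=(n_rows, n_cols))
col_means = np.asarray(ro.mean(axis=0))
```

A memmap file stores only raw values, so the dtype and shape have to be 
remembered separately. PyTables stores that metadata in the file for you.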

Bruce
