[SciPy-User] MemoryError with tsfromtxt
Thu Sep 9 08:29:32 CDT 2010
On 09/09/2010 06:19 AM, Pierre GM wrote:
> On Sep 9, 2010, at 1:11 PM, Timmie wrote:
>>> You must have quite a huge file... Note that it's not a scikits.timeseries
>>> pb, just a standard numpy one.
>> The file is 298 MB:
>> 5370772 records (rows); data at minutely frequency.
>>>> And what could I do to mitigate it?
>>> Cut the file into pieces?
>> and then concatenate the timeseries?
> That's the idea. Could you cut it day by day, or week by week, or even month by month to reduce the load?
> The issue is that genfromtxt has to keep a lot of information in memory (a list of values, a list of masks) before creating the array, and you're overloading Python's capacity to deal with it...
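The chunk-and-concatenate idea above might be sketched like this (the function name and chunk size are my own; any genfromtxt keyword arguments pass through unchanged):

```python
import io
import itertools
import numpy as np

def read_in_chunks(path, chunk_lines=500_000, **kw):
    """Read a large text file with np.genfromtxt in fixed-size chunks,
    then concatenate, so only one chunk's worth of Python-level
    intermediate lists (values, masks) is alive at a time."""
    pieces = []
    with open(path) as f:
        while True:
            chunk = list(itertools.islice(f, chunk_lines))
            if not chunk:
                break
            # atleast_2d keeps a single-line final chunk stackable
            pieces.append(np.atleast_2d(
                np.genfromtxt(io.StringIO("".join(chunk)), **kw)))
    return np.concatenate(pieces)
```

Note the final np.concatenate still needs enough memory for the full array plus the largest chunk, so this trades peak Python-object overhead for a modest numpy-side cost.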
You could buy more memory, because 5.4 million rows add up very
quickly with many columns. Note that you also need contiguous memory
for a single array.
If you know the format of the input, then use something lighter like loadtxt.
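For instance, loadtxt with an explicit structured dtype skips genfromtxt's per-column masking machinery entirely (the column names and sample data here are made up for illustration):

```python
import numpy as np
from io import StringIO

# A two-column minutely file: timestamp string plus a float value.
data = StringIO("2010-09-09T00:00 1.5\n2010-09-09T00:01 2.5\n")
arr = np.loadtxt(data, dtype=[("stamp", "U16"), ("value", "f8")])
```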
If you know both the size and the format, then you can iterate over the
file and write the values directly into a preallocated empty array, or
use Chris's code to append to an array.
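The preallocation approach might look like the sketch below, assuming the row and column counts are known in advance and the file is plain whitespace-separated numbers (the function name is hypothetical):

```python
import numpy as np

def load_preallocated(path, nrows, ncols):
    """Fill a preallocated array line by line, avoiding the large
    temporary Python lists that genfromtxt builds internally."""
    out = np.empty((nrows, ncols), dtype=np.float64)
    with open(path) as f:
        for i, line in enumerate(f):
            out[i] = [float(tok) for tok in line.split()]
    return out
```

Peak memory here is essentially just the final array, which is why it scales to files that make genfromtxt raise MemoryError.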