[Numpy-discussion] can I mapping a np.darray class with a text file instead of reading the file in to mem?
Mon Oct 4 02:59:05 CDT 2010
If you have 3 columns of 10,000,000 lines, that adds up to 30 million
numbers. That is 240 MB as double precision and 120 MB as single precision.
That should not require a 64-bit OS.
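To make that arithmetic concrete, here is a small sketch. The row and column counts are taken from your message; the variable names are only illustrative.

```python
import numpy as np

rows, cols = 10_000_000, 3
n_values = rows * cols  # 30,000,000 numbers in total

sizes_mb = {}
for dtype in (np.float64, np.float32):
    itemsize = np.dtype(dtype).itemsize  # bytes per number: 8 or 4
    sizes_mb[np.dtype(dtype).name] = n_values * itemsize / 1e6
    print(f"{np.dtype(dtype).name}: {sizes_mb[np.dtype(dtype).name]:.0f} MB")
```

This only counts the final array. Any temporary Python objects created while parsing the text come on top of that.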
You probably have a problem because reading from text uses extra
memory on top of the final array. Can you not convert the file
"line-by-line" into a second, binary file?
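A minimal sketch of such a conversion follows, assuming whitespace-separated columns and a fixed column count. The function names `text_to_binary` and `open_binary`, the chunk size, and the file names are hypothetical, not something from an existing library. The binary file is then opened with np.memmap, so only the parts you actually touch are read from disk.

```python
import numpy as np


def text_to_binary(text_path, bin_path, dtype=np.float64, chunk_rows=100_000):
    """Stream a whitespace-separated text file into a raw binary file.

    Only `chunk_rows` parsed rows are held in memory at any time.
    Returns the number of rows written.
    """
    n_rows = 0
    buf = []
    with open(text_path) as src, open(bin_path, "wb") as dst:
        for line in src:
            fields = line.split()
            if not fields:  # skip blank lines
                continue
            buf.append([float(x) for x in fields])
            if len(buf) >= chunk_rows:
                np.asarray(buf, dtype=dtype).tofile(dst)
                n_rows += len(buf)
                buf.clear()
        if buf:  # write whatever is left over
            np.asarray(buf, dtype=dtype).tofile(dst)
            n_rows += len(buf)
    return n_rows


def open_binary(bin_path, n_rows, n_cols=3, dtype=np.float64):
    """Map the binary file as a read-only (n_rows, n_cols) array."""
    return np.memmap(bin_path, dtype=dtype, mode="r", shape=(n_rows, n_cols))
```

Usage would be something like `n = text_to_binary("data.txt", "data.bin")` followed by `a = open_binary("data.bin", n)`, after which `a` can be indexed and sliced like an ordinary ndarray. The raw file stores no shape or dtype, so you have to keep track of those yourself.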
Otherwise, you might want to look into the "appendable" ndarray that
Chris Barker wrote about on this list not too long ago.
And you might want to read this post:
- Sebastian Haase
On Sat, Oct 2, 2010 at 5:23 PM, kee chen <email@example.com> wrote:
> Dear All,
> I have a memory problem reading data from a text file into an np.ndarray. This is
> because my PC has little memory and the data is too big.
> The data is stored as 3 columns of text and may have 10000000 records that look like
> 0.64984279 0.587856227 0.827348652
> 0.33463377 0.210916859 0.608797746
> 0.230265156 0.390278562 0.186308355
> 0.431187207 0.127007937 0.949673389
> 10000000 LINES OMITTED HERE
> 0.150027782 0.800999655 0.551508963
> 0.255163742 0.785462049 0.015694154
> After googling, I found 3 ways that may solve this problem:
> 1. hardware upgrade (more memory, moving to an x64 architecture, ...)
> 2. filtering the data before processing
> 3. using PyTables
> However, I am trying to think of another possibility: a memory-time trade-off.
> Can I design a class that inherits from np.ndarray and maps it to
> the text file?
> It might work like this: the class maintains only a single row object
> and the total row count, i.e. the number of rows in the file. The row mapping may look like
> a row object <--- bind---> row ID in text file <--- bind---> function
> When a NumPy function is applied to this object, the actual data comes from the function
> row_reader(actual row ID).
> I have no idea how to code it, so may I get some help here designing such a
> class? Thanks!