[Numpy-discussion] can I mapping a np.darray class with a text file instead of reading the file in to mem?

Sebastian Haase seb.haase@gmail....
Mon Oct 4 02:59:05 CDT 2010


Hi,

If you have 3 columns of 10,000,000 lines, that adds up to 30
million numbers. That is 240 MB in double precision and 120 MB in
single precision.
That should not require a 64-bit OS.
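
The arithmetic can be spelled out in a few lines. This is only a
sketch: the helper name `array_megabytes` is made up for illustration,
and MB here means 10^6 bytes.

```python
import numpy as np


def array_megabytes(n_rows, n_cols, dtype):
    """Size in MB (10**6 bytes) of a dense n_rows x n_cols array of dtype."""
    n_values = n_rows * n_cols
    return n_values * np.dtype(dtype).itemsize / 1e6


# 3 columns of 10,000,000 lines, as in the original question
for dtype in (np.float64, np.float32):
    size = array_megabytes(10_000_000, 3, dtype)
    print(f"{np.dtype(dtype).name}: {size:.0f} MB")
```
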
You probably have a problem because reading from text uses extra
memory. Can you not convert the file line by line into a second,
binary file?
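
A minimal sketch of such a conversion, assuming whitespace-separated
columns and a fixed column count. The function names `text_to_binary`
and `open_binary` and the file names in the usage comment are made up
for illustration. Only one row is held in memory while converting, and
the result is opened with `np.memmap`, so the data stays on disk until
it is actually touched.

```python
import numpy as np


def text_to_binary(text_path, binary_path, n_cols=3, dtype=np.float64):
    """Stream a whitespace-separated text file into a raw binary file.

    Returns the number of rows written.
    """
    n_rows = 0
    with open(text_path) as src, open(binary_path, "wb") as dst:
        for line in src:
            fields = line.split()
            if not fields:  # skip blank lines
                continue
            if len(fields) != n_cols:
                raise ValueError(f"expected {n_cols} columns, got {len(fields)}")
            row = np.array([float(f) for f in fields], dtype=dtype)
            dst.write(row.tobytes())  # raw bytes, row after row
            n_rows += 1
    return n_rows


def open_binary(binary_path, n_cols=3, dtype=np.float64):
    """Map the binary file read-only as an (n_rows, n_cols) array."""
    flat = np.memmap(binary_path, dtype=dtype, mode="r")
    return flat.reshape(-1, n_cols)


# Usage (hypothetical file names):
#   text_to_binary("data.txt", "data.bin")
#   data = open_binary("data.bin")
#   print(data[:5].mean(axis=0))
```

The raw file carries no header, so the dtype and the column count have
to be known again when it is opened.
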
Otherwise, you might want to look into the "appendable" ndarray that
Chris Barker wrote about on this list not too long ago.
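
The general idea of an appendable array can be sketched as follows.
This is not Chris Barker's implementation, only a minimal illustration
of one common approach (a preallocated buffer that doubles when full);
the class name `GrowableRows` and its methods are made up for this
example.

```python
import numpy as np


class GrowableRows:
    """Fixed-width rows stored in a buffer that doubles when it fills up."""

    def __init__(self, n_cols=3, dtype=np.float64, capacity=1024):
        self._buf = np.empty((max(1, capacity), n_cols), dtype=dtype)
        self._n = 0  # number of rows actually filled

    def append(self, row):
        if self._n == self._buf.shape[0]:
            # Buffer is full: allocate twice the capacity and copy over.
            bigger = np.empty((2 * self._buf.shape[0], self._buf.shape[1]),
                              dtype=self._buf.dtype)
            bigger[:self._n] = self._buf
            self._buf = bigger
        self._buf[self._n] = row
        self._n += 1

    def __len__(self):
        return self._n

    def view(self):
        """Return the filled part of the buffer (a view, not a copy)."""
        return self._buf[:self._n]


rows = GrowableRows(capacity=2)
for line in ["0.5 0.25 0.125", "1.0 2.0 3.0", "4.0 5.0 6.0"]:
    rows.append([float(f) for f in line.split()])
print(len(rows), rows.view().shape)
```

While the buffer is being grown, the old and the new buffer exist at
the same time, so peak memory is temporarily higher than the final
array size.
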
And you might want to read this post:
http://old.nabble.com/Memory-usage-of-numpy-arrays-td29107053.html

Cheers,
- Sebastian Haase


On Sat, Oct 2, 2010 at 5:23 PM, kee chen <keekychen.shared@gmail.com> wrote:
> Dear All,
>
> I have a memory problem reading data from a text file into an np.ndarray,
> because my PC has little memory and the data is too big.
> The data is stored as 3 columns of text and may have 10000000 records that
> look like this:
>
> 0.64984279 0.587856227 0.827348652
> 0.33463377 0.210916859 0.608797746
> 0.230265156 0.390278562 0.186308355
> 0.431187207 0.127007937 0.949673389
> ...
>
> 10000000 LINES OMITTED HERE
> ...
> 0.150027782 0.800999655 0.551508963
> 0.255163742 0.785462049 0.015694154
>
>
> After googling, I found 3 ways that may solve this problem:
>     1. hardware upgrade (more memory, moving to x64, ...)
>     2. filtering the data before processing
>     3. using PyTables
>
> However, I am trying to think of another possibility: a memory-time trade-off.
>
> Can I design a class that inherits from np.ndarray and maps onto
> the text file?
> It might work like this: the class keeps only a single row object
> and the total row count of the file in memory. The row mapping might look
> like this:
>
> a row object   <--- bind--->   row ID in text file  <--- bind---> function
> row_eader()
>
> When a NumPy function is applied to this object, the actual data comes from
> the function row_eader(actual row ID).
>
> I have no idea how to code it, so may I get help here designing such a
> class? Thanks!
>
>
> Rgs,
>
> KC