[SciPy-user] handling of huge files for post-processing
Christoph,
Do you mean that b depends on the entire dataset a? In that case, you might
consider buying additional memory; this is often far cheaper in terms of
your time than trying to optimize the code.
What I mean by iterators is that when you open a binary file, you generally
have the possibility to iterate over each element in the file. For instance,
when reading an ASCII file:
    for line in f:
        pass  # some operation on the current line
instead of loading the whole file into memory:
    lines = f.readlines()
This way, only one line is kept in memory at a time. If you can write your
code in this manner, this might solve your memory problem. For instance,
here is a generator that opens two files and yields the current line of
each file each time its next() method is called:
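    def read():
        # Open both files; the with block closes them when the generator ends.
        with open('filea', 'r') as a, open('fileb', 'r') as b:
            la = a.readline()
            lb = b.readline()
            # readline() returns '' at end of file, so stop when either runs out.
            while la and lb:
                yield la, lb
                la = a.readline()
                lb = b.readline()

    for a, b in read():
        pass  # some operation on a, b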
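For a raw binary file the same idea might look like the sketch below, which
reads a fixed number of values at a time instead of one line. The names
iter_chunks and running_sum, the 'd' (8-byte float) typecode, and the chunk
size are all assumptions for illustration; adapt them to your actual file
layout. It also assumes the file holds a whole number of items.

```python
from array import array


def iter_chunks(path, typecode='d', chunk_items=1024):
    """Yield arrays of at most chunk_items values read from a raw binary file."""
    itemsize = array(typecode).itemsize
    with open(path, 'rb') as f:
        while True:
            data = f.read(chunk_items * itemsize)
            if not data:
                break  # end of file reached
            chunk = array(typecode)
            # Assumes the file length is a multiple of itemsize.
            chunk.frombytes(data)
            yield chunk


def running_sum(path):
    """Example reduction that only ever holds one chunk in memory."""
    total = 0.0
    for chunk in iter_chunks(path):
        total += sum(chunk)
    return total
```

Only one chunk is in memory at any time, so the peak memory use is set by
chunk_items rather than by the size of the file.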
HTH,
David
2008/2/26, Christoph Scheit <Christoph.Scheit@lstm.uni-erlangen.de>:
>
> Hello David,
>
> I guess that everything is kept in memory... but I don't
> know how to handle this problem using iterators. Can
> you give me some more detail? You read your files
> all at once?
>
> One problem is, that, let's assume I have three files
> a, b and c, then
> b depends on data from a
> c depends on data from b (and maybe from a, but
> this might be not the case in 99%)
> This is due to differences in signal runtime...
>
> christoph
