[SciPy-user] handling of huge files for post-processing

Christoph Scheit Christoph.Scheit@lstm.uni-erlangen...
Tue Feb 26 03:01:05 CST 2008


Hello David,

I guess that everything is kept in memory, but I don't
know how to handle this problem using iterators. Can
you give me some more detail? Do you read your files
all at once?

One problem is that, assuming I have three files
a, b and c,
b depends on data from a, and
c depends on data from b (and maybe on data from a, though
in 99% of cases it will not).
This is due to differences in signal runtime.
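
A minimal sketch of one way to read this dependency, not code from the thread: the files are processed in order (a, then b, then c), and only a small result is carried from one file to the next, so no more than one file is streamed at a time. The names `process_chain` and `add_to_totals`, and the (time_step, pressure) record shape, are assumptions made up for the illustration. The actual nature of the dependency is not stated above.

```python
def process_chain(sources, step, initial=None):
    """Run `step` over each source in dependency order.

    `sources` is an iterable of record iterables (a, then b, then c).
    `step(carry, records)` returns the result that the next source needs.
    """
    carry = initial
    for records in sources:
        # `records` may be a generator, so one file is streamed at a time.
        carry = step(carry, records)
        yield carry


def add_to_totals(carry, records):
    """Example step: extend the totals inherited from the previous file."""
    totals = dict(carry or {})
    for time_step, pressure in records:
        totals[time_step] = totals.get(time_step, 0.0) + pressure
    return totals


if __name__ == "__main__":
    a = [(1, 0.5), (2, 1.0)]
    b = [(2, 0.25), (3, 2.0)]
    for result in process_chain([a, b], add_to_totals):
        print(result)
```

If c also needs data from a, the carried value has to keep whatever part of a is still required, which is the rare case mentioned above.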

christoph

------------------------------

Message: 4
Date: Mon, 25 Feb 2008 09:53:31 -0500
From: "David Huard" <david.huard@gmail.com>
Subject: Re: [SciPy-user] handling of huge files for post-processing
To: "SciPy Users List" <scipy-user@scipy.org>
Message-ID:
	<91cf711d0802250653g652df1f9mdd9aaa5adf869bc5@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi Christoph,

I am not sure exactly what causes your method to fail but it might be that
you are trying to hold all the arrays in memory at once. Can you do your
calculation using iterators/generators? The idea is to load into memory
only the part of the array that you need for a given calculation, store the
result and continue iterating.  I used to process ~2GB files using iterators
from PyTables tables and it worked smoothly.
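
A minimal sketch of the iterator idea, using only the Python standard library rather than PyTables. It is an illustration, not code from this thread. It assumes each record is two 4-byte integers and one 8-byte float, little-endian, with no Fortran record markers in between. It also assumes one file per CPU and that records are summed by time step only. The names `RECORD`, `iter_records` and `sum_over_cpus` are hypothetical, and the format string would need adjusting to the real files.

```python
import struct
from collections import defaultdict

# Assumed layout: TimeStep (int32), DebugInfo (int32), AcousticPressure
# (float64), little-endian, no padding and no Fortran record markers.
RECORD = struct.Struct("<iid")


def iter_records(path, chunk_records=65536):
    """Yield (time_step, debug_info, pressure) tuples, reading in chunks."""
    chunk_bytes = RECORD.size * chunk_records
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk_bytes)
            if not buf:
                break
            # Ignore a trailing partial record (for example a truncated file).
            usable = len(buf) - len(buf) % RECORD.size
            yield from RECORD.iter_unpack(buf[:usable])


def sum_over_cpus(paths):
    """Sum the pressure per time step across the per-CPU files."""
    totals = defaultdict(float)
    for path in paths:
        for time_step, _debug, pressure in iter_records(path):
            totals[time_step] += pressure
    return dict(totals)
```

Only one chunk of one file is held in memory at a time. The totals still grow with the number of distinct time steps, which should be far smaller than the number of records across all files.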

David


2008/2/25, Christoph Scheit <Christoph.Scheit@lstm.uni-erlangen.de>:
>
> Hello everybody,
>
> A Fortran CFD code gives me binary files containing
> the acoustic pressure at some distinct points.
> Each file has N "lines" (records) which look like this:
>
> TimeStep(int) DebugInfo (int) AcousticPressure(float)
>
> The file is binary. My problem is that a file can be
> huge (> 100 MB) and that after several runs on a cluster
> there is not just one but 20 to 50 files of that size
> to be post-processed.
>
> Since the CFD code runs in parallel, I have to sum up
> the results from the different CPUs (CPU 1 calculates only
> a fraction of the acoustic pressure at point p and time step
> t, so I have to sum over all CPUs).
>
> Currently I'm reading all the data into an sqlite table, then
> I group the data, summing over the processors, and
> then I write out files containing the data of the single
> points. This approach works somehow for smaller files,
> but does not seem to work for big files like those described
> above.
>
> Do you have some ideas on this problem? Thank you very
> much in advance,
>
> Christoph
> _______________________________________________
> SciPy-user mailing list
> SciPy-user@scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
>

------------------------------

Message: 5
Date: Mon, 25 Feb 2008 15:58:13 +0100
From: Johann Cohen-Tanugi <cohen@slac.stanford.edu>
Subject: Re: [SciPy-user] order in profiles and packages
To: SciPy Users List <scipy-user@scipy.org>
Message-ID: <47C2D785.9090405@slac.stanford.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

My apologies, this was the wrong list. I submitted it to the ipython list.
Johan


------------------------------

Message: 6
Date: Mon, 25 Feb 2008 17:14:27 +0100
From: "Shane Legg" <shane@vetta.org>
Subject: [SciPy-user] Bug in matplotlib plot_wireframe?
To: scipy-user@scipy.org
Message-ID:
	<d13d7ef40802250814v77ec0acbtfbf54f7e7e5c20db@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I'm new here so if this isn't the right place to ask just let
me know where I should head.  Thanks.

I think there is a significant bug in plot_wireframe in matplotlib
where it incorrectly displays the Z axis values.  The code below
demonstrates the problem:


import scipy
import pylab as p
import matplotlib.axes3d as p3
from numpy import *

"""
# If you do a wire frame of the following, the graph is correct:
Z = scipy.array(
[[ 0.52,  0.00020],
 [ 0.45,  0.00018],
 [ 0.34,  0.00016]] )
"""

# but if you put negative signs in:
Z = scipy.array(
[[ -0.52,  -0.00020],
 [ -0.45,  -0.00018],
 [ -0.34,  -0.00016]] )

"""
 the graph displays:
[[ -0.62, -0.10020 ],
 [ -0.55, -0.10018 ],
 [ -0.44, -0.10016 ]]
"""

X, Y = meshgrid(arange(0, 3, 1.0), arange(0, 4, 1.0))

fig = p.figure()
ax = p3.Axes3D(fig)
ax.plot_wireframe(X, Y, Z)

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')

p.show()


I'm running Ubuntu 7.10 x64 with python 2.5.1-1ubuntu2 and
python-scipy 0.5.2-9ubuntu4 both installed from the .deb files.
I sent the above code to somebody with a 32bit Linux system
and they had the same problem.

Any help appreciated!

Cheers
Shane

------------------------------

Message: 7
Date: Mon, 25 Feb 2008 10:53:22 -0600
From: "Robert Kern" <robert.kern@gmail.com>
Subject: Re: [SciPy-user] Bug in matplotlib plot_wireframe?
To: shane@vetta.org, "SciPy Users List" <scipy-user@scipy.org>
Message-ID:
	<3d375d730802250853j112bb67ah84847faef07b1255@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

On Mon, Feb 25, 2008 at 10:14 AM, Shane Legg <shane@vetta.org> wrote:
> Hi,
>
> I'm new here so if this isn't the right place to ask just let
> me know where I should head.  Thanks.

The appropriate matplotlib list is here:

  https://lists.sourceforge.net/lists/listinfo/matplotlib-users

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


------------------------------



End of SciPy-user Digest, Vol 54, Issue 48
******************************************


