[SciPy-user] organizing data

Gael Varoquaux gael.varoquaux@normalesup....
Wed Nov 21 09:54:57 CST 2007

Let's keep this on the mailing list, or else, we can switch to French
(not that I mind English).

On Wed, Nov 21, 2007 at 03:59:39PM +0100, Xavier Barthelemy wrote:
> So I think you did not understand my problem. It is a problem of
> organizing my data and cutting it into slices of rows.

OK, you're telling me you have already interpolated your data, and you
have all the values you need (i.e. regular grids in the different
directions you are interested in), but you just need to sort them out.

> My data comprises 18 values which depend on 12 parameters.

> So my data is laid out by rows:
> first the 12 parameters, then the 18 values.

> Each row represents a numerical experiment. I am doing a parameter
> exploration, so each parameter varies independently. But the
> discretization of the parameter space is not "regular": I have refined
> some parameters for some values of the others.

If you want to do an image plot, it will be much easier to have regular
data; that's where interpolation comes in. It looks to me like you want
an interpolation function f({P}) -> {V}, where {P} is your set of
parameters and {V} your set of values. When you want to plot the cut
along a hyperplane HP, you simply choose a regular grid on this
hyperplane and apply your interpolation function f to it. That's how I
would do it, if I understand your problem correctly.
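A minimal sketch of that approach, assuming SciPy's scipy.interpolate.griddata (one possible tool, my choice here) and made-up data with 3 parameters and a single value standing in for your 12 and 18. It builds a regular grid on the hyperplane p2 = 0.5 and evaluates a piecewise-linear interpolant of the irregular samples there:

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)

# Hypothetical irregular samples: 3 parameters {P} and 1 value {V}.
# The 8 cube corners are included so the convex hull covers the whole cube.
corners = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)],
                   dtype=float)
P = np.vstack([corners, rng.random((500, 3))])
V = P[:, 0] + 2 * P[:, 1] - P[:, 2]  # stand-in for one measured value

# Regular 20 x 30 grid on the hyperplane p2 = 0.5
p0, p1 = np.meshgrid(np.linspace(0.05, 0.95, 20), np.linspace(0.05, 0.95, 30))
cut = np.column_stack([p0.ravel(), p1.ravel(), np.full(p0.size, 0.5)])

# Piecewise-linear interpolation f({P}) -> V evaluated on the cut
v_cut = griddata(P, V, cut, method='linear').reshape(p0.shape)
```

v_cut can then go straight to an image plot, e.g. matplotlib's imshow(v_cut). With 12 parameters a Delaunay-based linear interpolant gets very expensive, so method='nearest' or a radial basis function interpolator may be more practical; that is an assumption worth testing on your data.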

> So my problem is really what I have (badly, I guess) explained: I would
> like to plot 2D (and 3D) graphs.
> Suppose that I have corresponding X, Y and Z data. Knowing that,
> how do you plot Y(X) with Z constant when you have a bunch of
> three-column data?

In 1D that's really easy: if D is your [X, Y, Z] array, I would do

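from numpy import argsort
from pylab import plot  # matplotlib's pylab interface

# D is your (N, 3) array with columns X, Y, Z; z0 is the Z value to keep
X = D[:, 0]
Y = D[:, 1]
Z = D[:, 2]

# Select all the data for Z = z0
x = X[Z == z0]
y = Y[Z == z0]

# Sort by x, reordering x and y together so the pairs stay matched
order = argsort(x)
x = x[order]
y = y[order]

plot(x, y)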

> You'll have to sort first by Z, and then by X. So when you plot
> that, for each X, sorted sequentially, you'll have the different Y
> for each Z value. Consequently you will have as many plots as there
> are distinct Z values.

OK, you want to do this for all values of Z.
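To do it for every value of Z at once, loop over the distinct Z values. A sketch, assuming D is an (N, 3) NumPy array with columns X, Y, Z; the function name curves_for_each_z and the data are made up for illustration:

```python
import numpy as np

def curves_for_each_z(D):
    """Split an (N, 3) array with columns X, Y, Z into one curve per distinct Z.

    Returns a dict {z0: (x, y)} where x is sorted and y is reordered to match.
    Exact equality on Z is fine here because Z takes a discrete set of values.
    """
    X, Y, Z = D[:, 0], D[:, 1], D[:, 2]
    curves = {}
    for z0 in np.unique(Z):
        sel = Z == z0
        order = np.argsort(X[sel])
        curves[z0] = (X[sel][order], Y[sel][order])
    return curves

# Small made-up data set: two Z levels, rows in arbitrary order
D = np.array([[2.0, 20.0, 1.0],
              [1.0, 5.0, 2.0],
              [1.0, 10.0, 1.0],
              [0.0, 0.0, 2.0],
              [3.0, 30.0, 1.0]])
curves = curves_for_each_z(D)
```

Each curve can then be plotted in a loop, calling plot(x, y) for each (x, y) in curves.values().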

> And my problem now is the generalization to 12 parameters; let's name
> them A to J. What do I do if I want to plot F(H)? The same: I will
> sort by each of the 10 parameters and finally by H, and I will have a
> family of plots.

Yes, you can do this nicely using an array with named fields (a
structured array) and the "order" argument of its sort method.

> But now I want to cut (slice) them to get each plot separately, so
> that I can plot them through the gnuplot interface.

You can always generalize the cutting method used in my example. If U, V,
W are parameters (similar to Z in the example above), you can define a
mask array:

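from numpy import dtype, empty

# U, V, W are parameter columns (like Z above); u0, v0, w0 the values to keep
mask = (U == u0) & (V == v0) & (W == w0)

# You don't really need x and y arrays, you could directly go to xy
x = X[mask]
y = Y[mask]
xy = empty(x.shape, dtype=dtype([('x', 'float'), ('y', 'float')]))
xy['x'] = x
xy['y'] = y

# Sort by x, breaking ties by y
xy.sort(order=['x', 'y'])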

Then xy['x'] and xy['y'] hold what you are interested in, if I
understand things correctly.
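Rather than writing one mask by hand per combination of fixed parameters, you can sort once and split wherever the fixed fields change. A sketch, assuming a structured array like xy above but with one field per parameter; split_curves and gnuplot_datasets are hypothetical helpers, not part of NumPy or of the gnuplot interface. The second one writes each curve as a separate gnuplot data set (data sets are separated by two blank lines), which gnuplot can then select with plot 'file' index i:

```python
from itertools import groupby

import numpy as np

def split_curves(data, fixed, x, y):
    """Yield (key, xs, ys) for each distinct combination of the `fixed` fields.

    `data` is an array with named fields; `fixed`, `x` and `y` are field names.
    """
    data = np.sort(data, order=list(fixed) + [x])

    def key(row):
        return tuple(float(row[f]) for f in fixed)

    for k, rows in groupby(data, key=key):
        rows = list(rows)
        yield k, np.array([r[x] for r in rows]), np.array([r[y] for r in rows])

def gnuplot_datasets(curves):
    """Format curves as gnuplot data sets separated by two blank lines."""
    blocks = []
    for k, xs, ys in curves:
        lines = ['# ' + ' '.join('%g' % v for v in k)]  # comment: fixed values
        lines += ['%g %g' % (a, b) for a, b in zip(xs, ys)]
        blocks.append('\n'.join(lines))
    return '\n\n\n'.join(blocks) + '\n'

# Made-up data: parameters A, B, H and value F
data = np.array([(1.0, 0.0, 0.3, 3.0), (0.0, 1.0, 0.2, 2.0),
                 (1.0, 0.0, 0.1, 1.0), (0.0, 1.0, 0.4, 4.0)],
                dtype=[('A', float), ('B', float), ('H', float), ('F', float)])
curves = list(split_curves(data, fixed=['A', 'B'], x='H', y='F'))
text = gnuplot_datasets(curves)
```

Writing text to a file gives one file holding the whole family of plots, one data set per curve.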

But I still think you need a regular grid: either your data already
has that structure, in which case I don't understand why it has been
"shuffled", or it doesn't, and you'll need to interpolate, so why bother
sorting? (Selecting might still be useful to reduce the number of points.)
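To check which case you are in, a sketch (is_full_grid and to_grid are hypothetical helpers): the rows form a full regular grid exactly when they are all distinct and their number equals the product of the number of distinct values of each parameter. If so, the values can be reshaped into an N-d array with no interpolation. Here P would be the parameter columns, e.g. D[:, :12], and V one value column:

```python
import numpy as np

def is_full_grid(P):
    """True if the rows of P (one row per experiment, one column per parameter)
    are exactly the Cartesian product of each column's distinct values."""
    n_per_axis = [len(np.unique(P[:, j])) for j in range(P.shape[1])]
    n_distinct_rows = len(np.unique(P, axis=0))
    return n_distinct_rows == len(P) == int(np.prod(n_per_axis))

def to_grid(P, V):
    """Reshape values V (one per row of P) into an N-d array indexed by the
    sorted distinct values of each parameter; assumes is_full_grid(P)."""
    shape = [len(np.unique(P[:, j])) for j in range(P.shape[1])]
    order = np.lexsort(P.T[::-1])  # column 0 is the primary sort key
    return V[order].reshape(shape)

# Made-up 2-parameter example on a 3 x 2 grid, rows shuffled
a, b = np.meshgrid([0.0, 1.0, 2.0], [10.0, 20.0], indexing='ij')
P = np.column_stack([a.ravel(), b.ravel()])
rng = np.random.default_rng(1)
P = P[rng.permutation(len(P))]
V = P[:, 0] * 100 + P[:, 1]
```

With a refined, non-regular parameter space like yours, is_full_grid will typically return False, which points back to interpolation.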


