# [SciPy-user] organizing data

Xavier Barthelemy xavier.barthelemy@cmla.ens-cachan...
Wed Nov 21 10:24:10 CST 2007

```
Gael Varoquaux wrote:
> Let's keep this on the mailing list, or else, we can switch to French
> (not that I mind English).

>
> On Wed, Nov 21, 2007 at 03:59:39PM +0100, Xavier Barthelemy wrote:
>> So I think you did not understand my problem. It's a problem of
>> organizing and cutting by slice of rows in my data.
>
>
> OK, you're telling me you already have interpolated your data, and you
> have all the values you need (ie regular grids in the different
> directions you are interested in), but you just need to sort them out.
>

No, I did not interpolate my data, because it is not possible. I am
looking for sub-critical instabilities in binary, unsteady, miscible,
compressible viscous fluids, so each numerical experiment is the result
of hours (or, for some, days) of computation on a high-performance
parallel computer.

>> My data comprises 18 values which depend on 12 parameters.
>
>> So my data looks like this:
>> by rows:
>> first the 12 parameters, then the 18 values.
>
>> Each row represents a numerical experiment. I am doing parameter
>> exploration, so each parameter varies independently. But the
>> discretization of the parameter space is not "regular": I have refined
>> some parameters for some values of the other parameters.
>
> If you want to do an image plot, it will be easier to have regular
> data, that's where interpolation comes in. It looks to me like you want
> to have an interpolation function f({P}) -> {V} where {P} is your set of
> parameters and {V} your set of values. When you want to plot the cut
> along a hyperplane HP, you simply choose a regular grid on this
> hyperplane, and apply your interpolation function f on it. That's how I
> would do it, if I understand your problem correctly.

Yes, you're right, I would do it like that, but I can't. Too much
computation. I am refining some zones of the parameter space independently.

>
>> So my problem is really what I have (badly, I guess) explained: I would
>> like to plot 2D (and 3D) graphs.
>> Let's suppose that I have corresponding X, Y and Z data. Knowing that,
>> how do you plot Y(X) with Z constant when all you have is three
>> columns of data?
>
> In 1D that's really easy: if D is your [X, Y, Z] array, I would do
>
> X = D[:, 0]
> Y = D[:, 1]
> Z = D[:, 2]
>
> # Select all the data for Z = z0
> x = X[Z==z0]
> y = Y[Z==z0]
>
> # Sort, just to make things prettier
> y = y[argsort(x)]
> x.sort()
>
> plot(x, y)

I will try it. Maybe I was overcomplicating it.
>
>> You'll have to sort first by Z, and then by X. So when you plot
>> that, for each "X", sequentially sorted, you'll have the different "Y"
>> for each "Z" value. Consequently you will have as many plots as there
>> are different Z values.
>
> OK, you want to do this for all values of Z.
>
>> And my problem now is the generalization to 12 parameters, let's name
>> them from A to L. What do I do if I want to plot F(H)? The same: I will
>> sort by each of the 10 other parameters and finally by H, and I will
>> have a family of plots.
>
> Yes, you can do this in a nice way using an array with fields and the
> "order" argument to sort.
>
>> But now I want to cut (slice) them to have each plot independently, so
>> that I can plot them through the gnuplot interface.
>
> You can always generalize the cutting method used in my example. If U, V,
> W are parameters (similar to Z in the example above), you can define a
> mask and select with it:
>
> mask = (U == u0) & (V == v0) & (W == w0)
> x = X[mask]
> y = Y[mask]
>
> # You don't really need x and y arrays, you could directly go to xy
> xy = empty(x.shape, dtype=dtype([('x','float'), ('y','float')]))
> xy['x'] = x
> xy['y'] = y
>
> xy.sort(order=('x', 'y'))
>
> then you have in xy['x'] and xy['y'] what you are interested in, if I
> understand things right.

sounds good
>
> But I still think you need a regular grid, so either your data already
> has that structure, and I don't understand why it has been "shuffled", or
> it hasn't, and you'll need interpolating, so why bother sorting?
> (selecting might be useful to reduce the amount of points).

They are "shuffled" because I did not save a database with the logical
links. So if I want the 7th column as a function of the 5th, they are
"shuffled".

Thanks for your help, I'll try it right now.

cheers
Xavier

```
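
The mask-based cutting discussed in the thread can be sketched for an arbitrary number of fixed parameters roughly as follows. This is only an illustration, not code from either poster: the helper name `curves_by_group`, the column layout, and the sample rows are all made up, and it assumes the fixed parameters take exactly repeated values (no floating-point noise), so that `==` is a safe comparison.

```python
import numpy as np

def curves_by_group(data, x_col, y_col, fixed_cols):
    """Return one (x, y) curve per distinct combination of the values
    found in fixed_cols, with each curve sorted by x.

    Hypothetical helper written for this illustration."""
    data = np.asarray(data, dtype=float)
    curves = {}
    # Each row of `keys` is one distinct combination of the fixed parameters
    keys = np.unique(data[:, fixed_cols], axis=0)
    for key in keys:
        # Keep the rows whose fixed columns all equal this combination
        mask = np.all(data[:, fixed_cols] == key, axis=1)
        x = data[mask, x_col]
        y = data[mask, y_col]
        # Reorder both columns by increasing x
        order = np.argsort(x, kind="stable")
        curves[tuple(key.tolist())] = (x[order], y[order])
    return curves

# Made-up rows in shuffled order; the columns are X, Y, Z
D = np.array([
    [2.0, 20.0, 0.5],
    [1.0, 11.0, 1.5],
    [1.0, 10.0, 0.5],
    [3.0, 31.0, 1.5],
    [3.0, 30.0, 0.5],
    [2.0, 21.0, 1.5],
])

# One Y(X) curve per value of Z
curves = curves_by_group(D, x_col=0, y_col=1, fixed_cols=[2])
for key, (x, y) in curves.items():
    print(key, x, y)
```

With 12 parameter columns, `fixed_cols` would list the ten columns held constant, and each resulting curve could then be handed to whatever plotting interface is in use.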
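
The suggestion in the thread to use an array with fields and the `order` argument of `sort` might look like the sketch below. The field names (`u`, `x`, `y`) and the sample rows are invented for illustration; the point is only that after a multi-key sort, rows sharing the same fixed parameter are contiguous and can be split into separate curves.

```python
import numpy as np

# Made-up rows in shuffled order; the columns are a fixed parameter u,
# the abscissa x, and a value y
raw = np.array([
    [1.0, 3.0, 30.0],
    [0.0, 2.0, 2.0],
    [1.0, 1.0, 10.0],
    [0.0, 1.0, 1.0],
    [1.0, 2.0, 20.0],
    [0.0, 3.0, 3.0],
])

# Copy the columns into an array with named fields
table = np.empty(raw.shape[0],
                 dtype=[("u", float), ("x", float), ("y", float)])
table["u"] = raw[:, 0]
table["x"] = raw[:, 1]
table["y"] = raw[:, 2]

# Sort by u first, then by x within each value of u
table.sort(order=["u", "x"])

# Rows with equal u are now contiguous: find where each group starts
# and cut the table there
u_values, starts = np.unique(table["u"], return_index=True)
groups = np.split(table, starts[1:])

for u, group in zip(u_values, groups):
    print(u, group["x"], group["y"])
```

Compared with building one boolean mask per combination, this sorts once and then slices, which may be more convenient when every curve of the family is wanted.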