[Numpy-discussion] Re: [SciPy-user] Table like array

Niklas Volbers Mithrandir42 at web.de
Wed Mar 1 03:41:02 CST 2006


[[Oops, I accidentally sent my post to scipy-users, but it was intended for numpy-discussion, so here's another attempt]]


Hey Michael,

take a look at my attempt at such a Table implementation!

The newest release 0.5.2 of my plotting project

http://developer.berlios.de/projects/sloppyplot

contains a Table class (in Sloppy.Base.dataset) which wraps a heterogeneous numpy array. The class should be fairly self-documenting, at least I hope so. Don't get confused by the 'undolist' stuff; this is my private undo implementation, which could easily be removed from the code.

If you want a similar implementation using a list of 1-dimensional arrays, then download the previous release 0.5.1 (which uses Numeric).

The reason I switched to the heterogeneous approach is that it makes it easier to provide similar wrappers for 2-d table-like data (using a 2-d heterogeneous array) and for 2-d matrix-like data (using a 2-d homogeneous array). With a list of arrays, accessing a row is awkward, because you are in charge of creating a new 1-d array that represents the row, while with the heterogeneous array you can access both the columns and the rows quite naturally.
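To illustrate the difference in row access, here is a minimal sketch using a 1-d structured (heterogeneous) numpy array. The field names and values are made up for the example and are not taken from the SloppyPlot code.

```python
import numpy as np

# Hypothetical three-column table; field names are illustrative only.
dtype = np.dtype([("name", "U10"), ("x", "f8"), ("n", "i4")])
table = np.array([("a", 1.5, 10), ("b", 2.5, 20), ("c", 3.5, 30)], dtype=dtype)

# Heterogeneous array: columns by field name, rows by integer index.
col_x = table["x"]      # 1-d float array
row_1 = table[1]        # one record holding all three fields

# List-of-columns layout: a row has to be assembled by hand.
columns = [table["name"].copy(), table["x"].copy(), table["n"].copy()]
row_1_manual = tuple(col[1] for col in columns)
```

With the list of columns, every row access repeats that manual gathering step, while the structured array hands back a record directly.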

By the way, I had first planned to subclass ndarray, but I did not know how to resize the array while keeping the same array object alive. This is why I wrapped the array in a class called 'Dataset', whose instances you can treat as constant.
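The wrapper idea can be sketched roughly as follows. This is not the SloppyPlot 'Dataset' class; the class name `DatasetSketch` and its `resize` method are assumptions made for the illustration.

```python
import numpy as np


class DatasetSketch:
    """Hypothetical wrapper: the object stays the same, the array inside may be swapped."""

    def __init__(self, array):
        self.array = array

    def resize(self, nrows):
        # Allocate a new array of the requested length and copy the overlapping rows.
        new = np.zeros(nrows, dtype=self.array.dtype)
        n = min(nrows, len(self.array))
        new[:n] = self.array[:n]
        self.array = new  # everyone holding the wrapper now sees the new array


ds = DatasetSketch(np.array([(1.0, 2)], dtype=[("x", "f8"), ("n", "i4")]))
alias = ds            # a second reference, e.g. one held by a plot
ds.resize(3)
```

Other objects keep a reference to the wrapper rather than to the array, so replacing the array on resize does not invalidate them.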

If you need some more help on this, feel free to ask,

Niklas Volbers.





Travis Oliphant wrote on 01.03.06 08:16:15:
>
> Michael Sorich wrote:
>
> > Hi,
> >
> > I am looking for a table-like array. Something like a 'data frame'
> > object to those familiar with the statistical languages R and Splus.
> > This is mainly to hold and manipulate 2D spreadsheet-like data, which
> > tends to be of relatively small size (compared to what many people
> > seem to use numpy for), heterogeneous, have column and row names, and
> > often contains missing data.
>
> You could subclass the ndarray to produce one of these fairly easily, I
> think. The missing data item could be handled by a mask stored along
> with the array (or even in the array itself). Or you could use a masked
> array as your core object (though I'm not sure how it handles the
> arbitrary (i.e. record-like) data-types yet).
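
The two ways of handling missing data mentioned above can be sketched for a single numeric column as follows. The values and the mask are made up for the example; masked arrays over record-like data types are left out, since their behaviour is the open question here.

```python
import numpy as np

# Option 1: a plain array plus a boolean mask kept next to it.
values = np.array([1.0, 2.0, 0.0, 4.0])
missing = np.array([False, False, True, False])   # True marks a missing entry
mean_manual = values[~missing].mean()

# Option 2: a masked array that carries the same mask itself.
column = np.ma.masked_array(values, mask=missing)
mean_masked = column.mean()      # masked entries are skipped
n_valid = column.count()         # number of unmasked entries
```
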
>
> Alternatively, and probably the easiest way to get started, you could
> just create your own table-like class and use simple 1-d arrays or 1-d
> masked arrays for each of the columns --- This has always been a way to
> store record-like tables.
>
> It really depends what you want the data-frames to be able to do and
> what you want them to "look-like."
>
> > A RecArray seems potentially useful, as it allows different fields to
> > have different data types and holds the name of the field. However it
> > doesn't seem easy to manipulate the data. Or perhaps I am simply
> > having difficulty finding documentation on its features.
>
> Adding a new column/field means basically creating a new array with a
> new data-type and copying data over into the already-defined fields.
> Data-types always have a fixed number of bytes per item. What those
> bytes represent can be quite arbitrary but it's always fixed. So, it
> is always "more work" to insert a new column. You could make that
> seamless in your table class so the user doesn't see it though.
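
A minimal sketch of that copy-into-a-new-data-type step, assuming a structured array as the storage. The helper name `add_field` and the example fields are hypothetical.

```python
import numpy as np


def add_field(arr, name, fmt, values):
    """Return a copy of structured array `arr` with one extra field (hypothetical helper)."""
    # New data-type: all existing fields, then the new one.
    new_dtype = np.dtype([(n, arr.dtype[n]) for n in arr.dtype.names] + [(name, fmt)])
    out = np.empty(arr.shape, dtype=new_dtype)
    for n in arr.dtype.names:
        out[n] = arr[n]          # copy the already-defined fields
    out[name] = values           # fill the new column
    return out


old = np.array([(1.0, 2), (3.0, 4)], dtype=[("x", "f8"), ("n", "i4")])
new = add_field(old, "flag", "?", [True, False])
```

The original array is left untouched; a table class could hide this by swapping in the returned array.
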
>
> You'll want to thoroughly understand the dtype object, including its
> attributes and methods, particularly the fields attribute of the dtype
> object.
>
> > eg
> > adding a new column/field (and to a lesser extent a new row/record) to
> > the recarray
>
> Adding a new row or record is actually similar because once an array is
> created it is usually resized by creating another array and copying the
> old array into it in the right places.
>
> > Changing the field/column names
> > make a new table by selecting a subset of fields/columns. (you can
> > select a single field/column, but not multiple).
>
> Right. So far you can't select multiple columns. It would be possible
> to add this feature with a little bit of effort if there were a strong
> demand for it, but it would be much easier to do it in your subclass
> and/or container class.
>
> How many people would like to see x['f1','f2','f5'] return a new array
> with a new data-type descriptor constructed from the provided fields?
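
In a container class, the multi-field selection described above could look roughly like this. The helper name `select_fields` is hypothetical, and the result is a copy with a freshly built data-type. Later NumPy releases also accept a list of field names directly, as in x[['f1', 'f5']].

```python
import numpy as np


def select_fields(arr, names):
    """Copy the named fields of `arr` into a new structured array (hypothetical helper)."""
    new_dtype = np.dtype([(n, arr.dtype[n]) for n in names])
    out = np.empty(arr.shape, dtype=new_dtype)
    for n in names:
        out[n] = arr[n]
    return out


x = np.zeros(2, dtype=[("f1", "i4"), ("f2", "f8"), ("f5", "U3")])
x["f1"] = [1, 2]
x["f5"] = ["ab", "cd"]
sub = select_fields(x, ("f1", "f5"))
```
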
>
> > It would also be nice for the table to be able to deal easily with
> > masked data (I have not tried this with recarray yet) and perhaps also
> > to be able to give the rows/records unique ids that could be used to
> > select the rows/records (in addition to the row/record index), in the
> > same way that the fieldnames can select the fields.
>
> Adding fieldnames to the "rows" is definitely something that a subclass
> would be needed for. I'm not sure how you would even propose to select
> using row names. Would you also use getitem semantics?
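
One possible answer, sketched under the assumption that row labels are unique and hashable: keep a dictionary from label to row position in a wrapper class. The class name `LabeledRows` and its `row` method are made up for the illustration.

```python
import numpy as np


class LabeledRows:
    """Hypothetical wrapper mapping unique row labels to positions in a structured array."""

    def __init__(self, data, labels):
        if len(labels) != len(data) or len(set(labels)) != len(labels):
            raise ValueError("need one unique label per row")
        self.data = data
        self._pos = {label: i for i, label in enumerate(labels)}

    def row(self, label):
        # Label-based row lookup, kept separate from field lookup by name.
        return self.data[self._pos[label]]

    def __getitem__(self, key):
        # Integers and field names pass straight through to the array.
        return self.data[key]


t = LabeledRows(
    np.array([(1.0, 2), (3.0, 4)], dtype=[("x", "f8"), ("n", "i4")]),
    ["r1", "r2"],
)
```

Using a separate method instead of getitem avoids the ambiguity between a row label and a field name when both are strings.
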
>
> > Can anyone comment on this issue? Particularly whether code exists for
> > this purpose, and if not ideas about how best to go about developing
> > such a table-like array (this would need to be limited to Python
> > programming as my ability to program in C is very limited).
>
> I don't know of code that already exists for this, but I don't think it
> would be too hard to construct your own data-frame object.
>
> I would probably start with an implementation that just used standard
> arrays of a particular type to represent the internal columns and then
> handle the indexing by overriding the __getitem__ and
> __setitem__ special methods. This would be the easiest to get working,
> I think.
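
A minimal sketch of such a column-based container, assuming string keys select columns and everything else is treated as a row index. The class name `ColumnTable` and this indexing convention are assumptions, not an existing API.

```python
import numpy as np


class ColumnTable:
    """Hypothetical table storing one 1-d array per column, in insertion order."""

    def __init__(self, **columns):
        self._cols = {}
        for name, values in columns.items():
            self[name] = values

    def __setitem__(self, name, values):
        values = np.asarray(values)
        if self._cols and len(values) != len(self):
            raise ValueError("column length mismatch")
        self._cols[name] = values

    def __getitem__(self, key):
        if isinstance(key, str):
            return self._cols[key]                                   # one column
        return {name: col[key] for name, col in self._cols.items()}  # one row

    def __len__(self):
        return len(next(iter(self._cols.values()))) if self._cols else 0


t = ColumnTable(x=[1.0, 2.0], label=["a", "b"])
t["n"] = [10, 20]       # adding a column is just another dictionary entry
row = t[1]
```

Adding a column is cheap in this layout, while each row access has to gather one element from every column.
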
>
> -Travis
