[Numpy-discussion] [ANN] New open source project for labeled arrays

josef.pktd@gmai... josef.pktd@gmai...
Wed Jan 27 21:44:45 CST 2010


On Wed, Jan 27, 2010 at 10:13 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
> On Jan 27, 2010, at 9:10 PM, Keith Goodman wrote:
>> I recently open sourced one of my packages. It is a labeled array
>> that I call larry.
>>
>> A two-dimensional larry, for example, contains a 2d NumPy array with
>> labels on each row and column. A larry can have any dimension.
>>
>> Alignment by label is automatic when you add (or subtract, multiply,
>> divide) two larrys.
>>
>> larry has built-in methods such as movingsum, ranking, merge, shuffle,
>> zscore, demean, lag as well as typical NumPy methods like sum, max,
>> std, sign, clip. NaNs are treated as missing data.
>
> So you can't have an integer larry with missing data?
>
>> You can archive larrys in HDF5 format using save and load or using a
>> dictionary-like interface.
>>
>> I'm working towards a 0.1 release. In the meantime, comments,
>> suggestions, critiques are all appreciated.
>>
>
> I'll have to check it (hopefully I'll have a bit more time in the next couple of weeks), but what are the main differences/advantages of using your approach compared to pandas or tabular ?

As a very simplified characterization, my impression is that they all
try to do the same thing in different ways and with different emphasis:
pandas is (not only) a dictionary, tabular is built on structured arrays,
and larry is an nd array by delegation. As far as I understand, all of
them support generic axis labels and all use nan for missing values.
scikits.timeseries has more elaborate time support and is based on
masked arrays.
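
To make "nd array by delegation" with nan as the missing value a bit more
concrete, here is a minimal 1d sketch. The class name LabeledArray and its
methods are hypothetical and are not the larry API. It also assumes that
alignment takes the union of the labels and fills the gaps with nan; an
actual package may instead keep only the labels that both operands share.

```python
import numpy as np


class LabeledArray:
    """Hypothetical 1d labeled array for illustration; NOT the larry API."""

    def __init__(self, values, labels):
        # Float storage so that NaN can mark missing entries.
        self.x = np.asarray(values, dtype=float)
        self.labels = list(labels)
        if self.x.shape != (len(self.labels),):
            raise ValueError("need exactly one label per element")

    def _reindex(self, new_labels):
        # Place existing values at their new positions; the rest stay NaN.
        out = np.full(len(new_labels), np.nan)
        pos = {lab: i for i, lab in enumerate(self.labels)}
        for j, lab in enumerate(new_labels):
            if lab in pos:
                out[j] = self.x[pos[lab]]
        return out

    def __add__(self, other):
        # Assumption: align on the sorted union of both label sets.
        union = sorted(set(self.labels) | set(other.labels))
        return LabeledArray(self._reindex(union) + other._reindex(union), union)

    def sum(self):
        # Delegate the reduction to NumPy, skipping NaNs (missing data).
        return float(np.nansum(self.x))


a = LabeledArray([1.0, 2.0, 3.0], ["a", "b", "c"])
b = LabeledArray([10.0, 20.0], ["b", "d"])
c = a + b  # only label "b" is present in both operands
```

Because plain integer arrays have no nan, the sketch always stores floats,
which is the limitation Pierre's question about integer data points at.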

getitem and slicing work differently in each of them. Another version
of a labeled array is
http://github.com/fperez/datarray/blob/master/datarray.py

I'm trying to work on an example of how we can move our data between
the different implementations, because depending on the task one
implementation or another might be more convenient. And there seems to
be enough compatibility to do it without loss of information.
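
As a rough illustration of the idea, using only plain NumPy and Python
containers rather than the actual pandas, tabular or larry classes, the
same labeled 2d data can be held as a dict of columns, as a structured
array, and as an nd array plus label lists. The variable names are made
up for the example.

```python
import numpy as np

# Plain 2d array plus separate label lists; NaN marks a missing value.
values = np.array([[1.0, np.nan],
                   [3.0, 4.0]])
row_labels = ["r1", "r2"]
col_labels = ["x", "y"]

# Dictionary layout: one 1d array per column label.
as_dict = {col: values[:, j].copy() for j, col in enumerate(col_labels)}

# Structured-array layout: one named float field per column.
dtype = [(col, float) for col in col_labels]
as_struct = np.empty(len(row_labels), dtype=dtype)
for col in col_labels:
    as_struct[col] = as_dict[col]

# Back to a plain 2d array, taking the column order from the field names.
back = np.column_stack([as_struct[col] for col in as_struct.dtype.names])
```

The column labels survive as dict keys and as field names. The row labels
have no natural home in a structured array, so in this sketch they have to
travel alongside it as a separate list.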

Josef

" http://esciencenews.com/articles/2010/01/19/too.many.choices.new.study.says.more.usually.better
"



