[Numpy-discussion] Tabular data package

josef.pktd@gmai... josef.pktd@gmai...
Tue Oct 6 12:01:18 CDT 2009

On Tue, Oct 6, 2009 at 12:31 PM,  <josef.pktd@gmail.com> wrote:
> On Mon, Oct 5, 2009 at 5:22 PM, Elaine Angelino
> <elaine.angelino@gmail.com> wrote:
>> Hi there,
>> We are writing to announce the release of "Tabular", a package of Python
>> modules for working with tabular data.
>> Its main object is the tabarray class, a data structure for holding and
>> manipulating tabular data. By putting data into a tabarray object, you’ll
>> get a representation of the data that is more flexible and powerful than a
>> native Python representation. More specifically, tabarray provides:
>> -- ultra-fast filtering, selection, and numerical analysis methods, using
>> convenient Matlab-style matrix operation syntax
>> -- spreadsheet-style operations, including row & column operations, 'sort',
>> 'replace', 'aggregate', 'pivot', and 'join'
>> -- flexible load and save methods for a variety of file formats, including
>> delimited text (CSV), binary, and HTML
>> -- helpful inference algorithms for determining formatting parameters and
>> data types of input files
>> -- support for hierarchical groupings of columns, both as data structures
>> and file formats
>> You can download Tabular from PyPI (http://pypi.python.org/pypi/tabular/) or
>> alternatively clone our hg repository from bitbucket
>> (http://bitbucket.org/elaine/tabular/).  We also have posted tutorial-style
>> Sphinx documentation (http://www.parsemydata.com/tabular/).
>> The tabarray object is based on the record array object from the Numerical
>> Python package (NumPy), and Tabular is built to interface well with NumPy in
>> general.  Our intended audience is two-fold: (1) Python users who, though
>> they may not be familiar with NumPy, are in need of a way to work with
>> tabular data, and (2) NumPy users who would like to do spreadsheet-style
>> operations on top of their more "numerical" work.
>> We hope that some of you find Tabular useful!
>> Best,
>> Elaine and Dan
> I briefly looked at the Sphinx docs and the code. Tabular looks pretty useful,
> and the code can be partially read as recipes for working with recarrays or
> structured arrays. Thanks for the choice of license (it makes looking at the
> code "legal").
> I didn't see any explicit NaN handling. Are missing values allowed, e.g. in
> the constructor?
> I looked a bit closer at functions like tabular.fast.recarrayisin, since I
> always have problems with these row operations.
> Are these functions supposed to work with arbitrary structured arrays? The
> tests are only for 1d integer arrays.
> With floats, the default string representation doesn't sort correctly. Or am
> I misreading the function?
> >>> import numpy as np
> >>> arr = np.array([6,1,2,1e-13,0.5*1e-14,1,2e25,3,0,7]).view([('',float)]*2)
> >>> arr
> array([(6.0, 1.0), (2.0, 1e-013), (5e-015, 1.0),
>       (2.0000000000000002e+025, 3.0), (0.0, 7.0)],
>      dtype=[('f0', '<f8'), ('f1', '<f8')])
> >>> np.sort([str(l) for l in arr])
> array(['(0.0, 7.0)', '(2.0, 1e-013)', '(2.0000000000000002e+025, 3.0)',
>       '(5e-015, 1.0)', '(6.0, 1.0)'],
>      dtype='|S30')

Maybe this doesn't matter for the purpose of this function.
I will download and try the code before I make any more irrelevant comments.
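
The ordering mismatch in the quoted session can be reproduced without the
package. This is a minimal sketch, assuming only NumPy; the names `by_string`
and `by_value` are made up for illustration and are not part of Tabular. It
compares sorting the rows by their string representation with sorting the
structured array by its field values:

```python
import numpy as np

# Same ten floats as in the quoted session, viewed as five 2-field rows.
values = np.array([6, 1, 2, 1e-13, 0.5 * 1e-14, 1, 2e25, 3, 0, 7], dtype=float)
arr = values.view([('f0', float), ('f1', float)])

# Sort rows by their string form (character-by-character comparison).
by_string = sorted(arr.tolist(), key=str)

# Sort rows numerically: by field f0 first, then by f1 to break ties.
by_value = np.sort(arr, order=['f0', 'f1'])

print([row[0] for row in by_string])   # first column in string order
print(by_value['f0'].tolist())         # first column in numeric order
```

Sorting on the fields gives numeric row order, whereas the string keys place
2e+25 ahead of 5e-15 because only the leading characters are compared.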


> Being able to do a searchsorted on rows of an array would be a useful feature
> in numpy. Is there a sortable 1d representation of the rows of a 2d float or
> mixed type array?
> Thanks,
> Josef
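
One possible answer to the question above, offered as a sketch under stated
assumptions rather than an existing NumPy feature: a C-contiguous 2d float
array can be viewed as a 1d structured array with one field per column, which
NumPy sorts field by field. The helpers `rows_as_records` and `find_row` are
hypothetical names, and the lookup uses the standard-library `bisect` module on
a list of tuples instead of relying on `np.searchsorted` for structured dtypes.
NaN values are not handled.

```python
import bisect
import numpy as np

def rows_as_records(a):
    """View a 2d array as a 1d structured array, one field per column."""
    a = np.ascontiguousarray(a)
    dtype = [('f%d' % i, a.dtype) for i in range(a.shape[1])]
    # The view has shape (n, 1); flatten it to a 1d array of n records.
    return a.view(dtype).reshape(-1)

def find_row(sorted_keys, row):
    """Return the index of `row` in the sorted list of tuples, or -1."""
    pos = bisect.bisect_left(sorted_keys, row)
    if pos < len(sorted_keys) and sorted_keys[pos] == row:
        return pos
    return -1

table = np.array([[2.0, 1e-13],
                  [6.0, 1.0],
                  [0.0, 7.0],
                  [5e-15, 1.0],
                  [2e25, 3.0]])

records = rows_as_records(table)
# Lexicographic sort: compare the first field, then the second, and so on.
records = np.sort(records, order=list(records.dtype.names))
keys = records.tolist()  # list of tuples of Python floats, in sorted order

print(find_row(keys, (2.0, 1e-13)))
print(find_row(keys, (3.0, 0.0)))
```

The same lookup should carry over to a mixed-type structured array as long as
every field is orderable, since tuples compare element by element just as the
field-wise sort does.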
