FW: [Numpy-discussion] typecodes in numarray

Francesc Alted falted at openlc.org
Fri Jan 24 10:48:04 CST 2003


On Friday 24 January 2003 18:02, Todd Miller wrote:
>
> My [i.e. Todd's]  thoughts about it:
>
> No.  It shows you're thinking about it carefully.   Having looked at all
> of the examples below,  I have some comments:

I mostly agree with your comments, but let me point out some thoughts.

>
> 1.  The sparseness and obscurity of the typecode "wordspace" are both
> demonstrated here.  There are so few letters to choose from,  they're
> often already used in some other context.  Even given the large number
> of unused letters,  it's often difficult to choose good ones and to
> remember what has been chosen.  I think this is one of the reasons Perry
> chose to replace typecodes with true type objects which have rich,
> regular, and predictable symbolic names.

I completely agree that type objects are a brilliant idea.

> 3. STSCI has layered other software on top of numarray and recarray
> which astronomers use to do work.   It is the friction of that interface
> which makes correcting these consistency problems more difficult than
> might be immediately apparent.

Yeah, I know...

>
> >I think it's important to agree with a definitive set of charcodes and use
> >them uniformly throughout numarray.
>
> I wish this were possible,  but I'm thinking we should try to find an
> alternative approach altogether,  one which may be more verbose but
> implicitly free of conflict.
>
> A means for specifying a recarray format might be created from tuples,
> type objects,  and integer repetition factors.
>
> The verbosity of this approach might be a little tedious,  but it would
> also be transparent, maintainable, and conflict free.

I think this is a very good idea. In fact, while working on PyTables I was
lately pondering what would be the best way to define record arrays, and I
also think that a verbose approach should be the best.

After considering metaclasses and tuples, I ended up with a compromise between
the two: dictionaries combined with some function or class to refine the
definition.

My current thinking is something like:

recarrDescr = {
    "name"        : defineType(CharType, 16, ""),  # 16-character String
    "TDCcount"    : defineType(UInt8, 1, 0),    # unsigned byte
    "ADCcount"    : defineType(Int16, 1, 0),    # signed short integer
    "grid_i"      : defineType(Int32, 1, 9),    # integer
    "grid_j"      : defineType(Int32, 1, 9),    # integer
    "pressure"    : defineType(Float32, 1, 1.),  # float  (single-precision)
    "temperature" : defineType(Float64, 32, arange(32)),  # double[32]
    "idnumber"    : defineType(Int64, 1, 0),    # signed long long 
    }

where defineType is a class that accepts (type, shape, default) parameters.
It can be extended safely in the future if more needs appear.
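
A minimal sketch of what such a defineType class might look like, written in
present-day Python. Everything in it is an assumption made for illustration:
the attribute names, the nbytes() helper, the ITEMSIZE table, and the use of
plain strings in place of numarray's type objects are not taken from numarray
or PyTables.

```python
# Hypothetical sketch of the defineType idea; not the numarray or PyTables API.

# Stand-ins for numarray type objects: type name -> item size in bytes (assumed).
ITEMSIZE = {
    "CharType": 1, "UInt8": 1, "Int16": 2, "Int32": 4,
    "Int64": 8, "Float32": 4, "Float64": 8,
}


class defineType:
    """Describe one record field as (type, shape, default)."""

    def __init__(self, typename, shape=1, default=None):
        if typename not in ITEMSIZE:
            raise ValueError("unknown type: %r" % (typename,))
        self.type = typename
        self.shape = shape      # number of items in the field
        self.default = default  # value used when none is supplied

    def nbytes(self):
        """Size of the field in bytes (hypothetical helper)."""
        return ITEMSIZE[self.type] * self.shape


recarrDescr = {
    "name": defineType("CharType", 16, ""),               # 16-character string
    "ADCcount": defineType("Int16", 1, 0),                # signed short integer
    "temperature": defineType("Float64", 32, [0.0] * 32), # double[32]
}

# Total record size: 16*1 + 1*2 + 32*8 bytes.
record_size = sum(field.nbytes() for field in recarrDescr.values())
```

Keeping the description in a small class leaves room to add further keyword
arguments later without breaking existing definitions.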

A dictionary has the advantage over a tuple that you can map column names to
their contents quite easily, and it is more flexible than defining the fields
with a metaclass descendant (see
http://pytables.sourceforge.net/html-doc/usersguide-html3.html#subsection3.1.2)
because dictionaries can be built up at run time (metaclass descendants might
allow that too, but in a more mysterious way that I think is not worth the
trouble). In addition, the dictionary object is available in all Python
versions, whereas metaclasses are available only from 2.2 on. However, I regard
metaclasses as the most elegant solution (but elegance is not always equivalent
to convenience :().
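
To illustrate the run-time point, here is a small sketch that builds a
descriptor from a list of column specifications that might, for instance, be
read from a file header. The column list and the (type name, shape, default)
tuples are hypothetical stand-ins for real defineType instances.

```python
# Hypothetical: column specs discovered at run time, e.g. read from a file header.
columns = [("grid_i", "Int32"), ("grid_j", "Int32"), ("pressure", "Float32")]

recarrDescr = {}
for colname, typename in columns:
    # Plain (type name, shape, default) tuples stand in for defineType(...).
    default = 0.0 if typename.startswith("Float") else 0
    recarrDescr[colname] = (typename, 1, default)

# Columns can be looked up by name directly.
pressure_type, pressure_shape, pressure_default = recarrDescr["pressure"]
```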

Perhaps you may want to consider this for use in recarray definitions.

>
> I think we should add an "obsolescent feature" warning to numarray and
> recarray which flags any use of character typecodes when the appropriate
> command line switches are set.

Well, I don't fully agree with that. I do believe that type classes are a
more meaningful way of describing types, but charcodes can be quite
advantageous in certain situations, such as describing the contents of a
record compactly, or passing this information to C routines that deal with
the data.

For example, consider the benefits of describing a recarray format as:

"3s4i20d"

instead of

((Int16, 3), 
 (Int32, 4),
 (Float64, 20),
 )

the former being more handy in lots of situations.
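
The two forms can coexist if one is mechanically derived from the other. The
sketch below expands a compact string into the verbose tuple form. The
function name expand_format and the CHARCODE_TO_TYPE table are hypothetical;
the letter-to-type mapping is just the one implied by the example above, and
type names are plain strings rather than numarray type objects.

```python
import re

# Assumed mapping, taken from the example in this message.
CHARCODE_TO_TYPE = {"s": "Int16", "i": "Int32", "d": "Float64"}


def expand_format(fmt):
    """Turn a compact string such as '3s4i20d' into ((type, count), ...)."""
    # The whole string must be a sequence of [count]letter items.
    if not re.fullmatch(r"(?:\d*[A-Za-z])*", fmt):
        raise ValueError("malformed format: %r" % (fmt,))
    fields = []
    for count, code in re.findall(r"(\d*)([A-Za-z])", fmt):
        if code not in CHARCODE_TO_TYPE:
            raise ValueError("unknown charcode: %r" % (code,))
        # A missing repetition factor means a single item.
        fields.append((CHARCODE_TO_TYPE[code], int(count) if count else 1))
    return tuple(fields)


verbose = expand_format("3s4i20d")
```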

I certainly believe that the coexistence of both can be very beneficial,
especially for third-party extension makers (like me :).

>
> >Suggestion: if recarray charcodes are not necessary to match the Numeric
> >ones, I propose that using the Python convention may be a good idea.
> >Look at the table in:
> >http://www.python.org/doc/current/lib/module-struct.html.
>
> This sounds good to me,  except that it will break an existing interface
> that I don't have control over.  Therefore,  I suggest we correct the
> problem by coming up with something better.

Well, if charcodes finally stay in, this has an additional advantage: the
Python developers have provided meaningful ways to express padding (character
"x"), endianness ("=", "<", ">") and alignment ("@"). So a compact expression
like "@3sx4i20d", apart from looking like Chinese to Westerners, may give a
lot of information in a handy way.
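
As a rough illustration of those prefixes, the sketch below feeds the same
format body to Python's struct module (Python 3 syntax). Note that under the
struct convention "s" means a char[] byte string and a short integer is "h",
so here "3s" is a 3-byte string rather than three Int16 values. The variable
names and sample values are made up for the example.

```python
import struct

body = "3sx4i20d"  # 3-byte string, 1 pad byte, 4 ints, 20 doubles

# "=": native byte order, standard sizes, no alignment padding.
standard_size = struct.calcsize("=" + body)
# "@": native byte order, sizes and alignment; the result is platform dependent.
native_size = struct.calcsize("@" + body)

values = (b"abc", 1, 2, 3, 4) + tuple(float(n) for n in range(20))
little = struct.pack("<" + body, *values)  # little-endian
big = struct.pack(">" + body, *values)     # big-endian

# Pad bytes are skipped on unpacking, so the original values come back.
roundtrip = struct.unpack("<" + body, little)
```

With standard sizes the record takes 3 + 1 + 4*4 + 20*8 = 180 bytes; the
native size may be larger if the platform pads before the doubles.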

-- 
Francesc Alted



