[SciPy-User] loadtxt and complicated dtype

Gerrit Holl gerrit.holl@ltu...
Tue Aug 16 10:50:14 CDT 2011


Hello,

I have a datafile with 5000 rows and 839 columns that have particular
meanings. I use a complicated dtype to read this data, and until now I
have used loadtxt for this. However, it seems to have stopped working
at some point. For example:

>>> from numpy import loadtxt, uint8
>>> from StringIO import StringIO
>>> from numpy.version import version
>>> print version
2.0.0.dev-5cf0a07
>>> loadtxt(StringIO("0 1 2 3"), dtype=[("a", uint8, 2), ("b", uint8, 2)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/storage4/home/gerrit/.local/lib/python2.6/site-packages/numpy/lib/npyio.py", line 806, in loadtxt
    X = np.array(X, dtype)
ValueError: setting an array element with a sequence.
>>> loadtxt(StringIO("0 1 2 3"), dtype=[("a", uint8, 4)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/storage4/home/gerrit/.local/lib/python2.6/site-packages/numpy/lib/npyio.py", line 806, in loadtxt
    X = np.array(X, dtype)
ValueError: setting an array element with a sequence.

Why does this not work? I have filed a bug report:
http://projects.scipy.org/numpy/ticket/1936

Alright then, so I can try it in a different way. In my real case I
have a 2-D array M with shape (5000, 839). I have my complicated
dtype:

[('temp', <type 'numpy.float64'>, 91),
 ('hum', <type 'numpy.float64'>, 91),
 ...,
 ('gpoint', <type 'numpy.uint32'>, 1),
 ('ind', <type 'numpy.uint16'>, 1)]

whose numbers add up to 839. How do I turn this into an array of shape
(5000,) with my requested dtype?
- .view(dtype) does not do what I mean, because it reinterprets the
actual bytes, and my new array will have a different number of bytes
from the old one.
- array(M, dtype) does not do what I mean, because it tries to
expand every element of M according to the requested dtype, thus
making the array much larger (and throwing a MemoryError).

I want this, because it's a very convenient way to access fields of my
data. It's more convenient to say M["ciw"] than to say M[:, 455:546].
If someone can suggest another way to achieve this convenience, I'm
open to suggestions.
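
One possible way to get from the 2-D array to the structured array is
to allocate an empty array with the requested dtype and copy each block
of columns into its field. The following is only a minimal sketch: the
helper name columns_to_fields and the small dtype are made up for
illustration, and it assumes the fields are scalars or plain sub-arrays
(no nested structured fields) laid out in the same order as the columns.

```python
import numpy as np


def columns_to_fields(M, dtype):
    """Copy the columns of a 2-D array into the fields of a structured array.

    Hypothetical helper, not part of NumPy. Assumes every field is a
    scalar or a plain sub-array and that fields follow column order.
    """
    dtype = np.dtype(dtype)
    n_rows, n_cols = M.shape

    # Number of columns each field consumes (1 for a scalar field).
    widths = [int(np.prod(dtype[name].shape, dtype=int))
              for name in dtype.names]
    if sum(widths) != n_cols:
        raise ValueError("dtype needs %d columns, array has %d"
                         % (sum(widths), n_cols))

    out = np.empty(n_rows, dtype=dtype)
    col = 0
    for name, width in zip(dtype.names, widths):
        block = M[:, col:col + width]
        # Reshape to (n_rows,) for scalars, (n_rows, k) for sub-arrays.
        out[name] = block.reshape((n_rows,) + dtype[name].shape)
        col += width
    return out


# Toy stand-in for the real (5000, 839) array and its dtype.
M = np.arange(12, dtype=np.float64).reshape(2, 6)
dt = [("temp", np.float64, 2), ("hum", np.float64, 3), ("ind", np.uint16)]
S = columns_to_fields(M, dt)
```

Here S has shape (2,), and S["hum"] holds the same values as M[:, 2:5].
The copy is made field by field, so memory use stays close to the size
of the original data. Much newer NumPy releases also provide
numpy.lib.recfunctions.unstructured_to_structured, which appears to
cover the same need.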

kind regards,
Gerrit Holl.

-- 
Gerrit Holl
PhD student at Division of Space Technology, Luleå University of
Technology, Kiruna, Sweden
http://www.sat.ltu.se/members/gerrit/

