[SciPy-dev] Some Q's vis-a-vis Numpy unicode support

David Goldsmith d_l_goldsmith@yahoo....
Tue Aug 11 21:28:32 CDT 2009


Thanks, Josef.  This may just be an artifact of working in a DOS terminal (although your example, while not printing the accented e, did at least print something different for b vs. b.capitalize()), or it may be because I don't know the right encoding to use.  I tried your code with what I found on Wikipedia to be the Unicode for the Greek letter delta, namely u'\x03b04', using both the 'cp1252' and 'iso8859-7' encodings (the latter inferred from the same Wikipedia article).  Here's what I get:

>>> b = np.array([u'\x03b04',u'\x03b04'],'<U1').view(np.chararray)
>>> print b.encode('cp1252')[0]
♥
>>> print b.capitalize().encode('cp1252')[0]
♥
>>> print b.encode('iso8859-7')[0]
♥
>>> print b.capitalize().encode('iso8859-7')[0]
♥

i.e., no difference.  If I'm doing something wrong, please let me know.  Otherwise, for the purpose of documenting chararray.capitalize(), which is my ultimate goal, is there any rhyme or reason behind which Unicode characters capitalize() works on and which it doesn't?
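
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the literal u'\x03b04' is not a single code point. The \x escape takes exactly two hex digits, so it parses as the control character U+0003 followed by the text 'b04', and a '<U1' array keeps only that first character, which has no upper-case form (a DOS terminal typically draws it as a heart). Delta would be written u'\u03b4'. The snippet below uses Python 3 string literals and plain str.capitalize(), on the assumption (per the NumPy docs) that chararray.capitalize() calls str.capitalize element-wise; all variable names are illustrative only.

```python
# Python 3 syntax; in the Python 2 session above these would be u'...' literals.
import unicodedata

mistyped = '\x03b04'   # parsed as '\x03' followed by the literal characters 'b04'
delta = '\u03b4'       # GREEK SMALL LETTER DELTA

# A '<U1' array truncates each element to one character, so only this survives.
first = mistyped[0]
first_is_control = unicodedata.category(first) == 'Cc'  # control char, no case

# The real delta does have an upper-case form.
capital = delta.capitalize()
capital_name = unicodedata.name(capital)

# cp1252 has no Greek delta, so encoding fails there; iso8859-7 can encode it.
try:
    delta.encode('cp1252')
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False
roundtrip = delta.encode('iso8859-7').decode('iso8859-7')

print(len(mistyped), repr(first), first_is_control)
print(unicodedata.name(delta), '->', capital_name)
print('cp1252 can encode delta:', cp1252_ok)
```

If this reading is right, the identical output for b and b.capitalize() above reflects the input literal rather than a limitation of capitalize() itself.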

Thanks,

DG
--- On Tue, 8/11/09, josef.pktd@gmail.com <josef.pktd@gmail.com> wrote:

> Actually, this works (in IDLE):
> 
> >>> b = np.array([u'\xe9',u'\xe9'],'<U1').view(np.chararray)
> >>> print b.encode('cp1252')[0]
> é
> >>> print b.capitalize().encode('cp1252')[0]
> É
> >>> print b[0].encode('cp1252')
> é
> 
> 
> This looks like a bug. Or is it a known limitation that chararrays
> cannot be 0-d?
> 
> >>> b0 = np.array(u'\xe9','<U1').view(np.chararray)
> >>> print b0.encode('cp1252')
> Traceback (most recent call last):
>   File "<pyshell#47>", line 1, in <module>
>     print b0.encode('cp1252')
>   File "C:\Programs\Python25\Lib\site-packages\numpy\core\defchararray.py", line 217, in encode
>     return self._generalmethod('encode', broadcast(self, encoding, errors))
>   File "C:\Programs\Python25\Lib\site-packages\numpy\core\defchararray.py", line 162, in _generalmethod
>     newarr[:] = res
> ValueError: cannot slice a 0-d array
> 
> 
> >
> > Josef
> >
> >>>
> >>> Unless the answer is "No," my real question:
> >>>
> >>> 1) Does chararray.capitalize() capitalize non-Roman letters
> >>> that have different lower-case and upper-case forms (e.g.,
> >>> the Greek letters)?  If "yes," are there any exceptions
> >>> (e.g., Russian letters)?
> >>>
> >>> Thanks!
> >>>
> >>> DG
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >> _______________________________________________
> >> Scipy-dev mailing list
> >> Scipy-dev@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/scipy-dev
> >>
> >


      

