[Numpy-discussion] Fixing #736 and possible memory leak

Charles R Harris charlesr.harris@gmail....
Thu Apr 24 20:11:27 CDT 2008


On Thu, Apr 24, 2008 at 5:58 PM, Robert Kern <robert.kern@gmail.com> wrote:

> On Thu, Apr 24, 2008 at 5:37 PM, Charles R Harris
> <charlesr.harris@gmail.com> wrote:
> > Hi,
> >
> > I've been looking into ticket #736 and playing with some things. In
> > arrayobject.c starting at line 8534 I added a check for strings.
> >
> >         if (PyString_Check(op)) {
> >             r = Array_FromPyScalar(op, newtype);
> >         }
> >         else if (PySequence_Check(op)) {
> >             PyObject *thiserr = NULL;
> >
> >             /* necessary but not sufficient */
> >             Py_INCREF(newtype);
> >             r = Array_FromSequence(op, newtype, flags & FORTRAN,
> >                                     min_depth, max_depth);
> >             if (r == NULL && (thiserr=PyErr_Occurred())) {
> >                 if (PyErr_GivenExceptionMatches(thiserr,
> >                                                 PyExc_MemoryError)) {
> >                      return NULL;
> >                 }
> >
> > I think there may be a failure to decrement the reference to newtype unless
> > Array_FromSequence does that (a nasty side effect).
> >
> > Anyway, the added check for a string fixes the conversion problem for such
> > things as int32('123'). There remains a problem with array('123',
> > dtype=int32) and with array(['123','123'], dtype=int32), but I think I can
> > track those down. The question is whether changing the current behavior so
> > that strings get converted to numbers will cause problems for other programs
> > out there. I suspect I also need to check that strings are converted this
> > way only when the type is explicitly given, not detected.
>
> Seems to work for me.
>
> In [5]: array([124, '123', '123'])
> Out[5]:
> array(['124', '123', '123'],
>      dtype='|S4')


Sure, but you didn't specify the type, so numpy inferred a string type.
Wrong test. Try

In [1]: array(['123'], dtype=int32)
Out[1]: array([[1, 2, 3]])

In [2]: a = ones(3, dtype=int32)

In [3]: a[...] = '123'

In [4]: a
Out[4]: array([1, 2, 3])

In [5]: a[...] = int32('123')

In [6]: a
Out[6]: array([123, 123, 123])

And so on. The problem is this bit of code (among others):

    stop_at_string = ((type == PyArray_OBJECT) ||
                      (type == PyArray_STRING &&
                       typecode->type == PyArray_STRINGLTR) ||
                      (type == PyArray_UNICODE) ||
                      (type == PyArray_VOID));
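To make the effect of this predicate concrete, here is a toy Python model; it is not numpy code. TypeNum, STRING_LTR (assumed to stand for PyArray_STRINGLTR, i.e. 'S') and both functions are hypothetical names. Because an explicitly requested int32 is not one of the stopping types, a string is walked like any other sequence and each character is converted on its own, which reproduces the [1, 2, 3] and [[1, 2, 3]] results in the session above.

```python
from enum import Enum, auto


class TypeNum(Enum):
    # Hypothetical stand-ins for the C type numbers (PyArray_OBJECT, ...).
    OBJECT = auto()
    STRING = auto()
    UNICODE = auto()
    VOID = auto()
    INT32 = auto()


STRING_LTR = "S"  # assumed value of PyArray_STRINGLTR


def stop_at_string(type_num, type_char=""):
    """Python transcription of the C stop_at_string expression."""
    return (type_num is TypeNum.OBJECT
            or (type_num is TypeNum.STRING and type_char == STRING_LTR)
            or type_num is TypeNum.UNICODE
            or type_num is TypeNum.VOID)


def convert_to_int32_current(obj):
    """Toy model of the current behaviour for an explicit int32 request."""
    if isinstance(obj, (list, tuple)):
        return [convert_to_int32_current(x) for x in obj]
    if isinstance(obj, str) and not stop_at_string(TypeNum.INT32):
        # int32 is not a stopping type, so the str is walked like any other
        # sequence and each character is converted separately.
        return [int(ch) for ch in obj]
    return int(obj)
```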


The question is: how should we interpret a string when the type is specified?
I think in that case we should try to convert the string to the relevant type,
just as we cast numbers to the relevant type. So we should always stop at
strings.
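A minimal sketch of the proposed rule, again as a toy Python model rather than numpy code (convert_proposed and its target parameter are hypothetical). With an explicit type, a string is always a leaf and is parsed as a whole, just as numbers are cast; when no type is given, strings are left alone, in line with the earlier point that the conversion should apply only when the type is explicitly specified.

```python
def convert_proposed(obj, target=None):
    """Toy model of the proposal: strings are never descended into.

    target is the explicitly requested scalar type (e.g. int or float);
    None stands for "type detected, not specified".
    """
    if isinstance(obj, (list, tuple)):
        # Only real containers are walked; a str is treated as a leaf.
        return [convert_proposed(x, target) for x in obj]
    if target is None:
        # No explicit type: leave strings (and everything else) untouched.
        return obj
    # Explicit type: the whole string is parsed, just as numbers are cast.
    return target(obj)
```

With int as the target, ['123', '123'] becomes [123, 123] and a float such as 1.7 is truncated to 1, so strings and numbers go through the same cast.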

Chuck