[Numpy-discussion] numpy.random.shuffle
Robert
kxroberto at googlemail.com
Wed Nov 22 14:28:20 CST 2006
Robert Kern wrote:
> Tim Hochberg wrote:
>> Robert Kern wrote:
>
>>> One possibility is to check if the object is an ndarray (or subclass) and use
>>> .copy() if so; otherwise, use the current implementation and hope that you
>>> didn't pass it a Numeric or numarray array (or some other view-based object).
>>>
>> I think I would invert this test and instead check if the object is a
>> Python list and *not* copy in that case. Otherwise, use copy.copy to
>> copy the object whatever it is. This looks like it would be more robust
>> in that it would work in all sensible case, and just be a tad slower in
>> some of them.
>
> I don't want to assume that the only two sequence types are lists and arrays.
> The problem with using copy.copy() on non-arrays is that it, well, makes copies
> of the elements. The objects in the shuffled sequence are not the same objects
> before and after the shuffling. I consider that to be a violation of the spec.
>
> Views are rare outside of numpy/Numeric/numarray, partially because Guido
> considers them to be evil. I'm beginning to see why.
>
>> Another possible refinement / complication would be to special case 1D
>> arrays so that they run fastish.
>>
>> A third possibility involves rewriting this in this form:
>>
>> indices = arange(len(x))
>> _shuffle_core(indices) # This just does what current shuffle now does
>> x[:] = take(x, indices, 0)
>
> That's problematic since the elements all turn into numpy scalar objects:
>
> In [1]: from numpy import *
>
> In [2]: a = range(9,-1,-1)
>
> In [3]: idx = arange(len(a))
>
> In [4]: a[:] = take(a, idx, 0)
>
> In [5]: a
> Out[5]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
>
> In [6]: type(a[0])
> Out[6]: <type 'numpy.int32'>
>
What about a[:] = take(asarray(a, object), idx, 0)? It also works correctly with ndarrays, although I have not dug into the reason why; all elements will probably be cast twice.
I think take() on shuffled indices is basically the right and natural approach for a numpy shuffler.
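A minimal sketch of that idea, written against a current NumPy rather than the 2006 one. The helper name shuffle_via_take is hypothetical, and np.random.default_rng is a modern API that stands in here for the _shuffle_core step quoted above:

```python
import numpy as np

def shuffle_via_take(x, rng=None):
    """Shuffle x in place along its first axis using a permuted index array.

    Hypothetical helper, not part of numpy. Returns the permutation used.
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.permutation(len(x))
    # Going through an object array keeps the elements of a flat list as
    # the same Python objects instead of turning them into numpy scalars.
    x[:] = np.take(np.asarray(x, dtype=object), idx, 0)
    return idx

a = list(range(9, -1, -1))
original = list(a)
idx = shuffle_via_take(a, np.random.default_rng(0))
print(a, type(a[0]))
```

This only preserves element identity for flat sequences. A list of equal-length lists would become a 2-D object array, and the rows written back would be arrays rather than the original lists.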
The example is possibly just another vote against the default behavior of letting numpy scalar types out of arrays that were set up with a "harmless" type.
>>> array([1,2,3],float)
array([ 1., 2., 3.])
>>> type(_[0])
<type 'numpy.float64'>
>>>
is just wrong as it is, I think.
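For comparison, a small sketch of how the builtin types can be recovered explicitly today, assuming a current NumPy where ndarray.item() and ndarray.tolist() return Python builtins:

```python
import numpy as np

arr = np.array([1, 2, 3], float)

elem = arr[0]              # indexing hands back a numpy.float64 scalar
as_python = arr[0].item()  # .item() converts one element to a builtin float
as_list = arr.tolist()     # .tolist() converts every element to a builtin float

print(type(elem), type(as_python), type(as_list[0]))
```

The conversion is opt-in, so the default behavior described above is unchanged.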
In (Guido's) Python, objects should probably come out of collections with the same type they went in with. Currently numpy scalars just "infect" the whole application almost like a virus (and kill performance, pickles, etc.).
Of course views are essential for an efficient array type, but type-altering is possibly not.
For the rare cases of generalized algorithms (I have to think hard to find even one example) where the array interface is needed on the elements (and an array(obj) cast is too inconvenient), there could still be another possibility:
>>> array([1,2,3],numpy.float64)
Then it is natural that numpy.float64, numpy.int32, ... come out, as the programmer would expect.
So maybe, for array types:
* float != numpy.float64 (but perhaps with a common base class, or 'float' itself)
* int != numpy.intXX
* complex != numpy.complex128
* default array type is (python.)float
* default array type from list of ints is (python.)int
* default array type from list of complex is (python.)complex
* default array type of other lists is always <object>
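As a point of reference for the list above, a short sketch of how the numpy scalar types relate to the builtins in a current NumPy on Python 3 (the situation under Python 2 differed for int), and of what an explicit object array hands back:

```python
import numpy as np

# numpy.float64 and numpy.complex128 subclass the builtin types ...
float_is_subclass = issubclass(np.float64, float)
complex_is_subclass = issubclass(np.complex128, complex)
# ... but the fixed-width integer scalars do not subclass int on Python 3.
int_is_subclass = issubclass(np.int64, int)

# An explicit object array returns the Python objects that went in.
obj_arr = np.array([1, 2.5, 3j], dtype=object)
obj_types = [type(v) for v in obj_arr]

print(float_is_subclass, complex_is_subclass, int_is_subclass, obj_types)
```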
Currently this is also problematic:
>>> array([1,2,"3",[]])
array(['1', '2', '3', '[]'],
dtype='|S4')
and even
>>> array([1,2,"3ef",'wefwfewoiwjefo iwjef'])
array(['1', '2', '3ef', 'wefwfewoiwjefo iwjef'],
dtype='|S20')
>>> _[0]='woeifjwo woie pwioef wliuefh lwieufh wleifuh welfiu '
>>> _
array(['woeifjwo woie pwioef', '2', '3ef', 'wefwfewoiwjefo iwjef'],
dtype='|S20')
is rarely what a Pythoneer would expect. I guess fixed-width string arrays should only be created explicitly.
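A small sketch of the truncation and of the explicit alternative, assuming a current NumPy (where the inferred string dtype is a fixed-width unicode type rather than '|S'):

```python
import numpy as np

# Inferred fixed-width string array: the width is set by the longest item.
fixed = np.array(["1", "2", "3ef"])
fixed[0] = "a much longer string"   # silently truncated to the item width

# Explicit object array: elements keep their Python type and full length.
obj = np.array([1, 2, "3ef"], dtype=object)
obj[0] = "a much longer string"

print(repr(fixed[0]), repr(obj[0]), type(obj[1]))
```

Asking for dtype=object explicitly is one way to get the list-like behavior today; it gives up the compact storage of the fixed-width array.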
Robert