[Numpy-discussion] numpy.random.shuffle

Tim Hochberg tim.hochberg at ieee.org
Wed Nov 22 11:34:11 CST 2006


Robert Kern wrote:
> Christopher Barker wrote:
>   
>> Robert Kern wrote:
>>     
[SNIP]
>>> I copied the algorithm from Python's random
>>> module. At the core of it is a set of swaps:
>>>
>>>     x[i], x[j] = x[j], x[i]
>>>
>>> With the kind of sequences that the stdlib random module is expecting, that
>>> makes perfect sense. However, with N-dim arrays (N > 1), x[i] is a *view* into
>>> the array. By the time that x[j] = x[i] gets executed, x[i] = x[j] has already
>>> executed and the underlying memory that x[i] points to has been modified.
>>>       
>> wouldn't something like:
>>
>> temp = x[i].copy()
>> x[i], x[j] = x[j], temp
>>
>> work?
>>
>> In any case, it should raise an error if ndim > 1, rather than giving a 
>> wrong result.
>>     
>
> The method is intended to work on sequences in general, not just numpy arrays,
> so I can't really use .copy() or even test for ndim > 1.
>
> One possibility is to check if the object is an ndarray (or subclass) and use
> .copy() if so; otherwise, use the current implementation and hope that you
> didn't pass it a Numeric or numarray array (or some other view-based object).
>   
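The aliasing Robert describes, and the copy Christopher suggests, can be reproduced in a few lines. This is a minimal sketch assuming a current NumPy; `naive_swap` and `copy_swap` are hypothetical names used only for illustration.

```python
import numpy as np

def naive_swap(x, i, j):
    # Tuple swap as in the stdlib algorithm. For a 2-D array the right-hand
    # side builds two *views*. Row i is written first, so the view of the
    # old row i already holds row j's data when row j is written.
    x[i], x[j] = x[j], x[i]

def copy_swap(x, i, j):
    # Christopher's suggestion: snapshot row i before overwriting it.
    temp = x[i].copy()
    x[i], x[j] = x[j], temp

a = np.arange(6).reshape(3, 2)
naive_swap(a, 0, 2)
print(a.tolist())  # [[4, 5], [2, 3], [4, 5]]: row 0 duplicated, [0, 1] lost

b = np.arange(6).reshape(3, 2)
copy_swap(b, 0, 2)
print(b.tolist())  # [[4, 5], [2, 3], [0, 1]]
```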
I think I would invert this test: check whether the object is a Python
list and *not* copy in that case; otherwise, use copy.copy to copy the
element being swapped, whatever type it is. This looks more robust, in
that it would work in all sensible cases and just be a tad slower in
some of them.

Another possible refinement (or complication) would be to special-case
1D arrays so that they run fastish.
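The special case works because indexing a plain 1-D ndarray returns a scalar, not a view, so the original swap is already safe there. The predicate below is a hypothetical sketch of such a fast-path test. One caveat, stated as an assumption about current NumPy: structured (record) dtypes are excluded because their void scalars act as views into the array.

```python
import numpy as np

def can_swap_in_place(x):
    """Hypothetical fast-path test: True if the plain tuple swap is safe.

    Indexing a Python list, or a plain 1-D ndarray, yields an independent
    object (for ndarrays, a scalar) rather than a view into x.
    Structured dtypes are excluded since their scalars behave like views.
    """
    if isinstance(x, list):
        return True
    return (isinstance(x, np.ndarray) and x.ndim == 1
            and x.dtype.names is None)

v = np.arange(4)
if can_swap_in_place(v):
    v[0], v[3] = v[3], v[0]  # both sides are scalars, so no aliasing
print(v.tolist())  # [3, 1, 2, 0]
```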

A third possibility involves rewriting shuffle in this form:

    from numpy import arange, take
    indices = arange(len(x))
    _shuffle_core(indices) # This just does what shuffle currently does
    x[:] = take(x, indices, 0) # take() returns a new array, so no aliasing
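A self-contained version of that index-based approach might look like the following. Since `_shuffle_core` is not shown in the post, the one here is a hypothetical stand-in (a plain Fisher-Yates swap loop), and `shuffle_by_indices` and `rng` are likewise illustrative names. The swaps only ever touch a 1-D integer index array, where they are safe, and the single `np.take` produces a fresh array, so writing it back into `x` cannot alias.

```python
import random

import numpy as np

def _shuffle_core(seq, rng=random):
    # Stand-in for the existing swap-based shuffle; safe here because seq
    # is always a 1-D integer index array.
    for i in reversed(range(1, len(seq))):
        j = rng.randrange(i + 1)
        seq[i], seq[j] = seq[j], seq[i]

def shuffle_by_indices(x, rng=random):
    indices = np.arange(len(x))
    _shuffle_core(indices, rng)
    # take() builds a new array along axis 0, so the right-hand side shares
    # no memory with x when it is written back.
    x[:] = np.take(x, indices, 0)

grid = np.arange(10).reshape(5, 2)
shuffle_by_indices(grid, random.Random(0))  # rows permuted, none lost
```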

Of course, all of these are pretty inefficient in the common case of a 
1D array.

-tim
