[Numpy-discussion] improving arraysetops

Neil Crighton neilcrighton@gmail....
Wed Jun 17 05:11:47 CDT 2009


> > What about merging unique and unique1d?  They're essentially identical for an
> > array input, but unique uses the builtin set() for non-array inputs and so is
> > around 2x faster in this case - see below. Is it worth accepting a speed
> > regression for unique to get rid of the function duplication?  (Or can they be
> > combined?)
>
> unique1d can return the indices - can this be achieved by using set(), too?
>

No, set() can't return the indices as far as I know.
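
A small illustration of the difference. This is only a sketch: it assumes a NumPy in which np.unique itself accepts return_index and return_inverse (in the development tree discussed here those options belong to unique1d), and the sample data is made up.

```python
import numpy as np

values = [3, 1, 2, 3, 1]

# set() removes duplicates but keeps no record of where each value came from.
as_set = sorted(set(values))

# The array-based routine can also report positions.
uniq, first_idx, inverse = np.unique(
    values, return_index=True, return_inverse=True
)

# first_idx[i] is the position of the first occurrence of uniq[i] in values;
# indexing uniq with inverse rebuilds the original sequence.
reconstructed = uniq[inverse]
```

Getting the same information out of a set would need a second pass over the input, which gives up most of the speed advantage.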

> The implementation for arrays is the same already, IMHO, so I would
> prefer adding return_index, return_inverse to unique (automatically
> converting input to array, if necessary), and deprecate unique1d.
>
> We can view it also as adding the set() approach to unique1d, when the
> return_index, return_inverse arguments are not set, and renaming
> unique1d -> unique.
>

This sounds good. If you don't have time to do it, I don't mind having
a go at writing a patch to implement these changes (deprecate the
existing unique1d, rename unique1d to unique and add the set approach
from the old unique, and the other changes mentioned in
http://projects.scipy.org/numpy/ticket/1133).
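
A rough sketch of what the merged function could look like. The name unique_merged and its body are hypothetical, written for illustration only; this is not the patch and not NumPy's implementation. It assumes the set() path is taken only for non-array input when no index output is requested.

```python
import numpy as np


def unique_merged(ar, return_index=False, return_inverse=False):
    """Hypothetical merge of unique and unique1d (illustration only)."""
    if not (return_index or return_inverse) and not isinstance(ar, np.ndarray):
        # Fast path borrowed from the old unique(): builtin set() for
        # non-array input when no indices are requested.
        return np.array(sorted(set(ar)))

    ar = np.asarray(ar).ravel()
    # A stable sort keeps equal elements in input order, so the first
    # element of each run is the first occurrence in the input.
    perm = ar.argsort(kind="mergesort")
    sorted_ar = ar[perm]

    # flag[i] is True where a new value starts in the sorted array.
    flag = np.ones(sorted_ar.size, dtype=bool)
    flag[1:] = sorted_ar[1:] != sorted_ar[:-1]

    result = (sorted_ar[flag],)
    if return_index:
        result += (perm[flag],)
    if return_inverse:
        inverse = np.empty(ar.shape, dtype=np.intp)
        inverse[perm] = np.cumsum(flag) - 1
        result += (inverse,)
    return result[0] if len(result) == 1 else result
```

Called on a list without the optional arguments it takes the set() path; asking for either index output forces the conversion to an array.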

> I have found a strange bug in unique():
>
> In [24]: l = list(np.random.randint(100, size=1000))
>
> In [25]: %timeit np.unique(l)
> ---------------------------------------------------------------------------
> UnicodeEncodeError                        Traceback (most recent call last)
>
> /usr/lib64/python2.5/site-packages/IPython/iplib.py in ipmagic(self, arg_s)
>      951         else:
>      952             magic_args = self.var_expand(magic_args,1)
> --> 953             return fn(magic_args)
>      954
>      955     def ipalias(self,arg_s):
>
> /usr/lib64/python2.5/site-packages/IPython/Magic.py in magic_timeit(self, parameter_s)
>     1829                                                           precision,
>     1830                                                           best * scaling[order],
> -> 1831                                                           units[order])
>     1832         if tc > tc_min:
>     1833             print "Compiler time: %.2f s" % tc
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in
> position 28: ordinal not in range(128)
>
> It disappears after increasing the array size, or the integer size.
> In [39]: np.__version__
> Out[39]: '1.4.0.dev7047'
>
> r.

Weird! From the error message, it looks like a problem with ipython's timeit
function rather than unique. I can't reproduce it on my machine
(numpy 1.4.0.dev, r7059; IPython 0.10.bzr.r1163).
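
One possible reading, offered as an assumption rather than a diagnosis: the character u'\xb5' in the message is the micro sign, which would appear if the timing result is labelled in microseconds and the output stream only accepts ASCII. The snippet below reproduces just the encoding failure, independent of IPython and numpy; the variable names are made up for the example.

```python
# The micro sign (U+00B5) as it would appear in a microsecond unit label.
unit_label = u"\xb5s"

try:
    unit_label.encode("ascii")
    ascii_ok = True
    message = ""
except UnicodeEncodeError as exc:
    # ASCII has no code point for the micro sign, so encoding fails.
    ascii_ok = False
    message = str(exc)

# An encoding that covers the character handles the same label fine.
utf8_bytes = unit_label.encode("utf-8")
```

If that reading is right, it would also fit the report that the error goes away for larger inputs, since a slower call would be reported in a unit without the micro sign.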

Neil

