[SciPy-User] equivalent of tolist().index(entry) for numpy 1d array of strings

josef.pktd@gmai... josef.pktd@gmai...
Mon Dec 21 22:39:11 CST 2009


On Mon, Dec 21, 2009 at 9:09 PM, Ryan Krauss <ryanlists@gmail.com> wrote:
> I am still open to more elegant solutions, but it seems like my
> concerns about .tolist() being inefficient are unfounded (this may be
> an indicator that I don't understand the inner workings of numpy very
> well).
>
> Here is my test:
>
> t1 = time.time()
> index1 = where(self.md5sum==photo.md5sum)[0][0]

"where" finds all matching elements; there is no early stopping. I
think there was a similar discussion recently which concluded that
plain Python is faster if early stopping/return is desired.
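
A small sketch of what "no early stopping" means in practice. The array
contents and variable names are made up for illustration and are not
taken from Ryan's code: the comparison builds a full boolean array, and
where() then returns every matching index, even if only the first one
is used.

```python
import numpy as np

# Hypothetical data standing in for the md5sum column.
md5sums = np.array(["aaa", "bbb", "ccc", "bbb", "ddd"])

# The comparison visits every element and builds a boolean array
# of the same length as the input.
mask = md5sums == "bbb"

# where() then collects the indices of all True entries.
all_matches = np.where(mask)[0]

# Taking the first entry afterwards does not save any of that work.
first_match = int(all_matches[0])
```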

> t2 = time.time()
> index2 = mysearch(self.md5sum, photo.md5sum)
> t3 = time.time()
> index3 = self.md5sum.tolist().index(photo.md5sum)

index finds only the first match and can then return immediately.
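
For comparison, a minimal sketch of an early-stopping search. The helper
name first_index and the sample data are assumptions for illustration
only; the loop returns as soon as it sees a match, which is the same
behaviour list.index() gives after tolist().

```python
import numpy as np

def first_index(arr, entry):
    """Return the index of the first element equal to entry.

    Hypothetical helper: a plain Python loop that stops at the
    first match and raises ValueError if there is none, mirroring
    list.index().
    """
    for i, item in enumerate(arr):
        if item == entry:
            return i
    raise ValueError("%r is not in array" % (entry,))

md5sums = np.array(["aaa", "bbb", "ccc", "bbb"])  # made-up data

idx_loop = first_index(md5sums, "bbb")
idx_list = md5sums.tolist().index("bbb")
```

Which variant is faster probably depends on the array size and on where
the match sits, so it is worth timing on the real data.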

I guess that if you only care about the first element that matches the
condition, then plain Python might always be faster than numpy. The
advantage of "where" shows up if you need to do something with all
elements that match, e.g. replace them with something else.
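
A sketch of the "do something with all matches" case, again with made-up
data: the boolean mask can be used directly for assignment, and where()
gives the positions if they are needed too.

```python
import numpy as np

# Made-up data; "bbb" appears twice.
md5sums = np.array(["aaa", "bbb", "ccc", "bbb"])

# Boolean mask of every match.
mask = md5sums == "bbb"

# Positions of all matches, in case they are needed as well.
replaced_at = np.where(mask)[0]

# Replace every matching element in one step.
md5sums[mask] = "xxx"
```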

Josef

> t4 = time.time()
>
> All 3 approaches lead to the same result.  Here are my timing results:
> t2-t1=4.81605529785e-05
> t3-t2=4.98294830322e-05
> t4-t3=2.00271606445e-05
>
> def mysearch(arrayin, element):
>    bool_vect = where(arrayin==element)[0]
>    assert(len(bool_vect)==1), 'Did not find exactly 1 match for ' + str(element)
>    return bool_vect[0]
>
> Now, for this test, the arrays didn't have very many elements (10 ish).
>
> FWIW,
>
> Ryan
>
> On Mon, Dec 21, 2009 at 7:53 PM, Ryan Krauss <ryanlists@gmail.com> wrote:
>> I wrote some code to work with csv spreadsheet files by reading the
>> columns into lists, but I need to rework the code to work with numpy
>> 1d arrays of strings rather than lists.  I need to search one of these
>> columns/arrays.  What is the best way to find the index for the
>> element that matches a certain string (or maybe just the first element
>> to match such a string)?
>>
>> With the columns as lists, I was doing
>> index = mylist.index(entry)
>>
>> So, I could obviously do
>> index = mylist.tolist().index(entry)
>>
>> but I don't know if that would be slower or clumsier than something like
>> bool_vect = where(mylist==entry)[0]
>> index = bool_vect[0]
>>
>> or just
>>
>> index = where(mylist==entry)[0][0]
>>
>> Any thoughts?  Is there an easier way?
>>
>> Thanks,
>>
>> Ryan
>>