[SciPy-User] equivalent of tolist().index(entry) for numpy 1d array of strings

Keith Goodman kwgoodman@gmail....
Mon Dec 21 20:27:05 CST 2009


On Mon, Dec 21, 2009 at 6:09 PM, Ryan Krauss <ryanlists@gmail.com> wrote:
> I am still open to more elegant solutions, but it seems like my
> concerns about .tolist() being inefficient are unfounded (this may be
> an indicator that I don't understand the inner workings of numpy very
> well).
>
> Here is my test:
>
> t1 = time.time()
> index1 = where(self.md5sum==photo.md5sum)[0][0]
> t2 = time.time()
> index2 = mysearch(self.md5sum, photo.md5sum)
> t3 = time.time()
> index3 = self.md5sum.tolist().index(photo.md5sum)
> t4 = time.time()

If you are using ipython then it is handy, and more accurate, to use
timeit. At the ipython prompt try:

timeit where(self.md5sum==photo.md5sum)[0][0]
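
Outside of ipython, the standard library timeit module can do roughly the
same job. The following is only a sketch: md5sums and target are made-up
stand-ins for self.md5sum and photo.md5sum, and the array has about 10
elements, as in the test described in the quoted message. Actual timings
will depend on the machine and the numpy version.

```python
import timeit

import numpy as np

# Hypothetical stand-ins for self.md5sum and photo.md5sum.
md5sums = np.array(["hash%02d" % i for i in range(10)])
target = "hash07"


def via_where():
    # Boolean comparison followed by where; first matching index.
    return np.where(md5sums == target)[0][0]


def via_tolist():
    # Convert to a Python list, then use list.index.
    return md5sums.tolist().index(target)


# Run each function many times and keep the best of a few repeats,
# which is less noisy than a single time.time() difference.
n = 10000
t_where = min(timeit.repeat(via_where, number=n, repeat=3)) / n
t_tolist = min(timeit.repeat(via_tolist, number=n, repeat=3)) / n
print("where:  %.2e s per call" % t_where)
print("tolist: %.2e s per call" % t_tolist)
```

Taking the minimum over several repeats is the usual way to reduce the
effect of other processes on the measurement.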

>
> All 3 approaches lead to the same result.  Here are my timing results:
> t2-t1=4.81605529785e-05
> t3-t2=4.98294830322e-05
> t4-t3=2.00271606445e-05
>
> def mysearch(arrayin, element):
>    bool_vect = where(arrayin==element)[0]
>    assert(len(bool_vect)==1), 'Did not find exactly 1 match for ' + str(element)
>    return bool_vect[0]

If element is not in arrayin then mysearch will raise an AssertionError
instead of returning an index. Same for .index, which raises a ValueError.
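
If a missing element is expected, one option is to return a sentinel
instead of raising. This is just a sketch; find_first and its default
argument are hypothetical names, not part of numpy, and it assumes
arrayin is a 1d array of strings and element is a string.

```python
import numpy as np


def find_first(arrayin, element, default=-1):
    """Return the index of the first match, or default if there is none."""
    matches = np.where(arrayin == element)[0]
    if len(matches) == 0:
        # No match: return the sentinel rather than raising.
        return default
    return int(matches[0])


names = np.array(["a", "b", "c", "b"])
first_b = find_first(names, "b")    # first of two matches
missing = find_first(names, "z")    # not present
```

Unlike mysearch, this version also accepts more than one match and
returns the first, which is what list.index does.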

>
> Now, for this test, the arrays didn't have very many elements (10 ish).
>
> FWIW,
>
> Ryan
>
> On Mon, Dec 21, 2009 at 7:53 PM, Ryan Krauss <ryanlists@gmail.com> wrote:
>> I wrote some code to work with csv spreadsheet files by reading the
>> columns into lists, but I need to rework the code to work with numpy
>> 1d arrays of strings rather than lists.  I need to search one of these
>> columns/arrays.  What is the best way to find the index for the
>> element that matches a certain string (or maybe just the first element
>> to match such a string)?
>>
>> With the columns as lists, I was doing
>> index = mylist.index(entry)
>>
>> So, I could obviously do
>> index = mylist.tolist().index(entry)
>>
>> but I don't know if that would be slower or clumsier than something like
>> bool_vect = where(mylist==entry)[0]
>> index = bool_vect[0]
>>
>> or just
>>
>> index = where(mylist==entry)[0][0]
>>
>> Any thoughts?  Is there an easier way?
>>
>> Thanks,
>>
>> Ryan
>>