[SciPy-User] equivalent of tolist().index(entry) for numpy 1d array of strings

Ryan Krauss ryanlists@gmail....
Mon Dec 21 20:09:17 CST 2009


I am still open to more elegant solutions, but it seems like my
concerns about .tolist() being inefficient are unfounded (this may be
an indicator that I don't understand the inner workings of numpy very
well).

Here is my test:

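import time
from numpy import where

# Approach 1: where() inline
t1 = time.time()
index1 = where(self.md5sum == photo.md5sum)[0][0]
t2 = time.time()
# Approach 2: the mysearch() helper (defined below)
index2 = mysearch(self.md5sum, photo.md5sum)
t3 = time.time()
# Approach 3: convert to a list and use list.index()
index3 = self.md5sum.tolist().index(photo.md5sum)
t4 = time.time()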

All 3 approaches lead to the same result.  Here are my timing results
(in seconds, from single runs):
t2-t1=4.81605529785e-05
t3-t2=4.98294830322e-05
t4-t3=2.00271606445e-05

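def mysearch(arrayin, element):
    # where() returns a tuple of index arrays; despite its name,
    # bool_vect holds the indices of the matching elements
    bool_vect = where(arrayin == element)[0]
    assert len(bool_vect) == 1, \
        'Did not find exactly 1 match for ' + str(element)
    return bool_vect[0]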

Note that for this test, the arrays had only about 10 elements, so the
results may differ for larger arrays.
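
Since single time.time() differences at the tens-of-microseconds scale are
noisy, a repeated measurement may be more reliable. Below is a minimal
sketch using the standard timeit module. The names md5sums, target,
index_where, index_tolist and time_call are hypothetical stand-ins (the
real data were md5sum strings on my objects), so treat this as an
illustration under those assumptions, not as the code I actually ran.

```python
import timeit

import numpy as np

# Hypothetical stand-in data: ten short strings in place of real md5sums.
md5sums = np.array(["hash%04d" % i for i in range(10)])
target = "hash0007"


def index_where(arr, element):
    """Return the index of the first match using np.where."""
    return int(np.where(arr == element)[0][0])


def index_tolist(arr, element):
    """Return the index of the first match via a Python list."""
    return arr.tolist().index(element)


def time_call(func, arr, element, number=1000):
    """Return the average time in seconds per call over `number` calls."""
    total = timeit.timeit(lambda: func(arr, element), number=number)
    return total / number


# Average per-call timings for both approaches on the small test array.
timings = {
    "where": time_call(index_where, md5sums, target),
    "tolist": time_call(index_tolist, md5sums, target),
}
for name, seconds in timings.items():
    print("%s: %.3e s per call" % (name, seconds))
```

Changing range(10) to a larger size would show whether the ranking of the
two approaches holds as the array grows.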

FWIW,

Ryan

On Mon, Dec 21, 2009 at 7:53 PM, Ryan Krauss <ryanlists@gmail.com> wrote:
> I wrote some code to work with csv spreadsheet files by reading the
> columns into lists, but I need to rework the code to work with numpy
> 1d arrays of strings rather than lists.  I need to search one of these
> columns/arrays.  What is the best way to find the index for the
> element that matches a certain string (or maybe just the first element
> to match such a string)?
>
> With the columns as lists, I was doing
> index = mylist.index(entry)
>
> So, I could obviously do
> index = mylist.tolist().index(entry)
>
> but I don't know if that would be slower or clumsier than something like
> bool_vect = where(mylist==entry)[0]
> index = bool_vect[0]
>
> or just
>
> index = where(mylist==entry)[0][0]
>
> Any thoughts?  Is there an easier way?
>
> Thanks,
>
> Ryan
>

