[Numpy-discussion] array slicing questions

Vlastimil Brom vlastimil.brom@gmail....
Mon Jul 30 14:33:13 CDT 2012


Hi all,
I'd like to ask for some hints or advice regarding the usage of
numpy.array and especially slicing.

I only recently tried numpy and was impressed by the speedup in some
parts of the code, so I suspect that I may be missing other
opportunities in this area.

I currently use the following code for a simple visualisation of the
search matches within a text. The arrays are generally much larger
than in this sample: the text size is typically hundreds of kilobytes
up to a few MB, with one index position for each character.
First there is a list of spans (obtained from the regex match objects);
the character indices covered by each of these spans should be set
to 1:

>>> import numpy
>>> characters_matches = numpy.zeros(10)
>>> matches_spans = numpy.array([[2,4], [5,9]])
>>> for start, stop in matches_spans:
...     characters_matches[start:stop] = 1
...
>>> characters_matches
array([ 0.,  0.,  1.,  1.,  0.,  1.,  1.,  1.,  1.,  0.])

Is there a way to achieve this in a numpy-only way, without the
Python loop?
(I got the impression that the powerful slicing capabilities could make
it possible, but I haven't found this kind of solution.)
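
One loop-free way to express the span marking above is a "difference array": count +1 at every start index and -1 at every stop index, then take a cumulative sum, so that positions inside at least one span end up greater than zero. This is only a sketch; the helper name mark_spans is hypothetical, and it assumes half-open [start, stop) spans with 0 <= start <= stop <= length, and a numpy recent enough for numpy.bincount to accept minlength.

```python
import numpy as np

def mark_spans(length, spans):
    """Return a float array of `length` with 1.0 inside any [start, stop) span.

    Hypothetical helper for illustration; assumes 0 <= start <= stop <= length.
    """
    spans = np.asarray(spans, dtype=int).reshape(-1, 2)
    size = length + 1  # one extra slot so that stop == length is a valid index
    # +1 where a span opens, -1 where it closes (bincount handles repeats)
    delta = (np.bincount(spans[:, 0], minlength=size)
             - np.bincount(spans[:, 1], minlength=size))
    # The running sum is the number of spans covering each position
    coverage = np.cumsum(delta[:length])
    return (coverage > 0).astype(float)

characters_matches = mark_spans(10, [[2, 4], [5, 9]])
print(characters_matches)
```

Overlapping spans are handled as well, since the running sum simply counts how many spans cover a position. Whether this beats the plain loop in practice likely depends on the number of spans relative to the text length, so it is worth timing on real data.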


In the next piece of code, every character position is evaluated
together with its "neighbourhood", and a kind of running proportion of
the matched text is computed (check_distance could generally be up to
the order of half the text length, but is usually less):

>>>
>>> check_distance = 1
>>> floating_checks_proportions = []
>>> for i in numpy.arange(len(characters_matches)):
...     lo = i - check_distance
...     if lo < 0:
...         lo = None
...     hi = i + check_distance + 1
...     checked_sublist = characters_matches[lo:hi]
...     proportion = (checked_sublist.sum() / (check_distance * 2 + 1.0))
...     floating_checks_proportions.append(proportion)
...
>>> floating_checks_proportions
[0.0, 0.33333333333333331, 0.66666666666666663, 0.66666666666666663,
0.66666666666666663, 0.66666666666666663, 1.0, 1.0,
0.66666666666666663, 0.33333333333333331]
>>>

I'd like to ask about possible better approaches, as this doesn't
look very elegant to me, and I obviously don't know the implications
or possible drawbacks of numpy arrays in some scenarios.
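
The running proportion computed in the loop above can also be sketched without a Python loop by using a cumulative sum: the sum over any window is the difference of two prefix sums, and clipping the window bounds to the array reproduces the truncated windows at both ends. The helper name running_proportion is hypothetical; like the loop, it always divides by the full window width (2 * check_distance + 1), even where the window is cut off by the array boundary.

```python
import numpy as np

def running_proportion(values, check_distance):
    """Windowed sum around each index, divided by the full window width.

    Hypothetical helper for illustration; mirrors the loop version,
    including its behaviour at the array boundaries.
    """
    values = np.asarray(values, dtype=float)
    n = len(values)
    # prefix[k] is the sum of values[:k], so prefix has n + 1 entries
    prefix = np.concatenate(([0.0], np.cumsum(values)))
    idx = np.arange(n)
    lo = np.clip(idx - check_distance, 0, n)      # clamp the lower bound at 0
    hi = np.clip(idx + check_distance + 1, 0, n)  # clamp the upper bound at n
    return (prefix[hi] - prefix[lo]) / (2 * check_distance + 1.0)

characters_matches = np.array([0., 0., 1., 1., 0., 1., 1., 1., 1., 0.])
proportions = running_proportion(characters_matches, 1)
print(proportions)
```

The cost of this approach does not grow with check_distance, which may matter when the distance approaches half the text length. numpy.convolve with a kernel of ones is another option, but its mode='same' output has the length of the longer input, so it would need extra care once the window becomes wider than the text.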

The pattern
for i in range(len(...)): is usually considered inadequate in Python,
but what should be used in this case, where the indices are primarily
needed?
Is there something to be gained or lost by using (x)range or np.arange,
as the Python loop is (probably?) inevitable anyway?
Is there a more elegant way to handle the "underflowing" lower
bound "lo" than replacing it with None?
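
For the lower bound specifically, one common idiom is to clamp it with max(0, ...) rather than switching to None. A negative start index would otherwise count from the end of the array, while a stop index past the end is simply truncated by slicing, so only the lower bound needs the guard. A minimal sketch using the sample data from above (the name window_sums is just for illustration):

```python
import numpy as np

characters_matches = np.array([0., 0., 1., 1., 0., 1., 1., 1., 1., 0.])
check_distance = 1

window_sums = []
for i in range(len(characters_matches)):
    lo = max(0, i - check_distance)  # clamp instead of replacing with None
    hi = i + check_distance + 1      # slicing past the end is already safe
    window_sums.append(characters_matches[lo:hi].sum())

print(window_sums)
```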

Is it significant which container is used to collect the results of
the computation in the Python loop, i.e. a Python list or a numpy
array?
(Could matplotlib possibly cooperate better with either container?)

And of course, are there other things that should be done
better or differently?

(using Numpy 1.6.2, python 2.7.3, win XP)


Thanks in advance for any hints or suggestions,
   regards,
  Vlastimil Brom

