# [Numpy-discussion] Numpy performance vs Matlab.

Nicolas ROUX nicolas.roux@st....
Fri Jan 9 08:56:08 CST 2009

```Sorry my previous mail was probalby not clear.
This mail was following the tread we had before, so with some discussion
legacy.

I simplified the code to focus only on "what I" need, rather to bother you
with the full code.

I wrote below a code closer to what I need, where you will agree that
vectorization/broadcasting is needed to avoid nested loops.
As I wrote in the 1st mail (added at the end), what is important is to keep
the code not too ugly due to vectorization syntax.
(As written below I try to demonstrate that vectorization/broadcast code
could be as readable as twice nested loop )
The real code we have is even more complex, with processing the array
element using 5x5 neighbours, instead of 3x3.

######################################################
def test6():
w = 3096
h = 2048
a = numpy.zeros((h,w))      #Normally loaded with real data
b = numpy.zeros((h,w,3))

w0 = numpy.ogrid[0:w-2]
w1 = numpy.ogrid[1:w-1]
w2 = numpy.ogrid[2:w]

h0 = numpy.ogrid[0:h-2]
h1 = numpy.ogrid[1:h-1]
h2 = numpy.ogrid[2:h]

p00, p10, p20 = [h0,w0], [h1,w0],[h2,w0]
p01, p11, p21 = [h0,w1], [h1,w1],[h2,w1]
p02, p12, p22 = [h0,w2], [h1,w2],[h2,w2]

b[p11,1] =  a[p11] + 1.23*a[p22]  \
- numpy.min([a[p11]-a[p00],
a[p11]-a[p01],
a[p11]-a[p02],
a[p11]-a[p10],
a[p11]-a[p12],
a[p11]-a[p20],
a[p11]-a[p21],
a[p11]-a[p22]])  \
+ 0.123*numpy.max([a[p11]-a[p00],
a[p11]-a[p01],
a[p11]-a[p02],
a[p11]-a[p10],
a[p11]-a[p12],
a[p11]-a[p20],
a[p11]-a[p21],
a[p11]-a[p22]])
######################################################

This code above is the one I wish to write but is not working.
I hope you better understand my issue context ;-)

Did I missed something  ?

Cheers,
Nicolas.

> I understand the weakness of the missing JITcompiler in Python vs Matlab,
> that's why I invistigated numpy vectorization/broadcast.
> (hoping to find a cool way to write our code in fast Numpy)
>
> I used the page http://www.scipy.org/PerformancePython to write my code
> efficiently in Numpy.
> While doing it I found one issue.
>
> To have pretty code, I created p0 and p1 arrays of indexes.

I must admit I don't quite understand what you are trying to do, and what

If you just want to do

Out[:,:] = In[:,:]

there is no need for meshgrids (ogrid), for-loops, or whatever.

It is brain dead to use nested for-loops or ogrid for this purpose in
NumPy. It is equally foolish to use nested for loops or meshgrid for this
purpose in Matlab. If you do, I would seriously question your competence.

By the way, you can index ogrid with more than one dimension:

p = numpy.ogrid[:m, :n]
Out[p] = In[p]

============================================================================
===============
============================================================================
===============
============================================================================
===============

Hi !

A very good point for Numpy ;-)

I spent all my time trying to prepare my testcase to better share with you,
that's why I didn't reply fast.

I understand the weakness of the missing JITcompiler in Python vs Matlab,
that's why I invistigated numpy vectorization/broadcast.
(hoping to find a cool way to write our code in fast Numpy)

I used the page http://www.scipy.org/PerformancePython to write my code
efficiently in Numpy.
While doing it I found one issue.

To have pretty code, I created p0 and p1 arrays of indexes.
In "test8" I wished to see the commented line working, which is not the
case.
Having to use "ix_" is not pretty enough, and seems to not work with further
dimensions.
Why the comment line is not working ?

############################################
def test8():
m = 1024
n = 512

Out = numpy.zeros((m,n))
In = numpy.zeros((m,n))

p0 = numpy.ogrid[0:m]
p1 = numpy.ogrid[0:n]

Out[0:m,0:n] = In[0:m,0:n]
#Out[p0,p1] = In[p0,p1]   #This doesn't work
Out[numpy.ix_(p0,p1)] = In[numpy.ix_(p0,p1)]
############################################

What is maybe not clear in the above code, is that I don't want to predefine
all possible ogrid/vector.
The number of possible ogrid/vector is big if in need to define all.
... And this vector definition become more paintful.

So Numpy vector style is fine if i can write something like:
Out[p0,p1] = In[p0,p1]   #2 dimensions case
And
Out[p0,p1,1] = In[p0,p1,1] #3 dimensions case
But is not fine if i have to add ".ix_()" or to multiply the number of
vector definitions.

Below example with 3 dimensions instead of 2.
############################################
def test9():
m = 1024
n = 512

Out = numpy.zeros((m,n,3))
In = numpy.zeros((m,n,3))

p0 = numpy.ogrid[0:m]
p1 = numpy.ogrid[0:n]

Out[0:m,0:n,2] = In[0:m,0:n,2]
#Out[p0,p1,2] = In[p0,p1,2]
Out[numpy.ix_(p0,p1,2)] = In[numpy.ix_(p0,p1,2)]
############################################

Tanks again for your support ;-)

Cheers,
Nicolas.

============================================================================
===============
============================================================================
===============
============================================================================
===============
Hi,

I need help ;-)
I have here a testcase which works much faster in Matlab than Numpy.

The following code takes less than 0.9sec in Matlab, but 21sec in Python.
Numpy is 24 times slower than Matlab !
The big trouble I have is a large team of people within my company is ready
to replace Matlab by Numpy/Scipy/Matplotlib,
but I have to demonstrate that this kind of Python Code is executed with the
same performance than Matlab, without writing C extension.
This is becoming a critical point for us.

This is a testcase that people would like to see working without any code
restructuring.
The reasons are:
- this way of writing is fairly natural.
- the original code which showed me the matlab/Numpy performance differences
is much more complex,
and can't benefit from broadcasting or other numpy tips (I can later give
this code)

...So I really need to use the code below, without restructuring.

Numpy/Python code:
#####################################################################
import numpy
import time

print "Start test \n"

dim = 3000

a = numpy.zeros((dim,dim,3))

start = time.clock()

for i in range(dim):
for j in range(dim):
a[i,j,0] = a[i,j,1]
a[i,j,2] = a[i,j,0]
a[i,j,1] = a[i,j,2]

end = time.clock() - start

print "Test done,   %f sec" % end
#####################################################################

Matlab code:
#####################################################################
'Start test'
dim = 3000;
tic;
a =zeros(dim,dim,3);
for i = 1:dim
for j = 1:dim
a(i,j,1) = a(i,j,2);
a(i,j,2) = a(i,j,1);
a(i,j,3) = a(i,j,3);
end
end
toc
'Test done'
#####################################################################

Any idea on it ?
Did I missed something ?

Cheers,
Nicolas.

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

```