[Numpy-discussion] Python ctypes and OpenMP mystery

Francesc Alted faltet@pytables....
Wed Feb 16 09:57:03 CST 2011


On Saturday, 12 February 2011 21:19:39, Eric Carlson wrote:
> Hello All,
> I have been toying with OpenMP through f2py and ctypes. On the whole,
> the results of my efforts have been very encouraging. That said, some
> results are a bit perplexing.
> 
> I have written identical routines that I run directly as a C-derived
> executable, and through ctypes as a shared library. I am running the
> tests on a dual-Xeon Ubuntu system with 12 cores and 24 threads. The
> C executable is SLIGHTLY faster than the ctypes version at lower
> thread counts, but the C eventually reaches a speedup ratio of 12+,
> while the Python version caps off at 7.7, as shown below:
> 
> threads   C-speedup   Python-speedup
>    1        1           1
>    2        2.07        1.98
>    3        3.1         2.96
>    4        4.11        3.93
>    5        4.97        4.75
>    6        5.94        5.54
>    7        6.83        6.53
>    8        7.78        7.3
>    9        8.68        7.68
>   10        9.62        7.42
>   11       10.38        7.51
>   12       10.44        7.26
>   13        7.19        6.04
>   14        7.7         5.73
>   15        8.27        6.03
>   16        8.81        6.29
>   17        9.37        6.55
>   18        9.9         6.67
>   19       10.36        6.9
>   20       10.98        7.01
>   21       11.45        6.97
>   22       11.92        7.1
>   23       12.2         7.08
> 
> These ratios are quite consistent from 100KB double arrays to 100MB
> double arrays, so I do not think it reflects a Python overhead issue.
> There is no question the routine is memory bandwidth constrained, and
> I feel lucky to squeeze the eventual 12+ ratio, but I am very
> perplexed as to why the performance of the Python-invoked routine
> seems to cap off.
> 
> Does anyone have an explanation for the caps? Am I seeing some effect
> from ctypes, or the Python engine, or what?

It is difficult to tell what could be going on from the timings alone.  
Can you attach a small, self-contained benchmark?  Not that I can offer 
a definitive answer, but I'm curious about this.
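For reference, the ctypes calling pattern under discussion looks roughly
like the sketch below. The OpenMP shared library from the thread was not
posted, so libm's cos() stands in for it here; the mechanics (CDLL plus
argtypes/restype declarations, then timing the call) are the same, and the
measured per-call overhead gives a feel for how small the ctypes boundary
cost is relative to a memory-bound C routine:

```python
import ctypes
import ctypes.util
import math
from timeit import timeit

# Load a shared library through ctypes.  libm's cos() is a stand-in
# for the OpenMP routine from the thread (not posted); the calling
# pattern is identical for any C shared library.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declaring argument and return types is essential: without them,
# ctypes assumes int arguments and silently corrupts doubles.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

# Sanity check: the ctypes call agrees with Python's math.cos.
assert abs(libm.cos(1.0) - math.cos(1.0)) < 1e-12

# Per-call overhead of crossing the ctypes boundary.  For routines
# operating on 100 KB-100 MB arrays this fixed cost is negligible,
# consistent with the observation that the ratios above do not
# reflect Python call overhead.
n = 100_000
per_call = timeit(lambda: libm.cos(1.0), number=n) / n
print(f"ctypes call overhead: ~{per_call * 1e6:.2f} us per call")
```

The overhead printed is typically around a microsecond per call, which
supports the poster's conclusion that the cap is not a Python-overhead
effect but something in the threaded region itself.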

-- 
Francesc Alted
