[SciPy-User] Proposal for a new data analysis toolbox

Dag Sverre Seljebotn dagss@student.matnat.uio...
Thu Nov 25 00:40:10 CST 2010


On 11/24/2010 07:09 PM, Matthew Brett wrote:
> Hi,
>
> On Wed, Nov 24, 2010 at 9:30 AM, Dag Sverre Seljebotn
> <dagss@student.matnat.uio.no>  wrote:
>    
>
>> For the time being, for something like this I'd definitely go with a
>> template language to generate Cython code if you are not already. Myself
>> (for SciPy on .NET/fwrap refactor) I'm using Tempita with a pyx.in
>> extension and it works pretty well. Using Bento one can probably chain
>> Tempita so that this gets built automatically (but I haven't tried that
>> yet).
>>      
> Thanks for the update - it's excellent news that you are working on
> this.  If you ever have spare time, would you consider writing up your
> experiences in a blog post or similar?  I'm sure it would be very
> useful for the rest of us who have idly thought we'd like to do this,
> and then started waiting for someone with more expertise to do it...
>    

I don't have a blog, and it'd take too much time to create one, but 
here's something less polished:

What I'm really doing is to modify fwrap so that it detects functions 
with the same functionality (but different types) in the LAPACK wrapper 
in scipy.linalg, and emits a Cython template for that family of 
functions. But I'll try to step into your shoes here.

There are A LOT of template engines out there. I chose Tempita, which has 
the advantages of a) being recommended by Robert Kern, b) being pure Python 
with no compiled code, and c) being very small and simple, so that it can 
potentially be bundled with other projects in the build system without a problem.

Then, simply write templated code like the following. The result is less 
clear to read, but bugs are a lot easier to fix when they only have to be 
fixed in one spot.

{{py:
dtype_values = ['np.float32', 'np.float64', 'np.complex64', 'np.complex128']
dtype_t_values = ['%s_t' % x for x in dtype_values]
funcletter_values = ['f', 'd', 'c', 'z']
NDIM_MAX = 5
}}

...

{{for ndim in range(NDIM_MAX)}}
{{for dtype, dtype_t, funcletter in zip(dtype_values, dtype_t_values, funcletter_values)}}
def {{prefix}}sum_{{ndim}}{{funcletter}}(np.ndarray[{{dtype_t}}, ndim={{ndim}}] x,
        np.ndarray[{{dtype_t}}, ndim={{ndim}}] y,
        np.ndarray[{{dtype_t}}, ndim={{ndim}}] out=None):
    # ... and so on; inside here everything looks about the same as normal ...
{{endfor}}
{{endfor}}
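
To make the expansion concrete, here is a small plain-Python sketch (no 
Tempita needed) that runs the same two loops and lists the function names 
and buffer types the template would produce. The prefix "my_" is made up 
for illustration, since the template leaves {{prefix}} to be defined 
elsewhere, and I assume the ndim loop is meant to run over range(NDIM_MAX).

```python
# Plain-Python mirror of the template loops above.
# "my_" is a hypothetical prefix; the real one is defined elsewhere.
dtype_values = ['np.float32', 'np.float64', 'np.complex64', 'np.complex128']
dtype_t_values = ['%s_t' % x for x in dtype_values]
funcletter_values = ['f', 'd', 'c', 'z']
NDIM_MAX = 5


def expand_signatures(prefix='my_'):
    """Return (function name, buffer type) pairs, one per generated function."""
    result = []
    for ndim in range(NDIM_MAX):
        for dtype_t, funcletter in zip(dtype_t_values, funcletter_values):
            name = '%ssum_%d%s' % (prefix, ndim, funcletter)
            buffer_type = 'np.ndarray[%s, ndim=%d]' % (dtype_t, ndim)
            result.append((name, buffer_type))
    return result


signatures = expand_signatures()
for name, buffer_type in signatures[:4]:
    print(name, buffer_type)
```

So one template body stands in for NDIM_MAX * 4 = 20 near-identical 
functions, which is the whole point of the exercise.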


For integrating this into a build, David C.'s Bento is probably the best 
way once a bug is fixed (see recent "Cython distutils" thread on 
cython-dev where this is specifically discussed, and David points to 
examples in the Bento distribution). For my work on fwrap I use the 
"waf" build tool, where it is a simple matter of:

def run_tempita(task):
    # Render the single Tempita template given as task input into the
    # single output file.
    import tempita
    assert len(task.inputs) == len(task.outputs) == 1
    tmpl = task.inputs[0].read()
    result = tempita.sub(tmpl)
    task.outputs[0].write(result)

...
bld(
    name='tempita',
    rule=run_tempita,
    source=['foo.pyx.in'],
    target=['foo.pyx'],
)
...

I'm sure a more automatic rule for .pyx.in -> .pyx is possible as well, 
though (I don't really know waf; it's just what the fwrap test 
framework uses).
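
Outside of any build tool, a generic .pyx.in -> .pyx step could look 
roughly like the sketch below. This is not how waf or Bento do it; the 
function name render_pyx_templates and its render argument are my own 
invention for illustration. The only Tempita call assumed is tempita.sub, 
the same one used in the waf rule, and it is imported lazily so that 
another renderer can be passed in instead.

```python
import os


def render_pyx_templates(root, render=None):
    """Render every *.pyx.in file under root to a sibling *.pyx file.

    render is any callable mapping template text to output text.
    If omitted, tempita.sub is used (third-party, assumed installed).
    Returns the list of files written.
    """
    if render is None:
        import tempita  # imported lazily, only when no renderer is given
        render = tempita.sub
    written = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith('.pyx.in'):
                continue
            source = os.path.join(dirpath, filename)
            target = source[:-len('.in')]  # foo.pyx.in -> foo.pyx
            with open(source) as f:
                text = f.read()
            with open(target, 'w') as f:
                f.write(render(text))
            written.append(target)
    return written
```

Calling render_pyx_templates('.') before running Cython would then 
regenerate every template in the tree. It always rewrites the output, so 
a real build rule would also want a timestamp or checksum test.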

Dag Sverre

