[SciPy-User] Proposal for a new data analysis toolbox

Dag Sverre Seljebotn dagss@student.matnat.uio...
Thu Nov 25 05:01:54 CST 2010


On 11/25/2010 09:30 AM, Sebastian Haase wrote:
> On Thu, Nov 25, 2010 at 8:32 AM, David<david@silveregg.co.jp>  wrote:
>    
>> On 11/25/2010 03:40 PM, Dag Sverre Seljebotn wrote:
>>      
>>> On 11/24/2010 07:09 PM, Matthew Brett wrote:
>>>        
>>>> Hi,
>>>>
>>>> On Wed, Nov 24, 2010 at 9:30 AM, Dag Sverre Seljebotn
>>>> <dagss@student.matnat.uio.no>     wrote:
>>>>
>>>>
>>>>          
>>>>> For the time being, for something like this I'd definitely go with a
>>>>> template language to generate Cython code if you are not already. Myself
>>>>> (for SciPy on .NET/fwrap refactor) I'm using Tempita with a pyx.in
>>>>> extension and it works pretty well. Using Bento one can probably chain
>>>>> Tempita so that this gets built automatically (but I haven't tried that
>>>>> yet).
>>>>>
>>>>>            
>>>> Thanks for the update - it's excellent news that you are working on
>>>> this.  If you ever have spare time, would you consider writing up your
>>>> experiences in a blog post or similar?  I'm sure it would be very
>>>> useful for the rest of us who have idly thought we'd like to do this,
>>>> and then started waiting for someone with more expertise to do it...
>>>>
>>>>          
>>> I don't have a blog, and it'd take too much time to create one, but
>>> here's something less polished:
>>>
>>> What I'm really doing is to modify fwrap so that it detects functions
>>> with the same functionality (but different types) in the LAPACK wrapper
>>> in scipy.linalg, and emits a Cython template for that family of
>>> functions. But I'll try to step into your shoes here.
>>>
>>> There are A LOT of template engines out there. I chose Tempita, which has
>>> the advantages of a) being recommended by Robert Kern, b) pure Python,
>>> no compiled code, c) very small and simple so that it can potentially be
>>> bundled with other projects in the build system without a problem.
>>>
>>> Then, simply write templated code like the following. It becomes less
>>> clear to read, but a lot easier to fix bugs etc. when they must only be
>>> fixed in one spot.
>>>
>>> {{py:
>>> dtype_values = ['np.float32', 'np.float64', 'np.complex64', 'np.complex128']
>>> dtype_t_values = ['%s_t' % x for x in dtype_values]
>>> funcletter_values = ['f', 'd', 'c', 'z']
>>> NDIM_MAX = 5
>>> }}
>>>
>>> ...
>>>
>>> {{for ndim in range(5)}}
>>> {{for dtype, dtype_t, funcletter in zip(dtype_values, dtype_t_values, funcletter_values)}}
>>> def {{prefix}}sum_{{ndim}}{{funcletter}}(np.ndarray[{{dtype_t}}, ndim={{ndim}}] x,
>>>         np.ndarray[{{dtype_t}}, ndim={{ndim}}] y,
>>>         np.ndarray[{{dtype_t}}, ndim={{ndim}}] out=None):
>>>     ... and so on; inside here everything looks about the same as normal ...
>>> {{endfor}}
>>> {{endfor}}
>>>
>>>
>>> For integrating this into a build, David C.'s Bento is probably the best
>>> way once a bug is fixed (see recent "Cython distutils" thread on
>>> cython-dev where this is specifically discussed, and David points to
>>> examples in the Bento distribution). For my work on fwrap I use the
>>> "waf" build tool, where it is a simple matter of:
>>>
>>> def run_tempita(task):
>>>     import tempita
>>>     assert len(task.inputs) == len(task.outputs) == 1
>>>     tmpl = task.inputs[0].read()
>>>     result = tempita.sub(tmpl)
>>>     task.outputs[0].write(result)
>>>
>>> ...
>>> bld(
>>>     name = 'tempita',
>>>     rule = run_tempita,
>>>     source = ['foo.pyx.in'],
>>>     target = ['foo.pyx']
>>> )
>>>        
>> You may want to look at the flex example in waf tools subdir to see how
>> to chain builders together.
>>
>> As for bento, I unfortunately won't be able to work on it much, if at
>> all, until the end of the year, so I don't think I will have time to fix
>> the issue until then.
>>
>> cheers,
>>
>> David
>>      
> As I mentioned, I have a setup based on SWIG: it allows me to do most
> of the heavy lifting using SWIG's C++ template support, to make
> "general" functions that support multiple dtypes. With the help of a
> C preprocessor macro it instantiates the functions (which is needed
> for building dynamic libs) for a standard set of dtypes. For my image
> processing needs I have: uint8, uint16, int16, int32, float32,
> float64, and long. This is also a compromise to avoid getting the DLLs
> bloated with dtypes I never use (and e.g. bool can be cast to uint8 in
> a Python wrapper).
> My point here is that, as far as I know, Cython is missing such
> template support, right? How hard would it be to add this,
> concentrating on this special purpose of dtype support?
>    


I'd guess between 1 and 2 weeks full-time by somebody who already knows 
the code base. But I'm not willing to stand by that guess in the future :-)
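
To make concrete what such a feature would have to generate, here is a
rough stand-in for the template loop quoted above. It is only a sketch:
it uses plain str.format instead of Tempita so that it runs without any
extra packages, and the names STUB and generate_sum_stubs (and the
'my_' prefix in the usage line) are made up for illustration.

```python
# Stand-in for the Tempita loop quoted above, using only str.format.
# All names here are illustrative; this is not Tempita or Cython API.

DTYPES = ['np.float32', 'np.float64', 'np.complex64', 'np.complex128']
LETTERS = ['f', 'd', 'c', 'z']
NDIM_MAX = 5

# One Cython-style signature; the real function body is left out.
STUB = (
    "def {prefix}sum_{ndim}{letter}(np.ndarray[{dtype}_t, ndim={ndim}] x,\n"
    "        np.ndarray[{dtype}_t, ndim={ndim}] y,\n"
    "        np.ndarray[{dtype}_t, ndim={ndim}] out=None):\n"
    "    pass  # body omitted\n"
)


def generate_sum_stubs(prefix='', ndim_max=NDIM_MAX):
    """Return generated source text, one stub per (ndim, dtype) pair."""
    chunks = []
    for ndim in range(ndim_max):
        for dtype, letter in zip(DTYPES, LETTERS):
            chunks.append(STUB.format(prefix=prefix, ndim=ndim,
                                      letter=letter, dtype=dtype))
    return "\n".join(chunks)


if __name__ == '__main__':
    print(generate_sum_stubs(prefix='my_'))
```

The returned text would then be written to a .pyx file and handed to
Cython, the same way the output of tempita.sub is used in the waf rule.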

Dag Sverre

