# [SciPy-Dev] Electronics student with programming background - Like to participate in SciPy - Write some code and learn!

Daniel Smith smith.daniel.br@gmail....
Tue Jan 22 15:16:36 CST 2013

``` <josef.pktd <at> gmail.com> writes:

>
> On Tue, Jan 22, 2013 at 2:26 PM, Daniel Smith <smith.daniel.br <at> gmail.com> wrote:
> > Hello,
> >
> > I am also looking for ways to contribute to Scipy. I have experience
> > with Python, C/C++ and limited experience with the Numpy C API. In
> > particular, I have some code implementing the kernel density estimator
> > bandwidth selection algorithm from the following paper:
> >
> > Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density
> > estimation via diffusion. The Annals of Statistics, 38(5):2916–2957,
> > 2010.
> >
> > That method is more resilient to multi-modal data than the standard
> > plug-in estimators. I would love to add that method to the current
> > SciPy stats package if there is interest.
>
> Looks interesting, either for scipy.stats or statsmodels.
> statsmodels has now kde with least-squares cross-validation among
> other bandwidth choices.
>
> However, there is nothing to improve boundary effects or that has

Boundary effects are another issue. I don't have working code, but I have seen
a few algorithms and could certainly add those corrections to existing code.

>
> Which programming language did you write it in?

Everything is in Python/SciPy/Numpy. The most computationally expensive parts
are the FFT, the inverse FFT and a fixed-point calculation, which are all
implemented in SciPy/Numpy. The code is reasonably fast as it stands. I could
make it faster by using Cython or C for calculating the derivatives of the
estimated probability density function (pdf).

>
> And out of curiosity: do you know how well the estimator behaves in
> smaller samples, 200 or 500? The paper seems to consider a sample size
> of 1000 as small (from a very fast skim of the article).

Personally, I've had pretty good luck going down to 50-100 samples. The exact
sample size needed largely depends on how ragged the pdf you are estimating is.

>
> Josef
>
> >
> > Thanks,
> > Daniel
> >
> >> Hi,
> >>
> >> I am Surya, a junior-year Electronics & Communication Engineering student
> >> with a Computer Science/programming background. I have looked into SciPy
> >> and it's really amazing!
> >>
> >> In this regard, I would like to explore the possibility of contributing to
> >> this project by writing code and simultaneously learning the real
> >> engineering stuff. My skills lie in Python, Django and C, plus a little of
> >> the Facebook API, cloud platforms (OpenShift) and Git.
> >>
> >> Also, I wrote some fun projects on weekends that you might like to take a
> >> look at.
> >>
> >> 1. Https://apps.facebook.com/pingmee -- Lets people ping their friends
> >> using cartoons (Python, Django -- PIL)
> >> for my photography blog; Not yet finished (Python, Django -- Google Feed
> >> API) - Got to finish if time permits
> >> 3. Https://github.com/ksurya -- Github handle
> >>
> >> So, I am ready to take up any work that involves Python and get on with
> >> it!
> >>
> >> Regarding my scientific skills, I studied Engineering Mathematics, Digital
> >> Signal Processing (now studying), Signals & Systems etc. [ More on signals ]
> >>
> >>