# [SciPy-User] FW: curve fitting by a sum of gaussian with scipy

josef.pktd@gmai... josef.pktd@gmai...
Thu Apr 18 10:51:01 CDT 2013

```On Thu, Apr 18, 2013 at 11:41 AM, Charles R Harris
<charlesr.harris@gmail.com> wrote:
>
>
> On Thu, Apr 18, 2013 at 9:18 AM, Stéphanie haaaaaaaa
> <flower_des_iles@hotmail.com> wrote:
>>
>> Hi charles,
>> Yes, i left out all the distance data points that had zero matches. So, I
>> correct this, and i obtain the graph attached.
>> I've done an interpolation with the following code using python and scipy
>> :
>>
>> #!/usr/bin/env python
>> # -*- coding:Utf-8 -*-
>>
>> import numpy as np
>> import matplotlib.pyplot as plt
>> from matplotlib import pylab
>> from scipy import interpolate
>> from scipy import stats
>>
>> data = np.dtype( [ ('distance',np.int), ('effectif',np.int) ] )
>> enreg = np.loadtxt('tmp.txt', dtype=data, skiprows=True)
>>
>> data=zip(enreg['distance'],enreg['effectif'])
>>
>>
>> xmin = -60
>> xmax=60
>> ymin=0
>> ymax=max(enreg['effectif'])+5
>>
>> ax1=plt.subplot(2,1,1)
>> ax1.set_autoscaley_on(False)
>> ax1.set_ylim([ymin,ymax])
>> ax1.set_xlim([xmin,xmax])
>>
>> plt.plot(enreg['distance'],enreg['effectif'],color='#347C2C',lw=2)
>> plt.fill_between(enreg['distance'],enreg['effectif'],0,color='#4CC417')
>>
>> #tck= interpolate.splrep(enreg['distance'],enreg['effectif'])
>> xnew= np.linspace(enreg['distance'][0],enreg['distance'][-1], 1000)
>> #ynew = interpolate.splev(xnew,tck,der=0)
>> #plt.plot(xnew, ynew, 'r')
>>
>> #kernel
>> #kde1 = stats.gaussian_kde(enreg['distance'])
>> #kdepdf = kde1.evaluate(xnew)
>>
>>
>> #plt.plot(xnew, kdepdf, label='kde', color="r")
>>
>> tck= interpolate.splrep(enreg['distance'],enreg['effectif'])
>> xnew= np.linspace(enreg['distance'][0],enreg['distance'][-1], 1000)
>> ynew = interpolate.splev(xnew,tck,der=0)
>> plt.plot(xnew, ynew, 'r')
>>
>>
>> plt.subplot(2,1,2)
>> ax2 = plt.subplot(212, sharex=ax1)
>>
>> for dist, eff in data: # distance dans mon histogramme + effectif dans
>> chaque classe
>>     for e in np.arange(eff): # np.arrange(3) = array([0, 1, 2])
>>         #print "%d,%d : %d" %(dist,eff,e)
>>         emod=e+1
>>         ax2.scatter(dist,-emod,marker='s',color='#B048B5') # pour qu'il
>> n'y ait pas d'overlappe
>>
>> #plt.xlim(0,100)
>>
>> ax1.xaxis.tick_top()
>> # Set the X Axis label.
>> ax1.set_xlabel('Position from significant crosslink site',fontsize=12)
>> ax1.xaxis.set_label_position('top')
>> ax1.axvline(0, color='black', lw=2)
>> # Set the Y Axis label.
>> ax1.set_ylabel('# of piRNAs',fontsize=12)
>>
>>
>> ax2.axes.get_xaxis().set_visible(False)
>> ax2.axes.get_yaxis().set_visible(False)
>> ax2.axvline(0, color='black', lw=2)
>>
>>
>>
>>
>> plt.setp( ax2.get_xticklabels(), visible=False)
>> plt.setp( ax2.get_yticklabels(), visible=False)
>> plt.show()
>>
>> What do you think about it ?
>>
>
> Certainly looks better ;) Not my field so I can't say more.
>
> For gaussian kernel density estimation there is scipy.stats.gaussian_kde
> that might be worth fooling with, although the notes imply that multimodal
> distributions tend to be over smoothed.

My guesss is that in this case pretty much any KDE would oversmooth if
it doesn't use an adaptive bandwidth.
Absent that, fitting a spline to a histogram sounds ok, especially if
the data is only given as histogram.
Although, it's still necessary to impose that it's actually a density,
integrate to 1 and non-negative (if we want a density).

Josef

>
> <snip>
>
> Chuck
>>
>> ________________________________
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
```