[SciPy-User] FW: curve fitting by a sum of gaussian with scipy

Stéphanie haaaaaaaa flower_des_iles@hotmail....
Thu Apr 18 10:18:12 CDT 2013


Hi charles,
Yes, i left out all the distance data points that had zero matches. So, I correct this, and i obtain the graph attached. 
I've done an interpolation with the following code using python and scipy :

#!/usr/bin/env python
# -*- coding:Utf-8 -*- 

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import pylab
from scipy import interpolate
from scipy import stats

data = np.dtype( [ ('distance',np.int), ('effectif',np.int) ] )
enreg = np.loadtxt('tmp.txt', dtype=data, skiprows=True)

data=zip(enreg['distance'],enreg['effectif'])


xmin = -60
xmax=60
ymin=0
ymax=max(enreg['effectif'])+5

ax1=plt.subplot(2,1,1)
ax1.set_autoscaley_on(False)
ax1.set_ylim([ymin,ymax])
ax1.set_xlim([xmin,xmax])

plt.plot(enreg['distance'],enreg['effectif'],color='#347C2C',lw=2)
plt.fill_between(enreg['distance'],enreg['effectif'],0,color='#4CC417')

#tck= interpolate.splrep(enreg['distance'],enreg['effectif'])
xnew= np.linspace(enreg['distance'][0],enreg['distance'][-1], 1000)
#ynew = interpolate.splev(xnew,tck,der=0)
#plt.plot(xnew, ynew, 'r')

#kernel
#kde1 = stats.gaussian_kde(enreg['distance'])
#kdepdf = kde1.evaluate(xnew)


#plt.plot(xnew, kdepdf, label='kde', color="r")

tck= interpolate.splrep(enreg['distance'],enreg['effectif'])
xnew= np.linspace(enreg['distance'][0],enreg['distance'][-1], 1000)
ynew = interpolate.splev(xnew,tck,der=0)
plt.plot(xnew, ynew, 'r')


plt.subplot(2,1,2)
ax2 = plt.subplot(212, sharex=ax1)

for dist, eff in data: # distance dans mon histogramme + effectif dans chaque classe
    for e in np.arange(eff): # np.arrange(3) = array([0, 1, 2])
        #print "%d,%d : %d" %(dist,eff,e)
        emod=e+1
        ax2.scatter(dist,-emod,marker='s',color='#B048B5') # pour qu'il n'y ait pas d'overlappe
 
#plt.xlim(0,100)

ax1.xaxis.tick_top()
# Set the X Axis label.
ax1.set_xlabel('Position from significant crosslink site',fontsize=12)
ax1.xaxis.set_label_position('top')
ax1.axvline(0, color='black', lw=2)
# Set the Y Axis label.
ax1.set_ylabel('# of piRNAs',fontsize=12)


ax2.axes.get_xaxis().set_visible(False)
ax2.axes.get_yaxis().set_visible(False)
ax2.axvline(0, color='black', lw=2)




plt.setp( ax2.get_xticklabels(), visible=False)
plt.setp( ax2.get_yticklabels(), visible=False)
plt.show()

What do you think about it ?

Date: Thu, 18 Apr 2013 09:05:19 -0600
From: charlesr.harris@gmail.com
To: scipy-user@scipy.org
Subject: Re: [SciPy-User] FW: curve fitting by a sum of gaussian with scipy



On Thu, Apr 18, 2013 at 8:59 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:



On Thu, Apr 18, 2013 at 6:24 AM, Stéphanie haaaaaaaa <flower_des_iles@hotmail.com> wrote:





Dear all,


I'm doing bioinformatics and we map small RNA on mRNA. We have the 
mapping coordinate of a protein on each mRNA and we calculate the 
relative distance between the place where the protein is bound on the mRNA and
 the site that is bound by a small RNA.


I obtain the following dataset :


dist    eff
-69 3
-68 2
-67 1
-66 1
-60 1
-59 1
-58 1
-57 2
-56 1
-55 1
-54 1
-52 1
-50 2
-48 3
-47 1
-46 3
-45 1
-43 1
0   1
1   2
2   12
3   18
4   18
5   13
6   9
7   7
8   5
9   3
10  1
13  2
14  3
15  2
16  2
17  2
18  2
19  2
20  2
21  3
22  1
24  1
25  1
26  1
28  2
31  1
38  1
40  2


When i plot the data, i have 3 pics : 1 at around 3/4
another one around 20 and a last one around -50. (see attached file, upper graph)



I try cubic spline interpolation, but it does'nt work very well for my data (see attached file 2, red curve).


My idea was to do curve fitting with a sum of gaussians.
For example in my case, estimate 3 gaussian curve around the peak (at point 5,20 and -50).


How can i do so ?


I looked at scipy.optimize.curve_fit(), but how can i fit the curve at precise intervalle ? 
How can i add the curve to have one single curve ?


That's interesting. On thinking about it, I think if you used the design matrix for, say, fitting a uniform spline with fairly closely spaced sample points, that it would be pretty singular, which would be a good thing because the pseudo inverse would minimize the sum of squares of the coefficients, which in turn would knock down the curve where there is no data. Mind, I'm just speculating here, haven't tried it. Is the data you posted complete?


And thinking some more, always a bad sign here, this looks like a histogram, but you have left out all the distance data points that had zero matches, I think you need to keep them in.


Chuck 



_______________________________________________
SciPy-User mailing list
SciPy-User@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user 		 	   		  
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20130418/aec266c0/attachment-0001.html 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: interpol_20m1_full_data.png
Type: image/png
Size: 43471 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/scipy-user/attachments/20130418/aec266c0/attachment-0001.png 


More information about the SciPy-User mailing list