# [SciPy-User] technical question: normed exponential fit for data?

Daniel Lepage dplepage@gmail....
Thu Mar 24 14:12:11 CDT 2011

On Thu, Mar 24, 2011 at 2:30 PM, Daniel Mader wrote:
> Dear Dan,
>
> thank you very much for this approach, it really sounds very reasonable.
>
> However, not being a probability pro, I don't understand the meaning
> for some terms:
>
> P(concentration | count, temperature) P(count, temperature) ?
>
> I'd be grateful if you could elaborate a little more, it sounds very promising!

Sure!

If A is a random variable, then P(A) (the "probability of A") is a
function assigning probabilities to all possible values of A. For a
given value "a", you'll often see people write P(A=a) to denote the
specific probability that A will take that value.

For example, let T be the random variable for the temperature of your
system. Then P(T) is a function assigning a probability to each
temperature; the probability that the temperature is 3K would be
written P(T=3K).

P(A,B) (the "joint probability of A and B") is a two-parameter
function that tells you the probability of seeing a particular pair of
values. Again letting T be temperature, let C be concentration; P(T=t,
C=c) tells you the probability that the temperature would be t and the
concentration would be c. Note that the order doesn't matter: P(A, B)
= P(B, A).

P(A | B) (the "conditional probability of A given B") is a
two-parameter function that gives you the probability that A would
take some value given that B had taken another. So P(T=t | C=c) tells
you the probability that the temperature would be t if the
concentration were c.

Note that P(A, B) and P(A | B) are both two-argument functions, but
satisfy different constraints - P(A,B) is a probability distribution,
so if you integrate over all possible values of a and b you should get
1, whereas P(A | B) defines a set of probability distributions, so
that for any given choice of b integrating P(A | B=b) over all
possible values of a will yield 1.
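
To make the two constraints concrete, here is a minimal sketch using a small discrete table, so the integrals become sums. The numbers in `joint` are made up for illustration, and the conditional table is obtained by dividing each column by its own total.

```python
import numpy as np

# Made-up joint distribution P(T, C): rows are two temperature values,
# columns are three concentration values.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.10, 0.20]])

# P(T, C) is a single distribution, so all entries together sum to 1.
joint_total = joint.sum()

# P(T | C) is one distribution per concentration value: dividing each
# column by its own total makes every column sum to 1 separately.
cond_t_given_c = joint / joint.sum(axis=0, keepdims=True)
column_totals = cond_t_given_c.sum(axis=0)

print(joint_total)
print(column_totals)
```

The whole conditional table sums to 3 here (one per column), which is why P(A | B) is not itself a joint distribution over a and b.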

These functions are related by two fundamental theorems:

The law of conditional probability: P(A=a, B=b) = P(A=a | B=b) P(B=b)

This says that the probability that you'd see a pair of
observations (a,b) is equal to the probability that you'd see A=a
given that B=b times the probability that B would equal b in the first
place.

The law of marginalization: P(A=a) = \int_b P(A=a, B=b) db

This says that the probability that you'd observe A=a is equal to the
integral over all possible values of b of the probability that you'd
see (A=a, B=b).
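
Both laws can be checked on a small discrete example, where the integral over b becomes a sum. This is only a sketch: the values in `p_b` and `p_a_given_b` are hypothetical and chosen so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical inputs: P(B) over two values of b, and P(A | B) as a table
# with one column per value of b (each column sums to 1).
p_b = np.array([0.25, 0.75])
p_a_given_b = np.array([[0.2, 0.6],
                        [0.3, 0.3],
                        [0.5, 0.1]])

# Law of conditional probability: P(A=a, B=b) = P(A=a | B=b) P(B=b).
# Broadcasting scales column j of the table by p_b[j].
joint = p_a_given_b * p_b

# Law of marginalization: P(A=a) = sum over b of P(A=a, B=b).
p_a = joint.sum(axis=1)

print(joint)
print(p_a)
```

The resulting `joint` sums to 1 over all pairs, and `p_a` sums to 1 over the values of a, as the definitions above require.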

Another oft-cited theorem, called Bayes' Law, follows from the law of
conditional probability and from the fact that P(A,B) = P(B,A):
P(A | B) = P(B | A) * P(A)/P(B)
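
As a small numeric illustration of Bayes' Law, assume A can take two values and we have observed one particular value b of B. The probabilities below are invented for the example.

```python
import numpy as np

# Hypothetical numbers: A has two possible values, and we observed B=b.
p_a = np.array([0.2, 0.8])          # P(A=a) for each a
p_b_given_a = np.array([0.9, 0.3])  # P(B=b | A=a) for each a

# P(B=b) follows from the two laws above:
# P(B=b) = sum over a of P(B=b | A=a) P(A=a).
p_b = np.sum(p_b_given_a * p_a)

# Bayes' Law: P(A=a | B=b) = P(B=b | A=a) P(A=a) / P(B=b)
p_a_given_b = p_b_given_a * p_a / p_b

print(p_a_given_b)
```

Although the second value of A is four times as likely beforehand, the observation favors the first value enough to bring the two much closer (3/7 versus 4/7).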

In practice, you can solve a lot of problems without ever writing them
in this form. For example, the algorithm I described of fitting a
surface to your calibration and then intersecting new measurements
with this surface makes intuitive sense without looking at the
underlying probabilities: you assume that the correct values lie on
some manifold, estimate the manifold from your calibration data, and
then use the manifold to look up concentration as a function of
temperature and count.
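
The fit-then-look-up idea might be sketched as follows. Everything specific here is an assumption made for illustration: the model form count = a * concentration * exp(b * temperature), the parameter values, the synthetic calibration data, and the helper name `concentration_from` are all hypothetical, not taken from your actual setup.

```python
import numpy as np

# Hypothetical calibration model (an assumption, not from the thread):
#   count = a * concentration * exp(b * temperature)
def model(conc, temp, a, b):
    return a * conc * np.exp(b * temp)

# Synthetic calibration data with about 1% multiplicative noise.
rng = np.random.default_rng(0)
conc = rng.uniform(1.0, 10.0, 200)
temp = rng.uniform(20.0, 40.0, 200)
count = model(conc, temp, 50.0, -0.02) * np.exp(0.01 * rng.standard_normal(200))

# Taking logs makes the model linear: log(count / conc) = log(a) + b * temp,
# so an ordinary least-squares line fit estimates b and log(a).
b_hat, log_a_hat = np.polyfit(temp, np.log(count / conc), 1)
a_hat = np.exp(log_a_hat)

def concentration_from(count, temp):
    """Invert the fitted surface for a new (count, temperature) pair."""
    return count / (a_hat * np.exp(b_hat * temp))

# A noiseless measurement generated at concentration 5 and temperature 30.
estimate = concentration_from(model(5.0, 30.0, 50.0, -0.02), 30.0)
print(a_hat, b_hat, estimate)
```

Fitting in log space is a convenience that assumes the noise is multiplicative; with a different model or noise structure, a nonlinear fitter would be the more natural choice.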

Still, writing the problem out in probabilistic form can be useful
because it forces you to explicitly spell out your assumptions, such
as the assumption that your data is corrupted by Gaussian noise (this
is an implicit assumption any time you use a least-squares fitting
technique such as linear regression).
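
The link between least squares and Gaussian noise can be checked numerically. In this sketch (synthetic data, with a noise level `sigma` assumed known), the Gaussian negative log-likelihood of a straight-line model differs from the sum of squared errors only by a scale factor and a constant, so the same slope and intercept minimize both.

```python
import numpy as np

# Synthetic data: a straight line plus Gaussian noise of known sigma.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
sigma = 0.5
y = 2.0 * x + 1.0 + sigma * rng.standard_normal(x.size)

def sse(slope, intercept):
    """Sum of squared errors, the quantity least squares minimizes."""
    return np.sum((y - (slope * x + intercept)) ** 2)

def gaussian_nll(slope, intercept):
    """Negative log-likelihood under independent Gaussian noise."""
    resid = y - (slope * x + intercept)
    return (0.5 * np.sum(resid ** 2) / sigma ** 2
            + x.size * np.log(sigma * np.sqrt(2.0 * np.pi)))

# Ordinary least-squares line fit.
slope_ls, intercept_ls = np.polyfit(x, y, 1)

# The two objectives differ only by this parameter-independent constant.
const = x.size * np.log(sigma * np.sqrt(2.0 * np.pi))

best_nll = gaussian_nll(slope_ls, intercept_ls)
print(slope_ls, intercept_ls, best_nll)
```

Moving the slope or intercept away from the least-squares values always increases `gaussian_nll`, which is the sense in which least squares quietly assumes Gaussian noise.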

Wikipedia has some reasonable descriptions:
http://en.wikipedia.org/wiki/Joint_probability
http://en.wikipedia.org/wiki/Conditional_probability
http://en.wikipedia.org/wiki/Marginal_probability