# [SciPy-dev] Sphinx-in-LaTeX do's and don'ts

josef.pktd@gmai... josef.pktd@gmai...
Thu Aug 13 22:15:42 CDT 2009

On Thu, Aug 13, 2009 at 6:32 PM, David Warde-Farley<dwf@cs.toronto.edu> wrote:
> I looked at the scipy.maxentropy docstring on the doc site and
> realized that that equation should really be typeset. The trouble is
>
> As a compromise, I added the "plaintext" version of the equation in a
> reST comment above the LaTeX:
>
>        http://docs.scipy.org/scipy/docs/scipy.maxentropy/
>
> do people think this is a reasonable compromise?
>
> David
> _______________________________________________
> Scipy-dev mailing list
> Scipy-dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>

Do you really need all the fancy LaTeX in a docstring? It seems all
you need is a dot/inner product, and the overarrow for vectors is not
really necessary.
Your typesetting looks good, but for a docstring it seems like overkill.
In econometrics/statistics, for example, X^T X is perfectly well
understood as a matrix/dot product if you know that X is a matrix or
2-d array.

Actually, with the typesetting of the equation (transpose and arrow), I
start to worry that I misunderstand the dimensions in the definition.
If theta is a 1-d array and f is a vector of functions with the same
number of elements, then \theta \cdot f should be clear enough to
indicate the inner product.
In analogy, this looks very much like a multinomial logit, with the
vector of functions f replacing the role of the explanatory variables X:

p(y=j|X) = exp(X_j . beta) / sum_i exp(X_i . beta),   where X_j and
beta are both 1-d vectors of the same length.
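To make the analogy concrete, here is a minimal NumPy sketch of that formula, where each row X[j] plays the role of a feature/function vector (the names X, beta, and multinomial_logit_probs are illustrative, not part of scipy.maxentropy):

```python
import numpy as np

def multinomial_logit_probs(X, beta):
    """p(y=j|X) = exp(X_j . beta) / sum_i exp(X_i . beta)."""
    scores = np.dot(X, beta)               # inner products X_j . beta for each j
    exps = np.exp(scores - scores.max())   # subtract max for numerical stability
    return exps / exps.sum()

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
beta = np.array([0.5, -0.25])
p = multinomial_logit_probs(X, beta)       # probabilities over the 3 outcomes
```

Subtracting the maximum score before exponentiating leaves the probabilities unchanged but avoids overflow for large inner products.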

Similarly, in the kernel literature in machine learning I have mostly
seen simple notation for inner products to represent a linear
combination of feature or kernel functions.

My recommendation would be to keep the LaTeX in the docstrings simple
and readable. LaTeX in rst files for more extended explanations is a
different story.

as a BTW:

When I tried to figure out what the maxentropy subpackage is doing, I
found it quite impossible from a quick reading of the docs, tests and
examples. The examples are good but, as far as I remember, don't
explain why it works.

Recently, I stumbled upon two introductory references, where I
understood for the first time a little bit of the theory behind
maxentropy, especially the connection to maximum likelihood
estimation.

http://www.stat.washington.edu/courses/stat592/winter04/public_html/handouts.html

and
http://www.cs.cmu.edu/afs/cs/user/aberger/www/maxent.html
especially
http://www.cs.cmu.edu/afs/cs/user/aberger/www/ps/compling.ps

Sorry for the longer than intended mail, but I'm always happy when a
(statistics/machine learning related) package, that I thought might be
orphaned, gets some attention.

Josef