[IPython-dev] ipython sphinx directive

John Hunter jdh2358@gmail....
Sat Nov 7 09:21:27 CST 2009

On Sat, Nov 7, 2009 at 3:51 AM, Gael Varoquaux
<gael.varoquaux@normalesup.org> wrote:

> Very, very useful. I can't give much feedback, because I have no
> available mental bandwidth

That's OK, I don't either <wink>

I did some thinking overnight and have codified my ideas in a proposal
(appropriately in sphinx) included below -- also in html at
IPython Directive Proposal

I've been debating the merits of two syntaxes for the embedded ipython
interpreter.  The first alternative, call it "plain python", is to feed
literal python code to ipython, so your rest code looks like::

  .. ipython::

     x = 2
     x**3

and your rendered code looks like this::

  In [1]: x = 2

  In [2]: x**3
  Out[2]: 8
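The "ease of implementation" point can be made concrete. Below is a minimal sketch of the plain python mode. It assumes each non-blank source line is a complete statement and uses Python's own `compile`/`eval`/`exec` as a stand-in for the embedded ipython interpreter. The name `render_plain` is hypothetical and not part of any existing directive.

```python
def render_plain(source, start=1):
    """Render plain python source as an ipython-style transcript.

    Sketch only: each non-blank line is assumed to be a complete statement,
    and Python's compile/eval/exec stand in for the embedded interpreter.
    """
    namespace = {}
    lines = []
    number = start
    for stmt in source.splitlines():
        if not stmt.strip():
            continue
        lines.append(f"In [{number}]: {stmt}")
        try:
            code = compile(stmt, "<ipython>", "eval")
        except SyntaxError:
            # Not an expression (e.g. an assignment): run it, show no Out prompt.
            exec(compile(stmt, "<ipython>", "exec"), namespace)
        else:
            result = eval(code, namespace)
            if result is not None:
                lines.append(f"Out[{number}]: {result!r}")
        lines.append("")  # ipython separates prompts with a blank line
        number += 1
    return "\n".join(lines).rstrip("\n")
```

Calling `render_plain("x = 2\nx**3")` yields the two prompts of the rendered example, including `Out[2]: 8`.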

The advantages of this are ease of implementation and a true "what you
see is what you get".  However, there are a number of disadvantages.

In the alternative syntax, call it "ipython prompt", the rst document
includes input and output prompts, and the embedded ipython
interpreter detects input prompts matching ``In [\d+]:`` and executes
the code.  I'm leaning towards this syntax, fraught as it is with
complexities, for reasons outlined below.  The rest doc would look
like this::

  .. ipython::

     In [1]: x = 'hello world'

     In [2]: x.upper()
     Out[2]: 'HELLO WORLD'

     In [3]: x.st<TAB>
     x.startswith  x.strip
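Detecting the prompts is mostly pattern matching. The sketch below assumes prompts look exactly like the ones in these examples: `In [N]:`, a `...:` continuation, and `Out[N]:`. It splits a pasted session into (prompt number, input, output) records. Any non-prompt text after an input, such as the completion list after prompt 3, is treated as output. Pseudo-decorator lines are not handled here. The name `parse_ipython_block` is hypothetical.

```python
import re

# Assumed prompt shapes, modeled on the examples above.
IN_RE = re.compile(r"^In \[(\d+)\]:\s?(.*)$")    # "In [7]: code"
OUT_RE = re.compile(r"^Out\[(\d+)\]:\s?(.*)$")   # "Out[7]: value"
CONT_RE = re.compile(r"^\s*\.{3,}:\s?(.*)$")     # "   ...: more code"


def parse_ipython_block(text):
    """Split a pasted ipython session into (number, input, output) tuples."""
    records = []
    current = None
    in_output = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(".. _"):
            # Blank separators and rst labels are not part of the session.
            continue
        m = IN_RE.match(line)
        if m:
            current = [int(m.group(1)), [m.group(2)], []]
            records.append(current)
            in_output = False
            continue
        if current is None:
            continue  # text before the first prompt is ignored
        m = OUT_RE.match(line)
        if m:
            current[2].append(m.group(2))
            in_output = True
            continue
        m = CONT_RE.match(raw)  # match raw so indentation after "...:" survives
        if m and not in_output:
            current[1].append(m.group(1))
            continue
        # Anything else (printed text, completion lists) counts as output.
        current[2].append(line)
    return [(num, "\n".join(inp), "\n".join(out)) for num, inp, out in records]
```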

The advantages of this approach:

* there are some things, illustrated by prompt 3, that we will not be
  able to do with ipython embedded in sphinx, eg illustrating tab
  completion, which requires interactive keystrokes and readline.  I
  propose an ``@verbatim`` pseudo-decorator to tell ipython to simply
  put the input and output into the rest document verbatim (modulo
  prompt numbering, see below).

* one of the strengths of rest is that it is human readable in the
  plain text form.  If we input plain python w/o the prompts::

    .. ipython::

       import numpy as np
       x = np.arange(10)

  but then refer to the output in the rest narrative, like 'the numpy
  array method ``std`` computes the standard deviation of the array as
  2.87', it is hard for the reader and writer of the plain text
  document to follow along.  Also, the doc writer, who is likely
  coding the examples in ipython as he works, would much rather paste
  in the session verbatim: it was a fair amount of work to strip off
  the input and output prompts, and the output itself, for the example
  above.  Much more natural is to just paste::

    .. ipython::

       In [5]: import numpy as np

       In [6]: x = np.arange(10)

       In [7]: x.mean()
       Out[7]: 4.5

       In [8]: x.std()
       Out[8]: 2.8722813232690143

But there are subtleties and complexities with this approach as well.
These include:

* how do we handle numbering?  I'm pretty sure we want auto-numbering.
  From real-world experience writing a chapter that uses ipython
  session dumps heavily, I find that you frequently want to change a
  thing or two, and the prompt numbering gets out of whack.  Or you
  come back later with a fresh ipython session and insert something
  into the middle of the chapter, and your ipython prompt numbers are
  non-monotonic.  So I propose auto-numbering, even for ``@verbatim``
  inputs: the embedded interpreter will use your input and output, but
  its own internal prompt counter.  I am punting on the use of things
  like ``_10`` to refer to the 10th output, since that is too hard.
  However, we often want to refer to the input and output prompt
  numbers in our narrative text, like "the ``std`` on input line
  ``In [8]`` does such-and-such", which we cannot easily do with
  auto-numbering.  One solution is to support a new role, something like::

    .. ipython::

       In [7]: x.mean()
       Out[7]: 4.5

       .. _std_x:
       In [8]: x.std()
       Out[8]: 2.8722813232690143

  which we can refer to in our text like::

     the ``std`` call on input line In [:iref:`std_x`] does such and such.

  A little perl-esque and ugly, but it may get the job done.

* What should be rendered for output: the session output or the
  ipython interpreter output?  Normally these should be identical, and
  in fact we can and should support an ``@doctest`` decorator or something
  like that, but there are a number of cases when they will not be
  identical.  One obvious one is when we are dealing with random
  numbers::
    .. ipython::

       In [11]: x = np.random.rand(10)

       In [12]: x.max()
       Out[12]: 0.86805905876197698

  There is an obvious solution here which is to seed the generator
  deterministically, but there are other cases like grabbing a file
  from a web server that may be updated daily (eg stock prices from
  Yahoo Finance) which will also lead to different data at doc build
  time, so we should at least deal with the issue of how to handle
  non-deterministic data.

  I think the only workable solution is to use *ipython's* output and
  not the user output, because when we start making figures, the user
  text output and the figure data will not necessarily agree for
  non-deterministic output.  So essentially, I think it is incumbent
  on the doc writer to ensure deterministic output.

  We can support a ``@suppress`` decorator and/or option so the doc writer
  can set up the ipython session to ensure deterministic output with
  code they may not want reflected in the rendered doc.  Eg, to
  suppress an entire block during a setup phase, we could employ a
  ``:suppress:`` option::

    .. ipython::
       :suppress:

       In [6]: from pylab import *

       In [7]: ion()

       In [8]: import numpy.random

       In [9]: numpy.random.seed(2358)

  Or if grabbing data from a URL which may change, but which we want
  to ensure is deterministic (at least to the best of our abilities),
  we could filter the date range to be deterministic but hide some of
  the work using ``@suppress`` for single lines or ``:suppress:`` for
  entire blocks::

    .. ipython::
       :suppress:

       In [17]: url = 'http://ichart.finance.yahoo.com/table.csv?s=CROX\
          ....: &d=9&e=22&f=2009&g=d&a=1&b=8&c=2006&ignore=.csv'

       In [19]: import urllib

       In [21]: fname, msg = urllib.urlretrieve(url)

       In [24]: import matplotlib.mlab as mlab

       In [26]: r = mlab.csv2rec(fname)

    .. ipython::

       # make sure the date range is deterministic
       @suppress
       In [28]: import datetime

       In [29]: startdate = datetime.date(2006,1,1)

       In [30]: enddate = datetime.date(2009,11,6)

       In [31]: mask = (r.date>=startdate) & (r.date<=enddate)

       In [32]: r = r[mask]
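The auto-numbering proposed in the first point is simple bookkeeping. This is a minimal sketch under two assumptions: the directive keeps one running counter per document and passes it from block to block. `renumber` is a hypothetical helper, not an existing API. Each `In [N]:` takes the next counter value, and each `Out[N]:` reuses the number of the input just above it, whatever numbers were pasted in.

```python
import re

# Matches "In [N]:" or "Out[N]:" at the start of a (possibly indented) line.
PROMPT_RE = re.compile(r"^(\s*)(In \[|Out\[)(\d+)(\]:)")


def renumber(block, counter):
    """Rewrite the prompt numbers in ``block`` using a running counter.

    ``counter`` is the number the next input prompt should receive; the
    updated counter is returned so the next block continues the sequence.
    """
    out_lines = []
    for line in block.splitlines():
        m = PROMPT_RE.match(line)
        if m is None:
            out_lines.append(line)  # code, output, blank lines pass through
            continue
        indent, kind, _pasted_number, close = m.groups()
        if kind == "In [":
            number = counter
            counter += 1
        else:
            number = counter - 1  # an Out prompt belongs to the input just numbered
        out_lines.append(f"{indent}{kind}{number}{close}{line[m.end():]}")
    return "\n".join(out_lines), counter
```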

What to do about figures?

In matplotlib's ``plot_directive``, we automatically insert all the
figures that are generated in the plot code into the rest document.
This is very convenient, and may be the approach we want to take in
the ``ipython_directive``, but we may want to consider alternatives.
The ``plot_directive`` works because it is aware of the matplotlib
figure manager, which is a pylab helper hidden from normal users.  We
might want to avoid all such magic in ipython mode, and have the user
do everything.  Eg::

    .. ipython::

       In [4]: import matplotlib
       In [5]: matplotlib.use('Agg')
       In [6]: from pylab import *
       In [7]: ion()

and then when they make a plot::

    .. ipython::

       In [45]: plot([1,2,3])
       Out[45]: [<matplotlib.lines.Line2D object at 0xa00e6cc>]

       In [46]: savefig('_static/myfig.png')

and then we *explicitly* refer to it with an image directive::

    .. image:: _static/myfig.png
       :width: 4in

This has the advantage of being explicit (explicit is better than implicit), but it is
tricky to find the ``_static`` dir, particularly if someone has
navigated in ipython away from the base directory.  We might circumvent
this by supporting a special ``$STATIC`` argument, which the embedded
ipython interpreter would replace with the actual ``_static``
directory.  Thus we could do, from any dir::

    In [46]: savefig('$STATIC/myfig.png')
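The substitution itself could be a plain string replacement, applied to each input before it reaches the interpreter. In this sketch, `expand_static` is a hypothetical helper. `static_dir` is assumed to be supplied by the directive, for example derived from the Sphinx source directory. A plain `str.replace` is used rather than `string.Template`, whose `$$` escape would otherwise rewrite any code that happens to contain `$$`.

```python
import os


def expand_static(code, static_dir):
    """Replace the proposed $STATIC placeholder with the absolute static path.

    Only the literal text "$STATIC" is touched; every other character of the
    input, including other dollar signs, is passed through unchanged.
    """
    return code.replace("$STATIC", os.path.abspath(static_dir))
```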

However, there is still a potential gotcha.  I find in my rendered
PDFs, when I have interleaved a bunch of ``.. sourcecode:: ipython``
and ``.. plot::`` commands, that the figures can easily get separated
from the ipython sessions they are associated with.  This is a
familiar problem to LaTeX users: LaTeX floats figures to semi-random
spots.  So it might be helpful to let the ipython directive generate
both the code and the figure insertion, in case there are some LaTeX
bindings we can use to keep the two together, via minipages or even
simple code/figure numbering.  With this in mind, I propose some
special magic, syntax to be determined::

    @insert_fig width=4in
    In [46]: savefig('myfig.png')

which will both save the fig to the static directory and insert it
into the document.
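Turning the decorator line into rst is then mostly string assembly. This sketch assumes the hypothetical `@insert_fig key=value ...` syntax shown above, with each pair passed straight through as an option of the standard `image` directive. `insert_fig_directive` is a made-up name, and the file name is assumed to come from the `savefig` call.

```python
import shlex


def insert_fig_directive(decorator_line, filename):
    """Turn a proposed '@insert_fig key=value ...' line into an rst image directive."""
    tokens = shlex.split(decorator_line)  # allows quoted values with spaces
    if not tokens or tokens[0] != "@insert_fig":
        raise ValueError(f"not an @insert_fig line: {decorator_line!r}")
    lines = [f".. image:: {filename}"]
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        lines.append(f"   :{key}: {value}")  # e.g. "   :width: 4in"
    return "\n".join(lines)
```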

Proposed pseudo-decorators

``@suppress``
    execute the ipython input block, but suppress the input and output
    block from the rendered output.

``@verbatim``
    insert the input and output block verbatim, but auto-increment
    the prompt numbers.  Internally, the interpreter will be fed an
    empty string, so it is a no-op that keeps the prompt numbering in
    sync.

``@insert_fig``
    save the figure to the static directory and insert it into the
    document, possibly binding it into a minipage and/or putting
    code/figure label/references to associate the code and the figure.
    Takes args to pass to the image directive (scale, width, etc can
    be kwargs)

``@doctest``
    Compare the pasted in output in the ipython block with the output
    generated at doc build time, and raise errors if they don't match.
    We may also want a ``:doctest:`` option to the ipython directive
    to test the entire block.
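The pseudo-decorators could be dispatched per prompt roughly as in the sketch below. It uses Python's `compile`/`eval`/`exec` as a toy stand-in for the embedded ipython interpreter, and `run_python` and `process_prompt` are hypothetical names. It follows the earlier conclusion that the interpreter's output, not the pasted output, is what gets rendered. For `@verbatim` it simply skips execution; feeding the interpreter an empty string to advance its counter, as proposed, is left to the numbering step.

```python
def run_python(source, namespace):
    """Toy stand-in for the embedded ipython interpreter.

    Returns the repr of an expression's value (like an Out[N] prompt),
    or '' for statements and for expressions that evaluate to None.
    """
    try:
        code = compile(source, "<ipython-directive>", "eval")
    except SyntaxError:
        exec(compile(source, "<ipython-directive>", "exec"), namespace)
        return ""
    result = eval(code, namespace)
    return "" if result is None else repr(result)


def process_prompt(decorator, source, pasted_output, namespace):
    """Apply one proposed pseudo-decorator to a single prompt.

    Returns the (input, output) pair to render, or None when nothing
    should appear in the document.
    """
    if decorator == "@verbatim":
        # Not executed at all: the pasted input and output are copied as-is.
        return source, pasted_output
    output = run_python(source, namespace)
    if decorator == "@suppress":
        # Executed for its side effects (e.g. seeding a generator), but hidden.
        return None
    if decorator == "@doctest" and output != pasted_output.strip():
        raise AssertionError(
            f"doctest failed for {source!r}: pasted {pasted_output!r}, got {output!r}"
        )
    # Render the interpreter's output, not the pasted output.
    return source, output
```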

Special roles

``:iref:``
    ipython input blocks can be tagged with the standard label syntax,
    eg ``.. _std_x:``, and later referred to with :iref:`std_x`, which
    will insert the *number* of the referenced prompt into the
    narrative text.
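Resolving the role needs only a map from each label to its auto-assigned prompt number, built while the blocks are numbered. The sketch below uses hypothetical helpers `collect_labels` and `resolve_iref`. It assumes a label applies to the next `In [N]:` prompt in the same block. It applies a regular expression to the narrative text purely for illustration; a real Sphinx extension would register `iref` as a docutils role instead.

```python
import re

LABEL_RE = re.compile(r"^\s*\.\. _([\w-]+):\s*$")  # ".. _std_x:"
IN_RE = re.compile(r"^\s*In \[\d+\]:")             # any input prompt


def collect_labels(block, counter, labels):
    """Record label -> auto-assigned prompt number for one ipython block.

    ``counter`` is the number the next input prompt will receive; the updated
    counter is returned so numbering continues across blocks.
    """
    pending = []
    for line in block.splitlines():
        m = LABEL_RE.match(line)
        if m:
            pending.append(m.group(1))  # attach to the next input prompt
        elif IN_RE.match(line):
            for name in pending:
                labels[name] = counter
            pending = []
            counter += 1
    return counter


def resolve_iref(text, labels):
    """Replace :iref:`name` in narrative text with the referenced prompt number."""
    return re.sub(r":iref:`([\w-]+)`", lambda m: str(labels[m.group(1)]), text)
```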

Proposed options

``:doctest:``
    Doctest the entire block

``:suppress:``
    Execute the entire block in the ipython session but insert nothing
    into the rendered document
