[IPython-dev] Thoughts on the notebook format for version control
Sat Nov 5 22:39:43 CDT 2011
On Saturday, November 05, 2011 07:58:47 pm Fernando Perez wrote:
> On Sat, Nov 5, 2011 at 7:41 PM, MinRK <email@example.com> wrote:
> > On Sat, Nov 5, 2011 at 18:58, Fernando Perez <firstname.lastname@example.org> wrote:
> > There is a *huge* disadvantage in portability to notebooks not being single
> > files. I think this still makes sense, though. I would treat the output as
> > a 'cache' (along the lines of .pyc / __cache__), rather than considering
> > the notebook itself as a multi-file format. And you should be able to embed
> > the outputs in a single file if you want, for easier portability. Doing it
> > this way would not require changing the notebook format, because current
> > (output-included) notebooks would still comply with the spec.
> I agree that it's a big inconvenience for everyday, non-VC use. I
> like the idea of making it optional, it can be a flag set in the
> metadata dict, that indicates whether to keep outputs in the cache or
> internally (and also to offer the single-file download option).
A complementary idea would be to provide utility scripts for converting
(unified notebook) <-> (inputs), (outputs)
outside of an IPython session.
These could be used by VC hooks (e.g., to strip or split outputs before
committing or to strip outputs before diff-ing).
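
To make the conversion concrete, here is a rough sketch of what such a script
might do. It is only an illustration under stated assumptions: it treats the
notebook as a JSON dict with a top-level "cells" list in which code cells carry
an "outputs" list, which may not match the actual notebook format, and the
function names (split_notebook, merge_notebook, dump_stable) are hypothetical,
not part of IPython.

```python
import copy
import json


def split_notebook(nb):
    """Split a unified notebook dict into (inputs, outputs).

    Assumption: code cells live in nb["cells"] and keep their results
    in an "outputs" list. Adjust the traversal for the real format.
    """
    inputs = copy.deepcopy(nb)
    outputs = {}
    for index, cell in enumerate(inputs.get("cells", [])):
        if cell.get("cell_type") == "code" and "outputs" in cell:
            if cell["outputs"]:
                # JSON object keys must be strings, so key by str(index).
                outputs[str(index)] = cell["outputs"]
            # Leave an empty list so the inputs half is still a valid notebook.
            cell["outputs"] = []
    return inputs, outputs


def merge_notebook(inputs, outputs):
    """Rebuild the unified notebook from the two halves."""
    nb = copy.deepcopy(inputs)
    for index, cell_outputs in outputs.items():
        nb["cells"][int(index)]["outputs"] = copy.deepcopy(cell_outputs)
    return nb


def dump_stable(obj):
    """Serialize with sorted keys and fixed indentation to keep diffs small."""
    return json.dumps(obj, indent=1, sort_keys=True) + "\n"
```

A pre-commit hook or diff driver could write dump_stable(inputs) to the tracked
file and keep dump_stable(outputs) in an untracked cache file, calling
merge_notebook only when the outputs need to be recovered. Outputs are keyed by
cell position here, so the two halves must come from the same revision.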
The main use case that I am thinking of is keeping complete notebooks in git
with hooks set up to ignore outputs for normal use, but with the ability to
recover "environment dependent" outputs (e.g., when auditing an old result
that is not reproducible on a new system). I am not sure how often this would
be practical (since many of the interesting use cases would involve large outputs)
or if it has any advantage over normal incremental backups of output files or
externally marking outputs with the relevant commit SHAs (which is what I tend to do now).