[IPython-User] embed data in notebooks?
Tue Jun 5 17:22:47 CDT 2012
> We did seriously consider the idea of archive file-formats while planning
> the notebook format, but we decided (largely from the perspective of VCS,
> etc) that JSON makes much more sense,
It's not an either/or choice. In effect, I'm saying, when encountering a
zip file in a path, IPython notebooks could behave as if the path element
had been mounted with fuse-zip. That way, people could choose whether to
use zip files or directories, and whether to use one or multiple notebooks
with separate or shared data. I expect I would use both kinds of
Also, version control systems are learning to deal with zip files anyway
because other document formats use zip files to solve the same problem
(text-based documents with embedded binary objects). Both Mercurial and
git support this now.
> What is unfortunate at this point is that we really haven't developed our
> project-level UI/APIs yet, they only exist in the planning stages. I think
> once you have project as dir/repo, then the benefits of data in the
> notebook file itself vanish, as the project becomes the unit of
Sure, when working on research projects, notebooks can live in project
repositories with no problem, and you usually want them to share data. I
use notebooks a lot this way.
When teaching, however, notebooks often need to be moved between
"projects", emailed, etc. There are lots of little files, most of them used
only in a single notebook. Making sure that every notebook has all the
files it needs becomes a headache. I often end up with broken notebooks
because the data that I used to run them has gotten lost, and since
notebooks don't "fail" and can't be tested, such breakage is even harder to find.
The two situations are just different use cases with different
requirements. There is no single "right" solution, and supporting one use
case doesn't break the other. (By analogy, we don't debate whether we
should have either global variables or local variables; we have both,
because each kind of scope has its own uses.)
Anyway, there are really two different issues. Adding nb_open and nb_data
really would work with JSON or zip transparently and not break anything.
Separately, long term, I encourage you to reconsider the decision about
zip files, and how to support them. I think the suggestion I made above
(treating any zip file along a path as if it had been mounted with
fuse-zip) would allow you to continue development along your preferred path
(treating everything as directory trees) while still giving people the
option of working with self-contained files when they need to.
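
To make the suggestion concrete, here is a minimal sketch of what "treat a
zip file along a path as a directory" could look like for read-only access.
The function name open_through_zip is hypothetical (it is not an IPython
API), and the sketch assumes UTF-8 text and forward-slash member names; it
only illustrates the lookup, not how the notebook server would integrate it.

```python
import io
import os
import zipfile


def open_through_zip(path, mode="r"):
    """Open `path` for reading, treating a zip file along it as a directory.

    Hypothetical helper: read-only, assumes UTF-8 for text mode.
    """
    parts = path.replace("\\", "/").split("/")
    # Check each proper prefix of the path to see whether it is a zip file.
    for i in range(1, len(parts)):
        prefix = "/".join(parts[:i])
        if os.path.isfile(prefix) and zipfile.is_zipfile(prefix):
            member = "/".join(parts[i:])
            with zipfile.ZipFile(prefix) as archive:
                data = archive.read(member)
            if "b" in mode:
                return io.BytesIO(data)
            return io.StringIO(data.decode("utf-8"))
    # No zip file along the path: fall back to the ordinary file system.
    return open(path, mode)
```

With something like this, open_through_zip("lesson.zip/data/values.csv") and
open_through_zip("lesson/data/values.csv") would read the same content,
whether "lesson" is a zip file or a plain directory.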
> I don't think storing binary data in the notebook file itself is worth a
> new kernel-side API, more than existing systems for b64 data in Python
> scripts, which will work just as well in the notebook as anywhere else (not
> that they are great, of course).
That's not practical: it takes a lot of effort when there are dozens of
notebooks and files involved, it's impossible to maintain, and the
resulting notebooks look horrible. The only way to get reasonably
self-contained and transportable notebooks right now is to put all the
shared data on a web server and access it with urlopen, but that has other