[IPython-User] database magics, saving notebooks in the database and smallest quantum of shareability etc

Brian Granger ellisonbg@gmail....
Mon Feb 25 00:31:20 CST 2013


On Sat, Feb 23, 2013 at 4:38 PM, Nitin Borwankar <nborwankar@gmail.com> wrote:
> I would like to connect with the IPython team at PyData 2013 in Santa Clara
> next month as I have interest in doing the following and would like to
> co-ordinate :-
> a) want to create a robust plugin framework for database magics (SQL and
> NoSQL)
> I have Postgres working (%%PGDB) right now and aim to do MySQL and Mongo
> (and anything else based on community feedback).
> Essentially (after some database config) you run
> "%PGDB <sql query>" in IPyNB.
> This executes the query on a PG instance (in your config) via stdin/stdout
> voodoo to the psql client (needs to be locally installed) and returns the
> result set in the output visually as a text-based table.
> I am hoping to put out the PGDB magic on github before PyData.
> Note that I make no attempt to parse SQL or understand the string - if you
> mess up you will hear from the database at the other end as if you were
> sitting at a console in a terminal.  I am just the middle man.
> Or rather %%PGDB is.
> Some work is needed to return the result set as a Py dict that is visible in
> the name space so it can be used by other code.
> There are some issues to deal with re: keeping a persistent connection so
> that multiple requests don't open more and more connections - this is now
> running at a basic (optimistic scenario) level but has not been tested for
> all kinds of bad scenarios.
> I have used standard "magics" metaphors and just dropped the code in the
> right place and did some config. The integration with the IPyNB system was
> not trivial but it is straightforward enough that it is not rocket science.
> I was very pleased it worked after some elementary brain twisting :-)
> Kudos to the IPy team for making extensibility straightforward - this is a
> b)  want to save JSON .ipynb in Mongo, Postgres key-value store and other
> JSON stores instead of the filesystem in current directory.
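
A minimal sketch of the approach described in (a): pipe the query text
to a locally installed psql over stdin and show whatever comes back,
without parsing the SQL. This is not the %%PGDB code from the post. The
helper names (build_psql_command, run_sql, make_pgdb_magic) and the
database name "mydb" are assumptions made for illustration, and the
sketch starts a new psql process per query rather than keeping a
persistent connection.

```python
import subprocess


def build_psql_command(dbname, host="localhost", user=None):
    """Build the psql argument list (hypothetical helper)."""
    cmd = ["psql", "--no-psqlrc", "-h", host, "-d", dbname]
    if user:
        cmd += ["-U", user]
    return cmd


def run_sql(query, command):
    """Send the query to the client on stdin and return its raw text output.

    No attempt is made to parse the SQL; error messages written to
    stderr are appended so the user sees what the database said.
    """
    result = subprocess.run(command, input=query, capture_output=True, text=True)
    return result.stdout + result.stderr


def make_pgdb_magic(command):
    """Return a function usable as both a line magic and a cell magic."""
    def pgdb(line, cell=None):
        # As a line magic the query is on the line; as a cell magic it is the body.
        query = line if cell is None else cell
        print(run_sql(query, command))
    return pgdb


def load_ipython_extension(ipython):
    """Standard IPython extension hook; "mydb" is a placeholder name."""
    ipython.register_magic_function(
        make_pgdb_magic(build_psql_command("mydb")),
        magic_kind="line_cell",
        magic_name="PGDB",
    )
```

Saved as a module on the Python path and loaded with %load_ext, this
would make "%PGDB select 1;" available, assuming psql can reach the
database without prompting for a password.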

Our notebook manager class is designed to make it possible to add
other storage backends.  Right now we have a file system based one and
another based on Azure blob storage.  It shouldn't be too difficult to
create a MongoDB-based one.  However, I think MongoDB is a horrible
choice for storing entire notebooks.  Notebooks can be really big and
MongoDB is not very space efficient.
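
To illustrate what a database-backed store could look like, here is a
rough sketch that keeps notebook JSON in SQLite (chosen only because it
ships with Python). The class and its method names are hypothetical;
this is not the interface of IPython's notebook manager, which a real
backend would have to implement instead.

```python
import json
import sqlite3


class SQLiteNotebookStore:
    """Hypothetical store that keeps one JSON document per notebook name."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS notebooks ("
            "name TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def save(self, name, notebook):
        """Insert or overwrite the notebook stored under this name."""
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO notebooks (name, body) VALUES (?, ?)",
                (name, json.dumps(notebook)),
            )

    def load(self, name):
        """Return the stored notebook as a dict; raise KeyError if missing."""
        row = self.db.execute(
            "SELECT body FROM notebooks WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return json.loads(row[0])

    def list_names(self):
        """Return all stored notebook names in sorted order."""
        rows = self.db.execute("SELECT name FROM notebooks ORDER BY name")
        return [r[0] for r in rows]
```

The whole notebook is stored as a single text value here, so the size
concern raised above applies to this sketch as well.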

> c) want to be able to compose (note *compose* not edit) Notebooks via a web
> UI where a user can assemble content chunks (JSON in Mongo) or .... and then
> "publish" to .ipynb compliant JSON.
> This will allow massive reuse of working content chunks especially those
> that involve code examples and diagrams needing reproducibility.
> My motivation for doing this is
> 1) IMHO, the filesystem based storage is subject to OS level security issues
> and the database storage *may* (huge big MAY) provide some mitigation.
> Caveat being don't naively assume that database security is the whole
> answer.

The big security issue with the notebook is not the file system but
the fact that users can run arbitrary code.  Throwing notebooks in a
db won't help that at all.  In fact, because users can run arbitrary
code, you risk them being able to hack your MongoDB.

> 2) while the quantum of shareability right now is the single notebook, which
> is awesome in itself, this can be taken even further. So if one wishes, a
> notebook can be published as a sequence of quasi-atomic chunks which can
> then be separately mixed and mashed.  The quasi-atomic means that we define
> one further level of granularity inside a notebook - the boundaries being
> orthogonal to actual content boundaries. i.e. we should not say e.g. that
> "use a horizontal rule" as a chunk marker - this is brittle etc.

We have long term plans to think about allowing notebooks to be saved
and loaded on a cell-by-cell basis, but this is pretty far off still.
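
For the "compose from chunks, then publish" idea in (c), a small sketch
of assembling separately stored chunks into one notebook-shaped JSON
document. It assumes the nbformat 4 layout (a top-level "cells" list),
which postdates this thread, and a made-up chunk shape with "kind" and
"source" keys; neither is taken from the original post.

```python
import json


def make_cell(chunk):
    """Turn one stored chunk (hypothetical shape) into a notebook cell dict."""
    cell = {"cell_type": chunk["kind"], "metadata": {}, "source": chunk["source"]}
    if chunk["kind"] == "code":
        # Code cells carry execution state; start them unexecuted and empty.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def compose_notebook(chunks):
    """Assemble chunks, in the given order, into a notebook-shaped dict."""
    return {
        "cells": [make_cell(c) for c in chunks],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


chunks = [
    {"kind": "markdown", "source": "# Query results"},
    {"kind": "code", "source": "print('hello')"},
]
published = json.dumps(compose_notebook(chunks), indent=1)
```

The same chunk can appear in any number of composed notebooks, since
composition only copies its content into a new cell.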

At this point, I think you should try to implement everything you want
on your own outside of IPython proper.  In the long term, you may find
that IPython moves in some of these directions, but we are focused on
other things right now:




> 3) Content management becomes easy *in some respects* when the granularity
> is smaller.
> Thanks for reading this far.
> ------------------------------------------------------------------
> Nitin Borwankar
> nborwankar@gmail.com
> _______________________________________________
> IPython-User mailing list
> IPython-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-user

Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu and ellisonbg@gmail.com
