[IPython-User] Advice on running a server with multiple users

Benjie Chen benjie@ginkgobioworks....
Sat Aug 25 22:46:31 CDT 2012


Matthias,

Thanks for the comments. This is really helpful: it gives me an idea
of some short-term changes I can make that may be useful to everyone
(such as a message alerting a user that someone else already has a
kernel running), and it shows the overall direction.

One thing that I'd like to raise: I think concurrent editing and
distributed backend are two separate things. They may share a lot of
things in common, like requiring cell-ids, etc, but I think they can
be considered separately.

Let's say the end goal is users collaborating on a set of
notebooks (i.e. other goals, like archiving revisions, can be
handled differently). Then the two approaches discussed so far map
onto a) strong consistency with mutexes, and b) eventual consistency
with git.

With strong consistency, we are talking about users being able
to actively view/edit the same notebook, in the same kernel. A mutex is
used to ensure consistency. With this approach, you don't actually
need a git backend to achieve the goal of collaborating on
notebooks. But I think mutexes/locking are dangerous in a distributed,
unreliable world. E.g. Google Wave (and I assume Google Docs) moved
away from this approach in favor of eventual consistency (in their
case, using operational transformation rather than the git model, but I
think operational transformation is hard here since you have only one
kernel state). I think there is also a danger here with the state of the
kernel, e.g. cells run out of order, or in a different order by
different users, will confuse users. In general, I feel
that while doable, there is a lot of complexity here.
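
To make the locking concern concrete, here is a minimal sketch of a
per-cell mutex table with a lease timeout. Every name in it (`CellLocks`,
`acquire`, `release`, the `lease` and `clock` parameters) is hypothetical
and not part of IPython; it only illustrates that a lock held by a user
who has disconnected needs some expiry policy.

```python
import time


class CellLocks:
    """Hypothetical in-memory table of per-cell edit locks (not an IPython API)."""

    def __init__(self, lease=30.0, clock=time.monotonic):
        self.lease = lease      # seconds a lock stays valid without renewal
        self.clock = clock      # injectable clock, so expiry can be tested
        self._locks = {}        # cell_id -> (user, expiry time)

    def acquire(self, cell_id, user):
        """Return True if `user` now holds the lock on `cell_id`."""
        now = self.clock()
        holder = self._locks.get(cell_id)
        if holder is not None:
            owner, expires = holder
            # Another user's lock blocks us only while its lease is still valid.
            if owner != user and expires > now:
                return False
        # Free, expired, or already ours: (re)take the lock and renew the lease.
        self._locks[cell_id] = (user, now + self.lease)
        return True

    def release(self, cell_id, user):
        """Drop the lock if `user` holds it; return whether anything was released."""
        holder = self._locks.get(cell_id)
        if holder is not None and holder[0] == user:
            del self._locks[cell_id]
            return True
        return False
```

Even in this toy version, a second user can take over a cell once the
first user's lease runs out, and any edits the first user had not yet
sent would then have to be reconciled some other way.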

With eventual consistency, and if git is the choice here, we
don't need active collaboration. Users can edit away on their own
copies of the notebook and depend on git to achieve consistency
eventually (or they can choose to stay branched). With good
communication (i.e. notifying users of commits in real time), you can
achieve fairly "on-line" collaboration with a lot less complexity
(since you are pushing much of the complexity to git). Essentially you
can just build a UI for git that's incorporated into the notebook,
much like Emacs has a UI for git and other version control tools.

I think it's possible to choose to work on these separately.
Concurrent editing with strong consistency can be useful by itself,
even w/o a git backend. Similarly, a git backend can be useful w/o
concurrent editing. And you obviously can have both.

Right now, I am thinking the git backend is a bigger win. If a git
backend is available, with a UI built to handle commits and
notifications of other users' commits (poll the remote repo?), then
you can support distributed users, and you can support multiple users
on the same server by having the server provision a separate local repo
for each user. Additionally, the git backend gives you archival, etc.
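
As a rough sketch of the "poll the remote repo" idea, the tip of a
remote branch can be compared with the last commit hash a user has
seen. The function names, the `run` parameter and the default branch
name are assumptions made for illustration; the only real command
relied on is `git ls-remote`, which prints one `<sha>` and ref name
per matching ref.

```python
import subprocess


def remote_head(repo_url, branch="master", run=subprocess.check_output):
    """Return the commit hash at the tip of `branch` on the remote, or None.

    `run` is an injectable command runner (hypothetical parameter) so the
    function can be exercised without network access.
    """
    out = run(["git", "ls-remote", repo_url, "refs/heads/" + branch])
    text = out.decode("utf-8").strip()
    # Each output line is "<sha>\t<refname>"; empty output means no such branch.
    return text.split()[0] if text else None


def has_new_commits(repo_url, last_seen, branch="master",
                    run=subprocess.check_output):
    """True if the remote branch tip differs from the hash seen last time."""
    head = remote_head(repo_url, branch, run)
    return head is not None and head != last_seen
```

A server could call `has_new_commits` on a timer for each open notebook
repo and push a notification to the browser when it returns True; how
often to poll and how to notify are left open here.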

But the git backend approach still requires multi-user login, some way
to make reconciliation easier (it's not immediately clear how cell-ids
help here, but I haven't looked at your stuff), and UI work to tie git
operations to the frontend.
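
One way cell ids might help reconciliation, sketched under strong
assumptions: if every cell carries a stable id, a three-way merge can
compare each cell against the common ancestor instead of diffing raw
JSON. The notebook representation used here (a dict mapping a
hypothetical cell id to its source text) and the name `merge_cells` are
illustrative only. This is neither the real notebook format nor the
approach in the `_cell_id` branch, and it ignores cell ordering, splits
and copy/paste.

```python
def merge_cells(base, ours, theirs):
    """Three-way merge of {cell_id: source} dicts (hypothetical sketch).

    Returns (merged, conflicts), where conflicts lists the cell ids that
    were changed differently on both sides.
    """
    merged, conflicts = {}, []
    # Visit every cell id seen in any version, in a stable order.
    for cell_id in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(cell_id)      # None means the cell is absent in that version
        o = ours.get(cell_id)
        t = theirs.get(cell_id)
        if o == t:
            result = o             # both sides agree (including both deleting)
        elif o == b:
            result = t             # only theirs changed
        elif t == b:
            result = o             # only ours changed
        else:
            conflicts.append(cell_id)
            result = o             # keep ours for now; the caller must resolve
        if result is not None:
            merged[cell_id] = result
    return merged, conflicts
```

With ids, an edit to one cell and a deletion of another merge cleanly,
and only cells that both users changed need human attention.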

Thanks,

Benjie



On Sat, Aug 25, 2012 at 6:02 AM, Matthias BUSSONNIER
<bussonniermatthias@gmail.com> wrote:
>
> On 24 August 2012 at 22:23, Benjie Chen wrote:
>
>> Matthias,
>>
>> We may have an active user base here needing multiple user support, so
>> can you let me know what we can do to help? I'd love to see/test some
>> early user authentication mechanisms.
>
> There are none right now. They have to be designed from the ground up.
> Brian might have more ideas on this than I do.
>
>> Also, is there a list of
>> milestones drawn out on how this would work? Is this the place to
>> discuss details?
>
> There might be discussion on this somewhere on GitHub, but nothing
> concrete yet. I guess it will start with the writing of an IPEP that lays out
> what exactly our goal is and the steps to achieve it.
>
>>
>> From what I understand of what you described, each user will get a separate
>> git backed directory.
>
> Yes, we want a git backend. And something distributed.
>
>> The notebooks are checked into one repo. I see
>> two possible UXs, not sure if your ideas map onto either of these. One
>> UX is that the user checks out the entire repo, works on their own notebooks,
>> and commits and pushes.
>
>
>> Not sure how you resolve conflicts, especially
>> with the way the notebook is stored as JSON.
>
> Not that hard. We need to introduce cell IDs.
> This is one of my working projects.
>
> I think that we need more than cell IDs to have a clean way to merge/diff cells, because they can be split or copied and pasted.
> See https://github.com/Carreau/ipython/commits/_cell_id if you are adventurous.
> There should be 2 notebooks with cell IDs and a script to diff the input that is smart enough to correctly diff the concatenated contents
> of the ancestor cells against the new one.
>
> I need more time to build a better prototype to convince the other devs.
>
>
>> The second UX is that the user
>> browses master copies from the repo, like browsing on GitHub, and can
>> view static copies of notebooks. When they want to open one for
>> editing, they check that one out, work on it, and check that one back
>> in. Then you can either prevent multiple users from checking out the
>> same copy, or allow them to do so if there are merge algorithms that
>> would work for the notebooks. This second one is more RCS than
>> git.
>>
>
> To achieve what we want we need several things.
>
> 1) multi-user login to distinguish between users.
> 2) versioning backend
> 3) cell id.
> 4) per-cell save/load.
>         We want to be able to sync cell by cell between server and browser.
> 5) Track multiple pages connected to the same notebook
>
> Then we should be ready for multi user.
>
> Let me describe what I think the global vision is with an example.
>
> I, owner of `my_awsome_notebook.ipynb`, start it.
> I can invite you to work on it, but only once it is started.
> This means that we will work at the same time, on the same file and the same kernel.
> The introduction of cell ids will allow us to put a 'mutex' on a given cell, so that only
> one user at a time is able to edit it.
> Once the edit of a cell is done, you can push it to the others (and commit in parallel to a side branch, just in case).
>
> You can of course imagine that the owner can separately give editing, executing and saving privileges to guests.
>
> And nothing prevents you from forking the file, running it in your own kernel and making a PR on the file content later.
>
> We are just really inflexible about (one file == one kernel), or at least nobody has found a way to convince us otherwise.
> The state of the notebook should reflect as closely as possible the state of the kernel, and all 'views' of an ipynb file
> should be as synchronized as possible.
>
>> Basically, eager to help but not sure how we can.
>
> The easiest thing you can do is test PRs. We have a small script at /tools/git-mpr.py
> that allows you to merge a PR by number. We really would like to go below 10 open PRs.
> Usually with one or two "Works great for me, code looks alright" comments, it is always easier to push the merge button.
>
> After that, we'll be happy to have you open PRs to implement whatever is needed.
>
> If you'd like to help on multiuser I suggest you try to look at /IPython/frontend/html/notebook/handler.py
> and /IPython/frontend/html/notebook/static/js/
>
> but you should know them a little already :-)
>
> If you want to take a shot at showing a message:
> "Someone is already connected to this notebook from another web page, are you sure you want to continue?"
>
> On the other side, why not a "Someone is trying to connect...etc"
> You can think of a way to detect that a certain browser page has been inactive for a certain amount of time and show:
> "You've been inactive for xxxx time"
> eventually replaced by "this page has been remotely modified"
>
> This would involve digging into JS.
> I guess you won't have any issue, and I don't think we have anything against using CoffeeScript, but you should check with Brian.
> It will at least give an idea of what handlers are needed.
>
> You know that we also have support for different backends; one often requested thing is the ability to have backup files.
> But with the current handlers and naming convention, you have more chance of erasing your work than of creating a backup file.
> Having a cleaner API for dealing with files from JS, and other handlers, would be nice,
> like
> `IPython.notebook.save_widget.save_copy(name)`
> `IPython.notebook.save_widget.make_copy_and_open_it(name)`
> `IPython.notebook.save_widget.rename(name)`
> with a warning if the file exists.
>
> I haven't tried django-ipy-nbmgr, but I guess you had to implement these actions somewhere and have a good idea of what is needed.
>
> You can also work on the cell id stuff, and start looking into per-cell sync.
> Or look at side projects that we want to merge in later, like nbconvert.
>
> Don't be afraid to ask questions or to send incomplete PRs to get feedback.
>
> --
> Matthias
>
>
>
>
>
>
>>
>> Thanks,
>>
>> Benjie
>>
>>
>>
>> On Fri, Aug 24, 2012 at 3:20 PM, Matthias BUSSONNIER
>> <bussonniermatthias@gmail.com> wrote:
>>>
>>> On 24 August 2012 at 20:58, Benjie Chen wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running an IPython server with multiple users. We are running
>>>> into this problem where it's fairly easy for users to open the same
>>>> notebook, thereby stepping on each other's kernel. What's the current
>>>> advice for handling this?
>>>
>>> There is no good mechanism right now to deal with this, except running one server per user.
>>> The notebook server has no notion of user. The easiest thing that could be done is to detect whether web sockets are
>>> already connected and send a warning that the same notebook might already be open elsewhere.
>>>
>>>> And what features may be coming in the near
>>>> and long term dealing with this issue?
>>>
>>> We have multiuser login/collaboration as a short-term plan, but this is still **a lot** of work.
>>> The target is something like Google Docs with a per-cell lock, but we are a long way from that.
>>>
>>>>
>>>> For example, I can envision that if user authentication is supported
>>>> (is it?),
>>> Only password for now, there is no notion of ID.
>>>
>>>> then a notebook can have multiple kernels,
>>> That won't happen.
>>> 1 notebook - 1 kernel: we are pretty inflexible on that.
>>> People will have to fork the ipynb file,
>>> like you fork a repo on GitHub.
>>>
>>>> each kernel
>>>> belong to a user. A user can only enter his/her own kernel, but can
>>>> enter readonly mode of other kernels.
>>> That will be the first step in collaboration, yes: you will be able to see a static view of the **notebook**.
>>>
>>>> Warnings are given when a user
>>>> opens a new kernel on a notebook that already has an active kernel,
>>>> so they know the danger of overwriting each other's changes.
>>> Each user will have access to their own folder; you will never work on the same file unless you have a live collaboration.
>>> In the end you should never have the same file opened in 2 pages with different states.
>>>
>>>> Thanks,
>>> We hope to be able to offer you multi-user support soon, and we always welcome people helping, don't hesitate!
>>>
>>> I hope I answered your questions.
>>> --
>>> Matthias
>>>
>>> _______________________________________________
>>> IPython-User mailing list
>>> IPython-User@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/ipython-user
