[SciPy-dev] Scipy server suffering again...
J. Ryan Earl
Thu Jan 10 14:35:14 CST 2008
I believe I have made some good short-term improvements regarding
the load on the SciPy.org server. It should be much peppier now and I
will continue to monitor the effect of the work-around. If you don't
care about the details then stop reading here. The following technical
details explain the problem, the short-term workaround, and why the
long-term fix is best for those who care.
The main problem is that the system was starved for memory and swapping
excessively. The high load and low CPU usage are the result of processes
waiting on free memory. The "high" memory usage(1) is a predictable
result of running garbage-collected daemon processes that are not
restarted regularly, in our case FastCGI Python daemons. In particular,
one virtual host uses "Zope" with a large amount of content and was using
45% of the memory alone. I restarted this FastCGI daemon and it went
down to 10% memory usage, though as I write this it's just under 20%
usage. Over time it'll go back up to around 40%, but I expect it to
stabilize around that and go no higher as it reaches a stable heap size.
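As a rough illustration (not part of the actual monitoring setup), a daemon's share of physical memory can be tracked on Linux by parsing /proc/<pid>/status; the helper names and the sample values below are hypothetical:

```python
def rss_kib(status_text):
    """Extract VmRSS (resident set size, in KiB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None

def memory_fraction(status_text, total_kib):
    """The daemon's share of physical memory, as a fraction of the total."""
    rss = rss_kib(status_text)
    return rss / total_kib if rss is not None else 0.0

# Synthetic status snippet; an ~800 MiB daemon on a 2 GiB box is ~39%.
sample = "Name:\tpython\nVmRSS:\t  819200 kB\nThreads:\t4\n"
print(memory_fraction(sample, 2 * 1024 * 1024))  # -> 0.390625
```

Sampling that fraction periodically is enough to see the climb from ~10% back toward ~40% described above.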
Now what's happening is largely the result of heap memory
fragmentation(2). Garbage collected languages tend to fragment their
heap more than non-garbage-collected languages, but in both cases it is
expected that there is a critical heap-size threshold that, once
reached, will satisfy all further allocation and deallocation requests
without the heap having to grow. This is different from a memory leak,
where memory consumption will grow indefinitely. Where this threshold is
depends on a variety of factors such as typical workload, but it can be
empirically measured. Thus the long-term solution is to migrate to
hardware with enough memory to hold stable-sized heaps for all the
Python daemons, but this will take a lot of time, effort, and testing,
so it's weeks out. The short-term solution is to periodically
restart the Python daemon processes before they reach max heap
fragmentation. However, restarting the daemons severs any existing
connections users may have and will likely erase any session state that
isn't stored in their local web browser, so it is not a desirable
routine operation.
Right now I'm measuring how fast memory gets fragmented so I can
determine the maximum interval to use in a script to restart these
processes automatically; i.e., I may only need to restart them once every
few days instead of once per day to minimize severed connections.
(1) High is a relative term here. On the scale of modern servers it's
not that high, but it's high for this particular hardware.
(2) (basic introduction to heap fragmentation)
Continue to let me know if you have problems; conversely, let me know if
you're having fewer problems than you've had recently. Both are good to
know.
J. Ryan Earl
Fernando Perez wrote:
> I keep on getting, frequently, the by now familiar
> """Internal Server Error
> The server encountered an internal error or misconfiguration and was
> unable to complete your request.
> so doing anything around the site, using trac, moin, etc., is becoming
> rather difficult. I just noticed a load average on the box around 16,
> though no process is consuming any significant amount of CPU.
> If there's anything on our side (the individual project admins) we can
> do to help, please let us know.