Wed Oct 6 14:04:21 CDT 2010
#1295: Confusing and/or inaccurate labels on probability plot ('stats.probplot') axes
Reporter: jmpolom | Owner: somebody
Type: defect | Status: new
Priority: normal | Milestone: 0.9.0
Component: scipy.stats | Version: 0.7.0
Keywords: probplot |
Problem: The SciPy probability plotting function (probplot) wrongly labels
the resulting plot's x-axis as 'Order Statistic Medians'. The values
actually plotted on the x-axis are the order statistic medians (one per
data point) transformed to z-scores, i.e. quantiles of the standard normal
distribution. Examining the source in morestats.py confirms this. Likewise,
the y-axis label could be improved, since 'Ordered Values' implies that the
plotted data are numeric. If the y-axis plots categorical or ordinal data,
labeling it 'Ordered Values' is not strictly accurate.
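To illustrate the distinction, here is a minimal sketch of the points probplot plots for the default normal distribution. It assumes the uniform order statistic medians are estimated with Filliben's approximation (the formula in the morestats.py source). It uses the standard library's statistics.NormalDist in place of scipy.stats.norm. The function names uniform_order_statistic_medians and probplot_points are hypothetical and are not part of SciPy.

```python
from statistics import NormalDist


def uniform_order_statistic_medians(n):
    """Approximate the medians of the n order statistics of a Uniform(0, 1)
    sample using Filliben's estimate (hypothetical helper)."""
    if n == 0:
        return []
    last = 0.5 ** (1.0 / n)
    # Interior medians: (i - 0.3175) / (n + 0.365) for i = 1..n
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    # The endpoints use special closed forms (for n == 1 both are 0.5)
    medians[0] = 1.0 - last
    medians[-1] = last
    return medians


def probplot_points(data):
    """Return (x, y) pairs for a normal probability plot (hypothetical helper).

    x: z-scores (standard normal quantiles) of the order statistic medians,
       i.e. NOT the medians themselves.
    y: the data sorted in ascending order.
    """
    ordered = sorted(data)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    medians = uniform_order_statistic_medians(len(ordered))
    x = [std_normal.inv_cdf(m) for m in medians]
    return list(zip(x, ordered))


if __name__ == "__main__":
    for x, y in probplot_points([3.1, -0.4, 1.2, 0.0, 2.5]):
        print(f"z-score {x:+.4f}   ordered value {y}")
```

Each x value lies on the z-score scale (roughly -1.13 to +1.13 for five points), while the medians themselves lie between 0 and 1. A label reading 'Order Statistic Medians' therefore describes the input to the transform rather than what is actually on the axis.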
Context: Because the x-axis is mislabeled, I cannot use SciPy's probability
plotting features in professional work without some grief. Properly
labeled axes are essential.
Recommended Fixes: I recommend changing the label probplot assigns to the
x-axis to simply 'Z-scores'. The probplot documentation should also be
updated to explain what is plotted on each axis and what statistic
probplot computes. Finally, I recommend changing the default y-axis label
to 'Ordered Responses', which does not imply that any specific type of
data is plotted on the axis.
I've attached sample probability plot output showing the mislabeled axes
this ticket refers to. I've also attached an updated morestats.py file
with the suggested fixes.
