[Numpy-discussion] A better median function?
Fri Aug 21 11:55:48 CDT 2009
Ouch, I didn't check my To: address first. Sorry!!!
--- On Fri, 8/21/09, David Goldsmith <email@example.com> wrote:
> From: David Goldsmith <firstname.lastname@example.org>
> Subject: Re: [Numpy-discussion] A better median function?
> To: "Discussion of Numerical Python" <email@example.com>
> Date: Friday, August 21, 2009, 9:50 AM
> Not to make you regret your post ;-)
> but, you having readily furnished your email address, I'm
> taking the liberty of forwarding you my resume - I'm the guy
> who introduced himself yesterday by asking if you knew Don
> Hall - in case you have need of an experienced CCD data
> reduction programmer who knows Python, numpy, and
> matplotlib, as well as IDL, matlab, C/C++, and, from the
> "distant past" FORTRAN (not to mention advanced math and a
> little advanced physics, to boot). Caveat: I'm not
> presently in a position to relocate. :-( Thanks for
> your time and consideration,
> David Goldsmith
> DAVID GOLDSMITH
> 2036 Lakemoor Dr. SW
> Olympia, WA 98512
> Career Interests: Support of research possessing a strong
> component of one or more of the following: mathematics,
> statistics, programming, modeling, physical sciences,
> engineering, etc.
> Desired salary rate: $75,000/yr.
> Operating Systems: Windows, Macintosh, Unix
> Programming/Technical: Python, C/C++, SWIG, numpy,
> matplotlib, wxmpl, wxWidgets, SPE, Visual Studio .NET 2003,
> Trac, TortoiseSVN, RapidSVN, WinCVS, LAPACK, Matlab,
> Scientific Workplace, IDL, FORTRAN, Splus, Django (learning
> in progress).
> Office: MS Word, Excel, PowerPoint, Outlook, Publisher,
> etc.; Page Maker; etc.
> Communications: Firefox, Thunderbird, VPN, MS Explorer,
> Netscape, NCSA Telnet, Fetch, WS FTP, telnet, ftp, lynx,
> pine, etc.
> Advanced mathematics, statistics, physics, fluid dynamics,
> engineering, etc.; technical documentation.
> Programming Employment
> Technical Editor (Research Manager); June, 2009 to present;
> Planetary Sciences Group, Dept. of Physics, University of
> Central Florida, Orlando, FL (but working out of Olympia,
> WA). Write and review a broad range of docstrings for
> NumPy, the standard Python module for numerical computing,
> and manage the 2009 NumPy Documentation Summer Marathon,
> including volunteer recruitment and coordination, project
> promotion, and grant writing for perpetuation of the project.
> Programming Mathematical Modeler (Functional Analyst II);
> June, 2004 through February, 2008; Emergency Response
> Division, National Oceanic and Atmospheric Administration,
> Seattle, WA (under contract with General Dynamics
> Information Technology, Fairfax, VA). Develop 3D
> enhancements to existing 2D estuarine circulation codes and
> data visualization and analysis tools in Python and C++,
> using SWIG, numpy, C/LAPACK, ATLAS, matplotlib, wxmpl, SPE,
> Visual Studio/Visual C++, wxWidgets, RapidSVN, TortoiseSVN,
> WinCVS, etc. as development tools; confer regularly with
> other physical scientists, mathematicians, and programmers
> about these tools and other issues/projects related to
> hazardous material emergency response.
> Programming Statistician (Research Associate V); May, 1999
> to September, 2001; Institute for Astronomy, University of
> Hawai`i, Hilo. Developed IDL-based software for
> analysis of data obtained in development of solid-state
> sensor technology for the Next Generation Space Telescope,
> and other related computer activities.
> Programming Research Assistant; September to December,
> 1997; Physics Dept., Univ. of Montana, Missoula.
> Assisted in the development of a FORTRAN computational model
> for optimization of toroidal plasma confinement.
> Programming Research Assistant; June to August, 1997;
> Physics Dept., Univ. of Montana, Missoula. Assisted in
> FORTRAN computer modeling of passive scalar transport in the
> Programming Research Assistant; June to August, 1997;
> Mathematical Sciences Dept., Univ. of Montana,
> Missoula. Developed, in MATLAB, a
> cellular-automata-based simulation of flow around windmill
> turbine blades.
> Programming Consultant; April, 1995; Earth Justice Legal
> Defense Fund, Honolulu, Hawai`i. Developed Excel
> spreadsheet to determine sewage discharge violations from
> municipal wastewater facility records.
> Programming Research Assistant; June to August, 1985 and
> 1986; Plasma Physics Branch, Naval Research Laboratory,
> Washington, DC. Assisted in FORTRAN computer modeling
> of plasma switching devices.
> Publications (abridged)
> 2000, w/ D. Hall (1st author) et al., "Characterization of
> lambda_c ~ 5 micron Hg:Cd:Te Arrays for Low-Background
> Astronomy", Optical and IR Telescope Instrumentation and
> Detectors, Proceedings of SPIE, Vol. 4008, Part 2.
> 2000, w/ D. Hall (1st author) et al., "Molecular Beam
> Epitaxial Mercury Cadmium Telluride: A Quiet, Warm FPA For
> NGST", Astr. Soc. Pacific Conf. Ser., Vol. 207.
> 1997, w/ A. Ware (1st author) et al., "Stability of Small
> Aspect Ratio Toroidal Hybrid Devices", American Physical
> Society, Plasma Physics Section, Semi-annual meeting.
> Education (abridged)
> Master of Arts, Mathematical Sciences, University of
> Montana, Missoula, awarded May, 1998. GPA: 4.0.
> Master of Science, Aquacultural Engineering, University of
> Hawai`i, Manoa, awarded August, 1993. GPA: 3.72.
> Bachelor of Arts, Mathematics, Brown University,
> Providence, Rhode Island, awarded May, 1989. GPA: Unreported
> (Brown does not routinely calculate GPA's; unofficially:
> Prof. Joseph Harrington, Ph.D., Department of Physics,
> University of Central Florida, 321-696-9914, firstname.lastname@example.org
> Debbie Payton, Branch Chief and Oceanographer, Emergency
> Response Division, NOAA, 206-526-6320, email@example.com
> Glen Watabayashi, Operations Manager and Oceanographer,
> ERD, NOAA, 206-526-6324, firstname.lastname@example.org
> Chris Barker, Ph.D., Oceanographer, ERD, NOAA,
> 206-526-6959, email@example.com
> Don Hall, Ph.D., Institute for Astronomy, University of
> Hawai`i, 808-932-2360, firstname.lastname@example.org
> --- On Fri, 8/21/09, Mike Ressler <email@example.com> wrote:
> > From: Mike Ressler <firstname.lastname@example.org>
> > Subject: [Numpy-discussion] A better median function?
> > To: "Discussion of Numerical Python" <email@example.com>
> > Date: Friday, August 21, 2009, 8:47 AM
> > I presented this during a lightning talk at the scipy
> > conference yesterday, so again, at the risk of painting
> > myself as a flaming idiot:
> > ---------------------
> > Wanted: A Better/Faster median() Function
> > numpy implementation uses a simple sorting algorithm:
> > Sort all the data using the .sort() method
> > Return the middle value (or the mean of the two middle values)
> > One doesn’t have to sort all the data – we need only the middle value
> > Nicolas Devillard discusses several algorithms at
> > http://ndevilla.free.fr/median/median/index.html
> > Implemented Devillard’s version of the Numerical Recipes select()
> > function using ctypes: 2 to 20 times faster on the large
> > (> 10^6 points) arrays I tested
> > --- Caveat: I don’t have all the bells and whistles of the
> > built-in median function (multiple dimensions, non-contiguous,
> > etc.)
> > Any of the numpy developers interested in pursuing this further?
> > -----------------------
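The slide's core idea, selecting the middle element instead of sorting everything, can be illustrated with a pure-Python quickselect. This is a sketch, not the code discussed in this thread: it uses a textbook random-pivot quickselect with Lomuto partitioning rather than Devillard's C code or the Numerical Recipes select() routine, and the names `quickselect` and `select_median` are hypothetical. It runs in expected linear time and copies its input, so unlike the ctypes version described here it does not reorder the caller's data.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-based) of a list, reordering the list in place.

    Textbook quickselect: random pivot, Lomuto partition, iterate into one side only.
    """
    lo, hi = 0, len(values) - 1
    while True:
        if lo == hi:
            return values[lo]
        # Move a randomly chosen pivot to the end of the active range.
        p = random.randint(lo, hi)
        values[p], values[hi] = values[hi], values[p]
        pivot = values[hi]
        store = lo
        # Everything smaller than the pivot is swapped to the front of the range.
        for i in range(lo, hi):
            if values[i] < pivot:
                values[i], values[store] = values[store], values[i]
                store += 1
        values[store], values[hi] = values[hi], values[store]
        # values[store] is now in its final sorted position; recurse (iteratively)
        # only into the side that contains index k. The other side stays unsorted.
        if k == store:
            return values[store]
        elif k < store:
            hi = store - 1
        else:
            lo = store + 1


def select_median(data):
    """Median via selection instead of a full sort; the input is copied, not modified."""
    work = list(data)
    n = len(work)
    if n == 0:
        raise ValueError("median of empty sequence")
    upper = quickselect(work, n // 2)
    if n % 2:
        return upper
    # After selection, work[:n // 2] holds the n // 2 smallest values (unsorted),
    # so the lower middle value is simply their maximum.
    lower = max(work[: n // 2])
    return (lower + upper) / 2


print(select_median([3, 1, 2]))     # 2
print(select_median([4, 1, 3, 2]))  # 2.5
```

Only the partition containing the target index is ever revisited, which is why values below or above the median never get fully ordered.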
> > I got a fairly loud "yes" from the back of the room, which a
> > few of us guessed was Robert Kern. I take that as at least
> > generic interest in checking this out.
> > The background on this is that I am doing some glitch-finding
> > algorithms where I call median frequently. I think my ultimate
> > problem is not in median() but in how I loop through the data,
> > and that is a different discussion. What I noticed along the way
> > was what I noted in the slide above. Sorting a vector and
> > returning its middle element is not a bad thing to do (admit it,
> > we've all done it at some point), but it does too much work.
> > Values that are lower or higher than the median don't need to be
> > in perfectly sorted order if all we are after is the median value.
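The point that values below or above the median need not be sorted is exactly the guarantee a partial sort (selection) provides. Assuming NumPy 1.8 or later, where `np.partition` was added (it postdates this 2009 thread), the property can be checked directly; the array size and seed below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the example is repeatable
data = rng.normal(size=11)      # odd length: the median is a single element
mid = data.size // 2

# np.partition returns a copy in which index `mid` holds the value a full
# sort would put there; entries before it are <= it and entries after it
# are >= it, but neither side is guaranteed to be sorted.
part = np.partition(data, mid)

print(part[mid], np.sort(data)[mid], np.median(data))  # all three should match
print(part[:mid])  # the smaller values, in no guaranteed order
```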
> > I did some googling and came up with the web page above. I used
> > his modified NumRec select() function as an excuse to learn
> > ctypes, and my initial weak attempts were successful. The speedups
> > depend highly on the length of the data and its randomness: data
> > that are correlated or already partially sorted go quickly. My
> > caveat is that my select-based median is too simple; it requires
> > contiguous data of a predefined type. It also moves the data in
> > place, affecting the original variable. I have no idea how this
> > would blow up if implemented in a general-purpose way.
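The in-place caveat is the usual trade-off for selection routines. Assuming NumPy 1.8 or later, `ndarray.partition` reorders an array in place, much as a C select() handed the array's buffer would, while `np.partition` returns a reordered copy and leaves the input alone. A small sketch of both, using a made-up five-element array:

```python
import numpy as np

data = np.array([9.0, 2.0, 7.0, 4.0, 5.0])
before = data.copy()          # kept only to show what changes
k = data.size // 2            # index of the median for odd-length data

# np.partition works on a copy: `data` is left exactly as it was.
med_copy = np.partition(data, k)[k]
assert np.array_equal(data, before)

# ndarray.partition works in place: `data` itself is reordered.
data.partition(k)
med_inplace = data[k]

print(med_copy, med_inplace)  # 5.0 5.0
print(data)                   # same values as `before`, different order
```

Copying costs one extra array-sized allocation. Current `np.median` exposes the same choice through its `overwrite_input` argument, which lets callers who no longer need the original order skip the copy.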
> > Anyway, I'm not enough of a C coder to have any hope of improving
> > this to the point where it can be included in numpy. However, if
> > someone is willing to take up the torch, I will be glad to assist
> > with discussion, prototyping a few routines, and testing (I have
> > lots of real-world data). One could argue that the current median
> > implementation is good enough (and it probably is for most uses),
> > but I view this as a chance to add an industrial-strength routine
> > to the numpy base.
> > Thanks for listening.
> > Mike
> > --
> > firstname.lastname@example.org
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion