[Numpy-discussion] Emulate left outer join?
Robert Kern
robert.kern@gmail....
Tue Feb 9 16:02:48 CST 2010
On Tue, Feb 9, 2010 at 15:52, David Carmean <dlc@halibut.com> wrote:
>
> Hi,
>
> I've been working with numpy for less than a month, having learned about
> it after finding matplotlib. My foundation in things like set theory is...
> weak to nonexistent, so I need a little help mapping sql-like thoughts into
> set-theory thinking :)
>
>
> Some context to help me explain: I'm trying to store, chart, and analyze
> unix system performance data (sar/sadf output). On a typical system I have
> about 75 fields/variables, all floats, with identical timestamps... or so
> we hope. What I want to do in order to save memory/disk space is to stack
> the timeseries data all into three or four different arrays, and use a single
> timestamp field for each set.
>
> My problem is: I don't know that I can guarantee that the shape of all the
> individual arrays will be identical along the time axis. I may receive
> truncated textfiles to parse, or new variables may appear and disappear from
> the set being reported/recorded.
>
> If these were in flat files or database tables, I'd do a left outer join between
> a master timestamp table and each individual variable's table. But... I don't
> know the keywords to search for in the numpy docs/web chatter. A thread from
> just about one year ago left the question hanging:
>
> http://article.gmane.org/gmane.comp.python.numeric.general/27942
>
> Examples? Pointers? Shoves toward the correct sections of the docs?
numpy.lib.recfunctions.join_by(key, r1, r2, jointype='leftouter')
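Here is a minimal sketch of that call. It assumes a NumPy version in which `numpy.lib.recfunctions` is importable. The `master` and `cpu` arrays and the `ts`/`cpu` field names are made up for illustration. A master timestamp array is left-outer-joined against one variable's table, and timestamps missing from that table come back masked rather than dropped:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical master table: one row per expected sample time.
master = np.array([(0,), (10,), (20,), (30,)], dtype=[('ts', 'i8')])

# Hypothetical per-variable table; the sample at ts=20 is missing.
cpu = np.array([(0, 1.5), (10, 2.0), (30, 0.5)],
               dtype=[('ts', 'i8'), ('cpu', 'f8')])

# Keep every row of `master`; `cpu` values with no matching ts are masked.
joined = rfn.join_by('ts', master, cpu, jointype='leftouter')

print(joined['ts'])                       # key values, sorted along the key
print(np.ma.getmaskarray(joined['cpu']))  # True where cpu had no sample
```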
In [23]: numpy.lib.recfunctions.join_by?
Type: function
Base Class: <type 'function'>
Namespace: Interactive
File: /Users/rkern/svn/numpy/numpy/lib/recfunctions.py
Definition: numpy.lib.recfunctions.join_by(key, r1, r2, jointype='inner', r1postfix='1', r2postfix='2', defaults=None, usemask=True, asrecarray=False)
Docstring:
Join arrays `r1` and `r2` on key `key`.
The key should be either a string or a sequence of strings corresponding
to the fields used to join the array.
An exception is raised if the `key` field cannot be found in the two input
arrays.
Neither `r1` nor `r2` should have any duplicates along `key`: the presence
of duplicates will make the output quite unreliable. Note that the algorithm
does not check for duplicates.
Parameters
----------
key : {string, sequence}
A string or a sequence of strings corresponding to the fields used
for comparison.
r1, r2 : arrays
Structured arrays.
jointype : {'inner', 'outer', 'leftouter'}, optional
If 'inner', returns the elements common to both r1 and r2.
If 'outer', returns the common elements as well as the elements of r1
not in r2 and the elements of r2 not in r1.
If 'leftouter', returns the common elements and the elements of r1 not
in r2.
r1postfix : string, optional
String appended to the names of the fields of r1 that are present in r2
but are not part of the key.
r2postfix : string, optional
String appended to the names of the fields of r2 that are present in r1
but are not part of the key.
defaults : {dictionary}, optional
Dictionary mapping field names to the corresponding default values.
usemask : {True, False}, optional
Whether to return a MaskedArray (or MaskedRecords if `asrecarray==True`)
or a ndarray.
asrecarray : {False, True}, optional
Whether to return a recarray (or MaskedRecords if `usemask==True`) or
just a flexible-type ndarray.
Notes
-----
* The output is sorted along the key.
* A temporary array is formed by dropping the fields not in the key for the
two arrays and concatenating the result. This array is then sorted, and
the common entries selected. The output is constructed by filling the fields
with the selected entries. Matching is not preserved if there are some
duplicates...
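The `defaults` and `usemask` parameters described above can be combined to get a plain structured ndarray with a marker value in the gaps instead of a masked array. The sketch below reuses the same made-up `master`/`cpu` arrays and assumes NaN is an acceptable "no sample" marker for a float field:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Same hypothetical inputs as before: ts=20 has no cpu sample.
master = np.array([(0,), (10,), (20,), (30,)], dtype=[('ts', 'i8')])
cpu = np.array([(0, 1.5), (10, 2.0), (30, 0.5)],
               dtype=[('ts', 'i8'), ('cpu', 'f8')])

# usemask=False returns a plain structured ndarray; `defaults` supplies the
# value used for rows of `master` that had no match in `cpu`.
filled = rfn.join_by('ts', master, cpu, jointype='leftouter',
                     defaults={'cpu': np.nan}, usemask=False)

print(type(filled))   # plain numpy.ndarray, not a MaskedArray
print(filled['cpu'])  # NaN in the ts=20 row
```

Without `defaults`, `usemask=False` fills the gaps with the masked-array default fill value for the field's dtype (1e20 for floats), which is easy to mistake for real data.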
For some reason, numpy.lib.recfunctions isn't in the documentation
editor. I'm not sure why.
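If recfunctions is inconvenient, for example when each variable is already a plain float array, the same left-outer alignment on a single numeric timestamp key can be sketched with `np.searchsorted`. This technique differs from the concatenate-and-sort approach in the Notes above. `left_join_values` is a hypothetical helper, not a NumPy function, and it assumes the variable's timestamps contain no duplicates:

```python
import numpy as np

def left_join_values(master_ts, ts, values, fill=np.nan):
    """Align `values` (sampled at `ts`) onto `master_ts`.

    Hypothetical helper, not part of NumPy. Every entry of `master_ts` is
    kept; entries with no matching timestamp in `ts` get `fill`.
    Assumes `ts` has no duplicate timestamps.
    """
    master_ts = np.asarray(master_ts)
    out = np.full(master_ts.shape, fill, dtype=float)
    if len(ts) == 0:
        return out  # nothing to match against
    # Sort the variable's samples by timestamp so searchsorted can be used.
    order = np.argsort(ts)
    ts_sorted = np.asarray(ts)[order]
    vals_sorted = np.asarray(values, dtype=float)[order]
    # Insertion position of each master timestamp within ts_sorted.
    pos = np.searchsorted(ts_sorted, master_ts)
    # Clip so indexing stays valid for timestamps past the last sample.
    pos_clipped = np.minimum(pos, len(ts_sorted) - 1)
    # A match means the timestamp at that position is exactly equal.
    matched = (pos < len(ts_sorted)) & (ts_sorted[pos_clipped] == master_ts)
    out[matched] = vals_sorted[pos_clipped[matched]]
    return out

# Usage: the cpu samples arrive out of order and ts=20 is missing.
print(left_join_values([0, 10, 20, 30], [30, 0, 10], [0.5, 1.5, 2.0]))
```

One such aligned float array per variable can then share a single master timestamp array, which matches the memory-saving layout described in the question.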
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco