Demo of the crossmatch function
The crossmatch function calculates indexing arrays between two catalogs that share a common object ID, handling both repeated and absent IDs. In this demo, we’ll create a couple of dummy catalogs to demonstrate basic usage.
The typical use case of crossmatch is as follows. Suppose you have two catalogs of data, Cat_A and Cat_B, and each catalog has an integer column storing the ID that identifies each object. The crossmatch function calculates two indexing arrays that provide the correspondence between the entries of the two catalogs that refer to the same object. The conditions that crossmatch assumes are:
Cat_A is permitted to contain repeated entries of the same ID
Cat_A is permitted to contain entries of IDs that do not appear in Cat_B
Cat_B is NOT permitted to contain repeated entries of the same ID
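The conditions above can be checked before calling crossmatch. The following is a minimal sketch using only NumPy; the helper name check_crossmatch_assumptions is hypothetical and is not part of galsampler. It reports whether the second array of IDs is free of repeats, and counts the two situations that are permitted in the first array.

```python
import numpy as np


def check_crossmatch_assumptions(ids_a, ids_b):
    """Hypothetical helper (not part of galsampler).

    Returns (b_is_unique, n_repeated_in_a, n_a_missing_from_b).
    """
    ids_a = np.asarray(ids_a)
    ids_b = np.asarray(ids_b)

    # The second catalog must not repeat any ID.
    b_is_unique = np.unique(ids_b).size == ids_b.size

    # Repeats in the first catalog are allowed, so we only count them.
    n_repeated_in_a = ids_a.size - np.unique(ids_a).size

    # IDs missing from the second catalog are allowed, so we only count them.
    n_a_missing_from_b = int(np.count_nonzero(~np.isin(ids_a, ids_b)))

    return b_is_unique, n_repeated_in_a, n_a_missing_from_b


# Toy IDs: 3 is repeated in the first array, and 7 is absent from the second.
result = check_crossmatch_assumptions([3, 1, 3, 7], [1, 3, 5])
print(result)
```

If the first returned value is False, the second catalog would need to be deduplicated before cross-matching.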
Let’s get started by setting up a couple of catalogs storing some dummy data for demonstration purposes:
[1]:
import numpy as np
n_a = 2500
n_b = 400
cat_b_objid = np.arange(n_b).astype(int)
cat_b_mass = np.random.uniform(0, 10, n_b)
cat_b_spin = 10**np.random.uniform(-2, 0, n_b)
cat_b = dict(objid=cat_b_objid, mass=cat_b_mass, spin=cat_b_spin)
cat_a_objid = np.random.choice(cat_b_objid, size=n_a)
cat_a = dict(objid=cat_a_objid)
Note that cat_a has been set up so that every one of its entries has a unique matching entry in cat_b, and that while cat_a has numerous repeats, there are no repeated IDs in cat_b. These two catalogs therefore meet the assumptions required by the crossmatch function. A later example explores a case where some of the entries in cat_a do not appear in cat_b, but in this first example everything has a match.
Now let’s use crossmatch to calculate the indexing arrays providing the correspondence between common objects:
[2]:
from galsampler.crossmatch import crossmatch
idxA, idxB = crossmatch(cat_a['objid'], cat_b['objid'])
First note that both returned indexing arrays have the same length as cat_a: the crossmatch function calculates arrays that provide an index in cat_b for every object in cat_a for which there is a match. Since every object in cat_a has a match, both idxA and idxB have as many entries as there are objects in cat_a:
[3]:
print(len(idxA), len(idxB))
2500 2500
Now let’s check that the indexing arrays have the expected behavior.
First let’s verify that they do indeed provide a matching correspondence:
[4]:
assert np.allclose(cat_a['objid'][idxA], cat_b['objid'][idxB])
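To build intuition for what such a pair of indexing arrays contains, here is a small sketch of one way they could be computed with np.searchsorted. This is an illustration under the assumptions listed earlier, not the galsampler implementation; the function name crossmatch_sketch is hypothetical, and the ordering of its output may differ from what crossmatch returns.

```python
import numpy as np


def crossmatch_sketch(ids_a, ids_b):
    """Illustrative only; assumes ids_b contains no repeated IDs."""
    ids_a = np.asarray(ids_a)
    ids_b = np.asarray(ids_b)
    if ids_b.size == 0:
        empty = np.array([], dtype=int)
        return empty, empty

    # Sort the second catalog's IDs so that binary search can be used.
    order_b = np.argsort(ids_b)
    sorted_b = ids_b[order_b]

    # For each ID in the first catalog, find its candidate slot in sorted_b.
    pos = np.searchsorted(sorted_b, ids_a)
    pos = np.clip(pos, 0, sorted_b.size - 1)

    # Keep only the entries whose candidate slot really holds the same ID.
    found = sorted_b[pos] == ids_a
    idx_a = np.flatnonzero(found)
    idx_b = order_b[pos[found]]
    return idx_a, idx_b


toy_ids_a = np.array([3, 1, 3, 7])  # 3 is repeated, 7 has no match
toy_ids_b = np.array([5, 1, 3])     # unique, not necessarily sorted
toy_idx_a, toy_idx_b = crossmatch_sketch(toy_ids_a, toy_ids_b)

# The same correspondence that the demo verifies for the real function.
assert np.array_equal(toy_ids_a[toy_idx_a], toy_ids_b[toy_idx_b])
```

In this toy case the unmatched ID 7 is simply dropped, so both index arrays have three entries rather than four.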
Finally, let’s augment cat_a with the properties mass and spin, whose values are stored in cat_b. This is a two-step process:
Initialize a zero-filled array where we will store the new data from the cross-matching
Use the indexing arrays to map the values from cat_b into cat_a
[5]:
cat_a['mass'] = np.zeros(n_a)
cat_a['spin'] = np.zeros(n_a)
cat_a['mass'][idxA] = cat_b['mass'][idxB]
cat_a['spin'][idxA] = cat_b['spin'][idxB]
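When several columns need to be transferred, the two steps can be wrapped in a loop. The sketch below assumes dictionary-of-arrays catalogs like the ones in this demo; transfer_columns is a hypothetical helper, and the index arrays are written by hand for a toy case rather than taken from crossmatch.

```python
import numpy as np


def transfer_columns(target, source, idx_target, idx_source, keys, fill_value=np.nan):
    """Hypothetical helper: copy the named columns from source into target."""
    n_target = len(target['objid'])
    for key in keys:
        # Step 1: initialize the new column with a fill value.
        column = np.full(n_target, fill_value, dtype=float)
        # Step 2: map the values across using the indexing arrays.
        column[idx_target] = np.asarray(source[key])[idx_source]
        target[key] = column
    return target


toy_a = dict(objid=np.array([3, 1, 3, 7]))
toy_b = dict(objid=np.array([1, 3, 5]), mass=np.array([0.5, 2.0, 9.0]))

# Hand-written index arrays satisfying
# toy_a['objid'][toy_idx_a] == toy_b['objid'][toy_idx_b].
toy_idx_a = np.array([0, 1, 2])
toy_idx_b = np.array([1, 0, 1])

transfer_columns(toy_a, toy_b, toy_idx_a, toy_idx_b, keys=['mass'])
print(toy_a['mass'])
```

The last entry of the toy catalog (ID 7) has no counterpart, so it keeps the fill value.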
Let’s do one more example in which some of the objects in cat_a have no matching counterpart in cat_b:
[6]:
n_unmatched = 20
cat_a_objid[:n_unmatched] = np.random.randint(-5, 0, n_unmatched)
cat_a = dict(objid=cat_a_objid)
[7]:
idxA, idxB = crossmatch(cat_a['objid'], cat_b['objid'])
We have set up this example so that the first 20 entries of cat_a have no match in cat_b. Let’s check that the lengths of the returned indexing arrays reflect this:
[8]:
print(len(idxA), len(idxB))
2480 2480
Now let’s again transfer the properties in cat_b into cat_a. This time, we’ll initialize our arrays with NaN fill values so that it’s easy to verify that unmatched objects in cat_a still have their initial values after the cross-matching:
[9]:
cat_a['mass'] = np.full(n_a, np.nan)
cat_a['spin'] = np.full(n_a, np.nan)
cat_a['mass'][idxA] = cat_b['mass'][idxB]
cat_a['spin'][idxA] = cat_b['spin'][idxB]
Next we’ll define a simple mask_has_match array storing whether or not each object in cat_a has a match, and we’ll verify that only the negative IDs go unmatched, which is how we set up this toy example.
[10]:
mask_has_match = np.zeros(n_a, dtype=bool)
mask_has_match[idxA] = True
assert not np.any(np.isnan(cat_a['mass'][mask_has_match]))
assert np.all(np.isnan(cat_a['mass'][~mask_has_match]))
assert np.all(cat_a['objid'][mask_has_match] >= 0)
assert np.all(cat_a['objid'][~mask_has_match] < 0)
As we can see above, the only NaN values in our cross-matched catalog belong to objects without a match in cat_b, all of which are objects in cat_a with negative IDs.
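As an independent cross-check that does not rely on the indexing arrays at all, np.isin can compute directly which IDs of one array appear in another. The sketch below rebuilds toy ID arrays of the same shape as in this demo (the seed and variable names are assumptions made for illustration). In the notebook, the resulting mask would be expected to agree with mask_has_match.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Same layout as the demo: 400 unique IDs, 2500 draws with repeats,
# and the first 20 entries overwritten with negative (unmatched) IDs.
ids_b = np.arange(400)
ids_a = rng.choice(ids_b, size=2500)
ids_a[:20] = rng.integers(-5, 0, size=20)

# True wherever an entry of ids_a also appears somewhere in ids_b.
expected_has_match = np.isin(ids_a, ids_b)
n_matched = int(np.count_nonzero(expected_has_match))

print(n_matched, ids_a.size - n_matched)
```

Because only the 20 negative IDs are absent from the second array, the matched count equals the length of the indexing arrays reported earlier.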