{ "cells": [ { "cell_type": "markdown", "id": "567fab57", "metadata": {}, "source": [ "# Demo of the `crossmatch` function\n", "\n", "The `crossmatch` function calculates indexing arrays between two catalogs that share a common object ID, handling cases of both repeated and absent IDs. In this demo, we'll create a couple of dummy catalogs to demonstrate basic usage.\n", "\n", "The typical use-case of `crossmatch` is as follows. Suppose you have two catalogs of data, `cat_a` and `cat_b`, and each catalog has an integer column storing the ID of each object. The `crossmatch` function calculates two indexing arrays that provide the correspondence between the entries of the two catalogs that share an ID. The conditions that `crossmatch` assumes are:\n", "\n", "- `cat_a` is permitted to contain repeated entries of the same ID\n", "- `cat_a` is permitted to contain entries of IDs that do not appear in `cat_b`\n", "- `cat_b` is NOT permitted to contain repeated entries of the same ID\n", "\n", "Let's get started by setting up a couple of catalogs storing some dummy data for demonstration purposes:" ] }, { "cell_type": "code", "execution_count": null, "id": "f51b796e", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "n_a = 2500\n", "n_b = 400\n", "\n", "# cat_b: one entry per object, so every ID is unique\n", "cat_b_objid = np.arange(n_b).astype(int)\n", "cat_b_mass = np.random.uniform(0, 10, n_b)\n", "cat_b_spin = 10**np.random.uniform(-2, 0, n_b)\n", "cat_b = dict(objid=cat_b_objid, mass=cat_b_mass, spin=cat_b_spin)\n", "\n", "# cat_a: IDs drawn with replacement from cat_b, so repeats are expected\n", "cat_a_objid = np.random.choice(cat_b_objid, size=n_a)\n", "\n", "cat_a = dict(objid=cat_a_objid)" ] }, { "cell_type": "markdown", "id": "0ee2f530", "metadata": {}, "source": [ "Note that `cat_a` has been set up so that every one of its entries has exactly one matching entry in `cat_b`, and that while `cat_a` has numerous repeats, there are no repeated IDs in `cat_b`. These two catalogs therefore meet the assumptions required by the `crossmatch` function. 
In the next example below, we explore a case where some of the entries in `cat_a` do not appear in `cat_b`; in this first example, every entry has a match.\n", "\n", "Now let's use `crossmatch` to calculate the indexing arrays providing the correspondence between common objects:" ] }, { "cell_type": "code", "execution_count": null, "id": "bb1120b0", "metadata": {}, "outputs": [], "source": [ "from galsampler.crossmatch import crossmatch\n", "\n", "idxA, idxB = crossmatch(cat_a['objid'], cat_b['objid'])" ] }, { "cell_type": "markdown", "id": "d1fab6a8", "metadata": {}, "source": [ "First note that the two returned indexing arrays have the same length. For every object in `cat_a` that has a match, `idxA` stores the index of that object in `cat_a`, and `idxB` stores the index of its matching object in `cat_b`. Since every object in `cat_a` has a match here, both `idxA` and `idxB` have as many entries as there are objects in `cat_a`:" ] }, { "cell_type": "code", "execution_count": null, "id": "f183293c", "metadata": {}, "outputs": [], "source": [ "print(len(idxA), len(idxB))" ] }, { "cell_type": "markdown", "id": "be26b28e", "metadata": {}, "source": [ "Now let's check that the indexing arrays behave as expected.\n", "\n", "First let's verify that they do indeed provide a matching correspondence:" ] }, { "cell_type": "code", "execution_count": null, "id": "8c63ae4c", "metadata": {}, "outputs": [], "source": [ "# Object IDs are integers, so compare them exactly\n", "assert np.array_equal(cat_a['objid'][idxA], cat_b['objid'][idxB])" ] }, { "cell_type": "markdown", "id": "5b7ba031", "metadata": {}, "source": [ "Finally, let's augment `cat_a` with the `mass` and `spin` properties whose values are stored in `cat_b`. This is a two-step process:\n", "\n", "1. Initialize arrays in `cat_a` where we will store the new data from the cross-matching\n", "2. 
Use the indexing arrays to map the values from `cat_b` into `cat_a`" ] }, { "cell_type": "code", "execution_count": null, "id": "c7cba2b7", "metadata": {}, "outputs": [], "source": [ "# Step 1: initialize the new columns of cat_a\n", "cat_a['mass'] = np.zeros(n_a)\n", "cat_a['spin'] = np.zeros(n_a)\n", "\n", "# Step 2: copy the matched values from cat_b into cat_a\n", "cat_a['mass'][idxA] = cat_b['mass'][idxB]\n", "cat_a['spin'][idxA] = cat_b['spin'][idxB]" ] }, { "cell_type": "markdown", "id": "f8d8bfc1", "metadata": {}, "source": [ "Let's do one more example in which some of the objects in `cat_a` have no matching counterpart in `cat_b`:" ] }, { "cell_type": "code", "execution_count": null, "id": "cd430de7", "metadata": {}, "outputs": [], "source": [ "n_unmatched = 20\n", "# Overwrite the first n_unmatched IDs with negative values, which never appear in cat_b\n", "cat_a_objid[:n_unmatched] = np.random.randint(-5, 0, n_unmatched)\n", "\n", "cat_a = dict(objid=cat_a_objid)" ] }, { "cell_type": "code", "execution_count": null, "id": "4bf5ae41", "metadata": {}, "outputs": [], "source": [ "idxA, idxB = crossmatch(cat_a['objid'], cat_b['objid'])" ] }, { "cell_type": "markdown", "id": "047bc68c", "metadata": {}, "source": [ "We have set up this example so that the first $20$ entries of `cat_a` have no match in `cat_b`. Let's check that the lengths of the returned indexing arrays reflect this; since the remaining $2500 - 20 = 2480$ entries all have a match, we expect both arrays to have $2480$ entries:" ] }, { "cell_type": "code", "execution_count": null, "id": "b6a44722", "metadata": {}, "outputs": [], "source": [ "print(len(idxA), len(idxB))" ] }, { "cell_type": "markdown", "id": "3f0ca03d", "metadata": {}, "source": [ "Now let's again transfer the properties from `cat_b` into `cat_a`. 
This time, we'll initialize our arrays with NaN fill values so that it's easy to verify that unmatched objects in `cat_a` still have their initial values after the cross-matching:" ] }, { "cell_type": "code", "execution_count": null, "id": "726a8e87", "metadata": {}, "outputs": [], "source": [ "# NaN marks entries that have not been filled by the cross-match\n", "cat_a['mass'] = np.full(n_a, np.nan)\n", "cat_a['spin'] = np.full(n_a, np.nan)\n", "\n", "cat_a['mass'][idxA] = cat_b['mass'][idxB]\n", "cat_a['spin'][idxA] = cat_b['spin'][idxB]" ] }, { "cell_type": "markdown", "id": "88c7cffb", "metadata": {}, "source": [ "Next we'll define a simple `mask_has_match` array storing whether or not each object in `cat_a` has a match, and we'll verify that only the negative IDs go unmatched, which is the way we set up this toy example." ] }, { "cell_type": "code", "execution_count": null, "id": "f025d511", "metadata": {}, "outputs": [], "source": [ "# True for every entry of cat_a that appears in idxA, i.e. that has a match\n", "mask_has_match = np.zeros(n_a, dtype=bool)\n", "mask_has_match[idxA] = True\n", "\n", "assert not np.any(np.isnan(cat_a['mass'][mask_has_match]))\n", "assert np.all(np.isnan(cat_a['mass'][~mask_has_match]))\n", "\n", "assert np.all(cat_a['objid'][mask_has_match] >= 0)\n", "assert np.all(cat_a['objid'][~mask_has_match] < 0)" ] }, { "cell_type": "markdown", "id": "1c78225c", "metadata": {}, "source": [ "As the assertions above confirm, the only NaN values in our cross-matched catalog belong to objects without a match in `cat_b`, all of which are objects in `cat_a` with negative IDs." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }