{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "567fab57",
   "metadata": {},
   "source": [
    "# Demo of the `crossmatch` function\n",
    "\n",
    "The `crossmatch` function calculates indexing arrays between two catalogs that share a common object ID, handling cases of both repeated and absent IDs. In this demo, we'll create a couple of dummy catalogs to demonstrate basic usage.\n",
    "\n",
    "The typical use-case of `crossmatch` is as follows. Suppose you have two catalogs of data, `Cat_A` and `Cat_B`, and both catalogs have some integer column storing a unique integer identifying each object. The `crossmatch` function calculates two indexing arrays that provide the correspondence between entries that pertain to the common objects. The conditions that `crossmatch` assumes are:\n",
    "\n",
    "- `Cat_A` is permitted to contain repeated entries of the same ID\n",
    "- `Cat_A` is permitted to contain entries of IDs that do not appear in `Cat_B`\n",
    "- `Cat_B` is NOT permitted to contain repeated entries of the same ID\n",
    "\n",
    "Let's get started by setting up a couple of catalogs storing some dummy data for demonstration purposes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f51b796e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "n_a = 2500\n",
    "n_b = 400\n",
    "\n",
    "cat_b_objid = np.arange(n_b).astype(int)\n",
    "cat_b_mass = np.random.uniform(0, 10, n_b)\n",
    "cat_b_spin = 10**np.random.uniform(-2, 0, n_b)\n",
    "cat_b = dict(objid=cat_b_objid, mass=cat_b_mass, spin=cat_b_spin)\n",
    "\n",
    "cat_a_objid = np.random.choice(cat_b_objid, size=n_a)\n",
    "\n",
    "cat_a = dict(objid=cat_a_objid)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0ee2f530",
   "metadata": {},
   "source": [
    "Note that `cat_A` has been set up so that every one of its entries has a unique matching entry in `cat_B`, and that while `cat_A` has numerous repeats, there are no repeated IDs in `cat_B`. So we see that these two catalogs meet the assumptions required by the `crossmatch` function. In the next example below, we explore a case where some of the entries in `cat_A` do not appear in `cat_B`, but for now in this first example everything has a match.\n",
    "\n",
    "Now let's use `crossmatch` to calculate the indexing arrays providing the correspondence between common objects:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bb1120b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from galsampler.crossmatch import crossmatch\n",
    "\n",
    "idxA, idxB = crossmatch(cat_a['objid'], cat_b['objid'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1fab6a8",
   "metadata": {},
   "source": [
    "First note that the length of the returned indexing arrays both have the same number of entries as the length of `cat_A`: the `crossmatch` function calculates arrays that provide an index in `cat_B` for every object in `cat_A` for which there is a match. Since every object in `cat_A` has a match, then both `idxA` and `idxB` have the same number of entries as the number of objects in `cat_A`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f183293c",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(len(idxA), len(idxB))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "be26b28e",
   "metadata": {},
   "source": [
    "Now let's check that the indexing arrays have the expected behavior. \n",
    "\n",
    "First let's verify that they do indeed provide a matching correspondence:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8c63ae4c",
   "metadata": {},
   "outputs": [],
   "source": [
    "assert np.allclose(cat_a['objid'][idxA], cat_b['objid'][idxB])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b7ba031",
   "metadata": {},
   "source": [
    "Finally, let's augment `cat_A` with the properties of `mass` and `spin` whose values are stored in `cat_b`. This is a two-step process:\n",
    "\n",
    "1. Initialize an empty array where we will store the new data from the cross-matching\n",
    "2. Use the indexing arrays to map the values from `cat_B` into `cat_A`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c7cba2b7",
   "metadata": {},
   "outputs": [],
   "source": [
    "cat_a['mass'] = np.zeros(n_a)\n",
    "cat_a['spin'] = np.zeros(n_a)\n",
    "\n",
    "cat_a['mass'][idxA] = cat_b['mass'][idxB]\n",
    "cat_a['spin'][idxA] = cat_b['spin'][idxB]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8d8bfc1",
   "metadata": {},
   "source": [
    "Let's do one more example in which some of the objects in `cat_A` have no matching counterpart in `cat_B`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cd430de7",
   "metadata": {},
   "outputs": [],
   "source": [
    "n_unmatched = 20\n",
    "cat_a_objid[:n_unmatched] = np.random.randint(-5, 0, n_unmatched)\n",
    "\n",
    "cat_a = dict(objid=cat_a_objid)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4bf5ae41",
   "metadata": {},
   "outputs": [],
   "source": [
    "idxA, idxB = crossmatch(cat_a['objid'], cat_b['objid'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "047bc68c",
   "metadata": {},
   "source": [
    "We have set up this example so that the first $20$ entries of `cat_A` have no match in `cat_B`. Let's check that the length of the returned indexing arrays reflect this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6a44722",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(len(idxA), len(idxB))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f0ca03d",
   "metadata": {},
   "source": [
    "Now let's again transfer the properties in `cat_B` into `cat_A`. This time, we'll initialize our arrays with fill values so that it's easy to verify that unmatched objects in `cat_A` still have their initial values after the cross-matching:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "726a8e87",
   "metadata": {},
   "outputs": [],
   "source": [
    "cat_a['mass'] = np.zeros(n_a) + np.nan\n",
    "cat_a['spin'] = np.zeros(n_a) + np.nan\n",
    "\n",
    "cat_a['mass'][idxA] = cat_b['mass'][idxB]\n",
    "cat_a['spin'][idxA] = cat_b['spin'][idxB]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88c7cffb",
   "metadata": {},
   "source": [
    "Next we'll define a simple `has_match` array storing whether or not the objects in `cat_A` have a match, and we'll verify that only the negatively-valued IDs go unmatched, which is the way we set up this toy example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f025d511",
   "metadata": {},
   "outputs": [],
   "source": [
    "mask_has_match = np.zeros(n_a).astype(bool)\n",
    "mask_has_match[idxA] = True\n",
    "\n",
    "assert not np.any(np.isnan(cat_a['mass'][mask_has_match]))\n",
    "assert np.all(np.isnan(cat_a['mass'][~mask_has_match]))\n",
    "\n",
    "assert np.all(cat_a['objid'][mask_has_match]>=0)\n",
    "assert np.all(cat_a['objid'][~mask_has_match]<0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c78225c",
   "metadata": {},
   "source": [
    "As we can see above, the only NaN values in our cross-matched catalog come from objects without a match in `cat_B`, all of which pertain to objects in `cat_A` with negative IDs."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}