Search

Tree-first exploration of the dataset: first an accurate traversal model of the database, then search on top.

Facet tree (live)

Below is a facet tree built from the case database (extracted labels only, no raw narratives) and drawn with D3 as an SVG. Each box is a cohort node and each line is a parent→child edge, so the diagram is a decision tree over facets. To search, fill in the "I am interested in" box, or prune and focus nodes manually.

Facet depth

Phase 1

Database → tree: Derive a deterministic decision tree from the case store—nodes, edges, and state at each step (counts, facet signature, valid next branches). Dimensions follow curated enumerations so behavior is explainable.
Traversal semantics: Each branch narrows the population, and paths are reproducible (same choices → same cohort). Multi-step drill-down can carry a cursor or step state that stays aligned with the current position in the tree.
Leaves = cohorts: Terminal nodes expose group-centric payloads—case_count, labels, facet_signature, optional opaque member refs—not raw case narratives in the default search response.
Foundation for search UX: Structured chips / paths sit on top of this tree; an optional richer graph (multiple routes to similar cohorts) can follow in a later phase. Implementation sketch: apply facet filters over the storage layer, then add rollups if needed for speed.
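
The Phase 1 points above can be illustrated with a minimal sketch. It is not the production implementation: the facet names, enumeration values, case records, and the helpers build_tree, traverse, and cohort_payload are all hypothetical, chosen only to show a deterministic tree derived from labeled cases, reproducible traversal, and a group-centric payload.

```python
from dataclasses import dataclass, field

# Hypothetical curated enumerations: facet name -> allowed values in a fixed order.
FACETS = {
    "region": ["north", "south"],
    "severity": ["low", "high"],
}

# Hypothetical extracted labels only (no raw narratives), one dict per case.
CASES = [
    {"id": "c1", "region": "north", "severity": "low"},
    {"id": "c2", "region": "north", "severity": "high"},
    {"id": "c3", "region": "south", "severity": "high"},
    {"id": "c4", "region": "north", "severity": "high"},
]


@dataclass
class Node:
    facet_signature: tuple  # ((facet, value), ...) choices made so far
    case_ids: list          # opaque member refs, kept sorted for determinism
    children: dict = field(default_factory=dict)  # facet value -> child Node

    @property
    def case_count(self):
        return len(self.case_ids)


def build_tree(cases, facet_order, signature=()):
    """Split cases facet by facet, following the enumeration order."""
    node = Node(signature, sorted(c["id"] for c in cases))
    if not facet_order:
        return node  # leaf = cohort
    facet, rest = facet_order[0], facet_order[1:]
    for value in FACETS[facet]:
        subset = [c for c in cases if c[facet] == value]
        if subset:  # only non-empty branches are valid next steps
            child_sig = signature + ((facet, value),)
            node.children[value] = build_tree(subset, rest, child_sig)
    return node


def traverse(root, choices):
    """Follow a list of facet values; the same choices reach the same node."""
    node = root
    for value in choices:
        node = node.children[value]
    return node


def cohort_payload(node):
    """Group-centric view of a node: counts and signature, no narratives."""
    return {
        "case_count": node.case_count,
        "facet_signature": dict(node.facet_signature),
        "next_branches": list(node.children),
    }


root = build_tree(CASES, ["region", "severity"])
leaf = traverse(root, ["north", "high"])
print(cohort_payload(leaf))
```

Because branches are generated in enumeration order and member refs are sorted, rebuilding the tree from the same cases yields the same nodes, which is what makes a path usable as a stable cursor.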

Semantic search (in development)

This is a different direction from the facet tree above: text-based exploration that uses semantic search to find cases matching ideas expressed in natural language (by meaning and similarity), rather than only drilling down along fixed facet partitions. A transformer-based embedding model would encode the corpus and each natural-language question as dense vectors; the cases whose embeddings are closest to the question embedding (by similarity in that space) would be retrieved and displayed. Nothing here is wired up yet; the live tree remains the supported path today.
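
As a rough illustration of the retrieval step only, the sketch below ranks cases by cosine similarity to a query vector. No embedding model is involved: the three-dimensional vectors, the case ids, and the top_k helper are made-up stand-ins for what a real model and index would supply, and cosine similarity is an assumed choice of similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, case_vecs, k=2):
    """Return the ids of the k cases most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), cid)
              for cid, vec in case_vecs.items()]
    # Highest similarity first; ties broken by id so results are stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cid for _, cid in scored[:k]]


# Toy vectors standing in for model-produced case embeddings (assumption).
CASE_VECS = {
    "c1": [1.0, 0.0, 0.0],
    "c2": [0.9, 0.1, 0.0],
    "c3": [0.0, 1.0, 0.0],
}

# Toy vector standing in for an embedded natural-language question.
query = [1.0, 0.0, 0.0]
print(top_k(query, CASE_VECS))
```

A linear scan like this is only reasonable for small corpora; a larger corpus would normally sit behind a vector index.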