Search

Tree-first exploration of the dataset: first an accurate traversal model of the database, then search on top of it.

Facet tree (live)

Below is a facet tree built from the case database (extracted labels only, no raw narratives) and drawn as an SVG with D3. Each box is a cohort node and each line is a parent→child edge, so the tree acts as a decision tree over facets. You can search by filling in the "I am interested in" boxes, or prune and focus manually under Prune & focus.

I am interested in cases involving

Workflow: fill in these sentence boxes (or open Prune & focus for more filters), set Facet depth to reduce the tree size, then click Reload tree.

Typing a label: grey text shows the rest of the first matching label (the alphabetically first label that starts with what you typed). Text that doesn't match a predefined label shows nothing, and a red border means no label starts that way. Press Tab to accept the grey completion. Leave a box blank to skip it.
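The completion rule can be sketched as a prefix lookup over the sorted label list. This is a minimal sketch, not the page's actual implementation: the function name `grey_completion`, the example labels, and case-sensitive matching are all assumptions. It returns the untyped remainder of the alphabetically first label that starts with the input, `None` when no label starts that way (the red-border case), and an empty string for a blank box.

```python
from __future__ import annotations

import bisect


def grey_completion(labels: list[str], typed: str) -> str | None:
    """Remainder of the alphabetically first label that starts with `typed`.

    Returns "" for a blank box (skipped) and None when no label starts with
    `typed` (the red-border case). Matching is case-sensitive (an assumption).
    """
    if not typed:
        return ""
    ordered = sorted(labels)  # sorted per call here only for simplicity
    # Labels sharing a prefix are contiguous in sorted order, so the first
    # label >= `typed` is the only candidate worth checking.
    i = bisect.bisect_left(ordered, typed)
    if i < len(ordered) and ordered[i].startswith(typed):
        return ordered[i][len(typed):]
    return None


# Hypothetical label set.
labels = ["phishing", "fraud", "phone scam", "ransomware"]
print(grey_completion(labels, "ph"))   # ishing
print(grey_completion(labels, "pho"))  # ne scam
print(grey_completion(labels, "x"))    # None
```

Because labels sharing a prefix sit next to each other in sorted order, `bisect_left` finds the candidate with a binary search; a real input handler would sort the labels once up front rather than on every keystroke as this sketch does.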

Prune & focus (facet drill-down)

Use Split on to include a dimension in the tree (leave it unchecked to skip that dimension; if none are checked, the tree is a single cohort). For Allow values, choose from the dropdown: each pick appears on the right and disappears from the list until you remove it with ×. A case matches a row when any of its tags on that field is among your picks (OR within a row), and it must match every row (AND between rows). For fast layout, the tree shape follows one primary tag per case at each depth. The numbers on each box and the cohort ID list, however, count every case that carries the tags on the path in any position, not only as its primary tag, so sibling topic counts can add up to more than the parent.
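The row-matching rule (OR within a row, AND between rows) fits in a few lines. In this hypothetical sketch each case is a dict mapping a field name to its list of tags, and `allowed` maps each field to the set of picked values; the name `case_matches`, the field names, and treating a row with no picks as no constraint are assumptions, not details taken from the page.

```python
from __future__ import annotations


def case_matches(case_tags: dict[str, list[str]], allowed: dict[str, set[str]]) -> bool:
    """True if the case passes every Allow values row.

    OR within a row: any one of the case's tags on that field being picked is enough.
    AND between rows: the case must pass every row that has picks.
    """
    for field_name, picks in allowed.items():
        if not picks:
            continue  # assumption: a row with no picks places no constraint
        if not any(tag in picks for tag in case_tags.get(field_name, [])):
            return False
    return True


# Hypothetical example case and picks.
case = {"topic": ["fraud", "phishing"], "platform": ["email"]}
print(case_matches(case, {"topic": {"phishing", "ransomware"}, "platform": {"email"}}))  # True
print(case_matches(case, {"topic": {"phishing"}, "platform": {"sms"}}))  # False
```

A case with no tags at all on a constrained field fails that row, since no tag of it can be among the picks.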

Partition order (10 steps): 1. Topic → 2. Severity → 3. Platform → 4. Inv. type → 5. Source → 6. Agency → 7. Prosecution Outcome → 8. Location → 9. Severity Phrase → 10. Date range.
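Under the partition rules described above (one primary tag per case per depth for layout, counts over tags in any position), tree construction might look like the sketch below. It is an illustration, not the page's code: `Node`, `cohort`, `build_tree`, the lowercase field keys, taking a case's first tag as its primary tag, and dropping cases that have no tag on the current facet are all assumptions. `facets` stands for the checked Split on dimensions in partition order, and `depth` for Facet depth.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical case shape: {"id": "c1", "topic": ["fraud", "phishing"], ...}
Case = dict


@dataclass
class Node:
    """One cohort box: the (facet, tag) path from the root and the counted case IDs."""
    path: tuple[tuple[str, str], ...]
    case_ids: list[str]
    children: list[Node] = field(default_factory=list)


def cohort(cases: list[Case], path: tuple[tuple[str, str], ...]) -> list[str]:
    """IDs of cases that carry every (facet, tag) on the path, in any tag position."""
    return [c["id"] for c in cases if all(tag in c.get(f, []) for f, tag in path)]


def build_tree(cases: list[Case], facets: list[str], depth: int,
               path: tuple[tuple[str, str], ...] = (),
               placed: list[Case] | None = None) -> Node:
    """Partition `placed` cases by primary tag, facet by facet, up to `depth` levels."""
    if placed is None:
        placed = cases
    node = Node(path, cohort(cases, path))
    level = len(path)
    if level >= depth or level >= len(facets):
        return node  # leaf: depth limit reached or no facet left to split on
    facet = facets[level]
    # Layout only: each case goes under exactly one child, keyed by its first tag.
    groups: dict[str, list[Case]] = {}
    for c in placed:
        tags = c.get(facet, [])
        if tags:  # assumption: cases with no tag on this facet are left out of the layout
            groups.setdefault(tags[0], []).append(c)
    for tag in sorted(groups):
        child_path = path + ((facet, tag),)
        node.children.append(build_tree(cases, facets, depth, child_path, groups[tag]))
    return node


if __name__ == "__main__":
    example = [
        {"id": "c1", "topic": ["fraud", "phishing"], "severity": ["high"]},
        {"id": "c2", "topic": ["phishing"], "severity": ["low"]},
        {"id": "c3", "topic": ["fraud"], "severity": ["high"]},
    ]
    root = build_tree(example, ["topic", "severity"], depth=1)
    print(len(root.case_ids))  # 3
    for child in root.children:
        print(child.path[-1][1], len(child.case_ids))  # fraud 2, then phishing 2
```

In the example, c1 is laid out under fraud (its first topic) but also counted under phishing, so the two siblings count 2 + 2 = 4 cases under a root of 3. With no facets checked, `build_tree` returns a single root node.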

Phase 1

Semantic search (in development)

This takes a different approach from the facet tree above: text-based exploration that uses semantic search to find cases matching an idea expressed in language (by meaning and similarity), rather than only drilling down along fixed facet partitions. A transformer-based embedding model would encode the corpus and each natural-language question as dense vectors; the cases whose embeddings are closest to the question embedding (for example, by cosine similarity) would be retrieved and displayed. Nothing here is wired up yet; the live facet tree remains the supported path for now.
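Once embeddings exist, the retrieval step reduces to ranking cases by similarity to the question vector. The sketch below assumes cosine similarity and a brute-force scan; the embedding model itself is out of scope, and the function names `cosine` and `top_k` and the toy two-dimensional vectors are hypothetical stand-ins for real model output.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], case_vecs: dict[str, list[float]],
          k: int = 5) -> list[tuple[str, float]]:
    """Brute-force retrieval: the k case IDs whose embeddings are most similar to the query."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in case_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings"; a real model would typically produce hundreds of dimensions.
case_vecs = {"c1": [1.0, 0.0], "c2": [0.6, 0.8], "c3": [0.0, 1.0]}
print([cid for cid, _ in top_k([1.0, 0.0], case_vecs, k=2)])  # ['c1', 'c2']
```

A linear scan like this is adequate for a small corpus; a large one would more typically use an approximate nearest-neighbour index, and normalizing the stored vectors in advance turns cosine similarity into a plain dot product.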