Data & methodology
This page documents how we build and validate structured data for characters, ties, and chapters. For scholarship, always verify against your print edition and the original text.
Public statistics (snapshot)
Character records: 248
Relationship edges (GraphEdge, all): 555
Chapters: 120 (Ch. 1–120)
Data last updated (UTC): 2026-05-05T15:50:01.474Z
Snapshot time (UTC): 2026-05-14T06:14:18.017Z
Editions and text
The full text used for chapter bodies and mention counts is checked into the repository as `data/raw/hongloumeng.txt` (a 120-chapter imprint in the Cheng–Gao line; its front matter credits Cao Xueqin and Gao E). You can cross-check it against public editions such as Wikisource (Cheng Yi) or ctext.org. Profiles, tie types, and event chronology are editorial summaries written from the text, not pasted from encyclopedia articles. Structured data is checked for consistency against the text and the alias tables before it is loaded.
How we process characters and ties
Characters are keyed by distinguishable in-text names; aliases live in a separate table. Edges encode explicit cues from dialogue, narration, and commentary where applicable, with types such as blood, marriage, master–servant, and affection. Layouts (hierarchy / radial) are computed from edge types and configuration; coordinates may be precomputed server-side to keep the browser light.
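The separation of character rows, alias table, and typed edges can be pictured with a minimal Python sketch. It is illustrative only: the field names, the ids such as `jia-baoyu`, and the `resolve` helper are assumptions made for this example and are not the site's actual schema; only the names `Character` and `GraphEdge` come from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Character:
    id: str    # hypothetical stable key
    name: str  # primary in-text name


@dataclass(frozen=True)
class GraphEdge:
    source: str  # Character.id
    target: str  # Character.id
    type: str    # e.g. "blood", "marriage", "master-servant", "affection"


# Illustrative sample rows, not an export of the site's data.
CHARACTERS = {
    "jia-baoyu": Character("jia-baoyu", "贾宝玉"),
    "lin-daiyu": Character("lin-daiyu", "林黛玉"),
}

# Aliases are kept apart from the character rows and point back to an id.
ALIASES = {"宝二爷": "jia-baoyu", "颦儿": "lin-daiyu"}

EDGES = [GraphEdge("jia-baoyu", "lin-daiyu", "affection")]


def resolve(name: str) -> Optional[Character]:
    """Return the character for a primary name or an alias, else None."""
    for character in CHARACTERS.values():
        if character.name == name:
            return character
    character_id = ALIASES.get(name)
    return CHARACTERS.get(character_id) if character_id else None
```

With this layout, `resolve("颦儿")` and `resolve("林黛玉")` return the same record, so an alias never creates a second character.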
Citation and reuse
When citing our counts or graphs, please name “hongloudata.com”, the access date, and the exact URL. For formal publication, reconcile with the edition you cite and the original wording.
Metric definitions
“Character count” is the number of Character rows for the work. “Relationship count” counts GraphEdge rows for the work (including generated reverse edges, so it may differ from a filtered subgraph in the UI). “Chapter coverage” is the min and max orderIndex in Chapter. “Snapshot time” is when this page computed the stats (UTC).
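As a rough sketch of these definitions, the Python below computes the three figures from in-memory rows. It assumes edges are `(source, target, type)` tuples and that a reverse edge simply mirrors the forward edge with the same type; how the site actually generates reverse edges is not described here, so `with_reverse_edges` and `public_stats` are hypothetical helpers.

```python
def with_reverse_edges(edges):
    """Add a mirrored (target, source, type) edge wherever one is missing.

    Assumption: the reverse edge keeps the forward edge's type.
    """
    seen = set(edges)
    out = list(edges)
    for source, target, edge_type in edges:
        reverse = (target, source, edge_type)
        if reverse not in seen:
            seen.add(reverse)
            out.append(reverse)
    return out


def public_stats(character_ids, edges, chapter_order_indexes):
    """Compute the counts described above from plain Python collections."""
    return {
        "character_count": len(character_ids),
        # Counts every stored edge row, generated reverse edges included.
        "relationship_count": len(edges),
        # Coverage is the smallest and largest orderIndex among chapters.
        "chapter_coverage": (min(chapter_order_indexes), max(chapter_order_indexes)),
    }


forward = [("a", "b", "blood"), ("b", "c", "marriage")]
stats = public_stats(["a", "b", "c"], with_reverse_edges(forward), range(1, 121))
```

Here two hand-entered edges yield a relationship count of four, which shows why the published total can exceed what a filtered subgraph displays.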
Provenance by feature
This table summarizes where each major feature’s data comes from, how it is processed, and where its limits lie. It mirrors the expandable blocks on the feature pages. See the sections above for the full narrative and edition notes.
| Feature | Reliability | Source | Processing | Limits |
|---|---|---|---|---|
| Relationship graph | Curated | Graph edges and graph-set seeds (GraphEdge / GraphSet, etc.), entered from the novel. | Manual typing and subgraphs that filter edge sets. | Not every implicit tie is modeled; subgraph scope differs across views. |
| Appearance ranking | Generated | Full-text by chapter; character and alias tables; appearances aggregated for rankings. | Per-chapter string matching with alias merge (see Data & methodology). | Excludes chapter titles, TOCs, and commentary; alias rules affect totals; CSV matches the list. |
| Poetry index | Text (zh) / generated (en) | Poetry and Chinese notes excerpted from a standard edition; English fields aid reading. | Human curation; English via AI assistance. | English is for reference only; cite the Chinese for scholarship. |
| Story timeline | Curated | Event, participation, and chapter-link seeds; editors align entries to chapters. | Manual chronology; in-story years are a narrative coordinate system. | Chronology is debated; ordering here is editorial, not the only reading. |
| Characters | Curated | Character records, bios, and aliases from seed data. | Human summaries from the novel. | Bios are introductory; details belong to the book. |
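The "per-chapter string matching with alias merge" listed for the appearance ranking can be sketched as follows. This is one plausible reading, not the site's implementation: `count_appearances` and `rank` are hypothetical helpers, chapter bodies are assumed to have titles and commentary already removed, and longer names are matched first so that a short alias inside a full name is not counted twice.

```python
import re


def count_appearances(chapters, names_by_character):
    """Count mentions per character per chapter.

    chapters: {order_index: body_text}
    names_by_character: {character_id: [primary_name, alias, ...]}
    """
    counts = {}
    for character_id, names in names_by_character.items():
        # Longest names first so "贾宝玉" wins over the "宝玉" inside it.
        ordered = sorted(names, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(name) for name in ordered))
        counts[character_id] = {
            index: len(pattern.findall(body)) for index, body in chapters.items()
        }
    return counts


def rank(counts):
    """Return (character_id, total) pairs, highest total first."""
    totals = {cid: sum(per_chapter.values()) for cid, per_chapter in counts.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


chapters = {1: "贾宝玉见了黛玉，宝玉笑道", 2: "黛玉不语，林黛玉叹气"}
names = {"baoyu": ["贾宝玉", "宝玉"], "daiyu": ["林黛玉", "黛玉"]}
ranking = rank(count_appearances(chapters, names))
```

Because every alias is merged into its character's total, adding or removing an alias changes the ranking, which is the limit noted in the table.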