Data & methodology

This page documents how we build and validate structured data for characters, ties, and chapters. For scholarship, always verify against your print edition and the original text.

Public statistics (snapshot)

Character records: 248

Relationship edges (GraphEdge, all): 555

Chapters: 120 (Ch. 1 to 120)

Data last updated (UTC): 2026-05-05T15:50:01.474Z

Snapshot time (UTC): 2026-05-14T06:14:18.017Z

Editions and text

The full text used for chapter bodies and mention counts is checked into the repo as `data/raw/hongloumeng.txt` (a 120-chapter imprint in the Cheng–Gao line; its front matter credits Cao Xueqin and Gao E). You can cross-check it against public editions such as Wikisource (Cheng Yi) or ctext.org. Profiles, tie types, and event chronology are editorial summaries written from the text, not pasted from encyclopedia articles. Structured data is checked for consistency against the text and the alias tables before it is loaded.
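As an illustration of what a pre-load consistency check could look like, here is a minimal sketch. It is not the site's actual validator: the function name `find_inconsistencies`, the input shapes, and the two rules it applies are assumptions, and the sample sentence is invented for the example rather than quoted from the novel.

```python
def find_inconsistencies(text, aliases, edges):
    """Return human-readable problems; an empty list means nothing was flagged.

    text    -- full text as one string
    aliases -- dict mapping canonical character name -> list of alias strings
    edges   -- iterable of (source_name, target_name) pairs
    (All three shapes are hypothetical, chosen for this sketch.)
    """
    problems = []
    for name, alias_list in aliases.items():
        # A character should be attested by its name or at least one alias.
        if not any(form in text for form in [name, *alias_list]):
            problems.append(f"character never appears in text: {name}")
    for source, target in edges:
        for endpoint in (source, target):
            # Every edge endpoint should be a known canonical character.
            if endpoint not in aliases:
                problems.append(f"edge endpoint is not a known character: {endpoint}")
    return problems


# Invented sample data, for illustration only.
sample_text = "贾宝玉见了林黛玉。宝玉笑道。"
sample_aliases = {"贾宝玉": ["宝玉"], "林黛玉": ["黛玉", "颦儿"], "薛宝钗": ["宝钗"]}
sample_edges = [("贾宝玉", "林黛玉"), ("贾宝玉", "史湘云")]

problems = find_inconsistencies(sample_text, sample_aliases, sample_edges)
for line in problems:
    print(line)
```

With this sample, the check flags 薛宝钗 (no form of the name occurs in the sample text) and the edge endpoint 史湘云 (not present in the character table).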

How we process characters and ties

Characters are keyed by distinguishable in-text names; aliases live in a separate table. Edges encode explicit cues from dialogue, narration, and commentary where applicable, with types such as blood, marriage, master–servant, and affection. Layouts (hierarchy / radial) are computed from edge types and configuration; coordinates may be precomputed server-side to keep the browser light.
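The separation between canonical names, the alias table, and typed edges can be pictured with a small sketch. The class and field names below (`TieType`, `Edge`, `resolve`, `edges_of_type`) are hypothetical and chosen for illustration; the real schema may differ.

```python
from dataclasses import dataclass
from enum import Enum


class TieType(Enum):
    # The four tie types named in the text; the real list may be longer.
    BLOOD = "blood"
    MARRIAGE = "marriage"
    MASTER_SERVANT = "master-servant"
    AFFECTION = "affection"


@dataclass(frozen=True)
class Edge:
    source: str  # canonical character name
    target: str  # canonical character name
    tie: TieType


# Canonical names key the character table; aliases live in their own table.
characters = {"贾宝玉", "林黛玉", "袭人"}
alias_table = {"宝玉": "贾宝玉", "黛玉": "林黛玉", "颦儿": "林黛玉"}


def resolve(name):
    """Map an in-text name or alias to its canonical key, or None if unknown."""
    if name in characters:
        return name
    return alias_table.get(name)


edges = [
    Edge("贾宝玉", "林黛玉", TieType.BLOOD),
    Edge("贾宝玉", "林黛玉", TieType.AFFECTION),
    Edge("贾宝玉", "袭人", TieType.MASTER_SERVANT),
]


def edges_of_type(edge_list, tie):
    """Filter an edge list down to one tie type, as a subgraph view might."""
    return [edge for edge in edge_list if edge.tie is tie]


print(resolve("颦儿"))
print(len(edges_of_type(edges, TieType.AFFECTION)))
```

Keeping aliases out of the character table means a pair of characters can carry several typed edges without duplicating character rows.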

Citation and reuse

When citing our counts or graphs, please name “hongloudata.com”, the access date, and the exact URL. For formal publication, reconcile with the edition you cite and the original wording.

Metric definitions

“Character count” is the number of Character rows for the work. “Relationship count” counts GraphEdge rows for the work (including generated reverse edges, so it may differ from a filtered subgraph in the UI). “Chapter coverage” is the min and max orderIndex in Chapter. “Snapshot time” is when this page computed the stats (UTC).
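The definitions above can be sketched over in-memory rows. This is an illustrative reading, not the page's actual query code: the function `public_stats`, the dict-shaped rows, and the `generated` flag on edges are assumptions; only `GraphEdge`, `Chapter`, and `orderIndex` come from the definitions.

```python
from datetime import datetime, timezone


def public_stats(characters, graph_edges, chapters):
    """Compute the four public metrics from plain lists of row dicts."""
    order = [row["orderIndex"] for row in chapters]
    now = datetime.now(timezone.utc)
    return {
        "character_count": len(characters),
        # All GraphEdge rows are counted, generated reverse edges included.
        "relationship_count": len(graph_edges),
        # Coverage is the (min, max) orderIndex, or None when there are no chapters.
        "chapter_coverage": (min(order), max(order)) if order else None,
        # UTC timestamp of when the stats were computed.
        "snapshot_time": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
    }


# Tiny invented dataset; the "generated" flag is a hypothetical field.
characters = [{"id": 1}, {"id": 2}, {"id": 3}]
graph_edges = [
    {"source": 1, "target": 2, "type": "blood", "generated": False},
    {"source": 2, "target": 1, "type": "blood", "generated": True},
    {"source": 1, "target": 3, "type": "master-servant", "generated": False},
    {"source": 3, "target": 1, "type": "master-servant", "generated": True},
]
chapters = [{"orderIndex": i} for i in range(1, 121)]

stats = public_stats(characters, graph_edges, chapters)
authored_only = sum(1 for edge in graph_edges if not edge["generated"])
print(stats["relationship_count"], authored_only)
```

In this toy dataset the relationship count is 4 while only 2 edges were authored directly, which is the kind of gap that can appear between the public total and a filtered subgraph in the UI.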

Provenance by feature

Summary of where each major feature’s data comes from, how it is processed, and its limits, aligned with the expandable blocks on feature pages. See the sections above for the full narrative and edition notes.

| Feature | Reliability | Source | Processing | Limits |
| --- | --- | --- | --- | --- |
| Relationship graph | Curated | Graph edges and graph-set seeds (GraphEdge / GraphSet, etc.), entered from the novel. | Manual typing and subgraphs that filter edge sets. | Not every implicit tie is modeled; subgraph scope differs across views. |
| Appearance ranking | Generated | Full-text by chapter; character and alias tables; appearances aggregated for rankings. | Per-chapter string matching with alias merge (see Data & methods). | Excludes chapter titles, TOCs, and commentary; alias rules affect totals; CSV matches the list. |
| Poetry index | Text (zh) / generated (en) | Poetry and Chinese notes excerpted from a standard edition; English fields aid reading. | Human curation; English via AI assistance. | English is for reference only; cite the Chinese for scholarship. |
| Story timeline | Curated | Event, participation, and chapter-link seeds; editors align entries to chapters. | Manual chronology; in-story years are a narrative coordinate system. | Chronology is debated; ordering here is editorial, not the only reading. |
| Characters | Curated | Character records, bios, and aliases from seed data. | Human summaries from the novel. | Bios are introductory; details belong to the book. |
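The "per-chapter string matching with alias merge" step for the appearance ranking might look roughly like the sketch below. It is an assumption-laden illustration, not the production matcher: the function names and input shapes are hypothetical, the chapter bodies are assumed to be already stripped of titles, TOCs, and commentary, and the sample sentences are invented. Longer forms are tried first so that a full name such as 贾宝玉 is not also counted as its embedded alias 宝玉.

```python
import re
from collections import Counter


def count_mentions(body, forms):
    """Count non-overlapping occurrences of any form, trying longer forms first."""
    ordered = sorted(forms, key=len, reverse=True)
    pattern = "|".join(re.escape(form) for form in ordered)
    return len(re.findall(pattern, body))


def rank_appearances(chapter_bodies, alias_table):
    """Merge alias hits into one total per canonical name, highest first.

    chapter_bodies -- list of chapter body strings
    alias_table    -- dict: canonical name -> list of forms (name plus aliases)
    """
    totals = Counter()
    for body in chapter_bodies:
        for canonical, forms in alias_table.items():
            totals[canonical] += count_mentions(body, forms)
    return totals.most_common()


# Invented sample chapters, for illustration only.
chapter_bodies = [
    "贾宝玉来了。宝玉笑道，黛玉不语。",
    "林黛玉葬花。颦儿叹息，宝玉无言，宝玉去了。",
]
alias_table = {
    "贾宝玉": ["贾宝玉", "宝玉"],
    "林黛玉": ["林黛玉", "黛玉", "颦儿"],
}

ranking = rank_appearances(chapter_bodies, alias_table)
print(ranking)
```

This sketch handles overlap only within one character's forms; an alias shared by two characters would be counted for both, which is one way alias rules can affect totals.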

Further reading