ADR-JVB-001: JVB Parallel Viewer v1 As-Built¶
- Status: Accepted (as-built)
- Date: 2025-12-12
- Project:
jvb-parallel-viewer(JVB / BJV parallel viewer) - Related ADRs: ADR-003 (bundle as-built), ADR-006 (viewer <-> bundle streamlining)
Context¶
The JVB viewer is a lightweight, local web UI for exploring a bilingual journal (VI scan + VI OCR + EN translation) with:
- A scan/image pane that shows the original page image.
- A left text pane (EN) rendered from an English section AST → HTML.
- A right text pane (VI) rendered from Vietnamese OCR text → HTML.
- A section pane for navigation by logical sections.
- Page synchronization driven by HTML anchors of the form
id="p-<page>".
This v1 is the first prototype that aligns the viewer and server around a normalized bundle contract produced by json_builder3.py.
Decision¶
For v1, we standardized the viewer/server/bundle interface around:
- Section IDs:
sid(e.g.,s:<doc_id>:<NN>) - Page span IDs:
pid(e.g.,p:<doc_id>:<page>) - Top-level bundle keys:
sections— section index (catalog metadata + per-page spans)sections_body— section bodies (EN/VI full text + EN AST + pagination)route_index— routing maps between pages and sections- Page anchors: the HTML renderers emit
<div id="p-<N>" class="anchor pagebreak"></div>so the viewer can synchronize panes and the scan image to the correct actual page.
Drift-handling heuristics (e.g., swapping aid/sid in the viewer) were intentionally removed in favor of a single, explicit contract.
As-Built Architecture¶
Components¶
- Bundle Builder:
json_builder3.py - Produces a single JSON bundle per document
-
Defines the contract used by server + viewer
-
Server:
app.py - Loads bundle(s) into memory (
STATE) -
Exposes a small API for bundle, page routing, and per-section HTML
-
Renderers:
render.py render_en_section(en_ast)renders EN AST → HTML-
render_vi_section(vi_text, target_pages=..., show_headers=...)renders VI text → HTML with anchors -
Client:
templates/doc.html - Fetches
/api/doc/{doc_id}/bundleonce - Builds UI navigation from
bundle.sections - Fetches per-section EN/VI HTML
- Requests per-page scan URL and mapping via
/api/doc/{doc_id}/page/{page}
Data Model (Viewer-Facing)¶
This ADR does not duplicate ADR-003’s full bundle schema; it records the subset consumed by v1:
bundle.sections[*].sidbundle.sections[*].title_en,title_vi,author,summary,keywordsbundle.sections[*].spans[*].page,pid,en,vi,align?bundle.sections_body[*].sidbundle.sections_body[*].en_astbundle.sections_body[*].vi(pages separated by\f)bundle.sections_body[*].pagination.target_pagesbundle.route_index.section_to_pages[sid] -> [page...]bundle.route_index.page_to_section[str(page)] -> sidbundle.mapping.offset,bundle.mapping.overridesbundle.pages_count
API Surface (v1)¶
Bundle¶
GET /api/doc/{doc_id}/bundle
Returns a normalized payload (server may internally fall back to legacy keys but should emit normalized keys):
doc_id,titlepages_countmapping:{ offset, overrides }route_index:{ section_to_pages, page_to_section }sections:[SectionIndex]sections_body:[SectionBody]
Page routing + scan image¶
GET /api/doc/{doc_id}/page/{page}
Returns:
page(actual)logical(logical page number if present, via mapping)section_sid(section owning this actual page)src(image URL for the scan pane)
Section renders¶
GET /api/doc/{doc_id}/en/{sid} → EN HTML
GET /api/doc/{doc_id}/vi/{sid} → VI HTML
The server resolves {sid} against sections_body[*].sid and renders HTML through render.py.
Search (prototype)¶
GET /api/doc/{doc_id}/search?q=...
Searches EN AST text (and titles) and returns hits keyed by sid with a snippet.
Rendering and Sync Rules¶
EN page anchors¶
- EN anchors are derived from
PageBreaknodes in the EN AST: - Each
PageBreak{page:N}produces:<div id="p-N" class="anchor pagebreak"></div>
VI page anchors¶
- VI text in
sections_body.viis split on\f(form-feed). render_vi_section(..., target_pages=[...])prepends an anchor for each chunk:<div id="p-<actual>" class="anchor pagebreak"></div>- If
target_pagesis not present, anchors fall back to sequential indices (1..N).
Viewer synchronization¶
- The viewer listens for scroll/visibility changes on
.anchor.pagebreakelements. - When the active anchor changes to
p-N, the scan pane requests/updates page N (or applies mapping offsets/overrides).
Implementation Notes (as-built)¶
Naming normalization¶
- The stack moved from ambiguous
aidusage to: sidfor sectionspidfor page spansroute_indexfor routing (replacingaid_index)
Backward compatibility (server-side only)¶
app.py may temporarily accept legacy bundle fields:
aid_indexas a fallback ifroute_indexis absentsection_blocksas a fallback ifsections_bodyis absent- legacy section-body keys where
sidwas serialized underaid
The viewer (doc.html) is v1-normalized and should not carry drift heuristics.
Testing (v1)¶
Manual verification checklist:
GET /api/doc/{doc_id}/bundleincludessections,sections_body,route_index.- Section dropdown and sidebar populate from
sections. - Selecting a section triggers:
GET /en/{sid}andGET /vi/{sid}returning non-empty HTML.- EN HTML includes anchors:
id="p-<N>". - VI HTML includes anchors for each page chunk.
- Page bar requests:
GET /page/{n}returnssection_sidandsrc.- Scan pane shows correct image for the active anchor page.
- No console errors; errors are rendered in-page (not silently swallowed).
Consequences¶
Benefits¶
- The viewer is thinner and more reliable: one contract, no guessing.
- Page synchronization is consistent across EN/VI via shared anchor ids.
- The server becomes the single place to implement compatibility shims.
Tradeoffs¶
- Contract changes require coordinated updates across builder/server/viewer.
- The current prototype does not yet formalize schema versioning.
Follow-ups¶
- Add a lightweight “bundle adapter” in
app.pythat validates and normalizes bundle shape at load time (and logs all deltas). - Consider adding an explicit schema version once external consumers exist.
- Populate
sections_body.pagination.breaksfrom EN AST to support richer UI markers. - Add stronger on-screen error reporting for missing sections_body entries or missing pagination targets.