# Sol-37 Site Server

Sol-37 is a public archive presented as a Windows 95-style shell, but the shell is only the visible layer. The live system is a static tree under `/home/david/random/www` served by Caddy on `127.0.0.1:8888`, then augmented by local daemons for chat, voice, semantic retrieval, metrics, GUI share metadata, IRC/logbook transport, and media playlist generation.

This README is the canonical operator-oriented write-up for the site as it actually runs on this machine on 2026-04-17.

Archive reference:

- committed base tag: `sol37-audit-2026-04-13`
- current audit note: `/home/david/random/docs/sol37-site-audit.md`
- API reference: `/home/david/random/docs/sol-api-reference.md`

## What The Site Is

Sol-37 is not a framework app and not a CMS.

It is:

- a static public tree rooted at `/home/david/random/www`
- a large single-file shell app in `index.html`
- a set of embedded program pages such as `/chat`, `/sitemap.html`, `/programs/logbook.html`, and `/programs/media-player.html`
- a small local service spine behind same-origin routes

The design goal is local-first archival continuity. Pages can be plain files. Runtime-backed features are layered in where they add value, not because the entire site is built around a web framework.

## Public And Local Entry Points

Primary hosts:

- public host: `https://sol.system42.one`
- local origin: `http://127.0.0.1:8888`

Primary user-visible surfaces:

- `/`
  Main retro desktop shell
- `/gui`
  Direct shared-window shell entry
- `/chat`
  Dedicated Sol chat client
- `/dashboard`
  AI dashboard app with live camera, inline chat, and service console
- `/sitemap.html`
  Explorer-style archive browser
- `/site-metrics.html`
  Traffic monitor
- `/programs/logbook.html`
  Public IRC-backed logbook client
- `/programs/media-player.html`
  Media player
- `/api/knowledge/health`
  Read-only semantic memory health route

Important local services behind those routes:

- `127.0.0.1:8888`
  Caddy serving the site tree and reverse proxies
- `127.0.0.1:8890`
  public logbook API
- `127.0.0.1:8892`
  knowledge query API
- `127.0.0.1:8893`
  GUI share metadata responder
- `127.0.0.1:8894`
  markdown bridge for `/knowledge/*.md` and `/*.md`
- `127.0.0.1:8895`
  Sol chat API
- `127.0.0.1:8896`
  local dashboard API
- `127.0.0.1:8897`
  dashboard access control daemon
- `127.0.0.1:18080`
  local model backend used by `sol-chat-api`

## Site Layers

### 1. Shell Layer

`/home/david/random/www/index.html` behaves like a minimal window manager:

- desktop icons
- Start menu
- taskbar buttons
- draggable/minimizable/maximizable windows
- iframe-backed document and program windows
- shareable `/gui?i=...` deep links
- a floating Sol orb / desktop assistant

The shell is intentionally large and centralized. That makes deployment simple and debugging direct, but it also means interaction logic, share handling, window management, and assistant behavior all live in one document.

### 2. Program Layer

Program surfaces keep their own interaction models:

- `/chat`
  dedicated public Sol console with orb, streaming replies, voice controls, and debug metrics
- `/dashboard`
  public dashboard app for live camera, inline dashboard chat, and service telemetry
- `/sitemap.html`
  archive browser driven by generated index state
- `/programs/logbook.html`
  web client over the IRC/logbook transport
- `/programs/media-player.html`
  player surface for video and audio assets
- `/programs/star-map.html`
  interactive star map
- `/programs/dosbox.html`
  browser DOSBox surface

### 3. Runtime Layer

Generated JSON, caches, and local daemons keep what the site shows in step with what is actually on disk and running:

- `site-index.json`
  live file inventory for the Explorer surface
- `knowledge-index.json`
  markdown/knowledge manifest used by archive browsing
- `site-metrics.json`
  traffic and telemetry snapshot
- `audio/playlist.json`
  generated audio playlist manifest
- `video/playlist.json`
  generated video playlist manifest
- `~/.local/share/sol_chat_web/tts_cache`
  Sol speech cache
- `www/assets/share-previews/`
  pre-rendered share cards for `/gui` links

## HTTP Topology

The live route map is controlled by `/home/david/random/bin/Caddyfile.pkd_share`.

Key behavior:

- `/api/logbook/*` reverse proxies to `127.0.0.1:8890`
- `/api/knowledge/*` reverse proxies to `127.0.0.1:8892`
- `/api/chat` and `/api/chat/*` reverse proxy to `127.0.0.1:8895`
- `/api/dashboard/*` reverse proxies to `127.0.0.1:8896`
- `/api/dashboard-access/*` reverse proxies to `127.0.0.1:8897`
- `/chat` rewrites to `/chat/index.html`
- `/dashboard` rewrites to `/dashboard/index.html`
- `/chat` and `/chat/*` are marked `Cache-Control: no-store`
- `/dashboard`, `/dashboard/*`, and `/api/dashboard-access/*` are marked `Cache-Control: no-store`
- `/gui` reverse proxies to `127.0.0.1:8893`
- `/knowledge/*.md` and `/*.md` go through the markdown bridge on `127.0.0.1:8894`
- static media receives cache headers:
  - audio assets: `public, max-age=31536000, immutable`
  - image/video/static assets: `public, max-age=86400`

Why this matters:

- the dedicated chat UI and assistant clients must avoid stale JS/CSS builds
- the dashboard app must avoid stale JS/CSS and stale access-state responses
- public media should cache normally
- raw `/gui?i=...` links need metadata injection without changing their visible URL shape
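
As a reading aid, the route list above can be modelled as a small lookup table. This is a minimal sketch, not the Caddyfile itself: `ROUTES`, `NO_STORE`, `resolve_upstream`, and `is_no_store` are hypothetical names, and first-match ordering with shell-style wildcards is an assumption that only approximates how Caddy evaluates its matchers.

```python
from fnmatch import fnmatchcase

# Illustrative (pattern, upstream) pairs copied from the route list above.
# The authoritative rules live in the Caddyfile, not in this table.
ROUTES = [
    ("/api/logbook/*", "127.0.0.1:8890"),
    ("/api/knowledge/*", "127.0.0.1:8892"),
    ("/api/chat", "127.0.0.1:8895"),
    ("/api/chat/*", "127.0.0.1:8895"),
    ("/api/dashboard/*", "127.0.0.1:8896"),
    ("/api/dashboard-access/*", "127.0.0.1:8897"),
    ("/gui", "127.0.0.1:8893"),
    ("/knowledge/*.md", "127.0.0.1:8894"),
    ("/*.md", "127.0.0.1:8894"),
]

# Paths documented as Cache-Control: no-store.
NO_STORE = ["/chat", "/chat/*", "/dashboard", "/dashboard/*", "/api/dashboard-access/*"]


def resolve_upstream(path: str) -> str:
    """Return the upstream for a request path, or "static" for plain files."""
    for pattern, upstream in ROUTES:
        if fnmatchcase(path, pattern):
            return upstream
    return "static"


def is_no_store(path: str) -> bool:
    """True when the path falls under one of the no-store rules."""
    return any(fnmatchcase(path, pattern) for pattern in NO_STORE)
```

A table like this is mainly useful for reasoning about which daemon to inspect when a route misbehaves, for example `resolve_upstream("/api/chat/speak")` points at the chat API on port 8895.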

### Dashboard Access Control

The dashboard is published with two access tiers:

- local and LAN clients always keep access
- public internet clients are gated by `dashboard_access_daemon.py` on `127.0.0.1:8897`

The daemon owns a small state file under `~/.local/state/sol37/dashboard-access.json` and exposes:

- `GET /api/dashboard-access/status`
- `POST /api/dashboard-access/enable`
- `POST /api/dashboard-access/disable`

Operational rule:

- when public access is disabled, `/dashboard` still exists as a shell app on the public desktop, but its live camera, service data, and inline dashboard chat stay locked for non-local clients
- desktop launch actions and service restarts remain local-only even when public dashboard access is enabled
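
The access rules above reduce to two small predicates. The sketch below is an illustration under stated assumptions: it treats loopback and private-range addresses as "local and LAN", and the function names are hypothetical. How `dashboard_access_daemon.py` actually identifies the client (for example behind a tunnel or proxy header) is not described here.

```python
import ipaddress


def is_local_client(remote_ip: str) -> bool:
    """Assumption: loopback and private-range addresses count as local/LAN."""
    addr = ipaddress.ip_address(remote_ip)
    return addr.is_loopback or addr.is_private


def dashboard_data_allowed(remote_ip: str, public_enabled: bool) -> bool:
    """Camera, service data, and inline chat: local always, public only when enabled."""
    return is_local_client(remote_ip) or public_enabled


def privileged_action_allowed(remote_ip: str) -> bool:
    """Desktop launches and service restarts stay local-only in every mode."""
    return is_local_client(remote_ip)
```

Keeping the privileged check independent of the public flag mirrors the operational rule that enabling public access never unlocks launch or restart actions.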

Public Action schema endpoints:

- `https://sol.system42.one/knowledge-openapi.json`
  importable GPT Action schema for the read-only knowledge query API
- `https://sol.system42.one/chat-openapi.json`
  importable GPT Action schema for the combined Sol API surface:
  `queryChat`, `chatTurn`, `chatHealth`, `queryKnowledge`, `knowledgeHealth`
- `https://sol.system42.one/privacy.html`
  shared privacy policy URL for both public Action schemas

## Sol Assistant Architecture

There are two browser clients backed by the same chat stack:

- the floating desktop assistant inside `index.html`
- the dedicated `/chat` client in `www/chat/`

Both clients use one set of same-origin backend routes and share the same broad behaviors:

- persistent session id in local storage
- same-origin chat API
- SSE text streaming where streaming is the right UX
- optional voice playback
- retrieval grounding through the knowledge API
- assistant history/reset/health routes
- the same local backend daemon: `sol_chat_api.py` on `127.0.0.1:8895`
- the same reasoning service behind it on `127.0.0.1:18080`

### Chat Request Flow

Normal assistant/chat request path:

1. Browser sends `POST /api/chat` with:
   - `message`
   - `session`
   - `stream`
   - optional `page_context`
2. Caddy proxies the request to `sol_chat_api.py` on `127.0.0.1:8895`.
3. `sol_chat_api.py`:
   - appends the user message to session history
   - queries the knowledge API for retrieval context
   - augments top retrieval hits with current source-file contents when possible
   - adds compact live site-state context from `site-metrics.json` for non-page-bound text turns
   - builds a grounded prompt
   - calls the local model backend
4. The API returns:
   - SSE events for streaming turns:
     - `status`
     - `delta`
     - `done`
     - `error`
   - JSON for non-stream turns
5. Browser clients update the visible transcript either incrementally or from the final JSON payload.
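
The request body and streamed response from the steps above can be sketched from the client side. This is a minimal sketch, assuming the API emits standard Server-Sent Events framing (`event:` and `data:` lines separated by a blank line) and that `delta` payloads are plain text; the real payload shape may differ, and `build_chat_payload` and `parse_sse` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def build_chat_payload(message: str, session: str, stream: bool = True,
                       page_context: Optional[dict] = None) -> bytes:
    """Encode the documented POST /api/chat fields as a JSON body."""
    body = {"message": message, "session": session, "stream": stream}
    if page_context is not None:
        body["page_context"] = page_context
    return json.dumps(body).encode("utf-8")


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
```

A client would append each `delta` payload to the visible transcript and stop reading on `done` or `error`.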

Important detail:

- `/chat` uses SSE deltas
- the floating desktop assistant now prefers non-stream JSON for grounded page-summary style turns
- the floating desktop assistant still uses streaming for non-page/freeform turns
- the floating desktop assistant retries without stream if a streamed turn fails or yields no usable final reply
- direct `/api/chat/query` turns now have a richer grounding bundle too:
  - embedding hit snippets
  - current file contents for the top matching sources when available
  - live site/sensor metrics for greetings and diagnostics
  - probe-style connectivity tests can use deterministic live-site-state fallback so GPT Action test calls stay short and do not overflow the local context window
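
The retry-without-stream behavior described above can be sketched as a small wrapper. This is an illustration only: the real logic lives in browser JavaScript inside `index.html`, and `ask_with_fallback` plus the `send(stream)` callable are hypothetical stand-ins for the actual transport code.

```python
from typing import Callable, Optional


def ask_with_fallback(send: Callable[[bool], Optional[str]]) -> Optional[str]:
    """Try a streamed turn first, then retry once as non-stream JSON.

    `send(stream)` is assumed to return the final reply text, return None
    or an empty string when nothing usable arrived, or raise on failure.
    """
    try:
        reply = send(True)
    except Exception:
        reply = None  # treat a failed stream like an empty one
    if reply and reply.strip():
        return reply
    # Streamed turn failed or yielded no usable final reply.
    return send(False)
```

Errors from the second, non-stream attempt are deliberately allowed to propagate so the caller can show a real failure instead of retrying forever.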

### Assistant Page Context

The desktop assistant can be grounded in the page currently open inside the shell.

Two related but distinct flows exist:

1. Chat context
   - bounded title/headings/content summary
   - intentionally smaller than the read-aloud path
   - used to keep prompts small and suggestions relevant
2. Read-aloud context
   - separate full-text extraction pass
   - used when the orb is asked to read the current page aloud

That distinction matters because concise page context keeps chat latency low, while narration needs the full text of the page.

The desktop assistant now fingerprints page context from the target, title, content type, headings, and bounded content. That browser-side fingerprint is used to cache:

- grounded page replies by `prompt + page fingerprint`
- generated narration text by `page fingerprint`

As a result, repeated summary questions and repeated read-aloud requests reuse the earlier work until the page content changes.
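
The fingerprint-and-cache idea can be sketched as follows. The real implementation runs in the browser; this Python version only illustrates the technique. The SHA-256 hash, the `max_chars` bound of 4000, and the names `page_fingerprint` and `PageCache` are assumptions made for the example.

```python
import hashlib
import json
from typing import Dict, Optional, Sequence, Tuple


def page_fingerprint(target: str, title: str, content_type: str,
                     headings: Sequence[str], content: str,
                     max_chars: int = 4000) -> str:
    """Hash the fields that define page context; content is bounded first."""
    fields = [target, title, content_type, list(headings), content[:max_chars]]
    encoded = json.dumps(fields, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


class PageCache:
    """Replies are keyed by prompt + fingerprint, narration by fingerprint alone."""

    def __init__(self) -> None:
        self._replies: Dict[Tuple[str, str], str] = {}
        self._narration: Dict[str, str] = {}

    def get_reply(self, prompt: str, fingerprint: str) -> Optional[str]:
        return self._replies.get((prompt.strip(), fingerprint))

    def put_reply(self, prompt: str, fingerprint: str, reply: str) -> None:
        self._replies[(prompt.strip(), fingerprint)] = reply

    def get_narration(self, fingerprint: str) -> Optional[str]:
        return self._narration.get(fingerprint)

    def put_narration(self, fingerprint: str, text: str) -> None:
        self._narration[fingerprint] = text
```

Because the fingerprint changes whenever any hashed field changes, stale entries are never returned for an edited page; they simply stop being looked up.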

### Voice Request Flow

The current low-latency voice path is:

1. Browser chooses a speakable chunk
2. Browser points the `<audio>` element at `GET /api/chat/speak?text=...&session=...`
3. Caddy proxies to `sol_chat_api.py`
4. `sol_chat_api.py` either:
   - streams cached MP3 bytes from disk, or
   - invokes `11speak.py` and streams MP3 bytes as they arrive
5. Browser starts playback while the `audio/mpeg` response is still arriving

This is intentionally different from the older behavior, which waited for a full MP3 blob before playback.

### Voice Runtime Details

Files:

- `/home/david/random/bin/sol_chat_api.py`
- `/home/david/random/bin/11speak.py`

Key behavior:

- speech cache lives under `~/.local/share/sol_chat_web/tts_cache`
- cached hits are reused by content hash
- uncached requests stream through the ElevenLabs-backed `11speak.py` runtime
- long narration is chunked in the browser into smaller utterances
- streaming chat replies can start speaking from the first complete sentence rather than waiting for the final answer
- if the browser reuses cached narration text for an unchanged page, the same TTS cache entry is typically reused too
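
Content-hash caching of speech can be illustrated with a short sketch. It is a simplification under stated assumptions: the key scheme (SHA-256 over a voice label plus text), the `.mp3` file naming, and the function names are hypothetical, and the sketch returns whole byte strings where the real daemon streams MP3 bytes as they arrive.

```python
import hashlib
from pathlib import Path
from typing import Callable


def tts_cache_path(cache_dir: Path, text: str, voice: str = "default") -> Path:
    """Derive a cache file path from the content being spoken."""
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.mp3"


def speak_bytes(cache_dir: Path, text: str,
                synthesize: Callable[[str], bytes]) -> bytes:
    """Return cached audio when present, otherwise synthesize and store it.

    `synthesize` stands in for the external TTS call.
    """
    path = tts_cache_path(cache_dir, text)
    if path.exists():
        return path.read_bytes()
    audio = synthesize(text)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return audio
```

Keying on content rather than on session or URL is what lets unchanged narration text hit the same cache entry across visits.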

Current browser behavior:

- `/chat` starts speaking from streamed sentence chunks while text is still arriving
- the desktop orb now does the same
- explicit re-read operations are also chunked and sequenced
- the desktop orb now has an explicit playback transport button:
  - `Pause` while speaking
  - `Resume` when paused with pending playback
  - `Play` otherwise
- closing the assistant stops playback and clears pending continuation so hidden playback does not restart unexpectedly

Practical limitation:

- the browser still queues discrete speech chunks rather than receiving one endless audio stream tied to token deltas
- that is a deliberate compromise for reliability and browser compatibility
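
The chunked playback model and the transport button states can be sketched together. The actual logic is browser JavaScript; this Python version is only a model of the behavior. The sentence-splitting regex, the 280-character chunk bound, and the `SpeechQueue` class are assumptions for illustration.

```python
import re
from collections import deque
from typing import Deque, List

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_speakable(text: str, max_chars: int = 280) -> List[str]:
    """Group whole sentences into chunks no longer than max_chars where possible."""
    chunks: List[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


class SpeechQueue:
    """Tracks pending chunks and the label the transport button should show."""

    def __init__(self) -> None:
        self.pending: Deque[str] = deque()
        self.speaking = False
        self.paused = False

    def enqueue(self, text: str) -> None:
        self.pending.extend(split_speakable(text))

    def start(self) -> None:
        self.paused = False
        self.speaking = bool(self.pending)

    def pause(self) -> None:
        self.paused, self.speaking = True, False

    def close(self) -> None:
        # Closing the assistant drops pending continuation entirely.
        self.pending.clear()
        self.paused = self.speaking = False

    def button_label(self) -> str:
        if self.speaking:
            return "Pause"
        if self.paused and self.pending:
            return "Resume"
        return "Play"
```

A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence, which trades strict size limits for more natural speech.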

## GUI Share Links

Files:

- `/home/david/random/bin/sol37_gui_share_server.py`
- `/home/david/random/www/assets/share-previews/`

Behavior:

- raw `/gui?i=...` links remain the canonical share shape
- metadata is injected server-side by the GUI share responder
- preview cards point at pre-rendered 1200×630 screenshots
- shared GUI links hide the floating assistant by default unless `assistant=1` is present
- the assistant now remembers whether it was manually closed on the desktop via local storage
- the assistant quick-prompt strip now keeps the first suggestion anchored and rerolls the second and third suggestions from a broader prompt pool each time the popup opens
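
Server-side metadata injection for a share link can be sketched as below. This is a minimal sketch, assuming Open Graph `<meta>` tags are what gets injected; the mapping from a link to its preview image is not documented here, so `preview_path` is passed in explicitly, and the function names are hypothetical.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

PUBLIC_ORIGIN = "https://sol.system42.one"


def assistant_visible(url: str) -> bool:
    """Shared links hide the assistant unless assistant=1 is present."""
    query = parse_qs(urlsplit(url).query)
    return query.get("assistant", [""])[-1] == "1"


def share_meta_tags(url: str, title: str, preview_path: str) -> str:
    """Build Open Graph tags for a /gui?i=... link (url is path plus query)."""
    tags = [
        ("og:title", title),
        ("og:url", PUBLIC_ORIGIN + url),
        ("og:image", PUBLIC_ORIGIN + preview_path),
        ("og:image:width", "1200"),
        ("og:image:height", "630"),
    ]
    return "\n".join(
        f'<meta property="{escape(prop)}" content="{escape(value)}">'
        for prop, value in tags
    )
```

Escaping every attribute value matters here because the `i=` parameter and page titles are effectively user-visible input reflected into HTML.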

## Generated State And Watchers

### Site Index

Files:

- `/home/david/random/bin/site_index_snapshot.py`
- `/home/david/random/www/site-index.json`
- `/home/david/random/www/knowledge-index.json`

Behavior:

- watches the public tree
- refreshes the site map catalog
- refreshes the knowledge/blog manifest
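
A file inventory of this kind can be produced with a plain directory walk. The sketch below is illustrative only: the JSON shape (`count`, `files`, `path`, `size`, `mtime`) and the dotfile exclusion are assumptions, not the schema that `site_index_snapshot.py` actually writes.

```python
import os
from pathlib import Path


def build_site_index(root: Path) -> dict:
    """Walk the public tree and return a sorted inventory of visible files."""
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place and keep traversal order stable.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if name.startswith("."):
                continue
            path = Path(dirpath) / name
            stat = path.stat()
            files.append({
                "path": "/" + path.relative_to(root).as_posix(),
                "size": stat.st_size,
                "mtime": int(stat.st_mtime),
            })
    return {"count": len(files), "files": files}
```

Sorting both directories and filenames keeps the output deterministic, so a regenerated index only differs when the tree itself has changed.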

### Traffic Metrics

Files:

- `/home/david/random/bin/site_metrics_snapshot.py`
- `/home/david/random/www/site-metrics.json`
- `/home/david/random/www/site-metrics.html`

Behavior:

- reads `/tmp/pkd_caddy_access.log`
- distinguishes raw host traffic from visible archive hits
- excludes internal polling noise from visible counters
- mirrors selected Home Assistant and local system telemetry
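
The split between raw host traffic and visible archive hits can be sketched as a log filter. This assumes the access log is in Caddy's JSON format with `request.host` and `request.uri` fields; the `INTERNAL_PREFIXES` list and the function name are hypothetical, and the real exclusion rules in `site_metrics_snapshot.py` may differ.

```python
import json
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical examples of polling noise to keep out of visible counters.
INTERNAL_PREFIXES = ("/site-metrics.json", "/api/")


def summarize_access_log(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (raw hits per host, visible hits per path)."""
    raw_by_host: Counter = Counter()
    visible_by_path: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(entry, dict):
            continue
        request = entry.get("request") or {}
        host = request.get("host", "")
        path = request.get("uri", "").split("?", 1)[0]
        raw_by_host[host] += 1
        if not path.startswith(INTERNAL_PREFIXES):
            visible_by_path[path] += 1
    return raw_by_host, visible_by_path
```

Counting every parsed request in the raw tally while filtering only the visible tally keeps both numbers explainable from the same log.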

### Media Watchers

Files:

- `/home/david/random/bin/video_playlist_watch.py`
- `/home/david/random/bin/audio_playlist_watch.py`

Behavior:

- rebuild playlist manifests
- generate browser-friendlier derivatives when needed
- keep the media player decoupled from manual playlist editing
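
Rebuilding a playlist manifest safely while the player may be reading it can be sketched with an atomic replace. The extension list, the `tracks`/`title`/`src` schema, and the function names are assumptions for illustration; the real watchers may write a different format.

```python
import json
import os
import tempfile
from pathlib import Path

AUDIO_EXTS = {".mp3", ".ogg", ".flac", ".wav", ".m4a"}  # hypothetical list


def build_playlist(media_dir: Path, exts=AUDIO_EXTS) -> dict:
    """List media files in name order as a simple manifest."""
    tracks = [
        {"title": p.stem, "src": p.name}
        for p in sorted(media_dir.iterdir())
        if p.is_file() and p.suffix.lower() in exts
    ]
    return {"tracks": tracks}


def write_manifest_atomic(manifest: dict, dest: Path) -> None:
    """Write to a temp file in the same directory, then rename over dest."""
    fd, tmp = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(manifest, fh, indent=2)
        os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600
        os.replace(tmp, dest)  # atomic on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Writing through a temporary file means a browser fetching `playlist.json` sees either the old manifest or the new one, never a half-written file.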

## Service Units

Relevant user services:

- `caddy-sol37.service`
- `sol-chat-api.service`
- `sol-chat-model.service`
- `local-dashboard-api.service`
- `dashboard-access-daemon.service`
- `knowledge-query-api.service`
- `sol37-gui-share.service`
- `site-index-watch.service`
- `video-playlist-watch.service`
- `audio-playlist-watch.service`
- `public-logbook-api.service`
- `public-logbook-irc-logger.service`

Relevant timers:

- `site-metrics-snapshot.timer`
- `synthetic-logbook-snapshot.timer`

Critical system service:

- `ngircd.service`

Boot behavior:

- user linger is enabled for `david`
- this is what makes the site behave like a server after reboot instead of waiting for an interactive login

## Content And Post Authoring

Posts in `www/posts/` are a mix of:

- Markdown-backed entries
- hand-authored HTML entries
- LaTeX-backed paper posts with local `.tex` and cached local `.pdf`

Current document pattern for thesis-style posts:

- HTML page for shell/browser reading
- local `.tex` source published beside it
- cached static `.pdf` published beside it
- optional local figure assets

That keeps downloads fast and cacheable while preserving the source artifact.

## Verification Commands

Useful operational checks:

```bash
git -C /home/david/random status --short
curl -I -s http://127.0.0.1:8888/
curl -s http://127.0.0.1:8895/api/chat/health
curl -s http://127.0.0.1:8896/api/dashboard/status | jq '.access,.camera'
curl -s http://127.0.0.1:8897/api/dashboard-access/status | jq
curl -s http://127.0.0.1:8892/health | jq
curl -s 'http://127.0.0.1:8890/messages?channel=public-logbook&limit=3'
systemctl --user --type=service --state=running | rg -i 'caddy|sol-chat|dashboard|knowledge|logbook|site|video|audio|sol37'
systemctl --user list-timers --all | rg 'site-metrics|synthetic'
loginctl show-user david -p Linger
ss -ltnp | rg '(:8888|:8890|:8892|:8893|:8894|:8895|:8896|:8897|:18080)'
python3 /home/david/random/bin/check_sol_chat_asset_versioning.py
```

Useful direct route checks:

```bash
curl -s http://127.0.0.1:8895/api/chat/health
curl -sD - -o /tmp/test.mp3 'http://127.0.0.1:8895/api/chat/speak?text=hello&session=test'
file /tmp/test.mp3
curl -I -s http://127.0.0.1:8888/chat
curl -I -s http://127.0.0.1:8888/dashboard
curl -I -s 'http://127.0.0.1:8888/gui?i=posts/the_place_without_where.html'
```

## Operational Risks

The site works well when treated as a hybrid system. It becomes fragile when described as “just static files.”

Important risks:

- `index.html` remains large and central
- runtime truth is split across git, local files, and long-running services
- media playback is still the least trustworthy subsystem because large assets and tunnel/range behavior amplify each other
- `/chat` freshness depends on keeping `no-store` semantics and versioned assets intact
- speech latency depends on both model/retrieval speed and the external TTS provider path

## Documentation Set

The current documentation set is:

- `/home/david/random/README.md`
  repo root orientation
- `/home/david/random/www/README.md`
  canonical operator-focused site/server write-up
- `/home/david/random/www/site-server.html`
  browser-readable system manual
- `/home/david/random/docs/sol37-site-audit.md`
  committed audit summary of the live stack

If these drift apart, treat `www/README.md` as the canonical in-repo reference and update the browser manual plus audit note to match it.