# Sol Chat Web

This document covers the production web chat exposed at `https://sol.system42.one/chat`, the same-origin backend route at `/api/chat`, and the replacement of the legacy `www/sol-chat.html` page.
## Recent Changes

Last 72 hours, condensed:

- the live reasoning backend was switched to the local `DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf` lane on `127.0.0.1:18080`
- the website now exposes a direct browser-openable and GPT Action-importable chat route through:
  - `GET /api/chat/query`
  - `GET /chat-openapi.json`
- both assistant surfaces now share the same backend service:
  - the dedicated `/chat` client
  - the floating desktop assistant in `www/index.html`
- the floating desktop assistant was hardened after page-summary failures:
  - streamed turns now fall back to non-stream JSON if the stream faults or yields no usable text
  - the backend now applies the non-empty-answer retry path to streamed generations too
- the floating desktop assistant now caches page-grounded replies by prompt plus page fingerprint, and caches generated read-aloud narration text by page fingerprint
- desktop page-grounded turns now default to one-shot JSON instead of SSE to avoid stream startup overhead on summary-style prompts
- the desktop assistant now exposes a real play/pause/resume transport and stops queued playback when the popup is closed
- the desktop assistant prompt strip now keeps the first suggestion anchored while rerolling the second and third suggestions from a broader context-aware pool on each popup open
- `/chat` now sends a real `page_context` payload built from the visible interface, including stack cards, debug metrics, diagnostics, and recent transcript context
- debug-mode retrieval metadata now persists across reloads via session history instead of disappearing after the original streamed turn
- direct query answers are model-generated on cache miss and reused from a file-backed cache on identical repeats
- direct query grounding is richer than before:
  - top retrieval snippets
  - current source-file contents for the top hit files when available
  - live site-state and sensor context from `site-metrics.json`
  - this was added specifically to improve low-signal turns such as greetings and diagnostics, so prompts like `hello?` can answer from local context instead of generic assistant chatter
- creative prompts still keep the same retrieval/source-file bundle in play; that context is meant to act as seed material rather than being bypassed
## Architecture

- UI:
  - canonical client: `/home/david/random/www/chat/index.html`
  - supporting assets: `/home/david/random/www/chat/chat.css`, `/home/david/random/www/chat/chat.js`
  - desktop shell assistant: `/home/david/random/www/index.html`
  - deprecated entry: `/home/david/random/www/sol-chat.html` now redirects to `/chat`
  - the public face now mirrors the desktop assistant language from `www/index.html`: retro dialog framing, Sol orb presence, voice controls, and on-page debug/metrics blocks
- origin routing:
  - Caddy serves `/chat` by rewriting to `/chat/index.html`
  - Caddy now sends `Cache-Control: no-store` for `/chat` and `/chat/` so mobile clients do not sit on stale CSS/JS after a page refresh
  - Caddy reverse-proxies `/api/chat` to `127.0.0.1:8895`
- backend:
  - daemon: `/home/david/random/bin/sol_chat_api.py`
  - runtime style: stdlib `ThreadingHTTPServer`, matching the small local daemons already used for knowledge/logbook/gui metadata
- persistence:
  - default root: `/home/david/.local/share/sol_chat_web`
  - per-session JSON files under `sessions/`
  - each session stores `created_at`, `updated_at`, `metadata`, and `messages[]`
- adjacent local pipeline docs: `/home/david/random/docs/sol-multimodal-pipeline.md` covers the direct webcam/bootstrap scripts, local model lanes, and current bring-up status outside the web-only surface
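The runtime style named above, a stdlib `ThreadingHTTPServer` daemon serving JSON, can be sketched as follows. This is a minimal illustration of the pattern only: the handler class, the `{"ok": True}` payload, and the port-0 binding are assumptions for the sketch, not `sol_chat_api.py`'s real code or schema.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

# Toy handler in the same style: one JSON route, everything else 404.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/chat/health":
            body = json.dumps({"ok": True}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the sketch quiet

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

with urlopen(f"http://127.0.0.1:{server.server_address[1]}/api/chat/health") as resp:
    health = json.load(resp)
server.shutdown()
print(health)
```

`ThreadingHTTPServer` gives each request its own thread, which is why this shape suits several small always-on local daemons without an async framework.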
## Request Surface

`POST /api/chat`
- request body: `message`, `session`, `stream`
- optional `images[]` with browser-supplied data URLs for multimodal turns
- optional `profile` to prefer `vision` or `vision_fast` when images are present
- response:
  - JSON when streaming is disabled
  - `text/event-stream` when streaming is enabled

`GET /api/chat/query?query=...`
- browser-openable direct JSON chat route
- stateless by default: if `session` is omitted and `persist` is not set, the turn is processed without being written into history
- message generation behavior:
  - this route forces model generation for `message` on normal cache misses instead of using the extractive fallback path
  - probe-style connectivity and diagnostic requests are allowed to use the deterministic live-site-state fallback so they stay concise and avoid overflowing the local 4096-token context window
- repeated identical direct-query requests reuse a file-backed cache
- response metadata includes `cache_hit`
- useful query params: `query` or `message`, `session`, `persist=1`, `profile`, `page_title`, `page_target`, `page_content_type`, repeated `page_heading`, repeated `page_question`, `page_content`, or `page_context` as a JSON-encoded object

`GET /api/chat/history?session=...`
- returns the stored session history

`POST /api/chat/reset`
- clears the session back to the system prompt

`GET /api/chat/health`
- lightweight readiness/config surface without exposing backend topology details

`GET /chat-openapi.json`
- public OpenAPI schema for importing the combined Sol Action surface as a GPT Action
- import URL: `https://sol.system42.one/chat-openapi.json`
- privacy policy URL: `https://sol.system42.one/privacy.html`
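As an illustration of the request surface above, here is a sketch that assembles requests for the two main routes. The helper names (`build_query_url`, `build_chat_body`) are invented for this sketch and are not part of `sol_chat_api.py`; only the route paths and parameter names come from the surface described above.

```python
import json
from urllib.parse import urlencode

BASE = "https://sol.system42.one"

def build_query_url(query, session=None, persist=False, **page_fields):
    # Browser-openable GET /api/chat/query URL from the documented params.
    params = {"query": query}
    if session:
        params["session"] = session
        if persist:
            params["persist"] = "1"  # opt in to history persistence
    params.update(page_fields)       # e.g. page_title=..., page_target=...
    return f"{BASE}/api/chat/query?{urlencode(params)}"

def build_chat_body(message, session, stream=False, images=None):
    # JSON body for POST /api/chat; images[] carries data URLs when present.
    body = {"message": message, "session": session, "stream": stream}
    if images:
        body["images"] = images
    return json.dumps(body)

print(build_query_url("what is Sol?"))
# https://sol.system42.one/api/chat/query?query=what+is+Sol%3F
```

Because the query route is plain GET with URL-encoded params, it stays pasteable into a browser bar, which is the point of the direct route.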
## Frontend Behavior

Both website assistant surfaces now share the same same-origin backend service, `sol_chat_api.py`, through `/api/chat`.

The dedicated `/chat` page is no longer just a thin wrapper over the API. It now carries the same Sol presence language as the desktop shell while keeping a public-facing layout:

- persisted `sol_session` in `localStorage`
- same-origin history load, send, and reset flows
- image attachment flow with local preview cards before send
- `/chat` now builds a local page-context payload for text turns from the visible UI itself:
  - stack summary
  - stack cards
  - debug metrics
  - client diagnostics
  - recent transcript tail
- Sol orb presence using `www/assets/hue-visualizer.js`
- voice controls:
  - voice arm/disarm
  - speak latest assistant message
  - stop playback
  - automatic speech pre-cache for the latest visible reply
- site-wide debug metrics block sourced from `site-metrics.json`
- per-message retrieval diagnostics when debug mode is enabled

The page is intentionally same-origin and local-first. The browser never talks directly to the model backend or any LAN-only llama endpoint. Image attachments are serialized in-browser and posted only to the same-origin Sol API.

The floating desktop assistant in `www/index.html` uses the same route family:

- `POST /api/chat`
- `GET /api/chat/history`
- `GET /api/chat/speak`

This keeps the public `/chat` surface and the shell assistant on the same local reasoning backend, retrieval path, speech cache, and session persistence format.

The clients no longer use the transport identically:

- `/chat` remains SSE-first for visible incremental transcript updates
- the floating desktop assistant now prefers non-stream JSON when a grounded `page_context` is present, because page-summary turns benefit more from lower startup overhead than from token-by-token UI updates
- the floating desktop assistant still uses SSE for non-page/freeform turns and retries without streaming if the streamed path fails or yields an empty final message
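The streamed-turn fallback described above reduces to a small control-flow rule: try the stream, and if it raises or produces no usable text, retry once on the non-stream path. A sketch, with both callables as stand-ins for real transports (the actual client logic lives in browser JS):

```python
def run_turn(stream_fn, json_fn):
    # stream_fn yields text chunks; json_fn returns a complete answer.
    try:
        text = "".join(stream_fn()).strip()
    except Exception:
        text = ""                  # stream faulted mid-flight
    if not text:                   # faulted, or yielded no usable text
        text = json_fn()           # one-shot JSON retry
    return text

def broken_stream():
    raise ConnectionError("stream faulted")

print(run_turn(broken_stream, lambda: "json fallback answer"))
# json fallback answer
```

Treating "empty final text" the same as a hard stream fault is what makes the retry cover both failure shapes named above.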
## Context Assembly

The site chat now treats context as a ranked bundle, not a single blob:

- system prompt: `/home/david/random/prompts/sol_chat_system_prompt.txt`
  - tells Sol to treat page context, retrieval, and history as evidence to synthesize from rather than text to parrot
- page context:
  - for `/chat`, built directly from the visible interface state
  - includes the page title, stack summary, debug metrics, diagnostics, and a bounded recent transcript
  - for the floating desktop assistant, page context is fingerprinted from target, title, content type, headings, and bounded content
  - that fingerprint now drives browser-side reuse of:
    - grounded reply text for repeated page questions
    - generated narration text for repeated read-aloud requests
- retrieval:
  - still queried from the knowledge API for text turns
  - used as supporting evidence when the question is broader than the current page
  - the chat backend can now also synthesize one supplemental knowledge query for the same turn and call `/api/knowledge/query` again with that generated query
  - the supplemental lookup is additive; it does not replace the default retrieval query
  - the response surface exposes this as `retrieval.supplemental`
  - the backend now also loads the current source file for top retrieval hits when possible, so the model sees both:
    - the quoted embedding hit
    - the current file contents behind that hit
  - for HTML sources, the backend now renders readable visible text from the current file before matching snippets or building excerpts
  - retrieval metadata records this with `current_file_representation=rendered_html_text`
  - if an indexed snippet reflects an older snapshot and the current file differs, the model is told that explicitly
  - if an indexed snippet is still present in the current readable file text, `snippet_found_in_current_file` is set to `true`
- live site state:
  - for text turns, the backend now injects a live block from `site-metrics.json` as additional context rather than reserving it for diagnostics only
  - this includes current traffic summary, top paths, recent requests, sensor readouts, and runtime service status when available
  - greetings and diagnostic pings use this block instead of defaulting to generic assistant chatter
  - concise diagnostic pings now prioritize this live site-state path and skip archive retrieval so they stay under the local 4096-token context ceiling
  - direct `queryChat` connectivity probes also use this path now, so GPT Action test invocations can return a short grounded confirmation instead of a long model answer or a 502 from context overflow
- history:
  - still persisted, but the live request budget now favors current page context and grounding evidence over older transcript bulk

Important runtime constraint:

- the live reasoning lane is currently running on a 4096-token context window because that is the hardware-safe configuration for the local `DeepSeek-R1-Distill-Qwen-7B` service on this GPU
- because of that, the backend now spends less of the request budget on stale history and more on active page context plus retrieval hits
## Backend Routing

The API no longer assumes a single always-on model lane:

- text-only turns default to the reasoning backend
- image turns route to the vision backend, with `vision_fast` available as the smaller fallback lane
- health now exposes stack inventory plus backend runtime state (`running`, `model_id`, `base_url`)
- vision backends are launched on demand by `sol_chat_api.py` via local `llama-server` and are reaped after an idle timeout

Incomplete downloads are not treated as installed models. The shared stack registry checks minimum file sizes so partially downloaded GGUF files are not advertised as ready.
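The minimum-size guard above amounts to a single check: a model file only counts as installed when it exists and meets a size floor. A sketch, where the 1 GiB default is illustrative and the registry's real per-model minimums are not shown:

```python
import os

def is_model_ready(path, min_bytes=1 * 1024**3):
    # A partially downloaded GGUF is smaller than any plausible full model,
    # so a size floor filters it out without parsing the file.
    try:
        return os.path.getsize(path) >= min_bytes
    except OSError:
        return False  # missing or unreadable: never advertise as ready

print(is_model_ready("/nonexistent/partial-download.gguf"))  # False
```

Folding the missing-file case into the same `False` branch keeps callers from treating "not downloaded yet" and "half downloaded" differently.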
## Grounding And Retrieval

Each user turn triggers a retrieval request against the public knowledge API:

- default source: `https://sol.system42.one/api/knowledge/query`
- default `top_k`: `3`
The backend formats retrieval into transient system context, then adds a grounding contract for the current turn. The model is told to:
- prioritize retrieved evidence for factual claims
- state uncertainty when the evidence is weak
- avoid inventing names, biographies, citations, and titles
Strict grounding mode is enabled by default. In strict mode, the backend falls back to an extractive answer when:
- retrieval quality is weak
- the query is clearly profile/explanation style and retrieval should dominate
- the model output introduces named entities not present in the user query or retrieved evidence
If retrieval fails outright, the turn continues without retrieval and the failure is logged instead of crashing the request path.
When page context is present and the question is clearly about the current page, stack, session, or debug metrics, retrieval is skipped and the answer is synthesized directly from page context instead.
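The third strict-mode trigger above, model output introducing named entities absent from the query and evidence, can be roughly illustrated with a capitalized-token comparison. This heuristic is an assumption for the sketch, not the daemon's actual detector, and it deliberately ignores edge cases like sentence-initial common words:

```python
import re

def novel_entities(answer, query, evidence):
    # Capitalized multi-letter tokens in the answer that appear in neither
    # the user query nor the retrieved evidence.
    word = r"\b[A-Z][a-z]+\b"
    seen = set(re.findall(word, f"{query} {evidence}"))
    return set(re.findall(word, answer)) - seen

print(novel_entities("Sol was built by Alice.",
                     "who built Sol?",
                     "Sol is a local assistant"))  # {'Alice'}
```

A non-empty result would be one signal for falling back to the extractive answer path.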
Direct-query grounding details:

- `/api/chat/query` still forces a model-generated `message` on cache miss for normal queries
- for probe-style requests about diagnostics, connectivity, or minimal status confirmation, the route now permits a deterministic live-site-state fallback instead
- the retrieval/debug payload for each turn can now include:
  - `source_documents`: current file contents or bounded excerpts for the top retrieved files
    - for HTML files, these contents are visible rendered text rather than raw markup
    - each item may include `current_file_representation` and `snippet_found_in_current_file`
  - `live_site_state`: traffic, sensor, recent-request, and runtime-service telemetry from `site-metrics.json` and the local dashboard stack
- this was added specifically so low-signal prompts like `hello?` and diagnostic pings stop collapsing into a single embedding chunk and instead answer from:
  - archive hit text
  - current source file state
  - current site metrics
- metrics are now additive context for ordinary text turns too; they are not stripped out just because retrieval or page context is present
- creative/story prompts are not excluded from this path; they can still pick up archive material and current file text as seed context
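The file-backed cache behavior above can be sketched as answers keyed by a hash of the normalized query text, with `cache_hit` reported back in the result. The on-disk layout and key normalization here are assumptions for the sketch, not `sol_chat_api.py`'s real format; only the `cache_hit` field name comes from the document:

```python
import hashlib
import json
import tempfile
from pathlib import Path

def cached_answer(cache_dir, query, generate):
    key = hashlib.sha256(query.strip().lower().encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return {"message": json.loads(path.read_text())["message"],
                "cache_hit": True}
    message = generate(query)             # model generation on cache miss
    path.write_text(json.dumps({"message": message}))
    return {"message": message, "cache_hit": False}

with tempfile.TemporaryDirectory() as d:
    first = cached_answer(d, "what is Sol really?", lambda q: "a local assistant")
    second = cached_answer(d, "what is Sol really?", lambda q: "regenerated")
print(first["cache_hit"], second["cache_hit"])
```

Because the key is derived only from the query text, identical repeats land on the same file across process restarts, which is what makes the cache survive daemon restarts.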
## Config

Environment variables:

- `SOL_CHAT_HOST`
- `SOL_CHAT_PORT`
- `SOL_CHAT_BACKEND_BASE_URL`
- `SOL_CHAT_MODEL`
- `SOL_CHAT_VISION_BACKEND_BASE_URL`
- `SOL_CHAT_VISION_FAST_BACKEND_BASE_URL`
- `SOL_CHAT_TIMEOUT`
- `SOL_CHAT_STREAM`
- `SOL_CHAT_HISTORY_DIR`
- `SOL_CHAT_SYSTEM_PROMPT_FILE`
- `SOL_CHAT_ENABLE_STRICT_GROUNDING`
- `SOL_CHAT_HISTORY_WINDOW_CHARS`
- `SOL_CHAT_MAX_HISTORY_MESSAGES`
- `SOL_CHAT_KNOWLEDGE_URL`
- `SOL_CHAT_KNOWLEDGE_TOP_K`
- `SOL_CHAT_KNOWLEDGE_TIMEOUT`
- `SOL_CHAT_TEMPERATURE`
- `SOL_CHAT_TOP_P`
- `SOL_CHAT_MAX_TOKENS`
- `SOL_CHAT_VISION_IDLE_TIMEOUT`

Editable prompt file: `/home/david/random/prompts/sol_chat_system_prompt.txt`

Default backend assumption:

- the reasoning lane is commonly kept warm on `SOL_CHAT_BACKEND_BASE_URL`
- the web API can also start local vision backends itself when image turns arrive

Voice/TTS behavior:

- `/api/chat/speak` is served by the same backend daemon
- synthesized audio is cached on disk under `/home/david/.local/share/sol_chat_web/tts_cache`
- repeated identical speech requests reuse cached MP3 output across sessions and refreshes
- because the floating desktop assistant now reuses cached narration text for unchanged pages, repeated page read-aloud runs also tend to hit the same server-side MP3 cache entry
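Why identical narration text lands on one cached MP3 can be shown with the usual key derivation: hash the exact speech text into a cache filename, so the same text maps to the same file across sessions. This is a sketch of the general pattern; whether the real key also folds in voice parameters is not documented here:

```python
import hashlib
from pathlib import Path

def tts_cache_path(text, root="/home/david/.local/share/sol_chat_web/tts_cache"):
    # Same text -> same digest -> same cached MP3 file.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return Path(root) / f"{digest}.mp3"

print(tts_cache_path("Welcome back.") == tts_cache_path("Welcome back."))  # True
```

This is also why the desktop assistant's narration-text reuse compounds with the server cache: an unchanged page produces byte-identical narration text, which produces the same cache path.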
Desktop assistant playback behavior:

- the floating assistant now has an explicit transport button whose label tracks state:
  - `Pause` while audio is playing
  - `Resume` when paused with buffered playback available
  - `Play` when no buffered playback is active
- closing the popup performs a hard playback stop and clears pending continuation so hidden playback does not resume later
- quick prompt suggestions are now partially dynamic:
  - the first suggestion remains anchored
  - the second and third are regenerated from the current page or fallback prompt pool each time the popup is reopened

Asset freshness behavior:

- `www/chat/index.html` now references versioned asset URLs for `chat.css`, `chat.js`, and `hue-visualizer.js`
- Caddy also marks `/chat` and `/chat/` as `no-store`
- this combination was added after mobile clients kept reusing stale public JS/CSS while the `/chat` HTML itself had already updated
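One common way to produce versioned asset URLs like those above is to append a short content hash as a query token, so any edit to `chat.js` or `chat.css` changes the URL and bypasses stale caches. Whether the real build uses a content hash or some other version token is an assumption here; this sketch only demonstrates the cache-busting mechanism:

```python
import hashlib
import os
import tempfile

def versioned_url(asset_path):
    # Short content digest as the version token: same bytes, same URL.
    with open(asset_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()[:8]
    return f"{os.path.basename(asset_path)}?v={digest}"

d = tempfile.mkdtemp()
p = os.path.join(d, "chat.js")
with open(p, "wb") as f:
    f.write(b"console.log(1);")
url_before = versioned_url(p)
with open(p, "wb") as f:
    f.write(b"console.log(2);")   # any edit changes the hash, and the URL
url_after = versioned_url(p)
print(url_before != url_after)  # True
```

Pairing this with `no-store` on the HTML covers both halves of the staleness problem: the HTML is always refetched, and the HTML always points at the current asset bytes.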
## Deployment

- Start the backend service:

```bash
python3 /home/david/random/bin/sol_chat_api.py
```

- Ensure Caddy is using `/home/david/random/bin/Caddyfile.pkd_share`, which now includes:
  - a `/chat` rewrite to `/chat/index.html`
  - `Cache-Control: no-store` on `/chat` and `/chat/`
  - an `/api/chat` reverse proxy to `127.0.0.1:8895`
- For persistent boot behavior, add a user service similar to the existing site daemons:

```ini
[Unit]
Description=Sol chat web API
After=network-online.target

[Service]
ExecStart=/usr/bin/python3 /home/david/random/bin/sol_chat_api.py
Restart=always
RestartSec=2

[Install]
WantedBy=default.target
```

Suggested unit path: `/home/david/.config/systemd/user/sol-chat-api.service`

The model backend is commonly paired with `/home/david/.config/systemd/user/sol-chat-model.service`.
After installing or changing the unit:

```bash
systemctl --user daemon-reload
systemctl --user enable --now sol-chat-api.service
systemctl --user status --no-pager sol-chat-api.service
systemctl --user status --no-pager sol-chat-model.service
```
## Logging And Verification
The backend logs structured JSON events to stdout/journal with:
- request start/completion
- session id
- retrieval success or failure
- model latency
- fallback-grounding usage
Debug persistence:

- per-assistant-turn retrieval metadata is now stored in session metadata and returned by `GET /api/chat/history`
- the `/chat` frontend restores those stored traces when debug mode is enabled after a reload
- this fixes the earlier behavior where debug detail existed only on the live DOM node during the original response stream
Prompt/response checks used during tuning:

```text
Prompt: "What page is open?"
Result: page-context answer from /chat UI state, retrieval skipped, stack summary + metrics included.

Prompt: "Which local models are active right now?"
Result: page-context answer naming Qwen3-VL, Gemma small, and DeepSeek, with the active reasoning backend file.

Prompt: "What do the debug metrics show?"
Result: page-context answer summarizing visible metrics, transport state, grounding state, backend profile, and readouts.
```
Direct browser-query examples:

```text
/api/chat/query?query=what%20is%20Sol%3F
/api/chat/query?query=What%20page%20is%20open%3F&page_title=Sol%20Chat&page_target=%2Fchat&page_content_type=chat_ui&page_heading=Sol%20%2F%20Chat&page_content=status%3A%20idle...
```
GPT Action import details:

```text
Schema URL: https://sol.system42.one/chat-openapi.json
Privacy URL: https://sol.system42.one/privacy.html
Available actions: queryChat, chatHealth, queryKnowledge, knowledgeHealth
```
Cache behavior check used during tuning:

```text
1. GET /api/chat/query?query=what%20is%20Sol%20really%3F
   -> model-generated answer, cache_hit: false
2. repeat same URL
   -> same answer returned from cache, cache_hit: true
```
Regression/contract checks:

```bash
python3 /home/david/random/bin/check_sol_chat_api_contract.py
python3 /home/david/random/bin/check_sol_chat_asset_versioning.py
python3 /home/david/random/bin/check_sol_chat_tts_cache.py
```

The contract check starts fake knowledge and model backends, boots `sol_chat_api.py` against them, and verifies:

- `POST /api/chat`
- streaming SSE output
- history persistence
- reset behavior
- same-origin route shape compatibility

The asset versioning check verifies that `/chat` references cache-busted JS/CSS URLs. The TTS check verifies that speech caching is still active and writable.
## Legacy Replacement

The old `www/sol-chat.html` page was retired for three reasons:

- it depended on jQuery for a trivial interaction
- it hardcoded a LAN target instead of using same-origin routing
- it framed Sol as a placeholder journaling companion rather than a production chat surface

The file now exists only as a redirect to `/chat`, so old links still land on the current interface without preserving the old copy or behavior.