Sol project summary for the last seventy-two hours.

First, the local model stack was reorganized into explicit lanes instead of a single opaque backend. The live reasoning lane is now DeepSeek R1 Distill 7B on the local llama.cpp Vulkan runtime. The fast vision lane is Gemma 3 4B with its projector, and the primary vision lane is Qwen 3 VL 8B, with its projector files staged in the local model registry.

Second, the website assistant architecture was consolidated. The dedicated /chat client and the floating desktop assistant in the main site shell now share a single same-origin backend service at /api/chat, along with the same local reasoning backend, the same retrieval path, and the same speech cache behavior.

Third, the public chat surface was expanded. The site now exposes a direct, browser-openable JSON chat route at /api/chat/query and a GPT-Action-importable schema at /chat-openapi.json. The combined schema now exposes query chat, chat health, query knowledge, and knowledge health in a single Action import. The shared privacy policy was updated so that both the knowledge action and the chat action describe the current public API behavior correctly.

Fourth, chat frontend and backend context handling were substantially improved. The /chat interface now sends structured page context that includes stack cards, debug metrics, diagnostics, and recent transcript context. Retrieval debug metadata now persists across reloads. The direct chat route now returns model-generated answers on cache miss and reuses a file-backed cache on identical repeated queries.

Fifth, grounding for direct text queries is richer than before. The backend now combines ranked embedding hits with the current source-file contents of the top matching files when available, and with compact live site state from site-metrics.json.
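A minimal sketch of that grounding assembly, under stated assumptions: the `EmbeddingHit` shape, the `buildGroundingContext` helper, the snippet size limits, and the site-metrics.json location are all illustrative, not the actual implementation.

```typescript
import { readFileSync, existsSync } from "node:fs";

// Hypothetical shape for a ranked hit coming out of the embedding index.
interface EmbeddingHit {
  file: string;
  score: number;
  snippet: string;
}

// Assemble the grounding context for a direct text query:
// archived snippets, then current on-disk contents, then live site state.
function buildGroundingContext(hits: EmbeddingHit[], topN = 3): string {
  const parts: string[] = [];

  // 1. Ranked archived snippets from the embedding index.
  for (const hit of hits) {
    parts.push(`[archived ${hit.file} score=${hit.score.toFixed(2)}]\n${hit.snippet}`);
  }

  // 2. Current on-disk contents for the top matching files, when still present.
  for (const hit of hits.slice(0, topN)) {
    if (existsSync(hit.file)) {
      parts.push(`[current ${hit.file}]\n${readFileSync(hit.file, "utf8").slice(0, 2000)}`);
    }
  }

  // 3. Compact live site state (path is an assumption for this sketch).
  if (existsSync("site-metrics.json")) {
    parts.push(`[live metrics]\n${readFileSync("site-metrics.json", "utf8")}`);
  }

  return parts.join("\n\n");
}
```

Tagging each part as archived, current, or live metrics makes the provenance of every piece of evidence explicit in the prompt.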
That means the model can distinguish archived snippets from the current file on disk, and it can answer greetings and diagnostics using current traffic and sensor context instead of generic assistant chatter.

Sixth, the low-signal greeting and diagnostic behavior was corrected. A greeting like "hello" now grounds against the archived "hello friend" motif when it is actually present in the retrieval evidence, while also referencing current local metrics. Diagnostic requests now return system-status-style answers instead of drifting into unrelated archive narratives.

Seventh, the floating desktop assistant inside the main shell was materially hardened. Page-summary turns now default to one-shot JSON instead of incurring SSE startup cost, streamed turns retry without streaming if they fail or come back empty, and the streamed backend path now applies the same non-empty-answer retry guard as the non-stream route.

Eighth, the desktop assistant now reuses work aggressively. It fingerprints page context from the current target, title, content type, headings, and bounded content. Grounded page replies are cached in the browser by prompt plus page fingerprint, generated narration text is cached by page fingerprint, and the existing server-side TTS cache continues to reuse MP3 output keyed by the exact spoken text. In practice, repeated summaries and repeated read-aloud runs now avoid regeneration unless the underlying page content changes.

Ninth, desktop playback controls were cleaned up. The assistant now has a real transport button with state-aware Play, Pause, and Resume labeling. Closing the popup performs a hard playback stop, clears any queued continuation, and prevents the old behavior where playback could appear to resume after the assistant had visibly exited.

Tenth, the assistant popup prompt strip is no longer effectively static.
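One way such a strip can be made dynamic is to keep the first suggestion fixed and resample the rest from a page-aware pool each time the popup opens. The sketch below assumes a hypothetical `buildPromptStrip` helper and a flat string pool; the real suggestion source may be structured differently.

```typescript
// Keep the anchored prompt first; pick two distinct suggestions at random
// from the page-aware pool each time the popup opens.
function buildPromptStrip(anchored: string, pool: string[]): string[] {
  // Drop the anchored prompt from the pool so it is never duplicated.
  const candidates = pool.filter((p) => p !== anchored);

  // Partial Fisher-Yates shuffle: move two random picks to the front.
  for (let i = 0; i < Math.min(2, candidates.length); i++) {
    const j = i + Math.floor(Math.random() * (candidates.length - i));
    [candidates[i], candidates[j]] = [candidates[j], candidates[i]];
  }

  return [anchored, ...candidates.slice(0, 2)];
}
```

Because the pool is resampled on every open rather than once at load, the quick actions track the current page instead of repeating the same two entries.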
The first prompt remains anchored, but the second and third prompts are now regenerated from a broader page-aware pool each time the popup opens, which keeps the quick actions more relevant and less repetitive.

Eleventh, the local multimodal operator path was documented and scaffolded. The earlier webcam utility was preserved as the webcam-feed command, with live preview, info output, and snapshots. The bootstrap scripts run-vision, run-reasoning, run-pipeline, and loop-vision were documented as the direct local path for webcam-to-vision-to-reasoning workflows outside the web UI.

Finally, the documentation set was brought back into line with the live stack. The repo README, the Sol chat web documentation, the site server README, the committed site audit, and the browser-facing server manual now reflect the mixed desktop assistant transport, the page and narration cache behavior, the explicit playback controls, the dynamic prompt suggestions, and the current local DeepSeek reasoning service.