ss-tools/specs/017-llm-analysis-plugin/research.md

# Research: LLM Analysis & Documentation Plugins

**Feature**: `017-llm-analysis-plugin`

## 1. LLM Provider Integration

**Decision**: Use a unified `LLMProviderService` that abstracts OpenAI-compatible APIs.
**Rationale**: OpenRouter, Kilo, and OpenAI all support the standard OpenAI API format. This simplifies implementation by using the `openai` Python SDK and changing the `base_url` and `api_key` dynamically based on configuration.
**Alternatives Considered**: LangChain (too heavy/complex for this specific scope), custom HTTP requests (reinventing the wheel).

## 2. Dashboard Screenshot Capture

**Decision**: Implement a `ScreenshotService` with a strategy pattern supporting `Playwright` (primary) and `Superset API` (fallback).
**Rationale**:
- **Playwright**: Provides the most accurate "user-view" render, handling JS-heavy charts that API thumbnails might miss or render poorly. Requires a browser binary.
- **Superset API**: Faster, lightweight, but relies on Superset's internal thumbnail cache which can be stale.
**Implementation Detail**: The service will check configuration. If 'Headless' is selected, it launches a Playwright context, logs in (using a service account or session cookie), navigates to the dashboard, waits for network idle, and captures.

## 3. Multimodal Analysis Prompting

**Decision**: Use a structured prompt template that accepts base64-encoded images and text logs.
**Rationale**: Models like GPT-4o and Claude 3.5 Sonnet (via OpenRouter) support this natively.
**Prompt Structure**:
- System: "You are a Data Observability Expert..."
- User Image: [Base64 Screenshot]
- User Text: "Recent Logs: \n[Log Snippets]..."
- Output Format: JSON (Status, Issues[], Recommendations[])

## 4. Documentation Persistence

**Decision**: Update `Dataset` and `Column` models in the existing metadata database (likely `mappings.db` or the main application DB if integrated).
**Rationale**: Keeps documentation co-located with the assets.
**Mechanism**: The `DocumentationPlugin` will fetch schema, generate markdown, and execute an `UPDATE` operation on the relevant tables/fields.

## 5. Git Commit Integration

**Decision**: Add a REST endpoint `/api/git/generate-message` used by the frontend Git component.
**Rationale**: Keeps the heavy lifting (LLM call, diff processing) on the backend. The frontend simply sends the list of staged files and a diff summary (truncated if necessary).

## 6. Security & Storage

**Decision**: Encrypt API keys at rest using the existing `Fernet` or similar encryption mechanism used for database credentials.
**Rationale**: API keys are sensitive. They should not be stored in plain text in `config.json` or the DB.

## 7. Retry Logic

**Decision**: Use `tenacity` library for decorators on LLM service methods.
**Rationale**: Standard, robust, declarative retry logic (exponential backoff) as required by FR-018.