# Research: LLM Analysis & Documentation Plugins

**Feature**: `017-llm-analysis-plugin`

## 1. LLM Provider Integration

**Decision**: Use a unified `LLMProviderService` that abstracts OpenAI-compatible APIs.

**Rationale**: OpenRouter, Kilo, and OpenAI all support the standard OpenAI API format. This simplifies implementation by using the `openai` Python SDK and changing the `base_url` and `api_key` dynamically based on configuration.

**Alternatives Considered**: LangChain (too heavy/complex for this specific scope), custom HTTP requests (reinventing the wheel).

## 2. Dashboard Screenshot Capture

**Decision**: Implement a `ScreenshotService` with a strategy pattern supporting `Playwright` (primary) and `Superset API` (fallback).

**Rationale**:

- **Playwright**: Provides the most accurate "user-view" render, handling JS-heavy charts that API thumbnails might miss or render poorly. Requires a browser binary.
- **Superset API**: Faster, lightweight, but relies on Superset's internal thumbnail cache which can be stale.

**Implementation Detail**: The service will check configuration. If 'Headless' is selected, it launches a Playwright context, logs in (using a service account or session cookie), navigates to the dashboard, waits for network idle, and captures.

## 3. Multimodal Analysis Prompting

**Decision**: Use a structured prompt template that accepts base64-encoded images and text logs.

**Rationale**: Models like GPT-4o and Claude 3.5 Sonnet (via OpenRouter) support this natively.

**Prompt Structure**:

- System: "You are a Data Observability Expert..."
- User Image: [Base64 Screenshot]
- User Text: "Recent Logs: \n[Log Snippets]..."
- Output Format: JSON (Status, Issues[], Recommendations[])

## 4. Documentation Persistence

**Decision**: Update `Dataset` and `Column` models in the existing metadata database (likely `mappings.db` or the main application DB if integrated).

**Rationale**: Keeps documentation co-located with the assets.

**Mechanism**: The `DocumentationPlugin` will fetch the schema, generate markdown, and execute an `UPDATE` operation on the relevant tables/fields.

## 5. Git Commit Integration

**Decision**: Add a REST endpoint `/api/git/generate-message` used by the frontend Git component.

**Rationale**: Keeps the heavy lifting (LLM call, diff processing) on the backend. The frontend simply sends the list of staged files and a diff summary (truncated if necessary).

## 6. Security & Storage

**Decision**: Encrypt API keys at rest using the existing `Fernet` or similar encryption mechanism used for database credentials.

**Rationale**: API keys are sensitive. They should not be stored in plain text in `config.json` or the DB.

## 7. Retry Logic

**Decision**: Use the `tenacity` library's decorators on LLM service methods.

**Rationale**: Standard, robust, declarative retry logic (exponential backoff) as required by FR-018.
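
To make the intended retry behaviour concrete, here is a dependency-free sketch of exponential backoff, the policy that the `tenacity` decorators would express declaratively. It deliberately does not use `tenacity`'s API. The names `retry_with_backoff` and `TransientLLMError`, and all parameter values, are hypothetical and chosen only for illustration; the actual limits required by FR-018 are not stated in this document.

```python
import functools
import time


class TransientLLMError(Exception):
    """Hypothetical marker for retryable failures (timeouts, rate limits)."""


def retry_with_backoff(max_attempts=3, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Retry the wrapped call on TransientLLMError with exponential backoff.

    All defaults are assumed values, not taken from FR-018.
    `sleep` is injectable so the schedule can be observed without waiting.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except TransientLLMError:
                    if attempt == max_attempts:
                        raise  # Out of attempts: surface the last error.
                    # Delay doubles each attempt (base, 2*base, 4*base, ...), capped.
                    sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
        return wrapper
    return decorator


# Usage: a stand-in for an LLM call that fails twice, then succeeds.
delays = []
calls = {"count": 0}


@retry_with_backoff(max_attempts=4, base_delay=0.5, sleep=delays.append)
def flaky_completion():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientLLMError("simulated rate limit")
    return "ok"


result = flaky_completion()
```

With the injected `sleep`, the example records delays of 0.5 and 1.0 seconds before the third attempt succeeds. Only errors classified as transient are retried, so configuration mistakes such as an invalid API key would fail immediately rather than being retried.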