48 lines
2.8 KiB
Markdown
48 lines
2.8 KiB
Markdown
# Research: LLM Analysis & Documentation Plugins
|
|
|
|
**Feature**: `017-llm-analysis-plugin`
|
|
|
|
## 1. LLM Provider Integration
|
|
|
|
**Decision**: Use a unified `LLMProviderService` that abstracts OpenAI-compatible APIs.
|
|
**Rationale**: OpenRouter, Kilo, and OpenAI all support the standard OpenAI API format. This simplifies implementation by using the `openai` Python SDK and changing the `base_url` and `api_key` dynamically based on configuration.
|
|
**Alternatives Considered**: LangChain (too heavy/complex for this specific scope), custom HTTP requests (reinventing the wheel).
|
|
|
|
## 2. Dashboard Screenshot Capture
|
|
|
|
**Decision**: Implement a `ScreenshotService` with a strategy pattern supporting `Playwright` (primary) and `Superset API` (fallback).
|
|
**Rationale**:
|
|
- **Playwright**: Provides the most accurate "user-view" render, handling JS-heavy charts that API thumbnails might miss or render poorly. Requires a browser binary.
|
|
- **Superset API**: Faster, lightweight, but relies on Superset's internal thumbnail cache which can be stale.
|
|
**Implementation Detail**: The service will check configuration. If 'Headless' is selected, it launches a Playwright context, logs in (using a service account or session cookie), navigates to the dashboard, waits for network idle, and captures.
|
|
|
|
## 3. Multimodal Analysis Prompting
|
|
|
|
**Decision**: Use a structured prompt template that accepts base64-encoded images and text logs.
|
|
**Rationale**: Models like GPT-4o and Claude 3.5 Sonnet (via OpenRouter) support this natively.
|
|
**Prompt Structure**:
|
|
- System: "You are a Data Observability Expert..."
|
|
- User Image: [Base64 Screenshot]
|
|
- User Text: "Recent Logs: \n[Log Snippets]..."
|
|
- Output Format: JSON (Status, Issues[], Recommendations[])
|
|
|
|
## 4. Documentation Persistence
|
|
|
|
**Decision**: Update `Dataset` and `Column` models in the existing metadata database (likely `mappings.db` or the main application DB if integrated).
|
|
**Rationale**: Keeps documentation co-located with the assets.
|
|
**Mechanism**: The `DocumentationPlugin` will fetch schema, generate markdown, and execute an `UPDATE` operation on the relevant tables/fields.
|
|
|
|
## 5. Git Commit Integration
|
|
|
|
**Decision**: Add a REST endpoint `/api/git/generate-message` used by the frontend Git component.
|
|
**Rationale**: Keeps the heavy lifting (LLM call, diff processing) on the backend. The frontend simply sends the list of staged files and a diff summary (truncated if necessary).
|
|
|
|
## 6. Security & Storage
|
|
|
|
**Decision**: Encrypt API keys at rest using the existing `Fernet` or similar encryption mechanism used for database credentials.
|
|
**Rationale**: API keys are sensitive. They should not be stored in plain text in `config.json` or the DB.
|
|
|
|
## 7. Retry Logic
|
|
|
|
**Decision**: Use `tenacity` library for decorators on LLM service methods.
|
|
**Rationale**: Standard, robust, declarative retry logic (exponential backoff) as required by FR-018. |