2.8 KiB
Research: LLM Analysis & Documentation Plugins
Feature: 017-llm-analysis-plugin
1. LLM Provider Integration
Decision: Use a unified LLMProviderService that abstracts OpenAI-compatible APIs.
Rationale: OpenRouter, Kilo, and OpenAI all support the standard OpenAI API format. This simplifies implementation by using the openai Python SDK and changing the base_url and api_key dynamically based on configuration.
Alternatives Considered: LangChain (too heavy/complex for this specific scope), custom HTTP requests (reinventing the wheel).
2. Dashboard Screenshot Capture
Decision: Implement a ScreenshotService with a strategy pattern supporting Playwright (primary) and Superset API (fallback).
Rationale:
- Playwright: Provides the most accurate "user-view" render, handling JS-heavy charts that API thumbnails might miss or render poorly. Requires a browser binary.
- Superset API: Faster, lightweight, but relies on Superset's internal thumbnail cache which can be stale. Implementation Detail: The service will check configuration. If 'Headless' is selected, it launches a Playwright context, logs in (using a service account or session cookie), navigates to the dashboard, waits for network idle, and captures.
3. Multimodal Analysis Prompting
Decision: Use a structured prompt template that accepts base64-encoded images and text logs. Rationale: Models like GPT-4o and Claude 3.5 Sonnet (via OpenRouter) support this natively. Prompt Structure:
- System: "You are a Data Observability Expert..."
- User Image: [Base64 Screenshot]
- User Text: "Recent Logs: \n[Log Snippets]..."
- Output Format: JSON (Status, Issues[], Recommendations[])
4. Documentation Persistence
Decision: Update Dataset and Column models in the existing metadata database (likely mappings.db or the main application DB if integrated).
Rationale: Keeps documentation co-located with the assets.
Mechanism: The DocumentationPlugin will fetch schema, generate markdown, and execute an UPDATE operation on the relevant tables/fields.
5. Git Commit Integration
Decision: Add a REST endpoint /api/git/generate-message used by the frontend Git component.
Rationale: Keeps the heavy lifting (LLM call, diff processing) on the backend. The frontend simply sends the list of staged files and a diff summary (truncated if necessary).
6. Security & Storage
Decision: Encrypt API keys at rest using the existing Fernet or similar encryption mechanism used for database credentials.
Rationale: API keys are sensitive. They should not be stored in plain text in config.json or the DB.
7. Retry Logic
Decision: Use tenacity library for decorators on LLM service methods.
Rationale: Standard, robust, declarative retry logic (exponential backoff) as required by FR-018.