Files
2026-01-28 18:30:23 +03:00

2.8 KiB

Research: LLM Analysis & Documentation Plugins

Feature: 017-llm-analysis-plugin

1. LLM Provider Integration

Decision: Use a unified LLMProviderService that abstracts OpenAI-compatible APIs. Rationale: OpenRouter, Kilo, and OpenAI all support the standard OpenAI API format. This simplifies implementation by using the openai Python SDK and changing the base_url and api_key dynamically based on configuration. Alternatives Considered: LangChain (too heavy/complex for this specific scope), custom HTTP requests (reinventing the wheel).

2. Dashboard Screenshot Capture

Decision: Implement a ScreenshotService with a strategy pattern supporting Playwright (primary) and Superset API (fallback). Rationale:

  • Playwright: Provides the most accurate "user-view" render, handling JS-heavy charts that API thumbnails might miss or render poorly. Requires a browser binary.
  • Superset API: Faster, lightweight, but relies on Superset's internal thumbnail cache which can be stale. Implementation Detail: The service will check configuration. If 'Headless' is selected, it launches a Playwright context, logs in (using a service account or session cookie), navigates to the dashboard, waits for network idle, and captures.

3. Multimodal Analysis Prompting

Decision: Use a structured prompt template that accepts base64-encoded images and text logs. Rationale: Models like GPT-4o and Claude 3.5 Sonnet (via OpenRouter) support this natively. Prompt Structure:

  • System: "You are a Data Observability Expert..."
  • User Image: [Base64 Screenshot]
  • User Text: "Recent Logs: \n[Log Snippets]..."
  • Output Format: JSON (Status, Issues[], Recommendations[])

4. Documentation Persistence

Decision: Update Dataset and Column models in the existing metadata database (likely mappings.db or the main application DB if integrated). Rationale: Keeps documentation co-located with the assets. Mechanism: The DocumentationPlugin will fetch schema, generate markdown, and execute an UPDATE operation on the relevant tables/fields.

5. Git Commit Integration

Decision: Add a REST endpoint /api/git/generate-message used by the frontend Git component. Rationale: Keeps the heavy lifting (LLM call, diff processing) on the backend. The frontend simply sends the list of staged files and a diff summary (truncated if necessary).

6. Security & Storage

Decision: Encrypt API keys at rest using the existing Fernet or similar encryption mechanism used for database credentials. Rationale: API keys are sensitive. They should not be stored in plain text in config.json or the DB.

7. Retry Logic

Decision: Use tenacity library for decorators on LLM service methods. Rationale: Standard, robust, declarative retry logic (exponential backoff) as required by FR-018.