ss-tools/specs/017-llm-analysis-plugin/spec.md
busya 7de96c17c4 feat(llm-plugin): switch to environment API for log retrieval
- Replace local backend.log reading with Superset API /log/ fetch
- Update DashboardValidationPlugin to use SupersetClient
- Filter logs by dashboard_id and last 24 hours
- Update spec FR-006 to reflect API usage
2026-02-06 17:57:25 +03:00


Feature Specification: LLM Analysis & Documentation Plugins

Feature Branch: 017-llm-analysis-plugin
Created: 2026-01-28
Status: Draft
Input: User description (translated from Russian): "LLM Dashboard Validation Plugin for integrating LLMs into ss-tools. The plugin must support validating dashboard correctness via a multimodal LLM (screenshot + logs). A second plugin must handle documentation of datasets and dashboards using an LLM. Commit message generation may optionally be added to the Git plugin. Provider support: OpenRouter, Kilo Provider, OpenAI API. Integration with the existing PluginBase architecture, Task Manager, and WebSocket logs."

Clarifications

Session 2026-01-28

  • Q: Notification Content Strategy → A: Summary with Link: Include status (Pass/Fail), key issues count, and a direct link to the full report in the UI.
  • Q: Dashboard Screenshot Source → A: Hybrid (Configurable): Support both Headless Browser (accurate) and API/Thumbnail (fast) methods, allowing admin configuration.
  • Q: Dataset Documentation Output Format → A: Direct Object Update: Update the description fields of the dataset and its columns directly within the dataset object (persisted to backend/metadata).
  • Q: Git Commit Message Context → A: Diff + Recent History: Send diff, file names, and the last 3 commit messages to match style.
  • Q: LLM Failure Handling → A: Retry then Fail: Automatically retry 3 times with exponential backoff before failing.
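The "Retry then Fail" decision above (also captured later as FR-018) can be sketched roughly as follows. This is an illustrative sketch, not the actual ss-tools implementation; the function name and the injectable `sleep` parameter are assumptions made for clarity and testability:

```python
import time

def call_with_retry(fn, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fn(); on failure, wait 2s, 4s, 8s between retries, then re-raise.

    One initial attempt plus up to `retries` retries, matching the
    "retry 3 times with exponential backoff" policy.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:  # in practice, catch provider-specific errors only
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s
```

Injecting `sleep` keeps the backoff schedule unit-testable without real delays.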

User Scenarios & Testing (mandatory)

User Story 1 - Dashboard Health Analysis (Priority: P1)

As a Data Engineer, I want to automatically analyze a dashboard's status using visual and log data directly from the Environments interface so that I can identify rendering issues or data errors without manual inspection.

Why this priority: Core value proposition of the feature. Enables automated quality assurance.

Independent Test: Can be tested by selecting a dashboard in the Environment list and clicking "Validate", or by scheduling a validation task.

Acceptance Scenarios:

  1. Given I am on the Environments page (Dashboard list), When I select a dashboard and click "Validate", Then the system triggers a validation task with the dashboard's context.
  2. Given the validation task is running, When it completes, Then I see the analysis report (visual + logs) in the task output history.
  3. Given I want regular checks, When I configure a schedule for the validation task (similar to Backup plugin), Then the system runs the check automatically at the specified interval.
  4. Given a validation issue is found, When the task completes, Then the system sends a notification (Email/Pulse) if configured.

User Story 2 - Automated Dataset Documentation (Priority: P1)

As a Data Steward, I want to generate documentation for datasets and dashboards using LLMs so that I can maintain up-to-date metadata with minimal manual effort.

Why this priority: Significantly reduces maintenance overhead for data governance.

Independent Test: Can be tested by selecting a dataset and triggering the documentation task, then checking if a description is generated.

Acceptance Scenarios:

  1. Given a dataset identifier, When I run the Documentation task, Then the system fetches the dataset's schema and metadata.
  2. Given the metadata is fetched, When sent to the LLM, Then a structured description/documentation text is returned.
  3. Given the documentation is generated, When the task completes, Then the result is available for review (e.g., in the task log or saved to a file/db).

User Story 3 - LLM Provider Configuration (Priority: P1)

As an Administrator, I want to configure different LLM providers (OpenAI, OpenRouter, Kilo) so that I can switch between models based on cost or capability.

Why this priority: Prerequisite for any LLM functionality.

Independent Test: Can be tested by entering API keys in settings and verifying a connection/test call.

Acceptance Scenarios:

  1. Given the Settings page, When I select an LLM provider (e.g., OpenAI) and enter an API key, Then the system saves the configuration.
  2. Given a configured provider, When I run an analysis task, Then the system uses the selected provider for API calls.

User Story 4 - Git Commit Message Suggestion (Priority: P3)

As a Developer, I want the system to suggest commit messages based on changes directly within the Git plugin interface so that I can maintain consistent history with minimal effort.

Why this priority: Enhances the existing Git workflow and improves commit quality.

Independent Test: Can be tested by staging files in the Git plugin and clicking the "Generate Message" button.

Acceptance Scenarios:

  1. Given staged changes in the Git plugin, When I click "Generate Message", Then the system analyzes the diff using the configured LLM and populates the commit message field with a suggested summary.

Edge Cases

  • What happens when the LLM provider API is down or times out? (System should retry or fail gracefully with a clear error message).
  • What happens if the dashboard screenshot cannot be generated? (System should proceed with logs only or fail depending on configuration).
  • What happens if the context (logs/metadata) exceeds the LLM's token limit? (System should truncate or summarize input).
  • How does the system handle missing API keys? (Task should fail immediately with a configuration error).
  • What happens if the dashboard has multiple tabs with lazy-loaded charts? (System must switch through all tabs recursively to trigger chart rendering before capture).
  • What happens if Playwright encounters font loading timeouts in headless mode? (System must use CDP Page.captureScreenshot to bypass Playwright's internal timeout mechanism).
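For the token-limit edge case above, one plausible truncation strategy is to keep the most recent log lines that fit within a character budget derived from the token limit. The 4-characters-per-token ratio below is a common rough approximation, not a guarantee for any particular tokenizer, and the function name is illustrative:

```python
def trim_logs_to_budget(lines, max_tokens, chars_per_token=4):
    """Keep the newest log lines that fit in roughly max_tokens tokens."""
    budget = max_tokens * chars_per_token
    kept, used = [], 0
    for line in reversed(lines):   # walk newest-first; recent logs matter most
        cost = len(line) + 1       # +1 accounts for the newline
        if used + cost > budget:
            break
        kept.append(line)
        used += cost
    return list(reversed(kept))   # restore chronological order
```

A precise implementation would count tokens with the target model's tokenizer instead of the character heuristic.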

Requirements (mandatory)

Functional Requirements

  • FR-001: System MUST allow configuration of multiple LLM providers, specifically supporting OpenAI API, OpenRouter, and Kilo Provider.
  • FR-002: System MUST securely store API keys for these providers using AES-256 encryption. [Security]
  • FR-028: The system MUST mask all API keys in the UI and logs, displaying only the last 4 characters (e.g., sk-...1234). [Security]
  • FR-003: System MUST implement a DashboardValidationPlugin that integrates with the existing PluginBase architecture.
  • FR-004: DashboardValidationPlugin MUST accept a dashboard identifier as input.
  • FR-005: DashboardValidationPlugin MUST be capable of retrieving a visual representation (screenshot) of the dashboard. The visual representation MUST be a PNG image with a resolution of 1920px width and full page height to ensure all dashboard content is captured. [Clarity]
  • FR-016: System MUST support configurable screenshot strategies: 'Headless Browser' (default, high accuracy) and 'API Thumbnail' (fallback/fast).
  • FR-030: The screenshot capture MUST use Playwright with Chrome DevTools Protocol (CDP) to avoid font loading timeouts in headless mode.
  • FR-031: The screenshot capture MUST implement recursive tab switching to trigger lazy-loaded chart rendering on multi-tab dashboards before capturing.
  • FR-006: DashboardValidationPlugin MUST retrieve recent execution logs associated with the dashboard from the Environment API (e.g., /api/v1/log/), limited to the last 100 lines or the last 24 hours (whichever yields less data) to prevent token overflow. [Reliability]
  • FR-007: DashboardValidationPlugin MUST combine visual and text data to prompt a Multimodal LLM for analysis. The analysis output MUST be structured as a JSON object containing status (Pass/Fail), issues (list of strings), and summary (text) to enable structured UI presentation. [Clarity]
  • FR-008: System MUST implement a DocumentationPlugin (or similar) for documenting datasets and dashboards.
  • FR-009: DocumentationPlugin MUST retrieve schema and metadata for the target asset.
  • FR-017: DocumentationPlugin MUST apply generated descriptions directly to the target object's metadata fields (dataset description, column descriptions). It MUST handle schema changes by only updating fields that exist in the current schema and ignoring hallucinated columns. [Data Integrity]
  • FR-023: The system MUST use optimistic locking or version checks to prevent overwrites during concurrent documentation updates. [Data Integrity]
  • FR-024: Generated documentation MUST be plain text or Markdown, strictly avoiding executable code blocks or active HTML. [Security]
  • FR-025: The system MUST validate that generated commit messages follow the conventional commits format (e.g., feat:, fix:) and do not exceed 72 characters in the subject line. [Data Integrity]
  • FR-026: If a metadata update fails partially, the system MUST rollback all changes to preserve data consistency. [Data Integrity]
  • FR-027: The system MUST reject documentation requests for empty or null schemas with a clear error message. [Edge Case]
  • FR-010: All LLM interactions MUST be executed as asynchronous tasks via the Task Manager.
  • FR-018: System MUST implement automatic retry logic (3 attempts with exponential backoff: 2s, 4s, 8s) for failed LLM API calls. [Reliability]
  • FR-029: The system MUST filter sensitive data (PII, credentials) from logs and screenshots before sending them to external LLM providers. [Privacy]
  • FR-011: Task execution logs and results MUST be streamed via the existing WebSocket infrastructure.
  • FR-012: System SHOULD expose an interface to generate text summaries for Git diffs, utilizing the diff, file list, and recent commit history as context. If the generated message is empty or invalid, the system MUST display a user-friendly error toast and retain the manual entry capability. [Edge Case]
  • FR-013: System MUST support scheduling of validation tasks for dashboards (leveraging existing scheduler architecture).
  • FR-014: System SHOULD support notification dispatch (Email, Pulse) upon validation failure or completion.
  • FR-015: Notifications MUST contain a summary of results (Status, Issue Count) and a direct link to the full report, avoiding sensitive full details in the message body.
  • FR-019: The "Validate" button MUST display a loading spinner and be disabled during active execution to prevent multiple triggers. [UX]
  • FR-020: The system MUST provide immediate visual feedback (Toast notifications) for successful or failed connection tests in LLM Settings. [UX]
  • FR-021: New LLM actions (Validate, Generate Docs) MUST use standard system icons (e.g., heroicons:beaker for validate, heroicons:document-text for docs) and follow existing button placement patterns. [Consistency]
  • FR-022: Error messages for LLM failures MUST follow the standard ss-tools error format, including a clear title, descriptive message, and troubleshooting link if applicable. [Consistency]
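The commit-message check in FR-025 could be expressed as a simple regex plus length guard. This is a sketch under the assumption that the allowed types follow the common conventional-commits whitelist; the exact type list and function name are illustrative:

```python
import re

# Conventional commits: type, optional (scope), optional "!", then ": " + text.
_COMMIT_RE = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([\w.-]+\))?!?: \S.*$"
)

def is_valid_commit_subject(message: str) -> bool:
    """FR-025 sketch: subject matches conventional commits and is <= 72 chars."""
    subject = message.splitlines()[0] if message else ""
    return len(subject) <= 72 and bool(_COMMIT_RE.match(subject))
```

Generated messages failing this check would trigger the error-toast path described in FR-012.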

Key Entities

  • LLMProviderConfig: Stores provider type (OpenAI, etc.), base URL, model name, and API key.
  • ValidationResult: Stores the analysis output, timestamp, and reference to the dashboard.
  • AutoDocumentation: Stores the generated documentation text for an asset.
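The LLMProviderConfig entity might take a shape like the following. Field names are illustrative assumptions based on the entity description above; the masking helper follows FR-028 (only the last 4 characters of a key are ever displayed):

```python
from dataclasses import dataclass, field

@dataclass
class LLMProviderConfig:
    provider: str                      # e.g. "openai", "openrouter", "kilo"
    base_url: str
    model: str
    api_key: str = field(repr=False)   # keep the raw key out of repr()/logs

    def masked_key(self) -> str:
        """FR-028 sketch: show at most the key prefix and last 4 characters."""
        if not self.api_key:
            return ""
        head = self.api_key.split("-")[0] + "-" if "-" in self.api_key else ""
        return f"{head}...{self.api_key[-4:]}"
```

Persisted keys would additionally be AES-256 encrypted at rest per FR-002; that layer is omitted here.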

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: Users can successfully configure and validate a connection to at least one LLM provider.
  • SC-002: A dashboard validation task completes within 90 seconds (assuming standard LLM latency).
  • SC-003: The system successfully processes a multimodal prompt (image + text) and returns a structured analysis.
  • SC-004: Generated documentation for a standard dataset contains descriptions for at least 80% of the columns (based on LLM capability, but pipeline must support it).
  • SC-005: Screenshots capture full dashboard content including all tabs (1920px width, full height) without font loading timeouts.
  • SC-006: Analysis results are displayed in task logs with clear [ANALYSIS_SUMMARY] and [ANALYSIS_ISSUE] markers for easy parsing.
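The SC-006 markers are meant to be machine-parseable; a minimal consumer might look like this. The parser below is a hypothetical sketch of how a UI or test could extract the markers from a raw task log, not an existing ss-tools function:

```python
def parse_analysis_log(log_text: str) -> dict:
    """Extract the [ANALYSIS_SUMMARY] line and all [ANALYSIS_ISSUE] lines."""
    summary, issues = None, []
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("[ANALYSIS_SUMMARY]"):
            summary = line[len("[ANALYSIS_SUMMARY]"):].strip()
        elif line.startswith("[ANALYSIS_ISSUE]"):
            issues.append(line[len("[ANALYSIS_ISSUE]"):].strip())
    return {"summary": summary, "issues": issues}
```

Keeping the markers on their own lines, as SC-006 requires, is what makes this line-oriented parse reliable.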