ss-tools/specs/017-llm-analysis-plugin/spec.md

Feature Specification: LLM Analysis & Documentation Plugins

Feature Branch: 017-llm-analysis-plugin Created: 2026-01-28 Status: Draft Input: User description: "LLM Dashboard Validation Plugin for integrating LLMs into ss-tools. The plugin must support analyzing dashboard correctness via a multimodal LLM (screenshot + logs). A second plugin must handle documenting datasets and dashboards using an LLM. Commit message generation may be included in the Git plugin. Provider support: OpenRouter, Kilo Provider, OpenAI API. Integration with the existing PluginBase architecture, Task Manager, and WebSocket logs."

Clarifications

Session 2026-01-28

  • Q: Notification Content Strategy → A: Summary with Link: Include status (Pass/Fail), key issues count, and a direct link to the full report in the UI.
  • Q: Dashboard Screenshot Source → A: Hybrid (Configurable): Support both Headless Browser (accurate) and API/Thumbnail (fast) methods, allowing admin configuration.
  • Q: Dataset Documentation Output Format → A: Direct Object Update: Update the description fields of the dataset and its columns directly within the dataset object (persisted to backend/metadata).
  • Q: Git Commit Message Context → A: Diff + Recent History: Send diff, file names, and the last 3 commit messages to match style.
  • Q: LLM Failure Handling → A: Retry then Fail: Automatically retry 3 times with exponential backoff before failing.
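The clarified failure-handling policy (retry 3 times with exponential backoff, then fail) can be sketched as follows. This is a minimal illustration; the function name, the 1-second base delay, and the injectable `sleep` hook are assumptions — the spec fixes only the attempt count and the backoff shape.

```python
import time

def call_with_retry(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Retry a flaky LLM call up to `attempts` times with exponential backoff.

    Illustrative sketch: delays grow as base_delay * 2**attempt (1s, 2s, 4s).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except Exception as err:  # in practice, catch provider-specific errors
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

The injectable `sleep` keeps the helper testable without real waiting.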

User Scenarios & Testing (mandatory)

User Story 1 - Dashboard Health Analysis (Priority: P1)

As a Data Engineer, I want to automatically analyze a dashboard's status using visual and log data directly from the Environments interface so that I can identify rendering issues or data errors without manual inspection.

Why this priority: Core value proposition of the feature. Enables automated quality assurance.

Independent Test: Can be tested by selecting a dashboard in the Environment list and clicking "Validate", or by scheduling a validation task.

Acceptance Scenarios:

  1. Given I am on the Environments page (Dashboard list), When I select a dashboard and click "Validate", Then the system triggers a validation task with the dashboard's context.
  2. Given the validation task is running, When it completes, Then I see the analysis report (visual + logs) in the task output history.
  3. Given I want regular checks, When I configure a schedule for the validation task (similar to Backup plugin), Then the system runs the check automatically at the specified interval.
  4. Given a validation issue is found, When the task completes, Then the system sends a notification (Email/Pulse) if configured.

User Story 2 - Automated Dataset Documentation (Priority: P1)

As a Data Steward, I want to generate documentation for datasets and dashboards using LLMs so that I can maintain up-to-date metadata with minimal manual effort.

Why this priority: Significantly reduces maintenance overhead for data governance.

Independent Test: Can be tested by selecting a dataset and triggering the documentation task, then checking if a description is generated.

Acceptance Scenarios:

  1. Given a dataset identifier, When I run the Documentation task, Then the system fetches the dataset's schema and metadata.
  2. Given the metadata is fetched, When sent to the LLM, Then a structured description/documentation text is returned.
  3. Given the documentation is generated, When the task completes, Then the result is available for review (e.g., in the task log or saved to a file/db).
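The fetch-then-prompt flow above might assemble its LLM input roughly like this. A hedged sketch: the `(name, type)` column shape and the prompt wording are assumptions, since the real metadata payload depends on the backend.

```python
def build_doc_prompt(dataset_name, columns):
    """Turn a fetched schema into a documentation prompt.

    `columns` as (name, type) pairs is an assumed shape, not the
    actual metadata format.
    """
    lines = [
        f"Write a short description for dataset '{dataset_name}' "
        "and one sentence per column.",
        "Columns:",
    ]
    lines += [f"- {name}: {dtype}" for name, dtype in columns]
    return "\n".join(lines)
```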

User Story 3 - LLM Provider Configuration (Priority: P1)

As an Administrator, I want to configure different LLM providers (OpenAI, OpenRouter, Kilo) so that I can switch between models based on cost or capability.

Why this priority: Prerequisite for any LLM functionality.

Independent Test: Can be tested by entering API keys in settings and verifying a connection/test call.

Acceptance Scenarios:

  1. Given the Settings page, When I select an LLM provider (e.g., OpenAI) and enter an API key, Then the system saves the configuration.
  2. Given a configured provider, When I run an analysis task, Then the system uses the selected provider for API calls.
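A minimal sketch of what the provider configuration behind these scenarios could look like. Field names and the `make_config` helper are illustrative, not the ss-tools schema; the OpenAI and OpenRouter base URLs are the public defaults, while the Kilo Provider endpoint is deployment-specific and must be supplied explicitly.

```python
from dataclasses import dataclass

@dataclass
class LLMProviderConfig:
    provider: str   # "openai" | "openrouter" | "kilo"
    base_url: str
    model: str
    api_key: str    # stored securely in practice (FR-002)

# Known public endpoints; Kilo's URL must be configured by the admin.
DEFAULT_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "openrouter": "https://openrouter.ai/api/v1",
}

def make_config(provider, model, api_key, base_url=None):
    """Build a config, falling back to a known default base URL."""
    url = base_url or DEFAULT_BASE_URLS.get(provider)
    if not url:
        raise ValueError(f"base_url required for provider '{provider}'")
    return LLMProviderConfig(provider, url, model, api_key)
```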

User Story 4 - Git Commit Message Suggestion (Priority: P3)

As a Developer, I want the system to suggest commit messages based on changes directly within the Git plugin interface so that I can maintain consistent history with minimal effort.

Why this priority: Enhances the existing Git workflow and improves commit quality.

Independent Test: Can be tested by staging files in the Git plugin and clicking the "Generate Message" button.

Acceptance Scenarios:

  1. Given staged changes in the Git plugin, When I click "Generate Message", Then the system analyzes the diff using the configured LLM and populates the commit message field with a suggested summary.
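Per the clarified context strategy (diff + file names + last 3 commit messages for style matching), prompt assembly might look like the sketch below. The function name and prompt wording are illustrative assumptions.

```python
def build_commit_prompt(diff, file_names, recent_messages):
    """Assemble the commit-suggestion prompt from diff, changed files,
    and the last 3 commit messages (per the clarification session)."""
    history = recent_messages[-3:]  # only the most recent three
    parts = [
        "Suggest a concise commit message matching the style "
        "of recent commits.",
        "Recent commits:",
        *(f"- {m}" for m in history),
        "Changed files: " + ", ".join(file_names),
        "Diff:",
        diff,
    ]
    return "\n".join(parts)
```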

Edge Cases

  • What happens when the LLM provider API is down or times out? (System should retry or fail gracefully with a clear error message).
  • What happens if the dashboard screenshot cannot be generated? (System should proceed with logs only or fail depending on configuration).
  • What happens if the context (logs/metadata) exceeds the LLM's token limit? (System should truncate or summarize input).
  • How does the system handle missing API keys? (Task should fail immediately with a configuration error).
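The token-limit edge case above could be guarded with a tail-keeping truncation like this sketch. Using a character budget as a proxy for tokens, and the 8000-character default, are simplifying assumptions; real tokenization is provider-specific.

```python
def truncate_logs(log_lines, max_chars=8000):
    """Keep the most recent log lines that fit the budget.

    Crude context-limit guard: characters stand in for tokens, and
    the newest lines (usually the relevant errors) are kept.
    """
    selected = []
    total = 0
    for line in reversed(log_lines):  # walk newest-first
        if total + len(line) + 1 > max_chars:
            break
        selected.append(line)
        total += len(line) + 1  # +1 for the joining newline
    return list(reversed(selected))  # restore chronological order
```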

Requirements (mandatory)

Functional Requirements

  • FR-001: System MUST allow configuration of multiple LLM providers, specifically supporting OpenAI API, OpenRouter, and Kilo Provider.
  • FR-002: System MUST securely store API keys for these providers.
  • FR-003: System MUST implement a DashboardValidationPlugin that integrates with the existing PluginBase architecture.
  • FR-004: DashboardValidationPlugin MUST accept a dashboard identifier as input.
  • FR-005: DashboardValidationPlugin MUST be capable of retrieving a visual representation (screenshot) of the dashboard.
  • FR-016: System MUST support configurable screenshot strategies: 'Headless Browser' (default, high accuracy) and 'API Thumbnail' (fallback/fast).
  • FR-006: DashboardValidationPlugin MUST retrieve recent execution logs associated with the dashboard.
  • FR-007: DashboardValidationPlugin MUST combine visual and text data to prompt a Multimodal LLM for analysis.
  • FR-008: System MUST implement a DocumentationPlugin (or similar) for documenting datasets and dashboards.
  • FR-009: DocumentationPlugin MUST retrieve schema and metadata for the target asset.
  • FR-017: DocumentationPlugin MUST apply generated descriptions directly to the target object's metadata fields (dataset description, column descriptions).
  • FR-010: All LLM interactions MUST be executed as asynchronous tasks via the Task Manager.
  • FR-018: System MUST implement automatic retry logic (3 attempts with exponential backoff) for failed LLM API calls.
  • FR-011: Task execution logs and results MUST be streamed via the existing WebSocket infrastructure.
  • FR-012: System SHOULD expose an interface to generate text summaries for Git diffs, utilizing the diff, file list, and recent commit history as context.
  • FR-013: System MUST support scheduling of validation tasks for dashboards (leveraging existing scheduler architecture).
  • FR-014: System SHOULD support notification dispatch (Email, Pulse) upon validation failure or completion.
  • FR-015: Notifications MUST contain a summary of results (Status, Issue Count) and a direct link to the full report, avoiding sensitive full details in the message body.
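FR-007's multimodal prompt (screenshot + logs) might be assembled in the OpenAI Chat Completions image-input format, which OpenRouter also accepts for multimodal models; the Kilo Provider's payload shape is not specified here and may differ. A sketch, with the instruction text as an assumption:

```python
import base64

def build_multimodal_messages(screenshot_png, log_text):
    """Build an OpenAI-style chat payload mixing the dashboard
    screenshot (as a base64 data URL) with recent logs (FR-007)."""
    image_b64 = base64.b64encode(screenshot_png).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Analyze this dashboard for rendering or data "
                     "errors. Recent logs:\n" + log_text},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }]
```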

Key Entities

  • LLMProviderConfig: Stores provider type (OpenAI, etc.), base URL, model name, and API key.
  • ValidationResult: Stores the analysis output, timestamp, and reference to the dashboard.
  • AutoDocumentation: Stores the generated documentation text for an asset.
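Two of the entities above, sketched as Python dataclasses (LLMProviderConfig would be analogous). Field names and types are illustrative assumptions, not a final schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ValidationResult:
    dashboard_id: str
    status: str              # "pass" | "fail"
    issues: list[str]
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class AutoDocumentation:
    asset_id: str
    asset_type: str          # "dataset" | "dashboard"
    text: str                # generated documentation body
```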

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: Users can successfully configure and validate a connection to at least one LLM provider.
  • SC-002: A dashboard validation task completes within 90 seconds (assuming standard LLM latency).
  • SC-003: The system successfully processes a multimodal prompt (image + text) and returns a structured analysis.
  • SC-004: Generated documentation for a standard dataset contains descriptions for at least 80% of the columns (based on LLM capability, but pipeline must support it).