# Phase 0 Research: LLM Chat Assistant for Project Operations
**Feature**: [`021-llm-project-assistant`](specs/021-llm-project-assistant)
**Input Spec**: [`spec.md`](specs/021-llm-project-assistant/spec.md)
**Related UX**: [`ux_reference.md`](specs/021-llm-project-assistant/ux_reference.md)
## 1) Intent parsing for operational commands (RU/EN)
### Decision
Use hybrid parsing:
1. Lightweight deterministic recognizers for high-signal commands/entities (IDs, env names, keywords).
2. LLM interpretation for ambiguous/free-form phrasing.
3. Confidence threshold gate (`needs_clarification` when below threshold).
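
The three steps above can be sketched as follows. This is a minimal illustration, not the project's parser: the regex patterns, the `ParsedIntent` shape, the `0.75` threshold, and the injected `llm_parse` callable are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional

CONFIDENCE_THRESHOLD = 0.75  # assumed value; the spec does not fix a number


@dataclass
class ParsedIntent:
    action: str
    entities: dict = field(default_factory=dict)
    confidence: float = 0.0
    status: str = "ok"  # "ok" or "needs_clarification"


# Hypothetical high-signal recognizers (EN/RU keywords, env names, numeric IDs).
_DEPLOY_RE = re.compile(r"\b(deploy|задеплой|разверни)\b", re.IGNORECASE)
_ENV_RE = re.compile(r"\b(prod|production|staging|dev)\b", re.IGNORECASE)
_ID_RE = re.compile(r"#(\d+)")


def parse_deterministic(message: str) -> Optional[ParsedIntent]:
    """Step 1: return an intent only when keyword and env are both explicit."""
    if not _DEPLOY_RE.search(message):
        return None
    env = _ENV_RE.search(message)
    if env is None:
        return None
    entities = {"env": env.group(1).lower()}
    id_match = _ID_RE.search(message)
    if id_match:
        entities["id"] = int(id_match.group(1))
    return ParsedIntent("deploy", entities, confidence=1.0)


def parse_message(
    message: str, llm_parse: Callable[[str], ParsedIntent]
) -> ParsedIntent:
    """Steps 2 and 3: fall back to the LLM, then apply the confidence gate."""
    intent = parse_deterministic(message)
    if intent is None:
        intent = llm_parse(message)
    if intent.confidence < CONFIDENCE_THRESHOLD:
        intent.status = "needs_clarification"
    return intent
```

Passing `llm_parse` in as a callable keeps the deterministic layer testable without a model and makes the gate apply uniformly to whatever the LLM layer returns.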
### Rationale
Deterministic layer reduces hallucination risk for critical operations; LLM layer preserves natural-language flexibility.
### Alternatives considered
- **Pure LLM parser only**: rejected due to unsafe variance for critical ops.
- **Pure regex/rules only**: rejected due to low language flexibility and high maintenance.
---
## 2) Risk classification and confirmation policy
### Decision
Introduce explicit risk levels per command template:
- `safe`: read/status operations.
- `guarded`: state-changing non-production operations.
- `dangerous`: production deploy/migration and similarly impactful actions.

A `dangerous` command always requires an explicit confirmation token before execution.
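
A rough sketch of how per-template risk levels might be encoded. The command names, the set of production environment names, and the choice to treat unknown commands as `dangerous` are assumptions for illustration only.

```python
from enum import Enum
from typing import Optional


class RiskLevel(str, Enum):
    SAFE = "safe"
    GUARDED = "guarded"
    DANGEROUS = "dangerous"


# Hypothetical command templates with their baseline risk.
COMMAND_RISK = {
    "get_task_status": RiskLevel.SAFE,
    "create_branch": RiskLevel.GUARDED,
    "run_migration": RiskLevel.GUARDED,
}

PRODUCTION_ENVS = {"prod", "production"}  # assumed naming


def classify(command: str, env: Optional[str] = None) -> RiskLevel:
    """Return the risk level for a command template and target environment."""
    # Assumption: unknown templates fail closed and are treated as dangerous.
    base = COMMAND_RISK.get(command, RiskLevel.DANGEROUS)
    # State-changing commands escalate when they target production.
    if base is not RiskLevel.SAFE and env in PRODUCTION_ENVS:
        return RiskLevel.DANGEROUS
    return base


def requires_confirmation(risk: RiskLevel) -> bool:
    """Only dangerous commands need the hard confirmation gate."""
    return risk is RiskLevel.DANGEROUS
```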
### Rationale
This maps to user acceptance criteria and prevents accidental production-impact operations.
### Alternatives considered
- **Prompt-only warning without hard confirmation gate**: rejected as insufficiently safe.
- **Confirmation for every command**: rejected due to severe UX friction.
---
## 3) Execution integration strategy
### Decision
Assistant dispatch calls existing internal services/endpoints instead of duplicating business logic.
Target mappings:
- Git -> existing git routes/service methods.
- Migration/backup/LLM analysis -> task creation with existing plugin IDs.
- Status/report -> existing tasks/reports APIs.
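
One way to picture the target mappings is a thin dispatch table that only forwards to existing services. Every function, command name, and plugin ID below is a hypothetical stand-in; the real service methods and IDs live in the existing codebase.

```python
from typing import Any, Callable, Dict


# Stand-ins for services the project already has (hypothetical names and shapes).
def git_create_branch(repo: str, name: str) -> Dict[str, Any]:
    return {"repo": repo, "branch": name, "created": True}


def create_task(plugin_id: str, params: Dict[str, Any]) -> Dict[str, Any]:
    return {"task_id": "task-1", "plugin_id": plugin_id, "params": params}


def get_task_status(task_id: str) -> Dict[str, Any]:
    return {"task_id": task_id, "status": "RUNNING"}


Handler = Callable[[Dict[str, Any]], Dict[str, Any]]

# The assistant owns only this routing table, never the business logic.
DISPATCH_TABLE: Dict[str, Handler] = {
    "git.create_branch": lambda p: git_create_branch(p["repo"], p["name"]),
    "migration.run": lambda p: create_task("migration-plugin-id", p),
    "backup.run": lambda p: create_task("backup-plugin-id", p),
    "task.status": lambda p: get_task_status(p["task_id"]),
}


def dispatch(command: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Forward a parsed command to its existing backend, or refuse."""
    handler = DISPATCH_TABLE.get(command)
    if handler is None:
        raise ValueError(f"no dispatch target for command {command!r}")
    return handler(params)
```

Because each handler calls the existing service, whatever permission checks that service performs still run; the table adds no alternative path around them.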
### Rationale
Reusing existing execution paths preserves permission checks and reduces regression risk.
### Alternatives considered
- **New parallel execution engine**: rejected (duplicated logic, bypass risk).
---
## 4) Confirmation token lifecycle
### Decision
Use a confirmation token model with one-time use, a TTL, and user binding.
- The token includes a normalized command snapshot and risk metadata.
- Confirming a token executes the command exactly once.
- An expired or cancelled token cannot be reused.
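
The lifecycle above could look roughly like this in-memory sketch. The class and field names, the 300-second default TTL, and the state labels are assumptions; a real implementation would need persistent storage with an atomic state transition.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict


class ConfirmationError(Exception):
    """Raised when a token cannot be confirmed."""


@dataclass
class ConfirmationToken:
    token_id: str
    user_id: str
    command: dict          # normalized command snapshot
    risk_level: str
    expires_at: float
    state: str = "pending"  # pending | consumed | cancelled | expired


class ConfirmationStore:
    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._tokens: Dict[str, ConfirmationToken] = {}

    def issue(self, user_id: str, command: dict, risk_level: str) -> ConfirmationToken:
        token = ConfirmationToken(
            token_id=secrets.token_urlsafe(32),
            user_id=user_id,
            command=dict(command),  # copy so later mutation cannot alter the snapshot
            risk_level=risk_level,
            expires_at=self._clock() + self._ttl,
        )
        self._tokens[token.token_id] = token
        return token

    def cancel(self, token_id: str, user_id: str) -> None:
        token = self._tokens.get(token_id)
        if token and token.user_id == user_id and token.state == "pending":
            token.state = "cancelled"

    def confirm(self, token_id: str, user_id: str) -> dict:
        """Consume the token and return the command snapshot to execute."""
        token = self._tokens.get(token_id)
        if token is None or token.user_id != user_id:
            raise ConfirmationError("unknown token")  # user binding
        if token.state != "pending":
            raise ConfirmationError(f"token is {token.state}")  # one-time use
        if self._clock() >= token.expires_at:
            token.state = "expired"
            raise ConfirmationError("token is expired")  # TTL
        token.state = "consumed"
        return dict(token.command)
```

Executing the stored snapshot rather than re-parsing the user's message means the command that runs is the one the user was shown when asked to confirm.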
### Rationale
Prevents duplicate destructive execution and supports an explicit audit trail.
### Alternatives considered
- **Session flag confirmation without token**: rejected due to weak idempotency guarantees.
---
## 5) Auditability and observability model
### Decision
Log each assistant interaction as a structured audit entry:
- actor, raw message, parsed intent, confidence, risk level, decision, dispatch target, outcome, linked `task_id` if any.
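
As an illustration, the fields listed above could be carried in a record like the one below and serialized as one JSON line per interaction. The class name, the `timestamp` field, and the example `decision`/`outcome` values are assumptions, not part of the spec.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class AssistantAuditEntry:
    actor: str
    raw_message: str
    parsed_intent: Optional[str]
    confidence: Optional[float]
    risk_level: Optional[str]
    decision: str                   # e.g. "executed", "denied", "needs_clarification"
    dispatch_target: Optional[str]
    outcome: str
    task_id: Optional[str] = None   # set only when a task was created
    timestamp: str = field(default_factory=_utc_now)  # assumed extra field


def to_log_line(entry: AssistantAuditEntry) -> str:
    """Serialize an entry as a single JSON line, keeping RU text readable."""
    return json.dumps(asdict(entry), ensure_ascii=False, sort_keys=True)


# A denied attempt is logged too, with no dispatch target and no task.
denied = AssistantAuditEntry(
    actor="alice",
    raw_message="задеплой на prod",
    parsed_intent="deploy",
    confidence=1.0,
    risk_level="dangerous",
    decision="denied",
    dispatch_target=None,
    outcome="confirmation_expired",
)
line = to_log_line(denied)
```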
### Rationale
Required for post-incident analysis and security traceability.
### Alternatives considered
- **Log only successful executions**: rejected (misses denied/failed attempts and ambiguity events).
---
## Consolidated Research Outcomes for Planning
- Hybrid parser with confidence gating is selected.
- Risk-classified confirmation flow is mandatory for dangerous operations.
- Existing APIs/plugins are the only execution backends.
- One-time confirmation token with TTL is required for safety and idempotency.
- Structured assistant audit logging is required for operational trust.