feat: whyrating - initial project from turbostarter boilerplate
This commit is contained in:
627
docs/tcs/TCS-CLAUDE-CODE.md
Normal file
627
docs/tcs/TCS-CLAUDE-CODE.md
Normal file
@@ -0,0 +1,627 @@
|
||||
# TCS with Claude Code CLI
|
||||
|
||||
> Run Triangulated Compiler Synthesis entirely within Claude Code using multi-agent Task orchestration. No external API calls needed.
|
||||
|
||||
## Overview
|
||||
|
||||
Claude Code can execute TCS by spawning parallel Task agents that act as the generators, judge, and builder roles. All computation happens within Claude Code's context.
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
U[User] -->|"Run TCS for X"| CC[Claude Code Main]
|
||||
CC -->|spawn| G1[Task: Generator 1]
|
||||
CC -->|spawn| G2[Task: Generator 2]
|
||||
CC -->|spawn| G3[Task: Generator 3]
|
||||
G1 & G2 & G3 -->|results| CC
|
||||
CC -->|spawn| J[Task: Judge]
|
||||
J -->|findings| CC
|
||||
CC -->|if gaps| E[Task: Evolver]
|
||||
E -->|new spec| CC
|
||||
CC -->|spawn parallel| B1[Task: Scanner]
|
||||
CC -->|spawn parallel| B2[Task: Parser]
|
||||
CC -->|spawn parallel| B3[Task: Emitter]
|
||||
B1 & B2 & B3 -->|code| CC
|
||||
CC -->|write| F[Files]
|
||||
```
|
||||
|
||||
## How It Works
|
||||
|
||||
1. **User provides**: Domain, initial spec, target sample count
|
||||
2. **Claude Code orchestrates**: Spawns Task agents in parallel
|
||||
3. **No API calls**: All LLM work is done by Claude Code itself via Task tool
|
||||
4. **Files written**: Spec, compiler code, test samples saved to disk
|
||||
|
||||
## Execution Protocol
|
||||
|
||||
When asked to run TCS, Claude Code should:
|
||||
|
||||
### Step 1: Gather Parameters
|
||||
|
||||
Ask the user:
|
||||
```
|
||||
To run TCS, I need:
|
||||
1. Domain name (e.g., "form-builder", "chart-dsl")
|
||||
2. Path to initial spec (or I can create one)
|
||||
3. Number of iterations (recommended: 5-10)
|
||||
4. Samples per iteration (recommended: 5-10)
|
||||
5. Output directory
|
||||
```
|
||||
|
||||
### Step 2: Initialize
|
||||
|
||||
```
|
||||
Create output structure:
|
||||
{output}/
|
||||
├── spec/
|
||||
│ └── SPEC-v0.md (initial)
|
||||
├── samples/
|
||||
│ ├── valid/
|
||||
│ └── invalid/
|
||||
├── compiler/
|
||||
│ ├── scanner.ts
|
||||
│ ├── parser.ts
|
||||
│ └── emitter.ts
|
||||
└── state.json (checkpoint)
|
||||
```
|
||||
|
||||
### Step 3: Run Iteration Loop
|
||||
|
||||
For each iteration:
|
||||
|
||||
#### 3a. Generate Samples (Parallel Tasks)
|
||||
|
||||
Spawn N parallel Task agents with `subagent_type: "general-purpose"`:
|
||||
|
||||
```
|
||||
Task 1: "Generate sample for [domain].
|
||||
Prompt: 'Create a [random scenario]'
|
||||
|
||||
Return JSON:
|
||||
{
|
||||
'prompt': '...',
|
||||
'rep1_visual': '... (description/JSX)',
|
||||
'rep2_structured': { ... (JSON schema) },
|
||||
'rep3_dsl': '... (DSL code)'
|
||||
}
|
||||
|
||||
Use this spec: [current spec content]"
|
||||
|
||||
Task 2: (same with different random prompt)
|
||||
Task 3: (same with different random prompt)
|
||||
...
|
||||
Task N: (same with different random prompt)
|
||||
```
|
||||
|
||||
#### 3b. Triangulate (Parallel Tasks)
|
||||
|
||||
For each sample, spawn a judge Task:
|
||||
|
||||
```
|
||||
Task: "Compare these 3 representations for semantic equivalence:
|
||||
|
||||
VISUAL: [rep1]
|
||||
STRUCTURED: [rep2]
|
||||
DSL: [rep3]
|
||||
|
||||
Return JSON:
|
||||
{
|
||||
'equivalent': true/false,
|
||||
'disagreements': ['...'],
|
||||
'spec_gaps': ['...']
|
||||
}"
|
||||
```
|
||||
|
||||
#### 3c. Evolve Spec (Single Task if gaps found)
|
||||
|
||||
```
|
||||
Task: "Based on these findings from triangulation:
|
||||
[list of spec_gaps]
|
||||
|
||||
Current spec:
|
||||
[spec content]
|
||||
|
||||
Return the COMPLETE updated spec with changes marked."
|
||||
```
|
||||
|
||||
Write new spec version to `spec/SPEC-v{N}.md`
|
||||
|
||||
#### 3d. Build/Update Compiler (Parallel Tasks)
|
||||
|
||||
Spawn 3 parallel Tasks:
|
||||
|
||||
```
|
||||
Task Scanner: "Write a TypeScript scanner for this DSL spec:
|
||||
[spec content]
|
||||
|
||||
Return complete scanner.ts code."
|
||||
|
||||
Task Parser: "Write a TypeScript parser for this DSL spec:
|
||||
[spec content]
|
||||
|
||||
Return complete parser.ts code."
|
||||
|
||||
Task Emitter: "Write a TypeScript emitter for this DSL spec:
|
||||
[spec content]
|
||||
Target output: [structured format]
|
||||
|
||||
Return complete emitter.ts code."
|
||||
```
|
||||
|
||||
Write files to `compiler/`
|
||||
|
||||
#### 3e. Test (Main Agent)
|
||||
|
||||
Claude Code main agent:
|
||||
- Reads validated samples
|
||||
- Runs compiler mentally or via Bash if executable
|
||||
- Compares output to expected
|
||||
- Records pass/fail
|
||||
|
||||
#### 3f. Checkpoint
|
||||
|
||||
Write to `state.json`:
|
||||
```json
|
||||
{
|
||||
"iteration": 3,
|
||||
"spec_version": 3,
|
||||
"samples_validated": 25,
|
||||
"samples_invalid": 3,
|
||||
"pass_rate": 0.92,
|
||||
"last_findings": ["..."]
|
||||
}
|
||||
```
|
||||
|
||||
### Step 4: Convergence Check
|
||||
|
||||
Stop when:
|
||||
- `pass_rate >= 0.95` AND
|
||||
- `last 2 iterations had 0 spec changes` AND
|
||||
- `samples_validated >= target`
|
||||
|
||||
Or max iterations reached.
|
||||
|
||||
### Step 5: Final Output
|
||||
|
||||
```
|
||||
TCS Complete!
|
||||
|
||||
Results:
|
||||
- Iterations: 7
|
||||
- Final spec: spec/SPEC-v7.md
|
||||
- Validated samples: 52
|
||||
- Pass rate: 98%
|
||||
- Compiler: compiler/{scanner,parser,emitter}.ts
|
||||
|
||||
Files written to: {output}/
|
||||
```
|
||||
|
||||
## Parallel Task Strategy
|
||||
|
||||
**Maximize parallelism** by launching independent tasks together:
|
||||
|
||||
```
|
||||
Iteration N:
|
||||
├── [PARALLEL] Generate 10 samples (10 Tasks)
|
||||
│ └── Wait for all
|
||||
├── [PARALLEL] Triangulate 10 samples (10 Tasks)
|
||||
│ └── Wait for all
|
||||
├── [SEQUENTIAL] Evolve spec (1 Task, only if gaps)
|
||||
├── [PARALLEL] Build compiler (3 Tasks)
|
||||
│ └── Wait for all
|
||||
└── [SEQUENTIAL] Test & checkpoint (Main agent)
|
||||
```
|
||||
|
||||
## Sample Task Prompts
|
||||
|
||||
### Generator Task Prompt Template
|
||||
|
||||
```
|
||||
You are generating a TCS sample for domain: {domain}
|
||||
|
||||
SPEC:
|
||||
---
|
||||
{spec_content}
|
||||
---
|
||||
|
||||
USER SCENARIO: {random_prompt}
|
||||
|
||||
Generate 3 semantically equivalent representations:
|
||||
|
||||
1. VISUAL (description or JSX):
|
||||
Describe what the user would see/interact with.
|
||||
|
||||
2. STRUCTURED (JSON):
|
||||
The underlying data model.
|
||||
|
||||
3. DSL (compact notation):
|
||||
Using the spec syntax above.
|
||||
|
||||
Return as JSON:
|
||||
{
|
||||
"prompt": "{random_prompt}",
|
||||
"rep1_visual": "...",
|
||||
"rep2_structured": {...},
|
||||
"rep3_dsl": "..."
|
||||
}
|
||||
```
|
||||
|
||||
### Judge Task Prompt Template
|
||||
|
||||
```
|
||||
You are a TCS triangulation judge.
|
||||
|
||||
Compare these 3 representations for SEMANTIC EQUIVALENCE:
|
||||
|
||||
REP1 (Visual):
|
||||
{rep1}
|
||||
|
||||
REP2 (Structured):
|
||||
{rep2}
|
||||
|
||||
REP3 (DSL):
|
||||
{rep3}
|
||||
|
||||
Check:
|
||||
1. Do all 3 represent the SAME thing?
|
||||
2. Are there missing fields in any representation?
|
||||
3. Are there contradictions?
|
||||
4. Does the DSL follow the spec correctly?
|
||||
|
||||
SPEC for reference:
|
||||
{spec}
|
||||
|
||||
Return JSON:
|
||||
{
|
||||
"equivalent": true/false,
|
||||
"disagreements": [
|
||||
"Rep1 has X but Rep2 missing",
|
||||
"Rep3 uses Y syntax not in spec"
|
||||
],
|
||||
"spec_gaps": [
|
||||
"Spec doesn't define how to handle X",
|
||||
"Need syntax for Y"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Evolver Task Prompt Template
|
||||
|
||||
```
|
||||
You are evolving a DSL specification based on TCS findings.
|
||||
|
||||
CURRENT SPEC:
|
||||
---
|
||||
{spec}
|
||||
---
|
||||
|
||||
FINDINGS FROM TRIANGULATION:
|
||||
{findings_list}
|
||||
|
||||
Update the spec to address these gaps. Rules:
|
||||
1. Add missing syntax/semantics
|
||||
2. Clarify ambiguities
|
||||
3. Add examples for new features
|
||||
4. Keep backwards compatible if possible
|
||||
5. Mark changes with <!-- NEW v{N} -->
|
||||
|
||||
Return the COMPLETE updated spec.
|
||||
```
|
||||
|
||||
### Builder Task Prompt Templates
|
||||
|
||||
**Scanner:**
|
||||
```
|
||||
Write a TypeScript scanner/tokenizer for this DSL:
|
||||
|
||||
SPEC:
|
||||
{spec}
|
||||
|
||||
Requirements:
|
||||
- Export: scan(source: string): Token[]
|
||||
- Token type: { type: string, value: string, line: number, column: number }
|
||||
- Handle all syntax in spec
|
||||
- Good error messages with line/column
|
||||
|
||||
Return complete scanner.ts file.
|
||||
```
|
||||
|
||||
**Parser:**
|
||||
```
|
||||
Write a TypeScript parser for this DSL:
|
||||
|
||||
SPEC:
|
||||
{spec}
|
||||
|
||||
TOKEN TYPES (from scanner):
|
||||
{token_types}
|
||||
|
||||
Requirements:
|
||||
- Export: parse(tokens: Token[]): AST
|
||||
- AST should mirror the structured representation
|
||||
- Handle all syntax in spec
|
||||
- Good error messages
|
||||
|
||||
Return complete parser.ts file.
|
||||
```
|
||||
|
||||
**Emitter:**
|
||||
```
|
||||
Write a TypeScript emitter for this DSL:
|
||||
|
||||
SPEC:
|
||||
{spec}
|
||||
|
||||
AST TYPE:
|
||||
{ast_type}
|
||||
|
||||
TARGET:
|
||||
{target_format}
|
||||
|
||||
Requirements:
|
||||
- Export: emit(ast: AST): TargetType
|
||||
- Handle all node types
|
||||
- Preserve all semantic information
|
||||
|
||||
Return complete emitter.ts file.
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
If a Task fails:
|
||||
1. Log the error
|
||||
2. Retry once with clarified prompt
|
||||
3. If still fails, mark sample as invalid
|
||||
4. Continue with other samples
|
||||
|
||||
If compiler build fails:
|
||||
1. Spawn a "Reflector" Task to diagnose
|
||||
2. Apply suggested fix
|
||||
3. Retry build
|
||||
|
||||
## Usage Example
|
||||
|
||||
User says:
|
||||
> "Run TCS to build a chart DSL compiler. 5 iterations, 8 samples each."
|
||||
|
||||
Claude Code responds:
|
||||
> "Starting TCS for chart-dsl domain.
|
||||
>
|
||||
> Configuration:
|
||||
> - Iterations: 5
|
||||
> - Samples/iteration: 8
|
||||
> - Output: ./output/chart-dsl/
|
||||
>
|
||||
> Creating initial spec... Do you have an existing spec or should I create one?"
|
||||
|
||||
User:
|
||||
> "Create one"
|
||||
|
||||
Claude Code:
|
||||
> "Created initial spec at ./output/chart-dsl/spec/SPEC-v0.md
|
||||
>
|
||||
> Starting iteration 1...
|
||||
> [Spawns 8 parallel generator Tasks]
|
||||
> Generated 8 samples.
|
||||
>
|
||||
> [Spawns 8 parallel judge Tasks]
|
||||
> Triangulation complete: 6 valid, 2 with disagreements.
|
||||
>
|
||||
> Findings:
|
||||
> - Axis label syntax undefined
|
||||
> - Legend position options missing
|
||||
>
|
||||
> [Spawns evolver Task]
|
||||
> Spec updated to v1.
|
||||
>
|
||||
> [Spawns 3 parallel builder Tasks]
|
||||
> Compiler built: scanner.ts, parser.ts, emitter.ts
|
||||
>
|
||||
> Testing... 5/6 passing (83%)
|
||||
>
|
||||
> Iteration 1 complete. Continuing..."
|
||||
|
||||
## Checkpoint & Resume
|
||||
|
||||
State file enables resume:
|
||||
|
||||
```json
|
||||
{
|
||||
"domain": "chart-dsl",
|
||||
"iteration": 3,
|
||||
"max_iterations": 5,
|
||||
"samples_per_iteration": 8,
|
||||
"spec_version": 3,
|
||||
"validated_samples": ["s001", "s002", ...],
|
||||
"invalid_samples": ["s003"],
|
||||
"compiler_version": 3,
|
||||
"pass_rate": 0.88
|
||||
}
|
||||
```
|
||||
|
||||
To resume:
|
||||
> "Resume TCS from ./output/chart-dsl/"
|
||||
|
||||
Claude Code reads state.json and continues from last checkpoint.
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Use `run_in_background: false`** for Tasks - need results immediately
|
||||
2. **Batch parallel Tasks** in single message for efficiency
|
||||
3. **Keep spec in memory** during iteration, write to disk at checkpoints
|
||||
4. **Accumulate samples** across iterations for comprehensive testing
|
||||
5. **Use Opus** for compiler/spec tasks, **Sonnet** for judge/generator tasks
|
||||
|
||||
## Model Selection per Task
|
||||
|
||||
Use the best models for critical tasks:
|
||||
|
||||
```
|
||||
| Task Type | Model | Reason |
|
||||
|------------|----------|-------------------------------------|
|
||||
| Scanner | opus | Compiler code - highest quality |
|
||||
| Parser | opus | Compiler code - complex logic |
|
||||
| Emitter | opus | Compiler code - correctness critical|
|
||||
| Evolver | opus | Spec design - architecture decisions|
|
||||
| Architect | opus | Redesign - needs deep reasoning |
|
||||
| Judge | sonnet | Comparison - thorough but fast |
|
||||
| Reflector | sonnet | Diagnosis - good reasoning |
|
||||
| Generator | sonnet | Sample generation - quality matters |
|
||||
```
|
||||
|
||||
**Philosophy**: Use Opus 4.5 for anything that produces code or evolves the spec.
|
||||
Use Sonnet for validation, comparison, and generation tasks.
|
||||
|
||||
Specify in Task call:
|
||||
```
|
||||
Task(subagent_type="general-purpose", model="opus", prompt="...") // For compiler/spec
|
||||
Task(subagent_type="general-purpose", model="sonnet", prompt="...") // For judge/generator
|
||||
```
|
||||
|
||||
## Quality Principles (CRITICAL)
|
||||
|
||||
These principles are **non-negotiable** for TCS execution:
|
||||
|
||||
### 1. No Bypassing Issues
|
||||
|
||||
```
|
||||
❌ WRONG: "Add a try-catch to suppress the error"
|
||||
❌ WRONG: "Skip this edge case for now"
|
||||
❌ WRONG: "Use 'any' type to fix compilation"
|
||||
|
||||
✅ RIGHT: "The error reveals a spec gap - evolve the spec"
|
||||
✅ RIGHT: "This edge case needs proper handling in the parser"
|
||||
✅ RIGHT: "Define proper types that match the semantic model"
|
||||
```
|
||||
|
||||
When a test fails or error occurs:
|
||||
1. **Understand the root cause** - Why did it fail?
|
||||
2. **Trace to spec or architecture** - Is the spec incomplete? Is the design wrong?
|
||||
3. **Fix at the source** - Update spec, redesign module, or add proper handling
|
||||
4. **Never suppress** - Errors are signals, not noise
|
||||
|
||||
### 2. No Isolated Patches
|
||||
|
||||
```
|
||||
❌ WRONG: "Add special case for this one sample"
|
||||
❌ WRONG: "Hardcode this value to make test pass"
|
||||
❌ WRONG: "Add if-statement to handle user X's edge case"
|
||||
|
||||
✅ RIGHT: "This pattern appears in 3 samples - generalize the solution"
|
||||
✅ RIGHT: "Update the grammar to properly handle this construct"
|
||||
✅ RIGHT: "Refactor to use a consistent approach across all cases"
|
||||
```
|
||||
|
||||
When fixing issues:
|
||||
1. **Look for patterns** - Is this a one-off or systemic?
|
||||
2. **Generalize the fix** - Solution should work for all similar cases
|
||||
3. **Update tests** - Add samples that would catch similar issues
|
||||
4. **Refactor if needed** - Don't bolt-on, integrate properly
|
||||
|
||||
### 3. Production-Ready Code Only
|
||||
|
||||
```
|
||||
❌ WRONG: "// TODO: handle this later"
|
||||
❌ WRONG: "console.log('debug:', value)"
|
||||
❌ WRONG: "function parse(x: any): any"
|
||||
|
||||
✅ RIGHT: Complete implementation with all cases handled
|
||||
✅ RIGHT: Proper error types with descriptive messages
|
||||
✅ RIGHT: Full type safety with strict TypeScript
|
||||
```
|
||||
|
||||
Every compiler module must have:
|
||||
- **Strict TypeScript** - `strict: true`, `noUncheckedIndexedAccess: true`
|
||||
- **Complete error handling** - All error paths with useful messages
|
||||
- **No TODOs** - Every feature fully implemented
|
||||
- **No debug code** - Clean, production-ready output
|
||||
- **Proper exports** - Well-defined public API
|
||||
|
||||
### 4. Architecture Changes When Needed
|
||||
|
||||
```
|
||||
❌ WRONG: "Add another parameter to this 10-parameter function"
|
||||
❌ WRONG: "Nest another if-statement in this 200-line function"
|
||||
❌ WRONG: "Keep the broken design, just work around it"
|
||||
|
||||
✅ RIGHT: "The scanner needs a state machine - redesign it"
|
||||
✅ RIGHT: "Split this into separate passes for clarity"
|
||||
✅ RIGHT: "The AST structure doesn't match semantics - restructure"
|
||||
```
|
||||
|
||||
Signs you need architecture change:
|
||||
- **Same bug keeps appearing** in different forms
|
||||
- **Adding features is painful** - too many special cases
|
||||
- **Code is hard to understand** - even for the builder agent
|
||||
- **Tests are brittle** - small changes break many tests
|
||||
|
||||
When architecture needs changing:
|
||||
1. **Spawn Architect Task** to design new structure
|
||||
2. **Rebuild affected modules** from scratch (don't patch)
|
||||
3. **Migrate tests** to new structure
|
||||
4. **Verify all samples** still pass
|
||||
|
||||
### Quality Gate Checklist
|
||||
|
||||
Before marking an iteration complete:
|
||||
|
||||
```
|
||||
[ ] All compiler code compiles with strict TypeScript
|
||||
[ ] No 'any' types (except unavoidable external interfaces)
|
||||
[ ] No suppressed errors or warnings
|
||||
[ ] No TODO comments
|
||||
[ ] No console.log/debug statements
|
||||
[ ] All error paths have descriptive messages
|
||||
[ ] Code handles all spec-defined constructs
|
||||
[ ] No isolated patches - all fixes are generalized
|
||||
[ ] Architecture supports future spec evolution
|
||||
```
|
||||
|
||||
### Reflector Task for Quality Issues
|
||||
|
||||
When quality problems are detected, spawn a Reflector:
|
||||
|
||||
```
|
||||
Task: "Review this compiler module for quality issues:
|
||||
|
||||
CODE:
|
||||
{module_code}
|
||||
|
||||
SPEC:
|
||||
{spec}
|
||||
|
||||
Check for:
|
||||
1. Bypassed errors (try-catch hiding issues)
|
||||
2. Isolated patches (special cases that should be generalized)
|
||||
3. Missing type safety
|
||||
4. Incomplete error handling
|
||||
5. Architecture problems
|
||||
|
||||
Return JSON:
|
||||
{
|
||||
'quality_score': 0-100,
|
||||
'issues': [
|
||||
{
|
||||
'severity': 'critical|major|minor',
|
||||
'type': 'bypass|patch|types|errors|architecture',
|
||||
'location': 'line or function name',
|
||||
'problem': 'description',
|
||||
'solution': 'how to fix properly'
|
||||
}
|
||||
],
|
||||
'needs_redesign': true/false,
|
||||
'redesign_reason': 'why architecture change needed'
|
||||
}"
|
||||
```
|
||||
|
||||
### Iteration Quality Standards
|
||||
|
||||
| Metric | Minimum | Target |
|
||||
|--------|---------|--------|
|
||||
| TypeScript strict compliance | 100% | 100% |
|
||||
| Test pass rate | 90% | 98%+ |
|
||||
| Code quality score | 80 | 95+ |
|
||||
| Isolated patches | 0 | 0 |
|
||||
| Bypassed errors | 0 | 0 |
|
||||
| TODO comments | 0 | 0 |
|
||||
|
||||
**Do not proceed to next iteration if minimums not met.**
|
||||
Reference in New Issue
Block a user