Introduction
The Challenge: I needed to build a 38-application Business Central ecosystem to validate an architectural vision. Not a toy demo – a complete, production-quality implementation with proper testing, integration patterns, and architectural best practices.
The Approach: Instead of traditional development, I used Claude Code in long-running agentic mode, orchestrating AI-driven development across 700+ AL files spanning 19 production applications (with 19 companion test app structures for future test implementation).
The Result: A proven three-phase workflow (Building → Reviewing → Improving) that successfully generated a complex, interconnected Business Central ecosystem while maintaining quality and architectural integrity.
This isn’t theoretical – it’s the real-world pattern I used to build at scale with AI assistance.
Prerequisites
If You Want to Apply These Patterns:
- Access to an AI coding assistant with long-running conversation capability (Claude Code, GitHub Copilot, etc.)
- Ideally, access to multiple model tiers (planning models vs execution models)
- A complex, multi-module project requiring systematic development
- Comfort with PowerShell or scripting for orchestration
- Understanding that different models behave differently – more on this below
Background Knowledge Helpful:
- Experience with large-scale software development
- Understanding of dependency management
- Familiarity with code review practices
- Working knowledge of JSON and basic scripting
The Challenge: Building 38 Applications with AI
Project Scope
NubimancyPencilSketch – A complete Business Central ecosystem:
- 19 Production applications (4 AppSource-style foundation apps + 15 PTE extensions)
- 19 Test application structures (companion apps ready for future test code implementation)
- 108 modules across 20 extensions
- Object ranges 50,000-83,799
- Five distinct hero businesses with cross-integration requirements
Note on Test Code: The test application structures exist, but test code implementation is planned for a later phase. I wanted the foundation stable first – including naming conventions, object structures, and anything that might trigger refactoring – before investing in comprehensive test code. This avoids the pain of refactoring tests when core objects change.
The Problem
Traditional development approaches don’t work well with AI agents because:
- Context drift: Agents lose track in long sessions
- Dependency confusion: Without structure, agents skip prerequisites
- Quality inconsistency: No systematic review means missed patterns
- Progress opacity: Hard to know what’s done and what remains
- Model gravity: Even with crystal-clear instructions, models have strong behavioral tendencies that pull against your guidance
The Model Gravity Challenge
One of the most interesting discoveries during this project: models and system prompts exert a gravity-like effect on outcomes. Even with comprehensive, explicit instructions, the underlying model can persistently do things that contradict those instructions.
For example, I had clear naming conventions documented, but the model would repeatedly generate different filename patterns. It wasn’t ignoring the instructions – it was being pulled by stronger patterns baked into the model’s training or system prompt.
This means your orchestration system needs to:
- Be even more explicit than you think necessary
- Include validation steps to catch drift
- Accept that some model behaviors require multiple correction passes
- Design review phases that catch systematic deviations
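One lightweight validation layer for the filename-drift example is a convention checker that runs between sessions. The sketch below is Python rather than the PowerShell used in the project, and the naming pattern (`ObjectName.ObjectType.al`) is a hypothetical stand-in for whatever convention your instructions mandate:

```python
import re
from pathlib import Path

# Hypothetical convention: "ObjectName.ObjectType.al" (e.g. Cust.Table.al).
# Substitute your own documented pattern here.
NAME_PATTERN = re.compile(r"^[A-Z][A-Za-z0-9]*\.(Table|Page|Codeunit|Enum|Report)\.al$")

def check_filenames(root: str) -> list[str]:
    """Return files that drifted from the naming convention, so a
    correction pass can be scheduled rather than trusting a single fix."""
    return [
        f.name for f in Path(root).rglob("*.al")
        if not NAME_PATTERN.match(f.name)
    ]
```

Running this after every building session turns "the model quietly reverted my naming fix" into a visible, queued-up correction task.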
The Model Selection Strategy
One of the most effective patterns I used: different models for different types of work.
Planning and Instruction Creation – Claude Opus 4.5 and 4.6:
- Used planning modes with richer thinking models
- Built comprehensive instructions and architectural guidelines
- Made critical design decisions and established patterns
- Defined the “rules of the game” that would guide implementation
Mechanical Implementation – Claude Sonnet or Haiku:
- Executed well-defined tasks where decision-making was less essential
- Followed established patterns and instructions
- Generated code based on clear specifications
- Much faster execution for straightforward implementation work
Parallel Task Execution:
Once instructions were clear and dependencies mapped, I could launch multiple implementation tasks simultaneously. The first Building pass took only about 2 hours total – remarkably fast for generating the initial structure of 19 applications.
The key insight: expensive thinking models for planning and decision-making, faster models for execution. This dramatically improves both cost and speed.
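The routing decision itself can be trivially simple. This is a hypothetical sketch, not an actual API: the tier names and task types are invented for illustration, and the mapping is the whole idea:

```python
# Hypothetical routing table: task categories to model tiers.
# Names are illustrative, not real API identifiers.
MODEL_TIERS = {
    "planning": "opus",          # architectural decisions, instruction writing
    "implementation": "sonnet",  # well-specified code generation
    "mechanical": "haiku",       # renames, boilerplate, formatting passes
}

def pick_model(task_type: str) -> str:
    """Return the model tier for a task, defaulting to the cheapest."""
    return MODEL_TIERS.get(task_type, "haiku")
```

The default-to-cheapest fallback encodes the cost discipline: a task only gets an expensive model when someone deliberately classified it as needing one.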
The Solution
A three-phase agentic orchestration system with structured instructions, progress tracking, dependency management, and validation checkpoints to counteract model gravity.
Phase 1: Building – Generate the Implementation
The Concept
Create detailed specifications for each module, then let agents implement them systematically while respecting dependencies.
Key Components
1. Comprehensive Instructions
Detailed agent guidelines covering:
- Architectural standards (naming, namespaces, object ranges)
- Hero business contexts (why each application exists)
- Integration patterns (how apps connect)
- Development constraints (what to avoid)
Why This Matters: Without clear instructions, agents make inconsistent decisions. Comprehensive guidance minimizes drift.
2. Machine-Readable Progress Tracking
JSON files tracking phase completion and dependencies.
Why This Matters: Agents need state management. JSON provides resumable, queryable progress.
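The article doesn't reproduce the tracking files themselves, but a hypothetical shape for one module's progress file might look like this (module names, phase names, and keys are all invented for illustration):

```json
{
  "module": "SalesLoyalty",
  "dependsOn": ["BaseFoundation", "CustomerCore"],
  "phases": {
    "scaffold": "complete",
    "tables": "complete",
    "pages": "in-progress",
    "integration": "pending"
  },
  "lastUpdated": "2025-01-15"
}
```

The important property is that an agent or a script can load this file, see exactly what remains, and resume without re-reading prior conversation history.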
3. Dependency-Aware Task Identification
PowerShell scripts identify next actionable tasks by finding incomplete phases, checking dependency completion, and outputting “Next Actionable Tasks” lists.
Why This Matters: Prevents wasted effort on blocked tasks. Enables agents to self-direct.
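The actual project used PowerShell for this; a Python sketch of the same logic, assuming one JSON progress file per module with hypothetical `dependsOn` and `phases` keys, might look like:

```python
import json
from pathlib import Path

def next_actionable(progress_dir: str) -> list[str]:
    """Return modules whose dependencies are all complete but which
    still have incomplete phases. File layout is hypothetical:
    one <module>.json per module in progress_dir."""
    modules = {}
    for f in Path(progress_dir).glob("*.json"):
        modules[f.stem] = json.loads(f.read_text())

    def is_done(name: str) -> bool:
        m = modules.get(name)
        return m is not None and all(
            status == "complete" for status in m["phases"].values()
        )

    actionable = []
    for name, m in modules.items():
        if is_done(name):
            continue  # nothing left to do here
        if all(is_done(dep) for dep in m.get("dependsOn", [])):
            actionable.append(name)
    return actionable
```

An agent given only this output can pick up work without ever considering a blocked task.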
4. Organized Specifications
- AppSource Foundation: Base app specifications
- Hero PTEs: Extension specs for each hero business
- Each phase has markdown spec with business context and technical requirements
Real Results
- 114 implementation phases completed
- Systematic progression through dependencies
- Consistent architectural patterns across all apps
- Complete traceability of what was built and when
- Initial building pass completed in approximately 2 hours using parallel task execution
Phase 2: Reviewing – Comprehensive Quality Assurance
The Concept
Don’t review code line-by-line. Use pattern-based detection across 9 architectural categories to identify systematic improvements.
The Nine Review Dimensions
- Naming – Object/field/file naming compliance with standards
- Data Model – Enum vs Text, Setup tables vs hardcoded values
- Business Logic – Magic numbers, hardcoded thresholds, business rules
- Performance – SetLoadFields, SIFT, CalcFields optimization
- Security – Permission sets, data classification, user access
- Error Handling – ErrorInfo patterns, suggested actions, user guidance
- Extensibility – Events, interfaces, labels for translation
- UX – Page design, FactBoxes, promoted actions, user experience
- Sustainability – Code comments, documentation, tooltips, maintainability
The Power of Pattern-Based Review
Traditional Approach: Read every line, review everything
Problem: Token explosion, context loss, inconsistent thoroughness
Pattern-Based Approach: Search for specific patterns
Benefits:
- Scalable to large codebases
- Consistent detection across modules
- Token-efficient (grep searches, not full file reads)
- Parallelizable across categories
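A grep-style scanner captures the idea. The sketch below is Python with two invented example patterns (the real review used nine categories with far richer AL-specific rules); what matters is that it reports file-and-line hits without ever feeding whole files into a model's context:

```python
import re
from pathlib import Path

# Hypothetical per-category patterns; real AL review rules would be richer.
REVIEW_PATTERNS = {
    "business-logic": re.compile(r"\b(if|until)\b[^;\n]*\b\d{2,}\b"),  # magic numbers
    "performance": re.compile(r"\.FindSet\(\)", re.IGNORECASE),  # SetLoadFields candidates
}

def scan(root: str) -> dict[str, list[tuple[str, int]]]:
    """Grep-style scan: returns {category: [(filename, line_no), ...]}
    so findings can be reviewed and fixed per category."""
    findings = {cat: [] for cat in REVIEW_PATTERNS}
    for f in Path(root).rglob("*.al"):
        for i, line in enumerate(f.read_text(errors="ignore").splitlines(), 1):
            for cat, pat in REVIEW_PATTERNS.items():
                if pat.search(line):
                    findings[cat].append((f.name, i))
    return findings
```

Because each category is an independent pattern set, categories can be scanned (and later fixed) in parallel.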
Real Results
- 108 modules reviewed across 9 dimensions = 972 category reviews
- Systematic pattern detection across entire codebase
- Structured findings enabling targeted improvements
- Complete audit trail of quality assessment
Phase 3: Improving – Systematic Quality Enhancement
The Concept
Use review findings to drive category-by-category improvements with clear tracking and logical commit boundaries.
Key Components
- Improvement Instructions: Guidelines for applying fixes
- Category-Level Progress: JSON tracking which categories are complete
- Progress Reporting: Scripts showing completion statistics
- Special Phases: Complex improvements need inventory and implementation guides
The Improvement Workflow
- Select Category: Use progress script to identify next category
- Read Findings: Review reports from Phase 2 for that category
- Apply Fixes: Use file edit tools to implement improvements
- Batch Commits: Logical groupings (e.g., all naming fixes, all performance optimizations)
- Update Progress: Mark category completion in JSON
- Verify: Quick build to ensure no breaking changes
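The "Update Progress" step can be a one-function script. This Python sketch assumes a hypothetical review-progress file with a `categories` map; the file shape is invented for illustration:

```python
import json
from pathlib import Path

def mark_category_complete(progress_file: str, category: str) -> dict:
    """Flip one review category to complete and report overall stats.
    The JSON shape ({"categories": {name: status}}) is hypothetical."""
    path = Path(progress_file)
    state = json.loads(path.read_text())
    state["categories"][category] = "complete"
    path.write_text(json.dumps(state, indent=2))
    done = sum(1 for s in state["categories"].values() if s == "complete")
    total = len(state["categories"])
    return {"done": done, "total": total, "pct": round(100 * done / total)}
```

Returning the completion statistics at the same time gives the human operator the progress-reporting view for free.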
Real Results
- Systematic improvement across all quality dimensions
- Clear commit history showing category-based improvements
- Measurable progress tracking
- No regressions due to structured approach
The Orchestration Pattern: Key Principles
1. Stateful Progress Tracking
Problem: Agents forget what’s been done between sessions
Solution: JSON files provide resumable, queryable state
2. Instruction-Driven Development
Problem: Agent drift and inconsistent decisions
Solution: Comprehensive markdown instructions minimize variations
3. Dependency Management
Problem: Agents work on blocked tasks
Solution: Scripts identify only unblocked, actionable work
4. Pattern-Based Detection
Problem: Line-by-line review doesn’t scale
Solution: Grep searches for specific patterns across entire codebase
5. Parallel Work Streams
Problem: Sequential work is slow
Solution: Independent categories can progress simultaneously
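Since independent categories share no state, fanning them out is straightforward. In this Python sketch the `review_category` body is a placeholder; in practice it would launch an agent session or a pattern scan for that category:

```python
from concurrent.futures import ThreadPoolExecutor

def review_category(category: str) -> str:
    """Placeholder for real per-category work (agent session, scan, etc.)."""
    return f"{category}: reviewed"

def run_parallel(categories: list[str]) -> list[str]:
    """Independent categories have no shared state, so they can run
    concurrently instead of one after another."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        return list(pool.map(review_category, categories))
```

`Executor.map` preserves input order, so the results line up with the category list even though the work itself ran concurrently.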
6. Structured Outputs
Problem: Hard to track what agents produced
Solution: Organized folders, consistent report formats, clear naming
Lessons Learned: What Works, What Doesn’t
✅ What Works
Strategic Model Selection
- Use richer thinking models (Opus) for planning and instruction creation
- Use faster models (Sonnet, Haiku) for mechanical implementation
- Match model capability to task complexity
- Parallel task execution when dependencies allow
Clear, Comprehensive Instructions
- Agents need context about why, not just what
- Hero business contexts prevent generic implementations
- Architectural constraints keep implementations consistent
Machine-Readable Progress
- JSON enables agents to self-direct
- PowerShell scripts provide human-readable views
- Clear state management prevents duplicate work
Pattern-Based Quality Checks
- Grep searches scale better than full file reads
- Consistent patterns enable systematic improvements
- Category-based reviews are parallelizable
Logical Commit Boundaries
- Category-based commits make sense in git history
- Easy to review changes by quality dimension
- Clear audit trail for improvements
❌ What Doesn’t Work
Vague Instructions
- “Make it better” leads to arbitrary decisions
- Agents need specific guidance and constraints
- Generic patterns don’t tell business stories
Underestimating Model Gravity
- Even crystal-clear instructions can be overridden by model tendencies
- Single-pass corrections often revert in later sessions
- Need validation layers to catch systematic drift
- Different models have different “gravity wells” – what works with one may not work with another
Over-Ambitious Single Sessions
- Trying to do too much causes context loss
- Better to do one category thoroughly than many superficially
- Break work into agent-sized chunks
Line-by-Line Code Review
- Doesn’t scale to 700+ files
- Token explosion kills context (and budgets!)
- Pattern-based detection is more effective
Ignoring Dependencies
- Wasted effort on blocked work
- Rework when prerequisites change
- Dependency-aware scheduling is essential
Real-World Application Beyond Business Central
These patterns apply to any large-scale software development:
Web Application Development
- Building: Component-by-component implementation
- Reviewing: Accessibility, performance, security patterns
- Improving: Category-based enhancements
Infrastructure as Code
- Building: Module-by-module Terraform/ARM templates
- Reviewing: Security compliance, cost optimization, naming
- Improving: Systematic hardening and optimization
API Development
- Building: Endpoint-by-endpoint implementation
- Reviewing: OpenAPI compliance, error handling, authentication
- Improving: Documentation and response optimization
The Common Thread
Any project that combines the following can benefit from this orchestration pattern:
- Multiple modules with dependencies
- Quality standards across dimensions
- Need for systematic review and improvement
Getting Started with Long-Running Agents
Start Small
- Pick ONE module or component
- Write comprehensive instructions
- Create simple JSON progress tracking
- Have agent implement following instructions
- Review and refine instructions
Scale Up
- Add dependency management
- Create progress reporting scripts
- Expand to multiple modules
- Introduce category-based reviews
- Implement improvement workflows
- Consider model selection strategy (planning vs execution models)
- Leverage parallel task execution where dependencies allow
Refine
- Watch for agent drift – strengthen instructions
- Monitor for repeated questions – add to docs
- Track quality issues – add to review categories
- Observe model-specific behaviors – some things require multiple correction passes
- Measure effectiveness – adjust workflows
Conclusion
What We’ve Accomplished
- Built 38 applications (700+ files) using agentic orchestration
- Validated the pattern for long-running AI-driven development
- Created reusable workflows applicable to any large-scale project
- Demonstrated practical AI beyond simple code completion
The Key Insight
AI agents are incredibly powerful when given:
- ✅ Clear instructions with business context
- ✅ Structured progress tracking
- ✅ Dependency-aware task management
- ✅ Pattern-based quality frameworks
- ✅ Systematic improvement workflows
- ✅ Validation checkpoints to counteract model gravity
- ✅ Strategic model selection (planning models vs execution models)
- ✅ Parallel task execution where appropriate
Without this structure, they drift. With it, they scale.
A note on tools and models: I used Claude Code for this project, leveraging different Claude models strategically – Opus 4.5/4.6 for planning and instruction creation, Sonnet or Haiku for mechanical implementation. This model selection strategy, combined with parallel task execution, enabled remarkably fast development (initial building pass in ~2 hours). Different AI coding assistants have different strengths – choose based on your project needs and be prepared to work with each model’s particular behaviors.
Next Steps for You
- Try It: Start with one module and comprehensive instructions
- Measure: Track what works and what causes drift
- Iterate: Refine your orchestration based on results
- Scale: Expand to larger projects with confidence
Want to See the Results?
Curious what this orchestration actually built? Check out The Pencil Sketch: Building the Destination First to see the complete 38-application Business Central ecosystem that proves the vision.
Resources
- NubimancyPencilSketch Repository: github.com/Nubimancy/PencilSketch
- Orchestration Scripts: PowerShell examples available in the project's .docs/ folder
- Instruction Templates: Build, Review, and Improve instruction patterns
Connect
Have questions about implementing these patterns? Want to share your own agentic orchestration experiences?
- LinkedIn: jeremyvyska
- BlueSky: @bc.jeremy.vyska.info
- Blog: jeremyvyska.com (here)
This development approach was validated building the Nubimancy educational ecosystem – where fantasy storytelling meets real Business Central development. Learn more at nubimancy.com.

