Building at Scale: Long-Running AI Agents for Real Development

Photo by Glenn Carstens-Peters on Unsplash

Introduction

The Challenge: I needed to build a 38-application Business Central ecosystem to validate an architectural vision. Not a toy demo – a complete, production-quality implementation with proper testing, integration patterns, and architectural best practices.

The Approach: Instead of traditional development, I used Claude Code in long-running agentic mode, orchestrating AI-driven development across 700+ AL files spanning 19 production applications (with 19 companion test app structures for future test implementation).

The Result: A proven three-phase workflow (Building → Reviewing → Improving) that successfully generated a complex, interconnected Business Central ecosystem while maintaining quality and architectural integrity.

This isn’t theoretical – it’s the real-world pattern I used to build at scale with AI assistance.

Prerequisites

If You Want to Apply These Patterns:

  • Access to an AI coding assistant with long-running conversation capability (Claude Code, GitHub Copilot, etc.)
  • Ideally, access to multiple model tiers (planning models vs execution models)
  • A complex, multi-module project requiring systematic development
  • Comfort with PowerShell or scripting for orchestration
  • Understanding that different models behave differently – more on this below

Background Knowledge Helpful:

  • Experience with large-scale software development
  • Understanding of dependency management
  • Familiarity with code review practices
  • Working knowledge of JSON and basic scripting

The Challenge: Building 38 Applications with AI

Project Scope

NubimancyPencilSketch – A complete Business Central ecosystem:

  • 19 Production applications (4 AppSource-style foundation apps + 15 PTE extensions)
  • 19 Test application structures (companion apps ready for future test code implementation)
  • 108 modules across 20 extensions
  • Object ranges 50,000-83,799
  • Five distinct hero businesses with cross-integration requirements

Note on Test Code: The test application structures exist, but test code implementation is planned for a later phase. I wanted the foundation stable first – including naming conventions, object structures, and anything that might trigger refactoring – before investing in comprehensive test code. This avoids the pain of refactoring tests when core objects change.

The Problem

Traditional development approaches don’t work well with AI agents because:

  • Context drift: Agents lose track in long sessions
  • Dependency confusion: Without structure, agents skip prerequisites
  • Quality inconsistency: No systematic review means missed patterns
  • Progress opacity: Hard to know what’s done and what remains
  • Model gravity: Even with crystal-clear instructions, models have strong behavioral tendencies that pull against your guidance

The Model Gravity Challenge

One of the most interesting discoveries during this project: models and system prompts exert a gravity-like effect on outcomes. Even with comprehensive, explicit instructions, the underlying model can persistently do things that contradict those instructions.

For example, I had clear naming conventions documented, but the model would repeatedly generate different filename patterns. It wasn’t ignoring the instructions – it was being pulled by stronger patterns baked into the model’s training or system prompt.

This means your orchestration system needs to:

  • Be even more explicit than you think necessary
  • Include validation steps to catch drift
  • Accept that some model behaviors require multiple correction passes
  • Design review phases that catch systematic deviations

The Model Selection Strategy

One of the most effective patterns I used: different models for different types of work.

Planning and Instruction Creation – Claude Opus 4.5 and 4.6:

  • Used planning modes with richer thinking models
  • Built comprehensive instructions and architectural guidelines
  • Made critical design decisions and established patterns
  • Defined the “rules of the game” that would guide implementation

Mechanical Implementation – Claude Sonnet or Haiku:

  • Executed well-defined tasks where decision-making was less essential
  • Followed established patterns and instructions
  • Generated code based on clear specifications
  • Much faster execution for straightforward implementation work

Parallel Task Execution:
Once instructions were clear and dependencies mapped, I could launch multiple implementation tasks simultaneously. The first Building pass took only about 2 hours total – remarkably fast for generating the initial structure of 19 applications.
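The parallel launch can be sketched in a few lines. This is an illustrative Python sketch, not the project's actual tooling (the project used PowerShell and Claude Code tasks); the task names and the `implement` placeholder are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def implement(task: str) -> str:
    # Placeholder for handing one well-specified task to a fast execution model.
    return f"{task}: done"

# Tasks with no dependencies on each other can run simultaneously.
unblocked = ["hero-sales-app", "hero-inventory-app", "hero-service-app"]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map preserves input order, so results line up with the task list.
    results = list(pool.map(implement, unblocked))

print(results)
```

The important constraint is that only mutually independent, unblocked tasks go into the pool – anything with an unmet dependency waits for a later batch.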

The key insight: expensive thinking models for planning and decision-making, faster models for execution. This dramatically improves both cost and speed.

The Solution

A three-phase agentic orchestration system with structured instructions, progress tracking, dependency management, and validation checkpoints to counteract model gravity.


Phase 1: Building – Generate the Implementation

The Concept

Create detailed specifications for each module, then let agents implement them systematically while respecting dependencies.

Key Components

1. Comprehensive Instructions

Detailed agent guidelines covering:

  • Architectural standards (naming, namespaces, object ranges)
  • Hero business contexts (why each application exists)
  • Integration patterns (how apps connect)
  • Development constraints (what to avoid)

Why This Matters: Without clear instructions, agents make inconsistent decisions. Comprehensive guidance minimizes drift.

2. Machine-Readable Progress Tracking

JSON files tracking phase completion and dependencies.

Why This Matters: Agents need state management. JSON provides resumable, queryable progress.
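A minimal progress file might look like the following. The schema and names here are illustrative, not the project's actual format:

```json
{
  "phases": [
    {
      "id": "foundation-core",
      "app": "Foundation.Core",
      "status": "complete",
      "dependsOn": []
    },
    {
      "id": "hero-sales",
      "app": "Hero.Sales",
      "status": "pending",
      "dependsOn": ["foundation-core"]
    }
  ]
}
```

An agent resuming a session can read this file, see exactly what is done, and pick up the next unblocked phase without re-deriving state from the codebase.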

3. Dependency-Aware Task Identification

PowerShell scripts identify next actionable tasks by finding incomplete phases, checking dependency completion, and outputting “Next Actionable Tasks” lists.

Why This Matters: Prevents wasted effort on blocked tasks. Enables agents to self-direct.
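The project implemented this check in PowerShell; here is a Python sketch of the same logic, using an illustrative JSON schema (field names are assumptions, not the project's actual format):

```python
import json

def next_actionable(progress_json: str) -> list[str]:
    """Return IDs of phases that are incomplete and whose dependencies are all complete."""
    phases = json.loads(progress_json)["phases"]
    done = {p["id"] for p in phases if p["status"] == "complete"}
    return [
        p["id"]
        for p in phases
        if p["status"] != "complete" and all(d in done for d in p["dependsOn"])
    ]

progress = """
{
  "phases": [
    {"id": "foundation-core",  "status": "complete", "dependsOn": []},
    {"id": "hero-sales",       "status": "pending",  "dependsOn": ["foundation-core"]},
    {"id": "hero-integration", "status": "pending",  "dependsOn": ["hero-sales"]}
  ]
}
"""
print(next_actionable(progress))  # only hero-sales is unblocked
```

Here `hero-integration` stays off the list until `hero-sales` is marked complete – exactly the behavior that keeps agents from starting blocked work.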

4. Organized Specifications

  • AppSource Foundation: Base app specifications
  • Hero PTEs: Extension specs for each hero business
  • Each phase has markdown spec with business context and technical requirements

Real Results

  • 114 implementation phases completed
  • Systematic progression through dependencies
  • Consistent architectural patterns across all apps
  • Complete traceability of what was built and when
  • Initial building pass completed in approximately 2 hours using parallel task execution

Phase 2: Reviewing – Comprehensive Quality Assurance

The Concept

Don’t review code line-by-line. Use pattern-based detection across 9 architectural categories to identify systematic improvements.

The Nine Review Dimensions

  1. Naming – Object/field/file naming compliance with standards
  2. Data Model – Enum vs Text, Setup tables vs hardcoded values
  3. Business Logic – Magic numbers, hardcoded thresholds, business rules
  4. Performance – SetLoadFields, SIFT, CalcFields optimization
  5. Security – Permission sets, data classification, user access
  6. Error Handling – ErrorInfo patterns, suggested actions, user guidance
  7. Extensibility – Events, interfaces, labels for translation
  8. UX – Page design, FactBoxes, promoted actions, user experience
  9. Sustainability – Code comments, documentation, tooltips, maintainability

The Power of Pattern-Based Review

Traditional Approach: Read every line, review everything
Problem: Token explosion, context loss, inconsistent thoroughness

Pattern-Based Approach: Search for specific patterns
Benefits:

  • Scalable to large codebases
  • Consistent detection across modules
  • Token-efficient (grep searches, not full file reads)
  • Parallelizable across categories
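The review phase relied on grep-style searches rather than full file reads. A Python sketch of the idea – the two patterns below are simplified stand-ins, not the project's actual detection rules:

```python
import re

# Illustrative patterns only -- the real review used many more per category.
PATTERNS = {
    "Business Logic: magic number": re.compile(r":=\s*\d{2,}\s*;"),
    "Performance: FindSet (verify SetLoadFields precedes it)": re.compile(r"\.FindSet\("),
}

def review(source: str) -> list[tuple[str, int, str]]:
    """Return (category, line number, offending line) for every pattern hit."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((category, lineno, line.strip()))
    return findings

al_snippet = """
procedure PostOrder()
begin
    MaxRetries := 500;
    Customer.FindSet();
end;
"""
for category, lineno, line in review(al_snippet):
    print(f"Line {lineno}: {category}: {line}")
```

Because each category is just a pattern set, the nine dimensions can be scanned independently – which is what makes the review parallelizable and token-efficient.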

Real Results

  • 108 modules reviewed across 9 dimensions = 972 category reviews
  • Systematic pattern detection across entire codebase
  • Structured findings enabling targeted improvements
  • Complete audit trail of quality assessment

Phase 3: Improving – Systematic Quality Enhancement

The Concept

Use review findings to drive category-by-category improvements with clear tracking and logical commit boundaries.

Key Components

  1. Improvement Instructions: Guidelines for applying fixes
  2. Category-Level Progress: JSON tracking which categories are complete
  3. Progress Reporting: Scripts showing completion statistics
  4. Special Phases: Complex improvements need inventory and implementation guides

The Improvement Workflow

  1. Select Category: Use progress script to identify next category
  2. Read Findings: Review reports from Phase 2 for that category
  3. Apply Fixes: Use file edit tools to implement improvements
  4. Batch Commits: Logical groupings (e.g., all naming fixes, all performance optimizations)
  5. Update Progress: Mark category completion in JSON
  6. Verify: Quick build to ensure no breaking changes
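Steps 1 and 5 of the workflow above can be sketched as simple operations on a progress structure. This is an illustrative Python version (the project used PowerShell and JSON files; module and category names here are made up):

```python
def mark_complete(progress: dict, module: str, category: str) -> dict:
    """Mark one review category done for a module and return the updated dict."""
    progress[module][category] = "complete"
    return progress

def completion_stats(progress: dict) -> str:
    """Summarize how many module/category pairs are finished."""
    total = sum(len(cats) for cats in progress.values())
    done = sum(1 for cats in progress.values() for s in cats.values() if s == "complete")
    return f"{done}/{total} category reviews complete"

# Illustrative module/category names.
progress = {
    "Foundation.Core": {"Naming": "complete", "Performance": "pending"},
    "Hero.Sales":      {"Naming": "pending",  "Performance": "pending"},
}
mark_complete(progress, "Foundation.Core", "Performance")
print(completion_stats(progress))  # 2/4 category reviews complete
```

Persisting this structure to JSON after each category gives the "measurable progress tracking" the results below describe.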

Real Results

  • Systematic improvement across all quality dimensions
  • Clear commit history showing category-based improvements
  • Measurable progress tracking
  • No regressions due to structured approach

The Orchestration Pattern: Key Principles

1. Stateful Progress Tracking

Problem: Agents forget what’s been done between sessions
Solution: JSON files provide resumable, queryable state

2. Instruction-Driven Development

Problem: Agent drift and inconsistent decisions
Solution: Comprehensive markdown instructions minimize variations

3. Dependency Management

Problem: Agents work on blocked tasks
Solution: Scripts identify only unblocked, actionable work

4. Pattern-Based Detection

Problem: Line-by-line review doesn’t scale
Solution: Grep searches for specific patterns across entire codebase

5. Parallel Work Streams

Problem: Sequential work is slow
Solution: Independent categories can progress simultaneously

6. Structured Outputs

Problem: Hard to track what agents produced
Solution: Organized folders, consistent report formats, clear naming


Lessons Learned: What Works, What Doesn’t

✅ What Works

Strategic Model Selection

  • Use richer thinking models (Opus) for planning and instruction creation
  • Use faster models (Sonnet, Haiku) for mechanical implementation
  • Match model capability to task complexity
  • Parallel task execution when dependencies allow

Clear, Comprehensive Instructions

  • Agents need context about why, not just what
  • Hero business contexts prevent generic implementations
  • Architectural constraints keep implementations consistent

Machine-Readable Progress

  • JSON enables agents to self-direct
  • PowerShell scripts provide human-readable views
  • Clear state management prevents duplicate work

Pattern-Based Quality Checks

  • Grep searches scale better than full file reads
  • Consistent patterns enable systematic improvements
  • Category-based reviews are parallelizable

Logical Commit Boundaries

  • Category-based commits make sense in git history
  • Easy to review changes by quality dimension
  • Clear audit trail for improvements

❌ What Doesn’t Work

Vague Instructions

  • “Make it better” leads to arbitrary decisions
  • Agents need specific guidance and constraints
  • Generic patterns don’t tell business stories

Underestimating Model Gravity

  • Even crystal-clear instructions can be overridden by model tendencies
  • Single-pass corrections often revert in later sessions
  • Need validation layers to catch systematic drift
  • Different models have different “gravity wells” – what works with one may not work with another

Over-Ambitious Single Sessions

  • Trying to do too much causes context loss
  • Better to do one category thoroughly than many superficially
  • Break work into agent-sized chunks

Line-by-Line Code Review

  • Doesn’t scale to 700+ files
  • Token explosion kills context (and budgets!)
  • Pattern-based detection is more effective

Ignoring Dependencies

  • Wasted effort on blocked work
  • Rework when prerequisites change
  • Dependency-aware scheduling is essential

Real-World Application Beyond Business Central

These patterns apply to any large-scale software development:

Web Application Development

  • Building: Component-by-component implementation
  • Reviewing: Accessibility, performance, security patterns
  • Improving: Category-based enhancements

Infrastructure as Code

  • Building: Module-by-module Terraform/ARM templates
  • Reviewing: Security compliance, cost optimization, naming
  • Improving: Systematic hardening and optimization

API Development

  • Building: Endpoint-by-endpoint implementation
  • Reviewing: OpenAPI compliance, error handling, authentication
  • Improving: Documentation and response optimization

The Common Thread

Any project with the following can apply the Building → Reviewing → Improving workflow:

  • Multiple modules with dependencies
  • Quality standards across dimensions
  • Need for systematic review and improvement

Getting Started with Long-Running Agents

Start Small

  1. Pick ONE module or component
  2. Write comprehensive instructions
  3. Create simple JSON progress tracking
  4. Have agent implement following instructions
  5. Review and refine instructions

Scale Up

  1. Add dependency management
  2. Create progress reporting scripts
  3. Expand to multiple modules
  4. Introduce category-based reviews
  5. Implement improvement workflows
  6. Consider model selection strategy (planning vs execution models)
  7. Leverage parallel task execution where dependencies allow

Refine

  1. Watch for agent drift – strengthen instructions
  2. Monitor for repeated questions – add to docs
  3. Track quality issues – add to review categories
  4. Observe model-specific behaviors – some things require multiple correction passes
  5. Measure effectiveness – adjust workflows

Conclusion

What We’ve Accomplished

  • Built 38 applications (700+ files) using agentic orchestration
  • Validated the pattern for long-running AI-driven development
  • Created reusable workflows applicable to any large-scale project
  • Demonstrated practical AI beyond simple code completion

The Key Insight

AI agents are incredibly powerful when given:

  • ✅ Clear instructions with business context
  • ✅ Structured progress tracking
  • ✅ Dependency-aware task management
  • ✅ Pattern-based quality frameworks
  • ✅ Systematic improvement workflows
  • ✅ Validation checkpoints to counteract model gravity
  • ✅ Strategic model selection (planning models vs execution models)
  • ✅ Parallel task execution where appropriate

Without this structure, they drift. With it, they scale.

A note on tools and models: I used Claude Code for this project, leveraging different Claude models strategically – Opus 4.5/4.6 for planning and instruction creation, Sonnet or Haiku for mechanical implementation. This model selection strategy, combined with parallel task execution, enabled remarkably fast development (initial building pass in ~2 hours). Different AI coding assistants have different strengths – choose based on your project needs and be prepared to work with each model’s particular behaviors.

Next Steps for You

  1. Try It: Start with one module and comprehensive instructions
  2. Measure: Track what works and what causes drift
  3. Iterate: Refine your orchestration based on results
  4. Scale: Expand to larger projects with confidence

Want to See the Results?

Curious what this orchestration actually built? Check out The Pencil Sketch: Building the Destination First to see the complete 38-application Business Central ecosystem that proves the vision.


Resources

  • NubimancyPencilSketch Repository: github.com/Nubimancy/PencilSketch
  • Orchestration Scripts: PowerShell examples available in project .docs/ folder
  • Instruction Templates: Build, Review, and Improve instruction patterns

Connect

Have questions about implementing these patterns? Want to share your own agentic orchestration experiences?

  • LinkedIn: jeremyvyska
  • BlueSky: @bc.jeremy.vyska.info
  • Blog: jeremyvyska.com (here)

This development approach was validated building the Nubimancy educational ecosystem – where fantasy storytelling meets real Business Central development. Learn more at nubimancy.com.
