Building at Scale: Long-Running AI Agents for Real Development

Photo by Glenn Carstens-Peters on Unsplash

Introduction

The Challenge: I needed to build a 38-application Business Central ecosystem to validate an architectural vision. Not a toy demo – a complete, production-quality implementation with proper testing, integration patterns, and architectural best practices.

The Approach: Instead of traditional development, I used Claude Code in long-running agentic mode, orchestrating AI-driven development across 700+ AL files spanning 19 production applications (with 19 companion test app structures for future test implementation).

The Result: A proven three-phase workflow (Building → Reviewing → Improving) that successfully generated a complex, interconnected Business Central ecosystem while maintaining quality and architectural integrity.

This isn’t theoretical – it’s the real-world pattern I used to build at scale with AI assistance.

Prerequisites

If You Want to Apply These Patterns:

  • Access to an AI coding assistant with long-running conversation capability (Claude Code, GitHub Copilot, etc.)
  • Ideally, access to multiple model tiers (planning models vs execution models)
  • A complex, multi-module project requiring systematic development
  • Comfort with PowerShell or scripting for orchestration
  • Understanding that different models behave differently – more on this below

Background Knowledge Helpful:

  • Experience with large-scale software development
  • Understanding of dependency management
  • Familiarity with code review practices
  • Working knowledge of JSON and basic scripting

The Challenge: Building 38 Applications with AI

Project Scope

NubimancyPencilSketch – A complete Business Central ecosystem:

  • 19 Production applications (4 AppSource-style foundation apps + 15 PTE extensions)
  • 19 Test application structures (companion apps ready for future test code implementation)
  • 108 modules across 20 extensions
  • Object ranges 50,000-83,799
  • Five distinct hero businesses with cross-integration requirements

Note on Test Code: The test application structures exist, but test code implementation is planned for a later phase. I wanted the foundation stable first – including naming conventions, object structures, and anything that might trigger refactoring – before investing in comprehensive test code. This avoids the pain of refactoring tests when core objects change.

The Problem

Traditional development approaches don’t work well with AI agents because:

  • Context drift: Agents lose track in long sessions
  • Dependency confusion: Without structure, agents skip prerequisites
  • Quality inconsistency: No systematic review means missed patterns
  • Progress opacity: Hard to know what’s done and what remains
  • Model gravity: Even with crystal-clear instructions, models have strong behavioral tendencies that pull against your guidance

The Model Gravity Challenge

One of the most interesting discoveries during this project: models and system prompts exert a gravity-like effect on outcomes. Even with comprehensive, explicit instructions, the underlying model can persistently do things that contradict those instructions.

For example, I had clear naming conventions documented, but the model would repeatedly generate different filename patterns. It wasn’t ignoring the instructions – it was being pulled by stronger patterns baked into the model’s training or system prompt.

This means your orchestration system needs to:

  • Be even more explicit than you think necessary
  • Include validation steps to catch drift
  • Accept that some model behaviors require multiple correction passes
  • Design review phases that catch systematic deviations

The Model Selection Strategy

One of the most effective patterns I used: different models for different types of work.

Planning and Instruction Creation – Claude Opus 4.5 and 4.6:

  • Used planning modes with richer thinking models
  • Built comprehensive instructions and architectural guidelines
  • Made critical design decisions and established patterns
  • Defined the “rules of the game” that would guide implementation

Mechanical Implementation – Claude Sonnet or Haiku:

  • Executed well-defined tasks where decision-making was less essential
  • Followed established patterns and instructions
  • Generated code based on clear specifications
  • Much faster execution for straightforward implementation work

Parallel Task Execution:
Once instructions were clear and dependencies mapped, I could launch multiple implementation tasks simultaneously. The first Building pass took only about 2 hours total – remarkably fast for generating the initial structure of 19 applications.
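The parallel launch can be sketched in a few lines. This is an illustrative Python sketch, not the project's actual tooling (the project used PowerShell and Claude Code tasks); the task names and the `implement` placeholder are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def implement(task: str) -> str:
    # Placeholder for handing one well-specified task to a fast execution model.
    return f"{task}: done"

# Tasks with no dependencies on each other can run simultaneously.
unblocked = ["hero-sales-app", "hero-inventory-app", "hero-service-app"]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map preserves input order, so results line up with the task list.
    results = list(pool.map(implement, unblocked))

print(results)
```

The important constraint is that only mutually independent, unblocked tasks go into the pool – anything with an unmet dependency waits for a later batch.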

The key insight: expensive thinking models for planning and decision-making, faster models for execution. This dramatically improves both cost and speed.

The Solution

A three-phase agentic orchestration system with structured instructions, progress tracking, dependency management, and validation checkpoints to counteract model gravity.


Phase 1: Building – Generate the Implementation

The Concept

Create detailed specifications for each module, then let agents implement them systematically while respecting dependencies.

Key Components

1. Comprehensive Instructions

Detailed agent guidelines covering:

  • Architectural standards (naming, namespaces, object ranges)
  • Hero business contexts (why each application exists)
  • Integration patterns (how apps connect)
  • Development constraints (what to avoid)

Why This Matters: Without clear instructions, agents make inconsistent decisions. Comprehensive guidance minimizes drift.

2. Machine-Readable Progress Tracking

JSON files tracking phase completion and dependencies.

Why This Matters: Agents need state management. JSON provides resumable, queryable progress.
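A minimal progress file might look like the following. The schema and names here are illustrative, not the project's actual format:

```json
{
  "phases": [
    {
      "id": "foundation-core",
      "app": "Foundation.Core",
      "status": "complete",
      "dependsOn": []
    },
    {
      "id": "hero-sales",
      "app": "Hero.Sales",
      "status": "pending",
      "dependsOn": ["foundation-core"]
    }
  ]
}
```

An agent resuming a session can read this file, see exactly what is done, and pick up the next unblocked phase without re-deriving state from the codebase.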

3. Dependency-Aware Task Identification

PowerShell scripts identify next actionable tasks by finding incomplete phases, checking dependency completion, and outputting “Next Actionable Tasks” lists.

Why This Matters: Prevents wasted effort on blocked tasks. Enables agents to self-direct.
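The project implemented this check in PowerShell; here is a Python sketch of the same logic, using an illustrative JSON schema (field names are assumptions, not the project's actual format):

```python
import json

def next_actionable(progress_json: str) -> list[str]:
    """Return IDs of phases that are incomplete and whose dependencies are all complete."""
    phases = json.loads(progress_json)["phases"]
    done = {p["id"] for p in phases if p["status"] == "complete"}
    return [
        p["id"]
        for p in phases
        if p["status"] != "complete" and all(d in done for d in p["dependsOn"])
    ]

progress = """
{
  "phases": [
    {"id": "foundation-core",  "status": "complete", "dependsOn": []},
    {"id": "hero-sales",       "status": "pending",  "dependsOn": ["foundation-core"]},
    {"id": "hero-integration", "status": "pending",  "dependsOn": ["hero-sales"]}
  ]
}
"""
print(next_actionable(progress))  # only hero-sales is unblocked
```

Here `hero-integration` stays off the list until `hero-sales` is marked complete – exactly the behavior that keeps agents from starting blocked work.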

4. Organized Specifications

  • AppSource Foundation: Base app specifications
  • Hero PTEs: Extension specs for each hero business
  • Each phase has markdown spec with business context and technical requirements

Real Results

  • 114 implementation phases completed
  • Systematic progression through dependencies
  • Consistent architectural patterns across all apps
  • Complete traceability of what was built and when
  • Initial building pass completed in approximately 2 hours using parallel task execution

Phase 2: Reviewing – Comprehensive Quality Assurance

The Concept

Don’t review code line-by-line. Use pattern-based detection across 9 architectural categories to identify systematic improvements.

The Nine Review Dimensions

  1. Naming – Object/field/file naming compliance with standards
  2. Data Model – Enum vs Text, Setup tables vs hardcoded values
  3. Business Logic – Magic numbers, hardcoded thresholds, business rules
  4. Performance – SetLoadFields, SIFT, CalcFields optimization
  5. Security – Permission sets, data classification, user access
  6. Error Handling – ErrorInfo patterns, suggested actions, user guidance
  7. Extensibility – Events, interfaces, labels for translation
  8. UX – Page design, FactBoxes, promoted actions, user experience
  9. Sustainability – Code comments, documentation, tooltips, maintainability

The Power of Pattern-Based Review

Traditional Approach: Read every line, review everything
Problem: Token explosion, context loss, inconsistent thoroughness

Pattern-Based Approach: Search for specific patterns
Benefits:

  • Scalable to large codebases
  • Consistent detection across modules
  • Token-efficient (grep searches, not full file reads)
  • Parallelizable across categories
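The review phase relied on grep-style searches rather than full file reads. A Python sketch of the idea – the two patterns below are simplified stand-ins, not the project's actual detection rules:

```python
import re

# Illustrative patterns only -- the real review used many more per category.
PATTERNS = {
    "Business Logic: magic number": re.compile(r":=\s*\d{2,}\s*;"),
    "Performance: FindSet (verify SetLoadFields precedes it)": re.compile(r"\.FindSet\("),
}

def review(source: str) -> list[tuple[str, int, str]]:
    """Return (category, line number, offending line) for every pattern hit."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((category, lineno, line.strip()))
    return findings

al_snippet = """
procedure PostOrder()
begin
    MaxRetries := 500;
    Customer.FindSet();
end;
"""
for category, lineno, line in review(al_snippet):
    print(f"Line {lineno}: {category}: {line}")
```

Because each category is just a pattern set, the nine dimensions can be scanned independently – which is what makes the review parallelizable and token-efficient.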

Real Results

  • 108 modules reviewed across 9 dimensions = 972 category reviews
  • Systematic pattern detection across entire codebase
  • Structured findings enabling targeted improvements
  • Complete audit trail of quality assessment

Phase 3: Improving – Systematic Quality Enhancement

The Concept

Use review findings to drive category-by-category improvements with clear tracking and logical commit boundaries.

Key Components

  1. Improvement Instructions: Guidelines for applying fixes
  2. Category-Level Progress: JSON tracking which categories are complete
  3. Progress Reporting: Scripts showing completion statistics
  4. Special Phases: Complex improvements need inventory and implementation guides

The Improvement Workflow

  1. Select Category: Use progress script to identify next category
  2. Read Findings: Review reports from Phase 2 for that category
  3. Apply Fixes: Use file edit tools to implement improvements
  4. Batch Commits: Logical groupings (e.g., all naming fixes, all performance optimizations)
  5. Update Progress: Mark category completion in JSON
  6. Verify: Quick build to ensure no breaking changes
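Steps 1 and 5 of the workflow above can be sketched as simple operations on a progress structure. This is an illustrative Python version (the project used PowerShell and JSON files; module and category names here are made up):

```python
def mark_complete(progress: dict, module: str, category: str) -> dict:
    """Mark one review category done for a module and return the updated dict."""
    progress[module][category] = "complete"
    return progress

def completion_stats(progress: dict) -> str:
    """Summarize how many module/category pairs are finished."""
    total = sum(len(cats) for cats in progress.values())
    done = sum(1 for cats in progress.values() for s in cats.values() if s == "complete")
    return f"{done}/{total} category reviews complete"

# Illustrative module/category names.
progress = {
    "Foundation.Core": {"Naming": "complete", "Performance": "pending"},
    "Hero.Sales":      {"Naming": "pending",  "Performance": "pending"},
}
mark_complete(progress, "Foundation.Core", "Performance")
print(completion_stats(progress))  # 2/4 category reviews complete
```

Persisting this structure to JSON after each category gives the "measurable progress tracking" the results below describe.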

Real Results

  • Systematic improvement across all quality dimensions
  • Clear commit history showing category-based improvements
  • Measurable progress tracking
  • No regressions due to structured approach

The Orchestration Pattern: Key Principles

1. Stateful Progress Tracking

Problem: Agents forget what’s been done between sessions
Solution: JSON files provide resumable, queryable state

2. Instruction-Driven Development

Problem: Agent drift and inconsistent decisions
Solution: Comprehensive markdown instructions minimize variations

3. Dependency Management

Problem: Agents work on blocked tasks
Solution: Scripts identify only unblocked, actionable work

4. Pattern-Based Detection

Problem: Line-by-line review doesn’t scale
Solution: Grep searches for specific patterns across entire codebase

5. Parallel Work Streams

Problem: Sequential work is slow
Solution: Independent categories can progress simultaneously

6. Structured Outputs

Problem: Hard to track what agents produced
Solution: Organized folders, consistent report formats, clear naming


Lessons Learned: What Works, What Doesn’t

✅ What Works

Strategic Model Selection

  • Use richer thinking models (Opus) for planning and instruction creation
  • Use faster models (Sonnet, Haiku) for mechanical implementation
  • Match model capability to task complexity
  • Parallel task execution when dependencies allow

Clear, Comprehensive Instructions

  • Agents need context about why, not just what
  • Hero business contexts prevent generic implementations
  • Architectural constraints keep implementations consistent

Machine-Readable Progress

  • JSON enables agents to self-direct
  • PowerShell scripts provide human-readable views
  • Clear state management prevents duplicate work

Pattern-Based Quality Checks

  • Grep searches scale better than full file reads
  • Consistent patterns enable systematic improvements
  • Category-based reviews are parallelizable

Logical Commit Boundaries

  • Category-based commits make sense in git history
  • Easy to review changes by quality dimension
  • Clear audit trail for improvements

❌ What Doesn’t Work

Vague Instructions

  • “Make it better” leads to arbitrary decisions
  • Agents need specific guidance and constraints
  • Generic patterns don’t tell business stories

Underestimating Model Gravity

  • Even crystal-clear instructions can be overridden by model tendencies
  • Single-pass corrections often revert in later sessions
  • Need validation layers to catch systematic drift
  • Different models have different “gravity wells” – what works with one may not work with another

Over-Ambitious Single Sessions

  • Trying to do too much causes context loss
  • Better to do one category thoroughly than many superficially
  • Break work into agent-sized chunks

Line-by-Line Code Review

  • Doesn’t scale to 700+ files
  • Token explosion kills context (and budgets!)
  • Pattern-based detection is more effective

Ignoring Dependencies

  • Wasted effort on blocked work
  • Rework when prerequisites change
  • Dependency-aware scheduling is essential

Real-World Application Beyond Business Central

These patterns apply to any large-scale software development:

Web Application Development

  • Building: Component-by-component implementation
  • Reviewing: Accessibility, performance, security patterns
  • Improving: Category-based enhancements

Infrastructure as Code

  • Building: Module-by-module Terraform/ARM templates
  • Reviewing: Security compliance, cost optimization, naming
  • Improving: Systematic hardening and optimization

API Development

  • Building: Endpoint-by-endpoint implementation
  • Reviewing: OpenAPI compliance, error handling, authentication
  • Improving: Documentation and response optimization

The Common Thread

Any project with the following can apply the Building → Reviewing → Improving workflow:

  • Multiple modules with dependencies
  • Quality standards across dimensions
  • Need for systematic review and improvement

Getting Started with Long-Running Agents

Start Small

  1. Pick ONE module or component
  2. Write comprehensive instructions
  3. Create simple JSON progress tracking
  4. Have agent implement following instructions
  5. Review and refine instructions

Scale Up

  1. Add dependency management
  2. Create progress reporting scripts
  3. Expand to multiple modules
  4. Introduce category-based reviews
  5. Implement improvement workflows
  6. Consider model selection strategy (planning vs execution models)
  7. Leverage parallel task execution where dependencies allow

Refine

  1. Watch for agent drift – strengthen instructions
  2. Monitor for repeated questions – add to docs
  3. Track quality issues – add to review categories
  4. Observe model-specific behaviors – some things require multiple correction passes
  5. Measure effectiveness – adjust workflows

Conclusion

What We’ve Accomplished

  • Built 38 applications (700+ files) using agentic orchestration
  • Validated the pattern for long-running AI-driven development
  • Created reusable workflows applicable to any large-scale project
  • Demonstrated practical AI beyond simple code completion

The Key Insight

AI agents are incredibly powerful when given:

  • ✅ Clear instructions with business context
  • ✅ Structured progress tracking
  • ✅ Dependency-aware task management
  • ✅ Pattern-based quality frameworks
  • ✅ Systematic improvement workflows
  • ✅ Validation checkpoints to counteract model gravity
  • ✅ Strategic model selection (planning models vs execution models)
  • ✅ Parallel task execution where appropriate

Without this structure, they drift. With it, they scale.

A note on tools and models: I used Claude Code for this project, leveraging different Claude models strategically – Opus 4.5/4.6 for planning and instruction creation, Sonnet or Haiku for mechanical implementation. This model selection strategy, combined with parallel task execution, enabled remarkably fast development (initial building pass in ~2 hours). Different AI coding assistants have different strengths – choose based on your project needs and be prepared to work with each model’s particular behaviors.

Next Steps for You

  1. Try It: Start with one module and comprehensive instructions
  2. Measure: Track what works and what causes drift
  3. Iterate: Refine your orchestration based on results
  4. Scale: Expand to larger projects with confidence

Want to See the Results?

Curious what this orchestration actually built? Check out The Pencil Sketch: Building the Destination First to see the complete 38-application Business Central ecosystem that proves the vision.


Resources

  • NubimancyPencilSketch Repository: github.com/Nubimancy/PencilSketch
  • Orchestration Scripts: PowerShell examples available in project .docs/ folder
  • Instruction Templates: Build, Review, and Improve instruction patterns

Connect

Have questions about implementing these patterns? Want to share your own agentic orchestration experiences?

  • LinkedIn: jeremyvyska
  • BlueSky: @bc.jeremy.vyska.info
  • Blog: jeremyvyska.com (here)

This development approach was validated building the Nubimancy educational ecosystem – where fantasy storytelling meets real Business Central development. Learn more at nubimancy.com.
