AI / Capabilities
The next shift in enterprise AI is from systems that answer to systems that act. Agentic AI plans, calls tools, checks its own work, and corrects when it's wrong. Done right, it moves work that used to require multiple analysts into a single orchestrated flow. Done wrong, it automates the wrong thing at scale. We build agents bounded by evaluation harnesses, policy guardrails, and observability so ambition doesn't outrun accountability.
Agentic AI is the frontier, and the place most enterprise AI projects go off the rails. The technology is real: 35% of global insurers are expected to deploy AI agents across three or more functions by the end of 2026. The opportunity is real: multi-step workflows that used to need a team now collapse into a single orchestrated agent run. But unbounded agents are a liability in regulated industries. We build agents the way you'd build a new team member, with a defined scope, clear tools, a training manual, a review process, and a way to fire the one that isn't working.
Four phases. The boundary work happens before the autonomy work.
We map the multi-step workflow you want to collapse. Which tools does the agent need? Which decisions is it allowed to make? Which require human review? Which would cause a compliance finding if the agent got them wrong? We define the policy surface before we define the planner.
A six-week pilot on a bounded multi-step task, typically research-and-report, triage-and-route, or monitor-and-act. We build the planner, the tool integrations, the checker, and the eval harness together. Pass/fail criteria on task completion, tool accuracy, policy adherence, and escalation quality.
We engineer the eval harness at production scale: red-teaming, adversarial test sets, drift monitoring, and behavioral audits. For regulated workflows, we document human-oversight triggers against OJK's April 2025 requirements, UU PDP access scoping, and sector-specific audit needs.
Handover. Your team gets the agent runbook, the eval dashboards, the policy-rule repository (versioned), and the human-review queue. New tasks added as new agents, not larger scope on the same one. Agents are small, focused, and observable, or they're not agents, they're problems.
Four disciplines that together turn "autonomous AI" from a liability into a production system you can run.
The planner is where an agent earns its keep. We design decomposition strategies that break complex tasks into tool-callable steps, with fallback plans and re-planning paths when intermediate results change the picture.
Agents without good tools are just chat interfaces. We wire agents to your real backend systems: internal APIs, databases, external services, and human-review queues, with observability, retries, and failure handling at every call.
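A sketch of what "observability, retries, and failure handling at every call" means in practice, under assumed names (`call_with_retries`, the escalation dict shape): every attempt is logged, transient failures are retried with backoff, and terminal failures are escalated rather than crashing the run.

```python
import logging
import time

log = logging.getLogger("agent.tools")

def call_with_retries(tool_fn, payload, retries=3, backoff=0.5):
    """Log every attempt, retry with linear backoff, escalate instead of crashing."""
    for attempt in range(1, retries + 1):
        try:
            log.info("call tool=%s attempt=%d", tool_fn.__name__, attempt)
            return {"status": "ok", "result": tool_fn(payload)}
        except Exception as exc:
            log.warning("tool=%s attempt=%d failed: %s", tool_fn.__name__, attempt, exc)
            if attempt == retries:
                # Terminal failure: hand off to the human-review queue
                return {"status": "escalated", "reason": str(exc)}
            time.sleep(backoff * attempt)
```

The design choice worth noting: the wrapper never raises to the caller. A failed tool call becomes an escalation record the agent (and the review queue) can act on.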
The difference between demo-agent and production-agent. We build test sets, adversarial red-team suites, task-completion benchmarks, and drift-monitoring pipelines that catch regression before it reaches your customers or your regulator.
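A minimal sketch of the harness core, assuming a hypothetical test-set shape (`id`, `input`, `check`): run the agent over a fixed set of cases, compute a pass rate, and gate release on a threshold so regressions fail loudly instead of reaching customers.

```python
def evaluate(agent_fn, test_set, pass_threshold=0.95):
    """Run the agent over a fixed test set; report pass rate and failing case ids."""
    failures = [case["id"] for case in test_set
                if not case["check"](agent_fn(case["input"]))]
    pass_rate = 1 - len(failures) / len(test_set)
    return {"pass_rate": pass_rate,
            "released": pass_rate >= pass_threshold,
            "failures": failures}
```

Adversarial and red-team cases live in the same `test_set` format as task-completion benchmarks, so one gate covers both.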
Explicit, named, versioned rules that define what the agent may and may not do, with logged triggers when boundaries are approached. For OJK-regulated contexts, human-oversight handoffs are designed first, not retrofitted.
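What "explicit, named, versioned rules with logged triggers" can look like as code. This is a sketch with assumed names (`PolicyRule`, `enforce`), not a specific policy engine: each rule carries a name, a version, and a breach outcome, and every breach is written to an audit log before the outcome is returned.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class PolicyRule:
    name: str
    version: str
    permits: Callable[[dict], bool]  # True means the action is allowed
    on_breach: str                   # e.g. "block" or "escalate_to_human"

def enforce(rules, action, audit_log):
    """Check every rule in order; log the first breach with rule name and version."""
    for rule in rules:
        if not rule.permits(action):
            audit_log.append({"rule": rule.name, "version": rule.version,
                              "action": action, "outcome": rule.on_breach})
            return rule.on_breach
    return "allow"
```

Because rules are named and versioned, the audit trail can say exactly which rule, at which version, stopped which action, which is what a regulator-defensible human-oversight handoff requires.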
One agentic deployment we've shipped, plus the market shape for enterprises evaluating agents right now.
We built GIS-based decision-intelligence agents for telecom infrastructure and network operations. Agents ingest spatial data, run multi-step analysis across infrastructure and capacity, and recommend deployment actions in real time, replacing a multi-day manual review cycle. Deployed as Setara.
35% of global insurers are expected to deploy AI agents across three or more claims, underwriting, or service functions by the end of 2026. Indonesia is an early market, which is exactly where the first-mover advantage lives. Unbounded deployment is where most of them will fail.
OJK's April 2025 AI Governance Guidance specifies human-oversight requirements for high-risk automated decisions in financial services: credit, claims denial, fraud flagging, AML classification. For agentic systems in these domains, policy guardrails are not optional.

Why the production pattern that's actually working is many specialized agents, each with a bounded tool set and clear policy, rather than one generalist agent with everything in scope.

What the April 2025 guidance means for agent architecture, versioned policy rules, human-oversight triggers, and the audit trail that turns autonomy into a regulator-defensible capability.

Task benchmarks, adversarial red-teaming, and drift monitoring as a continuous-integration pattern, not a one-time pre-launch check. The engineering discipline that turns demo-agents into production systems.
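The continuous-integration framing above reduces to a simple check that runs on every evaluation cycle, not once before launch. A sketch under assumed names (`drift_check`, a rolling window of pass rates against a release baseline):

```python
def drift_check(baseline_pass_rate, recent_pass_rates, tolerance=0.05):
    """Flag regression when the rolling pass rate falls below baseline minus tolerance."""
    rolling = sum(recent_pass_rates) / len(recent_pass_rates)
    return {"rolling": rolling,
            "drifted": rolling < baseline_pass_rate - tolerance}
```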
Tell us the workflow that currently needs three analysts and a spreadsheet: triage-and-route, research-and-report, monitor-and-act, reconcile-and-flag. We'll scope a six-week agent pilot with explicit tool boundaries, policy rules, and an evaluation harness.
Start a project