AI Chat Usage, Limits, and Costs
Learn how AI Chat usage is measured, how token limits and costs work by tier, and where to view your monthly consumption and budget details
The token budget
Every AI reply consumes tokens — the unit of text the underlying model (Claude) reads and writes. Foreman tracks your consumption per-message and totals it monthly per user. When you hit your tenant's monthly token cap, further messages are blocked until the first of the next month.
Each usage record includes:
- Input tokens — your message plus all cached context (system prompt, tools, recent history).
- Output tokens — what the AI wrote back.
- Cache creation & read tokens: separate counters for prompt caching (cache reads are much cheaper per token than normal input; cache writes cost slightly more).
- Tool call count — how many tools the AI invoked.
- Round-trip count — how many times the model was re-prompted with tool results.
- Latency — end-to-end wall clock time.
- Estimated cost — USD calculated from Anthropic list prices at the time of the call.
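As a rough illustration of how per-message records like these could roll up into monthly per-user totals, here is a minimal Python sketch. The `UsageRecord` class, its field names, and `monthly_rollup` are hypothetical names chosen for this example; Foreman's actual record schema is not documented on this page.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class UsageRecord:
    # Hypothetical field names mirroring the list above.
    user_id: str
    created_at: datetime  # timezone-aware timestamp of the reply
    input_tokens: int
    output_tokens: int
    cache_creation_tokens: int
    cache_read_tokens: int
    tool_calls: int
    round_trips: int
    latency_ms: int
    estimated_cost_usd: float


def monthly_rollup(records):
    """Total messages, tokens, and cost per (user_id, "YYYY-MM") key."""
    totals = defaultdict(lambda: {"messages": 0, "input_tokens": 0,
                                  "output_tokens": 0, "cost_usd": 0.0})
    for rec in records:
        # Bucket by UTC month (an assumption, chosen to match the UTC reset).
        month = rec.created_at.astimezone(timezone.utc).strftime("%Y-%m")
        bucket = totals[(rec.user_id, month)]
        bucket["messages"] += 1
        bucket["input_tokens"] += rec.input_tokens
        bucket["output_tokens"] += rec.output_tokens
        bucket["cost_usd"] += rec.estimated_cost_usd
    return dict(totals)
```

Calling `monthly_rollup` on a list of records returns one totals dictionary per user and month.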
Monthly limits by tier
| Tier | Monthly token limit | Notes |
|---|---|---|
| Business | See Billing → Plans | Shared pool across all users in the tenant. |
| Enterprise | Configurable per-tenant | Contact sales for custom pools. |
| Legacy tiers | None — AI Chat is not available. | Upgrade to Business to enable. |
Your organisation admin can also set a custom override on your tenant that replaces the tier default (in either direction). Check Organization → AI Chat if you want to confirm.
Where to see your usage
Three places:
- Composer footer — the small line below the input ("1.2M / 5M tokens this month") shows live monthly consumption. Turns red when you cross 90%.
- Sidebar footer (widget) — same info plus your tier badge.
- AI Usage page — accessible from the sidebar (AI & API → AI Usage). This page has:
  - Today / this month rollups (messages, input tokens, output tokens, cost USD).
  - Daily timeline chart for the last 30 days.
  - Top sessions by message count for the month.
  - Your tier and budget status.
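The composer footer line and its 90% warning can be approximated as follows. This is only a sketch: `format_tokens` and `footer_status` are hypothetical helpers, and treating "cross 90%" as strictly greater than 90% of the limit is an assumption.

```python
def format_tokens(n: int) -> str:
    """Abbreviate a token count, e.g. 1_200_000 as "1.2M"."""
    for size, suffix in ((1_000_000, "M"), (1_000, "K")):
        if n >= size:
            # One decimal place, with a trailing ".0" dropped.
            return f"{n / size:.1f}".rstrip("0").rstrip(".") + suffix
    return str(n)


def footer_status(used: int, limit: int, warn_ratio: float = 0.9):
    """Return the footer text and whether it should be shown in red."""
    text = f"{format_tokens(used)} / {format_tokens(limit)} tokens this month"
    # Assumption: the footer turns red once usage exceeds 90% of the limit.
    return text, used > limit * warn_ratio
```

For example, `footer_status(1_200_000, 5_000_000)` yields the text quoted above with the red flag off.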
Rate limiting behaviour
When a message would exceed the monthly pool:
- The server returns HTTP 429 with an `error: rate_limit` body.
- The composer disables the Send button and shows "Limit reached" as the placeholder.
- A yellow banner appears in the thread: "Monthly token budget exhausted. Resets on the 1st."
- Existing conversations stay readable; you just can't add new messages.
The monthly counter resets at 00:00 UTC on the 1st of each month.
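A client could detect the exhausted-budget response and compute the next reset time along these lines. This is a sketch under stated assumptions: the response body is assumed to be JSON with an `error` field equal to `rate_limit`, and both function names are hypothetical.

```python
from datetime import datetime, timezone


def is_budget_exhausted(status_code: int, body: dict) -> bool:
    """True for the HTTP 429 / rate_limit response described above.

    Assumption: the body parses to a dict like {"error": "rate_limit"}.
    """
    return status_code == 429 and body.get("error") == "rate_limit"


def next_reset(now: datetime) -> datetime:
    """Return 00:00 UTC on the 1st of the month after `now`.

    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)
```

Handling the December rollover explicitly avoids building an invalid month 13.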
Prompt caching
Foreman Assistant uses Anthropic's prompt caching to keep costs down on long conversations:
- The system prompt is cached.
- The full tool definition block is cached (tools + descriptions + schemas can run to ~20K tokens).
- Cache hits cost ~10% of a normal input token; cache writes cost ~25% more than a normal input token.
You'll see cache token counts in your usage records — a healthy session has the bulk of its input tokens as cache reads after the first turn.
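To see why cache reads matter, the approximate multipliers above can be applied to a turn's token counts. The sketch below expresses cost in "normal input token equivalents" rather than dollars; the function name and the example token counts are illustrative assumptions, not measured Foreman figures.

```python
CACHE_READ_MULTIPLIER = 0.10   # cache hits cost ~10% of a normal input token
CACHE_WRITE_MULTIPLIER = 1.25  # cache writes cost ~25% more than normal


def input_token_equivalents(uncached: int, cache_writes: int, cache_reads: int) -> float:
    """Weight each input-token category by its approximate price multiplier."""
    return (uncached
            + cache_writes * CACHE_WRITE_MULTIPLIER
            + cache_reads * CACHE_READ_MULTIPLIER)


# Illustrative numbers: a 20K-token cached prefix plus a 500-token message.
first_turn = input_token_equivalents(uncached=500, cache_writes=20_000, cache_reads=0)
later_turn = input_token_equivalents(uncached=500, cache_writes=0, cache_reads=20_000)
```

With these numbers the first turn costs about 25,500 input-token equivalents (more than the 20,500 it would cost uncached), while each later turn costs about 2,500, which is where the savings come from.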
Admin kill switches
System admins can disable AI Chat at three levels:
| Level | Where | Effect |
|---|---|---|
| Global | `POST /api/foreman-assistant/admin/toggle` | Disables AI for everyone in the instance. |
| Per tenant | `PUT /api/foreman-assistant/admin/overrides/tenant/{id}` | Force-enable (override free tier) or force-disable. |
| Per user | `PUT /api/foreman-assistant/admin/overrides/user/{id}` | Same, per-user precedence. |
The resolution order is user → tenant → global. The most specific override wins.
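The resolution order can be sketched as a simple fallback chain. This is an illustration only: the function name is hypothetical, `None` is assumed to mean "no override set at this level", and the real implementation may also factor in tier defaults.

```python
from typing import Optional


def ai_chat_enabled(global_enabled: bool,
                    tenant_override: Optional[bool] = None,
                    user_override: Optional[bool] = None) -> bool:
    """Resolve the effective AI Chat state: user, then tenant, then global."""
    # Check the most specific level first; None means no override is set.
    for override in (user_override, tenant_override):
        if override is not None:
            return override
    return global_enabled
```

Under this reading, a user-level force-enable wins over a tenant-level disable, and a tenant-level setting wins over the global switch.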
When AI Chat is disabled for you at any level, both the widget and the full-page view show a "Temporarily Unavailable" state, and messages can't be sent.
If AI Chat suddenly disappears from the sidebar, check with your org admin — they may have temporarily disabled the module via Organization → Feature Modules, or a system admin may have flipped the global switch.
What counts toward your budget
Every message you send costs tokens, including messages that hit errors, tool failures, and the AI's "I can't do that" replies. A few specifics:
- Regenerating a reply ("Try again") incurs a new full-cost turn — you're re-running the model.
- Editing a user message does the same — the new sibling triggers a fresh run.
- Stopping a reply mid-stream still charges for whatever the model wrote before you hit Stop.
- Loading a past conversation does not cost tokens. Only new messages do.
- Browsing, searching, pinning, archiving sessions costs nothing.
Storage limits (attachments)
Chat attachments count against your tenant's shared storage quota, not the token budget. See Attachments in AI Chat for the per-attachment size caps and how storage is accounted.
Reducing costs
A few practical tips:
- Use `/export` and paste into an external tool for long analysis sessions, since you pay for each turn here.
- Prefer artifacts over inline dumps: large inline tables re-enter the input on every subsequent turn.
- Keep sessions focused: start a new chat for unrelated topics. The context sent to Claude grows as a conversation lengthens, so long histories get expensive.
- Archive or delete dead conversations — not for cost directly, but to keep your sidebar manageable.
- Set up MCP in an external client for heavy coding work: that usage is billed to your Anthropic or host budget, not your Foreman tenant's.
Troubleshooting
- "Monthly token budget exhausted" — you've hit the cap. Wait until the 1st or ask your admin for an override.
- Cost seems higher than expected — check AI Usage for the session breakdown. Tool-heavy sessions or very long threads are typically the culprit.
- Usage counter seems stuck: the composer footer updates on the `[USAGE]` SSE event after each reply. If it lags, refresh the page to re-fetch from `/api/foreman-assistant/usage`.
See Also
- Foreman Assistant Overview — where these limits fit in.
- Attachments in AI Chat — storage quota (separate from token budget).
- AI Tools & MCP Integration — tool calls and their token impact.
- Billing & Plans — which tier includes AI Chat.