A "source" in Atlas is one external website you're indexing, usually identified by a single root URL such as https://docs.acme.com or https://blog.acme.com. Each source is team-wide; per-agent toggles control which agents can read it.
Before you start
You'll need:
- A root URL of a site you control or have permission to index. Atlas honors robots.txt, but you should still only point it at sites where it's appropriate to do so.
- An AI agent created in your team (Settings → Agent). Atlas is per-agent, so you turn it on for the agent that should be able to read these pages.
Step-by-step
- Open Settings → Agent → Abilities.
- Toggle Atlas access ON for the agent.
- Click Add your first source (or Add Atlas source if you already have some).
- Step 1 of the modal, Form: enter a name (whatever you'll recognize it by, like "Acme Docs" or "Company Blog") and the root URL. Optionally check Render JavaScript if the site is a single-page app where the content only appears after JS runs.
- Step 2, Preview: Atlas runs sitemap discovery and shows you how many URLs it found and via which method. If the count looks wildly off (too high, too low, or zero), you can cancel here and adjust your root URL or exclude patterns before committing.
- Step 3, Charting: Atlas works through the URLs in batches. You'll see a live progress bar and the last few completed URLs. You can cancel at any point.
- Done: the source flips to status Ready and shows up in the per-agent toggle list.
Sitemap discovery, in order
Atlas tries each method until one succeeds:
| Method | What it does |
|---|---|
| robots.txt | Reads the Sitemap: directive. Most authoritative. |
| sitemap_index.xml | A sitemap of sitemaps. Common on WordPress / Yoast / large sites. |
| sitemap.xml | Plain sitemap at the root. |
| wp-sitemap.xml | WordPress core's built-in sitemap (WP 5.5+). |
| Homepage crawl | Last resort: scrape links from the homepage. Lower coverage, only used when no sitemap exists at all. |
The preview screen tells you which method won so you can sanity-check the result.
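The fallback order above can be sketched as a short chain of attempts. This is only an illustration of the ordering, not Atlas's actual implementation: the `discover` function, the `fetch` callable (assumed to return a response body, or `None` on any failure), and the choice to resolve every path against the host root are all assumptions made for the example.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetcher: returns the response body, or None if the URL is missing.
Fetch = Callable[[str], Optional[str]]

# Well-known sitemap locations, in the order listed in the table above.
SITEMAP_PATHS = ["sitemap_index.xml", "sitemap.xml", "wp-sitemap.xml"]


def sitemap_from_robots(robots_txt: str) -> Optional[str]:
    """Return the URL from the first 'Sitemap:' directive, if there is one."""
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            return value.strip()
    return None


def discover(root_url: str, fetch: Fetch) -> Tuple[str, str]:
    """Try each method in order and return (winning method, location)."""
    # 1. robots.txt with a Sitemap: directive that actually resolves.
    robots = fetch(urljoin(root_url, "/robots.txt"))
    if robots:
        declared = sitemap_from_robots(robots)
        if declared and fetch(declared):
            return ("robots.txt", declared)

    # 2-4. Conventional sitemap file names at the host root.
    for path in SITEMAP_PATHS:
        candidate = urljoin(root_url, "/" + path)
        if fetch(candidate):
            return (path, candidate)

    # 5. Last resort: fall back to scraping links from the homepage.
    return ("homepage crawl", root_url)
```

Passing the fetcher in as a parameter keeps the sketch testable without network access; for example, `discover("https://docs.acme.com", pages.get)` works with a plain dict of URL to body. A real crawler would also validate that each response is well-formed sitemap XML, which this sketch skips.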
JavaScript rendering
Most marketing sites and docs sites are server-rendered or pre-rendered, so the default plain-fetch mode works fine and is fast. Turn on Render JavaScript when:
- The site is a SPA (React/Vue/Svelte/etc.) with no pre-rendering
- Pages return a near-empty HTML shell when you curl them
- Atlas charts a page successfully but the resulting Markdown is just nav and footer, with no main content
JS rendering uses a headless browser, which is 5–10× slower per page and consumes more resources. Don't turn it on by default; turn it on when you actually need it.
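If you want a quick way to check for the "near-empty HTML shell" symptom before enabling Render JavaScript, a rough heuristic is to measure how much visible text a plain fetch returns. The sketch below uses only Python's standard library; the `looks_like_empty_shell` name and the 200-character threshold are assumptions chosen for illustration, not values Atlas uses.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text nodes, skipping <script>, <style>, and <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that sits outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def looks_like_empty_shell(html, min_chars=200):
    """Return True when the raw HTML carries very little visible text.

    min_chars is an arbitrary cutoff for this sketch; tune it for your site.
    """
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks)) < min_chars
```

Feed it the body you get from a plain HTTP request to one of your pages. A `True` result suggests the content is injected by JavaScript and the source may need Render JavaScript; a `False` result suggests plain-fetch mode is probably enough.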
Per-agent toggles
Once a source exists, every agent on the team sees it in the per-agent list with a toggle. Sources are shared infrastructure, but access is per agent. This lets you:
- Add a single "Public Docs" source and enable it for your customer-facing support agent
- Add a separate "Internal Runbook" source and enable it only for an internal agent
- Disable a source without deleting it (toggle off; the source remains for other agents)
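One way to picture the split between shared sources and per-agent access is a team-level registry of sources plus a separate set of (agent, source) pairs. The `Team` class and its method names below are hypothetical and exist only to illustrate the model; they do not correspond to an Atlas API.

```python
from dataclasses import dataclass, field


@dataclass
class Team:
    """Illustrative model: sources are team-wide, access is per agent."""

    sources: dict = field(default_factory=dict)  # source name -> root URL
    enabled: set = field(default_factory=set)    # (agent, source name) pairs

    def add_source(self, name, root_url):
        self.sources[name] = root_url

    def set_toggle(self, agent, name, on):
        """Turn one agent's access to an existing source on or off."""
        if name not in self.sources:
            raise KeyError(name)
        pair = (agent, name)
        if on:
            self.enabled.add(pair)
        else:
            # Toggling off removes access only; the source itself stays.
            self.enabled.discard(pair)

    def readable_by(self, agent):
        return sorted(n for n in self.sources if (agent, n) in self.enabled)


team = Team()
team.add_source("Public Docs", "https://docs.acme.com")
team.add_source("Internal Runbook", "https://runbook.acme.example")  # made-up URL
team.set_toggle("support-agent", "Public Docs", True)
team.set_toggle("internal-agent", "Internal Runbook", True)
```

Here `team.readable_by("support-agent")` lists only "Public Docs", and switching that toggle off later leaves the source in `team.sources` for any other agent that still uses it.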
Re-charting
Source content goes stale when the underlying site updates. Re-chart from the source row's Re-chart button to refresh:
- The whole source (re-discovers sitemap, refetches every page)
- A single page (use the per-page Re-chart in the Pages modal; useful when a single doc was just edited)
We're working on scheduled auto-re-charts; for now, re-chart manually when you ship docs changes.
Hidden pages
Sometimes a chart succeeds for a URL but the result isn't useful: a 404 page, a redirect that landed somewhere generic, or a page where the main content is JS-only and you haven't enabled JS rendering yet. Open the Pages modal on the source and click the eye icon to hide a page from the agent without deleting it. Hidden pages persist across re-charts.
Next steps
- Atlas overview (what Atlas is, vs the Library)
- Troubleshooting Atlas (common issues)