
Exa

Connect to Exa for AI-powered semantic web search, content enrichment, finding similar pages, website crawling, direct answers, structured research, and large-scale URL discovery.

Supports authentication: API Key

Register your Exa API key with Scalekit so it can authenticate and proxy requests on behalf of your users. Unlike OAuth connectors, Exa uses API key authentication — there is no redirect URI or OAuth flow.

  1. Generate an Exa API key

    • Sign in to dashboard.exa.ai/api-keys. Under Management, click API Keys.

    • Click + Create Key, enter a name (e.g., Agent Auth), and confirm.

    • In the Secret Key column, click the eye icon to reveal the key and copy it. Store it somewhere safe — you will not be able to view it again.

  2. Create a connection in Scalekit

    • In the Scalekit dashboard, go to Agent Auth → Create Connection. Find Exa and click Create.

    • Note the Connection name — you will use this as connection_name in your code (e.g., exa).

  3. Add a connected account

    Connected accounts link a specific user identifier in your system to an Exa API key. You can add them via the dashboard for testing, or via the Scalekit API in production.

    Via dashboard (for testing)

    • Open the connection you created and click the Connected Accounts tab → Add account.

    • Fill in:

      • Your User’s ID — a unique identifier for this user in your system (e.g., user_123)
      • API Key — the Exa API key you copied in step 1
    • Click Save.

    Via API (for production)

    scalekit_client.actions.upsert_connected_account(
        connection_name="exa",
        identifier="user_123",  # your user's unique ID
        credentials={"api_key": "your-exa-api-key"},
    )

Once a connected account is set up, make API calls through the Scalekit proxy. Scalekit injects the Exa API key automatically — you never handle credentials in your application code.

import os

import scalekit.client
from dotenv import load_dotenv

load_dotenv()

connection_name = "exa"  # connection name from your Scalekit dashboard
identifier = "user_123"  # your user's unique identifier

# Get your credentials from app.scalekit.com → Developers → Settings → API Credentials
scalekit_client = scalekit.client.ScalekitClient(
    client_id=os.getenv("SCALEKIT_CLIENT_ID"),
    client_secret=os.getenv("SCALEKIT_CLIENT_SECRET"),
    env_url=os.getenv("SCALEKIT_ENV_URL"),
)
actions = scalekit_client.actions

# Make a request via the Scalekit proxy; no API key is needed here
result = actions.request(
    connection_name=connection_name,
    identifier=identifier,
    path="/search",
    method="POST",
    json={"query": "LLM observability tools 2025", "num_results": 5},
)
print(result)

Search the web with a semantic or keyword query. Returns ranked results with titles, URLs, published dates, and optional full-page content. Semantic search (type: neural) finds conceptually similar pages even when your exact words don’t appear in the source. Keyword search (type: keyword) uses traditional exact-match scoring.

Credits: 1 credit per request + 1 credit per result when contents.text or contents.summary is requested.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Search query; phrase it as a statement or question for best semantic results |
| num_results | integer | No | Number of results to return (default 10, max 100) |
| type | string | No | Search mode: neural for semantic similarity (default), keyword for exact-match |
| category | string | No | Filter by content type: company, research paper, news, github, tweet, personal site, pdf, linkedin profile |
| include_domains | array | No | Restrict results to these domains (e.g., ["techcrunch.com", "arxiv.org"]) |
| exclude_domains | array | No | Exclude results from these domains |
| start_published_date | string | No | ISO 8601 date; only return pages published after this date (e.g., 2025-01-01) |
| end_published_date | string | No | ISO 8601 date; only return pages published before this date |
| start_crawl_date | string | No | ISO 8601 date; only return pages crawled by Exa after this date |
| end_crawl_date | string | No | ISO 8601 date; only return pages crawled by Exa before this date |
| include_text | array | No | Strings that must appear in the page text |
| exclude_text | array | No | Strings that must not appear in the page text |
| location | string | No | ISO country code for location-biased results (e.g., US, GB, DE) |
| contents.text | boolean | No | Include full page text in each result (costs 1 extra credit per result) |
| contents.highlights | object | No | Include highlighted snippets; set num_sentences and highlights_per_url |
| contents.summary | object | No | Include an AI-generated page summary; optionally set query to focus the summary |
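
The search parameters and the credit rule above can be combined into a small helper. This is a minimal sketch, not part of any SDK: `build_search_payload` and `estimate_search_credits` are hypothetical names, and the code assumes the dotted name `contents.text` maps to a nested `contents` object and that text or summary adds exactly one credit per result.

```python
from typing import Any, Optional


def build_search_payload(
    query: str,
    num_results: int = 10,
    *,
    include_domains: Optional[list] = None,
    start_published_date: Optional[str] = None,
    with_text: bool = False,
) -> dict:
    """Assemble a search request body using the parameter names from the table."""
    if not 1 <= num_results <= 100:
        raise ValueError("num_results must be between 1 and 100")
    payload: dict = {"query": query, "num_results": num_results, "type": "neural"}
    if include_domains:
        payload["include_domains"] = list(include_domains)
    if start_published_date:
        payload["start_published_date"] = start_published_date
    if with_text:
        # Assumption: contents.text nests as {"contents": {"text": True}}.
        payload["contents"] = {"text": True}
    return payload


def estimate_search_credits(payload: dict) -> int:
    """1 credit per request, plus 1 per result when text or summary is requested."""
    contents: dict = payload.get("contents", {})
    per_result = 1 if (contents.get("text") or contents.get("summary")) else 0
    return 1 + per_result * payload.get("num_results", 10)


payload = build_search_payload(
    "LLM observability tools 2025",
    num_results=5,
    include_domains=["arxiv.org"],
    start_published_date="2025-01-01",
    with_text=True,
)
print(payload, estimate_search_credits(payload))
```

The resulting dictionary could then be passed as the `json` argument of the `actions.request` call shown earlier, with `path="/search"`.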

Find web pages that are semantically similar to a given URL. Exa uses its neural index to surface pages with matching content, tone, and topic — useful for competitive research, discovering similar products, or finding alternative sources on a topic.

Credits: 1 credit per request + 1 credit per result when contents.text or contents.summary is requested.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | URL to find similar pages for (e.g., https://example.com/product) |
| num_results | integer | No | Number of results to return (default 10, max 100) |
| include_domains | array | No | Restrict results to these domains (e.g., ["techcrunch.com", "wired.com"]) |
| exclude_domains | array | No | Exclude results from these domains |
| start_published_date | string | No | ISO 8601 date; only return pages published after this date |
| end_published_date | string | No | ISO 8601 date; only return pages published before this date |
| start_crawl_date | string | No | ISO 8601 date; only return pages crawled by Exa after this date |
| end_crawl_date | string | No | ISO 8601 date; only return pages crawled by Exa before this date |
| contents.text | boolean | No | Include full page text in each result (costs 1 extra credit per result) |
| contents.highlights | object | No | Include highlighted snippets; set num_sentences and highlights_per_url |
| contents.summary | object | No | Include an AI-generated page summary; optionally set query to focus the summary |
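
For competitive research it is often useful to keep the source site itself out of the similar-page results. The sketch below shows one way to do that with `exclude_domains`; `build_find_similar_payload` is a hypothetical helper, and stripping a leading `www.` is an illustrative choice rather than documented Exa behavior.

```python
from urllib.parse import urlparse


def build_find_similar_payload(url: str, num_results: int = 10) -> dict:
    """Build a find-similar body that excludes the source URL's own domain."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"not an absolute URL: {url!r}")
    # Assumption: domains are given without a leading "www." prefix.
    domain = host[4:] if host.startswith("www.") else host
    return {"url": url, "num_results": num_results, "exclude_domains": [domain]}


print(build_find_similar_payload("https://www.example.com/product", num_results=5))
```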

Crawl all pages of a website starting from a given URL and extract their text content. Follows internal links up to the configured depth. Use this to index documentation sites, company knowledge bases, or competitor content for downstream processing.

Credits: Credits vary based on the number of pages crawled and content extracted.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Starting URL to crawl (e.g., https://docs.example.com) |
| max_depth | integer | No | Maximum number of link hops to follow from the starting URL (default 1) |
| max_pages | integer | No | Maximum number of pages to retrieve (default 100); set this to control credit usage |
| include_subdomains | boolean | No | Whether to follow links to subdomains of the starting URL (default false) |
| exclude_paths | array | No | URL path patterns to skip (e.g., ["/archive/*", "/blog/*"]) |
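
Because crawl cost depends on how many pages are fetched, it can help to preview which URLs a set of `exclude_paths` patterns would skip before starting a crawl. The sketch below is a local approximation only: it assumes shell-style glob matching on the URL path, which the parameter description suggests but does not confirm, and both function names are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def build_crawl_payload(url, max_depth=1, max_pages=100, exclude_paths=None):
    """Assemble a crawl request body with the parameter names from the table."""
    payload = {"url": url, "max_depth": max_depth, "max_pages": max_pages}
    if exclude_paths:
        payload["exclude_paths"] = list(exclude_paths)
    return payload


def would_skip(page_url, exclude_paths):
    """Local preview: True if the URL path matches any pattern (glob semantics assumed)."""
    path = urlparse(page_url).path or "/"
    return any(fnmatchcase(path, pattern) for pattern in exclude_paths)


patterns = ["/archive/*", "/blog/*"]
print(build_crawl_payload("https://docs.example.com", max_pages=20, exclude_paths=patterns))
print(would_skip("https://docs.example.com/archive/2020/intro", patterns))
```

Setting a low `max_pages` on the first run keeps credit usage bounded while the patterns are being tuned.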

Retrieve enriched content — full text, highlighted snippets, or AI-generated summaries — for one or more specific URLs without running a search. Use this when you already have URLs (from a prior search, a CRM, or an external list) and want to extract structured content from them.

Credits: 1 credit per URL + 1 credit per content item retrieved (text, highlights, or summary each count separately).

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| urls | array | Yes* | List of URLs to retrieve content for (e.g., ["https://example.com/about"]) |
| ids | array | Yes* | List of Exa result IDs from a prior exa_search or exa_find_similar call; use instead of urls when you have Exa IDs |
| text | boolean | No | Include full page text for each URL (costs 1 extra credit per page) |
| text.max_characters | integer | No | Truncate page text to this many characters |
| text.include_html_tags | boolean | No | Preserve HTML tags in the returned text; helps LLMs interpret page structure |
| highlights | object | No | Extract relevant text snippets from each page |
| highlights.query | string | No | Custom query to guide which snippets are highlighted |
| summary | object | No | Generate an AI summary for each page (costs 1 extra credit per page) |
| summary.query | string | No | Focus the summary on a specific question or topic |
| summary.schema | object | No | JSON schema for structured summary output |
| subpages | integer | No | Number of subpages to also crawl from each URL (default 0) |
| subpage_target | string | No | Keyword or path pattern to target specific subpages (e.g., "pricing") |
| max_age_hours | integer | No | Maximum age of cached content in hours; set a lower value to force a fresher crawl |

*Provide either urls or ids — at least one is required.
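
The "either urls or ids" rule and the per-item credit rule can be checked client-side before a request is sent. This is a sketch under stated assumptions: the helper names are hypothetical, sub-options such as `text.max_characters` are assumed to nest inside a `text` object, and the estimate assumes each of text, highlights, and summary adds one credit per page.

```python
def build_contents_payload(urls=None, ids=None, *, text_max_characters=None,
                           summary_query=None):
    """Build a contents request body; at least one of urls or ids is required."""
    if not urls and not ids:
        raise ValueError("provide either urls or ids")
    payload = {}
    if urls:
        payload["urls"] = list(urls)
    if ids:
        payload["ids"] = list(ids)
    if text_max_characters is not None:
        # Assumption: text.max_characters nests as {"text": {"max_characters": n}}.
        payload["text"] = {"max_characters": text_max_characters}
    if summary_query is not None:
        payload["summary"] = {"query": summary_query}
    return payload


def estimate_contents_credits(payload):
    """1 credit per page plus 1 per requested content item (text, highlights, summary)."""
    pages = len(payload.get("urls", [])) + len(payload.get("ids", []))
    items = sum(1 for key in ("text", "highlights", "summary") if key in payload)
    return pages * (1 + items)


payload = build_contents_payload(
    urls=["https://example.com/about", "https://example.com/pricing"],
    text_max_characters=2000,
    summary_query="What does this company sell?",
)
print(estimate_contents_credits(payload))
```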

Get a concise natural language answer to a question, synthesized from live web search results. Returns the answer text and a list of source URLs used to generate it. Use this when you need a direct factual response rather than a list of links.

Credits: Credits vary based on the number of sources retrieved.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | The question to answer (e.g., "What are the rate limits for GPT-4o?") |
| num_results | integer | No | Number of web sources used to generate the answer (default 5) |
| text | boolean | No | Include the source text snippets alongside the answer |
| include_domains | array | No | Restrict source lookup to these domains |
| exclude_domains | array | No | Exclude these domains from the source lookup |
| start_published_date | string | No | ISO 8601 date; only use sources published after this date |
| end_published_date | string | No | ISO 8601 date; only use sources published before this date |
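
The published-date filters take ISO 8601 dates, so a rolling window such as "sources from the last 30 days" has to be computed by the caller. A minimal sketch, assuming a hypothetical helper name and that plain `YYYY-MM-DD` strings are accepted as in the `2025-01-01` example:

```python
from datetime import date, timedelta


def build_answer_payload(query, days_back, today=None, num_results=5):
    """Build an answer request body limited to sources from the last `days_back` days."""
    today = today or date.today()
    start = today - timedelta(days=days_back)
    return {
        "query": query,
        "num_results": num_results,
        "start_published_date": start.isoformat(),  # e.g. "2025-03-01"
        "end_published_date": today.isoformat(),
    }


print(build_answer_payload("What are the rate limits for GPT-4o?", days_back=30))
```

Passing `today` explicitly makes the window reproducible in tests and scheduled jobs.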

Run in-depth, multi-angle research on a topic. Exa decomposes the topic into multiple sub-queries, runs each in parallel, and synthesizes a structured output. Optionally provide a JSON schema to control the output shape. Use this for competitive intelligence, market research, or any question that requires aggregating information across many sources.

Credits: Significantly more expensive than exa_search — cost scales with num_subqueries. Each sub-query consumes search and content retrieval credits.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| topic | string | Yes | Research topic or question (e.g., "Competitive landscape of AI coding assistants in 2025") |
| num_subqueries | integer | No | Number of parallel sub-queries to run (default 5, max 15); higher values improve coverage but increase cost |
| output_schema | object | No | JSON schema defining the structure of the output; omit for free-form markdown output |
| include_domains | array | No | Restrict all sub-query sources to these domains |
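
To illustrate what an `output_schema` might look like, the sketch below pairs a small JSON schema with a payload builder that enforces the documented `num_subqueries` limit. The schema fields (`competitors`, `name`, `pricing`) and the helper name are hypothetical examples, not fields Exa defines.

```python
import json

# Hypothetical schema: a list of competitors, each with a name and optional pricing note.
COMPETITOR_SCHEMA = {
    "type": "object",
    "properties": {
        "competitors": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "pricing": {"type": "string"},
                },
                "required": ["name"],
            },
        }
    },
    "required": ["competitors"],
}


def build_research_payload(topic, num_subqueries=5, output_schema=None):
    """Build a research request body; num_subqueries is capped at the documented max of 15."""
    if not 1 <= num_subqueries <= 15:
        raise ValueError("num_subqueries must be between 1 and 15")
    payload = {"topic": topic, "num_subqueries": num_subqueries}
    if output_schema is not None:
        payload["output_schema"] = output_schema
    return payload


payload = build_research_payload(
    "Competitive landscape of AI coding assistants in 2025",
    num_subqueries=3,
    output_schema=COMPETITOR_SCHEMA,
)
print(json.dumps(payload, indent=2))
```

Starting with a low `num_subqueries` keeps cost down while the schema is being refined, since cost scales with the number of sub-queries.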

Execute a complex natural language query designed to return a large set of matching URLs — potentially thousands. Each result is vetted against your criteria before being returned. Use this for bulk data collection tasks such as building lead lists, aggregating industry news, or sourcing research datasets.

Credits: Credits are consumed per vetted result. Cost scales with count.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Natural language description of the set to find (e.g., "YC-backed AI startups founded after 2022") |
| count | integer | No | Target number of matching URLs to return (default 10, can be thousands) |
| entity_type | string | No | Type of entity to find: company, person, article, research paper, job listing, event, product |
| criteria | array | No | Additional filter criteria; each item has a description and options.type (must, should, or must_not) |
| include_domains | array | No | Restrict results to these domains (e.g., ["linkedin.com", "crunchbase.com"]) |
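
Each `criteria` item combines a description with an `options.type`. The sketch below shows one plausible shape for that array, assuming `options.type` nests as `{"options": {"type": ...}}`; the helper names and the example criteria text are hypothetical.

```python
ALLOWED_CRITERION_TYPES = {"must", "should", "must_not"}


def criterion(description, kind="must"):
    """Build one criteria item; kind must be must, should, or must_not."""
    if kind not in ALLOWED_CRITERION_TYPES:
        raise ValueError(f"unknown criterion type: {kind!r}")
    # Assumption: options.type nests as {"options": {"type": kind}}.
    return {"description": description, "options": {"type": kind}}


def build_url_discovery_payload(query, count=10, entity_type=None, criteria=None):
    """Assemble a large-scale URL discovery body with the table's parameter names."""
    payload = {"query": query, "count": count}
    if entity_type:
        payload["entity_type"] = entity_type
    if criteria:
        payload["criteria"] = list(criteria)
    return payload


payload = build_url_discovery_payload(
    "YC-backed AI startups founded after 2022",
    count=50,
    entity_type="company",
    criteria=[
        criterion("Has a public pricing page", "should"),
        criterion("Is a consulting agency", "must_not"),
    ],
)
print(payload)
```

Since credits are consumed per vetted result, a modest `count` is a reasonable starting point while the criteria are being adjusted.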