01 · The shift from search to answers
Search has split. Roughly half of all consumer search queries now go to AI answer engines (ChatGPT, Perplexity, Claude, Google's AI Mode, Bing Copilot) instead of the classic ten blue links. The behaviour is different, the funnel is different, and the optimisation moves are different.
When a user types a question into a generative engine, the engine does three things in sequence:
- Decomposes the question into sub-queries
- Retrieves a small set of candidate documents per sub-query
- Synthesises an answer, citing the documents it leaned on
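For readers who prefer code, the sketch below models that loop in TypeScript. It is a mental model only: no engine publishes its pipeline, and every function here is a toy stand-in.

```ts
// Simplified mental model of an answer engine's three-step loop.
// Illustrative only; every function is a toy stand-in, not a real pipeline.

type Doc = { url: string; passage: string };

// Toy corpus standing in for a real index.
const corpus: Doc[] = [
  { url: "https://example.com/pricing", passage: "Plans start at $29/seat." },
];

async function decompose(question: string): Promise<string[]> {
  return [question]; // real engines split compound questions into sub-queries
}

async function retrieve(_subQuery: string, topK: number): Promise<Doc[]> {
  // Real engines rank candidates by relevance to the sub-query;
  // this toy version just returns the head of the corpus.
  return corpus.slice(0, topK);
}

async function synthesize(question: string, docs: Doc[]): Promise<string> {
  // Real engines generate prose grounded in the passages and cite sources.
  const cites = docs.map((d) => d.url).join(", ");
  return `Answer to "${question}" (sources: ${cites})`;
}

async function answer(question: string): Promise<string> {
  const subQueries = await decompose(question); // 1. decompose
  const candidates: Doc[] = [];
  for (const q of subQueries) {
    candidates.push(...(await retrieve(q, 5))); // 2. retrieve
  }
  return synthesize(question, candidates); // 3. synthesise + cite
}

answer("How much does it cost per seat?").then(console.log);
```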
The traffic prize is no longer rank #1. The prize is being one of the 3-7 documents the engine cites. That is a binary outcome: cited or not cited. There is no second page.
This paper is about how to engineer your site so that you are in the cited set, repeatedly, for the question types your buyers ask.
02 · How AI engines decide what to cite
AI engines do not read websites the way a human reader does. They do not bounce, they do not scroll, they do not follow side links. They see a structured representation of a page, typically a markdown-or-text conversion of the HTML your server returns, and they look for three things:
1. Direct answers to the sub-query. A passage that answers the exact question the engine is trying to ground. The closer your sentence is to the way the user asked the question, the higher the citation probability.
2. Trust signals. Author bylines, organisation schema, citations to primary sources, plain dates. Engines have learned to down-weight pages with no author, no date, no source list.
3. Structural clarity. H1 / H2 hierarchy that matches the page's argument. Lists where lists make sense. Tables where tables make sense. The structure tells the model what to extract.
A blog post that hides the answer in paragraph six, with no byline and no schema, will lose the citation slot to a competitor's clearly structured page even if it is better written.
03 · Technical foundations
GEO is mostly accessibility, structured data, and clear writing. If you are already doing those well, you are 70% of the way there.
Server-side rendering or pre-rendering
Most engines do not execute JavaScript when fetching pages. SPAs that render content client-side are essentially invisible. Use SSR (Next.js, Astro, Nuxt) or pre-rendering. Verify by viewing the raw HTML of your top pages — the answer copy must be in the source, not painted in by JS.
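The check is easy to script. The sketch below fetches pages the way a non-JavaScript crawler would and greps the served HTML for a key phrase; the URL, phrase, and simplified user-agent string are all placeholders to swap for your own.

```ts
// Fetch pages without executing JavaScript and confirm the answer copy is
// in the served HTML. Node 18+ (built-in fetch). All values are placeholders.

const pages = [
  { url: "https://example.com/pricing", mustContain: "per seat per month" },
];

async function checkServedHtml(): Promise<void> {
  for (const { url, mustContain } of pages) {
    // A bot-like user agent surfaces any crawler-specific responses;
    // real crawler UA strings are longer than this simplified one.
    const res = await fetch(url, { headers: { "User-Agent": "GPTBot/1.0" } });
    const html = await res.text();
    console.log(html.includes(mustContain) ? `OK   ${url}` : `MISS ${url}`);
  }
}

checkServedHtml();
```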
Schema.org structured data
Minimum useful set:
- `Organization` on the homepage
- `Article` or `BlogPosting` on every editorial page
- `Product` on every commerce page
- `FAQPage` on Q&A blocks
- `BreadcrumbList` site-wide
- `Service` on every service page
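As a hedged example, here is roughly what the `Article` markup might look like as JSON-LD in a Next.js page component. Every name, date, and URL is a placeholder; the same object shape works in any framework, serialised into a `<script type="application/ld+json">` tag.

```tsx
// Minimal Article JSON-LD sketch (placeholder values throughout). Covers the
// author and datePublished fields discussed under "Author and date metadata".

const articleJsonLd = {
  "@context": "https://schema.org",
  "@type": "Article",
  headline: "How K-12 districts budget for website redesigns",
  datePublished: "2025-03-10",
  dateModified: "2025-06-02",
  author: {
    "@type": "Person",
    name: "Jane Doe", // placeholder author
    url: "https://example.com/team/jane", // her bio page on your site
  },
  publisher: {
    "@type": "Organization",
    name: "Example Co",
    url: "https://example.com",
  },
};

export function ArticleSchema() {
  return (
    <script
      type="application/ld+json"
      dangerouslySetInnerHTML={{ __html: JSON.stringify(articleJsonLd) }}
    />
  );
}
```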
Headings, lists, and tables
One H1 per page. H2s for major sections. H3s for sub-sections. Lists for parallel items. Tables for comparison. Engines extract on these boundaries.
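A skeleton of that structure, as a JSX sketch with placeholder content:

```tsx
// Extraction-friendly page skeleton (placeholder content throughout).
// One H1, H2s on section boundaries, a real list and a real table.

export function PageSkeleton() {
  return (
    <article>
      <h1>How much does a district website redesign cost?</h1>
      <p>Direct-answer paragraph, 30-60 words, goes here first.</p>

      <h2>What drives the price</h2>
      <ul>
        <li>Number of schools and templates</li>
        <li>Content migration volume</li>
        <li>Accessibility remediation</li>
      </ul>

      <h2>Typical ranges by district size</h2>
      <table>
        <thead>
          <tr><th>District size</th><th>Typical range</th></tr>
        </thead>
        <tbody>
          <tr><td>Small</td><td>(your data)</td></tr>
          <tr><td>Large</td><td>(your data)</td></tr>
        </tbody>
      </table>
    </article>
  );
}
```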
Author and date metadata
Visible byline, visible publish date, visible last-updated date. Behind the scenes: `author` and `datePublished` in schema. Pages with neither are 30-40% less likely to be cited (BrightEdge, 2025).
04 · Content patterns that work
The five content patterns AI engines lean on most heavily:
Direct-answer paragraphs. A 30-60 word answer placed within the first 200 words of the page, followed by a longer explanation. The first paragraph does the citation work.
Numbered or bulleted lists. When the answer is a sequence (steps, options, criteria), give the engine a list. Do not bury the list inside prose.
Comparison tables. When the user is comparing two or more things, a table is far easier for the engine to extract than a paragraph that says "X is Y but Z is W".
Definitions with examples. "X is Y. For example, A and B." The engine cites the definition; the example is the proof.
Original numbers and case data. Numbers from your own data — "In our 2025 audit of 47 K-12 sites, 71% had broken H1 hierarchy" — are a strong citation magnet. Engines prefer original sources to summaries of summaries.
What to avoid: walls of text, sales-y copy with no factual claim, content that requires an account or paywall to read, stock-photo-heavy pages with sparse copy.
05 · Measurement & monitoring
GEO has poor analytics. The major engines pass far less referral traffic than Google does; you will rarely see "Perplexity" as a source in GA4 today. The proxy metrics that work:
- Direct-traffic uplift to the pages you GEO-optimised, year over year. Most AI-driven traffic lands as direct.
- Brand-mention monitoring. Tools like Profound, AthenaHQ, and Otterly run prompt suites against the engines and flag whether your brand appears in answers.
- Server-side log analysis. AI engines crawl with identifiable user agents. Track GPTBot, ClaudeBot, and PerplexityBot in your access logs (a starter script follows this list).
- Manual prompt testing. Once a month, run your top 20 buyer questions through ChatGPT, Perplexity, and Google AI Mode. Note when you appear and when a competitor does instead.
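A minimal log sweep might look like the sketch below. It assumes a single `access.log` file with the user agent somewhere on each line; adjust the path and patterns for your server.

```ts
// Count requests from known AI crawler user agents in an access log.
// Assumes a combined-format log at "access.log"; adjust for your setup.

import { readFileSync } from "node:fs";

const AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"];

const counts = new Map<string, number>();
for (const bot of AI_BOTS) counts.set(bot, 0);

for (const line of readFileSync("access.log", "utf8").split("\n")) {
  for (const bot of AI_BOTS) {
    if (line.includes(bot)) counts.set(bot, (counts.get(bot) ?? 0) + 1);
  }
}

for (const [bot, n] of counts) console.log(`${bot}: ${n} requests`);
```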
Do not chase a single dashboard number. Treat it the way SEO teams treated rank tracking pre-2010: directional, not absolute.
06 · The 30-day audit checklist
Run this against your top 20 commercial pages. Score each item Yes / No / Partial.
Week 1 — technical foundations
- Page renders on the server (view-source contains the answer copy)
- One H1 per page, sensible H2 / H3 hierarchy
- `Organization` schema on homepage
- `Article` or `Service` schema on the page
- `BreadcrumbList` schema site-wide
- Visible author byline and publish date
- Last-updated date when content changed
Week 2 — content patterns
- Direct-answer paragraph in the first 200 words
- At least one bulleted or numbered list when the answer is a sequence
- A comparison table when the user is comparing options
- Original numbers, citations, or case data
- No content gated behind a form for top-of-funnel queries
Week 3 — trust and authority
- Author has a real bio page on your site
- Author bio links to LinkedIn or other public profile
- The page cites primary sources (not just other blog posts)
- The page has internal links to related deeper content
Week 4 — monitoring setup
- Manual prompt sweep on 20 buyer questions, monthly
- Server logs filtered for GPTBot / ClaudeBot / PerplexityBot
- Direct-traffic baseline set in GA4 for the optimised pages
- One brand-mention tool (Profound, AthenaHQ, Otterly) running
07 · Common mistakes
Patterns we see repeatedly in audits.
Treating GEO as a separate channel. It is not. The same page serves Google, ChatGPT, Perplexity, and Bing. You are not building a different page for each engine; you are building one well-structured page they all agree on.
Schema soup. Adding every schema type without thinking. Engines down-weight pages with mismatched or contradictory schema (e.g. Product schema on a blog post). Pick the right one and fill it in completely.
Hiding answers behind forms. A page that asks for an email before it shows the answer is invisible to AI engines. Run the gated content separately, or move the answer above the form and gate the deeper material.
Ignoring accessibility. WCAG 2.2 AA is largely the same checklist as GEO foundations: alt text, headings, contrast, semantic HTML. Teams that have shipped accessibility already have most of the work done.
Setting up dashboards before content. A monitoring dashboard with no optimised pages to monitor is a waste. Optimise first, then measure.
08 · About the authors
Shaili Gupta is President at OpenSource Technologies. Shaili has run the GEO programme for OST clients in K-12, government, and ecommerce since 2024.
Manish Mittal is CEO at OpenSource Technologies and a Forbes Technology Council member. Manish has been working on accessibility-driven SEO since 2011 and led the technical foundation work behind this playbook.
This paper is published under a permissive license — quote it, cite it, share it. If you find a mistake or want to add a case, email shaili@ost.agency.