SEO & AI Engine Optimization Framework · May 2026

Keyword Research: query intent, clustering, gap analysis

Query Discovery, Intent Classification, SERP Analysis, Keyword Mapping, and Cannibalization Resolution

A comprehensive installation and audit reference for keyword research and search intent analysis — the discipline of identifying what users search for, what they want when they search, and how to map those queries to specific pages on a site. This is the planning layer that drives content strategy, internal linking, and topical clustering. Dual-purpose: installation manual and audit document.


1. Document Purpose

This is the canonical reference for keyword research and search intent. Every other framework in this library assumes the site is targeting the right queries with the right pages. This document specifies how to figure out what those queries should be, what users want when they search them, and how to organize site content around them without cannibalization.

In 2026, traditional keyword research has evolved. Search volume tools have always been approximations, and they are less reliable now that AI assistants siphon queries away from traditional engines. Search intent analysis has become more important as Google's understanding of intent matures: ranking for a query your content doesn't genuinely satisfy is harder than ever, while ranking for queries that match your content has become more straightforward. Topic-based content (covering the full intent landscape around a topic) outperforms keyword-targeted content (single-keyword optimization).

1.1 Required Tools


2. Client Variables Intake

business_type: ""
primary_audience: ""
geographic_market: ""                # local, regional, national, international
service_or_product_categories: []
competitor_domains: []
existing_keyword_targets: []         # If any documented keyword strategy exists
existing_content_inventory_url: ""   # Sitemap or content list
average_monthly_search_traffic: 0    # From GSC if available
top_landing_pages: []                # Already-ranking pages
known_cannibalization_issues: []

3. Search Intent Classification

The most important keyword research framework: understanding what users want when they search.

3.1 Four Primary Intent Types

Informational: User wants to learn something

Navigational: User is looking for a specific website or page

Commercial Investigation: User is researching before purchase

Transactional: User is ready to act

3.2 Modern Intent Subtypes

Beyond the four primary types:

Local intent: User wants location-specific results (covered in framework-localseo.md)

Visual intent: User wants images or visual results

Video intent: User wants video content

News intent: User wants recent news

Question intent: User is asking a specific question

3.3 Intent Verification via SERP Analysis

The fastest way to verify intent: search the query and observe what Google ranks.

SERP signals:

Reading the SERP:

If you search "best web hosting" and the top 10 results are 8 listicle articles + 2 forum threads, the intent is research/comparison. Don't try to rank a sales page for this query — Google ranks listicles because that's what users want.

If you search "buy hosting" and the top 10 are mostly product pages, the intent is transactional. Don't try to rank a comparison article.

Match content type to ranking content type.

3.4 Mixed Intent

Some queries have multiple legitimate intents. Google often shows mixed results:

"Web hosting" might show:

For mixed intent queries, decide which intent your content addresses and accept you'll only capture that share.


4. Keyword Discovery Methodology

4.1 Seed Keyword Generation

Start with seed keywords — the obvious terms for your business:

seed_keywords_for_thatdeveloperguy:
  service_seeds:
    - "web development"
    - "SEO services"
    - "AEO services"
    - "AI search optimization"
    - "computer repair"
    - "website hosting"
    - "WordPress development"
  audience_seeds:
    - "small business website"
    - "small business SEO"
  geographic_seeds:
    - "Cassville web developer"
    - "Missouri web design"
    - "SDVOSB web development"
  problem_seeds:
    - "website not ranking"
    - "computer slow"
    - "WordPress site hacked"

4.2 Keyword Expansion

For each seed, expand using:

Tool-based expansion:

SERP-based expansion:

Competitor-based expansion:

Customer-based expansion:

4.3 Long-Tail Strategy

Long-tail keywords (3+ words, lower volume each) collectively drive significant traffic with less competition.

Pattern: A site might rank for 1,000 long-tail keywords driving 5 visits each = 5,000 monthly visits. That's often more achievable than ranking for one head term driving 5,000 visits.

Long-tail discovery:

4.4 Branded Keyword Research

Track searches for your brand and variants:

Branded queries are high-conversion and easy to rank for. Make sure your brand pages capture them all.

4.5 Question Keyword Research

Questions drive featured snippets, AI Overviews, and People Also Ask (PAA) boxes. Specifically research:

For each topic in your content strategy, generate the question variants and ensure content addresses them.


5. Search Volume Realities

5.1 Search Volume Tools Are Approximations

Different tools report different numbers for the same keyword. This is normal — they sample differently and process data differently. Don't fixate on exact numbers.

Use volume directionally:

5.2 Volume Decline in AI Era

Search volume reported by traditional tools may overstate actual organic search volume in 2026:

Don't optimize purely for search volume. Combine with:

5.3 Keyword Difficulty

Tools provide difficulty scores (typically 0-100) estimating how hard ranking will be.

Reality check: Difficulty scores are based largely on the backlink profiles of current rankers. They don't account for:

A "difficulty 60" keyword might be easy if the current rankers all have weak content and your topical authority is established. A "difficulty 30" might be hard if it requires specialized expertise the existing rankers all have.

Use difficulty as one signal, not the deciding factor.


6. Topic-Based Content Strategy

Modern SEO favors topic clusters over individual keyword targeting.

6.1 The Topic Cluster Approach

Instead of: "Write article targeting keyword X"

Do: "Build a comprehensive resource on topic X, covering all related queries"

A single 3,000-word comprehensive article on "schema markup" can rank for hundreds of related queries:

This is more efficient than writing dozens of individual keyword-targeted articles.

6.2 Topic Identification

For your business, identify topics where you have or can develop authority:

For ThatDeveloperGuy:

Each topic becomes a topical cluster (see framework-internallinking.md Section 6).

6.3 Topic Coverage Planning

For each topic:

  1. Pillar content — comprehensive overview (3,000-10,000 words)
  2. Supporting articles — deep dives on specific subtopics (1,500-3,000 words each)
  3. Question articles — addressing specific questions (500-1,500 words)
  4. Comparison articles — comparing options within topic
  5. Buying guides — for purchase-related topics
  6. Case studies — real-world examples
  7. Original research — proprietary data or analysis

Each addresses different search intents and captures different query patterns.


7. Keyword Mapping

Keyword mapping assigns specific queries to specific pages — preventing cannibalization and ensuring intent matching.

7.1 The Mapping Process

For each significant page on the site:

page_keyword_map:
  url: "https://example.com/services/web-development/"
  primary_keyword: "web development services"
  secondary_keywords:
    - "custom website development"
    - "small business website development"
  long_tail_keywords:
    - "web development services for small business"
    - "custom WordPress development"
    - "responsive website design"
  intent: "transactional"
  page_type: "service page"

For new content:

planned_content:
  topic: "Difference between SEO, AEO, and GEO"
  primary_keyword: "SEO vs AEO vs GEO"
  secondary_keywords:
    - "what is AEO"
    - "answer engine optimization explained"
    - "AEO vs traditional SEO"
  intent: "informational"
  content_type: "explainer article"
  word_count_target: 2500
  cluster: "AI search optimization"
  hub_page: "/topics/ai-search-optimization/"

7.2 Mapping Discipline

One primary keyword per page. The page is "about" this query primarily.

Multiple secondary keywords per page. The same page can rank for related queries.

No duplicate primary keywords across pages. Duplicates cause cannibalization (Section 8).

Map systematically across the site. Don't have keywords floating without page assignments.

7.3 Mapping for Existing Sites

For sites with existing content:

  1. Inventory all pages
  2. For each page, identify the primary keyword it's currently ranking for (from GSC)
  3. Document this in the mapping spreadsheet
  4. Identify gaps (queries you should rank for that no page targets)
  5. Identify duplicates (multiple pages targeting same query → cannibalization)
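
Steps 4 and 5 can be checked mechanically once the mapping spreadsheet exists. The following is a minimal Python sketch, assuming the spreadsheet has been loaded as a list of dicts. The function name `audit_keyword_map` and the keys `url` and `primary_keyword` are hypothetical, chosen to mirror the mapping format in Section 7.1.

```python
from collections import defaultdict

def audit_keyword_map(pages, target_queries):
    """Return (gaps, duplicates) for a page-to-keyword mapping.

    pages: list of dicts with hypothetical keys "url" and "primary_keyword".
    target_queries: iterable of queries the site should rank for.
    """
    urls_by_keyword = defaultdict(list)
    for page in pages:
        # Normalize so "SEO Services" and "seo services " count as one query.
        keyword = page["primary_keyword"].strip().lower()
        urls_by_keyword[keyword].append(page["url"])

    # Gap: a target query that no page claims as its primary keyword.
    gaps = sorted(
        q.strip().lower()
        for q in target_queries
        if q.strip().lower() not in urls_by_keyword
    )
    # Duplicate: a primary keyword claimed by more than one page.
    duplicates = {kw: urls for kw, urls in urls_by_keyword.items() if len(urls) > 1}
    return gaps, duplicates


pages = [
    {"url": "/services/web-development/", "primary_keyword": "web development services"},
    {"url": "/blog/web-dev-guide/", "primary_keyword": "Web Development Services"},
    {"url": "/services/seo/", "primary_keyword": "SEO services"},
]
gaps, duplicates = audit_keyword_map(pages, ["SEO services", "AEO services"])
print(gaps)        # queries with no owning page
print(duplicates)  # primary keywords claimed by several pages
```

Each duplicate found here is a candidate for the resolution strategies in Section 8.2. The check only compares exact (normalized) strings, so near-synonym targets still need manual review.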

8. Keyword Cannibalization

Cannibalization occurs when multiple pages on the same site target the same query — splitting ranking signals and confusing Google about which page should rank.

8.1 Detection

GSC method:

  1. Performance report
  2. Filter by specific query
  3. View "Pages" tab
  4. If multiple URLs appear with significant impressions, cannibalization exists
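
For more than a handful of queries, the same check can be run over exported data. This is a sketch only: it assumes rows of (query, page, impressions) are already available, for example from a query-plus-page export, and it treats a URL as "significant" when it holds at least 20% of a query's impressions. That 20% cutoff and the function name are illustrative assumptions, not values from this framework.

```python
from collections import defaultdict

def find_cannibalized_queries(rows, min_share=0.2):
    """Flag queries where two or more URLs each hold a significant impression share.

    rows: iterable of (query, page, impressions) tuples.
    min_share: assumed cutoff for "significant"; tune per site.
    """
    impressions = defaultdict(lambda: defaultdict(int))
    for query, page, count in rows:
        impressions[query][page] += count

    flagged = {}
    for query, by_page in impressions.items():
        total = sum(by_page.values())
        if total == 0:
            continue  # no impressions recorded, nothing to compare
        significant = sorted(p for p, n in by_page.items() if n / total >= min_share)
        if len(significant) >= 2:
            flagged[query] = significant
    return flagged


rows = [
    ("web development services", "/services/web-development/", 600),
    ("web development services", "/blog/web-dev-guide/", 400),
    ("seo services", "/services/seo/", 950),
    ("seo services", "/blog/what-is-seo/", 50),
]
flagged = find_cannibalized_queries(rows)
print(flagged)
```

In this sample only "web development services" is flagged, because the second URL for "seo services" holds just 5% of impressions.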

Site search method:

Tool method:

8.2 Resolution Strategies

Strategy 1: Consolidate

Merge multiple pages into one comprehensive page. 301 redirect the others to the merged page. Best when:

Strategy 2: Differentiate

Restructure pages to target different intents or angles:

Each becomes the canonical for its specific intent.

Strategy 3: Internal link prioritization

Choose one page as primary. Link internally from the other pages to the primary page. Add canonical tags from secondary pages to the primary only if the content is truly the same.

Strategy 4: Noindex secondary

If secondary pages have minimal value, noindex them. This eliminates cannibalization without removing the pages.

8.3 Prevention


9. SERP Feature Targeting

Beyond standard organic ranking, SERP features capture additional visibility:

9.1 Featured Snippets (Position Zero)

The single result displayed at the top of the SERP, with content extracted from a ranking page.

Snippet types:

Optimization:

9.2 People Also Ask

Question boxes with expandable answers.

Optimization:

9.3 AI Overviews

AI-generated synthesis answers above traditional results.

Optimization: See framework-aicitations.md for comprehensive AI citation strategy. Key points:

9.4 Image Pack

Image carousel results.

Optimization: See framework-imageseo.md (when built). Quick points:

9.5 Video Carousel

Video results.

Optimization: See framework-videoseo.md (when built). Quick points:

9.6 Local Pack

Local business results.

Optimization: Comprehensive coverage in framework-localseo.md.


10. Semantic Keyword Clustering Tools

Traditional keyword research treats every query as a separate target. Semantic clustering groups queries that share intent, so a single page can cover a cluster instead of a single keyword. In an AI Overview era where Google interprets queries as topics rather than strings, clustering is increasingly the primary research output.

10.1 Tool Inventory

SurferSEO Topical Maps

SurferSEO Topical Maps was redesigned in mid-2025 to ingest a seed topic and return a multi-level tree: pillar topic at the root, subtopics as branches, article-level targets as leaves. Each leaf has a recommended H1, suggested word count, and a list of cluster keywords. Output is sortable by search volume, difficulty, and the AI Overview probability score Surfer started tracking after their Dec 2025 study (Surfer SEO Topical Maps documentation, 2025, 11,256-topic sample).

When to reach for it: building a new content silo from scratch, refreshing a stalled blog, or planning programmatic clusters. Surfer's strength is sheer volume of suggestions and its built-in AIO probability column. Its weakness is that suggestions often skew toward generic informational pages even when the seed implies commercial intent. Always re-classify intent before assigning content type.

SE Ranking Cluster Tool

SE Ranking's clustering operates on the SERP overlap method: two queries belong to the same cluster if N or more of their top 10 results overlap. Threshold is configurable, typically 3 to 5. This is the most defensible clustering signal because it reflects Google's own grouping behavior rather than a model's semantic guess (SE Ranking platform documentation, 2025).

Best use: validating a clustering hypothesis from another tool. Run 200 candidate keywords through SE Ranking at threshold 4. The resulting clusters approximate how Google actually groups those queries.
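
To make the SERP overlap rule concrete, here is a minimal Python sketch. It is not SE Ranking's implementation, which is not public in this document. It assumes you already hold the top results per query in a dict, and it groups queries by connected components, so two queries can land in one cluster through a shared neighbor. The function name and the shortened URL lists are illustrative.

```python
from itertools import combinations

def cluster_by_serp_overlap(serps, threshold=4):
    """Group queries whose result sets share at least `threshold` URLs.

    serps: dict mapping query -> list of ranking URLs (normally the top 10).
    Uses union-find, so cluster membership is transitive.
    """
    parent = {q: q for q in serps}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving keeps lookups short
            q = parent[q]
        return q

    for a, b in combinations(serps, 2):
        if len(set(serps[a]) & set(serps[b])) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for q in serps:
        clusters.setdefault(find(q), []).append(q)
    return sorted(sorted(members) for members in clusters.values())


# Shortened result lists for readability; real input would hold 10 URLs each.
serps = {
    "web hosting": ["a.com", "b.com", "c.com", "d.com", "e.com"],
    "best web hosting": ["a.com", "b.com", "c.com", "d.com", "x.com"],
    "schema markup": ["p.com", "q.com", "r.com", "s.com", "t.com"],
}
clusters = cluster_by_serp_overlap(serps, threshold=4)
print(clusters)
```

Raising the threshold produces smaller, tighter clusters; lowering it merges more aggressively. Transitive grouping is one design choice among several, and a stricter variant would require every pair in a cluster to meet the threshold.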

Frase Topic Clustering

Frase ingests a primary keyword and pulls SERP competitor headings, entity mentions, and PAA questions, then clusters them into subtopics. Output is a topic outline rather than a keyword list. Frase is the best tool when the deliverable is a content brief rather than a strategy document (Frase product documentation, 2025).

Keyword Insights

Keyword Insights runs a large input list (often 10K+ keywords from a Semrush or Ahrefs export) through SERP overlap clustering with a configurable similarity threshold. Output is a CSV mapping each keyword to a cluster ID and recommended pillar. Best use: post-export consolidation when a competitor analysis produced thousands of candidate keywords (Keyword Insights platform documentation, 2025).

KeyClusters

KeyClusters is a low-cost alternative for SERP-overlap clustering. Functionally similar to SE Ranking and Keyword Insights, with a simpler interface and lower volume limits. Best use: small-site projects where the keyword universe is under 5K.

NeuronWriter Topic Maps

NeuronWriter's topic maps are tightly integrated with their NLP optimization scoring. The map identifies subtopics, but each subtopic feeds directly into a content score for the article you write next. Best use: a single writer producing one article at a time, where the topic map and the optimization rubric should live in the same tool (NeuronWriter platform documentation, 2025).

10.2 When Semantic Clustering Beats Traditional Keyword Research

Traditional keyword research, the kind that produces a flat list of queries with volume and difficulty, is still appropriate when:

Semantic clustering wins when:

10.3 The Clustering Workflow

  1. Pull a wide keyword universe from Ahrefs Keywords Explorer or Semrush Keyword Magic Tool. Aim for 2K to 10K candidate queries seeded from 5 to 10 primary terms.
  2. Filter for intent relevance. Drop queries that clearly target a different audience.
  3. Run through SE Ranking or Keyword Insights at SERP-overlap threshold 3 or 4.
  4. Review clusters manually. Merge near-duplicates. Split clusters that mix intents.
  5. For each cluster, designate a pillar query (highest volume + most central) and supporting queries.
  6. Map clusters to existing pages or planned pages. See Section 7.
  7. Validate against framework-topicalauthority.md cluster completeness scoring before publishing.

10.4 Cluster Cannibalization

Clusters can cannibalize each other when two clusters overlap in intent. Detection method: if 30% or more of either cluster's supporting queries also appear in the other cluster, the clusters should be merged or the shared queries reassigned to one of them. Run this check before assigning pages.
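
The overlap check is simple set arithmetic. The sketch below measures overlap against the smaller cluster, which is one reasonable reading of the 30% rule and is an assumption here. The function names are illustrative.

```python
def overlap_ratio(cluster_a, cluster_b):
    """Share of the smaller cluster's supporting queries found in the other cluster."""
    a, b = set(cluster_a), set(cluster_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def should_merge(cluster_a, cluster_b, threshold=0.30):
    """True when the overlap reaches the 30% threshold from Section 10.4."""
    return overlap_ratio(cluster_a, cluster_b) >= threshold


aeo_basics = ["what is aeo", "aeo vs seo", "aeo tools", "aeo checklist"]
ai_search = ["aeo vs seo", "aeo tools", "geo vs aeo", "ai seo basics", "ai overview tips"]
security = ["wordpress hacked", "malware removal", "aeo tools", "site backup", "login hardening"]

print(overlap_ratio(aeo_basics, ai_search), should_merge(aeo_basics, ai_search))
print(overlap_ratio(aeo_basics, security), should_merge(aeo_basics, security))
```

Here the first pair shares 2 of 4 queries and should be merged or split more cleanly, while the second pair shares 1 of 4 and stays separate.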


11. Question Mining

Questions are now a primary keyword research output, not a secondary one. AI Overviews, People Also Ask boxes, and AI Mode sub-queries all surface question patterns more aggressively than head-term patterns. Cross-ref framework-featuredsnippets.md for the on-page optimization that follows question discovery.

11.1 Tools

AlsoAsked.com

AlsoAsked pulls the People Also Ask tree for a seed query. Each answered PAA generates new PAAs, and AlsoAsked traverses the tree to 4 or 5 levels. Output is a visual tree or a CSV. Best use: building the question coverage spec for a pillar page.

The Mind Map view (released Q3 2025) lets you click any node to expand its descendants on demand. Practical for live planning sessions with a content team.

Limit: PAA results vary by location and device. Run AlsoAsked from a clean profile in the target country to avoid personalization skew (AlsoAsked product documentation, 2025).

AnswerThePublic

AnswerThePublic groups suggestions by question word (who, what, when, where, why, how, which, can, are, will) and by preposition. Output is a visual wheel or a CSV.

Where AlsoAsked surfaces actual PAA questions, AnswerThePublic surfaces autocomplete and related search patterns. The two are complementary, not redundant.

Quora Intent Extraction

Quora is an underused keyword research source. Method:

  1. Identify the topic's top Quora threads via site:quora.com [topic] query.
  2. Extract the questions verbatim. These are how users phrase the topic in their own language, often more specific than autocomplete suggests.
  3. Cross-reference against AlsoAsked PAA results. Questions that appear in both Quora and PAA are validated as real, high-intent queries.
  4. Pay attention to follow-up questions in Quora threads. These often surface sub-queries that don't appear in any keyword tool.

Quora intent extraction is especially valuable for technical specialty topics where keyword tools have thin data. The query "how do I migrate from go-sqlite3 to modernc.org/sqlite for CGO_ENABLED=0 builds" has near-zero search volume but Quora threads show it's an active question with measurable demand.

Reddit and Niche Forums

Same method as Quora. The patterns vary by community but the principle is identical: capture actual user language and use it to seed PAA and autocomplete research.

11.2 The PAA Tree Mapping Methodology

PAA boxes are recursive. When a user clicks one PAA, Google injects 2 to 4 new PAAs related to that branch. A pillar page that addresses all 12 to 40 questions in the full tree is positioned to capture multiple PAA placements across sessions.

Tree mapping process:

  1. Seed query into AlsoAsked. Capture the initial 4 to 8 PAAs.
  2. For each PAA, expand. Capture the 2nd-level PAAs. Continue to depth 3 or 4.
  3. Output is a tree with 12 to 40 question nodes per seed.
  4. Map each node to a page section, H2 or H3, or FAQ entry on the pillar page.
  5. Cross-check against AI Overview citations. AIO often cites PAA-style content, plausibly because Google's underlying intent model treats PAAs as canonical question expressions.
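
Steps 3 and 4 amount to flattening a tree and assigning each node a placement. A minimal Python sketch follows, assuming the tree has been transcribed into nested dicts. The depth-to-placement rule (depth 1 as H2, depth 2 as H3, deeper as FAQ entry) is an illustrative assumption, not a rule from this framework.

```python
def flatten_paa_tree(tree, depth=1):
    """Yield (depth, question) pairs from a nested PAA tree in depth-first order.

    tree: dict mapping each question to a dict of its follow-up questions.
    """
    for question, children in tree.items():
        yield depth, question
        yield from flatten_paa_tree(children, depth + 1)

def placement_for(depth):
    # Assumed rule of thumb; adjust to what the target SERP rewards.
    if depth == 1:
        return "H2"
    if depth == 2:
        return "H3"
    return "FAQ entry"


paa_tree = {
    "What is AEO?": {
        "How is AEO different from SEO?": {
            "Does AEO replace SEO?": {},
        },
        "Who needs AEO?": {},
    },
    "How do AI answer engines find content?": {},
}
plan = [(q, placement_for(d)) for d, q in flatten_paa_tree(paa_tree)]
for question, placement in plan:
    print(f"{placement}: {question}")
```

The resulting list can be pasted into the mapping spreadsheet as the question coverage spec for the pillar page.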

A pillar page with explicit answers to 25+ PAA tree nodes typically captures 4 to 8 PAA placements in production, based on aggregate observation across the TDG client roster in Q1 2026.

11.3 Question-to-Section Mapping

For each question node:

question_node:
  question: "What is the difference between AEO and traditional SEO?"
  section_heading: "AEO vs Traditional SEO: Key Differences"
  heading_level: "H2"
  answer_length: "150 to 250 words"
  answer_format: "comparison table preferred"
  internal_link_target: "/topics/answer-engine-optimization/"

The answer length and format depend on what the SERP rewards. See framework-featuredsnippets.md Section 4 for paragraph, list, and table snippet optimization.


12. Intent Drift Over Time

Search intent is not static. The same query string can mean different things in different years, and the SERP composition will reflect the drift before the keyword volume does.

12.1 Examples of Drift

"AI SEO"

In 2023, "AI SEO" returned mostly tool comparison content: ChatGPT for SEO, Jasper for content, MarketMuse for optimization. Intent was commercial investigation centered on tools.

In 2025, "AI SEO" returned a mix of tool content and methodology content: how AI changes SEO, what to do about AI Overviews, GEO vs AEO definitions. Intent shifted toward informational and strategic.

In 2026, "AI SEO" returns predominantly AI Overview content with citations to authoritative methodology sources. Tool listings have receded. Intent is now defined by AIO synthesis quality rather than ranker click-through.

A page that ranked in 2023 because it compared 12 AI SEO tools is unlikely to rank in 2026 unless rewritten as a methodology resource that AIO can cite.

"Schema markup"

In 2022, "schema markup" returned developer documentation. Intent was informational, audience was technical implementers.

In 2024, intent expanded to include strategic content: which schemas matter for which page types, schema for SEO benefit, schema generators for non-developers.

In 2026, "schema markup" returns AIO synthesis at the top, then a mix of generator tools and strategic guides. Pure developer documentation has been demoted in many SERPs because AIO answers the implementation questions directly.

"Web hosting"

Stable intent over years: commercial investigation. Top 10 has been listicles since 2018. Drift in this query is minimal because the underlying user need (compare hosting options before buying) hasn't changed.

12.2 Detecting Drift

Three primary signals:

SERP composition shift: Compare the top 10 results for a target query year-over-year. If 6 or more of the 10 results in 2026 are different page types than in 2024, intent has drifted. If you don't have a manual archive, check whether the Wayback Machine holds a 2024 snapshot of the SERP or of the pages that ranked then.

Query refinement patterns: When users refine "AI SEO" in 2026, the refinements ("AI SEO methodology", "AI SEO without tools", "AI SEO for small business") signal where the underlying intent is moving. Track refinements via Google Trends related queries.

AI Overview prevalence shifts: A query that didn't show AIO in 2024 but does in 2026 has, by definition, moved toward informational synthesis intent in Google's classifier. Track AIO prevalence per query monthly. See framework-aioverviews.md Section 5 for the AIO tracking workflow.
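
The SERP composition signal can be scored once each result in both snapshots has been hand-labeled with a page type. The sketch below is one way to count the shift, assuming labels are plain strings. It compares the mix of page types rather than position by position, which is an interpretation of the "6 of 10" rule, not a definition from this framework.

```python
from collections import Counter

def composition_shift(types_then, types_now):
    """Count current results whose page type has no counterpart in the older SERP."""
    return sum((Counter(types_now) - Counter(types_then)).values())

def has_drifted(types_then, types_now, threshold=6):
    """Apply the 6-of-10 rule of thumb to two labeled top-10 snapshots."""
    return composition_shift(types_then, types_now) >= threshold


serp_2024 = ["tool comparison"] * 7 + ["methodology guide"] * 3
serp_2026 = ["tool comparison"] * 1 + ["methodology guide"] * 7 + ["forum thread"] * 2

shift = composition_shift(serp_2024, serp_2026)
print(shift, has_drifted(serp_2024, serp_2026))
```

In this sample, four extra methodology guides and two forum threads replaced six tool comparisons, so the query is flagged as drifted.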

12.3 Annual Intent Reassessment

Schedule an annual review:

  1. Pull the top 50 priority queries from the keyword strategy.
  2. For each, manually check the current SERP.
  3. Classify the current intent vs the previously documented intent.
  4. Flag drift.
  5. For drifted queries, decide: rewrite the targeting page, reassign to a different page, or accept the page will lose rankings on that query.

Drift detection without a reassessment cadence is wasted analysis. The reassessment is what converts drift detection into action.


13. AI Overview Query Identification

In 2026, AI Overviews appear on a non-trivial fraction of queries. Identifying which queries surface AIO is a primary input to content strategy because AIO presence changes the entire optimization target. Cross-ref framework-aioverviews.md for the on-page optimization once AIO queries are identified.

13.1 Manual Sampling

The simplest method is manual SERP inspection. Take the top 50 priority queries. Search each from a clean browser profile in the target country. Record:

Manual sampling is the most accurate method because tool-based AIO tracking still misses about 15% of AIO instances due to triggering variability (Surfer SEO AIO study, Dec 2025, 11,256 keywords sampled).

13.2 Tool-Based Tracking

Surfer SEO AIO Tracking

Added to Surfer in late 2025. The Content Editor and Topical Maps both display an AIO probability score per query, derived from historical and live SERP scrapes. Use Surfer for at-scale tracking across hundreds of queries.

BrightEdge AI Catalyst

Enterprise tool that monitors AIO presence per query and tracks citation behavior. Reports AIO trigger rate, citation share, and competitive AIO citation patterns. Best for enterprise sites with 1K+ tracked queries (BrightEdge AI Catalyst product documentation, 2025).

Semrush and Ahrefs

Both added AIO presence indicators in their SERP feature columns during 2025. Less detailed than Surfer or BrightEdge but adequate for spot-checking.

GSC Performance

GSC began surfacing AIO impressions and clicks in early 2026 as a separate metric. This is the most reliable signal for queries where your site already has visibility. Use the AIO impression-to-click ratio to identify which AIO queries actually convert.

13.3 Adjusting Strategy When AIO Dominates

Surfer's Dec 2025 finding: when 70% or more of priority queries in a portfolio show AIO, the keyword strategy needs to shift from ranking-position optimization to citation optimization. The shift includes:

For TDG client portfolios, the 70% prevalence threshold from the Dec 2025 finding has been hit on most knowledge-stage topics. AIO citation is now the default optimization target for those, not a secondary one.

13.4 Queries Where AIO Doesn't Dominate

Not every query shows AIO. As of mid-2026, AIO is rare or absent on:

These remain conventional organic ranking targets. The keyword strategy should explicitly classify each priority query as AIO-dominant, AIO-occasional, or AIO-absent, and assign different optimization targets per class.
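
A minimal Python sketch of the three-way classification, assuming each priority query has been checked several times and each check recorded as True when an AI Overview appeared. The per-query cutoff of 0.7 for "AIO-dominant" is an illustrative assumption, since this framework does not define one. The portfolio-level share is included so it can be compared against the 70% threshold in Section 13.3.

```python
def classify_aio(observations, dominant_at=0.7):
    """Classify one query from repeated SERP checks.

    observations: list of booleans, True when an AI Overview appeared.
    dominant_at: assumed per-query cutoff, not a value from the framework.
    """
    if not any(observations):
        return "AIO-absent"
    rate = sum(observations) / len(observations)
    return "AIO-dominant" if rate >= dominant_at else "AIO-occasional"

def portfolio_aio_share(samples):
    """Fraction of queries where an AI Overview appeared in at least one check."""
    if not samples:
        return 0.0
    return sum(1 for obs in samples.values() if any(obs)) / len(samples)


samples = {
    "what is aeo": [True, True, True, True],
    "aeo vs seo": [True, False, False, True],
    "buy web hosting": [False, False, False, False],
    "schema markup": [True, True, True, False],
}
classes = {q: classify_aio(obs) for q, obs in samples.items()}
share = portfolio_aio_share(samples)
print(classes)
print(share)
```

Because AIO triggering varies between checks, more observations per query give a steadier classification than a single manual search.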


14. Sub-Query Fan-Out Research

The AI Overview era introduced a new layer to keyword research: the sub-queries that AIO and AI Mode generate behind a single user query. The user types one query; the system fans out into 8 to 16 sub-queries to compose the answer. Each sub-query is a potential citation opportunity.

14.1 The Fan-Out Pattern

Google's AI Overview system, based on public statements and observed behavior, decomposes a user query into related sub-queries before synthesizing the AIO panel. Typical pattern:

If the user query is "best practices for federated SQLite replication in a 3-node cluster", the sub-query fan-out might include:

A page that gets cited in the AIO panel for the user query is one whose content is the best available answer to several of those sub-queries, not necessarily the headline query itself.

14.2 Identifying Sub-Queries

Method 1: AI Mode panel inspection

When AI Mode is available in the target locale, expand the AI Mode response panel. The panel typically displays a "Searched for" section listing the sub-queries the model generated. Capture verbatim.

This is the most direct method. It reveals exactly which sub-queries Google's classifier generated for the parent query.

Method 2: AIO citation reverse-engineering

If AI Mode isn't available, use the AIO panel itself:

  1. Capture the AIO synthesis for a target query.
  2. Identify the cited URLs.
  3. For each cited URL, infer what sub-query it satisfies. Often the URL's title or H1 reveals the sub-query directly.
  4. Aggregate across 5 to 10 priority queries. The patterns reveal Google's sub-query universe for your topic.

Method 3: Related searches and PAA

Less precise but easier. The bottom-of-SERP "Related searches" and the PAA boxes are a superset of common sub-queries. Map AlsoAsked output (Section 11) against the AIO citation reverse-engineering to identify true sub-queries.

Method 4: Direct AI assistant questioning

Ask Claude, ChatGPT, or Perplexity: "If a user searches for [parent query], what sub-questions would you need to answer to produce a comprehensive response?" The response is a model's view of the sub-query fan-out. It may approximate Google's behavior, on the assumption that query decomposition is broadly similar across models, so treat it as a hypothesis to validate against Methods 1 through 3.

14.3 Reverse-Engineering Sub-Query Coverage

Once the sub-queries are identified, evaluate your existing content against them:

parent_query: "AEO for small business"
sub_queries:
  - q: "What is AEO"
    coverage: "Pillar page H2"
  - q: "AEO vs SEO"
    coverage: "Pillar page H2"
  - q: "How AI answer engines find content"
    coverage: "Supporting article published"
  - q: "AEO best practices for small sites"
    coverage: "GAP - no content yet"
  - q: "AEO ROI for small business"
    coverage: "GAP - no content yet"
  - q: "AEO tools for non-technical users"
    coverage: "Supporting article published"
  - q: "AEO schema markup essentials"
    coverage: "Cross-cluster link to schema framework"
  - q: "AEO content brief template"
    coverage: "GAP - no content yet"

Each GAP becomes a content brief. Each covered sub-query becomes a candidate for AIO citation if the answer is high-quality.

14.4 Cluster Completeness via Sub-Query Coverage

The metric that matters: percentage of sub-queries for which your site is the best available answer. Manual scoring or AIO citation tracking (Section 13) provides the ground truth.

Cluster completeness target: 70% or more of sub-queries covered with content that has been cited or could plausibly be cited in AIO. Below 50% completeness, the cluster is unlikely to win AIO citations regardless of individual page quality.
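
As a worked example, the coverage map from Section 14.3 can be scored directly. This Python sketch treats any entry whose coverage note starts with "GAP" as uncovered. That is a rough proxy: it measures whether content exists, not whether it is the best available answer, which still needs manual scoring or citation tracking. The function names are illustrative.

```python
def cluster_completeness(coverage_notes):
    """Fraction of sub-queries whose coverage note does not start with "GAP"."""
    if not coverage_notes:
        return 0.0
    covered = sum(1 for note in coverage_notes if not note.upper().startswith("GAP"))
    return covered / len(coverage_notes)

def completeness_status(score):
    """Map a score onto the 70% target and 50% floor from Section 14.4."""
    if score >= 0.70:
        return "meets target"
    if score < 0.50:
        return "unlikely to win citations"
    return "below target"


# Coverage notes for "AEO for small business", copied from Section 14.3.
coverage_notes = [
    "Pillar page H2",
    "Pillar page H2",
    "Supporting article published",
    "GAP - no content yet",
    "GAP - no content yet",
    "Supporting article published",
    "Cross-cluster link to schema framework",
    "GAP - no content yet",
]
score = cluster_completeness(coverage_notes)
print(score, completeness_status(score))
```

With 5 of 8 sub-queries covered, the example cluster sits at 62.5%: above the 50% floor but short of the 70% target, so closing one more gap would bring it to 75%.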


15. Long-Tail Keyword Strategy in the AI Overview Era

The classic long-tail strategy was: target many low-volume queries, each easy to rank for, aggregate to meaningful traffic. That strategy still works for queries AIO doesn't answer. For AIO-dominant long-tails, the strategy has changed.

15.1 The Click Erosion Problem

Long-tail informational queries are exactly the queries AIO answers most aggressively. "How do I configure modernc.org/sqlite for CGO_ENABLED=0 builds" is the kind of query where AIO synthesizes a complete answer, and the user often gets what they need without clicking.

Aggregate click-through rates on long-tail informational queries dropped 38% between Q2 2024 and Q1 2026 across the BrightEdge enterprise sample (BrightEdge AI Catalyst quarterly report, Q1 2026, 412 enterprise sites tracked, 2.4M queries monitored). Click counts are down even when ranking position is unchanged.

The implication: ranking #1 on a long-tail in 2026 is worth substantially less than ranking #1 was worth in 2023.

15.2 The New Value: AIO Citation

The replacement value is AIO citation. When the AIO panel cites your URL inline, three things happen:

15.3 Selecting Long-Tails by AIO Citation Probability

Old selection criteria for long-tails:

New selection criteria for long-tails in the AIO era:

Volume becomes a third-order signal. A long-tail with monthly volume of 5 that triggers AIO and cites high-conversion pages may outperform a long-tail with monthly volume of 500 that triggers AIO but cites informational competitors.

15.4 The Long-Tail Workflow Update

  1. Generate the long-tail keyword universe via traditional methods. See Section 4.
  2. For each long-tail, check AIO presence. Use Surfer AIO tracking or manual sampling.
  3. For AIO-present long-tails, identify the cited URL patterns. Are they listicles, single-source authorities, or aggregator sites?
  4. Score each long-tail by citation probability for your content type.
  5. Prioritize long-tails where citation probability is high and your conversion path matches the cited URL pattern.
  6. Deprioritize long-tails where AIO answers fully without citing detail-rich sources.

A long-tail strategy in 2026 ends up with about 40% of the candidate queries it would have had in 2023, but the prioritization is more rigorous and the queries selected have higher expected value per ranking position.


16. Branded vs Non-Branded Distribution

The ratio of branded to non-branded search traffic is a diagnostic signal about brand awareness and SEO funnel health.

16.1 The Healthy Ratio

There is no single correct ratio because it depends on business stage:

Early-stage business: 5% to 15% branded. Most search traffic comes from non-brand queries because the brand has minimal awareness. SEO effort is correctly aimed at awareness-stage non-brand queries.

Established niche business: 20% to 35% branded. The brand has visibility within its niche and direct-search demand is meaningful, but the majority of acquisition still depends on non-brand discovery.

Mature high-awareness brand: 35% to 60% branded. The brand is recognized broadly and substantial search traffic is people looking specifically for the brand or its products.

Dominant category brand: 50%+ branded. The brand is the category. Most search traffic is brand-defense rather than acquisition.

16.2 When Brand Search Dominates

If brand search exceeds 50% on a business that isn't a dominant category brand, the signal is:

Action: audit the non-brand opportunity set. Identify high-volume awareness-stage queries the brand doesn't currently rank for. Build content to address them.

16.3 When Non-Brand Dominates

If brand search is below 5% on a business that isn't brand new, the signal is:

Action: invest in awareness-stage and brand-building activity. Press, partnerships, community presence, distinctive thought leadership. The SEO strategy should still be working, but it's working without the brand-trust multiplier that lifts conversion rates on every page.

16.4 Measuring the Ratio

From GSC:

  1. Performance report, set date range to last 12 months.
  2. Export all queries.
  3. Tag each query as branded or non-branded. Branded queries contain the brand name, common misspellings, product names, founder names, or distinctive brand phrases.
  4. Sum impressions or clicks for each category. Calculate ratio.
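Steps 3 and 4 can be sketched as below, assuming the GSC export has already been loaded into a list of dictionaries. The key names `query`, `clicks`, and `impressions`, the function name, and the sample brand terms are hypothetical; the substring match is a simplification that will need manual review for brand names that are also common words.

```python
import re


def branded_share(rows, brand_terms, metric="clicks"):
    """Return the branded fraction (0.0 to 1.0) of the chosen metric."""
    if not brand_terms:
        raise ValueError("brand_terms must contain at least one term")
    # One case-insensitive pattern covering brand name, misspellings, product names.
    pattern = re.compile("|".join(re.escape(t) for t in brand_terms), re.IGNORECASE)
    branded = 0
    total = 0
    for row in rows:
        value = int(row[metric])
        total += value
        if pattern.search(row["query"]):
            branded += value
    return branded / total if total else 0.0


# Hypothetical export rows, for illustration only.
rows = [
    {"query": "acme login", "clicks": 40, "impressions": 400},
    {"query": "best crm for plumbers", "clicks": 60, "impressions": 2600},
]
print(branded_share(rows, ["acme", "akme"]))                  # share of clicks
print(branded_share(rows, ["acme", "akme"], "impressions"))   # share of impressions
```

Running the calculation on both clicks and impressions is worthwhile, since branded queries usually carry a much higher click-through rate and the two ratios can differ substantially.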

For sites without GSC history, estimate via direct traffic share (high direct traffic implies high brand recognition) and via branded search volume in Ahrefs or Semrush.

16.5 Branded Query Defense

Regardless of ratio, branded queries should always be defended. Brand-defense content includes:

Each branded query pattern should have an owned page that captures the click before a third-party page intercepts it.


17. Zero-Volume Keyword Research

Keyword tools report zero monthly search volume when their sample size for a query falls below the tool's minimum-reporting threshold. This does not mean the query has zero searches. It means the tool can't measure them confidently.

17.1 GSC as the True Volume Source

Google Search Console reports actual impressions and clicks per query, not estimated volume. For queries where the site already has some ranking, GSC reveals real volume. Pattern observed across the TDG client roster:

Practical implication: GSC is the only ground truth for query volume on a site that already exists. For pre-launch sites or new content areas, zero-volume in tools is genuinely zero-volume more often, but the false-negative rate is still around 15% to 20%.

17.2 When Zero-Volume Queries Matter

Early-stage trends

A new technology or methodology will have zero reported volume until enough searches accumulate. Catching the trend early means ranking before competitors notice. A recent example: "AEO" had zero reported volume in most tools through Q2 2024 despite being actively searched. Sites that built content in late 2023 captured the wave.

Technical specialty terms

Specific product names, library names, configuration patterns, error messages. These often have zero reported volume but the queries that do happen are extremely high-intent. A user searching "CGO_ENABLED=0 SQLite driver pure Go" is a developer with a specific problem and high conversion potential if the site sells relevant infrastructure or services.

Brand defense

Misspellings of the brand, alternate brand renderings, product code names. Often zero-volume but defended for completeness.

Long-tail conversational queries

Conversational queries are growing as AI Mode and voice search increase. Many conversational queries are zero-volume by traditional measurement but accumulate to meaningful traffic.

17.3 Zero-Volume Workflow

  1. After running standard keyword research, do not delete zero-volume queries from the working list.
  2. Tag zero-volume queries by hypothesis: early-stage trend, technical specialty, brand defense, conversational variant.
  3. For each tag, decide a coverage threshold. Early-stage trends might warrant a dedicated page if the underlying signal is strong. Technical specialty queries might warrant an FAQ entry. Brand defense queries might warrant a section on an existing page.
  4. Monitor GSC quarterly. Zero-volume queries that start generating impressions are validation. Promote them to active strategy.
  5. Zero-volume queries that never generate impressions after 12 months can be deprioritized, but not before.
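The quarterly decision in steps 4 and 5 reduces to a small rule, sketched here. The function name, the return labels, and the `COVERAGE_BY_TAG` mapping are illustrative assumptions; the mapping simply restates the coverage suggestions from step 3, which gives none for conversational variants.

```python
# Coverage suggestions restated from step 3 (the source words them as "might warrant").
COVERAGE_BY_TAG = {
    "early-stage trend": "dedicated page if the underlying signal is strong",
    "technical specialty": "FAQ entry",
    "brand defense": "section on an existing page",
}


def quarterly_review(impressions: int, months_tracked: int) -> str:
    """Decide what to do with one zero-volume query at a quarterly GSC check."""
    if impressions > 0:
        # Step 4: any GSC impressions validate the hypothesis.
        return "promote"
    if months_tracked >= 12:
        # Step 5: deprioritize only after a full 12 months without impressions.
        return "deprioritize"
    return "keep monitoring"


print(quarterly_review(impressions=14, months_tracked=3))
print(quarterly_review(impressions=0, months_tracked=6))
print(quarterly_review(impressions=0, months_tracked=12))
```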

17.4 Tool Limitations to Remember

Different tools have different zero-volume thresholds. A query reported as zero in Ahrefs may show a monthly volume of 30 in Semrush. Cross-checking across at least two tools and GSC is essential before classifying a query as genuinely zero.


18. Programmatic Keyword Research

Programmatic SEO generates pages at scale by varying inputs across a template. The keyword research input becomes a matrix rather than a list. See framework-saas-seo.md for the programmatic SaaS context.

18.1 The Matrix Approach

Each programmatic strategy is defined by two or more axes. Each cell in the matrix is a unique page.

Two-axis examples:

Three-axis examples:

18.2 Sizing the Matrix

The total cell count is the product of the axis sizes. A two-axis matrix of 100 cities x 12 services produces 1,200 pages. A three-axis 100 x 50 x 12 matrix produces 60,000 pages.
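As a quick check of the arithmetic, the cell count can be computed before any page is generated. This is a minimal sketch; the function name and the sample axis values are hypothetical.

```python
from itertools import product
from math import prod


def matrix_size(*axis_sizes: int) -> int:
    """Total cells (pages) in a matrix: the product of its axis sizes."""
    return prod(axis_sizes)


print(matrix_size(100, 12))      # 1200
print(matrix_size(100, 50, 12))  # 60000

# Enumerating the cells of a small, hypothetical two-axis matrix.
cities = ["Springfield", "Joplin"]
services = ["roof repair", "gutter cleaning", "siding"]
cells = list(product(cities, services))
print(len(cells))  # 6
```

Because the count grows multiplicatively, adding a third axis is the decision that deserves the most scrutiny.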

Page count by itself is not a quality signal. The question is whether each cell has:

A matrix that produces 60,000 pages where 50,000 are thin and undemanded is worse than a 1,200-page matrix where every page has clear demand.

18.3 When Programmatic Justifies Pages

Programmatic is appropriate when:

18.4 When Programmatic Produces Thin Content

Programmatic fails when:

See framework-hcs.md for the helpful content thresholds programmatic pages need to clear.

18.5 The Demand Validation Workflow

Before generating a programmatic matrix, validate demand:

  1. Pick 5 to 10 sample cells across the matrix range. Include high-population, mid-population, and low-population cells if location is an axis. Include common and uncommon combinations.
  2. Check search volume for each sample cell across Ahrefs, Semrush, and GSC if any data exists.
  3. If 70% or more sample cells show meaningful volume or strong intent signals, the matrix is viable.
  4. If fewer than 50% show demand, restructure: reduce one axis, combine cells, or kill the matrix.
  5. If results are mixed, generate the demand-validated subset only. Don't auto-generate the full matrix.
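The thresholds in steps 3 to 5 can be expressed as a small decision function. This sketch assumes "mixed" means a demand share from 50% up to, but not including, 70%, which the workflow implies but does not state; the function name and return labels are illustrative.

```python
def matrix_verdict(sample_has_demand):
    """sample_has_demand: one boolean per sampled cell (True = meaningful volume or strong intent)."""
    n = len(sample_has_demand)
    if n == 0:
        raise ValueError("sample at least one cell")
    with_demand = sum(1 for has_demand in sample_has_demand if has_demand)
    # Integer comparisons avoid floating-point edge cases at exactly 70% or 50%.
    if with_demand * 100 >= 70 * n:
        return "viable"
    if with_demand * 100 < 50 * n:
        return "restructure"
    return "generate validated subset only"


print(matrix_verdict([True] * 7 + [False] * 3))
print(matrix_verdict([True] * 6 + [False] * 4))
print(matrix_verdict([True] * 4 + [False] * 6))
```

With only 5 to 10 sample cells, a single cell moves the share by 10 to 20 points, so borderline results are better treated as mixed than as a pass.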

18.6 Per-Cell Differentiation Requirements

Each programmatic page needs three differentiation layers minimum:

Pages that meet only the first layer are thin. Pages with all three layers can pass HCS thresholds and rank.


19. Competitor Keyword Gap Analysis 2026

Gap analysis identifies queries competitors rank for that you don't. In the AI Overview era, the gap matrix needs additional dimensions.

19.1 Standard Gap Tools

Ahrefs Content Gap

Input: 1 to 3 competitor domains plus your domain. Output: queries where competitors rank in top 10 and you don't rank in top 100. Sortable by volume, difficulty, intent.

Best use: standard ranking-gap analysis. The output is a candidate query list.

Semrush Keyword Gap

Same function as Ahrefs Content Gap with a different data set. Often surfaces different queries than Ahrefs because the underlying SERP databases differ. Running both and merging the outputs catches more candidates than either alone.

Moz Keyword Explorer

Less comprehensive but offers the Priority score, which blends volume, difficulty, organic CTR, and your own My Score rating into a single metric. Useful when you need a one-number prioritization for a presentation.

19.2 The AI Overview Gap Matrix

Beyond ranking gaps, the 2026 matrix tracks:

AIO Citation Gap

Queries where competitors are cited in AIO panels and you are not. Different from ranking gap because a competitor might rank #4 but be cited in AIO, while a different competitor ranks #1 but is not cited.

Method: pull 50 priority queries. Manually inspect AIO citations across each. Tag each query with: competitor cited, you cited, neither, both. The "competitor cited, you not cited" cell is the AIO citation gap.
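The tagging step can be recorded in a simple structure so the gap cell falls out automatically. This is a sketch only: the observations are entered by hand after manual inspection, and the function names and sample queries are hypothetical.

```python
from collections import Counter


def citation_tag(competitor_cited: bool, you_cited: bool) -> str:
    """Map one manual AIO inspection to the four tags used in the method."""
    if competitor_cited and you_cited:
        return "both"
    if competitor_cited:
        return "competitor cited"
    if you_cited:
        return "you cited"
    return "neither"


def citation_gap(observations):
    """observations: iterable of (query, competitor_cited, you_cited) tuples."""
    return [q for q, comp, you in observations if citation_tag(comp, you) == "competitor cited"]


# Hypothetical inspection results, for illustration only.
observations = [
    ("example query a", True, False),
    ("example query b", True, True),
    ("example query c", False, False),
]
print(Counter(citation_tag(comp, you) for _, comp, you in observations))
print(citation_gap(observations))
```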

Cluster Coverage Gap

Queries where competitors have multiple cluster pages and you have one or none. This reveals topical authority gaps even when individual rankings look comparable.

Method: for each cluster in your strategy, count the pages each competitor publishes that target queries in the cluster. If a competitor has 12 pages on a cluster and you have 4, the cluster coverage gap is 8 pages, and the cluster is the strategic priority regardless of which individual queries you target first.
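The page-count comparison is simple subtraction, sketched here so clusters can be ranked by gap. The cluster names and counts are hypothetical apart from the 12-versus-4 example above, and flooring the gap at zero when you already lead is an assumption.

```python
def cluster_coverage_gap(competitor_pages: int, your_pages: int) -> int:
    """Pages the competitor has beyond yours in one cluster (0 if you lead)."""
    return max(competitor_pages - your_pages, 0)


# (competitor page count, your page count) per cluster; hypothetical names.
page_counts = {
    "cluster a": (12, 4),
    "cluster b": (5, 5),
    "cluster c": (7, 1),
}
gaps = {name: cluster_coverage_gap(comp, yours) for name, (comp, yours) in page_counts.items()}
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(name, gap)
```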

Schema and Entity Gap

Queries where competitors have richer schema markup or entity coverage. Less easy to surface from gap tools directly. Method: inspect competitor pages for top 20 priority queries, catalog the schemas used, identify schemas you don't use.

Velocity Gap

Queries where competitors are publishing new pages or updating existing ones at high cadence. A competitor publishing weekly on a topic is signaling intent to dominate. Velocity is measured via competitor sitemap or blog inspection.

19.3 The Combined Gap Score

For each candidate query, build a combined gap score:

query: "AEO best practices"
ranking_gap: 6   # competitor positions 3, 5, 8; you position 47
aio_citation_gap: 1   # competitor cited, you not
cluster_coverage_gap: 1   # competitor has 9 cluster pages, you have 2
schema_gap: 1   # competitor uses Article+FAQPage+HowTo, you use Article only
velocity_gap: 1   # competitor updated last month, your page is 18 months old
combined: 10

Sort candidate queries by combined gap score. Highest scores are the priority workload.
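The combined score in the example is the plain sum of the five component gaps (6 + 1 + 1 + 1 + 1 = 10). A minimal sketch of the scoring and sorting follows. The component values are taken as given, since how the ranking gap of 6 is derived is not specified here, and the second entry is a hypothetical query added only to show the sort.

```python
COMPONENTS = (
    "ranking_gap",
    "aio_citation_gap",
    "cluster_coverage_gap",
    "schema_gap",
    "velocity_gap",
)


def combined_gap(entry: dict) -> int:
    """Sum the five component gaps for one candidate query."""
    return sum(entry[name] for name in COMPONENTS)


def rank_by_gap(entries):
    """Highest combined gap first: the priority workload."""
    return sorted(entries, key=combined_gap, reverse=True)


entries = [
    {"query": "hypothetical second query", "ranking_gap": 2, "aio_citation_gap": 0,
     "cluster_coverage_gap": 1, "schema_gap": 0, "velocity_gap": 0},
    {"query": "AEO best practices", "ranking_gap": 6, "aio_citation_gap": 1,
     "cluster_coverage_gap": 1, "schema_gap": 1, "velocity_gap": 1},
]
for entry in rank_by_gap(entries):
    print(combined_gap(entry), entry["query"])
```

An unweighted sum lets the ranking gap dominate because it has the widest range; whether that is the intended weighting is a decision to make explicitly per project.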

19.4 The Decision Matrix

For each high-gap query, decide:

The acceptance option is real. Some gaps require capabilities you don't have or investment that won't return. Skip those explicitly rather than pretending to address them.


20. Audit Mode

Three-tier audit rubric: per-keyword-research-project (15 items), site-wide keyword strategy (10 items), first 90 days subset (5 items).

20.1 Per Keyword Research Project

# | Criterion | Pass/Fail
KP1 | Seed keywords documented with rationale |
KP2 | Keyword expansion from at least 3 sources (tools, SERP, competitors) |
KP3 | Intent classified for each candidate query |
KP4 | SERP composition verified for top 25 priority queries |
KP5 | AIO presence checked for top 25 priority queries |
KP6 | Long-tail subset extracted and AIO-tagged |
KP7 | Branded keyword set defended with owned pages |
KP8 | Zero-volume queries reviewed with hypothesis tagging |
KP9 | Sub-query fan-out documented for top 10 priority queries |
KP10 | Question keywords mined from AlsoAsked or equivalent |
KP11 | PAA tree mapped to depth 3 for pillar topics |
KP12 | Semantic clusters generated via SERP-overlap method |
KP13 | Cluster cannibalization checked across the strategy |
KP14 | Competitor gap matrix run with AIO citation column |
KP15 | Final keyword map approved and version-stamped |

Maximum score: 15. World-class: 14+/15.

20.2 Site-Wide Keyword Strategy

# | Criterion | Pass/Fail
KS1 | Documented keyword strategy exists and is current |
KS2 | Primary keywords mapped per important page |
KS3 | No detected keyword cannibalization or active resolution plan |
KS4 | Topic clusters defined with pillar and supporting pages |
KS5 | Branded vs non-branded ratio measured and appropriate for stage |
KS6 | AIO query classification applied (dominant, occasional, absent) |
KS7 | Sub-query coverage tracked at cluster level |
KS8 | Programmatic matrix demand-validated before generation |
KS9 | GSC monitored monthly for query performance and drift |
KS10 | Annual intent reassessment scheduled and last completed within 12 months |

Maximum score: 10. World-class: 9+/10.

20.3 First 90 Days

# | Criterion | Pass/Fail
KF1 | Seed keywords and competitor list documented |
KF2 | Top 50 priority queries identified with intent classified |
KF3 | AIO presence checked manually on top 50 priority queries |
KF4 | Keyword map for top 25 pages in place |
KF5 | Cannibalization audit run with at least one resolution applied |

Maximum score: 5. World-class: 5/5.
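Scoring the three rubrics can be automated once each item is marked pass or fail. This is a minimal sketch; the `TIERS` table restates the item counts and world-class bars from the three rubrics, while the function name and the dictionary input format are assumptions.

```python
# tier prefix -> (number of items, minimum passes for "world-class")
TIERS = {"KP": (15, 14), "KS": (10, 9), "KF": (5, 5)}


def audit_result(tier: str, results: dict):
    """results maps item id (e.g. "KF1") to True for pass, False for fail.

    Returns (score, maximum, world_class).
    """
    maximum, bar = TIERS[tier]
    if len(results) != maximum:
        raise ValueError(f"{tier} rubric expects {maximum} items, got {len(results)}")
    score = sum(1 for passed in results.values() if passed)
    return score, maximum, score >= bar


first_90_days = {"KF1": True, "KF2": True, "KF3": True, "KF4": True, "KF5": False}
print(audit_result("KF", first_90_days))
```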


21. Common Mistakes

  1. Optimizing for high volume regardless of intent — ranking for queries you can't convert wastes effort
  2. Ignoring search intent — content type doesn't match what users want
  3. Single-keyword focus instead of topic — modern SEO favors topical depth
  4. Keyword cannibalization unaddressed — multiple pages competing for same query
  5. No keyword mapping — content created without strategic alignment
  6. Volume tool numbers treated as exact — they're approximations
  7. Ignoring branded keywords — easy wins missed
  8. No long-tail strategy — depending only on head terms
  9. No SERP feature targeting — missing snippets, PAAs, AI Overviews
  10. Static keyword strategy — not updating as queries evolve
  11. Treating AIO-dominant queries like conventional rankings, when citation is the metric that matters for those queries
  12. Generating programmatic pages without demand validation, which produces thin pages at scale
  13. Discarding zero-volume queries reflexively, since many are early-stage trends or specialty terms with real demand
  14. Ignoring sub-query fan-out, because a parent query gets cited when the page answers sub-queries, not the parent string
  15. Skipping annual intent reassessment, because intent drift erodes rankings silently

22. Maintenance

Monthly: GSC query review. New queries to target. Ranking changes. AIO citation share tracking on priority queries.

Quarterly: Comprehensive keyword strategy review. Cannibalization audit. New cluster opportunities. Cluster coverage gap analysis against top 2 competitors.

Annually: Strategic keyword research refresh. Competitive landscape analysis. Full intent reassessment on top 50 priority queries per Section 12.3. Sub-query fan-out re-mapped for pillar clusters.


23. Quick Validation Script

For a quick site-wide audit of keyword coverage on a static site, the following bash script walks /var/www/sites/[domain]/ and flags pages missing primary keyword markers:

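#!/bin/bash
# Audits static HTML under /var/www/sites/<domain>/ for missing or duplicate
# H1 tags and missing title tags. Pass the domain as the first argument.
SITE="$1"
if [ -z "$SITE" ]; then
  echo "Usage: $0 <domain>" >&2
  exit 1
fi
ROOT="/var/www/sites/${SITE}"
if [ ! -d "$ROOT" ]; then
  echo "Site root not found: $ROOT" >&2
  exit 1
fi

# Print every .html file under ROOT, NUL-delimited so unusual filenames survive.
list_pages() {
  find "$ROOT" -name "*.html" -type f -print0
}

echo "Pages missing H1:"
list_pages | while IFS= read -r -d '' f; do
  # -i also catches <H1>; the trailing group avoids matching longer tag names.
  if ! grep -qiE '<h1([[:space:]>]|$)' "$f"; then
    echo "  $f"
  fi
done
echo ""
echo "Pages missing title tag:"
list_pages | while IFS= read -r -d '' f; do
  if ! grep -qiE '<title([[:space:]>]|$)' "$f"; then
    echo "  $f"
  fi
done
echo ""
echo "Pages with more than one H1:"
list_pages | while IFS= read -r -d '' f; do
  # grep -o prints one line per match, so this counts tags rather than matching lines.
  count=$(grep -oiE '<h1([[:space:]>]|$)' "$f" | wc -l | tr -d ' ')
  if [ "$count" -gt 1 ]; then
    echo "  $f ($count H1 tags)"
  fi
done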

This script is a starting point for the keyword mapping audit. Multiple H1 tags often indicate template issues that interfere with primary keyword signaling. A missing title tag removes the page's strongest on-page relevance signal, so the page is unlikely to compete for any query.

