SEO & AI Engine Optimization Framework · May 2026

Content-First Architecture: semantic HTML as substrate, visual as projection

By Joseph W. Anady — Founder & Lead Engineer, ThatDevPro (BA Computer Engineering, MA Cybersecurity) · Updated May 2026

The Architectural Doctrine — Semantic HTML as Substrate, Visual Layer as Projection, and the Build Philosophy That All Other Frameworks Serve

A comprehensive doctrine reference establishing the architectural axiom that governs every other framework in this library. Content First Architecture is the position that the structural, semantic, machine readable HTML document is the substrate of every site, and the visual frontend (CSS, JavaScript, animations, graphics, interactivity) is a projection that sits on top of that substrate without obstructing it. Crawlers and AI engines read the substrate. Humans see the projection. Both are served by the same HTML file. This document codifies the principle, the empirical research that justifies it, the five hard rules that enforce it, the substrate and projection layer specifications, and the audit methodology that verifies a build adheres to it. Dual purpose: installation doctrine and architectural audit.

Cross stack implementation note: the code samples in this framework are written in plain HTML for clarity. For React, Vue, Svelte, Next.js, Nuxt, SvelteKit, Astro, Hugo, 11ty, Remix, WordPress, Shopify, and Webflow equivalents of every pattern below, see framework-cross-stack-implementation.md. For pure client rendered SPAs (no SSR/SSG), see framework-react.md. The doctrine applies regardless of stack. The implementation differs.

Quick answer

Content-First Architecture is the doctrine that a semantic, machine-readable HTML document is the substrate of every site, and the visual frontend (CSS, JS, animations) is a projection layered on top without obstructing it. Crawlers and AI engines read the substrate; humans see the projection; both ship in one HTML file. You author the substrate first, then composite the visual layer, keeping all content in the first-byte server response (the initial HTML the server returns, before any JavaScript runs).

1. Document Purpose

This is Level 0. It precedes every other framework in the library.

Content First Architecture is not a technique. It is the architectural axiom that determines whether the rest of the library has any effect. Schema markup, internal linking, E-E-A-T signaling, entity salience, AI citation optimization, technical SEO, page experience: every framework in this library makes a foundational assumption that crawlers and AI engines can actually read what the site delivers. That assumption is only true when the site is built content first. When a site is built visual first (the default in modern web tooling), the frameworks still install correctly, the audit rubrics still pass on paper, and the result is invisible to AI extraction anyway because the content never reaches the bot.

The frameworks library cannot defend itself against this failure mode without an explicit doctrine. Individual frameworks contain technical guidance against client side rendering. None of them name the inversion as the build philosophy. Without a doctrine document, a builder (developer or AI assistant) can follow the dependency graph, install every framework in the prescribed order, and ship a beautifully styled React SPA that delivers an empty <div id="root"> to GPTBot. Every framework was implemented. Zero citations result.

This document closes that gap. Read it before any client engagement. Reference it during every build decision. Apply it as the gating criterion before any other framework is invoked.

1.1 Required Understanding

Before reading this framework, the builder should be familiar with:

HTML semantics — <article>, <section>, <main>, <nav>, <aside>, <header>, <footer>, heading hierarchy H1 through H6, and the distinction between semantic and presentational markup
CSS layout fundamentals — Grid, Flexbox, positioning, pseudo elements (::before, ::after), background images, transforms
JavaScript rendering models — SSG, SSR, ISR, CSR, hydration, and the difference between content delivery and interaction enhancement
Crawler behavior — how Googlebot, GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Applebot, and Google-Extended handle JavaScript (or do not)
Schema.org JSON-LD — the @id graph pattern and where structured data lives in the document

1.2 Document Scope

Covers: the inversion principle, the empirical research justifying it, the five hard rules, substrate layer specification (everything in the content document), projection layer specification (everything in the visual layer), stack specific implementation patterns, validation methodology, common violations, audit rubric. Touches but does not exhaust: schema implementation (framework-schema.md), technical SEO foundations (framework-technicalseo.md), JavaScript rendering decisions (framework-react.md), accessibility (framework-accessibility.md), AI citation mechanics (framework-aicitations.md).

2. Client Variables Intake

content_first_intake:

  current_architecture: ""                # "content_first" | "visual_first" | "hybrid" | "unknown"
  rendering_strategy: ""                  # "ssg" | "ssr" | "isr" | "csr" | "hybrid" | "static_html"
  primary_stack: ""                       # plain_html | nextjs | astro | hugo | nuxt | sveltekit | remix | gatsby | wordpress | shopify | webflow | react_spa | vue_spa
  hydration_model: ""                     # "none" | "selective" | "full" | "islands"
  content_in_first_byte: false            # does the server return content in initial HTML
  schema_in_first_byte: false             # is JSON-LD in initial HTML or injected
  js_required_for_content: false          # is JS execution required to see primary content
  retrofit_or_greenfield: ""              # "retrofit_existing" | "greenfield_build"
  existing_traffic_at_risk: false         # are there rankings to protect during transition
  build_team: ""                          # "joseph_solo" | "joseph_plus_ai" | "client_dev_team" | "mixed"
  estimated_pages: 0                      # total page count
  audit_baseline_score: 0                 # current G score per GEO16 framework (if known)

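Of these values, retrofit_or_greenfield determines whether the engagement is a doctrine installation (greenfield) or a doctrine retrofit (existing site), with current_architecture, rendering_strategy, and primary_stack describing the starting point. hydration_model, content_in_first_byte, schema_in_first_byte, and js_required_for_content determine the technical severity. Retrofits with significant traffic at risk (existing_traffic_at_risk: true) require staged migration. Greenfield builds get the doctrine applied from the first commit.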
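As a rough illustration, the intake values can be reduced to an engagement plan in a few lines. This is a minimal sketch, not part of the framework: the function name plan_engagement and the output keys are hypothetical, and the choice of which fields count as severity signals is an assumption.

```python
def plan_engagement(intake: dict) -> dict:
    """Summarise an intake record; field names follow content_first_intake."""
    # Map the intake value onto the two engagement types named in the text.
    kind = {
        "greenfield_build": "doctrine_installation",
        "retrofit_existing": "doctrine_retrofit",
    }.get(intake.get("retrofit_or_greenfield"), "unknown")

    # Assumption: each of these values means content is missing from the
    # initial HTML response and therefore raises technical severity.
    severity_flags = []
    if intake.get("content_in_first_byte") is False:
        severity_flags.append("content_in_first_byte")
    if intake.get("schema_in_first_byte") is False:
        severity_flags.append("schema_in_first_byte")
    if intake.get("js_required_for_content") is True:
        severity_flags.append("js_required_for_content")

    # Retrofits with traffic at risk require staged migration.
    staged = kind == "doctrine_retrofit" and intake.get("existing_traffic_at_risk") is True

    return {
        "engagement": kind,
        "staged_migration": staged,
        "severity_flags": severity_flags,
    }
```

A record with an empty retrofit_or_greenfield value comes back as "unknown" rather than defaulting to either engagement type, so an incomplete intake is visible instead of silently treated as greenfield.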

3. The Inversion Principle

3.1 The Default and Why It Fails

The default architecture in modern web development in 2026 is visual first. A designer produces a visual concept. A developer translates that concept into component scaffolding (React, Vue, Svelte). Content is then injected into the components at runtime through props, fetch calls, or CMS integration. The HTML the server initially delivers to the browser is an empty shell. The visible page is constructed by JavaScript execution in the client.

This default works for human users because their browsers execute JavaScript. The user experiences the constructed page. To the user, the architecture is invisible.

The default does not work for the actors that matter most for citation and ranking in the AI era:

AI crawlers do not execute JavaScript. GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Applebot-Extended, and nearly every other AI specific bot consume the first byte server response and nothing else (section 4.1 lists the one partial exception, Google-Extended). If the first byte is an empty shell, the bot sees an empty page.

Search crawlers execute JavaScript with delay and partial coverage. Googlebot now renders JavaScript reliably for most sites within hours, but indexing decisions are still influenced by the unrendered HTML. Bing renders less reliably. AI Overviews pull from Google's index but may apply their own extraction passes that do not wait for client side rendering.

Screen readers operate on the DOM, not on rendered pixels. Content delivered via JavaScript can be read by screen readers, but only if the JS has executed and the accessibility tree is current. Sites that depend on JS for content are more likely to fail accessibility audits and to degrade the experience for people using assistive technology.

The visual first default produces sites that are invisible to the surface where AI citation, ranking decoupling, and zero click search are happening. Every framework in this library is wasted on a visual first architecture because the bot never sees the content the frameworks optimize.

3.2 The Inversion

Content First Architecture inverts the build order:

Step one. A semantic, machine readable HTML document is authored. This document contains the complete content, heading hierarchy, schema markup, internal linking, entity declarations, FAQ blocks, comparison tables, and metadata. It is the substrate. It is what the bot sees. It is what would be indexed and cited even if no styles were applied at all.

Step two. A visual layer is composed on top of the substrate. CSS positions elements, applies typography, paints backgrounds, layers decorative graphics. JavaScript adds animation, micro interaction, form handling, lazy media loading. The visual layer does not deliver content. The visual layer projects the existing content into a human pleasing experience.

Step three. Both layers ship together in a single HTML response from the server. The bot reads the substrate. The human browser composites the substrate with the visual layer and presents the rendered result.

The inversion is not a tradeoff. It does not sacrifice visual quality for crawlability. Modern CSS (Grid, Flexbox, custom properties, container queries, anchor positioning, scroll driven animations) and modern JavaScript (GSAP, Three.js, Motion One, Lottie) are fully capable of building epic visual experiences on top of a semantic substrate. The constraint is architectural discipline, not visual capability.

The thataiguy.org site in this agency's portfolio is the proof of concept. Static HTML substrate with Three.js WebGL, GSAP cinematic animations, scroll triggered transitions, and a full visual identity, all projected on top of content that GPTBot can read in the first byte.

3.3 The Substrate / Projection Model

The mental model:

Substrate = the structural, semantic HTML document. The content. The meaning. The schema. What the bot extracts. What the screen reader announces. What the AI engine cites.

Projection = the visual experience composited on top of the substrate. The typography. The animations. The colors. The layout. What the human sees.

The substrate is the truth. The projection is the presentation of that truth. The substrate exists with or without the projection. The projection cannot exist without the substrate. This asymmetry is the architecture.

When a developer asks "where does this content go," the answer is always: in the substrate. When a developer asks "how should this look," the answer is always: in the projection. When the question is "where does this schema markup go," the answer is the substrate, in the document head, server rendered, present in the first byte.

3.4 Why This Matters Now

The architecture would be defensible on accessibility and resilience grounds even in 2010. What makes it the gating axiom in 2026 is the empirical surface AI search now represents.

AI search citations diverge from traditional rankings for many query categories. Independent research found that only around 12% of ChatGPT citations match URLs on Google's first page. The AI surface is a separate competitive channel.

Google AI Overview citation patterns have decoupled from Google rankings. In July 2025, 76% of AIO cited URLs ranked in the organic top 10. By February 2026, only 38% came from the top 10. The remaining 62% came from position 11 onward, including URLs ranked beyond 100. AIO is no longer a top of SERP feature. It is its own ranking surface, and the inputs that drive it favor structural extractability over ranking position.

The cost of being invisible compounds. Every month a site is invisible to AI extraction is a month of lost citation accrual. Citations build entity authority. Entity authority improves future citation probability. Sites that started content first in 2024 now have two years of compounding advantage over visual first competitors. The window to catch up is not infinite.

This is why Content First Architecture is Level 0 in this library. Every other framework presupposes the bot can read the page. This framework is what makes that presupposition true.

4. The Empirical Foundation

4.1 AI Crawler Behavior

The dominant AI crawlers and their JavaScript handling, as observed at the 2026 baseline:

ai_crawler_js_capability:

  gptbot:
    operator: "OpenAI"
    javascript_execution: "none"
    consumes: "first byte server response only"
    purpose: "training data and ChatGPT Search index refresh"

  oai_searchbot:
    operator: "OpenAI"
    javascript_execution: "none"
    consumes: "first byte server response only"
    purpose: "ChatGPT Search live retrieval"

  chatgpt_user:
    operator: "OpenAI"
    javascript_execution: "none"
    consumes: "first byte server response only"
    purpose: "real time fetch when user invokes browse capability"
    note: "highest priority bot for live citation"

  claudebot:
    operator: "Anthropic"
    javascript_execution: "none"
    consumes: "first byte server response only"
    purpose: "training data and Claude search retrieval"

  perplexitybot:
    operator: "Perplexity"
    javascript_execution: "none"
    consumes: "first byte server response only"
    purpose: "Perplexity search index"
    note: "Cloudflare evidence August 2025 of undeclared crawlers using generic Chrome user agents to bypass blocks"

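  google_extended:
    operator: "Google"
    javascript_execution: "limited"
    consumes: "primarily first byte; partial JS execution"
    purpose: "Gemini (formerly Bard) training and grounding"
    note: "a robots.txt control token rather than a separate user agent; fetching is done by Google's existing crawlers, and AI Overviews are served from Google Search"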

  applebot_extended:
    operator: "Apple"
    javascript_execution: "none"
    consumes: "first byte server response only"
    purpose: "Apple Intelligence training data"

  googlebot:
    operator: "Google"
    javascript_execution: "full (with delay)"
    consumes: "first byte for index decisions, rendered HTML for content"
    note: "two wave model mostly obsolete since 2025 but indexing decisions still weighted toward unrendered HTML"

  bingbot:
    operator: "Microsoft"
    javascript_execution: "partial"
    consumes: "first byte + some rendered content"
    note: "feeds Microsoft Copilot in addition to Bing search"

The pattern: every named AI crawler operates on the first byte server response. Search crawlers that render JavaScript still weight the unrendered HTML for index decisions. The first byte is the universal extraction surface.
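The table above can be turned into a simple lookup that answers one question: if primary content is not in the initial HTML, which crawlers cannot be relied on to see it? The sketch below copies the javascript_execution values from the table; the function name crawlers_missing_content is hypothetical, and treating "limited" and "partial" as unreliable is an assumption.

```python
# JavaScript handling per crawler, copied from ai_crawler_js_capability.
JS_EXECUTION = {
    "gptbot": "none",
    "oai_searchbot": "none",
    "chatgpt_user": "none",
    "claudebot": "none",
    "perplexitybot": "none",
    "google_extended": "limited",
    "applebot_extended": "none",
    "googlebot": "full (with delay)",
    "bingbot": "partial",
}


def crawlers_missing_content(content_in_first_byte: bool) -> list[str]:
    """Return crawlers that cannot be relied on to see the primary content."""
    if content_in_first_byte:
        # Content in the initial HTML is readable by every crawler listed.
        return []
    # Assumption: anything short of full rendering is treated as unreliable.
    return sorted(
        name for name, js in JS_EXECUTION.items() if not js.startswith("full")
    )
```

For a client rendered page, crawlers_missing_content(False) returns every entry in the table except googlebot.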

4.2 Citation Multipliers

Independent research across 2024 through early 2026 has produced reproducible multipliers for specific structural patterns in extraction friendly HTML. The most cited findings:

citation_multipliers:

  tables_vs_prose:
    multiplier: 4.2
    source: "kime.ai 2026 analysis of 10,000 AI citations"
    pattern: "HTML <table> with <thead>, <tbody>, <th>, <td> for comparison data"
    why: "tables map directly to structured data the model can paraphrase or reformat"

  faqpage_schema:
    multiplier: 3.2
    source: "Frase.io 2025 study"
    pattern: "FAQPage JSON-LD with Question/acceptedAnswer pairs matching visible H2 questions and answer paragraphs"
    why: "explicit machine readable Q&A relationship; AI engines prefer this for direct answer extraction"

  numbered_lists_for_processes:
    multiplier: 2.7
    source: "kime.ai 2026 analysis"
    pattern: "<ol><li>...</li></ol> for sequential steps"
    why: "models cite ordered lists when query implies sequence"

  bullet_lists:
    multiplier: 1.8
    source: "kime.ai 2026 analysis"
    pattern: "<ul><li>...</li></ul> for unordered enumerations"
    why: "scannable structure beats prose paragraphs for feature/option queries"

  expert_quotes:
    multiplier: 1.41
    source: "Princeton GEO study (SIGKDD 2024)"
    pattern: "<blockquote> with cite attribute, attributed to named expert"
    why: "model treats quotation as a credibility signal"

  statistics_citations:
    multiplier: 1.30
    source: "Princeton GEO study"
    pattern: "specific numerical claims with source attribution"
    why: "specific numbers are extractable and citeable as facts"

  citation_count_increase:
    multiplier: 1.30
    source: "Princeton GEO study"
    pattern: "inline outbound citations to authoritative sources"
    why: "outbound citations to .gov/.edu/standards bodies increase the model's confidence in the page"

  fully_populated_product_review_schema:
    citation_rate: 0.617
    source: "industry study cited across multiple sources"
    pattern: "Product schema with AggregateRating, Review, Offer, all properties populated"
    why: "ecommerce extraction relies on schema completeness for shopping queries"

  structured_data_in_ai_overviews:
    visibility_uplift: 0.27
    source: "Google Search Central 2024 internal study"
    pattern: "any valid JSON-LD on the page"
    why: "structured data signals trustworthiness and parseability to AIO synthesis"

These are not soft signals. They are reproducible multipliers documented across multiple independent studies. A page that includes all of these patterns extractably (substrate side, server rendered, in first byte) has a fundamentally higher citation surface than a page that delivers the same content as prose only inside a client hydrated SPA.
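One way to check which of these structural patterns a page actually ships is to scan the raw server response for the corresponding elements. The following is a minimal sketch using Python's standard html.parser; the names PatternScanner and structural_patterns are hypothetical. It reports presence only and does not attempt to estimate any multiplier.

```python
from html.parser import HTMLParser


class PatternScanner(HTMLParser):
    """Record which extraction friendly elements appear in the markup."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("table", "ol", "ul"):
            self.found.add(tag)
        elif tag == "blockquote" and attrs.get("cite"):
            # Only quotes that carry a cite attribute match the pattern.
            self.found.add("blockquote[cite]")
        elif tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self.found.add("json-ld")


def structural_patterns(html: str) -> set[str]:
    """Return the set of pattern labels present in the given HTML string."""
    scanner = PatternScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.found
```

Run against the unrendered HTML (what a crawler receives), an empty application shell returns an empty set regardless of what the page looks like in a browser.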

4.3 The GEO16 Citation Study

The most rigorous citation study to date is the GEO16 framework analysis published as an arXiv preprint in September 2025. The study measured citation behavior across Brave Summary, Google AI Overviews, and Perplexity, scoring 16 content pillars against actual citation outcomes.

Key findings:

The three pillars most strongly correlated with citation: Metadata and Freshness, Semantic HTML Structure, Valid Structured Data.

Pages scoring G ≥ 0.70 with ≥ 12 of the 16 pillars active achieved a 78% cross engine citation rate (cited by two or more of the three engines studied).

Cross engine cited URLs scored 71% higher in overall quality than single engine cited URLs.

The implication for Content First Architecture is direct. The three top citation pillars all live in the substrate layer. Metadata in the document head. Semantic HTML in the body structure. Structured data as JSON-LD in the head. A site built content first hits all three pillars by default. A site built visual first hits zero of them in the first byte.
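The threshold reported by the study can be written as a one line check. This is a sketch only: the function name meets_geo16_threshold is hypothetical, and it assumes the G score and the count of active pillars have already been computed elsewhere.

```python
def meets_geo16_threshold(g_score: float, active_pillars: int) -> bool:
    """True when a page sits in the G >= 0.70, >= 12 of 16 pillars band."""
    if not 0 <= active_pillars <= 16:
        raise ValueError("active_pillars must be between 0 and 16")
    return g_score >= 0.70 and active_pillars >= 12
```

Both conditions must hold: a page with a high G score but only 11 active pillars falls outside the band associated with the 78% cross engine citation rate.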

4.4 Google AI Overview Decoupling

Citation distribution in Google AI Overviews has shifted significantly between July 2025 and February 2026:

aio_citation_source_distribution:

  july_2025:
    organic_top_10_share: 0.76
    positions_11_through_100: 0.24
    beyond_top_100: 0.0                   # approximately zero

  february_2026:
    organic_top_10_share: 0.38
    positions_11_through_100: 0.312
    beyond_top_100: 0.31
    youtube_share_of_beyond_100: 0.182

The shift means AIO citation is no longer determined primarily by ranking position. A site at position 47 in Google organic can be cited in AIO. A site at position 3 may not be cited if the extraction structure fails. The signal that determines extraction is the structural readability of the HTML, not the ranking authority.

This decoupling creates a strategic opening for sites that ship content first regardless of where they sit in traditional rankings.
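The size of the shift is easier to see as arithmetic. The snippet below works only from the shares listed in this section; the variable names are illustrative.

```python
# Shares copied from aio_citation_source_distribution.
july_2025 = {"top_10": 0.76, "pos_11_100": 0.24, "beyond_100": 0.0}
february_2026 = {"top_10": 0.38, "pos_11_100": 0.312, "beyond_100": 0.31}

# Percentage point drop in the share of citations drawn from the top 10.
top_10_drop = round(july_2025["top_10"] - february_2026["top_10"], 3)

# Share of February 2026 citations that came from outside the top 10.
outside_top_10 = round(
    february_2026["pos_11_100"] + february_2026["beyond_100"], 3
)

# Sanity check on the reported figures.
total_february = round(sum(february_2026.values()), 3)
```

The top 10 share fell by 0.38, which is half of its July 2025 value. The shares outside the top 10 add up to 0.622, and the three February figures total 1.002 rather than exactly 1, which suggests rounding in the reported numbers.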

4.5 Why Server Rendered HTML Wins

The substrate must be in the first byte server response. Not in a hydration payload. Not in an XHR fetch. Not in a useEffect hook. In the HTML that the server returns to the GET request.

The reason is mechanical. AI crawlers send a GET request, receive the response body, parse it as HTML, and extract from the parse tree. There is no JavaScript runtime. There is no hydration step. There is no second pass. Whatever is in the parsed HTML at the moment of parse is the entire surface available for extraction.

Server rendered HTML wins because it puts the substrate in the first byte. Static site generation (SSG), server side rendering (SSR), incremental static regeneration (ISR), and plain static HTML all qualify. Client side rendering (CSR) does not. Selective hydration is acceptable when the unhydrated HTML already contains all primary content. Full hydration where content is injected client side is the failure mode.

The rule that follows: in any rendering decision, if the chosen strategy delivers primary content via client side execution, the strategy is wrong for content. CSR is acceptable only for interactive widgets, logged in interfaces, and content that is intentionally not indexable (admin panels, dashboards behind auth). For anything that should be cited or ranked, the content lives in the first byte.

5. The Five Hard Rules

These are non negotiable. A violation of any of these rules invalidates the architecture and degrades the entire framework library that builds on top of it.

Rule 1: No content via JavaScript

Primary content (headings, body copy, FAQs, tables, schema, internal links) must be present in the first byte server response. JavaScript may enhance the experience but must not deliver content.

Permitted: JS animations, scroll effects, form validation, lazy media loading, micro interactions, GSAP/Three.js/Motion One overlays.

Forbidden: fetching content from an API after page load to display in the visible page. React components that render content from props passed at runtime when those props are not server rendered. useEffect hooks that populate visible content. JS frameworks shipping <div id="root"></div> and constructing the page client side.

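Validation: curl -s -A "GPTBot" https://example.com/page | grep -i "<h1" must return the page's H1 (matching on "<h1" rather than "<h1>" so that headings with attributes, such as <h1 class="hero">, are not missed). If it does not, the page violates Rule 1.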
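The same check can be scripted so it runs across many URLs. The sketch below is one possible approach using only the Python standard library: first_byte_h1s parses an HTML string and returns the text of every H1 it finds, and fetch_as_bot retrieves the raw response without executing any JavaScript. Both function names are hypothetical, and the short "GPTBot" user agent string simply mirrors the curl command; the crawler's real user agent string is longer.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class H1Finder(HTMLParser):
    """Collect the text content of every <h1> in the parsed markup."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.headings[-1] += data


def first_byte_h1s(html: str) -> list[str]:
    """Return non-empty H1 texts found in unrendered HTML."""
    finder = H1Finder()
    finder.feed(html)
    finder.close()
    # Collapse internal whitespace so multi-line headings compare cleanly.
    return [" ".join(h.split()) for h in finder.headings if h.strip()]


def fetch_as_bot(url: str, user_agent: str = "GPTBot") -> str:
    """Fetch the raw server response; no JavaScript is executed."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

A page passes when first_byte_h1s(fetch_as_bot(url)) returns a non-empty list. An empty list means the H1 is either missing or only created client side.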

Rule 2: DOM order is reading order is citation order

Source order in the HTML determines the order in which screen readers announce content, the order in which keyboard navigation tabs through interactive elements, and the order in which extraction parses the content. CSS may visually reorder elements (via Grid grid-template-areas, Flexbox order, absolute positioning, transforms), but the underlying DOM order must remain logical.

Permitted: CSS Grid positioning that visually places the H1 in the bottom right of the hero. Flexbox flex-direction: row-reverse for visual reasons when source order is the logical reading sequence. Absolute positioning of decorative graphics.

Forbidden: Using order: -1 to make an element that sits later in the source appear first, when the logical reading order should have placed it first in the source. Setting positive tabindex values to compensate for broken DOM order. Visual hierarchies that imply a different reading sequence than the source.

WCAG technique C27 (Making the DOM order match the visual order) is the formal reference, supporting Success Criterion 1.3.2 Meaningful Sequence: visual order should match source order, and where they diverge, source order is what assistive technology and bots receive.
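One symptom of broken DOM order that is easy to detect mechanically is a positive tabindex, which overrides the natural focus sequence. The sketch below flags such elements; the names TabindexScanner and positive_tabindex_elements are hypothetical, and the check covers only this one symptom, not visual reordering done in CSS.

```python
from html.parser import HTMLParser


class TabindexScanner(HTMLParser):
    """Collect elements whose tabindex overrides the source order."""

    def __init__(self):
        super().__init__()
        self.offenders = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("tabindex")
        if value is None:
            return
        try:
            index = int(value)
        except ValueError:
            return  # Non-numeric values are ignored in this sketch.
        # tabindex="0" and "-1" keep source order; positive values do not.
        if index > 0:
            self.offenders.append((tag, index))


def positive_tabindex_elements(html: str) -> list[tuple[str, int]]:
    """Return (tag, tabindex) pairs for every positive tabindex found."""
    scanner = TabindexScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.offenders
```

An empty result does not prove the reading order is correct; it only rules out one common workaround.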

Rule 3: Semantic HTML is the foundation, not styled divs

The document body uses semantic HTML5 elements (<article>, <section>, <main>, <nav>, <aside>, <header>, <footer>, <h1> through <h6>, <ul>, <ol>, <table>, <figure>, <blockquote>, <details>, <summary>) for their semantic meaning, not generic <div> and <span> with styled appearance.

Permitted: Using <div> for layout containers that have no semantic meaning. Using <span> for inline styling targets within text. Adding CSS classes for visual treatment.

Forbidden: Marking up an article with <div class="article"> instead of <article>. Replacing headings with <div class="heading"> to bypass default styling. Constructing a table from nested <div> grid cells. Using <div role="button"> instead of <button>.

Crawlers and AI engines weight semantic structure as a citation signal. The GEO16 pillar Semantic HTML measures this directly. Pages that use semantic elements receive significantly higher extraction priority than visually identical pages built from generic divs.
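The forbidden patterns in this rule can be approximated with a scan for generic elements that imitate semantic ones. This is a heuristic sketch: the SUSPECT_CLASSES mapping is a hypothetical starting list rather than a standard, and class names vary by codebase.

```python
from html.parser import HTMLParser

# Hypothetical class names that often stand in for a semantic element.
SUSPECT_CLASSES = {
    "article": "<article>",
    "heading": "<h1> to <h6>",
    "nav": "<nav>",
    "footer": "<footer>",
}


class DivSoupScanner(HTMLParser):
    """Flag <div> and <span> elements used in place of semantic elements."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("div", "span"):
            return
        attrs = dict(attrs)
        if (attrs.get("role") or "").lower() == "button":
            self.findings.append(f'<{tag} role="button"> should be <button>')
        for name in (attrs.get("class") or "").split():
            if name in SUSPECT_CLASSES:
                self.findings.append(
                    f'<{tag} class="{name}"> should be {SUSPECT_CLASSES[name]}'
                )


def semantic_violations(html: str) -> list[str]:
    """Return human readable findings in document order."""
    scanner = DivSoupScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings
```

Layout containers such as <div class="grid"> are not flagged, which matches the permitted use of <div> for elements with no semantic meaning.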

Rule 4: Schema lives in head, server rendered, in the first byte

JSON-LD structured data is placed in the document <head> (or at the start of <body> before primary content) and is server rendered. It must be present in the first byte response. Schema injected by JavaScript at runtime is invisible to AI crawlers.

Permitted: Multiple <script type="application/ld+json"> blocks. Single graph using the @id pattern with sameAs relationships. Schema generated server side from CMS data at build or request time. Validated through Google Rich Results Test before deploy.

Forbidden: Schema generated client side by JavaScript libraries. Schema injected via Google Tag Manager (this is JS execution, AI crawlers do not run GTM). Schema present only in browser DevTools after page load. Schema that uses Microdata or RDFa attributes on visible HTML (legacy formats, less reliable extraction).

Cross reference: framework-schema.md for the complete schema specification including the @id graph pattern.
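A quick way to audit this rule is to pull every JSON-LD block out of the unrendered HTML, note whether it sits in the head or the body, and confirm it parses. The sketch below uses only the standard library; JsonLdCollector and audit_json_ld are hypothetical names, and a document with no explicit <head> tag will report every block as "body".

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Capture each application/ld+json block and where it appears."""

    def __init__(self):
        super().__init__()
        self._in_head = False
        self._capturing = False
        self.blocks = []  # each entry: [location, raw_text]

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self._in_head = True
        elif tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self._capturing = True
            self.blocks.append(["head" if self._in_head else "body", ""])

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False
        elif tag == "script":
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.blocks[-1][1] += data


def audit_json_ld(html: str) -> list[dict]:
    """Report location and JSON validity for each block in source order."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    report = []
    for location, raw in collector.blocks:
        try:
            json.loads(raw)
            valid = True
        except json.JSONDecodeError:
            valid = False
        report.append({"location": location, "valid_json": valid})
    return report
```

An empty report on the raw server response, combined with schema visible in browser DevTools, indicates the schema is being injected at runtime. This check covers JSON syntax only; schema validity still needs the Rich Results Test.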

Rule 5: Visual layer never blocks extraction

The visual layer (CSS, JavaScript, animations, graphics, interactive components) must operate on top of the substrate without obstructing, hiding, or replacing content meant to be extracted.

Permitted: CSS position: absolute to layer decorative graphics over content. Background images on section containers behind text. <details> and <summary> for collapsed FAQ content (content stays in DOM, just visually collapsed by default). Lazy loaded images and embeds below the fold.

Forbidden: display: none on content meant to be indexed. Off screen positioning (position: absolute; left: -9999px) to hide content visually while keeping it in DOM (covered by Google's hidden text spam policy). Tabs and accordions that require JavaScript to reveal content. Carousels where slides 2 through N are not in the DOM until JS rotates them in. Modal overlays that contain primary content only revealed by click.

The <details> and <summary> pattern is the safe FAQ accordion: content is in the DOM, visually collapsed by default, expanded by browser native click handling without JS. Bots see the full content. Humans get the expand interaction.
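Two of the forbidden patterns can be spotted when they are applied through inline styles. The sketch below is deliberately narrow: it inspects only the style attribute, so hiding done in an external stylesheet or by JavaScript is out of its reach. The names HIDING_STYLE and inline_hidden_elements are hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches display: none and large negative left offsets such as -9999px.
HIDING_STYLE = re.compile(r"display\s*:\s*none|left\s*:\s*-\d{4,}px", re.IGNORECASE)


class HiddenContentScanner(HTMLParser):
    """Collect tags whose inline style hides them from view."""

    def __init__(self):
        super().__init__()
        self.hidden_tags = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        if HIDING_STYLE.search(style):
            self.hidden_tags.append(tag)


def inline_hidden_elements(html: str) -> list[str]:
    """Return tag names of elements hidden through inline styles."""
    scanner = HiddenContentScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.hidden_tags
```

A <details> element is never flagged, since its collapsed state comes from the browser rather than from a hiding style.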

6. The Substrate Layer

The substrate is the structural HTML document. This section specifies what lives in it and how.

6.1 Document Head Requirements

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">

  <!-- Title and description -->
  <title>Page Title 50 to 60 Characters With Primary Keyword Forward</title>
  <meta name="description" content="Page description 140 to 160 characters with active voice and call to action.">

  <!-- Canonical -->
  <link rel="canonical" href="https://example.com/exact-current-url/">

  <!-- Robots -->
  <meta name="robots" content="index, follow, max-image-preview:large, max-snippet:-1">

  <!-- Open Graph -->
  <meta property="og:type" content="website">
  <meta property="og:title" content="Page Title">
  <meta property="og:description" content="Page description">
  <meta property="og:image" content="https://example.com/og-image-1200x630.jpg">
  <meta property="og:url" content="https://example.com/exact-current-url/">

  <!-- Twitter Card -->
  <meta name="twitter:card" content="summary_large_image">
  <meta name="twitter:title" content="Page Title">
  <meta name="twitter:description" content="Page description">
  <meta name="twitter:image" content="https://example.com/twitter-image-1200x675.jpg">

  <!-- Theme -->
  <meta name="theme-color" content="#10647C">

  <!-- Icons -->
  <link rel="icon" type="image/svg+xml" href="/favicon.svg">
  <link rel="icon" type="image/png" sizes="32x32" href="/icon-32.png">
  <link rel="apple-touch-icon" href="/apple-touch-icon.png">
  <link rel="manifest" href="/site.webmanifest">

  <!-- JSON-LD Schema (the graph pattern) -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@graph": [
      {
        "@type": "Organization",
        "@id": "https://example.com/#organization",
        "name": "Example Business",
        "url": "https://example.com/",
        "sameAs": [
          "https://www.linkedin.com/company/example",
          "https://www.facebook.com/example",
          "https://www.wikidata.org/wiki/Q0000000"
        ]
      },
      {
        "@type": "WebSite",
        "@id": "https://example.com/#website",
        "url": "https://example.com/",
        "publisher": { "@id": "https://example.com/#organization" }
      },
      {
        "@type": "WebPage",
        "@id": "https://example.com/exact-current-url/#webpage",
        "url": "https://example.com/exact-current-url/",
        "isPartOf": { "@id": "https://example.com/#website" }
      }
    ]
  }
  </script>

  <!-- Stylesheet (projection layer entry point) -->
  <link rel="stylesheet" href="/styles.css">
</head>

Every page on every site has this head block, populated per page. Cross reference framework-schema.md for the complete schema graph pattern. Cross reference framework-technicalseo.md for the meta tag matrix.
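The length guidance embedded in the sample title and description can be checked automatically. This is a minimal sketch using the standard library; HeadMeta and head_report are hypothetical names, and the 50 to 60 and 140 to 160 character ranges are taken from the placeholder text in the head block above.

```python
from html.parser import HTMLParser


class HeadMeta(HTMLParser):
    """Extract the title, meta description, and canonical URL."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.description = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def head_report(html: str) -> dict:
    """Check title and description lengths and canonical presence."""
    meta = HeadMeta()
    meta.feed(html)
    meta.close()
    title = meta.title.strip()
    description = (meta.description or "").strip()
    return {
        "title_ok": 50 <= len(title) <= 60,
        "description_ok": 140 <= len(description) <= 160,
        "has_canonical": bool(meta.canonical),
    }
```

Character counts are a proxy only; search engines truncate by rendered width, so a passing result is a reasonable default rather than a guarantee.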

6.2 Semantic Body Skeleton

<body>
  <a href="#main-content" class="skip-link">Skip to content</a>

  <header role="banner">
    <nav aria-label="Primary navigation">
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/services/">Services</a></li>
        <li><a href="/about/">About</a></li>
        <li><a href="/contact/">Contact</a></li>
      </ul>
    </nav>
  </header>

  <main id="main-content">
    <article>
      <header>
        <h1>Entity Rich Page Title Containing The Primary Topic</h1>
        <p class="lead">Answer first paragraph 40 to 75 words placed
           immediately under the H1, leading with the most extractable
           statement of what the page is about. This is the passage
           AI engines preferentially extract for direct answer queries.</p>
      </header>

      <section aria-labelledby="section-1-heading">
        <h2 id="section-1-heading">Descriptive Question Style H2</h2>
        <p>Answer first paragraph under each H2, 40 to 75 words,
           directly addressing the H2 question with a citeable
           statement followed by supporting detail.</p>
      </section>

      <section aria-labelledby="comparison-heading">
        <h2 id="comparison-heading">When Comparison Is The Right Format</h2>
        <table>
          <caption>Comparison of relevant options</caption>
          <thead>
            <tr><th scope="col">Option</th><th scope="col">Property A</th><th scope="col">Property B</th></tr>
          </thead>
          <tbody>
            <tr><td>Choice 1</td><td>Value</td><td>Value</td></tr>
            <tr><td>Choice 2</td><td>Value</td><td>Value</td></tr>
          </tbody>
        </table>
      </section>

      <section aria-labelledby="faq-heading">
        <h2 id="faq-heading">Frequently Asked Questions</h2>
        <details>
          <summary>How does the first question phrase the query as users ask it?</summary>
          <p>40 to 75 word answer that directly addresses the question with
             specific facts, dates, prices, or other extractable claims.</p>
        </details>
        <details>
          <summary>How does the second question phrase another natural query?</summary>
          <p>Another 40 to 75 word answer following the same pattern.</p>
        </details>
      </section>
    </article>

    <aside aria-label="Related content">
      <h2>Related</h2>
      <ul>
        <li><a href="/related-1/">Related page one descriptive anchor</a></li>
        <li><a href="/related-2/">Related page two descriptive anchor</a></li>
      </ul>
    </aside>
  </main>

  <footer role="contentinfo">
    <p>Crafted by <a href="https://thatdeveloperguy.com/">ThatDeveloperGuy.com</a></p>
    <address>
      <p>Business Name</p>
      <p>Street Address, City, State ZIP</p>
      <p>Phone: <a href="tel:+14176712606">417 671 2606</a></p>
    </address>
  </footer>

  <!-- FAQPage schema mirroring visible content above -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
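      {
        "@type": "Question",
        "name": "How does the first question phrase the query as users ask it?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "40 to 75 word answer that directly addresses the question..."
        }
      },
      {
        "@type": "Question",
        "name": "How does the second question phrase another natural query?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Another 40 to 75 word answer following the same pattern."
        }
      }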
    ]
  }
  </script>
</body>
</html>

This skeleton is the canonical substrate pattern. Every page on every site is built against this skeleton. Content varies per page; structure does not.

6.3 Content Blocks: The Answer First Pattern

Under every H2, the first paragraph is the answer. 40 to 75 words. Direct. Citeable. Specific.

Bad: "Let's talk about pricing. There are many factors to consider when determining the cost of a service. Different providers offer different rates. Some include extras. Some do not. Pricing varies..."

Good: "Half day guided fishing trips on Lake Taneycomo start at $300 for up to two anglers, including all tackle and bait. Full day trips run $500. We offer trout focused trips year round and bass trips on Table Rock seasonally. Book at 555 123 4567."

The good version is extractable. The bad version is not. The good version supplies prices, party size, locations, seasonal availability, and a contact method. AI engines preferentially extract specifics. The bad version supplies hedges and generalities. AI engines deprioritize hedges.

This pattern repeats under every H2 in the document. The answer first paragraph is followed by supporting detail, examples, tables, lists, or expanded discussion. The first 40 to 75 words carry the citation weight.
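
The 40 to 75 word window can be checked mechanically during content review. The following is a minimal sketch, not part of the framework itself: the function name checkAnswerFirst and the whitespace based word count are assumptions made for illustration.

```javascript
// Hypothetical helper: counts whitespace separated words in an answer
// paragraph and reports whether it falls inside the 40 to 75 word window.
function checkAnswerFirst(paragraph, min = 40, max = 75) {
  const words = paragraph.trim().split(/\s+/).filter(Boolean);
  return {
    wordCount: words.length,
    inRange: words.length >= min && words.length <= max,
  };
}

// Example: pass the plain text of the first paragraph under an H2.
const report = checkAnswerFirst(
  "Half day guided fishing trips on Lake Taneycomo start at $300 for up to two anglers."
);
console.log(report.wordCount, report.inRange);
```

A word count only verifies length. Whether the paragraph is direct, citeable, and specific still needs an editorial read.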

6.4 Internal Link Architecture

Every page is a hub. Every page links to:

The pillar page for its topical cluster (when the current page is a spoke).
The spoke pages within its topical cluster (when the current page is a pillar).
Adjacent related pages that share entities or topics.
Authoritative outbound sources for cited facts (.gov, .edu, standards bodies, primary sources).

Anchor text is descriptive. No "click here." No "read more" without context. The anchor itself communicates the topic of the destination page. This is both an accessibility requirement and a citation signal.

Cross reference: framework-internallinking.md for the full hub and spoke topology and anchor text discipline.
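
Anchor text discipline lends itself to a simple lint step. The sketch below is an illustration under stated assumptions: the deny list GENERIC_ANCHORS and the function isDescriptiveAnchor are hypothetical names, and the list should be extended to match your own editorial rules.

```javascript
// Hypothetical deny list of anchors that say nothing about the destination.
const GENERIC_ANCHORS = new Set([
  "click here", "here", "read more", "learn more", "more", "this page", "link",
]);

// Returns false for empty anchors and for anchors that consist only of a
// generic phrase. "Read more about guided trout trips" passes because the
// anchor itself names the destination topic.
function isDescriptiveAnchor(anchorText) {
  const normalized = anchorText
    .trim()
    .toLowerCase()
    .replace(/[.!»›>\s]+$/u, ""); // drop trailing punctuation and arrows
  if (normalized.length === 0) return false;
  return !GENERIC_ANCHORS.has(normalized);
}

console.log(isDescriptiveAnchor("Click here"));
console.log(isDescriptiveAnchor("Lake Taneycomo trout fishing regulations"));
```

The check runs on anchor text extracted from server rendered HTML, so it fits a build step or a crawl report rather than the browser.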

6.5 Entity Declarations

The substrate explicitly names entities: business name, owner name, service area, named services, partner brands, geographic boundaries, certifications, awards. AI engines use these as entity matches for entity salience scoring.

<p>Greenough's Guide Service operates on Lake Taneycomo near Branson,
   Missouri, in Taney County. Guide Captain Keith Greenough is licensed
   by the Missouri Department of Conservation and has been guiding trout
   fishing in the White River system since 1998.</p>

This paragraph contains: business name, water body, geographic locality, county, state, owner name, role, licensing authority, guiding start year, named river system, named species. Every entity is explicit. Every entity is matchable against external knowledge graphs.

Cross reference: framework-entitysalience.md for entity engineering depth.

6.6 Freshness Signals

Use <time> elements with datetime attributes for any date relevant to the content. Include datePublished and dateModified in the schema, and show a visible last updated stamp where appropriate.

<p>Last updated <time datetime="2026-05-11">May 11, 2026</time></p>

AI engines weight recency for many query categories. Time stamps that are both visible and machine readable in <time datetime> format reinforce the freshness signal.
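
To keep the visible stamp and the machine readable value in sync, both can be derived from one date at build or server render time. This is a minimal sketch: renderLastUpdated is a hypothetical helper name, and the en-US locale and UTC time zone are assumptions.

```javascript
// Hypothetical build time helper: produces the visible stamp and the
// machine readable datetime value from the same Date object.
function renderLastUpdated(date, locale = "en-US") {
  const machine = date.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  const human = new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC", // same zone as the machine value so the two never drift
  }).format(date);
  return `<p>Last updated <time datetime="${machine}">${human}</time></p>`;
}

console.log(renderLastUpdated(new Date("2026-05-11T00:00:00Z")));
```

Run this where the HTML is generated, not in the browser, so the stamp ships in the first byte.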

6.7 Provenance And Citations

Cite authoritative sources with inline outbound links, and add an explicit reference section where appropriate. The Princeton GEO study reported that inline citations boost extraction probability by approximately 30%.

<p>The Missouri Department of Conservation 
   <a href="https://mdc.mo.gov/fishing/regulations">stocks Lake Taneycomo
   with rainbow trout</a> at a rate that supports a year round trout fishery.</p>

The link itself is the citation. The destination is authoritative. The anchor text is descriptive. AI engines treat this pattern as evidence based content.

7. The Projection Layer

The projection is everything that composites on top of the substrate to produce the human experience.

7.1 CSS As The Visual Composer

CSS does the heavy lifting of the projection. Modern CSS can produce nearly any visual experience without requiring JavaScript:

Layout: Grid, Flexbox, container queries, grid-template-areas, subgrid, anchor positioning.
Visual depth: backdrop-filter, mask-image, clip-path, gradients, multiple backgrounds, blend modes.
Motion: CSS transitions, keyframe animations, scroll driven animations, View Transitions API.
Typography: variable fonts, font-feature-settings, text-wrap: balance, custom properties for fluid scales.
Decoration: pseudo elements (::before, ::after), SVG masks, generated content.

The projection layer can be aggressive in visual styling without touching the substrate. The H1 can be repositioned visually via Grid placement. Background images can layer behind text via background-image. Decorative graphics can be added via SVG siblings or pseudo elements without entering the content DOM.

7.2 JavaScript As Enhancement Only

JavaScript in a content first architecture is enhancement, never delivery. The five categories of permitted JS:

Interaction: form submission, modal triggers, accordion expansion (when the content is already in DOM).
Animation: GSAP, Three.js, Motion One, Lottie, scroll triggered transitions, parallax, particle effects.
Lazy media: defer loading of below the fold images, embeds, video.
Micro UX: tooltips, copy to clipboard buttons, smooth scroll, focus management.
Analytics and tracking: GA4, GSC verification, conversion pixels.

What JavaScript must never do:

Inject primary content into the page after load.
Construct headings, body copy, FAQs, tables, or schema client side.
Fetch content from an API as the source of truth for what appears on the page.
Wrap the entire page as a JS framework root element that requires hydration to display content.
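
As an illustration of the permitted side of that line, here is a sketch of a copy to clipboard enhancement. The text being copied already exists in the server rendered HTML; the script only adds behavior. The data-copy-target attribute and the enhanceCopyButtons function are hypothetical names chosen for this example.

```javascript
// Enhancement only. Assumed markup, already present in the substrate:
//   <span id="phone">417 671 2606</span>
//   <button data-copy-target="#phone">Copy</button>
// Nothing here creates content; it reads text that is already in the DOM.
function enhanceCopyButtons(root, clipboard) {
  const buttons = root.querySelectorAll("[data-copy-target]");
  for (const button of buttons) {
    button.addEventListener("click", async () => {
      const target = root.querySelector(button.dataset.copyTarget);
      if (!target) return;
      await clipboard.writeText(target.textContent.trim());
      button.textContent = "Copied";
    });
  }
  return buttons.length; // number of buttons enhanced
}

// In the browser, wire it up only when the Clipboard API is available.
if (typeof document !== "undefined" && navigator.clipboard) {
  enhanceCopyButtons(document, navigator.clipboard);
}
```

If the script never loads, the phone number is still on the page and still extractable. That is the test for whether a script is enhancement or delivery.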

7.3 Permitted Patterns

Concrete examples of the projection working correctly:

/* Hero section with substrate content composited under a cinematic visual layer */
.hero {
  position: relative;
  min-height: 100vh;
  display: grid;
  place-items: center;
  isolation: isolate;
}

.hero::before {
  content: "";
  position: absolute;
  inset: 0;
  background: url("/hero-image.jpg") center/cover;
  z-index: -2;
}

.hero::after {
  content: "";
  position: absolute;
  inset: 0;
  background: linear-gradient(
    180deg,
    rgba(0, 0, 0, 0) 0%,
    rgba(16, 100, 124, 0.7) 100%
  );
  z-index: -1;
}

.hero h1 {
  font: 300 clamp(3rem, 8vw, 8rem) / 0.9 "Cormorant Garamond", serif;
  font-style: italic;
  color: white;
  text-align: center;
  max-inline-size: 20ch;
}

.hero .lead {
  font: 400 clamp(1rem, 1.5vw, 1.25rem) / 1.6 "Manrope", sans-serif;
  color: rgba(255, 255, 255, 0.9);
  max-inline-size: 60ch;
  margin-block-start: 2rem;
}

The HTML for this hero is the simple substrate from section 6.2. The visual transformation lives entirely in CSS. GPTBot reads the H1 and lead paragraph. The human sees the cinematic composition.

7.4 Forbidden Patterns

Concrete examples of the projection working incorrectly:

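// FORBIDDEN: content delivered via JS
import { useEffect, useState } from "react";
// ServiceCard is assumed to be defined elsewhere in the app.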
function ServicePage() {
  const [services, setServices] = useState([]);

  useEffect(() => {
    fetch("/api/services")
      .then(res => res.json())
      .then(data => setServices(data));
  }, []);

  return (
    <div>
      <h1>Our Services</h1>
      {services.map(s => <ServiceCard key={s.id} {...s} />)}
    </div>
  );
}

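GPTBot sees at most <div><h1>Our Services</h1></div>, and only if the initial state is server rendered. In a pure client rendered app it sees just the empty root element. Either way, no services. The page is empty to extraction.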

<!-- FORBIDDEN: content hidden behind JS-only interaction -->
<div class="tabs">
  <button onclick="showTab(1)">FAQ</button>
  <div id="tab1" style="display:none">
    <p>Question and answer content here.</p>
  </div>
</div>

The content is in the DOM but hidden with display: none, and only JavaScript can reveal it. Crawlers can detect hidden content and Google penalizes cloaking. The fix is <details> and <summary>, which keep content extractable while collapsed.

<!-- FORBIDDEN: visual reorder breaking reading order -->
<div style="display: flex; flex-direction: row-reverse">
  <p>The introduction appears last visually but first in source</p>
  <h1>The conclusion appears first visually</h1>
</div>

Reading order is broken. Screen readers announce introduction first, conclusion second. Visually the human sees conclusion first. The fix is to put the elements in the intended reading order in source and let the visual layer position them logically.

8. Implementation Patterns By Stack

The doctrine applies regardless of stack. Implementation differs.

8.1 Plain HTML / Static Generation

Easiest case. Author HTML directly or generate at build time. Every byte of the substrate is in the file. No JavaScript runtime concerns.

Recommended for: marketing sites, local business sites, portfolio sites, content sites under 1,000 pages, any project where the team has the discipline to write HTML directly.

Tooling: Hugo, Eleventy (11ty), plain HTML files, Pandoc, custom shell scripts. Reference: framework-astrohugo.md.

8.2 Astro

Astro produces static HTML by default. Components run at build time. JavaScript hydration is opt in per component (Astro Islands). The architecture aligns with content first by default.

Patterns:

Page level: write .astro components that produce semantic HTML. No client side JS by default.
Hydration: use client:load, client:visible, client:idle directives only for interactive widgets, never for content delivery.
Schema: server render JSON-LD in the page frontmatter or in a layout component.

Cross reference: framework-astrohugo.md.

8.3 Next.js / Nuxt / SvelteKit

These frameworks support content first when configured correctly. Use static generation (SSG) or server side rendering (SSR) for content pages. Avoid pure client side rendering for primary content.

Patterns:

App Router (Next.js 13+): server components by default. Use 'use client' only for components that genuinely need client side state.
Pages Router: getStaticProps or getServerSideProps for content pages. Never fetch content in useEffect for primary page content.
Schema: generate JSON-LD in getStaticProps or in a Server Component, never in client components.
ISR: incremental static regeneration is fine for content that updates. The substrate is still server rendered.
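
One way to keep schema server side is to generate the JSON-LD from the same data array that renders the visible FAQ, so the two cannot drift apart. The sketch below is framework agnostic plain JavaScript meant to be called from server code (for example from getStaticProps or a Server Component). The function name buildFaqJsonLd and the { question, answer } data shape are assumptions for illustration.

```javascript
// Hypothetical server side helper: builds a FAQPage JSON-LD script element
// from the same data used to render the visible <details> blocks.
function buildFaqJsonLd(faqs) {
  const data = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faqs.map(({ question, answer }) => ({
      "@type": "Question",
      name: question,
      acceptedAnswer: { "@type": "Answer", text: answer },
    })),
  };
  // Escape "<" so answer text containing "</script>" cannot close the
  // script element early. JSON parsers read \u003c back as "<".
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

console.log(
  buildFaqJsonLd([
    { question: "How much is a half day trip?", answer: "Half day trips start at $300." },
  ])
);
```

Because the output is a string produced on the server, it lands in the first byte response and passes the curl test without any client side execution.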

Cross reference: framework-nextjs.md.

8.4 React / Vue / Svelte SPA

Last resort. Pure client side rendering should be reserved for logged in applications, admin dashboards, and tools that are not meant to be indexed.

When SPA is unavoidable for a public site:

Prerendering: use prerender.io, Vercel's automatic prerendering, or a build time prerender pass. The substrate must be in the first byte for indexable routes.
Static export: if the framework supports static export (Next.js, Nuxt), use it.
Dynamic rendering: detect bots by user agent and serve a server rendered version. This is a fallback, not a strategy.
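
For the dynamic rendering fallback, user agent detection can be as small as a substring match. This sketch assumes a hand maintained token list (CRAWLER_TOKENS and isKnownCrawler are hypothetical names); it covers the bots named in this document and is not an exhaustive registry.

```javascript
// Hypothetical token list; keep it in sync with the bots you care about.
const CRAWLER_TOKENS = [
  "GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Googlebot",
];

// Case insensitive substring match against the User-Agent header value.
function isKnownCrawler(userAgent) {
  if (!userAgent) return false;
  const ua = userAgent.toLowerCase();
  return CRAWLER_TOKENS.some((token) => ua.includes(token.toLowerCase()));
}

console.log(isKnownCrawler("Mozilla/5.0 (compatible; GPTBot/1.0)"));
console.log(isKnownCrawler("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"));
```

User agent strings can be spoofed and new bots appear without notice, which is one more reason this stays a fallback. The server rendered version served to bots must carry the same content humans see.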

Cross reference: framework-react.md.

8.5 WordPress

WordPress is server side by default. Content first works naturally. The risks are theme dependent (some themes inject content via JS), plugin dependent (some page builders inject content via JS), and configuration dependent (some hosts add aggressive caching that breaks freshness).

Patterns:

Theme: use themes that produce semantic HTML in templates. Avoid heavy page builders that produce nested div structures.
Plugins: use Yoast or Rank Math for SEO output. Schema lives server side, in head.
Caching: full page caching is fine. Verify cached HTML contains content (curl test against the cache).

Cross reference: framework-wordpress.md.

8.6 Shopify

Shopify Liquid renders server side. Content first works by default. The risks are theme dependent (some themes use JS heavy components) and app dependent (some apps inject content via JS).

Cross reference: framework-shopify.md.

8.7 Webflow

Webflow renders server side. Content first works by default. The visual editor produces semantic HTML when the builder uses semantic elements (heading vs div) and avoids visual hierarchies that imply different reading orders.

Cross reference: framework-webflow.md.

8.8 Cross Stack Translation

Every pattern in this framework translates to every supported stack. The bridge document is framework-cross-stack-implementation.md, which provides side by side examples of each pattern in HTML, React, Vue, Svelte, Next.js, Nuxt, SvelteKit, Astro, Hugo, 11ty, Remix, WordPress, Shopify, and Webflow.

9. Validation Methodology

A site claiming content first architecture must be verified. Five validation methods, each catching different failure modes.

9.1 The curl Test

The fastest, most definitive test. If curl sees the content, AI crawlers see the content.

# Test that the response contains the H1 (allowing for attributes on the tag)
curl -A "GPTBot" -s https://example.com/page | grep -o "<h1[^>]*>[^<]*</h1>"

# Test the page contains the schema
curl -A "ClaudeBot" -s https://example.com/page | grep -c "application/ld+json"

# Test the page contains the FAQPage schema specifically
curl -A "PerplexityBot" -s https://example.com/page | grep -o "FAQPage"

# Full body inspection (headings with or without attributes, e.g. <h2 id="...">)
curl -A "GPTBot" -s https://example.com/page > /tmp/page.html
wc -l /tmp/page.html
grep -E "<h[1-3][ >]" /tmp/page.html

If the H1 is not in the curl output, the page violates Rule 1. Fix the rendering strategy before any other work.
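
The same checks can be scripted against the saved response so they run in CI. The sketch below works on the raw HTML string (for example the contents of /tmp/page.html). It is regex based and deliberately rough, not a real HTML parser, and the function name auditSubstrate is an assumption for illustration.

```javascript
// Hypothetical audit helper: inspects raw first byte HTML, no DOM, no JS
// execution, which approximates what a non rendering crawler works from.
function auditSubstrate(html) {
  const headings = [];
  const headingPattern = /<h([1-3])\b[^>]*>([\s\S]*?)<\/h\1>/gi;
  for (const match of html.matchAll(headingPattern)) {
    headings.push({
      level: Number(match[1]),
      // Strip inline tags and collapse whitespace to get the heading text.
      text: match[2].replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim(),
    });
  }
  const jsonLdTags =
    html.match(/<script[^>]*type=["']application\/ld\+json["'][^>]*>/gi) || [];
  return {
    headings,
    h1Count: headings.filter((h) => h.level === 1).length,
    jsonLdBlocks: jsonLdTags.length,
    hasFaqPage: /"@type"\s*:\s*"FAQPage"/.test(html),
  };
}

// Usage sketch: node audit.js < /tmp/page.html
// const html = require("fs").readFileSync(0, "utf8");
console.log(auditSubstrate('<div id="root"></div>'));
```

An h1Count of zero on the raw response is the Rule 1 violation described above, caught without opening a browser.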

9.2 The Disable JavaScript Test

Browser based test for visual verification. In Chrome DevTools: Cmd+Shift+P (Ctrl+Shift+P on Windows and Linux) → "Disable JavaScript" → reload page.

What should still work with JS disabled:

All primary content visible.
All navigation links functional.
All headings present.
All FAQ content readable (collapsed via <details> but expandable by click).
All tables and lists present.
Visual styling intact (CSS still loads).

What can break with JS disabled:

Animations.
Form submission (graceful degradation: forms can submit via standard POST without JS).
Carousels (graceful degradation: show all slides as a static list).
Interactive widgets that require JS by definition (these should not contain primary content).

If primary content is missing with JS disabled, the page violates Rule 1.

9.3 The View Source Test

Right click → View Page Source. This shows the actual HTML the server returned, not the rendered DOM in DevTools (which shows the post hydration state).

Compare the view source to the rendered DOM. The substrate should be in view source. The projection (computed styles, applied animations, hydrated state) is in the rendered DOM. If primary content is only in the rendered DOM and not in view source, the page violates Rule 1.

9.4 The AI Crawler Simulation

Use a tool that fetches a URL as a specified user agent and shows the resulting HTML.

Google Search Console URL Inspection → Test Live URL → View Tested Page. Shows the HTML Googlebot received.
Screaming Frog with custom user agent set to GPTBot, ClaudeBot, etc. Crawl the site and inspect what each bot sees.
curl -A "[bot user agent]" as above.
PromptWatch AI Search Crawler Inspector or similar tools that show what AI bots extract.

Verify the rendered output for each major bot user agent contains the expected content.

9.5 Lighthouse and Validation Tools

Lighthouse SEO audit: should show 100/100 with no issues for indexable pages.
Google Rich Results Test: validates schema and shows extracted structured data.
Schema.org Validator: validates JSON-LD syntax and property correctness.
axe DevTools: validates accessibility, which correlates strongly with semantic HTML quality.
W3C HTML Validator: validates the HTML structure for spec compliance.

A content first page should pass all five with zero or near zero violations.

10. Common Violations And Fixes

common_violations:

  empty_react_root:
    symptom: "curl returns <div id='root'></div> as the body content"
    fix: "Migrate to Next.js with SSG/SSR, or add prerender.io to the build, or convert to Astro"
    severity: "critical"

  schema_via_gtm:
    symptom: "Schema appears in DevTools but not in view source"
    fix: "Move schema to server rendered HTML in document head, output by template engine"
    severity: "critical"

  faq_in_js_accordion:
    symptom: "FAQ content visible after clicking but missing from initial DOM"
    fix: "Replace JS accordion with <details>/<summary>, keep content in DOM"
    severity: "high"

  tabbed_content:
    symptom: "Tabs 2 through N missing from initial DOM, loaded on click"
    fix: "Render all tab content in DOM, use CSS to show/hide, or use anchor links per tab"
    severity: "high"

  hero_image_as_img_in_overlay:
    symptom: "Hero <img> sits in a wrapper element around the content, creating unnecessary DOM nesting"
    fix: "Move hero image to CSS background-image, leave content semantic and unwrapped"
    severity: "medium"

  divs_replacing_semantic_elements:
    symptom: "<div class='article'>, <div class='heading'>, <div class='nav'>"
    fix: "Replace with <article>, <h1>-<h6>, <nav> respectively"
    severity: "high"

  visual_reorder_breaking_reading_order:
    symptom: "Screen reader announces content in different order than visual presentation"
    fix: "Reorder source HTML to match logical reading order; use CSS only for visual presentation"
    severity: "high"

  client_side_canonical_injection:
    symptom: "Canonical URL set by JavaScript after page load"
    fix: "Server render <link rel='canonical'> in head"
    severity: "critical"

  delayed_content_fetch:
    symptom: "Page loads with skeleton, then fetches content via XHR"
    fix: "Server render content in first byte; skeleton states are acceptable for slow APIs only on non indexable pages"
    severity: "critical"

  cloaking_via_display_none:
    symptom: "Content present in DOM but display:none for visual reasons"
    fix: "Either remove from DOM (if not meant to be indexed) or display it (if meant to be indexed); never both"
    severity: "critical"

  off_screen_positioning_for_hiding:
    symptom: "position: absolute; left: -9999px on content"
    fix: "Use <details>/<summary> for collapsed content; never off screen positioning to hide indexable content"
    severity: "critical"

  carousel_with_lazy_slides:
    symptom: "Carousel slide 1 in DOM; slides 2+ only loaded when navigated to"
    fix: "Render all carousel content in DOM, use CSS transform/opacity for transitions"
    severity: "medium"

  modal_with_primary_content:
    symptom: "Important content (pricing, features) only in JS triggered modal"
    fix: "Move content to the main page; modals are for confirmations, not primary content"
    severity: "high"

  image_as_text:
    symptom: "Text rendered as a graphic (often hero headlines as PNG)"
    fix: "Use actual text in HTML with CSS typography; reserve images for actual images"
    severity: "high"

  iframe_blocked_extraction:
    symptom: "Primary content inside iframe (third party widget, embedded form)"
    fix: "Inline the content; iframes are extraction barriers"
    severity: "medium"
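
Several of the critical violations above (empty_react_root, delayed_content_fetch) share one measurable symptom: the raw response carries almost no visible text. A rough detector can be sketched as follows. The names visibleTextLength and looksLikeEmptyShell are hypothetical, and the 200 character threshold is an arbitrary assumption to tune per site.

```javascript
// Hypothetical helper: estimates how much human readable text the raw
// HTML response contains, ignoring scripts, styles, and markup.
function visibleTextLength(html) {
  const bodyMatch = html.match(/<body[^>]*>([\s\S]*?)<\/body>/i);
  const body = bodyMatch ? bodyMatch[1] : html;
  const text = body
    .replace(/<(script|style|noscript|template)\b[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return text.length;
}

// Flags responses whose visible text falls under the (assumed) threshold.
function looksLikeEmptyShell(html, minChars = 200) {
  return visibleTextLength(html) < minChars;
}

console.log(
  looksLikeEmptyShell('<html><body><div id="root"></div><script src="/app.js"></script></body></html>')
);
```

This is a triage signal, not a verdict. A flagged page still needs the curl and view source checks from section 9 to confirm which violation applies.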

11. The Content First Audit Rubric

A 30 point rubric. Score each page or representative sample of pages. World class content first implementation: 27+/30.

#	Criterion	Pass/Fail
CF1	curl returns the H1 text in the response body
CF2	curl returns all H2 headings in the response body
CF3	curl returns primary body content (lead paragraph, FAQ answers)
CF4	curl returns JSON-LD schema in the response body
CF5	curl returns FAQPage schema with Question/Answer pairs (if FAQ present)
CF6	View source matches rendered DOM for content (not just shell)
CF7	JavaScript disabled: page is fully readable with styles intact
CF8	JavaScript disabled: navigation works (anchor links functional)
CF9	JavaScript disabled: FAQ expand/collapse works (via `<details>`)
CF10	Semantic elements used: `<article>`, `<section>`, `<main>`, `<nav>`, etc. (not all divs)
CF11	One `<h1>` per page; logical H2 to H6 hierarchy; no skipped levels
CF12	DOM order matches logical reading order; CSS visual reorder does not break reading
CF13	Lead paragraph 40 to 75 words placed immediately under H1
CF14	Answer first paragraphs (40 to 75 words) under every H2
CF15	Comparison data uses `<table>` not divs
CF16	Numbered processes use `<ol>` not `<ul>` not prose
CF17	FAQs use `<details>` and `<summary>` with content in DOM
CF18	Schema graph with @id and sameAs in document head
CF19	Canonical, robots, Open Graph, Twitter Card meta tags all server rendered
CF20	Hero images via CSS background, not `<img>` wrapper around content
CF21	Decorative graphics via pseudo elements or SVG siblings, not content wrappers
CF22	Visual animations enhance the projection without blocking the substrate
CF23	All internal links use descriptive anchor text
CF24	All explicit entities (business name, owner, location, services) declared in text
CF25	datePublished and dateModified present in schema and visible where appropriate
CF26	Inline outbound citations to authoritative sources where claims warrant
CF27	Lighthouse SEO score 95+
CF28	Rich Results Test validates all schema with zero errors
CF29	Manual inspection: no cloaking patterns, no off screen hiding, no JS only content
CF30	GSC URL Inspection: rendered HTML matches view source for primary content

Maximum score: 30. Tiers:

27 to 30: World class content first implementation. Ready for AI extraction.
22 to 26: Good. Most patterns correct, minor cleanup needed.
17 to 21: Acceptable for visual first legacy site. Plan retrofit.
Below 17: Failing the doctrine. Content is not extractable. Stop other framework work until fixed.
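
Scoring can be automated once each criterion has a pass or fail result. A minimal sketch, assuming results are recorded as an object keyed by criterion ID (CF1 to CF30) with boolean values; the function name scoreAudit is hypothetical and the thresholds are the tiers listed above.

```javascript
// Hypothetical scorer: counts passed criteria and maps the total to a tier.
function scoreAudit(results) {
  const score = Object.values(results).filter((passed) => passed === true).length;
  let tier;
  if (score >= 27) tier = "World class";
  else if (score >= 22) tier = "Good";
  else if (score >= 17) tier = "Acceptable for visual first legacy site";
  else tier = "Failing the doctrine";
  return { score, tier };
}

// Example: a page that passes the first 24 criteria and fails the rest.
const example = {};
for (let i = 1; i <= 30; i += 1) example[`CF${i}`] = i <= 24;
console.log(scoreAudit(example));
```

Recording results per criterion, rather than only the total, keeps the failing IDs available for the retrofit plan.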

12. Retrofit Strategy For Existing Visual First Sites

Existing sites built visual first cannot always be migrated overnight. A staged retrofit strategy:

12.1 Assess Current State

Run the audit rubric on the existing site. Separate critical violations (Rules 1, 4, 5) from stylistic ones (cosmetic breaches of Rules 2 and 3). Critical violations get priority.

12.2 Retrofit Priority Order

retrofit_priority:

  phase_1_emergency_fixes:
    - Move schema from JS injection to server rendered head
    - Convert FAQ accordions from JS to <details>/<summary>
    - Server render canonical, robots, meta tags
    - Remove display:none from content meant to be indexed

  phase_2_substrate_fixes:
    - Migrate rendering strategy from CSR to SSG/SSR/ISR
    - Replace divs with semantic elements
    - Add lead paragraph and answer first patterns under each H2
    - Fix DOM order where visual reorder broke reading order

  phase_3_substrate_enrichment:
    - Add comparison tables where prose can be tabularized
    - Add numbered lists where processes are described
    - Add entity declarations in text
    - Add inline outbound citations

  phase_4_validation:
    - Run full audit rubric
    - GSC URL Inspection on representative pages
    - curl tests as GPTBot, ClaudeBot, PerplexityBot
    - Lighthouse and Rich Results Test on every priority page

12.3 Risk Management

Sites with existing rankings should retrofit in staging first, then validate citation impact on a small set of pages before rolling out portfolio wide. Migrations are covered in framework-migration.md.

13. The Build Sequence

For every new build, in order:

1. READ THIS DOCUMENT.
   Internalize the doctrine before opening an editor.

2. AUTHOR THE SUBSTRATE.
   Write semantic HTML with all content, schema, and structure.
   Verify it makes sense as a document with no CSS or JS.

3. APPLY THE PROJECTION.
   Add CSS for visual treatment.
   Add JavaScript for enhancement.
   Verify the substrate is still intact after the projection.

4. VALIDATE.
   Run the audit rubric.
   curl as GPTBot. Inspect the response.
   Disable JavaScript and reload. Verify content remains.

5. APPLY OTHER FRAMEWORKS.
   Now Level 1 Foundation frameworks. Then Level 2.
   Then specialized, authority, monitoring, conversion.

Order matters. Applying Level 1 Foundation before authoring the substrate is the failure mode this doctrine prevents.

14. End Of Framework Document

This is Level 0. Every other framework in this library serves this doctrine. Schema implementation, internal linking, AI citations, E-E-A-T, entity salience, all of it presupposes a content first substrate. Without the substrate, the rest is decoration applied to a site no bot can read.

Apply this doctrine first. Then everything else compounds.

Companion documents:

framework-masterindex.md — Library navigation and dependency graph (Level 0 at top, then Level 1 through 6)
framework-technicalseo.md — Technical foundation, crawl access, indexing (Level 1)
framework-schema.md — Schema implementation patterns (Level 1)
framework-pageexperience.md — Performance and Core Web Vitals (Level 1)
framework-internallinking.md — Hub and spoke architecture (Level 2)
framework-aicitations.md — AI citation mechanics across engines
framework-headless.md — Headless CMS implementation patterns
framework-react.md — Pure CSR SPA SEO when SSR/SSG is not in scope
framework-cross-stack-implementation.md — Pattern translation across stacks
framework-accessibility.md — A11y patterns that align with semantic HTML

Document version: 1.0
Created: 2026-05-11
Maintained by: ThatDeveloperGuy

This document is the architectural axiom. Read it before any other framework. Apply it before any build decision. Verify adherence before considering any site ready for the Level 1 Foundation frameworks.

Owner: Joseph W. Anady — ThatDeveloperGuy — SDVOSB Contact: 417 671 2606 | joseph.w.anady@icloud.com

Frequently asked questions

What is content-first architecture and how is it different from visual-first?

Content-first inverts the default build order. You first author a semantic HTML document containing all content, headings, schema, and links (the substrate), then composite CSS and JavaScript on top (the projection). Visual-first, the 2026 default, builds component scaffolding and injects content at runtime, shipping an empty shell. AI crawlers read only the first-byte substrate, so visual-first sites are invisible to AI extraction even when every framework is installed.

Why can't AI crawlers see JavaScript-rendered content?

AI crawlers like GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, and Applebot Extended execute no JavaScript. They send a GET request, parse the response body as HTML, and extract from that parse tree. There is no JS runtime, hydration step, or second pass. Whatever is in the first-byte server response is the entire extraction surface. If the first byte is an empty <div id="root">, the bot sees an empty page.

What are the five hard rules of content-first architecture?

Rule 1: No content via JavaScript; primary content must be in the first byte. Rule 2: DOM order is reading order is citation order; CSS may reorder visually but source order stays logical. Rule 3: Use semantic HTML5 elements, not styled divs. Rule 4: Schema lives in head, server-rendered, in the first byte. Rule 5: The visual layer never blocks extraction, no display:none or off-screen hiding of indexable content.

How do I test whether a page is truly content-first?

Use the curl test: run curl -A "GPTBot" -s https://example.com/page | grep -o "<h1[^>]*>". If the H1 is not returned, the page violates Rule 1. The framework also describes a Disable JavaScript test (content must remain readable with styles intact), a View Source test (substrate must be in source, not just rendered DOM), AI crawler simulation via GSC URL Inspection, and Lighthouse/Rich Results validation.

Which HTML structures get cited most by AI engines?

The page cites reproducible citation multipliers: HTML tables versus prose at 4.2x (kime.ai), FAQPage JSON-LD at 3.2x (Frase.io), numbered lists for processes at 2.7x, bullet lists at 1.8x, expert quotes at 1.41x and statistics at 1.30x (Princeton GEO study). Inline outbound citations also boost extraction roughly 30%. The GEO16 study found Metadata/Freshness, Semantic HTML Structure, and Valid Structured Data most correlated with citation, all living in the substrate.

Want this framework implemented on your site?

ThatDevPro ships these frameworks as productized services. SDVOSB-certified veteran owned. Cassville, Missouri.
