Content-First Architecture: semantic HTML as substrate, visual as projection
The Architectural Doctrine — Semantic HTML as Substrate, Visual Layer as Projection, and the Build Philosophy That All Other Frameworks Serve
A comprehensive doctrine reference establishing the architectural axiom that governs every other framework in this library. Content First Architecture is the position that the structural, semantic, machine readable HTML document is the substrate of every site, and the visual frontend (CSS, JavaScript, animations, graphics, interactivity) is a projection that sits on top of that substrate without obstructing it. Crawlers and AI engines read the substrate. Humans see the projection. Both are served by the same HTML file. This document codifies the principle, the empirical research that justifies it, the five hard rules that enforce it, the substrate and projection layer specifications, and the audit methodology that verifies a build adheres to it. Dual purpose: installation doctrine and architectural audit.
Cross stack implementation note: the code samples in this framework are written in plain HTML for clarity. For React, Vue, Svelte, Next.js, Nuxt, SvelteKit, Astro, Hugo, 11ty, Remix, WordPress, Shopify, and Webflow equivalents of every pattern below, see framework-cross-stack-implementation.md. For pure client rendered SPAs (no SSR/SSG), see framework-react.md. The doctrine applies regardless of stack. The implementation differs.
1. Document Purpose
This is Level 0. It precedes every other framework in the library.
Content First Architecture is not a technique. It is the architectural axiom that determines whether the rest of the library has any effect. Schema markup, internal linking, E-E-A-T signaling, entity salience, AI citation optimization, technical SEO, page experience: every framework in this library makes a foundational assumption that crawlers and AI engines can actually read what the site delivers. That assumption is only true when the site is built content first. When a site is built visual first (the default in modern web tooling), the frameworks still install correctly, the audit rubrics still pass on paper, and the result is invisible to AI extraction anyway because the content never reaches the bot.
The frameworks library cannot defend itself against this failure mode without an explicit doctrine. Individual frameworks contain technical guidance against client side rendering. None of them name the inversion as the build philosophy. Without a doctrine document, a builder (developer or AI assistant) can follow the dependency graph, install every framework in the prescribed order, and ship a beautifully styled React SPA that delivers an empty <div id="root"> to GPTBot. Every framework was implemented. Zero citations result.
This document closes that gap. Read it before any client engagement. Reference it during every build decision. Apply it as the gating criterion before any other framework is invoked.
1.1 Required Understanding
Before reading this framework, the builder should be familiar with:
- HTML semantics: <article>, <section>, <main>, <nav>, <aside>, <header>, <footer>, heading hierarchy H1 through H6, and the distinction between semantic and presentational markup
- CSS layout fundamentals: Grid, Flexbox, positioning, pseudo elements (::before, ::after), background images, transforms
- JavaScript rendering models: SSG, SSR, ISR, CSR, hydration, and the difference between content delivery and interaction enhancement
- Crawler behavior: how Googlebot, GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Applebot, and Google-Extended handle JavaScript (or do not)
- Schema.org JSON-LD: the @id graph pattern and where structured data lives in the document
1.2 Document Scope
Covers: the inversion principle, the empirical research justifying it, the five hard rules, substrate layer specification (everything in the content document), projection layer specification (everything in the visual layer), stack specific implementation patterns, validation methodology, common violations, audit rubric. Touches but does not exhaust: schema implementation (framework-schema.md), technical SEO foundations (framework-technicalseo.md), JavaScript rendering decisions (framework-react.md), accessibility (framework-accessibility.md), AI citation mechanics (framework-aicitations.md).
2. Client Variables Intake
content_first_intake:
  current_architecture: ""        # "content_first" | "visual_first" | "hybrid" | "unknown"
  rendering_strategy: ""          # "ssg" | "ssr" | "isr" | "csr" | "hybrid" | "static_html"
  primary_stack: ""               # plain_html | nextjs | astro | hugo | nuxt | sveltekit | remix | gatsby | wordpress | shopify | webflow | react_spa | vue_spa
  hydration_model: ""             # "none" | "selective" | "full" | "islands"
  content_in_first_byte: false    # does the server return content in initial HTML
  schema_in_first_byte: false     # is JSON-LD in initial HTML or injected
  js_required_for_content: false  # is JS execution required to see primary content
  retrofit_or_greenfield: ""      # "retrofit_existing" | "greenfield_build"
  existing_traffic_at_risk: false # are there rankings to protect during transition
  build_team: ""                  # "joseph_solo" | "joseph_plus_ai" | "client_dev_team" | "mixed"
  estimated_pages: 0              # total page count
  audit_baseline_score: 0         # current G score per GEO16 framework (if known)
The first three values describe the starting architecture; together with retrofit_or_greenfield they determine whether the engagement is a doctrine installation (greenfield) or a doctrine retrofit (existing site). The fourth through seventh determine the technical severity. Retrofits with significant traffic at risk require staged migration. Greenfield builds get the doctrine applied from the first commit.
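The triage described above can be sketched in a few lines. This is a minimal illustration, assuming the intake has been loaded into a Python dict keyed by the field names shown; the function name `classify_engagement`, the returned keys, and the choice of which booleans count as severity flags are assumptions, not part of the framework.

```python
def classify_engagement(intake: dict) -> dict:
    """Summarize an intake record. Names and return shape are illustrative assumptions."""
    retrofit = intake.get("retrofit_or_greenfield") == "retrofit_existing"

    # Assumed severity flags: content or schema missing from the first byte,
    # or JavaScript required before primary content is visible.
    checks = (
        ("content_in_first_byte", not intake.get("content_in_first_byte", False)),
        ("schema_in_first_byte", not intake.get("schema_in_first_byte", False)),
        ("js_required_for_content", bool(intake.get("js_required_for_content", False))),
    )
    severity_flags = [name for name, failing in checks if failing]

    return {
        "engagement": "retrofit" if retrofit else "installation",
        # Retrofits with traffic at risk call for a staged migration.
        "staged_migration": retrofit and bool(intake.get("existing_traffic_at_risk", False)),
        "severity_flags": severity_flags,
    }
```

An empty `severity_flags` list on a retrofit suggests the site already delivers its substrate in the first byte and the work is mostly verification.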
3. The Inversion Principle
3.1 The Default and Why It Fails
The default architecture in modern web development, as of 2026, is visual first. A designer produces a visual concept. A developer translates that concept into component scaffolding (React, Vue, Svelte). Content is then injected into the components at runtime through props, fetch calls, or CMS integration. The final HTML delivered to the browser is an empty shell. The visible page is constructed by JavaScript execution in the client.
This default works for human users because their browsers execute JavaScript. The user experiences the constructed page. To the user, the architecture is invisible.
The default does not work for the actors that matter most for citation and ranking in the AI era:
AI crawlers do not execute JavaScript. GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, and Applebot-Extended consume the first byte server response and nothing else; Google-Extended executes JavaScript only to a limited degree (see section 4.1). If the first byte is an empty shell, the bot sees an empty page.
Search crawlers execute JavaScript with delay and partial coverage. Googlebot now renders JavaScript reliably for most sites within hours, but indexing decisions are still influenced by the unrendered HTML. Bing renders less reliably. AI Overviews pull from Google's index but may apply their own extraction passes that do not wait for client side rendering.
Screen readers operate on DOM, not on rendered pixels. Content delivered via JavaScript can be read by screen readers, but only if the JS has executed and the accessibility tree is current. Sites that depend on JS for content fail accessibility audits and degrade user experience for people using assistive technology.
The visual first default produces sites that are invisible to the surface where AI citation, ranking decoupling, and zero click search are happening. Every framework in this library is wasted on a visual first architecture because the bot never sees the content the frameworks optimize.
3.2 The Inversion
Content First Architecture inverts the build order:
Step one. A semantic, machine readable HTML document is authored. This document contains the complete content, heading hierarchy, schema markup, internal linking, entity declarations, FAQ blocks, comparison tables, and metadata. It is the substrate. It is what the bot sees. It is what would be indexed and cited even if no styles were applied at all.
Step two. A visual layer is composed on top of the substrate. CSS positions elements, applies typography, paints backgrounds, layers decorative graphics. JavaScript adds animation, micro interaction, form handling, lazy media loading. The visual layer does not deliver content. The visual layer projects the existing content into a human pleasing experience.
Step three. Both layers ship together in a single HTML response from the server. The bot reads the substrate. The human browser composites the substrate with the visual layer and presents the rendered result.
The inversion is not a tradeoff. It does not sacrifice visual quality for crawlability. Modern CSS (Grid, Flexbox, custom properties, container queries, anchor positioning, scroll driven animations) and modern JavaScript (GSAP, Three.js, Motion One, Lottie) are fully capable of building epic visual experiences on top of a semantic substrate. The constraint is architectural discipline, not visual capability.
The thataiguy.org site in this agency's portfolio is the proof of concept. Static HTML substrate with Three.js WebGL, GSAP cinematic animations, scroll triggered transitions, and a full visual identity, all projected on top of content that GPTBot can read in the first byte.
3.3 The Substrate / Projection Model
The mental model:
Substrate = the structural, semantic HTML document. The content. The meaning. The schema. What the bot extracts. What the screen reader announces. What the AI engine cites.
Projection = the visual experience composited on top of the substrate. The typography. The animations. The colors. The layout. What the human sees.
The substrate is the truth. The projection is the presentation of that truth. The substrate exists with or without the projection. The projection cannot exist without the substrate. This asymmetry is the architecture.
When a developer asks "where does this content go," the answer is always: in the substrate. When a developer asks "how should this look," the answer is always: in the projection. When the question is "where does this schema markup go," the answer is the substrate, in the document head, server rendered, present in the first byte.
3.4 Why This Matters Now
The architecture would be defensible on accessibility and resilience grounds even in 2010. What makes it the gating axiom in 2026 is the empirical surface AI search now represents.
AI search citations diverge from traditional rankings for many query categories. Independent research found that only around 12% of ChatGPT citations match URLs on Google's first page. The AI surface is a separate competitive channel.
Google AI Overview citation patterns have decoupled from Google rankings. In July 2025, 76% of AIO cited URLs ranked in the organic top 10. By February 2026, only 38% came from the top 10. The remaining 62% came from positions 11 through beyond 100. AIO is no longer a top of SERP feature. It is its own ranking surface, and the inputs that drive it favor structural extractability over ranking position.
The cost of being invisible compounds. Every month a site is invisible to AI extraction is a month of lost citation accrual. Citations build entity authority. Entity authority improves future citation probability. Sites that started content first in 2024 now have two years of compounding advantage over visual first competitors. The window to catch up is not infinite.
This is why Content First Architecture is Level 0 in this library. Every other framework presupposes the bot can read the page. This framework is what makes that presupposition true.
4. The Empirical Foundation
4.1 AI Crawler Behavior
The dominant AI crawlers and their JavaScript handling, as of 2026 baseline observation:
ai_crawler_js_capability:
  gptbot:
    operator: "OpenAI"
    javascript_execution: "none"
    consumes: "first byte server response only"
    purpose: "training data and ChatGPT Search index refresh"
  oai_searchbot:
    operator: "OpenAI"
    javascript_execution: "none"
    consumes: "first byte server response only"
    purpose: "ChatGPT Search live retrieval"
  chatgpt_user:
    operator: "OpenAI"
    javascript_execution: "none"
    consumes: "first byte server response only"
    purpose: "real time fetch when user invokes browse capability"
    note: "highest priority bot for live citation"
  claudebot:
    operator: "Anthropic"
    javascript_execution: "none"
    consumes: "first byte server response only"
    purpose: "training data and Claude search retrieval"
  perplexitybot:
    operator: "Perplexity"
    javascript_execution: "none"
    consumes: "first byte server response only"
    purpose: "Perplexity search index"
    note: "Cloudflare evidence August 2025 of undeclared crawlers using generic Chrome user agents to bypass blocks"
  google_extended:
    operator: "Google"
    javascript_execution: "limited"
    consumes: "primarily first byte; partial JS execution"
    purpose: "Gemini, Bard, AI Overviews training and retrieval"
  applebot_extended:
    operator: "Apple"
    javascript_execution: "none"
    consumes: "first byte server response only"
    purpose: "Apple Intelligence training data"
  googlebot:
    operator: "Google"
    javascript_execution: "full (with delay)"
    consumes: "first byte for index decisions, rendered HTML for content"
    note: "two wave model mostly obsolete since 2025 but indexing decisions still weighted toward unrendered HTML"
  bingbot:
    operator: "Microsoft"
    javascript_execution: "partial"
    consumes: "first byte + some rendered content"
    note: "feeds Microsoft Copilot in addition to Bing search"
The pattern: every named AI crawler operates on the first byte server response. Search crawlers that render JavaScript still weight the unrendered HTML for index decisions. The first byte is the universal extraction surface.
4.2 Citation Multipliers
Independent research across 2024 through early 2026 has produced reproducible multipliers for specific structural patterns in extraction friendly HTML. The most cited findings:
citation_multipliers:
  tables_vs_prose:
    multiplier: 4.2
    source: "kime.ai 2026 analysis of 10,000 AI citations"
    pattern: "HTML <table> with <thead>, <tbody>, <th>, <td> for comparison data"
    why: "tables map directly to structured data the model can paraphrase or reformat"
  faqpage_schema:
    multiplier: 3.2
    source: "Frase.io 2025 study"
    pattern: "FAQPage JSON-LD with Question/acceptedAnswer pairs matching visible H2 questions and answer paragraphs"
    why: "explicit machine readable Q&A relationship; AI engines prefer this for direct answer extraction"
  numbered_lists_for_processes:
    multiplier: 2.7
    source: "kime.ai 2026 analysis"
    pattern: "<ol><li>...</li></ol> for sequential steps"
    why: "models cite ordered lists when query implies sequence"
  bullet_lists:
    multiplier: 1.8
    source: "kime.ai 2026 analysis"
    pattern: "<ul><li>...</li></ul> for unordered enumerations"
    why: "scannable structure beats prose paragraphs for feature/option queries"
  expert_quotes:
    multiplier: 1.41
    source: "Princeton GEO study (SIGKDD 2024)"
    pattern: "<blockquote> with cite attribute, attributed to named expert"
    why: "model treats quotation as a credibility signal"
  statistics_citations:
    multiplier: 1.30
    source: "Princeton GEO study"
    pattern: "specific numerical claims with source attribution"
    why: "specific numbers are extractable and citeable as facts"
  citation_count_increase:
    multiplier: 1.30
    source: "Princeton GEO study"
    pattern: "inline outbound citations to authoritative sources"
    why: "outbound citations to .gov/.edu/standards bodies increase the model's confidence in the page"
  fully_populated_product_review_schema:
    citation_rate: 0.617
    source: "industry study cited across multiple sources"
    pattern: "Product schema with AggregateRating, Review, Offer, all properties populated"
    why: "ecommerce extraction relies on schema completeness for shopping queries"
  structured_data_in_ai_overviews:
    visibility_uplift: 0.27
    source: "Google Search Central 2024 internal study"
    pattern: "any valid JSON-LD on the page"
    why: "structured data signals trustworthiness and parseability to AIO synthesis"
These are not soft signals. They are reproducible multipliers documented across multiple independent studies. A page that includes all of these patterns extractably (substrate side, server rendered, in first byte) has a fundamentally higher citation surface than a page that delivers the same content as prose only, inside a hydrated SPA.
4.3 The GEO16 Citation Study
The most rigorous citation study to date is the GEO16 framework analysis published as an arXiv preprint in September 2025. The study measured citation behavior across Brave Summary, Google AI Overviews, and Perplexity, scoring 16 content pillars against actual citation outcomes.
Key findings:
The three pillars most strongly correlated with citation: Metadata and Freshness, Semantic HTML Structure, Valid Structured Data.
Pages scoring G ≥ 0.70 with ≥ 12 of the 16 pillars active achieved a 78% cross engine citation rate (cited by two or more of the three engines studied).
Cross engine cited URLs scored 71% higher in overall quality than single engine cited URLs.
The implication for Content First Architecture is direct. The three top citation pillars all live in the substrate layer. Metadata in the document head. Semantic HTML in the body structure. Structured data as JSON-LD in the head. A site built content first hits all three pillars by default. A site built visual first hits zero of them in the first byte.
4.4 Google AI Overview Decoupling
Citation distribution in Google AI Overviews has shifted significantly between July 2025 and February 2026:
aio_citation_source_distribution:
  july_2025:
    organic_top_10_share: 0.76
    positions_11_through_100: 0.24
    beyond_top_100: ~0
  february_2026:
    organic_top_10_share: 0.38
    positions_11_through_100: 0.312
    beyond_top_100: 0.31
    youtube_share_of_beyond_100: 0.182
The shift means AIO citation is no longer determined primarily by ranking position. A site at position 47 in Google organic can be cited in AIO. A site at position 3 may not be cited if the extraction structure fails. The signal that determines extraction is the structural readability of the HTML, not the ranking authority.
This decoupling creates a strategic opening for sites that ship content first regardless of where they sit in traditional rankings.
4.5 Why Server Rendered HTML Wins
The substrate must be in the first byte server response. Not in a hydration payload. Not in an XHR fetch. Not in a useEffect hook. In the HTML that the server returns to the GET request.
The reason is mechanical. AI crawlers send a GET request, receive the response body, parse it as HTML, and extract from the parse tree. There is no JavaScript runtime. There is no hydration step. There is no second pass. Whatever is in the parsed HTML at the moment of parse is the entire surface available for extraction.
Server rendered HTML wins because it puts the substrate in the first byte. Static site generation (SSG), server side rendering (SSR), incremental static regeneration (ISR), and plain static HTML all qualify. Client side rendering (CSR) does not. Selective hydration is acceptable when the unhydrated HTML already contains all primary content. Full hydration where content is injected client side is the failure mode.
The rule that follows: in any rendering decision, if the chosen strategy delivers primary content via client side execution, the strategy is wrong for content. CSR is acceptable only for interactive widgets, logged in interfaces, and content that is intentionally not indexable (admin panels, dashboards behind auth). For anything that should be cited or ranked, the content lives in the first byte.
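To make the mechanical argument concrete, the sketch below approximates what a crawler without a JavaScript runtime has to work with: it parses the raw response body and counts the words of text inside <body>, ignoring script and style content. It is a rough illustration only; the names `VisibleTextParser`, `first_byte_word_count`, and `is_empty_shell`, and the 50 word threshold, are assumptions chosen for the example.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text found inside <body>, skipping script, style, and template content."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def first_byte_word_count(html: str) -> int:
    """Count words of body text available without executing any JavaScript."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return sum(len(chunk.split()) for chunk in parser.chunks)


def is_empty_shell(html: str, min_words: int = 50) -> bool:
    """Flag responses whose body carries almost no text (threshold is an assumption)."""
    return first_byte_word_count(html) < min_words
```

A CSR shell such as `<body><div id="root"></div><script>...</script></body>` yields a count of zero, while an SSG or SSR page with real copy clears the threshold easily.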
5. The Five Hard Rules
These are non negotiable. A violation of any of these rules invalidates the architecture and degrades the entire framework library that builds on top of it.
Rule 1: No content via JavaScript
Primary content (headings, body copy, FAQs, tables, schema, internal links) must be present in the first byte server response. JavaScript may enhance the experience but must not deliver content.
Permitted: JS animations, scroll effects, form validation, lazy media loading, micro interactions, GSAP/Three.js/Motion One overlays.
Forbidden: fetching content from an API after page load to display in the visible page. React components that render content from props passed at runtime when those props are not server rendered. useEffect hooks that populate visible content. JS frameworks shipping <div id="root"></div> and constructing the page client side.
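Validation: curl -s -A "GPTBot" https://example.com/page | grep -i "<h1" must return the page's H1. Matching on <h1 rather than <h1> also catches headings that carry attributes such as class or id. If the command returns nothing, the page violates Rule 1.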
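The same validation can be scripted for batch audits. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the bare "GPTBot" user agent string mirrors the simplified token in the curl check rather than the full header the real crawler sends.

```python
import re
import urllib.request

BOT_USER_AGENT = "GPTBot"  # simplified token, as in the curl check; an assumption

H1_PATTERN = re.compile(r"<h1\b[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)
TAG_PATTERN = re.compile(r"<[^>]+>")


def fetch_first_byte_html(url: str, user_agent: str = BOT_USER_AGENT, timeout: float = 10.0) -> str:
    """Return the raw response body for a GET request. No JavaScript is executed."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_h1_text(html: str) -> list[str]:
    """Return the text of every <h1>, with nested inline tags stripped."""
    return [TAG_PATTERN.sub("", inner).strip() for inner in H1_PATTERN.findall(html)]


def passes_rule_1(html: str) -> bool:
    """True when at least one non-empty <h1> is present in the raw HTML."""
    return any(extract_h1_text(html))
```

Usage would look like `passes_rule_1(fetch_first_byte_html("https://example.com/page"))`. A regular expression is enough for a presence check like this one; anything that depends on document structure is better served by a real HTML parser.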
Rule 2: DOM order is reading order is citation order
Source order in the HTML determines the order in which screen readers announce content, the order in which keyboard navigation tabs through interactive elements, and the order in which extraction parses the content. CSS may visually reorder elements (via Grid grid-template-areas, Flexbox order, absolute positioning, transforms), but the underlying DOM order must remain logical.
Permitted: CSS Grid positioning that visually places the H1 in the bottom right of the hero. Flexbox flex-direction: row-reverse for visual reasons when source order is the logical reading sequence. Absolute positioning of decorative graphics.
Forbidden: Visually reordering via order: -1 to make a logically later element appear first when the logical reading order should have placed it first in source. Setting tab index to compensate for broken DOM order. Visual hierarchies that imply a different reading sequence than the source.
WCAG technique C27 (making the DOM order match the visual order) is the formal reference: visual order should match source order, and where they diverge, source order is what assistive technology and bots receive.
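One of the forbidden patterns, tab index used to compensate for broken DOM order, is easy to detect statically. The sketch below flags every element with a positive tabindex value; the class and function names are hypothetical, and the check covers only this single symptom of Rule 2 violations, not CSS driven reordering.

```python
from html.parser import HTMLParser


class TabindexAudit(HTMLParser):
    """Records elements whose tabindex is greater than zero."""

    def __init__(self):
        super().__init__()
        self.findings = []  # (tag, tabindex, source line)

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("tabindex")
        if value is None:
            return
        try:
            index = int(value.strip())
        except ValueError:
            return  # non numeric values are ignored in this sketch
        if index > 0:
            line, _column = self.getpos()
            self.findings.append((tag, index, line))


def find_positive_tabindex(html: str) -> list:
    """Return (tag, tabindex, line) for each element that overrides natural tab order."""
    audit = TabindexAudit()
    audit.feed(html)
    audit.close()
    return audit.findings
```

Values of 0 and -1 are left alone because they do not change the order in which focus moves through the document.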
Rule 3: Semantic HTML is the foundation, not styled divs
The document body uses semantic HTML5 elements (<article>, <section>, <main>, <nav>, <aside>, <header>, <footer>, <h1> through <h6>, <ul>, <ol>, <table>, <figure>, <blockquote>, <details>, <summary>) for their semantic meaning, not generic <div> and <span> with styled appearance.
Permitted: Using <div> for layout containers that have no semantic meaning. Using <span> for inline styling targets within text. Adding CSS classes for visual treatment.
Forbidden: Marking up an article with <div class="article"> instead of <article>. Replacing headings with <div class="heading"> to bypass default styling. Constructing a table from nested <div> grid cells. Using <div role="button"> instead of <button>.
Crawlers and AI engines weight semantic structure as a citation signal. The GEO16 pillar Semantic HTML measures this directly. Pages that use semantic elements receive significantly higher extraction priority than visually identical pages built from generic divs.
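A rough static check for Rule 3 can count semantic elements against generic ones and flag the specific anti-patterns listed above. This is a sketch under stated assumptions: the `SUSPECT_CLASSES` set is a guess at class names that imitate semantic elements, and the names `SemanticAudit` and `audit_semantics` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

SEMANTIC_TAGS = {
    "article", "section", "main", "nav", "aside", "header", "footer",
    "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "table",
    "figure", "blockquote", "details", "summary",
}
# Assumed class names that suggest a div is standing in for a semantic element.
SUSPECT_CLASSES = {"article", "heading", "table", "button", "nav"}


class SemanticAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag not in ("div", "span"):
            return
        attributes = dict(attrs)
        if (attributes.get("role") or "").lower() == "button":
            self.violations.append(f'<{tag} role="button"> should be <button>')
        classes = set((attributes.get("class") or "").lower().split())
        for name in sorted(classes & SUSPECT_CLASSES):
            self.violations.append(f'<{tag} class="{name}"> imitates a semantic element')


def audit_semantics(html: str) -> dict:
    """Return semantic and generic element counts plus flagged anti-patterns."""
    audit = SemanticAudit()
    audit.feed(html)
    audit.close()
    semantic = sum(n for tag, n in audit.tag_counts.items() if tag in SEMANTIC_TAGS)
    generic = audit.tag_counts["div"] + audit.tag_counts["span"]
    return {"semantic": semantic, "generic": generic, "violations": audit.violations}
```

The counts are a prompt for review rather than a score: layout divs are permitted, so a high generic count alone is not a violation.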
Rule 4: Schema lives in head, server rendered, in the first byte
JSON-LD structured data is placed in the document <head> (or at the start of <body> before primary content) and is server rendered. It must be present in the first byte response. Schema injected by JavaScript at runtime is invisible to AI crawlers.
Permitted: Multiple <script type="application/ld+json"> blocks. Single graph using the @id pattern with sameAs relationships. Schema generated server side from CMS data at build or request time. Validated through Google Rich Results Test before deploy.
Forbidden: Schema generated client side by JavaScript libraries. Schema injected via Google Tag Manager (this is JS execution, AI crawlers do not run GTM). Schema present only in browser DevTools after page load. Schema that uses Microdata or RDFa attributes on visible HTML (legacy formats, less reliable extraction).
Cross reference: framework-schema.md for the complete schema specification including the @id graph pattern.
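Rule 4 can be verified against the raw response by extracting every JSON-LD block, recording whether it sits in the head or the body, and confirming that it parses. The sketch below also checks that @id references inside an @graph point at nodes declared in the same graph. All names are hypothetical, and the reference check assumes the simple `{"@id": "..."}` reference shape used in this document.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_head = False
        self.capturing = False
        self.buffer = []
        self.blocks = []  # (location, raw text)

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self.capturing = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "script" and self.capturing:
            self.blocks.append(("head" if self.in_head else "body", "".join(self.buffer)))
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.buffer.append(data)


def extract_json_ld(html: str) -> list:
    """Return one record per JSON-LD block found in the raw HTML."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    results = []
    for location, raw in extractor.blocks:
        try:
            results.append({"location": location, "valid": True, "data": json.loads(raw)})
        except json.JSONDecodeError:
            results.append({"location": location, "valid": False, "data": None})
    return results


def unresolved_references(graph_doc: dict) -> list:
    """List @id references that do not match any node declared in @graph."""
    nodes = graph_doc.get("@graph", [])
    declared = {node["@id"] for node in nodes if "@id" in node}
    missing = set()
    for node in nodes:
        for key, value in node.items():
            if key != "@id" and isinstance(value, dict) and set(value) == {"@id"}:
                if value["@id"] not in declared:
                    missing.add(value["@id"])
    return sorted(missing)
```

An empty result from `extract_json_ld` on a page whose rendered DOM shows schema in DevTools is the signature of client side injection.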
Rule 5: Visual layer never blocks extraction
The visual layer (CSS, JavaScript, animations, graphics, interactive components) must operate on top of the substrate without obstructing, hiding, or replacing content meant to be extracted.
Permitted: CSS position: absolute to layer decorative graphics over content. Background images on section containers behind text. <details> and <summary> for collapsed FAQ content (content stays in DOM, just visually collapsed by default). Lazy loaded images and embeds below the fold.
Forbidden: display: none on content meant to be indexed. Off screen positioning (position: absolute; left: -9999px) to hide content visually while keeping it in DOM (treated as cloaking by Google). Tabs and accordions that require JavaScript to reveal content. Carousels where slides 2 through N are not in the DOM until JS rotates them in. Modal overlays that contain primary content only revealed by click.
The <details> and <summary> pattern is the safe FAQ accordion: content is in the DOM, visually collapsed by default, expanded by browser native click handling without JS. Bots see the full content. Humans get the expand interaction.
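Two of the forbidden hiding techniques can be spotted in inline styles with a simple scan. The sketch below is deliberately narrow: it inspects only style attributes, so rules that live in external stylesheets are out of its reach, and the names and regular expressions are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

HIDING_PATTERNS = {
    "display:none": re.compile(r"display\s*:\s*none", re.IGNORECASE),
    "off-screen offset": re.compile(r"(left|top|text-indent)\s*:\s*-\d{4,}px", re.IGNORECASE),
}


class HiddenContentAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.findings = []  # (tag, pattern label)

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        for label, pattern in HIDING_PATTERNS.items():
            if pattern.search(style):
                self.findings.append((tag, label))


def find_hidden_content(html: str) -> list:
    """Return (tag, reason) for each element hidden through an inline style."""
    audit = HiddenContentAudit()
    audit.feed(html)
    audit.close()
    return audit.findings
```

A <details> block produces no findings, which matches the rule: its content is collapsed by the browser, not hidden by the author's styles.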
6. The Substrate Layer
The substrate is the structural HTML document. This section specifies what lives in it and how.
6.1 Document Head Requirements
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- Title and description -->
<title>Page Title 50 to 60 Characters With Primary Keyword Forward</title>
<meta name="description" content="Page description 140 to 160 characters with active voice and call to action.">
<!-- Canonical -->
<link rel="canonical" href="https://example.com/exact-current-url/">
<!-- Robots -->
<meta name="robots" content="index, follow, max-image-preview:large, max-snippet:-1">
<!-- Open Graph -->
<meta property="og:type" content="website">
<meta property="og:title" content="Page Title">
<meta property="og:description" content="Page description">
<meta property="og:image" content="https://example.com/og-image-1200x630.jpg">
<meta property="og:url" content="https://example.com/exact-current-url/">
<!-- Twitter Card -->
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Page Title">
<meta name="twitter:description" content="Page description">
<meta name="twitter:image" content="https://example.com/twitter-image-1200x675.jpg">
<!-- Theme -->
<meta name="theme-color" content="#10647C">
<!-- Icons -->
<link rel="icon" type="image/svg+xml" href="/favicon.svg">
<link rel="icon" type="image/png" sizes="32x32" href="/icon-32.png">
<link rel="apple-touch-icon" href="/apple-touch-icon.png">
<link rel="manifest" href="/site.webmanifest">
<!-- JSON-LD Schema (the graph pattern) -->
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@graph": [
{
"@type": "Organization",
"@id": "https://example.com/#organization",
"name": "Example Business",
"url": "https://example.com/",
"sameAs": [
"https://www.linkedin.com/company/example",
"https://www.facebook.com/example",
"https://www.wikidata.org/wiki/Q0000000"
]
},
{
"@type": "WebSite",
"@id": "https://example.com/#website",
"url": "https://example.com/",
"publisher": { "@id": "https://example.com/#organization" }
},
{
"@type": "WebPage",
"@id": "https://example.com/exact-current-url/#webpage",
"url": "https://example.com/exact-current-url/",
"isPartOf": { "@id": "https://example.com/#website" }
}
]
}
</script>
<!-- Stylesheet (projection layer entry point) -->
<link rel="stylesheet" href="/styles.css">
</head>
Every page on every site has this head block, populated per page. Cross reference framework-schema.md for the complete schema graph pattern. Cross reference framework-technicalseo.md for the meta tag matrix.
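The length targets stated in the sample head (title of 50 to 60 characters, description of 140 to 160 characters) and the presence of a canonical link can be checked per page. The following is a minimal sketch; `HeadAudit`, `audit_head`, and the issue messages are hypothetical, and only these three items are covered.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content") or ""
        elif tag == "link" and "canonical" in (attributes.get("rel") or "").lower().split():
            self.canonical = attributes.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def audit_head(html: str) -> list:
    """Return a list of issues; an empty list means the three checks passed."""
    audit = HeadAudit()
    audit.feed(html)
    audit.close()
    issues = []
    title = audit.title.strip()
    if not 50 <= len(title) <= 60:
        issues.append(f"title length {len(title)} is outside 50 to 60 characters")
    if audit.description is None:
        issues.append("meta description is missing")
    elif not 140 <= len(audit.description) <= 160:
        issues.append(f"meta description length {len(audit.description)} is outside 140 to 160 characters")
    if not audit.canonical:
        issues.append("canonical link is missing")
    return issues
```

Running this across a sitemap gives a quick list of pages whose head block was copied from a template without being populated.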
6.2 Semantic Body Skeleton
<body>
<a href="#main-content" class="skip-link">Skip to content</a>
<header role="banner">
<nav aria-label="Primary navigation">
<ul>
<li><a href="/">Home</a></li>
<li><a href="/services/">Services</a></li>
<li><a href="/about/">About</a></li>
<li><a href="/contact/">Contact</a></li>
</ul>
</nav>
</header>
<main id="main-content">
<article>
<header>
<h1>Entity Rich Page Title Containing The Primary Topic</h1>
<p class="lead">Answer first paragraph 40 to 75 words placed
immediately under the H1, leading with the most extractable
statement of what the page is about. This is the passage
AI engines preferentially extract for direct answer queries.</p>
</header>
<section aria-labelledby="section-1-heading">
<h2 id="section-1-heading">Descriptive Question Style H2</h2>
<p>Answer first paragraph under each H2, 40 to 75 words,
directly addressing the H2 question with a citeable
statement followed by supporting detail.</p>
</section>
<section aria-labelledby="comparison-heading">
<h2 id="comparison-heading">When Comparison Is The Right Format</h2>
<table>
<caption>Comparison of relevant options</caption>
<thead>
<tr><th scope="col">Option</th><th scope="col">Property A</th><th scope="col">Property B</th></tr>
</thead>
<tbody>
<tr><td>Choice 1</td><td>Value</td><td>Value</td></tr>
<tr><td>Choice 2</td><td>Value</td><td>Value</td></tr>
</tbody>
</table>
</section>
<section aria-labelledby="faq-heading">
<h2 id="faq-heading">Frequently Asked Questions</h2>
<details>
<summary>How does the first question phrase the query as users ask it?</summary>
<p>40 to 75 word answer that directly addresses the question with
specific facts, dates, prices, or other extractable claims.</p>
</details>
<details>
<summary>How does the second question phrase another natural query?</summary>
<p>Another 40 to 75 word answer following the same pattern.</p>
</details>
</section>
</article>
<aside aria-label="Related content">
<h2>Related</h2>
<ul>
<li><a href="/related-1/">Related page one descriptive anchor</a></li>
<li><a href="/related-2/">Related page two descriptive anchor</a></li>
</ul>
</aside>
</main>
<footer role="contentinfo">
<p>Crafted by <a href="https://thatdeveloperguy.com/">ThatDeveloperGuy.com</a></p>
<address>
<p>Business Name</p>
<p>Street Address, City, State ZIP</p>
<p>Phone: <a href="tel:+15055123662">505 512 3662</a></p>
</address>
</footer>
<!-- FAQPage schema mirroring visible content above. Rule 4 places schema in <head>; move this block there where the template allows. -->
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "How does the first question phrase the query as users ask it?",
"acceptedAnswer": {
"@type": "Answer",
"text": "40 to 75 word answer that directly addresses the question..."
}
}
]
}
</script>
</body>
</html>
This skeleton is the canonical substrate pattern. Every page on every site is built against this skeleton. Content varies per page; structure does not.
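Because the FAQPage block is meant to mirror the visible questions, a parity check between <summary> text and Question names is a useful audit step. The sketch below compares the two sets after normalizing whitespace and case; the names `FaqParser` and `faq_mismatches` are hypothetical, and it assumes FAQ questions are marked up with <summary> as in the skeleton.

```python
import json
from html.parser import HTMLParser


class FaqParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_summary = False
        self.in_json_ld = False
        self.summaries = []
        self.json_ld_blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "summary":
            self.in_summary = True
            self.summaries.append("")
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_json_ld = True
            self.json_ld_blocks.append("")

    def handle_endtag(self, tag):
        if tag == "summary":
            self.in_summary = False
        elif tag == "script":
            self.in_json_ld = False

    def handle_data(self, data):
        if self.in_summary:
            self.summaries[-1] += data
        elif self.in_json_ld:
            self.json_ld_blocks[-1] += data


def normalize(text: str) -> str:
    return " ".join(text.split()).casefold()


def faq_mismatches(html: str) -> dict:
    """Compare visible <summary> questions with FAQPage Question names."""
    parser = FaqParser()
    parser.feed(html)
    parser.close()
    visible = {normalize(text) for text in parser.summaries}
    in_schema = set()
    for raw in parser.json_ld_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and data.get("@type") == "FAQPage":
            for question in data.get("mainEntity", []):
                in_schema.add(normalize(question.get("name", "")))
    return {
        "in_schema_not_visible": sorted(in_schema - visible),
        "visible_not_in_schema": sorted(visible - in_schema),
    }
```

Applied to the skeleton as printed, the check should report the second question under visible_not_in_schema, because the sample FAQPage block lists only the first question; a production page would carry one Question entry per visible question.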
6.3 Content Blocks: The Answer First Pattern
Under every H2, the first paragraph is the answer. 40 to 75 words. Direct. Citeable. Specific.
Bad: "Let's talk about pricing. There are many factors to consider when determining the cost of a service. Different providers offer different rates. Some include extras. Some do not. Pricing varies..."
Good: "Half day guided fishing trips on Lake Taneycomo start at $300 for up to two anglers, including all tackle and bait. Full day trips run $500. We offer trout focused trips year round and bass trips on Table Rock seasonally. Book at 555 123 4567."
The good version is extractable. The bad version is not. The good version supplies prices, party size, locations, seasonal availability, and a contact method. AI engines preferentially extract specifics. The bad version supplies hedges and generalities. AI engines deprioritize hedges.
This pattern repeats under every H2 in the document. The answer first paragraph is followed by supporting detail, examples, tables, lists, or expanded discussion. The first 40 to 75 words carry the citation weight.
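The 40 to 75 word target can be checked mechanically. The sketch below finds the first <p> that follows each <h2> in source order and reports its word count; the names are hypothetical, and the approach is an approximation, since for an FAQ section the first paragraph it finds will be the one inside the first <details> element.

```python
from html.parser import HTMLParser


class AnswerFirstParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.mode = None              # "h2" or "p" while capturing text
        self.awaiting_answer = False  # True between an <h2> and its first </p>
        self.sections = []            # [heading text, first paragraph text]

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.mode = "h2"
            self.sections.append(["", ""])
            self.awaiting_answer = True
        elif tag == "p" and self.awaiting_answer:
            self.mode = "p"

    def handle_endtag(self, tag):
        if tag == "h2" and self.mode == "h2":
            self.mode = None
        elif tag == "p" and self.mode == "p":
            self.mode = None
            self.awaiting_answer = False

    def handle_data(self, data):
        if self.mode == "h2":
            self.sections[-1][0] += data
        elif self.mode == "p":
            self.sections[-1][1] += data


def answer_first_report(html: str, min_words: int = 40, max_words: int = 75) -> list:
    """Report the word count of the first paragraph under each H2."""
    parser = AnswerFirstParser()
    parser.feed(html)
    parser.close()
    report = []
    for heading, paragraph in parser.sections:
        words = len(paragraph.split())
        report.append({
            "heading": " ".join(heading.split()),
            "words": words,
            "ok": min_words <= words <= max_words,
        })
    return report
```

Word count is only a proxy; a paragraph can land in range and still be made of hedges, so the report narrows the manual review rather than replacing it.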
6.4 Internal Link Architecture
Every page is a hub. Every page links to:
- The pillar page for its topical cluster (when the current page is a spoke).
- The spoke pages within its topical cluster (when the current page is a pillar).
- Adjacent related pages that share entities or topics.
- Authoritative outbound sources for cited facts (.gov, .edu, standards bodies, primary sources).
Anchor text is descriptive. No "click here." No "read more" without context. The anchor itself communicates the topic of the destination page. This is both an accessibility requirement and a citation signal.
Cross reference: framework-internallinking.md for the full hub and spoke topology and anchor text discipline.
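A short sketch of the anchor text discipline described above. The internal paths (`/trout-fishing-trips/`, `/fishing-guide-services/`) are hypothetical and only illustrate the shape of a spoke page linking to its pillar and to an authoritative source.

```html
<!-- Weak: the anchor says nothing about the destination -->
<p>We run trips all year. <a href="/trout-fishing-trips/">Click here</a>.</p>

<!-- Descriptive: the anchor names the topic of the destination page -->
<p>We run
  <a href="/trout-fishing-trips/">guided trout fishing trips on Lake Taneycomo</a>
  all year.</p>

<!-- Spoke page linking up to its pillar and out to an authoritative source -->
<p>See the <a href="/fishing-guide-services/">full list of guide services</a>
  and the <a href="https://mdc.mo.gov/fishing/regulations">Missouri Department
  of Conservation fishing regulations</a>.</p>
```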
6.5 Entity Declarations
The substrate explicitly names entities: business name, owner name, service area, named services, partner brands, geographic boundaries, certifications, awards. AI engines use these as entity matches for entity salience scoring.
<p>Greenough's Guide Service operates on Lake Taneycomo near Branson,
Missouri, in Taney County. Guide Captain Keith Greenough is licensed
by the Missouri Department of Conservation and has been guiding trout
fishing in the White River system since 1998.</p>
This paragraph contains: business name, water body, geographic locality, county, geographic state, owner name, role, licensing authority, guiding start year, named river system, named species. Every entity is explicit. Every entity is matchable against external knowledge graphs.
Cross reference: framework-entitysalience.md for entity engineering depth.
6.6 Freshness Signals
Use <time> elements with datetime attributes for any date relevant to the content. Schema includes datePublished and dateModified. Visible last updated stamps where appropriate.
<p>Last updated <time datetime="2026-05-11">May 11, 2026</time></p>
AI engines weight recency for many query categories. Timestamps that are both visible and machine readable in <time datetime> format reinforce the freshness signal.
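One way to keep the visible stamp and the schema in agreement is to emit both from the same server side values. The sketch below assumes an `Article` type, a hypothetical headline, and a hypothetical `datePublished` value; only the 2026-05-11 modified date comes from the example above.

```html
<article>
  <h1>Lake Taneycomo Trout Fishing Report</h1>
  <!-- Visible, machine readable stamp -->
  <p>Last updated <time datetime="2026-05-11">May 11, 2026</time></p>
</article>

<!-- Server rendered schema carrying the same dateModified value -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Lake Taneycomo Trout Fishing Report",
  "datePublished": "2026-04-02",
  "dateModified": "2026-05-11"
}
</script>
```

If the visible date and `dateModified` drift apart, the page sends two conflicting freshness signals, so both are best generated from one field in the template.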
6.7 Provenance And Citations
Cite authoritative sources inline, with explicit reference sections where appropriate. The Princeton GEO study reported that inline citations boost extraction probability by approximately 30%.
<p>The Missouri Department of Conservation
<a href="https://mdc.mo.gov/fishing/regulations">stocks Lake Taneycomo
with rainbow trout</a> at a rate that supports a year round trout fishery.</p>
The link itself is the citation. The destination is authoritative. The anchor text is descriptive. AI engines treat this pattern as evidence based content.
7. The Projection Layer
The projection is everything that composites on top of the substrate to produce the human experience.
7.1 CSS As The Visual Composer
CSS does the heavy lifting of the projection. Modern CSS can produce nearly any visual experience without requiring JavaScript:
- Layout: Grid, Flexbox, container queries, grid-template-areas, subgrid, anchor positioning.
- Visual depth: backdrop-filter, mask-image, clip-path, gradients, multiple backgrounds, blend modes.
- Motion: CSS transitions, keyframe animations, scroll driven animations, View Transitions API.
- Typography: variable fonts, font-feature-settings, text-wrap: balance, custom properties for fluid scales.
- Decoration: pseudo elements (::before, ::after), SVG masks, generated content.
The projection layer can be aggressive in visual styling without touching the substrate. The H1 can be repositioned visually via Grid placement. Background images can layer behind text via background-image. Decorative graphics can be added via SVG siblings or pseudo elements without entering the content DOM.
7.2 JavaScript As Enhancement Only
JavaScript in a content first architecture is enhancement, never delivery. The five categories of permitted JS:
- Interaction: form submission, modal triggers, accordion expansion (when the content is already in DOM).
- Animation: GSAP, Three.js, Motion One, Lottie, scroll triggered transitions, parallax, particle effects.
- Lazy media: defer loading of below the fold images, embeds, video.
- Micro UX: tooltips, copy to clipboard buttons, smooth scroll, focus management.
- Analytics and tracking: GA4, GSC verification, conversion pixels.
What JavaScript must never do:
- Inject primary content into the page after load.
- Construct headings, body copy, FAQs, tables, or schema client side.
- Fetch content from an API as the source of truth for what appears on the page.
- Wrap the entire page as a JS framework root element that requires hydration to display content.
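A minimal sketch of enhancement layered over content that is already in the document. The element ids and button label are hypothetical. The phone number is plain text in the HTML; the script only reveals a copy button when the Clipboard API is available.

```html
<p>Book at <span id="phone">555 123 4567</span>
  <!-- Hidden by default so the page is complete without JavaScript -->
  <button type="button" id="copy-phone" hidden>Copy number</button></p>

<script>
  // Enhancement only: the content above does not depend on this script.
  const button = document.getElementById("copy-phone");
  const phone = document.getElementById("phone");

  if (button && phone && navigator.clipboard) {
    button.hidden = false;
    button.addEventListener("click", async () => {
      try {
        await navigator.clipboard.writeText(phone.textContent.trim());
        button.textContent = "Copied";
      } catch (error) {
        // Clipboard access can be denied; the number is still visible to select.
        button.textContent = "Copy failed";
      }
    });
  }
</script>
```

Removing the `<script>` element leaves the paragraph fully readable, which is the test for whether JavaScript is enhancement or delivery.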
7.3 Permitted Patterns
Concrete examples of the projection working correctly:
/* Hero section with substrate content composited under a cinematic visual layer */
.hero {
  position: relative;
  min-height: 100vh;
  display: grid;
  place-items: center;
  /* New stacking context so the negative z-index layers stay inside .hero */
  isolation: isolate;
}

/* Photo layer, painted behind everything else in the hero */
.hero::before {
  content: "";
  position: absolute;
  inset: 0;
  background: url("/hero-image.jpg") center/cover;
  z-index: -2;
}

/* Gradient layer between the photo and the text, for legibility */
.hero::after {
  content: "";
  position: absolute;
  inset: 0;
  background: linear-gradient(
    180deg,
    rgba(0, 0, 0, 0) 0%,
    rgba(16, 100, 124, 0.7) 100%
  );
  z-index: -1;
}

.hero h1 {
  font: 300 clamp(3rem, 8vw, 8rem) / 0.9 "Cormorant Garamond", serif;
  font-style: italic;
  color: white;
  text-align: center;
  max-inline-size: 20ch;
}

.hero .lead {
  font: 400 clamp(1rem, 1.5vw, 1.25rem) / 1.6 "Manrope", sans-serif;
  color: rgba(255, 255, 255, 0.9);
  max-inline-size: 60ch;
  margin-block-start: 2rem;
}
The HTML for this hero is the simple substrate from section 6.2. The visual transformation lives entirely in CSS. GPTBot reads the H1 and lead paragraph. The human sees the cinematic composition.
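For reference, a sketch of markup these selectors could target. Only the `.hero` and `.lead` class names come from the CSS above; the element choice and the copy are assumptions for illustration.

```html
<!-- No image element and no wrapper divs: the photo and gradient live in CSS -->
<section class="hero">
  <h1>Guided Trout Fishing on Lake Taneycomo</h1>
  <p class="lead">Greenough's Guide Service runs half day and full day trips
  near Branson, Missouri.</p>
</section>
```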
7.4 Forbidden Patterns
Concrete examples of the projection working incorrectly:
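// FORBIDDEN: content delivered via JS
import { useEffect, useState } from "react";

function ServicePage() {
  const [services, setServices] = useState([]);
  // The fetch runs only in the browser, after the first render
  useEffect(() => {
    fetch("/api/services")
      .then(res => res.json())
      .then(data => setServices(data));
  }, []);
  return (
    <div>
      <h1>Our Services</h1>
      {/* ServiceCard is assumed to be defined elsewhere */}
      {services.map(s => <ServiceCard key={s.id} {...s} />)}
    </div>
  );
}
In a pure client rendered app, GPTBot receives only the empty mount element. Even when the component is server rendered, GPTBot sees: <div><h1>Our Services</h1></div>, because useEffect never runs on the server. No services. The page is empty to extraction.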
<!-- FORBIDDEN: content hidden behind JS-only interaction -->
<div class="tabs">
<button onclick="showTab(1)">FAQ</button>
<div id="tab1" style="display:none">
<p>Question and answer content here.</p>
</div>
</div>
The content is in the DOM but hidden with display: none, and only JavaScript can reveal it. Crawlers can detect hidden content, and Google penalizes cloaking. The fix is <details> and <summary>, which keep content extractable while collapsed.
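A sketch of the `<details>` and `<summary>` alternative, with placeholder question and answer text. The section `id` and heading are assumptions.

```html
<section id="faq">
  <h2>Frequently Asked Questions</h2>

  <!-- Collapsed by default, expandable by click, no JavaScript required -->
  <details>
    <summary>First question, phrased as users ask it?</summary>
    <p>Question and answer content here.</p>
  </details>

  <!-- The open attribute renders an item expanded on load -->
  <details open>
    <summary>Second question, phrased as users ask it?</summary>
    <p>Question and answer content here.</p>
  </details>
</section>
```

The answer text is present in the server response whether or not the item is expanded, so a curl request returns it in full.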
<!-- FORBIDDEN: visual reorder breaking reading order -->
<div style="display: flex; flex-direction: row-reverse">
<p>The introduction is first in source but appears last visually</p>
<h1>The conclusion is last in source but appears first visually</h1>
</div>
Reading order is broken. Screen readers announce introduction first, conclusion second. Visually the human sees conclusion first. The fix is to put the elements in the intended reading order in source and let the visual layer position them logically.
8. Implementation Patterns By Stack
The doctrine applies regardless of stack. Implementation differs.
8.1 Plain HTML / Static Generation
Easiest case. Author HTML directly or generate at build time. Every byte of the substrate is in the file. No JavaScript runtime concerns.
Recommended for: marketing sites, local business sites, portfolio sites, content sites under 1,000 pages, any project where the team has the discipline to write HTML directly.
Tooling: Hugo, Eleventy (11ty), plain HTML files, Pandoc, custom shell scripts. Reference: framework-astrohugo.md.
8.2 Astro
Astro produces static HTML by default. Components run at build time. JavaScript hydration is opt in per component (Astro Islands). The architecture aligns with content first by default.
Patterns:
- Page level: write .astro components that produce semantic HTML. No client side JS by default.
- Hydration: use client:load, client:visible, client:idle directives only for interactive widgets, never for content delivery.
- Schema: server render JSON-LD in the page frontmatter or in a layout component.
Cross reference: framework-astrohugo.md.
8.3 Next.js / Nuxt / SvelteKit
These frameworks support content first when configured correctly. Use static generation (SSG) or server side rendering (SSR) for content pages. Avoid pure client side rendering for primary content.
Patterns:
- App Router (Next.js 13+): server components by default. Use 'use client' only for components that genuinely need client side state.
- Pages Router: getStaticProps or getServerSideProps for content pages. Never fetch content in useEffect for primary page content.
- Schema: generate JSON-LD in getStaticProps or in a Server Component, never in client components.
- ISR: incremental static regeneration is fine for content that updates. The substrate is still server rendered.
Cross reference: framework-nextjs.md.
8.4 React / Vue / Svelte SPA
Last resort. Pure client side rendering should be reserved for logged in applications, admin dashboards, and tools that are not meant to be indexed.
When SPA is unavoidable for a public site:
- Prerendering: use prerender.io, Vercel's automatic prerendering, or a build time prerender pass. The substrate must be in the first byte for indexable routes.
- Static export: if the framework supports static export (Next.js, Nuxt), use it.
- Dynamic rendering: detect bots by user agent and serve a server rendered version. This is a fallback, not a strategy.
Cross reference: framework-react.md.
8.5 WordPress
WordPress is server side by default. Content first works naturally. The risks are theme dependent (some themes inject content via JS), plugin dependent (some page builders inject content via JS), and configuration dependent (some hosts add aggressive caching that breaks freshness).
Patterns:
- Theme: use themes that produce semantic HTML in templates. Avoid heavy page builders that produce nested div structures.
- Plugins: use Yoast or Rank Math for SEO output. Schema lives server side, in head.
- Caching: full page cache is fine. Verify cached HTML contains content (curl test against the cache).
Cross reference: framework-wordpress.md.
8.6 Shopify
Shopify Liquid renders server side. Content first works by default. The risks are theme dependent (some themes use JS heavy components) and app dependent (some apps inject content via JS).
Cross reference: framework-shopify.md.
8.7 Webflow
Webflow renders server side. Content first works by default. The visual editor produces semantic HTML when the builder uses semantic elements (heading vs div) and avoids visual hierarchies that imply different reading orders.
Cross reference: framework-webflow.md.
8.8 Cross Stack Translation
Every pattern in this framework translates to every supported stack. The bridge document is framework-cross-stack-implementation.md, which provides side by side examples of each pattern in HTML, React, Vue, Svelte, Next.js, Nuxt, SvelteKit, Astro, Hugo, 11ty, Remix, WordPress, Shopify, and Webflow.
9. Validation Methodology
A site claiming content first architecture must be verified. Five validation methods, each catching different failure modes.
9.1 The curl Test
The fastest, most definitive test. If curl sees the content, AI crawlers see the content.
# Confirm the H1 is in the raw response (allows attributes on the tag;
# an H1 that contains nested markup will not match this simple pattern)
curl -A "GPTBot" -s https://example.com/page | grep -oE "<h1[^>]*>[^<]*</h1>"
# Count lines containing a JSON-LD script tag (0 means no schema was served)
curl -A "ClaudeBot" -s https://example.com/page | grep -c "application/ld+json"
# Confirm the FAQPage schema specifically
curl -A "PerplexityBot" -s https://example.com/page | grep -o "FAQPage"
# Full body inspection: save the response, then list H1 to H3 headings
curl -A "GPTBot" -s https://example.com/page > /tmp/page.html
wc -l /tmp/page.html
grep -E "<h[1-3][ >]" /tmp/page.html
If the H1 is not in the curl output, the page violates Rule 1. Fix the rendering strategy before any other work.
9.2 The Disable JavaScript Test
Browser based test for visual verification. In Chrome DevTools: Cmd+Shift+P (Ctrl+Shift+P on Windows and Linux) → "Disable JavaScript" → reload page.
What should still work with JS disabled:
- All primary content visible.
- All navigation links functional.
- All headings present.
- All FAQ content readable (collapsed via <details> but expandable by click).
- All tables and lists present.
- Visual styling intact (CSS still loads).
What can break with JS disabled:
- Animations.
- Form submission (graceful degradation: forms can submit via standard POST without JS).
- Carousels (graceful degradation: show all slides as a static list).
- Interactive widgets that require JS by definition (these should not contain primary content).
If primary content is missing with JS disabled, the page violates Rule 1.
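The form case above can be sketched as follows. The `/contact` endpoint, field names, and confirmation text are hypothetical. With JavaScript disabled the browser performs a standard POST; with it enabled, the script submits in the background and falls back to the standard POST on failure.

```html
<form action="/contact" method="post">
  <label for="name">Name</label>
  <input id="name" name="name" type="text" autocomplete="name" required>

  <label for="email">Email</label>
  <input id="email" name="email" type="email" autocomplete="email" required>

  <label for="message">Message</label>
  <textarea id="message" name="message" rows="5" required></textarea>

  <button type="submit">Send message</button>
</form>

<script>
  // Optional enhancement: the form above already works without this script.
  const form = document.querySelector('form[action="/contact"]');

  if (form && window.fetch) {
    form.addEventListener("submit", async (event) => {
      event.preventDefault();
      try {
        const response = await fetch(form.action, {
          method: "POST",
          body: new FormData(form),
        });
        if (!response.ok) throw new Error("Request failed");
        // Swap the form for a plain text confirmation
        form.replaceWith("Thanks, your message was sent.");
      } catch (error) {
        // Fall back to the browser's standard POST
        form.submit();
      }
    });
  }
</script>
```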
9.3 The View Source Test
Right click → View Page Source. The actual HTML the server returned. Not the rendered DOM in DevTools (which shows post hydration state).
Compare the view source to the rendered DOM. The substrate should be in view source. The projection (computed styles, applied animations, hydrated state) is in the rendered DOM. If primary content is only in the rendered DOM and not in view source, the page violates Rule 1.
9.4 The AI Crawler Simulation
Use a tool that fetches a URL as a specified user agent and shows the resulting HTML.
- Google Search Console URL Inspection → Test Live URL → View Tested Page. Shows the HTML Googlebot received.
- Screaming Frog with custom user agent set to GPTBot, ClaudeBot, etc. Crawl the site and inspect what each bot sees.
- curl -A "[bot user agent]" as above.
- PromptWatch AI Search Crawler Inspector or similar tools that show what AI bots extract.
Verify the rendered output for each major bot user agent contains the expected content.
9.5 Lighthouse and Validation Tools
- Lighthouse SEO audit: should show 100/100 with no issues for indexable pages.
- Google Rich Results Test: validates schema and shows extracted structured data.
- Schema.org Validator: validates JSON-LD syntax and property correctness.
- axe DevTools: validates accessibility, which correlates strongly with semantic HTML quality.
- W3C HTML Validator: validates the HTML structure for spec compliance.
A content first page should pass all five with zero or near zero violations.
10. Common Violations And Fixes
common_violations:
  empty_react_root:
    symptom: "curl returns <div id='root'></div> as the body content"
    fix: "Migrate to Next.js with SSG/SSR, or add prerender.io to the build, or convert to Astro"
    severity: "critical"
  schema_via_gtm:
    symptom: "Schema appears in DevTools but not in view source"
    fix: "Move schema to server rendered HTML in document head, output by template engine"
    severity: "critical"
  faq_in_js_accordion:
    symptom: "FAQ content visible after clicking but missing from initial DOM"
    fix: "Replace JS accordion with <details>/<summary>, keep content in DOM"
    severity: "high"
  tabbed_content:
    symptom: "Tabs 2 through N missing from initial DOM, loaded on click"
    fix: "Render all tab content in DOM, use CSS to show/hide, or use anchor links per tab"
    severity: "high"
  hero_image_as_img_in_overlay:
    symptom: "Hero <img> tag wraps content, creating unnecessary DOM nesting"
    fix: "Move hero image to CSS background-image, leave content semantic and unwrapped"
    severity: "medium"
  divs_replacing_semantic_elements:
    symptom: "<div class='article'>, <div class='heading'>, <div class='nav'>"
    fix: "Replace with <article>, <h1>-<h6>, <nav> respectively"
    severity: "high"
  visual_reorder_breaking_reading_order:
    symptom: "Screen reader announces content in different order than visual presentation"
    fix: "Reorder source HTML to match logical reading order; use CSS only for visual presentation"
    severity: "high"
  client_side_canonical_injection:
    symptom: "Canonical URL set by JavaScript after page load"
    fix: "Server render <link rel='canonical'> in head"
    severity: "critical"
  delayed_content_fetch:
    symptom: "Page loads with skeleton, then fetches content via XHR"
    fix: "Server render content in first byte; skeleton states are acceptable for slow APIs only on non indexable pages"
    severity: "critical"
  cloaking_via_display_none:
    symptom: "Content present in DOM but display:none for visual reasons"
    fix: "Either remove from DOM (if not meant to be indexed) or display it (if meant to be indexed); never both"
    severity: "critical"
  off_screen_positioning_for_hiding:
    symptom: "position: absolute; left: -9999px on content"
    fix: "Use <details>/<summary> for collapsed content; never off screen positioning to hide indexable content"
    severity: "critical"
  carousel_with_lazy_slides:
    symptom: "Carousel slide 1 in DOM; slides 2+ only loaded when navigated to"
    fix: "Render all carousel content in DOM, use CSS transform/opacity for transitions"
    severity: "medium"
  modal_with_primary_content:
    symptom: "Important content (pricing, features) only in JS triggered modal"
    fix: "Move content to the main page; modals are for confirmations, not primary content"
    severity: "high"
  image_as_text:
    symptom: "Text rendered as a graphic (often hero headlines as PNG)"
    fix: "Use actual text in HTML with CSS typography; reserve images for actual images"
    severity: "high"
  iframe_blocked_extraction:
    symptom: "Primary content inside iframe (third party widget, embedded form)"
    fix: "Inline the content; iframes are extraction barriers"
    severity: "medium"
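To make one of these fixes concrete, the sketch below shows the divs_replacing_semantic_elements case before and after. The class names follow the symptom listed above; the link target and copy are hypothetical.

```html
<!-- Before: divs carry no structural meaning -->
<div class="nav"><a href="/services/">Services</a></div>
<div class="article">
  <div class="heading">Guided Fishing Trips</div>
  <div class="text">Half day trips start at $300.</div>
</div>

<!-- After: same content, expressed with semantic elements -->
<nav aria-label="Primary"><a href="/services/">Services</a></nav>
<main>
  <article>
    <h1>Guided Fishing Trips</h1>
    <p>Half day trips start at $300.</p>
  </article>
</main>
```

The text is identical in both versions. Only the second tells a parser which part is navigation, which is the heading, and which is body copy.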
11. The Content First Audit Rubric
A 30 point rubric. Score each page or representative sample of pages. World class content first implementation: 27+/30.
| # | Criterion | Pass/Fail |
|---|---|---|
| CF1 | curl returns the H1 text in the response body | |
| CF2 | curl returns all H2 headings in the response body | |
| CF3 | curl returns primary body content (lead paragraph, FAQ answers) | |
| CF4 | curl returns JSON-LD schema in the response body | |
| CF5 | curl returns FAQPage schema with Question/Answer pairs (if FAQ present) | |
| CF6 | View source matches rendered DOM for content (not just shell) | |
| CF7 | JavaScript disabled: page is fully readable with styles intact | |
| CF8 | JavaScript disabled: navigation works (anchor links functional) | |
| CF9 | JavaScript disabled: FAQ expand/collapse works (via <details>) | |
| CF10 | Semantic elements used: <article>, <section>, <main>, <nav>, etc. (not all divs) | |
| CF11 | One <h1> per page; logical H2 to H6 hierarchy; no skipped levels | |
| CF12 | DOM order matches logical reading order; CSS visual reorder does not break reading | |
| CF13 | Lead paragraph 40 to 75 words placed immediately under H1 | |
| CF14 | Answer first paragraphs (40 to 75 words) under every H2 | |
| CF15 | Comparison data uses <table> not divs | |
| CF16 | Numbered processes use <ol> not <ul> not prose | |
| CF17 | FAQs use <details> and <summary> with content in DOM | |
| CF18 | Schema graph with @id and sameAs in document head | |
| CF19 | Canonical, robots, Open Graph, Twitter Card meta tags all server rendered | |
| CF20 | Hero images via CSS background, not <img> wrapper around content | |
| CF21 | Decorative graphics via pseudo elements or SVG siblings, not content wrappers | |
| CF22 | Visual animations enhance the projection without blocking the substrate | |
| CF23 | All internal links use descriptive anchor text | |
| CF24 | All explicit entities (business name, owner, location, services) declared in text | |
| CF25 | datePublished and dateModified present in schema and visible where appropriate | |
| CF26 | Inline outbound citations to authoritative sources where claims warrant | |
| CF27 | Lighthouse SEO score 95+ | |
| CF28 | Rich Results Test validates all schema with zero errors | |
| CF29 | Manual inspection: no cloaking patterns, no off screen hiding, no JS only content | |
| CF30 | GSC URL Inspection: rendered HTML matches view source for primary content | |
Maximum score: 30, one point per passing criterion. Tiers:
- 27 to 30: World class content first implementation. Ready for AI extraction.
- 22 to 26: Good. Most patterns correct, minor cleanup needed.
- 17 to 21: Acceptable for visual first legacy site. Plan retrofit.
- Below 17: Failing the doctrine. Content is not extractable. Stop other framework work until fixed.
12. Retrofit Strategy For Existing Visual First Sites
Existing sites built visual first cannot always be migrated overnight. A staged retrofit strategy:
12.1 Assess Current State
Run the audit rubric on the existing site. Identify which violations are critical (Rules 1, 4, 5) versus which are stylistic (Rules 2, 3 in cosmetic ways). Critical violations get priority.
12.2 Retrofit Priority Order
retrofit_priority:
  phase_1_emergency_fixes:
    - Move schema from JS injection to server rendered head
    - Convert FAQ accordions from JS to <details>/<summary>
    - Server render canonical, robots, meta tags
    - Remove display:none from content meant to be indexed
  phase_2_substrate_fixes:
    - Migrate rendering strategy from CSR to SSG/SSR/ISR
    - Replace divs with semantic elements
    - Add lead paragraph and answer first patterns under each H2
    - Fix DOM order where visual reorder broke reading order
  phase_3_substrate_enrichment:
    - Add comparison tables where prose can be tabularized
    - Add numbered lists where processes are described
    - Add entity declarations in text
    - Add inline outbound citations
  phase_4_validation:
    - Run full audit rubric
    - GSC URL Inspection on representative pages
    - curl tests as GPTBot, ClaudeBot, PerplexityBot
    - Lighthouse and Rich Results Test on every priority page
12.3 Risk Management
Sites with existing rankings should retrofit in staging first, then validate citation impact on a small set of pages before rolling out portfolio wide. Migrations are covered in framework-migration.md.
13. The Build Sequence
For every new build, in order:
1. READ THIS DOCUMENT.
Internalize the doctrine before opening an editor.
2. AUTHOR THE SUBSTRATE.
Write semantic HTML with all content, schema, and structure.
Verify it makes sense as a document with no CSS or JS.
3. APPLY THE PROJECTION.
Add CSS for visual treatment.
Add JavaScript for enhancement.
Verify the substrate is still intact after the projection.
4. VALIDATE.
Run the audit rubric.
curl as GPTBot. Inspect the response.
Disable JavaScript and reload. Verify content remains.
5. APPLY OTHER FRAMEWORKS.
Now Level 1 Foundation frameworks. Then Level 2.
Then specialized, authority, monitoring, conversion.
Order matters. Applying Level 1 Foundation before authoring the substrate is the failure mode this doctrine prevents.
14. End Of Framework Document
This is Level 0. Every other framework in this library serves this doctrine. Schema implementation, internal linking, AI citations, E-E-A-T, entity salience, all of it presupposes a content first substrate. Without the substrate, the rest is decoration applied to a site no bot can read.
Apply this doctrine first. Then everything else compounds.
Companion documents:
- framework-masterindex.md: Library navigation and dependency graph (Level 0 at top, then Level 1 through 6)
- framework-technicalseo.md: Technical foundation, crawl access, indexing (Level 1)
- framework-schema.md: Schema implementation patterns (Level 1)
- framework-pageexperience.md: Performance and Core Web Vitals (Level 1)
- framework-internallinking.md: Hub and spoke architecture (Level 2)
- framework-aicitations.md: AI citation mechanics across engines
- framework-headless.md: Headless CMS implementation patterns
- framework-react.md: Pure CSR SPA SEO when SSR/SSG is not in scope
- framework-cross-stack-implementation.md: Pattern translation across stacks
- framework-accessibility.md: A11y patterns that align with semantic HTML
Document version: 1.0 Created: 2026-05-11 Maintained by: ThatDeveloperGuy
This document is the architectural axiom. Read it before any other framework. Apply it before any build decision. Verify adherence before considering any site ready for the Level 1 Foundation frameworks.
Owner: Joseph W. Anady — ThatDeveloperGuy — SDVOSB Contact: 505-512-3662 | joseph.w.anady@icloud.com