SEO & AI Engine Optimization Framework · May 2026

News SEO: Google News, Top Stories, news structured data

Google News Inclusion, Top Stories Carousel, Discover Surface, News Showcase, NewsArticle Schema, Editorial Trust Signals, AI Engine News Citation, and the Discipline of Optimizing Time Sensitive Content

A canonical reference for news publisher SEO in 2026. News SEO is a distinct sub discipline. Editorial standards become technical signals, freshness becomes a measurable ranking input, schema becomes a contract with Google News classifiers, and AI engines (Perplexity News, ChatGPT news mode, SearchGPT, Gemini news) operate trusted source lists that determine whether a publication appears in synthesized news answers at all. The 2026 surface map includes Google News, the Top Stories carousel that has shed the AMP requirement, Google Discover as a mobile only algorithmic feed, News Showcase as a contractual program, and AI curated news feeds inside ChatGPT and Perplexity that have begun displacing direct visits for breaking topics.

Cross stack note: code samples are plain HTML for clarity. For React, Vue, Svelte, Next.js, Nuxt, SvelteKit, Astro, Hugo, 11ty, Remix, WordPress, Shopify, and Webflow equivalents of every pattern below, see framework-cross-stack-implementation.md. For pure client rendered SPAs see framework-react.md.


1. Document Purpose

The operational reference for News SEO in the post AMP, AI surfaced, decoupled citation era. The four legacy surfaces (Google News, Top Stories carousel, Google Discover, News Showcase) continue to drive newsroom traffic. The four AI surfaces (Perplexity News, ChatGPT news mode, SearchGPT news, Gemini news) have moved from experimental to material since Q4 2025. Both pull from the same publisher pool but weight signals differently. A publication can dominate Top Stories and remain invisible inside ChatGPT, and vice versa.

News SEO is not a layer on general SEO. The freshness signal economy operates on hours and minutes. The trust signal economy requires masthead, corrections, ethics, ownership, and fact checking pages that general business sites never produce. The schema economy requires NewsArticle (not Article). The crawl economy operates against Googlebot-News, OAI-SearchBot, ClaudeBot, and PerplexityBot at cadences that demand a sitemap update path measured in minutes rather than days.

Mode A, Install. Build news SEO infrastructure on a new or existing publication. Follow Sections 2 through 14 in order. Site wide infrastructure precedes per article work precedes ongoing publication cadence.

Mode B, Audit. Evaluate an existing publication against the framework. Skip to Section 15. Apply Mode A to any pillar that scores below threshold.

Mode C, Hybrid. Audit first, then install for failing items.

Relationship to neighboring frameworks. E-E-A-T pillar work lives in framework-eeat.md. Masthead, corrections, ownership infrastructure detail lives in framework-trustsignals.md. NewsArticle schema integrates with framework-schema.md. AI Overview news citation lives in framework-aioverviews.md. The broader AI engine citation surface lives in framework-aicitations.md. Perplexity News routes to framework-perplexityspaces.md. SearchGPT news routes to framework-searchgpt.md. Multi engine trade offs route to framework-multiengine-tradeoffs.md. Freshness as a generic concern lives in framework-contentrefresh.md. Beat coverage routes to framework-topicalauthority.md. International routes to framework-international.md and framework-hreflang.md. Local news overlaps with framework-localseo.md. Mobile gates Discover via framework-mobileseo.md.

What counts as news for Google. Qualifying content: time sensitive original reporting, coverage of recent events, industry news and analysis, and breaking news. Press releases sometimes qualify if they meet editorial standards. Evergreen content, service and product pages, generic blog posts, user generated content, and aggregated content without added value do not qualify. The publisher quality factors that gate inclusion are editorial responsibility, original reporting, bylined authors with expertise, editorial standards, publication consistency, contact information, and a corrections process.


2. Client Variables Intake

Before any installation or audit, gather every variable below. Do not proceed until the intake is complete.

# NEWS SEO FRAMEWORK CLIENT VARIABLES

# --- Publication Identity (REQUIRED) ---
publication_name: ""
publication_type: ""               # "national", "regional", "local", "trade", "specialty"
publication_founded_year: ""
publication_url: ""

# --- Beat Coverage (REQUIRED) ---
primary_beats: []                  # 3 to 7 topical beats
geographic_scope: ""               # "global", "national", "regional", "city"
geographic_locations_covered: []

# --- Publishing Cadence (REQUIRED) ---
articles_per_day_average: 0
breaking_news_capability: false    # Within 30 minutes of an event
deadline_publication_pattern: ""   # "morning", "evening", "rolling", "weekly"

# --- Editorial Team (REQUIRED) ---
editorial_team_size: 0
named_reporters_count: 0
named_editors_count: 0
fact_checkers_count: 0
ombudsperson_present: false

# --- Editorial Standards (REQUIRED) ---
ap_stylebook_adherence: false      # AP Stylebook is the de facto news standard
corrections_policy_published: false
ethics_policy_published: false
fact_checking_policy_published: false
masthead_published: false
ownership_disclosure_published: false
funding_disclosure_published: false

# --- Monetization Model (REQUIRED) ---
monetization_model: ""             # "ad_supported", "subscription", "metered", "donation", "hybrid"
paywall_type: ""                   # "none", "hard", "metered", "registration"
metered_articles_per_month: 0

# --- Google News Status (REQUIRED) ---
google_news_publisher_center_configured: false
google_news_publication_approved: false
top_stories_appearances_baseline_28d: 0
discover_traffic_baseline_28d: 0

# --- Schema Coverage (REQUIRED) ---
newsarticle_schema_on_every_article: false
publisher_organization_schema_present: false
person_schema_per_author: false

# --- AI Engine Crawler Access (REQUIRED) ---
gptbot_allowed_in_robots_txt: false
oai_searchbot_allowed_in_robots_txt: false
claudebot_allowed_in_robots_txt: false
perplexitybot_allowed_in_robots_txt: false
google_extended_allowed_in_robots_txt: false

# --- Hosting and Indexing (REQUIRED) ---
hosting_environment: ""            # "bubbles-debian-nginx", "vps", "managed-wordpress"
news_sitemap_update_frequency_minutes: 0
indexnow_configured: false
rss_feed_present: false

# --- Subscriber and Trust Data (RECOMMENDED) ---
verified_subscriber_count: 0
press_association_memberships: []  # "AP", "Reuters", "SPJ", "IRE", "INMA", "LMA"
journalism_awards: []
fact_check_partnerships: []

Citation defense work for AI engine news surfaces cannot start until the six policy and disclosure pages flagged under Editorial Standards (corrections, ethics, fact checking, masthead, ownership, funding) are published. Sites failing those dependencies route back to framework-trustsignals.md first.
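The dependency above can be checked mechanically before any AI citation work is scheduled. The following is a minimal sketch, assuming the intake has been loaded into a plain dictionary keyed by the variable names listed above; the function names are hypothetical and not part of the framework.

```python
# Hypothetical intake gate. Keys mirror the Editorial Standards flags in the
# intake block; the helper names are illustrative only.
REQUIRED_PAGE_FLAGS = (
    "corrections_policy_published",
    "ethics_policy_published",
    "fact_checking_policy_published",
    "masthead_published",
    "ownership_disclosure_published",
    "funding_disclosure_published",
)


def missing_trust_pages(intake: dict) -> list:
    """Return the page flags that are absent from the intake or set to false."""
    return [flag for flag in REQUIRED_PAGE_FLAGS if not intake.get(flag, False)]


def citation_defense_ready(intake: dict) -> bool:
    """True only when every policy and disclosure page is published."""
    return not missing_trust_pages(intake)
```

A publication with every flag true except masthead_published would get back ["masthead_published"] and route to framework-trustsignals.md before continuing.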


3. Google News Inclusion 2026

3.1 The 2019 Policy Change

Google removed the formal Google News application requirement in December 2019. Before that change, publications applied through a dedicated producer form and waited weeks to months for review. After the change, any site that publishes news content is technically eligible based on automated quality assessment alone. The Google News Publisher Center remains as the visibility, branding, and metadata management tool. Publisher Center is no longer required for inclusion but is required for control over how the publication appears once included.

The Publisher Center supplies the publication name, logo, language, country, and content categorization that Google News uses to render publication cards. Without Publisher Center, Google News surfaces articles algorithmically using whatever signals it can derive from the page but does not present a coherent publication identity on news.google.com. Publishers serious about news traffic configure Publisher Center.

3.2 The Five Inclusion Criteria

Google News quality assessment in 2026 evaluates publishers against five criteria. Each is a separate gate.

Original Reporting. The publication produces content that originates with its own reporters rather than aggregating, paraphrasing, or syndicating from other publications. Publications that aggregate more than 60 percent of their output without substantial added value typically do not earn Google News promotion regardless of other signals.

Expertise. Bylines link to author pages with credentials, beat coverage history, and bio depth. The author signal is heavier in 2026 than at any prior point. Person schema with knowsAbout, hasCredential, and sameAs linking to journalism social profiles (X, LinkedIn, Mastodon, Bluesky) and professional press association profiles is the practical expression of the expertise criterion.

Authority. The publication is referenced by other authoritative sources. Inbound links from established news organizations, citations in academic journalism research, press association memberships, and journalism awards all feed into the authority signal. New publications without authority signals can earn surface time on niche beats but rarely break into Top Stories on national news without 12 to 18 months of cited original reporting first.

Transparency. Masthead, corrections policy, ethics policy, ownership disclosure, funding disclosure, fact checking policy, and editor contact are visible on the site. The transparency criterion is the most binary of the five. Publications without a masthead and corrections policy on visible URLs face categorical exclusion from premium news surfaces.

Accessibility. Pages are crawlable by Googlebot-News, mobile friendly, and fast. Pages behind aggressive bot challenges, anti scraping infrastructure, or third party proxy layers that interfere with crawl frequently fail accessibility. This is the criterion that bites publishers who deploy bot mitigation without whitelisting Google's verifiable IP ranges.

3.3 Publisher Center Configuration

The Publisher Center workflow uses a Google Workspace account tied to the publication. Verify in Google Search Console first; the same Google account streamlines verification. Add the publication with name, primary URL, language, and country. The name must match the og:site_name, the Organization schema name field, and the publisher.name field in NewsArticle schema across the site. Upload a square logo (minimum 1000 x 1000 px) and a rectangular logo (minimum 1000 x 200 px, transparent background preferred). Logos must match the publisher.logo URL used in NewsArticle schema. Map publication sections to URL patterns. Verify content ownership; Search Console verification satisfies this step. Configure distribution and region targeting. Approval typically lands within 7 to 14 days for first time submitters.

3.4 Common Inclusion Failures

Original Reporting. Aggregated wire copy without added reporting fails. Add reporter bylines contributing context, local angle, or follow up on every wire piece, or remove wire content from the news sitemap.

Expertise. Generic "Editorial Team" or "Staff" bylines fail. Replace them with named persons.

Authority. New publications without inbound press citations fail. This is not a quick win; it requires 12 to 18 months of cited original reporting plus outreach to industry trade press and awards programs.

Transparency. A missing masthead or corrections policy fails. The fix is a 30 minute build of the policy pages listed in Section 9.1, reviewed against framework-trustsignals.md.

Accessibility. Pages blocked by bot challenges fail. Whitelist Googlebot, Googlebot-News, and AI engine crawlers by verifying them (forward confirmed reverse DNS lookups, or the operator's published IP ranges where reverse DNS is not offered) rather than by user agent matching, which is trivially spoofed.
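For the accessibility fix, crawler verification can be sketched as a forward confirmed reverse DNS check: resolve the IP to a hostname, confirm the hostname sits under a Google crawler domain, then resolve the hostname back and confirm it returns the same IP. This is an illustrative sketch, not a drop in firewall rule. The function name and the injectable lookup parameters are assumptions added so the logic can be exercised without network access; the hostname suffixes follow Google's published crawler verification guidance.

```python
import socket
from typing import Callable, List

# Suffixes Google documents for reverse DNS records of its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def _reverse(ip: str) -> str:
    # gethostbyaddr returns (hostname, aliases, addresses)
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    # gethostbyname_ex is IPv4 only; an IPv6 deployment needs getaddrinfo
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], str] = _reverse,
    forward_lookup: Callable[[str], List[str]] = _forward,
) -> bool:
    """Forward confirmed reverse DNS check for a claimed Google crawler IP."""
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
        if not host.endswith(GOOGLE_SUFFIXES):
            return False  # hostname is not under a Google crawler domain
        return ip in forward_lookup(host)  # hostname must resolve back to the IP
    except OSError:
        return False  # lookup failure is treated as unverified
```

The user agent header never enters the decision, which is the point: a spoofed "Googlebot" string from an unrelated IP fails at the hostname suffix test.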


4. Top Stories Carousel

4.1 The Post AMP Carousel

Google deprecated the AMP requirement for Top Stories eligibility in June 2021. The carousel that previously demanded AMP HTML now accepts standard mobile responsive pages. The transition shifted competition from "publishers willing to maintain a separate AMP build" to "publishers with the best Core Web Vitals on canonical mobile pages." The result expanded Top Stories eligibility while raising the technical bar on mobile performance.

Current Top Stories requirements: mobile friendly, fast (LCP under 2.5 s, INP under 200 ms, CLS under 0.1), valid NewsArticle schema, sufficient publishing velocity on the relevant beat, sufficient site authority on the relevant beat. The carousel rotates fast for high velocity topics. A breaking story carousel may cycle through 12 to 20 publications across 6 hours. A slower developing story carousel may stabilize on 5 to 8 publications for 24 to 48 hours.

4.2 Carousel Slot Mechanics

Top Stories typically displays 3 articles on mobile and 3 to 4 on desktop. Tapping expands to a Top Stories results page with 10 to 20 additional articles. Placement in one of the 3 visible slots earns 10 to 15 percent of all clicks on that SERP. Placement on the expanded page earns 2 to 4 percent. Failing to appear in the carousel on a query that shows Top Stories is functionally equivalent to ranking on page 2 of organic results.

top_stories_optimization_priorities:
  publisher_authority:        # weight: high
    - Build named author authority on the beat over 12 to 18 months
    - Earn inbound press citations from established news orgs
    - Maintain press association memberships
    - Earn or self report journalism awards
  freshness:                  # weight: very_high
    - Publish within 30 minutes of breaking news on covered beats
    - Update developing stories with dateModified every 2 to 4 hours
    - Use IndexNow to push immediate notification to participating engines
    - Update news sitemap within 5 minutes of publication
  schema_correctness:         # weight: very_high
    - NewsArticle schema present with all required fields
    - Publisher Organization schema with logo at correct dimensions
    - Author Person schema with sameAs and credential properties
    - dateModified updated for any substantive change
  technical_quality:          # weight: high
    - LCP under 2.5 seconds on mobile
    - INP under 200 milliseconds
    - CLS under 0.1
    - Server rendered HTML; crawlable without bot challenges
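The technical_quality thresholds above lend themselves to an automated gate in a publishing pipeline. A minimal sketch, assuming field measurements arrive as a dictionary with the hypothetical keys lcp_s, inp_ms, and cls; values at or below each limit pass, and a missing measurement is treated as a failure.

```python
# Limits taken from the technical_quality list above. The metric key names
# are assumptions for this sketch.
VITALS_LIMITS = {"lcp_s": 2.5, "inp_ms": 200, "cls": 0.1}


def failing_vitals(metrics: dict) -> list:
    """Return the names of metrics that are missing or above their limit."""
    return [
        name
        for name, limit in VITALS_LIMITS.items()
        if metrics.get(name, float("inf")) > limit
    ]
```

For example, a page measured at LCP 2.1 s, INP 250 ms, and CLS 0.05 would fail on inp_ms only.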

4.3 The Speed to Carousel Latency

The latency from publication to Top Stories appearance varies by query velocity and publisher authority. For high authority publishers on breaking news the latency is typically 8 to 25 minutes; for mid tier publishers, 30 to 90 minutes; for new or low authority publishers, several hours or never. Latency is driven by Googlebot-News crawl frequency, which scales with established publishing cadence. A publisher with 4 articles per day for 6 months is crawled at a cadence that discovers new articles within minutes; a publisher with 1 article per week is not. Accelerants: news sitemap updates, IndexNow push notifications, RSS feed pings, and URL Inspection live submission for the highest priority breaking stories. None substitute for established cadence over time, but all reduce latency at the margin.
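Of the accelerants above, the news sitemap is the one most easily regenerated on every publish event. The sketch below builds one with the Python standard library, assuming each article is a dictionary with the hypothetical keys url, title, and published (a timezone aware datetime). The two day window reflects Google's news sitemap guidance rather than anything stated in this framework.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS = "http://www.google.com/schemas/sitemap-news/0.9"
ET.register_namespace("", SM)       # serialize sitemap tags without a prefix
ET.register_namespace("news", NEWS)  # serialize news tags as news:*


def build_news_sitemap(articles, publication, language, now,
                       max_age=timedelta(days=2)):
    """Return news sitemap XML for articles published within max_age of now."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    for art in articles:
        if now - art["published"] > max_age:
            continue  # older articles belong in the regular sitemap
        url = ET.SubElement(urlset, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = art["url"]
        news = ET.SubElement(url, f"{{{NEWS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS}}}publication")
        ET.SubElement(pub, f"{{{NEWS}}}name").text = publication
        ET.SubElement(pub, f"{{{NEWS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS}}}publication_date").text = (
            art["published"].isoformat()
        )
        ET.SubElement(news, f"{{{NEWS}}}title").text = art["title"]
    return ET.tostring(urlset, encoding="unicode")
```

Calling this from the CMS publish hook, rather than from a scheduled job, is one way to keep the sitemap update path inside the minutes range the framework asks for.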

4.4 Beat Specialization

Publishers that own a beat earn carousel slots faster than generalists. A publication with 200 articles on local elections over 18 months appears in Top Stories for local election queries faster than a national generalist with one election article. Google measures beat coverage depth implicitly through internal linking, topic cluster cohesion, named entity overlap across articles, and consistent author bylines. Cross reference framework-topicalauthority.md for the beat coverage cluster mechanics.


5. Google Discover Optimization

5.1 What Discover Is

Google Discover is a mobile only algorithmic feed surfaced inside the Google app, the Google homepage on mobile browsers, and the Pixel Launcher on Android. Discover does not respond to user queries. It surfaces content algorithmically based on user interest signals (search history, location, app usage, follows) and publisher signals (engagement velocity, content category match, image quality, mobile experience).

Discover traffic is bimodal. A Discover hit can generate 10,000 to 500,000 sessions for a single article in 24 to 72 hours and then decay to zero. Discover misses generate near zero Discover traffic regardless of how well the article ranks in standard search. Discover is a high variance amplifier, not a steady traffic channel.

5.2 Discover vs Top Stories vs Google News

Dimension          | Google News                 | Top Stories                   | Google Discover
-------------------|-----------------------------|-------------------------------|-------------------------------
Trigger            | User visits news.google.com | Search query with news intent | Passive feed scroll
Query dependency   | None (feed)                 | Yes (news SERP)               | None (feed)
Surface            | Dedicated news.google.com   | Top of standard SERP          | Mobile Google homepage and app
Mobile only        | No                          | No                            | Yes
Personalization    | High (topic follows)        | Low (query level)             | Very high (interest model)
Latency to surface | Hours                       | Minutes to hours              | Hours to days
Lift on hit        | Steady                      | Peak                          | Bimodal spike
AMP required       | No                          | No (since 2021)               | No

5.3 The Discover Image Requirement

Discover surfaces a large hero image for every entry, and the image quality signal is heavier on Discover than on any other Google surface. The requirements:

  - Minimum 1200 pixels wide. Images narrower than 1200 px are disqualified from large preview rendering. Google's Discover documentation also ties large previews to the max-image-preview:large robots setting.
  - High resolution and visual quality. Images that read as stock photography are filtered out; original photography, editorial photography, or custom graphic design wins.
  - Relevant. The image must visually match what the article is about.
  - Declared in the NewsArticle schema image property as an ImageObject with width and height properties.
  - Declared in og:image with a matching URL.
  - Not the publisher logo. The hero image must be article specific, not the brand mark.

<meta property="og:image" content="https://example-news.com/2026/05/hero.jpg">
<meta property="og:image:width" content="1600">
<meta property="og:image:height" content="900">

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "image": [{
    "@type": "ImageObject",
    "url": "https://example-news.com/2026/05/hero.jpg",
    "width": 1600,
    "height": 900,
    "caption": "Voters at a polling place in downtown Springfield"
  }]
}
</script>
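The image rules in this section can be checked against the parsed JSON-LD before publication. A minimal sketch, assuming the NewsArticle object has already been parsed into a dictionary and the og:image URL extracted from the page; the function name and message strings are hypothetical.

```python
from typing import Optional


def discover_image_problems(news_article: dict, og_image_url: str,
                            logo_url: Optional[str] = None) -> list:
    """Check the first schema image against the Discover image rules above."""
    images = news_article.get("image") or []
    if isinstance(images, dict):
        images = [images]  # tolerate a single object instead of an array
    hero = images[0] if images else None
    if not isinstance(hero, dict) or hero.get("@type") != "ImageObject":
        return ["image is not declared as an ImageObject"]

    problems = []
    if int(hero.get("width", 0)) < 1200:
        problems.append("image width under 1200 px")
    if "height" not in hero:
        problems.append("image height not declared")
    if hero.get("url") != og_image_url:
        problems.append("og:image URL does not match schema image URL")
    if logo_url is not None and hero.get("url") == logo_url:
        problems.append("hero image is the publisher logo")
    return problems
```

Run against the sample markup above, the check returns an empty list; the same markup with a width of 800 would be flagged for width.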

5.4 Discover Engagement Signals

Discover ranking after initial surfacing is driven by engagement velocity. Click through rate from the feed, time on page, scroll depth, and return to feed behavior all feed back into whether the article continues to surface to similar users. Articles with high CTR and long time on page get amplified. Articles with low CTR or short time on page decay rapidly. Engagement is measured on the canonical article page; slow loads, intrusive interstitials, or headlines that fail to deliver lose engagement quickly.

5.5 What Disqualifies Articles from Discover

The following disqualify articles from Discover: clickbait headlines that overpromise; hateful or harassing content; spam; AI generated content without disclosure; misleading thumbnails that do not match article content; YMYL content without proper E-E-A-T credentials; heavy ad load or intrusive interstitials that block content; and slow mobile performance (an LCP over 4 seconds typically loses Discover).

5.6 The Discover Traffic Spike Pattern

A typical Discover hit follows a recognizable curve. The article surfaces to a small test cohort within 4 to 12 hours of publication. If engagement is positive, surface expands progressively over the next 12 to 24 hours, peaking between hour 18 and hour 36. Traffic decays over the following 24 to 48 hours and typically reaches zero between hour 60 and hour 96. Republishing with substantive updates (and a dateModified bump) can slow the decay, but any second surge is shorter and smaller than the first. The first hit is typically the largest. Cross reference framework-mobileseo.md for the mobile performance specifics that gate Discover eligibility.


6. NewsArticle Schema

6.1 The Full Property Set

NewsArticle extends Article with news specific properties. Use NewsArticle (not Article) on news content.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "@id": "https://example-news.com/2026/05/article-slug/#article",
  "headline": "Specific Article Headline Under 110 Characters",
  "alternativeHeadline": "Optional Subhead",
  "description": "Article dek or lede summary, 50 to 200 characters.",
  "datePublished": "2026-05-14T08:30:00-05:00",
  "dateModified": "2026-05-14T14:45:00-05:00",
  "author": [{
    "@type": "Person",
    "@id": "https://example-news.com/authors/jane-doe/#person",
    "name": "Jane Doe",
    "url": "https://example-news.com/authors/jane-doe/",
    "jobTitle": "Senior Politics Reporter",
    "knowsAbout": ["State politics", "Election law"]
  }],
  "publisher": {
    "@type": "Organization",
    "@id": "https://example-news.com/#organization",
    "name": "Example News",
    "url": "https://example-news.com/",
    "logo": {
      "@type": "ImageObject",
      "url": "https://example-news.com/brand/logo-google-news.png",
      "width": 600, "height": 60
    }
  },
  "image": [{
    "@type": "ImageObject",
    "url": "https://example-news.com/2026/05/article-hero.jpg",
    "width": 1600, "height": 900
  }],
  "mainEntityOfPage": {"@type": "WebPage", "@id": "https://example-news.com/2026/05/article-slug/"},
  "articleSection": "Politics",
  "articleBody": "Full article body text.",
  "wordCount": 1247,
  "keywords": ["state politics", "election law"],
  "isAccessibleForFree": true,
  "inLanguage": "en-US",
  "dateline": "SPRINGFIELD, Mo."
}
</script>

6.2 Required vs Recommended vs Conditional

Treat as required for NewsArticle rich result and Top Stories eligibility (Google's current Article structured data documentation lists no strictly required properties, so this list is the framework's working standard rather than a published hard requirement): headline (matches article H1, maximum 110 characters), datePublished (ISO 8601 with timezone), dateModified (required if substantively updated), author (Person heavily preferred for news), publisher (Organization with name and logo; 600 x 60 minimum for rectangular logo), image (minimum 1200 px wide), mainEntityOfPage (canonical URL).

Strongly recommended: description (social sharing fallback), articleSection (Google News topic classifiers), articleBody (plain text rendering; not treated as duplicate content), wordCount, keywords, dateline (publication location, important for local), inLanguage (BCP 47, critical for international).

Conditional: isAccessibleForFree (required if paywalled; set false), hasPart (required for paywalled portions; identifies the gated section by CSS selector), alternativeHeadline (when print and web headlines differ), correction (when the article has been corrected or retracted; schema.org defines correction but no separate retraction property).

6.3 Common Schema Mistakes

Headline length over 110 characters disqualifies some carousel positions; trim to fit. Schema headline that does not match the H1 triggers a Search Console structured data warning and may disqualify rich results. Missing publisher logo or wrong dimensions fails Top Stories eligibility. Author as Organization instead of Person ("Editorial Team", "Staff") signals lower expertise; replace with named Person authors. Changing datePublished on an old article to make it look new violates Google's spam policies and triggers manual actions in repeat cases; use dateModified for legitimate updates. Missing dateModified after a substantive update means freshness signals fail to register. Images under 1200 px wide will not surface in rich results. isAccessibleForFree: true on paywalled content sends a misleading signal to Google; set to false and use hasPart to specify the paywalled portion. Cross reference framework-schema.md for the wider schema graph integration.
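Several of the mistakes above are mechanical enough to lint in the CMS before an article ships. The sketch below is an illustration under assumptions: the schema is already parsed into a dictionary, the H1 text is passed in separately, and the function name and message strings are hypothetical. It covers the headline, author, date, and logo checks only.

```python
from datetime import datetime


def _parse_iso(value):
    """Parse an ISO 8601 string; return None if missing or malformed.
    Note: a trailing 'Z' is only accepted by fromisoformat on Python 3.11+."""
    try:
        return datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None


def lint_news_article(schema: dict, h1_text: str) -> list:
    """Return a list of issues matching the common mistakes listed above."""
    issues = []
    headline = schema.get("headline", "")
    if len(headline) > 110:
        issues.append("headline over 110 characters")
    if headline != h1_text:
        issues.append("headline does not match H1")

    authors = schema.get("author") or []
    if isinstance(authors, dict):
        authors = [authors]
    if not authors or any(
        not isinstance(a, dict) or a.get("@type") != "Person" for a in authors
    ):
        issues.append("author should be a named Person")

    published = _parse_iso(schema.get("datePublished"))
    modified = _parse_iso(schema.get("dateModified"))
    if published is None or published.tzinfo is None:
        issues.append("datePublished missing or lacks a timezone offset")
    elif modified is not None and modified.tzinfo is not None and modified < published:
        issues.append("dateModified is earlier than datePublished")

    if not (schema.get("publisher") or {}).get("logo"):
        issues.append("publisher logo missing")
    return issues
```

An empty list means none of the linted mistakes were found; it does not certify rich result eligibility.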


7. The Living Article Pattern

7.1 Why dateModified Is Critical

News articles are rarely finished at first publication. Breaking news develops. Initial reports get corrected. Quotes are added. The dateModified property is how Google distinguishes an article that is actively maintained from one that was published once and abandoned. For Top Stories and Discover, dateModified is the primary freshness signal once datePublished is more than a few hours old. An article published 18 hours ago with dateModified of 30 minutes ago competes for Top Stories slots on the recent modification timestamp, not the older publication timestamp. Without the bump, the same article looks 18 hours stale.

7.2 What Counts as a Substantive Update

substantive_updates_dateModified_bump:
  - New factual information added
  - Initial facts proven wrong and corrected
  - New quoted source added
  - New photo or graphic that materially changes the visual story
  - Significant new context that changes the article's framing

trivial_updates_no_bump:
  - Comma or period fixes
  - Single word typo on a body sentence
  - Internal link added
  - Image alt text adjustment
  - CSS or layout adjustment

Google's spam policies treat repeated trivial updates intended to game freshness as a violation. The practical workflow: every substantive update logs an editor's note at the top of the article, updates the dateModified schema property, regenerates the news sitemap entry, and pushes an IndexNow notification if the change is significant enough to warrant an immediate recrawl.
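The substantive versus trivial split can be enforced in the CMS so that dateModified only moves when it should. A minimal sketch, assuming editors tag each save with one or more change kinds; the tag names below are hypothetical labels for the two lists above, not framework vocabulary.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical change tags mapped onto the two lists above.
SUBSTANTIVE = {"new_facts", "fact_correction", "new_quote", "new_visual", "new_context"}
TRIVIAL = {"punctuation", "typo", "internal_link", "alt_text", "layout"}


def apply_update(article: dict, change_kinds: set, now: datetime) -> bool:
    """Bump article['dateModified'] only if a substantive change is present.

    Returns True when the timestamp was bumped, which is the caller's cue to
    regenerate the news sitemap entry and consider an IndexNow push.
    """
    unknown = change_kinds - SUBSTANTIVE - TRIVIAL
    if unknown:
        raise ValueError(f"unclassified change kinds: {sorted(unknown)}")
    if not change_kinds & SUBSTANTIVE:
        return False  # trivial edits leave dateModified untouched
    article["dateModified"] = now.isoformat()
    return True
```

Raising on unclassified tags is a deliberate choice in this sketch: it forces a human decision instead of silently bumping or silently skipping.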

7.3 Correction Notation

Corrections follow AP Stylebook format. Place a visible <aside class="correction" role="note"> at the top of the article with <strong>Correction, [date and time]</strong> followed by the substance of what was wrong and what was changed. Pair the visible correction with a schema update. The article schema gets a correction property pointing to the corrections policy URL, and the corrections policy page logs the correction in a public archive.

7.4 Retraction Notation

A retraction is a stronger correction: the article is no longer reliable and has been formally withdrawn. Place a visible <aside class="retraction" role="note"> at the top with <strong>Retracted, [date]</strong> followed by what could not be verified and why the retraction was issued. The original article is preserved below for transparency. In schema, schema.org defines a correction property but no dedicated retraction property, so record the retraction as a correction entry pointing to the retraction notice at the corrections policy URL.
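The correction and retraction notices in 7.3 and 7.4 share one markup shape, so a single template helper can render both. A minimal sketch, assuming the notice text comes from an editor and must be HTML escaped; the function name is hypothetical, while the class names, role, and labels are the ones given above.

```python
from html import escape

# Visible labels for each notice kind, as specified in 7.3 and 7.4.
NOTICE_LABELS = {"correction": "Correction", "retraction": "Retracted"}


def notice_html(kind: str, when: str, body: str) -> str:
    """Render the aside placed at the top of a corrected or retracted article."""
    if kind not in NOTICE_LABELS:
        raise ValueError(f"unknown notice kind: {kind}")
    return (
        f'<aside class="{kind}" role="note">'
        f"<strong>{NOTICE_LABELS[kind]}, {escape(when)}</strong> "
        f"{escape(body)}</aside>"
    )
```

Escaping the body matters because correction text often quotes the original wording, which may contain ampersands or angle brackets.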

7.5 The Republication Decision

Some updates are large enough that the article is no longer the same article. Publish a new article when any of the following holds: the headline would need to change substantially, the lead paragraph would need a complete rewrite, new content makes up more than 50 percent of the final article, the original is no longer accurate without major rewriting, or a meaningful time period has passed (30 plus days). Otherwise update in place. When publishing a new article, cross link to the original. Cross reference framework-contentrefresh.md for the wider patterns that apply outside news contexts.
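The republication criteria read as a simple any-of rule, which can be encoded so the decision is made the same way by every desk. This is a sketch only: the dataclass and its field names are hypothetical, and the editorial judgments behind the boolean fields still have to come from an editor.

```python
from dataclasses import dataclass


@dataclass
class UpdatePlan:
    headline_changes_substantially: bool
    lead_needs_full_rewrite: bool
    new_content_fraction: float  # share of the final article that is new, 0.0 to 1.0
    original_inaccurate_without_major_rewrite: bool
    days_since_publication: int


def should_publish_new_article(plan: UpdatePlan) -> bool:
    """True if any one republication criterion is met; otherwise update in place."""
    return (
        plan.headline_changes_substantially
        or plan.lead_needs_full_rewrite
        or plan.new_content_fraction > 0.5
        or plan.original_inaccurate_without_major_rewrite
        or plan.days_since_publication >= 30
    )
```

A plan with 40 percent new content on a week old article and no other flags returns False, so the update stays in place with a dateModified bump.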


8. Author Profiles for News

8.1 The Heavier Author Signal in 2026

The author signal in news SEO has tightened across two cycles of E-E-A-T algorithm updates. Where 2023 era signals were about presence (does a byline exist), 2026 signals are about depth (does the author have a credentialed bio, beat coverage history, social proof, press affiliations). The cost of failing the author signal is exclusion from Top Stories and Discover even when other signals are strong. Every byline must link to an author page. Every author page must satisfy a minimum credential threshold for journalism.

author_page_minimum_requirements:
  identity:        Real full name, high quality headshot, pronouns optional
  bio:             Minimum 200 word professional bio, beats specified, years of experience, notable stories
  credentials:     Education (university and degree), press association memberships, awards, prior publications
  contact:         Professional email on publication domain, Signal or Telegram for tips, PGP for investigative
  social:          X/Twitter, LinkedIn, Mastodon or Bluesky
  schema:          Person with @id, sameAs array, knowsAbout array, hasCredential array
  publication_history: Paginated list of authored articles, sorted by date desc

8.2 The Author Person Schema

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://example-news.com/authors/jane-doe/#person",
  "name": "Jane Doe",
  "url": "https://example-news.com/authors/jane-doe/",
  "image": "https://example-news.com/authors/jane-doe/headshot-1200.jpg",
  "jobTitle": "Senior Politics Reporter",
  "worksFor": {"@id": "https://example-news.com/#organization"},
  "knowsAbout": ["State politics", "Election law", "Municipal government"],
  "hasCredential": [
    {"@type": "EducationalOccupationalCredential",
     "credentialCategory": "Degree",
     "name": "BA Journalism, Northwestern Medill"},
    {"@type": "EducationalOccupationalCredential",
     "credentialCategory": "Award",
     "name": "Missouri Press Association Investigative Reporting, 2023"}
  ],
  "memberOf": [
    {"@type": "Organization", "name": "Society of Professional Journalists"},
    {"@type": "Organization", "name": "Investigative Reporters and Editors"}
  ],
  "sameAs": [
    "https://x.com/janedoereporter",
    "https://linkedin.com/in/janedoereporter",
    "https://bsky.app/profile/janedoe.example-news.com"
  ]
}
</script>
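An author page can be checked against the minimum requirements in 8.1 before its byline goes live. A minimal sketch, assuming the Person object is parsed into a dictionary and the visible bio text is available as a string; the function name and gap labels are hypothetical.

```python
# Schema arrays the author_page_minimum_requirements block calls for.
REQUIRED_ARRAYS = ("sameAs", "knowsAbout", "hasCredential")
MIN_BIO_WORDS = 200  # reporter bio minimum from 8.1


def person_schema_gaps(person: dict, bio_text: str) -> list:
    """Return the missing pieces of an author profile, empty if none."""
    gaps = []
    if not person.get("@id"):
        gaps.append("@id")
    gaps.extend(prop for prop in REQUIRED_ARRAYS if not person.get(prop))
    if len(bio_text.split()) < MIN_BIO_WORDS:
        gaps.append("bio under 200 words")
    return gaps
```

The word count is a whitespace split, which is a rough measure but adequate for a threshold check.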

8.3 Editor and Multi Author Profiles

Editors get profile pages too. The editor profile is shorter than a reporter profile but no less important. Requirements: real full name, headshot, title, 150 word bio, editorial philosophy, areas of responsibility, contact email, LinkedIn, Person schema with worksFor. The masthead links to every editor and reporter profile organized by beat.

For multi author stories, use the schema author array with each Person entry. The visible byline matches the schema order: By Jane Doe and John Smith. Contributing reporters who do not earn a byline still get a contributor schema entry for transparency. Cross reference framework-eeat.md Section 7 for the wider author infrastructure pattern.
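One way to keep the visible byline and the schema author array in the same order is to render the byline from the array itself. A minimal sketch, assuming each author entry is a dictionary with a name key; the function name is hypothetical, and the three or more author format (no serial comma) is an assumption in line with AP style.

```python
def byline(authors: list) -> str:
    """Build the visible byline from the schema author array, preserving order."""
    names = [author["name"] for author in authors]
    if not names:
        raise ValueError("at least one named author is required")
    if len(names) == 1:
        return f"By {names[0]}"
    # Two authors join with "and"; longer lists use commas before the final "and".
    return "By " + ", ".join(names[:-1]) + " and " + names[-1]
```

Because the template reads from the same array that is serialized into JSON-LD, reordering authors in the CMS changes both outputs together.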


9. Editorial Trust Signals

9.1 The Required Policy Pages

News publishers must publish the following pages at canonical URLs. Missing any one degrades editorial trust signals and frequently blocks Top Stories eligibility.

required_policy_pages:
  /about/         : Publication identity, history, mission
  /masthead/      : Editorial team roster organized by role
  /corrections/   : Corrections policy and public correction log
  /ethics/        : Editorial ethics policy
  /ownership/     : Ownership and corporate structure disclosure
  /funding/       : Revenue model disclosure
  /fact-checking/ : Fact checking methodology
  /contact/       : Editorial and corporate contact
  /privacy/       : Privacy policy with journalism specific clauses
  /terms/         : Terms of use including republication and licensing

9.2 The Corrections Policy

The corrections policy is the most heavily weighted trust signal of the policy set. Google's news quality classifiers explicitly look for corrections infrastructure. The minimum policy specifies how errors are handled (correction notice at top of the affected article, original incorrect text preserved where appropriate, logged in a public archive), the four types distinguished (correction for factual error; clarification for unclear original; update for changed circumstances; retraction for unsupportable original), the channel for reporting (email, phone, mail), and a link to the public log. The corrections log is a paginated public archive listing the date, article corrected, what was wrong, what was changed, and who reported the error if external.

9.3 The Masthead Page

The masthead establishes the publication's editorial chain. The structural pattern lists editorial leadership (Editor in Chief, managing editors, beat editors) first, then reporters organized by beat, then editorial staff (fact checkers, copy editors, designers, photographers). Each entry has a headshot, name linking to the author page, title, contact email for leadership, and beats for reporters.

9.4 Ownership and Funding Disclosure

Honest ownership and funding disclosure distinguishes credible publishers from astroturfed publishers. Ownership specifies the publishing entity (LLC, partnership, nonprofit), the controlling parties, any parent or holding company, any political or advocacy affiliations, and any board of advisors with relationships. Funding specifies the sources of revenue (subscriptions, advertising, grants, donations), percentage breakdown when available, major grants and donors over a disclosed threshold (typically $1,000 individual or $5,000 institutional), and any conditions or restrictions on the funding. Cross reference framework-trustsignals.md for the wider trust signal architecture that applies beyond news contexts.


10. AI Engine News Citation 2026

10.1 The AI News Surface Map

Four AI engines operate dedicated news surfaces or news weighted retrieval modes in 2026.

Perplexity News. Live since late 2024 as a dedicated news tab with topic feeds (Politics, Tech, Sports, Business, Entertainment). Pulls from a trusted publisher pool Perplexity maintains internally. Citation rate inside the news tab is approximately 96 percent. Trusted pool publishers receive disproportionate citation share.

ChatGPT News Mode. Live since July 2025. Citation rate on news queries is approximately 80 percent. ChatGPT pulls from OAI SearchBot indexed content plus a partner publisher pool (Axel Springer, AP, News Corp, Vox Media) under content licensing deals. Non partner publishers earn citation when OAI SearchBot has indexed the article and it matches query intent; partners receive structural advantage.

SearchGPT News. OpenAI's standalone search interface. News queries route through specialized news retrieval that weights recency and source reliability heavily. Citation patterns mirror ChatGPT news mode with somewhat higher non partner rate.

Gemini News. Google's news weighted Gemini mode. Pulls from Google News index plus the wider Gemini retrieval surface. Citation rate approximately 88 percent. Heavily weighted toward Publisher Center approved publications.

10.2 The Citation Premium for Breaking News

AI engines cite news sources at higher rates than non news sources because news queries are inherently recency dependent and AI engines have no internal training data on events that happened after the model's knowledge cutoff. The training data cutoffs as of May 2026:

training_data_cutoff_by_engine:
  gpt_5: "October 2024 plus continuous web retrieval"
  gpt_5_mini: "October 2024 plus continuous web retrieval"
  claude_opus_4_7: "January 2026 plus continuous web retrieval"
  claude_sonnet_4_5: "April 2025 plus continuous web retrieval"
  gemini_3_pro: "January 2026 plus continuous web retrieval"
  perplexity_sonar: "Real time retrieval on every query"

The continuous web retrieval layer means recent news is retrieved at query time, so the retrieval system still has to find the news source. Five signals drive that retrieval: news sitemap currency (the sitemap lists each article within minutes of publication), NewsArticle schema completeness (the article is parseable as a news object), author authority on the beat (the byline links to a credentialed author with beat coverage history), publisher trust signals (the publication has masthead, corrections, ethics, and ownership pages), and topical entity match (the article's named entities match the query's named entities).

10.3 The Trusted Source Lists

Each AI engine maintains an internal trusted source list for news. Inclusion is partially documented (Perplexity has acknowledged a publisher quality tier) and partially opaque. Qualifying signals across engines:

trusted_source_signals:
  institutional : Publisher Center approval, press association memberships, International Fact Checking Network signatory status, journalism awards, Wikipedia article (Wikidata entity)
  technical     : Valid HTTPS, NewsArticle schema, author Person schema with credentials, publisher Organization schema with sameAs
  editorial     : Masthead with named editors, corrections policy with public log, ethics policy, ownership and funding disclosure, fact checking policy, AP Stylebook adherence statement, editor contact
  external      : Inbound citations from established news orgs, mentions in academic journalism research
  ai_access     : llms.txt present and current; GPTBot, OAI SearchBot, ClaudeBot, and PerplexityBot all allowed; content licensing deals with major AI providers (Axel Springer, AP, News Corp pattern)

No single signal earns trusted status. It comes from accumulating credentials over 12 to 24 months. A new publication publishing the right policy pages on day one earns trusted status more slowly than a 10 year old publication adding them later.

10.4 AI Crawler Cadence for News

The primary AI engine crawlers operate on news publications at meaningfully higher cadence than on non news sites.

ai_crawler_news_cadence:
  GPTBot:             Every 15 to 60 minutes for high cadence publishers
  OAI-SearchBot:      Every 5 to 30 minutes; powers SearchGPT and ChatGPT search
  ClaudeBot:          Every 30 to 120 minutes; powers Claude web search
  PerplexityBot:      Every 10 to 45 minutes; powers Perplexity News
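  Google-Extended:    No separate crawler; robots.txt control token applied to Googlebot fetches (same cadence as Googlebot); governs Gemini training and retrieval use
  Applebot-Extended:  No separate crawler; robots.txt control token applied to Applebot fetches (2 to 12 hours); governs Apple Intelligence news features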

All respect robots.txt.

Publishers serious about AI news citation allow all of these in robots.txt and verify they are not blocked by upstream bot mitigation. The verification pattern:

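# Verify each crawler is unblocked.
# Google-Extended and Applebot-Extended are robots.txt control tokens, not
# request user agents, so the fetches to test are Googlebot and Applebot.
for bot in "GPTBot" "OAI-SearchBot" "ClaudeBot" "PerplexityBot" "Googlebot" "Applebot"; do
  status=$(curl -s -o /dev/null -w "%{http_code}" \
    -A "Mozilla/5.0 (compatible; ${bot})" https://example-news.com/)
  echo "${bot}: ${status}"
done

All six should return 200. Any 403 or 429 indicates upstream bot mitigation is blocking the crawler and the publication will not appear in that engine's citations. The test spoofs only the user agent string, so mitigation that validates crawler IP ranges can produce results that differ from real crawler traffic. Cross reference framework-aicitations.md for the broader AI citation pillar and framework-perplexityspaces.md for Perplexity specifically.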
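
A complementary check is to look at what real crawler traffic received, since the curl test only imitates a user agent string. The sketch below counts response status codes per crawler token from access log lines. It assumes nginx's default combined log format; the function name crawler_status_counts and the BOTS tuple are illustrative choices, not part of any tool.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable

# User agent tokens to look for (illustrative list).
BOTS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Googlebot", "Applebot")

# Combined log format ends with: "REQUEST" STATUS BYTES "REFERER" "USER AGENT"
LINE_RE = re.compile(r'" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$')


def crawler_status_counts(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count response status codes per known crawler token."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.search(line.rstrip())
        if not match:
            continue  # skip lines that are not in combined format
        agent = match.group("agent").lower()
        for bot in BOTS:
            if bot.lower() in agent:
                counts[bot][match.group("status")] += 1
                break
    return dict(counts)


if __name__ == "__main__":
    import sys
    # Usage: python crawler_counts.py < /var/log/nginx/example-news.com.access.log
    for bot, statuses in sorted(crawler_status_counts(sys.stdin).items()):
        print(bot, dict(statuses))
```

A crawler that shows only 403 or 429 responses, or does not appear at all over several days, is a candidate for an upstream block.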

10.5 The llms.txt Pattern for News

llms.txt is the emerging convention for declaring AI relevant content surfaces. For news publishers, llms.txt at the site root declares the publication identity in a > blockquote description, followed by sections linking to the publication policy pages (About, Masthead, Editorial Standards, Corrections, Ownership, Contact), content feeds (news sitemap, standard sitemap, RSS, Atom), beat hubs, and the author roster. Each link uses standard Markdown syntax. AI crawlers reading llms.txt use it as a high signal directory of the publication's organizational structure.
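
The layout described above (title, blockquote description, then sections of Markdown links) can be generated from the same data that drives the site navigation. The following is a minimal sketch under that assumption; build_llms_txt and its parameters are hypothetical names, and the section titles and URLs in the usage example are placeholders.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (label, absolute URL)


def build_llms_txt(name: str, description: str, sections: Dict[str, List[Link]]) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, then link sections."""
    lines = [f"# {name}", "", f"> {description}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        lines.extend(f"- [{label}]({url})" for label, url in links)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(build_llms_txt(
        "Example News",
        "Independent local reporting.",
        {
            "Policies": [
                ("About", "https://example-news.com/about/"),
                ("Corrections", "https://example-news.com/corrections/"),
            ],
            "Feeds": [("News sitemap", "https://example-news.com/news-sitemap.xml")],
        },
    ))
```

Regenerating the file in the publish pipeline is one way to keep it current when beats or policy pages change.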


11. News Showcase and Subscription Walls

11.1 Google News Showcase

Google News Showcase is a contractual licensing program. Participating publishers receive payment from Google for licensing news content that appears in dedicated Showcase panels on Google News and Discover. Panels feature multiple articles from a single publisher with publisher branding, hero images, and panel level navigation. Showcase is not open enrollment; Google invites publishers based on internal quality assessment. The publicly stated criteria: established publisher with consistent original reporting, strong editorial standards (masthead, corrections, ethics policies required), sustainable monetization, compliance with Google News content policies, demonstrated commitment to news rather than aggregation. Approached publishers receive a contract specifying licensing terms, payment structure, and content delivery requirements. Technical integration delivers content through a dedicated CMS interface in Publisher Center.

11.2 Subscription Wall Handling for SEO

Publishers with paywalls face a tension: indexing requires Googlebot to read the content; monetization requires users to hit the paywall. The legitimate path is Google's Flexible Sampling plus the isAccessibleForFree schema property.

Flexible Sampling. Google permits publishers to allow Googlebot to see full content while users see a paywall after a configurable threshold (typically 5 to 10 free articles per month). Configuration in Publisher Center. Articles indexed via flexible sampling are not considered cloaking; Google has signed off on the pattern.

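Meta tag. Some publishers also add <meta name="article:isAccessibleForFree" content="false"> to paywalled pages. Treat it as a supplementary hint only: the structured data property below is the declaration Google's paywalled content guidance relies on.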

Schema property. Set in NewsArticle:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "isAccessibleForFree": false,
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": false,
    "cssSelector": ".paywalled-content"
  }
}
</script>

hasPart specifies which CSS selector contains paywalled content. Anything outside is freely accessible. The pattern allows mixed access: the first paragraphs free, the rest paywalled, and Google indexes both without flagging cloaking.
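
Templates often need to emit this markup for paywalled and free articles from the same code path. The sketch below shows one way to do that; paywall_jsonld is a hypothetical helper, and a real NewsArticle object would carry the other properties covered elsewhere in this framework (author, publisher, dates, image).

```python
import json
from typing import Optional


def paywall_jsonld(headline: str, paywalled_selector: Optional[str]) -> str:
    """Return NewsArticle JSON-LD; pass None for a fully free article."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        # The article is free only when no section sits behind the wall.
        "isAccessibleForFree": paywalled_selector is None,
    }
    if paywalled_selector is not None:
        data["hasPart"] = {
            "@type": "WebPageElement",
            "isAccessibleForFree": False,
            "cssSelector": paywalled_selector,
        }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    print(paywall_jsonld("Council approves budget", ".paywalled-content"))
```

The selector passed in must match the class that actually wraps the gated content in the rendered HTML, otherwise the declaration and the page disagree.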

11.3 Discover and Registration Walls

Discover treats paywalled content differently from free content. Paywalled articles are eligible for Discover but with a paywall indicator in the feed: a small lock icon and "Subscribers only" label. Engagement on paywalled articles is typically 30 to 60 percent lower than on free articles. Subscription dependent publishers may prioritize paywall friction; ad dependent publishers may prioritize free Discover surface area.

A softer alternative is a registration wall. Users see content after creating a free account. The registration wall produces meaningfully better SEO outcomes than a hard paywall because users complete a single registration and access the entire site. Same schema pattern as paywalled (isAccessibleForFree: false with hasPart) but friction reduced.


12. Velocity vs Authority

12.1 The Publishing Velocity Equation

A publication's news SEO outcomes depend on two variables that interact non linearly: publishing velocity and authority. The interaction shapes optimization strategy.

| Quadrant | Examples | Focus | Risk |
| --- | --- | --- | --- |
| High velocity, high authority | AP, Reuters, NYT, Washington Post, BBC | Operational excellence | Quality drift under cadence pressure |
| High velocity, low authority | Content farms, aggregator sites | Authority building, not more velocity | Algorithmic demotion |
| Low velocity, high authority | Trade publications, investigative outlets | Beat depth, topical authority | Discovery decay |
| Low velocity, low authority | New publications, niche local, Substack | Authority first, then velocity | Invisibility on news surfaces |

Small publishers in the low velocity low authority quadrant should publish less but better. Four thoroughly reported original stories per week with deep author credentials beats twenty lightly reported aggregated stories. The four deep stories earn citation and authority. The twenty light stories earn algorithmic suspicion and frequently a Helpful Content Update demotion. Large publishers in the high velocity high authority quadrant can sustain cadence because the authority pool absorbs individual story variance, but should still defend authority through editorial standards.

12.2 The Freshness Signal Trade Off

Freshness signals have diminishing returns. The first article on a breaking story earns disproportionate citation share. The fifth earns marginal additional citation. The fiftieth earns almost nothing and risks freshness signal dilution. Optimal pattern: cover breaking news fast, update the original with substantive developments, publish a comprehensive recap 24 to 72 hours later, publish analytical follow up at the one week mark. Three to four articles on a major story rather than ten to fifteen, each with a distinct angle.

12.3 The Beat Authority Compound

Beat authority compounds over time. A publication with 200 articles on local elections over 18 months has measurable advantage in election coverage over a publication with 10 articles. The compounding mechanism is topical entity density, internal linking depth, and named author beat coverage history. The strategic implication: pick beats and own them. A small publication owning three beats deeply beats a small publication covering fifteen beats lightly. Cross reference framework-topicalauthority.md for the broader topical authority cluster mechanics.


13. International News

13.1 Per Country Google News Editions

Google News operates per country and language editions (US English, UK English, Japanese, German, and many more), selected through the hl, gl, and ceid URL parameters, for example news.google.com/home?hl=en-US&gl=US&ceid=US:en. Each edition has its own topic feeds, its own Top Stories carousels, and its own publisher pool. A publication wanting visibility across multiple country editions must satisfy each edition's eligibility independently.

The configuration in Publisher Center supports region targeting. A publisher can target a specific country, multiple countries, or global English. Publications with separate regional editorial operations should run separate Publisher Center configurations for each region.

13.2 Multilingual News Sites

Publications publishing in multiple languages use hreflang to declare the language and region of each article.

<link rel="alternate" hreflang="en-US" href="https://example-news.com/2026/05/article-slug/">
<link rel="alternate" hreflang="es-US" href="https://example-news.com/es/2026/05/article-slug/">
<link rel="alternate" hreflang="fr-CA" href="https://example-news.com/fr/2026/05/article-slug/">
<link rel="alternate" hreflang="x-default" href="https://example-news.com/2026/05/article-slug/">

The schema inLanguage property reinforces the language declaration, and translationOfWork links translated versions to the original. Cross reference framework-hreflang.md for hreflang implementation specifics.
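
Because every language version of an article should carry the same full set of alternates, including a reference to itself, it helps to generate the tags from one mapping. A minimal sketch follows; hreflang_links is a hypothetical helper and the mapping shape (language code to URL) is an assumption about how the CMS stores translations.

```python
from html import escape
from typing import Dict


def hreflang_links(alternates: Dict[str, str], default_lang: str) -> str:
    """Render <link rel="alternate"> tags plus an x-default entry."""
    if default_lang not in alternates:
        raise ValueError(f"default language {default_lang!r} has no URL")
    tags = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        for lang, url in alternates.items()
    ]
    tags.append(
        f'<link rel="alternate" hreflang="x-default" '
        f'href="{escape(alternates[default_lang])}">'
    )
    return "\n".join(tags)


if __name__ == "__main__":
    print(hreflang_links(
        {
            "en-US": "https://example-news.com/2026/05/article-slug/",
            "es-US": "https://example-news.com/es/2026/05/article-slug/",
            "fr-CA": "https://example-news.com/fr/2026/05/article-slug/",
        },
        default_lang="en-US",
    ))
```

Emitting the identical block on all three pages keeps the annotations reciprocal.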

13.3 News Sitemap Language Declaration

The news sitemap declares language per article via the news:language element. Each <url> entry carries <news:publication> with <news:name> and <news:language> (an ISO 639 language code such as en or es), <news:publication_date> in ISO 8601, and <news:title> matching the article H1. Multilingual publications generate parallel <url> entries per language code, each pointing at the language specific URL with the appropriate <news:language> value.
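
The parallel entry pattern can be sketched with the standard library XML writer, which also takes care of escaping characters such as & in headlines. This is an illustration only: build_news_sitemap and the tuple layout of each entry are assumptions, and the namespaces are the two used in the news sitemap elsewhere in this document.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("news", NEWS_NS)

# (url, language code, ISO 8601 publication date, title)
Entry = Tuple[str, str, str, str]


def build_news_sitemap(publication: str, entries: Iterable[Entry]) -> str:
    """Build a news sitemap with one <url> element per language version."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, language, published, title in entries:
        node = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(node, f"{{{SITEMAP_NS}}}loc").text = url
        news = ET.SubElement(node, f"{{{NEWS_NS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(pub, f"{{{NEWS_NS}}}name").text = publication
        ET.SubElement(pub, f"{{{NEWS_NS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = published
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = title
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_news_sitemap("Example News", [
        ("https://example-news.com/2026/05/article-slug/", "en",
         "2026-05-04T09:00:00-05:00", "Council & mayor agree on budget"),
        ("https://example-news.com/es/2026/05/article-slug/", "es",
         "2026-05-04T09:30:00-05:00", "El concejo y el alcalde acuerdan el presupuesto"),
    ]))
```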

13.4 Regional News Surfaces

Outside the Google ecosystem, regional news surfaces drive significant traffic in specific markets.

Yahoo News Japan. The dominant news aggregator in Japan. Inclusion requires a separate application through Yahoo Japan content partnerships, with 4 to 12 week approval cycles.

Naver News. The largest aggregator in South Korea and highly gatekept. Naver maintains a curated publisher list and rarely admits new publications; approved publications see substantial traffic.

Yandex News. Operational status has been complex since 2022 due to regulatory and political pressure.

Baidu News. Requires ICP licensing for content hosted in China. International publishers without China operations face significant barriers and typically target Chinese language Google News editions for diaspora audiences instead.

Apple News and Apple News Plus. Both use the Apple News Format (ANF) or an RSS feed; Apple News Plus is the paid tier with revenue share. Editions exist for the US, UK, Canada, and Australia.

Microsoft Start. The default news surface on Windows 11 and the Edge new tab. Publisher inclusion runs through Microsoft Start Publisher Center. Volume is lower than Google News but meaningful in Microsoft device dominant markets.

Cross reference framework-international.md for the broader international SEO concerns.


14. Bubbles Hosted News SEO Toolchain

14.1 The Self Hosted Stack

The reference deployment runs on Bubbles, the Debian amd64 host at 169.155.162.118. Bubbles serves nginx with multiple virtual hosts. Each news publication site lives at /var/www/sites/[domain]/. No third party CDN, no third party proxy, no external bot mitigation that would interfere with crawler access. The entire stack is operated directly.

Rendering options. Option A: Headless WordPress on Bubbles, frontend regenerated on publish via Astro or Hugo static build. Familiar CMS plus fast static output. Option B: custom Next.js app on Bubbles using Incremental Static Regeneration with a 60 second revalidate window. Modern stack with fine grained revalidation. Option C: pure SSG with Markdown in git, built by Astro or Hugo, triggered by git hook on push. Simplest stack with no database.

The news sitemap lives at /news-sitemap.xml, regenerates every 5 minutes via cron, includes articles from the last 48 hours, stays under the 1000 URL cap. Indexing uses IndexNow with a key file at /var/www/sites/[domain]/[key].txt and post publish hook integration. Google Search Console is verified with sitemap submitted. URL Inspection live submission is reserved for the highest priority breaking stories. Schema validation runs in CI on every changed article. Monitoring covers uptime on Bubbles itself, nginx access logs grepped daily for crawler activity, and reverse DNS verification on suspicious bot user agents.
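
The CI schema validation step can be as small as a function that returns a list of problems for each changed article. The sketch below covers a subset of the audit criteria (required properties, headline length, headline and H1 match, timezone on dates). The function names and the REQUIRED tuple are assumptions for illustration, not a complete validator and not a substitute for the Rich Results Test.

```python
from datetime import datetime
from typing import Dict, List

# Properties this sketch treats as required (drawn from the audit criteria).
REQUIRED = ("headline", "datePublished", "author", "publisher", "image")
MAX_HEADLINE = 110


def has_timezone(value: str) -> bool:
    """True when value parses as ISO 8601 and carries a UTC offset."""
    try:
        # Older Python versions reject a trailing "Z", so normalize it first.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return parsed.tzinfo is not None


def validate_news_article(data: Dict, h1: str) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"missing {key}" for key in REQUIRED if key not in data]
    if data.get("@type") != "NewsArticle":
        problems.append("@type is not NewsArticle")
    headline = data.get("headline", "")
    if len(headline) > MAX_HEADLINE:
        problems.append("headline longer than 110 characters")
    if headline != h1:
        problems.append("headline does not match H1")
    for key in ("datePublished", "dateModified"):
        if key in data and not has_timezone(data[key]):
            problems.append(f"{key} lacks an ISO 8601 timezone")
    return problems
```

In CI, a non empty result for any changed article would fail the build and print the list.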

14.2 The News Sitemap Cron

The news sitemap regeneration runs every 5 minutes. The script at /usr/local/bin/regenerate-news-sitemap.sh enumerates articles published in the last 48 hours from /var/www/sites/[domain]/articles/, builds a urlset with the news: namespace, writes each <url> entry with <news:publication> (name and language), <news:publication_date> (ISO 8601 from the article metadata), and <news:title> (the article headline), then atomically renames the temp file to news-sitemap.xml so nginx never serves a partial file.

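#!/bin/bash
# Regenerate the news sitemap from article metadata files (requires jq and GNU touch).
set -euo pipefail
DOMAIN="${1:-example-news.com}"
SITE_ROOT="/var/www/sites/${DOMAIN}"
SITEMAP_PATH="${SITE_ROOT}/news-sitemap.xml"

# Reference file whose mtime marks the start of the 48 hour window.
# Selection uses file modification time as a stand in for publication time.
CUTOFF_MARKER=$(mktemp)
trap 'rm -f "${CUTOFF_MARKER}"' EXIT
touch -d "48 hours ago" "${CUTOFF_MARKER}"

# Escape XML special characters that can appear in headlines and URLs.
xml_escape() { sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'; }

{
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
    echo '        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">'

    find "${SITE_ROOT}/articles" -name "*.json" -newer "${CUTOFF_MARKER}" | while read -r article; do
        URL=$(jq -r '.url' "${article}" | xml_escape)
        HEADLINE=$(jq -r '.headline' "${article}" | xml_escape)
        DATE_PUBLISHED=$(jq -r '.datePublished' "${article}")
        LANGUAGE=$(jq -r '.language // "en"' "${article}")
        cat <<URLENTRY
  <url>
    <loc>${URL}</loc>
    <news:news>
      <news:publication>
        <news:name>Example News</news:name>
        <news:language>${LANGUAGE}</news:language>
      </news:publication>
      <news:publication_date>${DATE_PUBLISHED}</news:publication_date>
      <news:title>${HEADLINE}</news:title>
    </news:news>
  </url>
URLENTRY
    done

    echo '</urlset>'
} > "${SITEMAP_PATH}.tmp"

# Rename within the same directory so nginx never serves a partial file.
mv "${SITEMAP_PATH}.tmp" "${SITEMAP_PATH}"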

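The cron entry, written in /etc/cron.d format (which takes a user field, here www-data; the log file must be writable by that user): */5 * * * * www-data /usr/local/bin/regenerate-news-sitemap.sh example-news.com >> /var/log/news-sitemap.log 2>&1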

14.3 The IndexNow Post Publish Hook

IndexNow is the push protocol for immediate indexing. Bing and Yandex support it natively. Google has not announced support for IndexNow, so Google discovery still depends on the news sitemap and Search Console. The post publish hook submits the URL on every publish:

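#!/bin/bash
# /usr/local/bin/indexnow-submit.sh
# Called from CMS post publish webhook with the published URL as argument

set -euo pipefail

URL="${1:?usage: indexnow-submit.sh <published-url>}"
KEY=$(cat "$(dirname "${0}")/indexnow-key")
# Field 4 of a URL split on "/" and ":" is the host (https://host/path).
HOST=$(echo "${URL}" | awk -F'[/:]' '{print $4}')

# Build the JSON body with jq so special characters in the URL are escaped.
PAYLOAD=$(jq -n --arg host "${HOST}" --arg key "${KEY}" --arg url "${URL}" \
    '{host: $host, key: $key, keyLocation: "https://\($host)/\($key).txt", urlList: [$url]}')

# --fail makes curl exit non zero on HTTP errors so the hook can log the failure.
curl -s --fail -X POST "https://api.indexnow.org/indexnow" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d "${PAYLOAD}"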
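
The hook above submits one URL per call. The protocol's urlList field also accepts several URLs for a single host in one request, which suits a bulk republish or a substantive update touching several articles. The sketch below only builds the request body and makes no network call; indexnow_payload is a hypothetical helper, and the key file location mirrors the keyLocation convention used in the hook.

```python
import json
from typing import List
from urllib.parse import urlsplit


def indexnow_payload(urls: List[str], key: str) -> str:
    """Build one IndexNow JSON body for URLs that share a single host."""
    hosts = {urlsplit(url).hostname for url in urls}
    if len(hosts) != 1 or None in hosts:
        raise ValueError("all URLs must be absolute and share one host")
    host = hosts.pop()
    return json.dumps({
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    })


if __name__ == "__main__":
    print(indexnow_payload(
        [
            "https://example-news.com/2026/05/article-slug/",
            "https://example-news.com/2026/05/second-article/",
        ],
        key="0123456789abcdef0123456789abcdef",  # placeholder key
    ))
```

Rejecting mixed hosts up front matters on a server that runs several publications, where a batch could otherwise mix domains under the wrong key.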

14.4 The nginx Configuration

The nginx configuration for a news publication on Bubbles handles HTTP to HTTPS redirect, www to non www redirect, TLS termination, document root at /var/www/sites/example-news.com/, dedicated location blocks for /news-sitemap.xml (with 300 second cache), /sitemap.xml (3600 second cache), /feed/ (mapping to the RSS XML), /llms.txt, and the IndexNow key file at the site root. Static assets get 30 day immutable cache headers. Known bad bots (Semrush, Ahrefs, MJ12bot, DotBot, PetalBot) get a 403 at the nginx level. For every other crawler, the robots.txt at the site root is the single source of truth for crawler permissions.

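# Redirect plain HTTP (www and non www) to the canonical HTTPS host.
server {
    listen 80;
    server_name example-news.com www.example-news.com;
    return 301 https://example-news.com$request_uri;
}

# Redirect HTTPS www to non www. Assumes the certificate also covers www.example-news.com.
server {
    listen 443 ssl http2;
    server_name www.example-news.com;
    ssl_certificate /etc/letsencrypt/live/example-news.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example-news.com/privkey.pem;
    return 301 https://example-news.com$request_uri;
}

server {
    listen 443 ssl http2;
    server_name example-news.com;

    ssl_certificate /etc/letsencrypt/live/example-news.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example-news.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    root /var/www/sites/example-news.com;
    index index.html;

    # Block known bad bots before any location is evaluated.
    if ($http_user_agent ~* (semrush|ahrefs|mj12bot|dotbot|petalbot)) {
        return 403;
    }

    # "types { }" clears the extension map so default_type takes effect.
    location = /news-sitemap.xml {
        types { }
        default_type application/xml;
        add_header Cache-Control "public, max-age=300";
    }

    location = /sitemap.xml {
        types { }
        default_type application/xml;
        add_header Cache-Control "public, max-age=3600";
    }

    location = /feed/ {
        types { }
        default_type application/rss+xml;
        try_files /feed/index.xml =404;
    }

    location = /llms.txt { types { } default_type text/markdown; }
    location ~ ^/[a-f0-9]{32}\.txt$ { default_type text/plain; }

    location / { try_files $uri $uri/index.html $uri.html =404; }

    # "expires" emits the max-age directive; add_header adds public and immutable.
    location ~* \.(jpg|jpeg|png|gif|svg|webp|avif|css|js|woff2)$ {
        expires 30d;
        add_header Cache-Control "public, immutable";
    }

    access_log /var/log/nginx/example-news.com.access.log;
}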

14.5 The robots.txt

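The robots.txt explicitly allows news relevant crawlers and AI engine crawlers. Each entry is a separate User-agent: block with Allow: / to ensure parsers that read only the most specific block do not fall through to a restrictive wildcard. The catchall wildcard at the bottom permits the rest of the open web while disallowing administrative paths. The same rule cuts the other way: a crawler that matches a named block ignores the wildcard block, so the administrative Disallow lines apply only to unnamed crawlers. Repeat them inside a named block if that crawler must also stay out of those paths.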

User-agent: Googlebot
Allow: /

User-agent: Googlebot-News
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Bingbot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /?s=

Sitemap: https://example-news.com/sitemap.xml
Sitemap: https://example-news.com/news-sitemap.xml
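
A quick regression check for this file can run in CI with Python's standard library robots.txt parser. The sketch below reports, per crawler token, whether a given URL may be fetched. crawler_access and the AI_CRAWLERS tuple are illustrative names. One caveat: the standard library parser applies the rules of a group in file order, while Google documents most specific match precedence, so results for groups that mix Allow and Disallow can differ from Googlebot's behavior.

```python
from typing import Dict, Iterable
from urllib.robotparser import RobotFileParser

# Tokens checked by default (illustrative list).
AI_CRAWLERS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def crawler_access(robots_txt: str, url: str,
                   agents: Iterable[str] = AI_CRAWLERS) -> Dict[str, bool]:
    """Report, per user agent token, whether robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    with open("/var/www/sites/example-news.com/robots.txt", encoding="utf-8") as handle:
        report = crawler_access(handle.read(), "https://example-news.com/")
    blocked = [agent for agent, allowed in report.items() if not allowed]
    if blocked:
        raise SystemExit(f"blocked crawlers: {', '.join(blocked)}")
    print("all listed crawlers allowed")
```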

14.6 The Editorial Workflow

Drafts live in headless WordPress. Review involves the assigning editor, a copy editor, and a fact checker, followed by a manual rich results test on staging. Legal review applies to stories involving named individuals, allegations, or court records. Publishing triggers the Astro or Hugo rebuild, news sitemap regeneration, IndexNow submission, RSS regeneration, and social sharing. Minor edits ship without a dateModified bump; substantive edits ship with the bump and a sitemap re push. Articles older than 48 hours drop out of the news sitemap, consistent with the window in 14.1, and remain in the standard sitemap.

The pattern is repeatable. The same Bubbles host runs multiple publications simultaneously. Per-publication incremental cost is minimal.


15. Audit Mode

# Criterion Pass/Fail
NS1 Google News Publisher Center configured and approved
NS2 NewsArticle schema present on every news article
NS3 Schema headline matches article H1 exactly
NS4 Schema headline 110 characters or fewer
NS5 datePublished present with ISO 8601 timezone
NS6 dateModified updated for substantive updates
NS7 Author Person schema with sameAs, knowsAbout, hasCredential
NS8 Publisher Organization schema with logo at correct dimensions
NS9 Image at minimum 1200 px wide in NewsArticle and og:image
NS10 News sitemap configured at /news-sitemap.xml
NS11 News sitemap regenerates within 5 minutes of publication
NS12 Standard sitemap also includes news content
NS13 RSS feed configured and publication available
NS14 Masthead page present with named editors and reporters
NS15 Corrections policy page present with public log
NS16 Ethics policy page present
NS17 Ownership disclosure page present
NS18 Funding disclosure page present
NS19 Fact checking policy page present
NS20 Editor contact information visible
NS21 Every byline links to a real author page
NS22 Author pages have minimum 200 word bio with credentials
NS23 GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot all allowed in robots.txt
NS24 Google-Extended allowed in robots.txt
NS25 llms.txt present and current
NS26 LCP under 2.5 seconds on mobile
NS27 INP under 200 milliseconds
NS28 CLS under 0.1
NS29 Mobile experience excellent on real device test
NS30 Paywall schema present if applicable, isAccessibleForFree set correctly
NS31 IndexNow configured and submitting on publish
NS32 Beat coverage consistent over past 6 months
NS33 No re dating of old articles to game freshness
NS34 Corrections handled per AP Stylebook standards with visible notation
NS35 No generic Editorial Team or Staff bylines

Maximum score: 35, one point per passed criterion. World class: 32 or more. Strong: 28 to 31. Adequate: 22 to 27. Failing: below 22.
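
For teams that track the audit in a script or spreadsheet export, the scoring bands translate directly into a small function. This is a sketch: audit_tier and the dictionary shape (criterion ID to pass or fail) are assumptions for illustration.

```python
from typing import Dict, Tuple


def audit_tier(results: Dict[str, bool]) -> Tuple[int, str]:
    """Map pass/fail results for NS1..NS35 to (score, tier name)."""
    score = sum(1 for passed in results.values() if passed)
    if score >= 32:
        tier = "World class"
    elif score >= 28:
        tier = "Strong"
    elif score >= 22:
        tier = "Adequate"
    else:
        tier = "Failing"
    return score, tier


if __name__ == "__main__":
    # Example: every criterion passes except NS25 and NS31.
    results = {f"NS{i}": True for i in range(1, 36)}
    results["NS25"] = False
    results["NS31"] = False
    print(audit_tier(results))
```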


End of Framework Document
