SEO & AI Engine Optimization Framework · May 2026

Initial Audit: full-site SEO baseline, prioritization, scoring

By Joseph W. Anady — Founder & Lead Engineer, ThatDevPro (BA Computer Engineering, MA Cybersecurity) · Updated May 2026

The Canonical 2026 Reference for Auditing a Newly Engaged Client Across SEO, AEO, AIO, and GEO Surfaces

A comprehensive installation and audit reference for the initial client engagement audit. The initial audit is the load bearing foundation that every retainer activity points back to. It establishes baseline state, identifies issues, surfaces opportunities, and produces the deliverable that scopes every subsequent month. Distinct from framework-ongoingaudit.md, which specifies the recurring monthly and quarterly audit cadence, this document specifies the one time comprehensive evaluation performed in the first two to four weeks of a new engagement.

Cross stack implementation note: code samples are written in plain HTML and bash. For React, Vue, Svelte, Next.js, Nuxt, SvelteKit, Astro, Hugo, 11ty, Remix, WordPress, Shopify, and Webflow equivalents of any visible page patterns, see framework-cross-stack-implementation.md. For pure client rendered SPAs see framework-react.md.

Quick answer

The initial audit is a one-time comprehensive evaluation performed in the first two to four weeks of a new client engagement. It establishes baseline state, identifies issues and opportunities across the 14-tier engine optimization stack, and produces a 12-page prioritized deliverable that scopes the next 90 days. It spans SEO, AEO, AIO, and GEO surfaces, runs 16 to 80 hours, and every retainer activity points back to it.

1. Document Purpose

1.1 What This Document Is

The initial client audit is the single most consequential deliverable produced in any engagement. It establishes the baseline, frames the scope, justifies the pricing, builds the trust that converts a discovery call to a signed retainer, and produces the prioritized roadmap that drives the first 90 days. Every framework in this library is referenced by the initial audit. The audit is the integration point.

Read Section 2 to collect client variables. Section 3 to pull data. Sections 4 through 12 to perform the audit across the 14 tiers. Section 13 to produce the deliverable. Section 14 to run the toolchain on Bubbles.

1.2 The Initial Audit vs the Ongoing Audit

The initial audit is performed once at engagement onset. It covers every framework, every page sampled, every dimension. The deliverable is a 12 page report. Effort range 16 to 80 hours.

framework-ongoingaudit.md covers the recurring monthly and quarterly audits. Recurring audits focus on delta from baseline, new issues, freshness of priority pages, and citation drift. Recurring audits run on a sampling cadence rather than a full site sweep. The initial audit establishes the comparison points.

1.3 Multiple Purposes Served

Baseline establishment, issue identification, opportunity identification, engagement scoping, pricing justification, trust building, documentation. A common failure mode is collapsing these into one: the audit gets executed as a problem hunt, the deliverable reads as a complaint list, the pricing conversation becomes adversarial. Treat each purpose as a separate audit pass.

1.4 Three Operating Modes

Mode A, Light Audit, 4 to 8 hours. For discovery calls and lead qualification. Output: a one to three page summary with go or no go recommendation.

Mode B, Standard Audit, 16 to 30 hours. For most paid client engagements at Tier 2 and Tier 3 pricing. Output: the 12 page deliverable specified in Section 13.

Mode C, Deep Audit, 40 to 80 hours and beyond. For enterprise engagements, complex sites, or audit only engagements. Output: a 25 to 50 page deliverable.

This framework specifies the Standard Audit. Light is a subset. Deep extends with additional analyses.

1.5 Required Tools and Access

GSC verified with 16 months of data, GA4 property access with 13 months, Bing Webmaster Tools verified, GBP management access if local, Screaming Frog SEO Spider, Sitebulb, server access logs (30 days minimum), Ahrefs or Semrush full account, CMS administrator access. Access provisioning typically takes 3 to 10 business days. Do not start the audit until all access is verified working.

2. Client Variables Intake

The intake captures every dimension across the 14 tiers of the engine optimization stack. Stored at /var/www/sites/[domain]/audit/initial/intake.yaml.

# INITIAL AUDIT FRAMEWORK CLIENT VARIABLES

# --- Business and Site Identity (REQUIRED) ---
business_name: ""
business_dbas: []
primary_domain: ""
secondary_domains: []
industry: ""
ymyl_classification: ""              # full_ymyl, partial_ymyl, lite_ymyl, non_ymyl
geographic_focus: ""                 # local, regional, national, multi_national
service_areas: []
years_in_business: 0
business_model: ""                   # b2b, b2c, marketplace, publisher, saas, local_service
employee_count: 0
revenue_band: ""

# --- Tier 1: Hosting and Rendering ---
hosting_provider: ""
hosting_type: ""                     # shared, vps, dedicated, cloud, self_hosted
http2_enabled: false
http3_enabled: false
rendering_pattern: ""                # static_html, ssr, ssg, csr_spa, hybrid
cms_platform: ""

# --- Tier 2: On-Page ---
title_tag_pattern_documented: false
breadcrumb_implementation: false
canonical_tag_coverage_percent: 0

# --- Tier 3: Content Quality ---
total_indexable_pages: 0
content_publishing_cadence: ""
author_bylines_present: false
author_pages_with_credentials: false
reviewer_credit_for_ymyl: false
hcs_score_estimate: 0                # out of 100
eeat_score_estimate: 0               # out of 130
infogain_categories_active: []

# --- Tier 4: Links ---
referring_domains_total: 0
domain_rating: 0
toxic_link_percent_estimate: 0
orphan_page_count: 0

# --- Tier 5: Schema ---
core_graph_present: false            # Organization, WebSite, WebPage, Person
schema_validation_pass_rate: 0
schema_types_in_use: []
sameas_network_completeness: 0
wikidata_qid: ""
knowledge_panel_present: false

# --- Tier 6: Technical ---
robots_txt_audited: false
sitemap_xml_audited: false
llms_txt_present: false
hreflang_implementation: ""
core_web_vitals_pass_percent: 0
javascript_required_for_primary_content: false

# --- Tier 7: UX ---
mobile_friendly_test_pass: false
accessibility_audit_completed: false
wcag_level: ""

# --- Tier 8: Local ---
gbp_claimed: false
gbp_completeness_score: 0
nap_consistency_score: 0
review_count_total: 0
review_average_rating: 0
local_pack_ranking_position: 0

# --- Tier 9: Multimedia ---
image_alt_coverage_percent: 0
video_count: 0
video_schema_present: false

# --- Tier 10: AI Surface ---
gptbot_allowed: false
claudebot_allowed: false
google_extended_allowed: false
perplexitybot_allowed: false
applebot_allowed: false
priority_queries_cited_in_aio: []
priority_queries_cited_in_chatgpt: []
priority_queries_cited_in_perplexity: []
priority_queries_cited_in_gemini: []
ai_citation_gap_percent: 0

# --- Tier 11: Branded ---
brand_search_volume_monthly: 0
brand_search_trend: ""
press_coverage_last_12_months: 0

# --- Tier 12: Monetization ---
primary_conversion_event: ""
sales_cycle_days_average: 0
attribution_model_in_use: ""
crm_platform: ""

# --- Tier 13: Analytics ---
ga4_property_id: ""
gsc_property_verified: false
bigquery_export_enabled: false
consent_mode_v2_implemented: false

# --- Tier 14: Governance ---
publishing_workflow_documented: false
editorial_review_required: false

# --- Engagement Variables ---
engagement_tier: ""
audit_scope: ""                      # light, standard, deep
audit_budget_hours: 0

The intake is collected over three sessions: a kickoff call, a follow up call for gaps, and a written questionnaire. The intake is locked at the start of the audit and updates after audit start are noted as addenda.

3. Pre-Audit Data Pull Checklist

Pull the data and verify every export before the audit begins. Missing data discovered mid audit forces re sequencing and adds days to delivery.

3.1 Google Search Console Exports

Pull 16 months of Performance data, the maximum GSC retains. Queries, Pages, Countries, Devices, and Search appearance exports per property. For sites above 5000 monthly clicks, use the GSC API to bypass row limits. The script at ~/Code/tdg-tools/gsc.py references the service account ai-403@encoded-equator-335209.iam.gserviceaccount.com and pulls full unsampled query data for any siteFullUser verified property.

Pull additionally: Coverage (labeled Page indexing in the current GSC interface; indexed, excluded, error counts), Sitemaps report, Enhancements report (structured data), Manual actions, Security issues, Links report, Page experience report.

3.2 Google Analytics 4 Exports

Pull 13 months of GA4 data, which fits inside the 14 month event data retention setting. GA4 properties default to 2 months of retention, so confirm the setting before relying on Explorations for the full window. Six reports: acquisition by default channel grouping, acquisition by source and medium, engagement by landing page, conversions by event name, pages and screens, tech overview. If BigQuery export is enabled, pull the raw events table for the audit window. If not enabled, recommend enablement.

3.3 Bing Webmaster Tools

Bing confirms or contradicts Google patterns. Pull search performance for the 6 month window Bing retains, indexed pages count, sitemap status, crawl information. If the client has not verified in Bing, complete verification during the audit.

3.4 Google Business Profile Insights

For local clients, pull profile views (search, maps), customer actions (calls, directions, website clicks, bookings), search queries used to find the profile, photo views. For multi location clients, pull insights per location.

3.5 Ahrefs or Semrush Full Export

Ahrefs Site Explorer: referring domains, backlinks, organic keywords, top pages, competing domains, content gap. Semrush analogs: Backlink Analytics, Position Tracking, Organic Research, Keyword Gap, Domain Analytics. For comprehensive audits, run both to cross verify.

3.6 Screaming Frog and Sitebulb Crawls

Run both crawlers in parallel per Section 4. Save the Screaming Frog crawl as a .seospider file. Export all tabs to CSV. The Sitebulb hint engine surfaces patterns the tabular output buries. Output sizes typically 200 to 800 MB for a midsize site.

3.7 Server Access Logs

Server access logs are the single most underused audit input. Logs reveal bot crawl behavior the GSC Crawl stats report aggregates and obscures. Pull 30 days minimum of raw access logs. For sites on Bubbles, logs live at /var/log/nginx/[domain].access.log and rotate daily.

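# zcat -f also passes through uncompressed rotations (for example .log.1 under logrotate delaycompress)
sudo zcat -f /var/log/nginx/[domain].access.log.* > /tmp/access-30d.log
# Append the live, not yet rotated log
sudo cat /var/log/nginx/[domain].access.log >> /tmp/access-30d.log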

For sites not self hosted, request logs from the hosting provider. If logs are unavailable, note the limitation and rely on GSC Crawl stats.
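
Once the 30 day log is assembled, a useful first pass is a hit count per bot. The sketch below assumes the nginx default combined log format, where the user agent is the last double quoted field on each line. The bot list and the function name count_bot_hits are illustrative and are not part of the Bubbles toolchain.

```python
import re
from collections import Counter

# Substrings to look for in the User-Agent field (illustrative list, extend as needed).
BOT_TOKENS = ["Googlebot", "bingbot", "GPTBot", "ClaudeBot", "PerplexityBot"]

# In the combined log format the user agent is the final quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_bot_hits(lines):
    """Return a Counter of hits per bot token for an iterable of log lines."""
    hits = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not end in a quoted field
        user_agent = match.group(1).lower()
        for token in BOT_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break  # count each request once
    return hits

sample = [
    '1.2.3.4 - - [01/May/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '1.2.3.5 - - [01/May/2026:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '1.2.3.6 - - [01/May/2026:10:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]
print(count_bot_hits(sample))
```

User agent strings can be spoofed, so treat these counts as a starting point and confirm suspicious volumes against the crawler's published IP ranges or reverse DNS.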

3.8 Sitemap.xml, robots.txt, llms.txt

Pull the live discovery and access policy files.

mkdir -p /tmp/audit
# -o saves the body; -w prints the HTTP status so a 404 is visible instead of silently saved
for f in robots.txt sitemap.xml llms.txt ai.txt; do
  curl -A "Mozilla/5.0 (audit)" -s -o "/tmp/audit/$f" -w "$f %{http_code}\n" "https://[domain]/$f"
done

A 404 on llms.txt or ai.txt is expected for many sites and indicates an opportunity. A 404 on robots.txt or sitemap.xml is a critical finding requiring remediation in the first 30 days.

3.9 Schema Markup Audit

Three schema validation approaches in parallel. Schema App or equivalent schema crawler reports schema type coverage. Google Rich Results Test on priority page templates (home, blog post, product, location, contact, about, author). Schema.org Validator pass on the same templates for syntactic correctness. The result is a three column table per template: Google rich result eligibility, Schema.org syntactic correctness, completeness against the reference graph from framework-schema.md.

3.10 Data Pull Manifest

Produce a manifest at /var/www/sites/[domain]/audit/initial/manifest.md listing every pulled data source with date pulled, file path, and sha256 checksum. The manifest is the chain of custody.
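
A manifest of this kind can be generated rather than typed. The following is a minimal sketch, assuming the pulled files already sit on disk; the function names and the markdown table layout are illustrative choices, not a format the framework prescribes.

```python
import hashlib
from datetime import date

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large crawl exports do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(paths, pulled_on=None):
    """Return manifest text with one row per file: date pulled, path, checksum."""
    pulled_on = pulled_on or date.today().isoformat()
    rows = ["| Date pulled | File path | SHA-256 |", "| --- | --- | --- |"]
    for path in paths:
        rows.append(f"| {pulled_on} | {path} | {sha256_of(path)} |")
    return "\n".join(rows) + "\n"
```

Writing the returned text to manifest.md at the end of the data pull keeps the checksums tied to the exact files the audit used.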

4. Crawl Architecture Audit

The crawl is the most important data collection step. Every other audit step references the crawl output.

4.1 Screaming Frog Configuration

Spider scope: include subdomains where the audit covers them, exclude external links from depth counts. Behavior: 5 to 10 concurrent threads (lower for shared hosting, higher for dedicated), respect robots.txt, follow nofollow when link graph completeness matters, ignore noindex during crawl but report in output, store HTML on disk for AI parsability comparison.

Rendering: JavaScript rendering with Headless Chrome, timeout 5 seconds for fast sites and 10 for slow, 1366x768 desktop and 360x640 mobile, capture both rendered and raw HTML.

User agent: rotate across Googlebot Mobile, Googlebot Desktop, GPTBot, ClaudeBot, PerplexityBot. The default Screaming Frog UA produces different responses than real bots on sites with bot specific routing. Run separate crawls per UA and compare for cloaking patterns.

Custom extraction: XPath or CSS selectors for page template identifier, author byline, publish date, modified date, schema JSON LD blocks, CTA button text, pricing display, trust badge presence.

Crawl size budget: under 1000 pages 30 to 90 minutes. 1000 to 10000 in 4 to 12 hours. 10000 to 100000 overnight to several days.

4.2 Sitebulb Configuration

Sitebulb runs in parallel. Hint based reporting and visual crawl mapping surface patterns the Screaming Frog tabular output buries. Full audit with all modules enabled. Crawl source: XML sitemap plus discovered URLs. JavaScript enabled. The hint engine surfaces 250 plus specific issue patterns from critical to advisory.

4.3 Custom Python Audit Scripts

Bubbles audit scripts at /home/user/audits/scripts/ cover patterns commercial tools miss.

python3 /home/user/audits/scripts/substrate-check.py --domain [domain] --urls /tmp/audit/priority.txt
python3 /home/user/audits/scripts/schema-completeness.py --domain [domain] --reference /home/user/audits/refs/schema-graph.json
python3 /home/user/audits/scripts/aio-gap.py --queries /tmp/audit/priority-queries.txt --engines aio,chatgpt,perplexity,gemini,claude,copilot
python3 /home/user/audits/scripts/llms-audit.py --domain [domain]

Output at /var/www/sites/[domain]/audit/initial/scripts/.

4.4 The Crawl Comparison Diagnostic

Run three crawls in parallel and compare URL counts. A: Screaming Frog default UA, no JavaScript rendering. B: Screaming Frog Googlebot Mobile UA, JavaScript rendering enabled. C: curl -A "GPTBot" pass on priority URLs inspecting first byte HTML.

If A produces 1200 URLs, B produces 1800, and C reveals priority pages with no H1 or lede in first byte, the site has a content first failure per framework-contentfirst.md. The 600 URL gap between A and B is JavaScript discoverable content invisible to first byte AI parsing. The C failure means the priority pages do not pass the AIO substrate test from framework-aioverviews.md Section 6.9.
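
The comparison can be sketched in a few lines. This example assumes crawl A and crawl B are available as lists of URLs and that the first byte HTML for each priority URL has already been fetched (for instance with the curl pass in C). It checks only for a non empty H1; the lede check is left out. Class and function names are illustrative.

```python
from html.parser import HTMLParser

class H1Finder(HTMLParser):
    """Records whether a non-empty <h1> appears in the HTML fed to it."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.has_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1 and data.strip():
            self.has_h1 = True

def first_byte_has_h1(raw_html):
    """True when the unrendered HTML already contains H1 text."""
    finder = H1Finder()
    finder.feed(raw_html)
    return finder.has_h1

def js_only_urls(crawl_a, crawl_b):
    """URLs discovered only with JavaScript rendering on (crawl B minus crawl A)."""
    return set(crawl_b) - set(crawl_a)
```

A priority URL that appears in js_only_urls, or that fails first_byte_has_h1, is a candidate for the content first finding described above.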

4.5 Crawl Output Storage

Store crawl outputs at /mnt/storage/audits/[domain]/initial/crawls/. Each crawl gets a timestamped directory with the source file, exported CSVs, custom extraction outputs, and a manifest. Archived crawls enable the recurring audit cadence to perform delta analysis.

5. Indexability Diagnostic

Indexability is the prerequisite for ranking, citation, and conversion. A page that is not indexed cannot rank in classic SEO, cannot be cited in AIO, and cannot drive traffic. Indexability is the first finding in the audit deliverable.

5.1 The Three Circle Venn Diagram

Three sets of URLs. Set A: URLs the crawler discovered (Screaming Frog output, reachable from the homepage). Set B: URLs declared in the XML sitemap. Set C: URLs Google has actually indexed (GSC Coverage report and site: operator).

Seven regions, each with a specific diagnosis.

A only: intentional (admin URLs, faceted traps) or unintentional (orphaned content). Investigation required.

B only: the sitemap makes claims the site does not back up. Usually a stale sitemap not regenerated after URL changes.

C only: phantom URLs Google retains from prior site versions, parameter variants, or moved content without redirects. Often the source of duplicate content issues.

A and B not C: the "I asked Google to index this and it refused" set. Usual causes: quality, a manual action, crawl budget, or technical blocks.

A and C not B: indexed despite no sitemap inclusion. Often product variants or paginated content.

B and C not A: the crawler missed it. JavaScript rendering issue, login wall, or include rule configuration error.

A and B and C: the healthy intersection.

5.2 The Drift Metric

Drift = (A only + B only + C only) / (A + B + C) * 100

Below 5 percent is healthy. 5 to 15 percent indicates minor issues. 15 to 30 percent indicates systemic problems. Above 30 percent indicates crisis where indexability is the dominant blocker. The drift metric appears in the executive summary.
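
The seven regions and the drift metric reduce to set operations. The sketch below makes one assumption worth stating loudly: the denominator is read as the number of unique URLs across the three sets (the union), since summing the three set sizes would count shared URLs more than once. Function names are illustrative.

```python
def venn_regions(a, b, c):
    """Split three URL collections into the seven Venn regions."""
    a, b, c = set(a), set(b), set(c)
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "a_b_not_c": (a & b) - c,
        "a_c_not_b": (a & c) - b,
        "b_c_not_a": (b & c) - a,
        "a_b_c": a & b & c,
    }

def drift_percent(a, b, c):
    """Share of unique URLs that appear in exactly one of the three sets.

    Assumption: the denominator is the union of A, B, and C.
    """
    regions = venn_regions(a, b, c)
    total = len(set(a) | set(b) | set(c))
    if total == 0:
        return 0.0
    single = len(regions["a_only"]) + len(regions["b_only"]) + len(regions["c_only"])
    return single * 100 / total

crawled = {"/", "/a", "/b", "/orphan"}
sitemap = {"/", "/a", "/b", "/stale"}
indexed = {"/", "/a", "/old"}
print(drift_percent(crawled, sitemap, indexed))
```

URLs should be normalized (scheme, host case, trailing slash) before comparison, otherwise formatting differences inflate the single set regions.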

5.3 GSC Coverage Report Drill Down

The Coverage report categorizes excluded URLs by reason.

Excluded by noindex tag: verify every excluded URL's noindex is intentional. Common accidental sources: theme defaults applied during migration, staging settings carried to production, template variables not set per page.

Crawled not indexed: Google saw the page and decided not to index. Usually quality, duplication, or low priority. Sample 20 URLs and cross reference with the Section 6 content audit.

Discovered not indexed: Google knows the URL exists but has not crawled. Crawl budget issue. Common on very large sites and sites where the internal link graph buries the URL.

Duplicate Google chose different canonical: surfaces canonical declaration errors.

Duplicate no user selected canonical: add canonical tags.

Redirect: verify intentional.

Soft 404: returns 200 but Google detected the page as effectively empty. Critical finding requiring content review or proper 404 status.

5.4 The site: Operator Cross Check

The site:[domain] operator returns the rough indexed URL count plus a sample. Cross check the GSC indexed count against site: results. Drift between them surfaces sampling and counting differences and reveals phantom indexed URLs the GSC report under counts.

5.5 Robots.txt and Meta Robots Audit

curl -A "Googlebot" -s https://[domain]/robots.txt
curl -A "GPTBot" -s https://[domain]/robots.txt
curl -A "ClaudeBot" -s https://[domain]/robots.txt

Identify User-agent and Disallow combinations blocking critical content. Common issues: a Disallow from migration never removed, a blanket User-agent: * Disallow: / accidentally deployed, AI bot Allow rules missing. Per page meta robots tags appear in Screaming Frog crawl output. Cross reference with the GSC Coverage excluded by noindex bucket.
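
The per bot check can be automated with Python's standard library robots.txt parser. This is a sketch only: the function name is illustrative, and urllib.robotparser applies the first matching rule in file order, which can differ from Google's longest match behavior on files that mix Allow and Disallow rules for the same path.

```python
from urllib.robotparser import RobotFileParser

def blocked_agents(robots_txt, url, agents):
    """Return the user agents from `agents` that robots.txt blocks from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]
```

Running it against each priority URL with Googlebot, GPTBot, ClaudeBot, and PerplexityBot produces the list of blocked combinations to review by hand.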

5.6 XML Sitemap Audit

Completeness: every indexable URL appears in the sitemap or a referenced sub sitemap. Stale URLs are removed. Image and video sitemaps exist where multimedia is significant.

Accuracy: lastmod dates reflect actual content modifications. Priority and changefreq are honest or omitted.

Sitemap index pattern for large sites: a sitemap index at /sitemap.xml references multiple sub sitemaps. Each stays under the 50000 URL limit.
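
A first pass over a single sitemap file can be scripted. The sketch below assumes a standard sitemaps.org urlset document passed in as text; it counts URLs, checks the 50000 URL limit, and lists entries without lastmod. It does not follow sub sitemaps from an index. Names are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50000

def audit_sitemap(xml_text):
    """Summarize one sitemap file: type, URL count, limit check, missing lastmod."""
    root = ET.fromstring(xml_text)
    urls = root.findall("sm:url", NS)
    missing_lastmod = [
        u.findtext("sm:loc", default="", namespaces=NS).strip()
        for u in urls
        if not u.findtext("sm:lastmod", default="", namespaces=NS).strip()
    ]
    return {
        "is_index": root.tag.endswith("sitemapindex"),
        "url_count": len(urls),
        "over_limit": len(urls) > MAX_URLS,
        "missing_lastmod": missing_lastmod,
    }
```

Whether lastmod values are honest still needs the comparison against actual content changes described under Accuracy; this only finds entries where the field is absent.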

5.7 Indexability Findings Output

Deliverable indexability section: Venn counts, drift metric, GSC Coverage breakdown, robots.txt and sitemap findings, prioritized remediation. The remediation list is the foundation of Phase 1 work in the 90 day roadmap.

6. Content Quality Baseline

Content quality is the single highest leverage dimension in 2026. Helpful Content System updates, the September 2025 SQRG update, and AI Overview citation decoupling have elevated content quality from a Tier 3 concern to a Tier 1 gating factor. The content audit samples 50 pages stratified by traffic decile and scores each against three rubrics: HCS, E-E-A-T, and Information Gain.

6.1 The Stratified Sample

A pure random sample under represents top traffic pages. Pull the past 90 days of organic landing page traffic from GA4. Sort by sessions descending. Divide into 10 deciles. Sample 5 pages from each decile, weighted toward the top.

For sites with more than 1000 indexable pages, supplement with 10 audit lead judgment pages: known problem pages flagged in intake, pages cited in past competitor reviews, pages targeting strategic priority queries.
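
The decile sample can be drawn programmatically. This sketch assumes the GA4 export has been reduced to (url, sessions) pairs, and it reads "weighted toward the top" as taking the highest session pages within each decile. That reading is an assumption; a random draw within each decile is an equally valid interpretation.

```python
def stratified_sample(pages, per_decile=5, deciles=10):
    """Pick pages from each traffic decile.

    pages: list of (url, sessions) tuples.
    Returns URLs: the top `per_decile` pages by sessions within each decile
    (assumed reading of "weighted toward the top").
    """
    ranked = sorted(pages, key=lambda page: page[1], reverse=True)
    total = len(ranked)
    sample = []
    for d in range(deciles):
        start = d * total // deciles
        end = (d + 1) * total // deciles
        sample.extend(url for url, _ in ranked[start:end][:per_decile])
    return sample
```

With 200 landing pages this returns 50 URLs, five from each block of 20. Sites with fewer than 50 landing pages return every page.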

6.2 Helpful Content System Scoring

framework-hcs.md defines 18 signals. The audit applies a 12 signal subset, deferring full scoring to framework-contentaudit.md.

The 12 signals: original information, substantial value compared to peers, useful summary without padding, insightful analysis beyond reporting, first hand experience evident, author authority for topic, trustworthy signals through citations and transparency, honest title without clickbait, people first writing, topic mastery beyond surface, satisfying reader, no SEO manipulation. Each scores 0 to 5. Site HCS estimate is the mean across the 50 page sample.

6.3 E-E-A-T Scoring

framework-eeat.md covers 26 dimensions producing 130 points. The initial audit applies the 17 dimension subset listed below.

Experience: first person account, specific events with dates, photos or evidence of activity, named contexts, evidence of duration. Expertise: author credentials displayed, credentials relevant to topic, prior published work referenced, professional affiliations linked. Authoritativeness: site mentioned in third party sources, Wikipedia or Wikidata presence, industry directory listings, professional licensing verified. Trustworthiness: clear contact information, About page describing business, no manipulative claims, accurate dates.

Each scores 0 to 5. Site E-E-A-T estimate is the mean.
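
The rubric arithmetic for HCS and E-E-A-T can be sketched as follows. The sections above give a 0 to 5 score per item and a site mean, while the intake records hcs_score_estimate out of 100 and eeat_score_estimate out of 130. Rescaling the subset scores onto those ranges is an assumption made here so the numbers are comparable; the framework does not state the conversion.

```python
def page_score(item_scores, scale=100, max_per_item=5):
    """Rescale one page's 0 to 5 item scores onto a 0 to `scale` range (assumed conversion)."""
    return sum(item_scores) * scale / (len(item_scores) * max_per_item)

def site_estimate(sampled_pages, scale=100, max_per_item=5):
    """Mean of the rescaled page scores across the sample."""
    scores = [page_score(items, scale, max_per_item) for items in sampled_pages]
    return sum(scores) / len(scores)
```

For example, site_estimate(pages, scale=100) suits the HCS estimate and site_estimate(pages, scale=130) the E-E-A-T estimate, where each entry in pages is the list of item scores for one sampled page.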

6.4 Information Gain Scoring

framework-infogain.md defines 10 categories. Each page is scored as present or absent per category.

The 10 categories: novel data and original research, first hand observation and lived experience, edge case coverage and contrarian findings, comparison synthesis across separate documents, updated information correcting outdated common knowledge, specific instance vs general principle, quoted expert with verifiable credentials, named process or methodology proprietary to the source, geographic or temporal specificity, procedural detail beyond summary sources.

A page with 0 categories is a removal or rewriting candidate. A page with 5 or more is performing the Information Gain function and is a citation candidate.
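
The two thresholds translate directly into a classifier. In this sketch the label for pages with 1 to 4 categories ("review") is a placeholder introduced for illustration; the framework names only the two ends of the range.

```python
def classify_infogain(categories_present):
    """Map the set of Information Gain categories found on a page to an action label."""
    count = len(set(categories_present))
    if count == 0:
        return "remove_or_rewrite"
    if count >= 5:
        return "citation_candidate"
    return "review"  # hypothetical label for the 1 to 4 range
```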

6.5 Stale Date and Manipulation Detection

The audit flags content with date manipulation signals: dateModified updates without substantive content changes, visible "Updated" dates contradicting rendered date in schema, content with copyright dates rolled forward as the only change, author bylines added or changed without disclosure.

Detection method: pull the prior version from Wayback Machine, compare against current rendered HTML, surface cases where dateModified moved without underlying content change. Date manipulation is a Helpful Content System red flag.
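
The comparison step can be approximated with a text similarity check. The sketch assumes the visible text of the Wayback capture and the current page has already been extracted, and that both dateModified values are ISO 8601 date strings (which compare correctly as plain strings). The 0.98 similarity threshold is an illustrative default, not a framework value.

```python
from difflib import SequenceMatcher

def date_moved_without_change(old_text, new_text, old_modified, new_modified, threshold=0.98):
    """Flag a page whose dateModified advanced while its text stayed nearly identical.

    Dates are ISO 8601 strings such as "2026-04-01" (assumed input format).
    """
    if new_modified <= old_modified:
        return False  # the date did not move forward, nothing to flag
    matcher = SequenceMatcher(None, old_text.split(), new_text.split(), autojunk=False)
    return matcher.ratio() >= threshold
```

Flagged pages still need a manual look, since a small but substantive correction (a changed price or statistic) can leave the similarity ratio high.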

6.6 Thin Content and AI Red Flags

Thin content shows up in three patterns: pages under 300 words on templates where 800 plus is appropriate, pages where original body text is under 50 percent of the total word count (the rest is template chrome), and pages with a high text to template ratio but no Information Gain (long but empty).

AI generated content is not inherently penalized, but specific patterns are. The audit surfaces generic phrasing indistinguishable from competitor sites, lists with mechanically parallel structure, conclusions that summarize without insight, author bylines that use generic stock names, and an absent first person voice on topics that should carry it.

The audit does not assert "this was AI generated." It surfaces pattern signals. The bar: does the page contribute the Information Gain that a credentialed author with real experience would provide?

6.7 Missing Byline and Reviewer Credit

Every content page on a YMYL or partial YMYL site requires an author byline. YMYL pages additionally require a credentialed reviewer credit. Pages missing these signals are systematically excluded from AIO citation per framework-aioverviews.md Section 8.3.

The audit flags every content page missing byline or YMYL reviewer credit. The count is reported as "X pages of Y total YMYL pages lack required credential signals." This drives the Phase 1 content remediation work.

6.8 Missing Publish and Update Timestamps

Visible publish dates and dateModified dates are E-E-A-T trust signals. The audit flags pages missing either. Pages with publish date but no update history on content older than 12 months are flagged for refresh review.

6.9 Content Quality Summary Output

The deliverable content quality section includes the site HCS mean, site E-E-A-T mean, Information Gain category coverage, date manipulation flag count, thin content flag count, missing byline count, and prioritized remediation by traffic decile.

7. Schema and Entity Audit

Schema and entities are the structural layer the synthesis engines on AIO, AI Mode, ChatGPT, Perplexity, Gemini, Claude, and Copilot use to reconcile what the site is, who runs it, what it sells, where it operates, and how it relates to other entities. A site without comprehensive schema is invisible to the structural extraction layer regardless of how strong the content is.

7.1 Schema Type Coverage

Pull schema type coverage from the Screaming Frog crawl. The crawl extracts JSON LD blocks per page. Aggregate @type values to produce a coverage map.

Expected core schema types: Organization, WebSite, WebPage, Person (for authors). Per content template: Article or BlogPosting for articles, Product for e commerce, Service for service businesses, LocalBusiness for local sites, FAQPage where FAQ content exists, HowTo where procedural content exists, BreadcrumbList for breadcrumb implementations.

The coverage map shows count of pages with each schema type. A site with 200 product pages and Product schema on 80 of them has a 60 percent coverage gap on the highest priority template.
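
Where the stored HTML from the crawl is available, the coverage map can also be built directly. This sketch assumes each page's HTML is passed in as a string and that schema is embedded as JSON LD script blocks; microdata and RDFa are not covered. Class and function names are illustrative.

```python
import json
from collections import Counter
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._capture = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._capture = True
            self._buffer = []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capture:
            self.blocks.append("".join(self._buffer))
            self._capture = False

def schema_types(html):
    """Return the set of @type values found anywhere in a page's JSON LD."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    types = set()
    for block in extractor.blocks:
        try:
            stack = [json.loads(block)]
        except json.JSONDecodeError:
            continue  # invalid JSON is a validation finding, not a coverage hit
        while stack:
            node = stack.pop()
            if isinstance(node, dict):
                value = node.get("@type")
                if isinstance(value, str):
                    types.add(value)
                elif isinstance(value, list):
                    types.update(v for v in value if isinstance(v, str))
                stack.extend(node.values())
            elif isinstance(node, list):
                stack.extend(node)
    return types

def coverage_map(pages):
    """Count how many pages carry each schema type."""
    counts = Counter()
    for html in pages:
        counts.update(schema_types(html))
    return counts
```

Dividing each count by the number of pages on the matching template gives the coverage percentage used in the gap calculation.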

7.2 Validation Pass Rate

Three validation modes. Google Rich Results Test: tests against Google's stricter rich result eligibility rules. Pass rate is percentage of sampled pages producing zero errors and at least one valid rich result type.

Schema.org Validator: tests against schema.org syntactic correctness independent of Google's eligibility. Pass rate is percentage with zero errors and zero warnings.

Custom validation against the reference graph: tests for completeness against the @graph pattern from framework-schema.md.

A site can have 100 percent Schema.org pass rate (syntactically correct), 30 percent Rich Results pass rate (missing properties Google requires), and 0 percent reference graph pass rate (no @id cross references, no completeness).

7.3 sameAs Network Completeness

The sameAs property links the entity to authoritative external sources. The completeness of the sameAs network is a direct E-E-A-T signal. Audit the Organization schema sameAs. Count references. Verify each link resolves and the destination identifies the same entity.

Recommended for a business: Wikipedia where eligible, Wikidata, Crunchbase, LinkedIn company page, Facebook, Twitter or X, Instagram, industry directory entries, professional licensing board listing.

Recommended for a Person: Wikipedia where eligible, Wikidata, LinkedIn, ORCID for researchers, GitHub for technical people, professional licensing board, Google Scholar for academics.

A site with 0 references has zero external entity reconciliation. 3 to 5 is baseline. 8 plus is comprehensive coverage.

7.4 Wikidata Claim Status

Wikidata is the lower notability bar entry point to the Google Knowledge Graph.

curl -A "Mozilla/5.0" -s "https://www.wikidata.org/w/api.php?action=wbsearchentities&search=[business-name]&language=en&format=json"

If a QID exists, audit for completeness. Recommended properties: instance of (P31), country (P17), location (P276), industry (P452), founded by (P112), founding date (P571), website (P856).
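
The completeness check can be scripted once the entity JSON is in hand. The sketch assumes the entity has the shape returned by the Wikidata wbgetentities API action, where claims are keyed by property ID; fetching the entity is left out. The function name is illustrative.

```python
# Recommended properties from the list above, keyed by Wikidata property ID.
RECOMMENDED = {
    "P31": "instance of",
    "P17": "country",
    "P276": "location",
    "P452": "industry",
    "P112": "founded by",
    "P571": "founding date",
    "P856": "website",
}

def missing_properties(entity):
    """Return labels of recommended properties with no claim on the entity."""
    claims = entity.get("claims", {})
    return [label for pid, label in RECOMMENDED.items() if not claims.get(pid)]
```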

If no QID and the business has external coverage, Wikidata creation is a Phase 2 deliverable. If no QID and no external coverage, the prerequisite is digital PR per framework-linkbuilding.md. Joseph's TDG entity at Q139709771 is the canonical example. Joseph's personal Q139592630 and MEGAMIND Q139592633 were speedy deleted as non notable, illustrating that Wikidata notability requires external press as input.

7.5 Knowledge Graph Entity Presence

curl -A "Mozilla/5.0" -s "https://kgsearch.googleapis.com/v1/entities:search?query=[business-name]&key=[api-key]&limit=10"

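A response with a non empty itemListElement array indicates entity presence; confirm that a returned result actually describes the client rather than a namesake. Entity creation is downstream of sufficient external press, sameAs network establishment, and Wikidata claim resolution.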

7.6 Schema and Entity Findings Output

The deliverable schema section includes the type coverage map, three pass rates, sameAs completeness, Wikidata status, Knowledge Graph entity presence, and prioritized remediation clustered into three phases: core graph deployment, content type schema deployment, entity reconciliation infrastructure.

8. Backlink Profile Snapshot

The backlink profile is the historical authority graph. The audit measures size, distribution, quality, and trajectory.

8.1 Referring Domain Count

The headline is referring domains, not raw backlinks. A single referring domain with 100 links counts once. Report total, growth over the past 12 months by quarter, and comparison to audited competitors.

8.2 DR or DA Distribution

Domain Rating (Ahrefs) and Domain Authority (Moz) approximate authority on a 0 to 100 logarithmic scale; Semrush reports its own analog, Authority Score. Distribution matters more than the headline.

Bucket by DR or DA: 0 to 19 (low), 20 to 39 (moderate), 40 to 59 (good), 60 to 79 (high), 80 plus (exceptional). A healthy profile has most links in 20 to 59 with a tail of 60 plus and manageable 0 to 19 counts. The inverse suggests low quality or potentially toxic acquisition.
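
The bucketing can be sketched as below, assuming the export has been reduced to one DR or DA value per referring domain. Bucket edges follow the ranges above; fractional values such as 19.5 fall into the lower bucket. Names are illustrative.

```python
from collections import Counter

LABELS = ["low", "moderate", "good", "high", "exceptional"]
UPPER_BOUNDS = [(20, "low"), (40, "moderate"), (60, "good"), (80, "high")]

def bucket_for(rating):
    """Map a 0 to 100 rating to its bucket label."""
    if not 0 <= rating <= 100:
        raise ValueError(f"rating out of range: {rating}")
    for upper, label in UPPER_BOUNDS:
        if rating < upper:
            return label
    return "exceptional"

def distribution(ratings):
    """Return the percentage of referring domains in each bucket."""
    counts = Counter(bucket_for(r) for r in ratings)
    return {label: counts[label] * 100 / len(ratings) for label in LABELS}
```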

8.3 Anchor Text Distribution

Healthy. Branded anchors 35 to 50 percent. Naked URL 15 to 25 percent. Generic ("click here") 10 to 20 percent. Topical exact match 5 to 15 percent. Long tail descriptive 10 to 20 percent.

Red flag. Topical exact match above 25 percent. Branded below 20 percent. Repetitive anchor patterns across many low DR domains. Foreign language anchors on an English site. The exact match heavy pattern is the historical signature of paid link campaigns and a manual action risk per framework-spampolicies.md.
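
The two numeric red flags can be checked mechanically. This sketch assumes anchors have already been classified into categories and counted; the category keys (branded, exact_match, and so on) are hypothetical names, and the pattern based flags (repetitive anchors, foreign language anchors) still need manual review.

```python
def anchor_red_flags(counts):
    """Check anchor category counts against the numeric red flag thresholds.

    counts: dict of category name to link count, for example
    {"branded": 40, "naked_url": 20, "generic": 15, "exact_match": 10, "long_tail": 15}
    """
    total = sum(counts.values())
    share = {name: value * 100 / total for name, value in counts.items()}
    flags = []
    if share.get("exact_match", 0) > 25:
        flags.append("exact match above 25 percent")
    if share.get("branded", 0) < 20:
        flags.append("branded below 20 percent")
    return flags
```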

8.4 Toxic Link Signals

Toxic link signals require investigation, not automatic disavow. Signals warranting review: referring domains with no organic traffic, domains penalized or deindexed, domains in unrelated languages or industries, domains on PBNs detectable via shared hosting and themes. Remediation is staged: monitor first, request removal where approachable, disavow only after manual review confirms harm.

8.5 Link Velocity Over Time

Pull monthly referring domain counts for the past 24 months. Healthy: steady linear growth, seasonal patterns aligned with industry, predictable spikes around earned media. Suspicious: sudden spikes without corresponding press, exponential growth that levels off (paid campaign signature), large drops where natural link loss exceeds new acquisition.

8.6 Branded to Non Branded Ratio

The ratio divides links with branded anchors by links with non branded anchors. A ratio at or above 1.0 indicates a brand mature profile. Below 0.5 indicates a manipulated or transactional profile. 0.5 to 1.0 is typical.

8.7 Top Linking Pages

Sort by referring domains pointing to a specific URL. Top linking pages reveal what content has earned authority. Cross reference against Section 6 content quality scoring. If top linking pages are also the highest Information Gain pages, the linking pattern is content driven. If commercial pages with low Information Gain, the pattern is transactional.

8.8 Lost Link Audit

Pull lost referring domains from the past 90 days. Each is a candidate for recovery via outreach. Classify: editorial removal, page deletion, redirect to a different target, competitor displacement. Competitor displacement is the most actionable: a page that linked to the client but now links to a competitor is an outreach target with a specific replacement argument.

8.9 Backlink Findings Output

Deliverable backlink section: referring domain count, growth, DR distribution, anchor distribution, toxic signal count, velocity pattern, branded ratio, top linking pages, and lost link recovery opportunities.

9. Technical SEO Diagnostics

Technical SEO covers the implementation layer. The audit verifies every technical foundation before higher leverage content and entity work.

9.1 HTTPS Coverage

Every page served over HTTPS. Mixed content not allowed. HSTS enabled. Verification: run a Screaming Frog crawl over both HTTP and HTTPS, flag any HTTP only URLs that return 200, confirm that HTTP to HTTPS redirects use 301 (not 302), and confirm the HSTS header is present with a reasonable max-age.

curl -I -s https://[domain] | grep -i strict-transport
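Once the header value is captured, the max-age check can be scripted. The sketch below parses a Strict-Transport-Security value supplied as a string. The one year minimum (31536000 seconds) is an assumption, since the text above only asks for a "reasonable" max-age, and both function names are hypothetical.

```python
def parse_hsts(header_value):
    """Split a Strict-Transport-Security value into {directive: value}."""
    directives = {}
    for part in header_value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"')
    return directives


def hsts_max_age_ok(header_value, minimum_seconds=31536000):
    """True when max-age is present and at least minimum_seconds (assumed one year)."""
    raw = parse_hsts(header_value).get("max-age", "")
    return raw.isdigit() and int(raw) >= minimum_seconds
```

Adjust minimum_seconds to whatever threshold the engagement treats as reasonable.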

9.2 HTTP/2 or HTTP/3 Enablement

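# A first line of "HTTP/2 ..." confirms HTTP/2; "HTTP/1.1 ..." means it was not negotiated.
curl --http2 -I -s https://[domain] | head -1
# Requires a curl build with HTTP/3 support (look for HTTP3 under Features in curl --version).
curl --http3 -I -s https://[domain] | head -1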

A site on HTTP/1.1 in 2026 is operating without standard performance affordances and should upgrade as a Phase 1 hosting deliverable.

9.3 Canonical Tag Coverage and Correctness

Canonical tags are one of the strongest signals Google uses to deduplicate URLs, though Google treats them as a hint rather than a directive. Coverage: percentage of indexable pages with a self referencing canonical tag. Healthy 95 to 100 percent.

Correctness: canonical tags must point to the actual canonical URL, must be absolute URLs, must use HTTPS, must use the trailing slash convention the site standardizes on. Audit for canonical tags pointing to non existent URLs, cross domain canonicals where unintended, and canonical chains where the target canonicalizes elsewhere.
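The correctness rules above translate into simple per URL checks. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the trailing slash convention is passed in as a flag, and a last path segment containing a dot is treated as a file (such as guide.html) that is exempt from the slash rule. Detecting canonicals that point to non existent URLs needs a live fetch and is not covered here.

```python
from urllib.parse import urlsplit


def canonical_issues(canonical, trailing_slash=True):
    """Return a list of rule violations for one canonical URL."""
    parts = urlsplit(canonical)
    if not parts.scheme or not parts.netloc:
        return ["not absolute"]
    issues = []
    if parts.scheme != "https":
        issues.append("not https")
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    looks_like_file = "." in last_segment  # heuristic, an assumption of this sketch
    if path != "/" and not looks_like_file and path.endswith("/") != trailing_slash:
        issues.append("trailing slash mismatch")
    return issues


def canonical_chains(canonical_of):
    """canonical_of: {url: its canonical target}. Return URLs whose target canonicalizes elsewhere."""
    return [
        url
        for url, target in canonical_of.items()
        if target != url and canonical_of.get(target, target) != target
    ]
```

The mapping for canonical_chains can be built from the crawler's canonical export, one row per crawled URL.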

9.4 hreflang Implementation

For sites with multiple language or region versions, hreflang is mandatory. Coverage: percentage of pages with annotations. Every hreflang annotation must be reciprocated on the target page. Tags must point to canonical URLs of the target language version. Common errors: missing x-default, incorrect language or region codes (en-uk instead of en-gb), missing reciprocal hreflang on target pages.
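Reciprocity is the rule most easily verified in code. The sketch below assumes the crawl export has been reshaped into a mapping of page URL to its hreflang annotations. That shape and the function names are hypothetical.

```python
def missing_reciprocals(annotations):
    """annotations: {page_url: {hreflang_code: target_url}}.

    Return sorted (source, target) pairs where the target page does not
    annotate the source page in return.
    """
    missing = set()
    for source, alternates in annotations.items():
        for target in alternates.values():
            if target == source:
                continue  # self reference needs no return link
            if source not in annotations.get(target, {}).values():
                missing.add((source, target))
    return sorted(missing)


def missing_x_default(annotations):
    """Return pages whose annotation set has no x-default entry."""
    return sorted(url for url, alternates in annotations.items() if "x-default" not in alternates)
```

A target page that was never crawled is treated as non reciprocating, which is usually the right default for an audit finding.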

9.5 Robots.txt, Sitemap, and Structured Data

Per Sections 5.5, 5.6, and 7.2. Findings appear in the technical section of the deliverable.

9.6 Mobile-Friendliness

Beyond binary pass or fail: viewport meta tag present and correctly configured, tap targets sized at least 48x48 pixels and spaced appropriately, text legibility at default zoom, content fits viewport width without horizontal scroll, mobile responsive vs separate mobile site (separate is rare in 2026 and indicates legacy implementation).

9.7 Core Web Vitals from CrUX and Lab Data

Field data (Chrome User Experience Report, CrUX, real user) is the ranking signal. Lab data (PageSpeed Insights, synthetic) is the diagnostic tool. The audit pulls both.

Field data thresholds, assessed at the 75th percentile: LCP under 2.5 seconds, INP (replaced FID March 2024) under 200 milliseconds, CLS under 0.1. Segmented by mobile and desktop. Pass rate is the percentage of page loads meeting all three thresholds.
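The threshold check can be sketched as follows. Two assumptions apply: values exactly at a threshold are treated as passing, following Google's published "good" ranges, and the pass rate is computed over a list of sampled URL or template records holding 75th percentile values, which is a proxy for the page load level rate described above. The record keys and function names are hypothetical.

```python
# "Good" limits for Core Web Vitals; LCP and INP in milliseconds, CLS unitless.
GOOD_LIMITS = {"lcp_ms": 2500, "inp_ms": 200, "cls": 0.1}


def cwv_failures(p75):
    """Return the metrics whose 75th percentile value exceeds the good limit."""
    return [metric for metric, limit in GOOD_LIMITS.items() if p75[metric] > limit]


def cwv_pass_rate(records):
    """Percent of records passing all three metrics, rounded to one decimal."""
    passing = sum(1 for p75 in records if not cwv_failures(p75))
    return round(100 * passing / len(records), 1)
```

Run it once per device class so the mobile and desktop pass rates stay separate in the deliverable.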

Lab data from PageSpeed Insights or Lighthouse surfaces specific opportunities: preload key resources, defer offscreen images, eliminate render blocking resources, reduce JavaScript execution time. The deliverable reports field data pass rate per device class plus the top 5 lab data opportunities across priority templates.

9.8 JavaScript Rendering Verification

The Section 4.4 crawl comparison surfaces JavaScript dependency issues. The technical section includes: pages requiring JavaScript for primary content rendering, schema injected via JavaScript invisible to AI bots, FAQ accordions built with JavaScript reveal patterns invisible to first byte HTML.

9.9 Technical SEO Findings Output

Deliverable technical section: HTTPS status, HTTP version, canonical coverage and errors, hreflang implementation, robots.txt and sitemap findings, structured data validation rates, mobile-friendliness, CWV pass rates, JavaScript rendering findings, and prioritized remediation.

10. AI Surface Audit

The AI surface audit measures citation presence across the seven major generative engines. Methodology is manual sampling because no automated tool comprehensively covers all engines in 2026.

10.1 Priority Query Selection

Select 10 priority queries representing the highest commercial intent terms and 10 informational queries representing the highest awareness building terms. The 20 query set is the test bed.

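For each query, run in: Google AI Overviews, Google AI Mode, ChatGPT (default web mode), Perplexity (default mode with sources), Gemini, Claude (via Claude.ai), and Microsoft Copilot. These are the seven audited engines. A standalone OpenAI search surface (formerly the SearchGPT prototype, since folded into ChatGPT search) is an optional eighth pass where available.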

10.2 Citation Recording Methodology

For each query and engine, record citation present (yes or no), citation URL, citation context (the sentence or claim attributed), citation position. Use incognito mode, set geographic location to the client's target market, run the query verbatim, screenshot the result.

10.3 The AI Citation Gap Metric

The AI citation gap is the percentage of priority queries with zero citation across all audited engines.

AI Citation Gap = (queries with zero citation / total priority queries) * 100

A new client engagement frequently shows gaps of 60 to 90 percent. A mature optimized site shows below 30 percent. The metric appears in the executive summary because it represents the size of the AI surface opportunity.
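The formula reduces to a few lines once the citation matrix from Section 10.2 is recorded. The sketch assumes the matrix is held as a mapping of query to per engine booleans. That shape and the function name are hypothetical.

```python
def ai_citation_gap(citations):
    """citations: {query: {engine: True if the client was cited}}.

    Returns the percentage of queries with zero citation on every engine.
    """
    zero_citation = [query for query, engines in citations.items() if not any(engines.values())]
    return 100 * len(zero_citation) / len(citations)
```

A query counts toward the gap only when every audited engine returned no citation, so a single Perplexity citation removes that query from the numerator.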

10.4 Citation Pattern Analysis

If cited on Perplexity but not AIO, the content has structural extractability but not the entity authority AIO requires. If cited on ChatGPT but not Gemini, the content is reaching one synthesis engine's training data but not the other's index (ChatGPT training data takes months to update, Gemini updates near real time). If cited only on AI Mode but not AIO, the pattern reflects AI Mode's higher fan out producing broader candidate inclusion. AIO is the harder bar.

10.5 Brand Mention Detection

Beyond explicit citations, AI engines mention brands without linking. Per framework-aioverviews.md Section 3.2, AI Overviews mention brands 61 percent of the time and AI Mode 37.6 percent of the time. The brand mention rate indicates entity recognition independent of source selection.

10.6 Competitor Citation Overlap

For each priority query, record which competitors are cited. Competitors cited frequently across the priority set are the AI surface ranking targets.

10.7 Bot Access Audit

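# Print each bot's named robots.txt group, if one exists. Matching is case
# insensitive (-i). A bot with no named group falls under the User-agent: *
# rules, which this grep does not show.
for bot in GPTBot ClaudeBot PerplexityBot Google-Extended OAI-SearchBot Applebot-Extended Meta-ExternalAgent; do
  echo "$bot:"
  curl -A "$bot" -s https://[domain]/robots.txt | grep -i -A 5 "User-agent: $bot"
done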

If any bot is blocked, recommend specific Allow rules. The default recommendation is allow all reputable AI engine bots unless the client has specific content licensing concerns.
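To turn the raw robots.txt into a blocked or allowed verdict per bot, including bots that fall under the wildcard group, the Python standard library parser can evaluate the rules. This is a sketch under stated assumptions: the function name blocked_bots is hypothetical, the file content is passed in as text, and the standard library applies its own matching rules, which may differ in edge cases from how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Same bot tokens as the shell loop in Section 10.7.
AI_BOTS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
    "OAI-SearchBot", "Applebot-Extended", "Meta-ExternalAgent",
]


def blocked_bots(robots_txt, path="/"):
    """Return the AI bots that robots_txt disallows from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, path)]
```

Checking a priority URL path as well as the site root catches rules that block only a content directory.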

10.8 llms.txt and ai.txt Presence

The llms.txt file is the proposed AI specific content discovery file. The ai.txt file is the AI training data permission file. Neither is universally adopted as of May 2026. Pull per Section 3.8. If absent, recommend creation as a Phase 2 deliverable per framework-aicitations.md.

10.9 AI Surface Findings Output

Deliverable AI surface section: priority query citation matrix, AI citation gap percentage, competitor citation patterns, brand mention rates, bot access status, llms.txt and ai.txt status, and prioritized remediation per framework-aioverviews.md and framework-aicitations.md page patterns.

11. Competitive Snapshot

The competitive snapshot identifies 3 to 5 competitors and analyzes them against the same dimensions as the client. The output frames the engagement's positioning.

11.1 Competitor Identification

Three sources. Direct competitors named by the client during intake (validate each by verifying they rank on at least 5 priority queries). Ahrefs Competing Domains or Semrush Competitors (top 3 to 5 by organic keyword overlap). SERP analysis (for each of the 10 priority queries, record the top 3 ranking domains; those appearing on 5 plus are de facto competitors).

The final set combines all three weighted toward SERP analysis. Size settles at 3 to 5. Larger sets dilute analysis. Smaller sets miss patterns.
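The SERP analysis source can be tallied directly from the recorded top 3 domains. The sketch below assumes one list of up to three domains per priority query. The function name and data shape are hypothetical, and the threshold of 5 comes from Section 11.1.

```python
from collections import Counter


def de_facto_competitors(top3_by_query, client_domain, min_queries=5):
    """top3_by_query: {query: [domain, domain, domain]}.

    Return domains (excluding the client) that appear in the top 3 on at
    least min_queries of the priority queries.
    """
    appearances = Counter()
    for domains in top3_by_query.values():
        # A set counts each domain once per query even if it holds two positions.
        appearances.update(set(domains) - {client_domain})
    return sorted(domain for domain, count in appearances.items() if count >= min_queries)
```

The result feeds the weighted combination with the client named and tool derived lists described above.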

11.2 Competitor Audit Dimensions

For each competitor, perform a light audit: referring domains and DR, top ranking keywords count and traffic estimate, top traffic pages, content publishing cadence, schema implementation, author bylines and credential signals, AI surface citation patterns.

11.3 Content Gap Analysis

Semrush Keyword Gap or Ahrefs Content Gap. Inputs: the client domain and the 3 to 5 competitor domains. Output: keywords where competitors rank in the top 50 but the client does not rank in the top 100. Filter by intent, search volume (above 100 monthly), and difficulty. The filtered list is the priority content opportunity backlog. For the deliverable, summarize as a count and showcase 10 to 20 high opportunity gaps.

11.4 SERP Overlap Analysis

For the 10 priority queries, record SERP composition (top 10 organic positions, AI Overview presence, AIO cited sources, featured snippet source, People Also Ask). A competitor appearing on 8 of 10 priority SERPs is the dominant comparison point.

11.5 AI Overview Overlap Analysis

AIO overlap identifies competitors consistently cited in AIO results on priority queries. If competitor A is cited on 7 of 10 priority AIO results and the client is cited on 0, the AIO opportunity gap is the 7 queries.

11.6 Competitive Findings Output

Deliverable competitive section: competitor list, content gap output, SERP overlap matrix, AIO overlap matrix, and strategic implications.

12. Local Signals Audit

The local signals audit applies when the client serves a defined geographic market and competes in local pack and Google Maps results. For non local clients, this section is omitted.

12.1 Google Business Profile Completeness

The GBP audit covers every dimension. Business name (exactly matching legal name, no keyword stuffing). Business categories (primary plus up to 9 secondary). Address (exact match to legal). Service area defined precisely. Phone unique to this location. Website canonical domain without UTM parameters. Hours current including holidays. Photos minimum 10. GBP attributes, Q and A answered by the owner, full service catalog. Completeness score: percentage of fields completed. Healthy 90 percent or higher.

12.2 NAP Consistency Across Citations

NAP consistency is the foundation of local SEO trust. Audit across 50 citation sources.

Tier 1: GBP, Bing Places, Apple Maps, Facebook, Yelp. Tier 2: industry directories (Avvo for legal, Houzz for home services, ZocDoc for healthcare). Tier 3: general business directories (Yellowpages, Manta, Chamber of Commerce).

Flag drift from canonical NAP. Common patterns: phone number with different formatting, address with different unit number, business name with appended location, multiple names where the client has rebranded. NAP consistency score: percentage with exact match. Healthy 90 percent or higher.
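The exact match score and the per listing drift fields can be computed as follows. This is a sketch with hypothetical names and data shapes. It deliberately performs no normalization, because the text above counts formatting differences as drift.

```python
NAP_FIELDS = ("name", "address", "phone")


def nap_drift(canonical, listing):
    """Return the NAP fields on one listing that differ from the canonical record."""
    return [field for field in NAP_FIELDS if listing.get(field) != canonical[field]]


def nap_consistency_score(canonical, listings):
    """listings: {source: {name, address, phone}}. Percent with an exact match on all three."""
    exact = sum(1 for listing in listings.values() if not nap_drift(canonical, listing))
    return round(100 * exact / len(listings), 1)
```

Reporting the drifting fields per source gives the remediation list directly, one correction per citation.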

12.3 Review Velocity and Rating

Pull from GBP, Yelp, Facebook, and industry specific review platforms. Healthy: steady acquisition (at least 1 per month on primary), average rating 4.5 plus, recent reviews dominating sentiment, owner responses to negative ones. Concerning: review spikes followed by drought, all reviews 5 stars with no variation, no owner responses, reviews referencing different business names.

12.4 Local Pack Ranking for Primary Terms

Use Local Falcon, BrightLocal, or equivalent geographic grid ranking tool. Configure a 7x7 grid centered on the business location with 0.5 to 2 mile spacing depending on market density. Run for each priority local term. Output: heatmap showing ranking position at each grid point. The center should rank 1 to 3. Edges degrade. The degradation pattern reveals competitive pressure and service area realism.
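Grid ranking tools generate the grid themselves, but the geometry is easy to reproduce for planning the spacing. The sketch below approximates one degree of latitude as 69 miles and scales longitude by the cosine of the latitude. The function name and parameters are hypothetical, and the approximation is only suitable for grids a few miles across.

```python
import math


def grid_points(center_lat, center_lng, size=7, spacing_miles=1.0):
    """Return size x size (lat, lng) points, row by row from the northwest corner."""
    miles_per_deg_lat = 69.0  # rough average, an assumption of this sketch
    miles_per_deg_lng = 69.0 * math.cos(math.radians(center_lat))
    half = (size - 1) / 2
    points = []
    for row in range(size):
        for col in range(size):
            dlat = (half - row) * spacing_miles / miles_per_deg_lat
            dlng = (col - half) * spacing_miles / miles_per_deg_lng
            points.append((round(center_lat + dlat, 6), round(center_lng + dlng, 6)))
    return points
```

With the default 7x7 grid and 1 mile spacing, the grid spans 6 miles edge to edge, which helps sanity check the configured spacing against the client's stated service area.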

12.5 Local Schema Validation

LocalBusiness schema validation per Section 7.2 with local specific properties. Required: name, address (with PostalAddress subnodes), geo (with GeoCoordinates), telephone, openingHoursSpecification, priceRange, image. Recommended: aggregateRating, areaServed, hasMap (with Google Maps URL), sameAs. The hasMap and geo properties tie the business to its Google Maps entity, supporting the entity reconciliation chain AIO synthesis uses for local queries.
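The required and recommended property lists above can be checked against a page's JSON LD block. The sketch assumes a single top level LocalBusiness node passed in as a JSON string. The function name and report keys are hypothetical, and "required" here means required by this framework.

```python
import json

REQUIRED = ["name", "address", "geo", "telephone", "openingHoursSpecification", "priceRange", "image"]
RECOMMENDED = ["aggregateRating", "areaServed", "hasMap", "sameAs"]
# Properties that must be typed sub nodes rather than plain strings.
SUBNODE_TYPES = {"address": "PostalAddress", "geo": "GeoCoordinates"}


def check_local_business(json_ld):
    """Return missing properties and sub node typing issues for one JSON LD node."""
    node = json.loads(json_ld)
    subnode_issues = []
    for prop, expected in SUBNODE_TYPES.items():
        value = node.get(prop)
        if prop in node and not (isinstance(value, dict) and value.get("@type") == expected):
            subnode_issues.append(f"{prop} should be a {expected} node")
    return {
        "missing_required": [p for p in REQUIRED if p not in node],
        "missing_recommended": [p for p in RECOMMENDED if p not in node],
        "subnode_issues": subnode_issues,
    }
```

Pages that wrap nodes in an @graph array would need the LocalBusiness node extracted first.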

12.6 Apple Maps and Bing Places Presence

Apple Maps via Apple Business Connect: verify claimed, NAP matches, photos and hours current. Bing Places: verify claimed, NAP matches, profile complete. Bing powers Bing search, ChatGPT search, and Microsoft Copilot local. A client claimed in Google but not Apple or Bing is missing 20 to 30 percent of map and local search exposure.

12.7 Local Findings Output

Deliverable local section (if applicable): GBP completeness, NAP consistency, review velocity and rating per platform, local pack heatmap, local schema validation status, Apple Maps and Bing Places status, and prioritized remediation.

13. Findings Document Structure

The audit deliverable is a 12 page document sized for client consumption. Executive level readers consume the first two pages. Operational readers consume the full deliverable. Both audiences are served by the structure.

13.1 The 12 Page Structure

Page 1 Executive Summary, Page 2 Scoring Rubric, Pages 3 to 6 Prioritized Findings, Pages 7 to 10 Technical Appendix, Page 11 90 Day Roadmap, Page 12 KPIs and Success Metrics. A small site audit may run shorter. A complex enterprise audit may extend the technical appendix to 8 to 12 pages.

13.2 Executive Summary (Page 1)

Three paragraphs. P1: site overview (total pages, indexed pages, estimated monthly traffic, business model, geographic focus). P2: overall health (health score on a 0 to 100 scale, critical issue count, high priority count, AI citation gap percentage). P3: strategic framing.

Below the paragraphs, two boxed sections side by side. Left: Top 5 Critical Findings. Right: Top 5 Strategic Opportunities.

13.3 Scoring Rubric (Page 2)

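A single table summarizing every dimension. Most dimensions are scored out of 100. E-E-A-T (130) and Information Gain (10) keep their framework scales, and Spam Policy Compliance is pass or fail.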

Dimension	Score	Out of	Status
Technical SEO		100
Page Experience		100
Schema and Structured Data		100
Content Quality (HCS)		100
E-E-A-T		130
Information Gain		10
Internal Linking		100
Keyword Strategy		100
Local SEO (if applicable)		100
Spam Policy Compliance	Pass/Fail
AI Surface Citation		100
Entity Authority		100
Backlink Profile		100
Competitive Position		100
Drift Metric (indexability)		100
Overall Health		100

The Status column uses Critical, Caution, Healthy. The visual pattern tells the executive the engagement's emphasis at a glance.

13.4 Prioritized Findings (Pages 3 to 6)

Organized by severity, not by framework.

Severity 1 Critical. Active harm to ranking, citation, or conversion. Security or penalty risk. Blocking other improvements. Each gets a subsection with description, evidence, impact, remediation, effort estimate, and recommended phase.

Severity 2 High Priority. Significantly limiting performance but not blocking. Each gets a paragraph with description, impact, and remediation summary.

Severity 3 Medium Priority. Improvement opportunities. Listed as a bulleted backlog.

Severity 4 Low Priority. Summarized as a count with examples.

13.5 Technical Appendix (Pages 7 to 10)

The supporting evidence the prioritized findings reference. Operational readers consume this section to plan implementation.

Section A, Crawl Analysis: crawl comparison diagnostic, indexability Venn, drift metric, GSC Coverage breakdown.
Section B, Content Analysis: stratified sample summary, HCS distribution, E-E-A-T distribution, Information Gain coverage, date manipulation flag count.
Section C, Schema Analysis: type coverage map, three pass rates, sameAs completeness, Wikidata status.
Section D, Backlink Analysis: referring domain count, DR distribution, anchor distribution, velocity pattern, lost link audit.
Section E, AI Surface Analysis: priority query citation matrix, AI citation gap, competitor citation overlap.
Section F, Technical SEO Analysis: HTTPS status, HTTP version, canonical coverage, hreflang status, CWV pass rates, JavaScript rendering findings.
Section G, Local Analysis (if applicable): GBP completeness, NAP consistency, review patterns, local pack heatmap.

13.6 90 Day Roadmap (Page 11)

Days 1 to 30, Phase 1: critical remediation, quick wins, foundational infrastructure. Typically 40 to 60 hours.

Days 31 to 60, Phase 2: high priority work, content quality improvements, schema deployment, AI surface optimization. Typically 40 to 60 hours.

Days 61 to 90, Phase 3: strategic positioning, competitive content production, link building, brand building. Typically 40 to 60 hours.

The roadmap ties to engagement pricing. A client paying for 20 hours per month sees a different roadmap than one paying for 60.

13.7 KPIs and Success Metrics (Page 12)

Baseline metrics as of the audit date. Targets at 90, 180, and 365 days.

Standard KPI set: organic sessions (GA4), organic non brand sessions, organic conversions (GA4 key events), AI citation rate on priority queries, GSC AIO impression baseline, indexed pages, referring domains (Ahrefs), brand search volume (GSC), local pack ranking position (Local Falcon if applicable).

KPIs are reviewed at every monthly check in. The recurring audits in framework-ongoingaudit.md re measure against the baseline.

13.8 Cross Reference Framework

The deliverable cross references this library throughout. Every finding references the framework that prescribes the remediation.

Finding: 23 of 30 sampled YMYL pages lack credentialed reviewer credit.
Impact: Pages without reviewer credit are systematically excluded from AIO citation. AI citation gap on YMYL queries currently 87 percent.
Remediation: Per [framework-aioverviews.md](framework-aioverviews.md) Section 6.7, deploy credentialed reviewer credit on all YMYL pages.
Effort: 8 hours template work plus 2 hours per page across the 23 flagged pages, total approximately 54 hours.
Phase: Phase 1 (gates AIO citation work in Phases 2 and 3).

The cross referencing makes the deliverable a navigation point into the framework library.

13.9 Deliverable Production

Produced in Markdown source at /var/www/sites/[domain]/audit/initial/deliverable.md and rendered to PDF via Pandoc per Section 14. The PDF is the client deliverable. The Markdown source is the audit team's working document and the input to the ongoing audit cadence comparison.

14. Bubbles-Hosted Audit Toolchain

The audit toolchain runs on the Bubbles self hosted server at IP 169.155.162.118. The toolchain is intentionally self hosted with no third party CDN or proxy in the path. Audit data is sensitive client information and the audit infrastructure mirrors the principle of client trust the engagement is built on.

14.1 Bubbles Server Profile

Bubbles is Debian amd64 with 16 GB RAM. LAN at 192.168.1.132 / 192.168.1.173. Tailscale at 100.90.97.104. Public at 169.155.162.118. SSH from M2 via Tailscale (ssh user@bubbles). Go at /usr/local/go/bin/go. Python 3.13 system installation.

Web hosting via nginx with vhosts under /var/www/ and /var/www/sites/. Audit deliverables under /var/www/sites/[domain]/audit/. Audit working files under /home/user/audits/.

14.2 Screaming Frog SEO Spider on Linux

Screaming Frog runs on Linux desktop via the official Debian package. Headless mode is available for automated crawls.

screamingfrogseospider --crawl https://[domain] \
  --headless --save-crawl \
  --output-folder /home/user/audits/[domain]/initial/crawls/sf/ \
  --config /home/user/audits/configs/sf-initial-audit.seospiderconfig \
  --export-tabs "Internal:All,External:All,Response Codes:All" \
  --export-format csv

The config codifies the Section 4.1 configuration: 5 to 10 concurrent threads, JavaScript rendering enabled, custom extraction rules, UA rotation. For very large crawls exceeding Bubbles' memory capacity, run from a development machine with more RAM and sync to Bubbles for storage and processing.

14.3 Sitebulb on Linux Desktop

Sitebulb runs on Linux desktop via the official AppImage. Sitebulb requires a graphical environment for the audit UI even when crawls are headless. Bubbles runs headless by default; Sitebulb audits initiate from a desktop client with results archived to Bubbles. Output: .sitebulb file plus PDF and CSV at /home/user/audits/[domain]/initial/crawls/sitebulb/.

14.4 Custom Python Audit Scripts

Scripts live at /home/user/audits/scripts/.

substrate-check.py: fetches priority URLs with a GPTBot user agent (curl -A "GPTBot") and verifies that the H1, lede, H2s, FAQ content, and schema appear in the first byte HTML.
schema-completeness.py: parses JSON LD blocks and compares them against the reference graph at /home/user/audits/refs/schema-graph.json.
aio-gap.py: submits each priority query via SerpAPI and records citation presence per engine.
llms-audit.py: fetches llms.txt and ai.txt and validates format.
indexability-compare.py: consumes the Screaming Frog export, sitemap URL list, and GSC Coverage export and produces the three circle Venn from Section 5.1.

Each script outputs JSON to /var/www/sites/[domain]/audit/initial/scripts/[scriptname].json and a markdown summary.
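The actual scripts are not reproduced in this document. As an illustration of the kind of check substrate-check.py performs, the sketch below counts H1 tags, H2 tags, and JSON LD blocks in raw HTML that has already been fetched without JavaScript execution. The class and function names are hypothetical and this is not the script's real implementation.

```python
from html.parser import HTMLParser


class SubstrateParser(HTMLParser):
    """Count headings and JSON LD script blocks present in raw HTML."""

    def __init__(self):
        super().__init__()
        self.counts = {"h1": 0, "h2": 0, "json_ld": 0}

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self.counts[tag] += 1
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.counts["json_ld"] += 1


def substrate_counts(raw_html):
    """Return {'h1': n, 'h2': n, 'json_ld': n} for one raw HTML document."""
    parser = SubstrateParser()
    parser.feed(raw_html)
    return parser.counts
```

A page whose raw HTML shows zero H1 tags or zero JSON LD blocks, while the rendered page shows both, is a JavaScript dependency finding for Section 9.8.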

14.5 Pandoc PDF Rendering

pandoc /var/www/sites/[domain]/audit/initial/deliverable.md \
  --output /var/www/sites/[domain]/audit/initial/deliverable.pdf \
  --pdf-engine xelatex \
  --template /home/user/audits/templates/audit-template.tex \
  --variable mainfont="Helvetica Neue" --variable monofont="Menlo" \
  --variable fontsize=10pt --variable geometry:margin=1in \
  --toc --toc-depth 2 --number-sections

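The template at /home/user/audits/templates/audit-template.tex defines the cover page, headers and footers, and section heading styles. It is branded with TDG colors and includes the SDVOSB designation. The fonts named in mainfont and monofont must be installed on Bubbles for xelatex to resolve them. For interactive review, render to HTML via --output deliverable.html --standalone --embed-resources (Pandoc 2.19 and later; older releases use the now deprecated --self-contained).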

14.6 Audit State Storage and Archive

Audit state in BadgerDB at /home/user/audits/db/: project metadata, version history of deliverables, audit decision log, cross reference graph linking findings to framework sections.

Deliverables and supporting data are archived to the Bubbles external 4.5 TB storage at /mnt/storage/audits/. The external storage is slow for random writes (SMR drive) but well suited for archive. Reference per project memory project_bubbles_external_storage.md.

/mnt/storage/audits/[domain]/initial/
  deliverable.pdf
  deliverable.md
  intake.yaml
  manifest.md
  data/
    gsc/  ga4/  bing/  gbp/  ahrefs/
  crawls/
    sf/  sitebulb/
  scripts/
    substrate-check.json
    schema-completeness.json
    aio-gap.json
    llms-audit.json
    indexability-compare.json
  reference-screenshots/

14.7 No Third Party CDN or Proxy Recommendations

The Bubbles audit toolchain operates without any third party CDN or proxy in the path. The client's audit data does not transit external infrastructure beyond the public internet between Bubbles and the audited site.

This is intentional. Audit data is sensitive. Routing through external infrastructure introduces dependencies and exposes data to additional parties. The Bubbles direct connection model is the architectural choice.

For client sites where CDN or proxy is part of the existing stack, the audit notes the configuration but does not recommend adding new third party infrastructure.

14.8 Toolchain Maintenance Cadence

Weekly: nginx access log rotation, BadgerDB compaction, deliverable PDF rendering verification.

Monthly: Screaming Frog and Sitebulb version updates, Python audit script updates against new framework versions, Pandoc template review.

Quarterly: full toolchain test pass against a known audit project, validation that every framework cross reference resolves, archive review on /mnt/storage/audits/.

Annually: framework library review, audit script regression test, BadgerDB migration if versions warrant.

End of Framework Document

Document version: 2.0 · Last updated: 2026-05-14 · Maintained by: ThatDeveloperGuy

The initial client audit is the foundation document every subsequent retainer activity references. This framework specifies the systematic process across the 14 tiers of the engine optimization stack, the data pull checklist, the crawl architecture, the indexability diagnostic, the content quality baseline, the schema and entity audit, the backlink profile snapshot, the technical SEO diagnostics, the AI surface audit that quantifies the citation gap, the competitive snapshot, the local signals audit, the findings document structure, and the Bubbles hosted toolchain.

Apply once at engagement onset for every new client. The output drives the first 90 days and establishes the baseline against which the recurring audit cadence in framework-ongoingaudit.md measures delta.

Companions

framework-ongoingaudit.md, recurring audit cadence
framework-contentaudit.md, deep content audit
framework-competitoraudit.md, deep competitive analysis
framework-gscanalysis.md, GSC data analysis
framework-technicalseo.md, technical SEO implementation
framework-schema.md, schema and graph pattern
framework-linkbuilding.md, backlink acquisition
framework-localseo.md, local SEO implementation
framework-aicitations.md, AI citation across engines
framework-clientonboarding.md, onboarding process
framework-reporting.md, reporting cadence
framework-eeat.md, E-E-A-T pillars
framework-hcs.md, Helpful Content System rubric
framework-infogain.md, Information Gain categories
framework-sqrg.md, Search Quality Rater Guidelines
framework-contentfirst.md, substrate doctrine
framework-aioverviews.md, AI Overview citation
framework-attribution.md, attribution architecture
framework-ga4.md, GA4 install
framework-ymyl.md, YMYL classification
framework-spampolicies.md, spam policy
framework-knowledgegraph.md, Knowledge Graph and Wikidata
framework-cross-stack-implementation.md, stack specific patterns
14 tier Engine Optimization Stack

Frequently asked questions

What is the difference between the initial audit and the ongoing audit?

The initial audit runs once at engagement onset, covering every framework, every dimension, and a sample of pages, producing a 12-page report over 16 to 80 hours. The ongoing audit (framework-ongoingaudit.md) covers recurring monthly and quarterly cadences, focusing on delta from baseline, new issues, freshness of priority pages, and citation drift. The initial audit establishes the comparison points the recurring audits measure against.

What are the operating modes for an initial SEO audit?

There are three modes. Mode A, Light Audit, runs 4 to 8 hours for discovery calls and lead qualification, output a one-to-three-page go/no-go summary. Mode B, Standard Audit, runs 16 to 30 hours for most paid Tier 2 and Tier 3 engagements, output the 12-page deliverable. Mode C, Deep Audit, runs 40 to 80 hours and beyond for enterprise or audit-only engagements, output a 25 to 50 page deliverable.

How do you measure indexability in a site audit?

Build a three-circle Venn of URLs the crawler discovered (A), URLs in the XML sitemap (B), and URLs Google indexed (C), producing seven diagnostic regions. Then compute the Drift Metric: (A only + B only + C only) / (A + B + C) x 100. Below 5 percent is healthy, 15 to 30 percent signals systemic problems, and above 30 percent indicates a crisis where indexability is the dominant blocker.
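With the three URL lists loaded as sets, the Drift Metric is a short calculation. The sketch assumes the denominator "(A + B + C)" means the count of distinct URLs across all three sets. The function name is hypothetical.

```python
def drift_metric(crawled, sitemap, indexed):
    """Percent of distinct URLs that appear in exactly one of the three sets."""
    a, b, c = set(crawled), set(sitemap), set(indexed)
    union = a | b | c  # assumed reading of the denominator
    only_one = (a - b - c) | (b - a - c) | (c - a - b)
    return round(100 * len(only_one) / len(union), 1)
```

URLs should be normalized the same way in all three sources (scheme, host, trailing slash) before the comparison, or formatting differences will inflate the metric.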

What is the AI citation gap and how is it calculated?

The AI citation gap is the percentage of priority queries with zero citation across all audited engines, calculated as (queries with zero citation / total priority queries) x 100. It is measured by manually running 20 priority queries across seven engines including AI Overviews, ChatGPT, Perplexity, Gemini, Claude, and Copilot. New clients frequently show gaps of 60 to 90 percent; a mature optimized site shows below 30 percent.

What does the initial audit deliverable contain?

It is a 12-page document: Page 1 Executive Summary, Page 2 Scoring Rubric, Pages 3 to 6 Prioritized Findings (organized by severity, not framework), Pages 7 to 10 Technical Appendix, Page 11 a 90-Day Roadmap split into three 40-to-60-hour phases, and Page 12 KPIs and Success Metrics with baseline and 90, 180, and 365-day targets. Every finding cross-references the framework prescribing its remediation.
