SEO & AI Engine Optimization Framework · May 2026

Image SEO: alt text, image sitemap, ImageObject schema, AVIF/WebP

Image Optimization, Alt Text Strategy, Image Search, Google Lens, Visual Entity Recognition, and Image Schema

A comprehensive installation and audit reference for image search optimization — the discipline of making images discoverable, indexable, and authoritative in image search results, image carousels, and visual recognition systems like Google Lens.

Cross-stack implementation note: the code samples in this framework are written in plain HTML for clarity. For React, Vue, Svelte, Next.js, Nuxt, SvelteKit, Astro, Hugo, 11ty, Remix, WordPress, Shopify, and Webflow equivalents of every pattern below, see framework-cross-stack-implementation.md. For pure client-rendered SPAs (no SSR/SSG) see framework-react.md. For Tailwind-specific concerns (purge, dynamic classes, dark-mode CLS, focus accessibility) see framework-tailwind.md.


1. Document Purpose

Image search drives substantial traffic for many sites, sometimes 20 to 40% of total organic traffic for visual industries (recipes, fashion, design, real estate, e-commerce, travel). Beyond direct traffic, images contribute to entity recognition, knowledge panel population, and, increasingly, AI engine visual understanding.

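In 2026, image SEO spans file formats and compression, responsive delivery and lazy loading, alt text and figure semantics, file naming, ImageObject and related schema, image sitemaps, origin-side encoding pipelines, and visual search through Google Lens.

This framework specifies image optimization across each of these dimensions.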


2. Image File Optimization

2.1 Format Selection

AVIF is the best-compression modern format. About 50% smaller than JPEG and 20 to 30% smaller than WebP at equivalent perceptual quality. Browser support reached 95%+ in 2025 across Chrome, Edge, Firefox, Safari (16.4+), and all modern mobile browsers (caniuse.com AVIF row, sample N captured 2025-09).

WebP delivers excellent compression with near-universal support. About 25 to 35% smaller than JPEG. Keep as the second tier in the picture element fallback chain.

JPEG has universal support. Use as the final fallback in the picture element so legacy browsers and embedded webviews still render a photo.

PNG is lossless. Use only when transparency is required or for graphics with sharp edges (UI screenshots, line art, logos that cannot ship as SVG). Replace with AVIF/WebP for photos.

SVG for icons, logos, simple graphics. Scalable, tiny file size. Inline SVG when feasible (one fewer request, CSS targetable).

GIF is largely obsolete for animation. Use APNG, AVIF animation, or short MP4/WebM video instead. Animated GIFs are typically 10x to 30x larger than equivalent AVIF or H.264.

2.1a AVIF as the 2026 Standard

AVIF (AV1 Image File Format) is the recommended baseline for new image pipelines in 2026. Key reasons:

File size advantages. AVIF beats WebP by roughly 20 to 30% at SSIM-matched quality, and beats JPEG by 40 to 60% (Netflix engineering blog AVIF analysis, 2020; AOMedia benchmark suite, sample 1000 images, 2024). For a typical 1920x1080 hero photo at high quality:

typical_hero_file_sizes:
  jpeg_q85: ~280 KB
  webp_q80: ~190 KB
  avif_q55: ~125 KB

The AVIF quality scale is not equivalent to JPEG's. AVIF q55 to q60 looks visually equivalent to JPEG q85. A lower q number does not mean lower visual quality; each encoder defines its own scale.

Browser support in 2026. Chrome (85+), Edge (Chromium), Firefox (93+), Safari (16.4+ on macOS Ventura, 17+ on iOS), Samsung Internet, all major Android browsers. Estimated global support 96% (caniuse.com AVIF tracker, sample N captured 2025-10). The remaining 4% is mostly older iOS versions and locked-down corporate Chrome builds, all of which fall through cleanly to the WebP source.

Where AVIF still struggles.

AVIF encoding tools.

avifenc (libavif reference encoder, CLI):

avifenc -q 55 --speed 6 --jobs 8 input.png output.avif
# -q sets color quality on a 0-100 scale (libavif 1.x; older builds used --min/--max quantizer bounds).
# Speed 6 balances encode time and quality. Speed 0 is slowest/best, speed 10 is fastest.

sharp (Node.js, recommended for build pipelines):

const sharp = require('sharp');
sharp('input.jpg')
  .avif({ quality: 55, effort: 6, chromaSubsampling: '4:2:0' })
  .toFile('output.avif');

ImageMagick 7.1+ with libheif:

magick input.png -quality 55 output.avif

squoosh-cli (Google Chrome Labs CLI wrapper around WASM encoders):

squoosh-cli --avif '{"cqLevel":33,"speed":6}' input.png

cavif-rs (Rust encoder built on rav1e):

cavif --quality 55 --speed 6 input.png

Nginx mime-type configuration. Many older nginx installs do not include AVIF in the default mime.types. Add explicitly:

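# /etc/nginx/mime.types: add the avif lines inside the existing types { } block.
# Avoid a standalone types { } block in a server block: it replaces the inherited
# MIME map instead of extending it, so every unlisted extension falls back to default_type.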
types {
    image/avif    avif;
    image/avif-sequence avifs;
    image/webp    webp;
}

Without the mime type, AVIF files are served as application/octet-stream and browsers may refuse to render them. Verify with:

curl -I https://example.com/images/photo.avif | grep -i content-type
# Expected: Content-Type: image/avif

The picture element fallback chain. Always order sources from the most efficient format to the least (AVIF, then WebP, then the JPEG <img>):

<picture>
  <source srcset="/images/photo.avif" type="image/avif">
  <source srcset="/images/photo.webp" type="image/webp">
  <img src="/images/photo.jpg"
       alt="Descriptive alt text"
       width="1600"
       height="900"
       loading="lazy"
       decoding="async">
</picture>

The browser picks the first <source> whose type it supports. The <img> inside <picture> is the element that actually renders, so the alt text, dimensions, loading, and decoding attributes on the <img> apply to the whole element. Put them on the <img>, not on every <source>.

Common AVIF gotcha. Always include type="image/avif" on the AVIF <source>. The browser uses the type attribute to skip sources it cannot decode. Without it, a browser that lacks AVIF support may still select the AVIF file and show a broken image, because <picture> does not retry the next <source> after a decode failure.

2.2 Modern Image Delivery Pattern

<picture>
  <source srcset="/images/photo.avif" type="image/avif">
  <source srcset="/images/photo.webp" type="image/webp">
  <img src="/images/photo.jpg" 
       alt="Descriptive alt text" 
       width="800" 
       height="600"
       loading="lazy"
       decoding="async">
</picture>

2.3 Responsive Images

<img src="/images/photo-800.webp"
     srcset="/images/photo-400.webp 400w,
             /images/photo-800.webp 800w,
             /images/photo-1600.webp 1600w,
             /images/photo-2400.webp 2400w"
     sizes="(max-width: 768px) 100vw,
            (max-width: 1200px) 50vw,
            33vw"
     alt="Descriptive alt text"
     width="800"
     height="600"
     loading="lazy">

Browser selects the appropriate size based on viewport and pixel density.

How srcset and sizes actually work. srcset is a candidate list with intrinsic widths (the 400w descriptor means "this file is 400 pixels wide"). sizes tells the browser how wide the image will render at various viewports, expressed as a CSS length; the first media condition that matches wins. The browser then does the math: the matching sizes length (for example, 50vw of a 1000px viewport is 500 CSS pixels), multiplied by the device pixel ratio, gives the target width, and the browser typically picks the smallest srcset candidate at least that wide. The spec leaves the final choice to the browser, which may also factor in cached files or network conditions.
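The selection arithmetic can be sketched in a few lines of JavaScript. This is a simplified model, not browser source: it assumes sizes clauses of the form "(max-width: Npx) Xvw" (encoded as { maxWidth, vw }, with maxWidth: null for the final unconditional clause) and always takes the smallest candidate at or above the target, whereas real browsers may deviate. The names resolveSizes and pickCandidate are hypothetical.

```javascript
// Simplified model of how a browser resolves sizes, then chooses a srcset candidate.
// Assumption: each sizes clause is { maxWidth, vw }, i.e. "(max-width: Npx) Xvw",
// with maxWidth: null for the final unconditional clause.

function resolveSizes(sizes, viewportWidth) {
  for (const clause of sizes) {
    // The first clause whose media condition matches wins.
    if (clause.maxWidth === null || viewportWidth <= clause.maxWidth) {
      return (viewportWidth * clause.vw) / 100; // rendered width in CSS pixels
    }
  }
  return viewportWidth; // nothing matched: the spec default is 100vw
}

function pickCandidate(candidateWidths, sizes, viewportWidth, dpr) {
  const target = resolveSizes(sizes, viewportWidth) * dpr; // device pixels needed
  const sorted = [...candidateWidths].sort((a, b) => a - b);
  // Smallest candidate that covers the target; fall back to the largest available.
  return sorted.find((w) => w >= target) ?? sorted[sorted.length - 1];
}

// The sizes value used throughout this section.
const SIZES = [
  { maxWidth: 768, vw: 100 },
  { maxWidth: 1200, vw: 50 },
  { maxWidth: null, vw: 33 },
];
const WIDTHS = [400, 800, 1600, 2400];

console.log(pickCandidate(WIDTHS, SIZES, 375, 2));  // → 800 (target 750)
console.log(pickCandidate(WIDTHS, SIZES, 1024, 1)); // → 800 (target 512)
console.log(pickCandidate(WIDTHS, SIZES, 1440, 2)); // → 1600 (target ~950)
```

The last case shows why intermediate widths matter: a 1440px desktop at 2x needs only about 950 pixels but, with these four candidates, downloads the 1600w file.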

Recommended size breakpoints. Generate at minimum four widths covering common viewports plus retina:

responsive_widths:
  mobile_1x: 400
  mobile_2x: 800
  tablet_1x: 800
  tablet_2x: 1600
  desktop_1x: 1200
  desktop_2x: 2400
  retina_hero: 3200  # for hero LCP only, on retina desktops

Combining picture + srcset for both format and size negotiation:

<picture>
  <source
    type="image/avif"
    srcset="/images/photo-400.avif 400w,
            /images/photo-800.avif 800w,
            /images/photo-1600.avif 1600w,
            /images/photo-2400.avif 2400w"
    sizes="(max-width: 768px) 100vw, (max-width: 1200px) 50vw, 33vw">
  <source
    type="image/webp"
    srcset="/images/photo-400.webp 400w,
            /images/photo-800.webp 800w,
            /images/photo-1600.webp 1600w,
            /images/photo-2400.webp 2400w"
    sizes="(max-width: 768px) 100vw, (max-width: 1200px) 50vw, 33vw">
  <img
    src="/images/photo-800.jpg"
    srcset="/images/photo-400.jpg 400w,
            /images/photo-800.jpg 800w,
            /images/photo-1600.jpg 1600w,
            /images/photo-2400.jpg 2400w"
    sizes="(max-width: 768px) 100vw, (max-width: 1200px) 50vw, 33vw"
    alt="Descriptive alt text"
    width="1600"
    height="900"
    loading="lazy"
    decoding="async">
</picture>

This is the gold-standard pattern. Browser negotiates both format and size.
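Writing that markup by hand invites mismatched widths between formats. A small generator keeps the three srcset lists in sync. This is a hypothetical helper (pictureMarkup and its option names are illustrative), assuming every variant follows the base-width.ext naming used in the examples above.

```javascript
// Hypothetical generator for the picture + srcset pattern.
// Assumes every variant is named `${base}-${width}.${ext}`.
const FORMATS = [
  { ext: 'avif', type: 'image/avif' },
  { ext: 'webp', type: 'image/webp' },
];

// Escape characters that would break a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// Builds "base-400.ext 400w, base-800.ext 800w, ..."
function srcsetFor(base, ext, widths) {
  return widths.map((w) => `${base}-${w}.${ext} ${w}w`).join(', ');
}

function pictureMarkup({ base, widths, sizes, alt, width, height, fallbackWidth }) {
  const sources = FORMATS.map(
    (f) => `  <source type="${f.type}" srcset="${srcsetFor(base, f.ext, widths)}" sizes="${sizes}">`
  );
  const img = [
    `  <img src="${base}-${fallbackWidth}.jpg"`,
    `    srcset="${srcsetFor(base, 'jpg', widths)}"`,
    `    sizes="${sizes}"`,
    `    alt="${escapeAttr(alt)}"`,
    `    width="${width}" height="${height}"`,
    '    loading="lazy" decoding="async">',
  ].join('\n');
  return ['<picture>', ...sources, img, '</picture>'].join('\n');
}

const html = pictureMarkup({
  base: '/images/photo',
  widths: [400, 800, 1600, 2400],
  sizes: '(max-width: 768px) 100vw, (max-width: 1200px) 50vw, 33vw',
  alt: 'Descriptive alt text',
  width: 1600,
  height: 900,
  fallbackWidth: 800,
});
console.log(html);
```

The generated <img> is lazy by default, which suits below-the-fold content; the LCP hero needs loading="eager" and fetchpriority="high" instead.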

Native lazy loading. loading="lazy" ships in every modern browser. The browser defers fetching the image until it is near the viewport. decoding="async" lets the browser decode the image off the critical path, so decoding is less likely to delay rendering of other content.

Core Web Vitals LCP impact. The Largest Contentful Paint element on most marketing pages is the hero image. Three rules:

  1. Never lazy-load the LCP image. Use loading="eager" and fetchpriority="high" on the hero.
  2. Always specify width and height. Without explicit dimensions, the browser cannot reserve space, and CLS (Cumulative Layout Shift) spikes.
  3. Preload the LCP image with <link rel="preload" as="image" imagesrcset=... imagesizes=...> in <head> when the picture/srcset source set is known at render time.

<head>
  <link
    rel="preload"
    as="image"
    imagesrcset="/images/hero-800.avif 800w, /images/hero-1600.avif 1600w, /images/hero-2400.avif 2400w"
    imagesizes="100vw"
    type="image/avif"
    fetchpriority="high">
</head>

For Tailwind specific responsive image patterns and dark-mode CLS concerns, see framework-tailwind.md.

2.4 Compression Targets

compression_targets:
  hero_images: < 200KB (ideally < 100KB)
  content_images: < 100KB
  thumbnail_images: < 30KB
  product_images: < 150KB
  avatars: < 20KB
  icons: < 5KB (SVG preferred)
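A build step can enforce these budgets. The sketch below assumes each image already carries one of the category names above (how your pipeline assigns categories is up to you) and treats 1 KB as 1,024 bytes. It flags files at or above each upper limit and ignores the softer "ideally" hero target. checkBudget is a hypothetical name.

```javascript
// Upper limits from compression_targets, in KB (assumed 1 KB = 1,024 bytes).
const BUDGET_KB = {
  hero_images: 200,
  content_images: 100,
  thumbnail_images: 30,
  product_images: 150,
  avatars: 20,
  icons: 5,
};

// images: [{ path, category, bytes }] -> entries at or above their category's limit.
function checkBudget(images) {
  const over = [];
  for (const img of images) {
    const limitKB = BUDGET_KB[img.category];
    if (limitKB === undefined) throw new Error(`Unknown category: ${img.category}`);
    if (img.bytes >= limitKB * 1024) {
      over.push({ ...img, overByKB: Math.round(img.bytes / 1024 - limitKB) });
    }
  }
  return over;
}

const report = checkBudget([
  { path: '/images/hero-1600.avif', category: 'hero_images', bytes: 128000 },
  { path: '/images/team-thumb.webp', category: 'thumbnail_images', bytes: 41000 },
]);
console.log(report); // only team-thumb.webp, about 10 KB over the 30 KB thumbnail limit
```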

2.5 Dimension Standards

common_image_dimensions:
  hero_desktop: 1920×1080 or 1600×900
  hero_mobile: 800×450 or 600×337
  blog_featured: 1200×630 (also matches OG image dimensions)
  thumbnail: 400×225
  square_thumbnail: 400×400
  social_share: 1200×630 (Facebook/LinkedIn), 1200×675 (Twitter)
  schema_recommended:
    1x1: 1200×1200
    4x3: 1200×900
    16x9: 1200×675

For schema, providing all three aspect ratios (1x1, 4x3, 16x9) maximizes compatibility.
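Producing the three schema crops from one source is mostly arithmetic: take the largest centered rectangle with the target ratio, then resize it to 1200px wide. The sketch below computes only the rectangles. Centered cropping is an assumption; subjects near the edge of the frame may need a manual crop. centerCrop and schemaCrops are hypothetical names.

```javascript
// Target aspect ratios for schema images, as [width, height] parts.
const SCHEMA_RATIOS = { '1x1': [1, 1], '4x3': [4, 3], '16x9': [16, 9] };

// Largest centered rectangle with ratio rw:rh that fits inside srcW x srcH.
function centerCrop(srcW, srcH, [rw, rh]) {
  // Try the full width first; if that would be too tall, use the full height.
  let width = srcW;
  let height = Math.round((srcW * rh) / rw);
  if (height > srcH) {
    height = srcH;
    width = Math.round((srcH * rw) / rh);
  }
  return {
    left: Math.floor((srcW - width) / 2),
    top: Math.floor((srcH - height) / 2),
    width,
    height,
  };
}

function schemaCrops(srcW, srcH) {
  const out = {};
  for (const [name, ratio] of Object.entries(SCHEMA_RATIOS)) {
    out[name] = centerCrop(srcW, srcH, ratio);
  }
  return out;
}

console.log(schemaCrops(4000, 3000));
```

The returned { left, top, width, height } shape is what sharp's extract() accepts, so each crop can be followed by resize({ width: 1200 }). For a 4000×3000 source, the 16x9 crop keeps the full width and trims 375 pixels from both the top and the bottom.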


3. Alt Text Strategy

Alt text serves three functions:

  1. Accessibility (screen readers describe images to vision-impaired users)
  2. Search engine understanding (what the image contains)
  3. Fallback display when image fails to load

3.1 Alt Text Best Practices

Describe the image accurately and specifically:

<!-- BAD — generic -->
<img src="/dog.jpg" alt="dog">

<!-- BETTER — descriptive -->
<img src="/dog.jpg" alt="Golden retriever puppy playing fetch in autumn park">

<!-- BEST — context-appropriate -->
<img src="/dog.jpg" alt="Eight-week-old golden retriever puppy named Bailey playing fetch with red rubber ball in Cassville Memorial Park during autumn">

Match alt text to content context:

If the image illustrates a specific concept in the article, alt text should reflect that:

<!-- In article about color theory -->
<img src="/sunset.jpg" alt="Coastal sunset showing warm color palette transitioning from yellow at horizon through orange and red into purple twilight overhead">

Include relevant keywords naturally:

Don't keyword-stuff, but include relevant keywords if they describe the image:

<!-- For SEO services page -->
<img src="/team.jpg" alt="ThatDeveloperGuy SEO consultation session with small business owner reviewing search performance dashboards">

3.2 Alt Text for Different Image Types

Product images:

<img alt="Apple iPhone 15 Pro in titanium gray, side view showing camera array">

Decorative images (purely visual, no semantic content):

<!-- Use empty alt and CSS-only decoration when possible -->
<img alt="" role="presentation" src="/divider.png">

<!-- Or use CSS background-image for purely decorative graphics -->
<div class="decorative-pattern"></div>

Infographics:

<img alt="Infographic showing 14-tier framework with 112 individual optimizations across foundation, search visibility, AI domination, entity authority, local domination, content multimedia, social community, data analytics, monitoring intelligence, workflow operations, marketplace retail, international, technical advanced tiers">
<!-- Plus include detailed description in surrounding content -->

Charts and graphs:

<img alt="Bar chart showing organic traffic growth from 2023 to 2026, increasing from 5,000 monthly visitors to 47,000 monthly visitors">
<!-- Include data table or detailed description nearby for accessibility -->

Logos:

<img alt="ThatDeveloperGuy logo" src="/logo.svg">
<!-- For company logos, brief brand name is appropriate -->

People:

<img alt="Joseph Anady, founder of ThatDeveloperGuy, in office at Cassville Missouri location">

3.3 Alt Text Anti-Patterns

3.4 Decorative vs Informative Images

The single most misunderstood accessibility-SEO overlap. Every image on the page falls into one of two buckets:

Informative images carry content meaning. Alt text must describe what the image communicates.

Decorative images are pure visual flourish. They contribute no information beyond what surrounding text already conveys. These get alt="" (empty string, not missing attribute).

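The critical distinction: alt="" is not the same as omitting alt entirely. An empty alt tells screen readers the image is decorative, so they skip it. A missing alt attribute gives them nothing to go on, and many fall back to announcing the file name or URL instead. The examples below show decorative and informative cases: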

<!-- Decorative divider, purely visual -->
<img src="/images/wave-divider.svg" alt="" role="presentation">

<!-- Decorative background flourish behind a hero heading -->
<img src="/images/hero-bg-pattern.svg" alt="" aria-hidden="true">

<!-- Decorative icon next to text label that already says "Phone" -->
<span class="icon-row">
  <img src="/icons/phone.svg" alt="" width="16" height="16">
  Phone: (417) 598-8753
</span>

<!-- Informative icon, no surrounding label: the img alt becomes the link's accessible name -->
<a href="tel:+14175988753">
  <img src="/icons/phone.svg" alt="Call (417) 598-8753" width="24" height="24">
</a>
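Because missing and empty alt behave so differently, it helps to audit for them separately. The sketch below is a rough, regex-based lint (auditAlt is a hypothetical name). It does not parse HTML, so unquoted or valueless alt attributes are reported as missing, and a ">" inside an attribute value will confuse it. A DOM-based accessibility checker is the more reliable tool.

```javascript
// Rough lint: sort <img> tags into missing alt, empty alt (decorative), and described.
// Regex-based, not an HTML parser: unquoted or valueless alt counts as missing.
function auditAlt(html) {
  const results = { missing: [], empty: [], described: [] };
  for (const [tag] of html.matchAll(/<img\b[^>]*>/gi)) {
    const srcMatch = tag.match(/\ssrc\s*=\s*["']([^"']*)["']/i);
    const src = srcMatch ? srcMatch[1] : '(no src)';
    // Capture the quote character so alt="it's" and alt='say "hi"' both work.
    const alt = tag.match(/\salt\s*=\s*(["'])(.*?)\1/i);
    if (!alt) results.missing.push(src);
    else if (alt[2].trim() === '') results.empty.push(src);
    else results.described.push(src);
  }
  return results;
}

const altReport = auditAlt(`
  <img src="/images/wave-divider.svg" alt="" role="presentation">
  <img src="/icons/phone.svg" alt="Call (417) 598-8753" width="24" height="24">
  <img src="/images/DSC_4892.jpg">
`);
console.log(altReport);
// missing: DSC_4892.jpg; empty: wave-divider.svg; described: phone.svg
```

Anything in missing needs a decision: alt="" if the image is decorative, or real alt text if it is informative.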

Decision tree for "is this image decorative".

  1. Would removing this image make the page lose meaning? If yes, informative.
  2. Does surrounding text already describe what the image shows? If yes, decorative.
  3. Is the image purely a divider, background, or visual filler? If yes, decorative.
  4. Is the image inside a labeled anchor or button that already has aria-label or accessible text? If yes, the image is decorative within that context.
  5. Is the image a logo or brand mark inside a labeled wordmark wrapper? If the wrapper already labels the brand, the logo image is decorative.

The accessibility-SEO overlap. Search engines and screen readers consume the same alt attribute. Stuffing decorative images with keyword-laden alt text:

The clean rule: if the image were missing, would the page still make sense? If yes, the image is decorative and gets alt="". If no, the alt text must explain what the image contributes.

For the full accessibility framework, see framework-accessibility.md.


3a. Figure and Figcaption Semantics

For any image that warrants a caption, citation, or expanded description, the correct semantic is <figure> with <figcaption>. This outperforms a plain <img> with alt for three reasons.

3a.1 Why figure outperforms plain img

Explicit caption association. When you wrap an <img> in <figure> and place a <figcaption>, the browser and accessibility tree know the caption belongs to that image specifically, not to surrounding paragraphs. Screen readers announce the caption together with the image. Search crawlers extract the caption as image-adjacent text for image search context.

Accessibility tree improvements. The ARIA computed name for a <figure> defaults to its <figcaption> text. This means an image with sparse alt text but a rich figcaption still presents richly to assistive tech. The caption also serves as a visible label for sighted users, doubling its value.

Citation extractability. AI engines and image-search crawlers actively look for caption text when deciding what an image depicts. A figcaption with attribution, date, and description gives the crawler structured signal that no alt attribute can match.

3a.2 Code Samples

Basic figure with caption:

<figure>
  <img src="/images/cassville-storefront-2026.avif"
       alt="ThatDeveloperGuy storefront on Main Street, Cassville Missouri, with branded signage and large display window"
       width="1600"
       height="900"
       loading="lazy">
  <figcaption>
    ThatDeveloperGuy storefront on Main Street in downtown Cassville, Missouri, photographed in April 2026.
  </figcaption>
</figure>

Figure with photo credit and caption (most common e-commerce and editorial pattern):

<figure>
  <picture>
    <source srcset="/images/team-collab.avif" type="image/avif">
    <source srcset="/images/team-collab.webp" type="image/webp">
    <img src="/images/team-collab.jpg"
         alt="Joseph Anady reviewing client website analytics on dual monitors"
         width="1600"
         height="900"
         loading="lazy"
         decoding="async">
  </picture>
  <figcaption>
    <strong>Client review session, March 2026.</strong>
    Joseph Anady walks through Q1 organic traffic gains for a Cassville retail client.
    Photo by ThatDeveloperGuy.
  </figcaption>
</figure>

Figure with chart and detailed alt + caption (informative content):

<figure>
  <img src="/images/organic-traffic-2023-2026.svg"
       alt="Bar chart, organic traffic grew from 5,000 to 47,000 monthly visitors between January 2023 and March 2026"
       width="1200"
       height="675">
  <figcaption>
    Monthly organic search traffic, January 2023 through March 2026.
    Source: Google Search Console aggregated across 14 SEO clients.
    Sample N = 14 domains. Y-axis: unique sessions per month.
  </figcaption>
</figure>

Figure grouped into a gallery (multiple figures inside a parent):

<section aria-label="Project gallery">
  <figure>
    <img src="/images/proj-before.avif"
         alt="Outdated client homepage from 2023, dense block of text, no images, generic layout"
         width="800" height="600" loading="lazy">
    <figcaption>Before: original homepage, January 2023.</figcaption>
  </figure>
  <figure>
    <img src="/images/proj-after.avif"
         alt="Redesigned client homepage from 2026, hero image, clear value proposition, prominent CTA, responsive layout"
         width="800" height="600" loading="lazy">
    <figcaption>After: rebuild deployed March 2026.</figcaption>
  </figure>
</section>

3a.3 When NOT to Use Figure

The rule: if the image deserves a visible caption or sits as a standalone content unit you would cite or reference, wrap it in <figure>. Otherwise, plain <img> is fine.


4. File Naming Strategy

Image filenames are a minor signal but contribute to optimization.

file_naming:
  bad_examples:
    - DSC_4892.jpg
    - IMG_001.png
    - screenshot-2026-04-29.png
    - Untitled.jpg
    - photo (1).jpeg
  
  good_examples:
    - golden-retriever-puppy-cassville-park.jpg
    - thatdeveloperguy-team-collaboration.jpg
    - 14-tier-seo-framework-diagram.png
    - cassville-mo-storefront-exterior.jpg
    - joseph-anady-headshot.jpg

Pattern: Lowercase, hyphens (not underscores), descriptive of image content.

Don't: Stuff keywords, use random characters, use spaces, use special characters.
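The naming pattern is easy to enforce at upload time. This sketch (seoFilename is a hypothetical helper) turns a human-written description into a lowercase, hyphenated ASCII filename and keeps the original extension; folding .jpeg to .jpg is an assumption about house style. It cannot judge keyword stuffing, so the description itself still has to be written by a person.

```javascript
const path = require('path');

// Build "lowercase-hyphenated-description.ext" from a description and the original filename.
function seoFilename(description, originalName) {
  // Keep the original extension, lowercased; ".jpeg" -> ".jpg" is a house-style assumption.
  const ext = path.extname(originalName).toLowerCase().replace('.jpeg', '.jpg');
  const slug = description
    .normalize('NFKD')               // split accented letters into base letter + mark
    .replace(/[\u0300-\u036f]/g, '') // drop the combining marks (é -> e)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')     // spaces, underscores, punctuation -> single hyphen
    .replace(/^-+|-+$/g, '');        // trim leading/trailing hyphens
  if (!slug) throw new Error('Description produced an empty filename');
  return slug + ext;
}

console.log(seoFilename('Golden Retriever Puppy, Cassville Park', 'DSC_4892.JPG'));
// → golden-retriever-puppy-cassville-park.jpg
console.log(seoFilename('Café storefront_exterior', 'photo (1).jpeg'));
// → cafe-storefront-exterior.jpg
```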


5. Image Schema

5.1 Image Object Schema

For specific images:

{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "@id": "https://example.com/images/team-photo.jpg",
  "url": "https://example.com/images/team-photo.jpg",
  "contentUrl": "https://example.com/images/team-photo.jpg",
  "width": 1600,
  "height": 900,
  "caption": "ThatDeveloperGuy team at Cassville office",
  "description": "Photo of Joseph Anady reviewing client website on dual monitors",
  "creator": {"@id": "https://example.com/#organization"},
  "copyrightHolder": {"@id": "https://example.com/#organization"},
  "creditText": "Photo by ThatDeveloperGuy",
  "copyrightNotice": "Copyright 2026 ThatDeveloperGuy",
  "datePublished": "2026-04-15",
  "license": "https://example.com/image-license/",
  "acquireLicensePage": "https://example.com/contact/"
}

Field by field:

5.2 Image License Schema

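For licensed images, the license and acquireLicensePage pair makes the image eligible for the "Licensable" badge in Google Images (a small icon, a license tooltip, and a link to the acquisition page). Google's image metadata guidelines also require contentUrl on each ImageObject; the fragments below omit it for brevity.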

Creative Commons licenses. Link directly to the canonical CC URL:

{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "acquireLicensePage": "https://example.com/license-info/",
  "creditText": "Photo by Jane Photographer, CC BY 4.0"
}

Common CC license URLs:

creative_commons_licenses:
  CC0_public_domain: https://creativecommons.org/publicdomain/zero/1.0/
  CC_BY: https://creativecommons.org/licenses/by/4.0/
  CC_BY_SA: https://creativecommons.org/licenses/by-sa/4.0/
  CC_BY_ND: https://creativecommons.org/licenses/by-nd/4.0/
  CC_BY_NC: https://creativecommons.org/licenses/by-nc/4.0/
  CC_BY_NC_SA: https://creativecommons.org/licenses/by-nc-sa/4.0/
  CC_BY_NC_ND: https://creativecommons.org/licenses/by-nc-nd/4.0/

Stock photo attribution patterns. When using paid stock from Getty, Shutterstock, Adobe Stock, Unsplash+:

{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "license": "https://www.shutterstock.com/license",
  "acquireLicensePage": "https://www.shutterstock.com/image-photo/1234567890",
  "creditText": "Photo by FirstnameLastname / Shutterstock",
  "creator": {
    "@type": "Person",
    "name": "Firstname Lastname"
  },
  "copyrightHolder": {
    "@type": "Organization",
    "name": "Shutterstock, Inc."
  }
}

Editorial use only vs commercial. When a license is editorial-only (newsworthy use), include that in the description:

{
  "@type": "ImageObject",
  "license": "https://example.com/editorial-license/",
  "description": "Editorial use only. Not licensed for advertising, promotional, or commercial purposes.",
  "creditText": "Photo by Wire Service / Getty Images"
}

Royalty-free Unsplash and Pexels. These platforms grant broad usage rights but still benefit from attribution:

{
  "@type": "ImageObject",
  "license": "https://unsplash.com/license",
  "acquireLicensePage": "https://unsplash.com/photos/example-photo-id",
  "creditText": "Photo by Firstname Lastname on Unsplash"
}

5.3 Article and Page Image Schema

Images included in article schema (multiple aspect ratios maximize compatibility across Google surfaces):

{
  "@type": "Article",
  "image": [
    "https://example.com/images/article-1x1.jpg",
    "https://example.com/images/article-4x3.jpg",
    "https://example.com/images/article-16x9.jpg"
  ]
}

Or with full ImageObject nesting (richer):

{
  "@type": "Article",
  "image": [
    {
      "@type": "ImageObject",
      "url": "https://example.com/images/article-1x1.jpg",
      "width": 1200,
      "height": 1200
    },
    {
      "@type": "ImageObject",
      "url": "https://example.com/images/article-4x3.jpg",
      "width": 1200,
      "height": 900
    },
    {
      "@type": "ImageObject",
      "url": "https://example.com/images/article-16x9.jpg",
      "width": 1200,
      "height": 675
    }
  ]
}

5.4 Image-to-Video Schema Relationships

When images form a sequence, animated content, or supplement video material, pair ImageObject with VideoObject schema for richer search treatment.

Image carousel with video summary. A product page with multiple product images (three shown) and a 30-second product video:

{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Walnut Dining Table",
  "image": [
    "https://example.com/images/table-front.jpg",
    "https://example.com/images/table-side.jpg",
    "https://example.com/images/table-detail.jpg"
  ],
  "video": {
    "@type": "VideoObject",
    "name": "Walnut Dining Table, 360 degree view",
    "description": "Rotating product video showing finish, joinery, and scale",
    "thumbnailUrl": "https://example.com/images/table-video-thumb.jpg",
    "contentUrl": "https://example.com/videos/table-360.mp4",
    "uploadDate": "2026-04-10",
    "duration": "PT30S"
  }
}

Animated content (formerly GIFs, now AVIF animation or short MP4). Treat as VideoObject, not ImageObject:

{
  "@type": "VideoObject",
  "name": "Product assembly animation",
  "thumbnailUrl": "https://example.com/images/assembly-thumb.jpg",
  "contentUrl": "https://example.com/videos/assembly.mp4",
  "duration": "PT8S",
  "uploadDate": "2026-04-10"
}

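Video rich results are driven by VideoObject markup that points at a video file. Converting an animated GIF to short MP4 or WebM and marking it up as VideoObject is what makes it eligible; a GIF or animated AVIF marked up only as ImageObject is not eligible for video features.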

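Image sequences as how-to steps. Each step image can be its own ImageObject inside HowToStep. Google stopped showing HowTo rich results in 2023, but the markup remains valid schema.org vocabulary for describing step images: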

{
  "@type": "HowTo",
  "step": [
    {
      "@type": "HowToStep",
      "name": "Step 1, prep the surface",
      "image": {
        "@type": "ImageObject",
        "url": "https://example.com/images/step-1.jpg",
        "width": 800,
        "height": 600
      }
    },
    {
      "@type": "HowToStep",
      "name": "Step 2, apply primer",
      "image": {
        "@type": "ImageObject",
        "url": "https://example.com/images/step-2.jpg",
        "width": 800,
        "height": 600
      }
    }
  ]
}

For full HowTo schema patterns, see framework-schema.md.


6. Image Sitemaps

Inform search engines about images on each page using the image sitemap extension. Two delivery models exist: inline in the main sitemap (per-URL <image:image> blocks), or a dedicated image sitemap that lists images independently.

6.1 Inline Image Sitemap (Recommended for Most Sites)

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/article/</loc>
    <image:image>
      <image:loc>https://example.com/images/hero.jpg</image:loc>
      <image:caption>Article hero image showing concept overview</image:caption>
      <image:title>Article Hero</image:title>
      <image:license>https://example.com/license/</image:license>
    </image:image>
    <image:image>
      <image:loc>https://example.com/images/diagram.png</image:loc>
      <image:caption>Diagram showing relationship between concepts</image:caption>
    </image:image>
  </url>
</urlset>

This model is the default. Each <url> entry can carry up to 1,000 <image:image> children. Image sitemaps help discovery especially for images loaded via JavaScript or in unusual page structures.
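A minimal builder for the inline model might look like this. It is a sketch, assuming the page-to-images mapping already exists as a plain object or Map (buildImageSitemap is a hypothetical name). Google deprecated image:caption, image:title, image:license, and image:geo_location in 2022, so the sketch emits only image:loc, escapes XML special characters, and enforces the 1,000-images-per-<url> limit.

```javascript
// Escape the five XML special characters for use in element text.
function xmlEscape(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

const MAX_IMAGES_PER_URL = 1000; // per-<url> cap for <image:image> entries

// pages: plain object or Map of pageUrl -> [imageUrl, ...]
function buildImageSitemap(pages) {
  const entries = pages instanceof Map ? pages : Object.entries(pages);
  const lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
    '        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">',
  ];
  for (const [pageUrl, images] of entries) {
    if (images.length > MAX_IMAGES_PER_URL) {
      throw new Error(`${pageUrl} lists ${images.length} images; the cap is ${MAX_IMAGES_PER_URL}`);
    }
    lines.push('  <url>', `    <loc>${xmlEscape(pageUrl)}</loc>`);
    for (const imageUrl of images) {
      lines.push(
        '    <image:image>',
        `      <image:loc>${xmlEscape(imageUrl)}</image:loc>`,
        '    </image:image>'
      );
    }
    lines.push('  </url>');
  }
  lines.push('</urlset>');
  return lines.join('\n');
}

const xml = buildImageSitemap({
  'https://example.com/article/': [
    'https://example.com/images/hero.jpg',
    'https://example.com/images/diagram.png',
  ],
});
console.log(xml);
```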

6.2 When to Use a Separate Image Sitemap

Split into a dedicated image sitemap when:

Pattern: keep the main sitemap for HTML URLs, add a separate /sitemap-images.xml for images. Reference both from robots.txt:

# /robots.txt
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml

The separate image sitemap uses the same schema, but each <url> <loc> should point to the HTML page that hosts the images, not to the image file itself.

6.3 Schema-Driven Image Sitemap Generation

When pages already carry JSON-LD ImageObject schema, generate the image sitemap from the schema rather than re-encoding the same metadata in two places. The build pipeline:

sitemap_build_steps:
  1: Crawl HTML pages or read the build manifest
  2: Extract all JSON-LD ImageObject and Article.image references
  3: Deduplicate by image URL
  4: Group images by hosting page
  5: Emit XML with image:loc for each image (Google has ignored image:caption and image:license since 2022; keep caption and license in the JSON-LD)
  6: Validate against the sitemap-image 1.1 XSD (namespace http://www.google.com/schemas/sitemap-image/1.1)
  7: Write to /var/www/sites/[domain]/sitemap-images.xml

This keeps schema and sitemap in sync without manual duplication. If a new image is added to a page with schema, it shows up in the next sitemap rebuild automatically.
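Steps 2 to 4 can be sketched as follows, under several assumptions: each page is supplied as a { url, html } object read from the build output, JSON-LD lives in <script type="application/ld+json"> tags, and an image URL comes from an ImageObject's contentUrl or url, or from a plain string in any image property. Script extraction uses a regular expression rather than an HTML parser, and deduplication is per page (an image used on two pages stays listed under both). collectImageUrls and groupImagesByPage are hypothetical names.

```javascript
// Regex-based <script> extraction: a simplification, not a real HTML parser.
const LD_JSON = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;

// Walk any JSON-LD value (object, array, @graph) and collect image URLs into `out`.
function collectImageUrls(node, out) {
  if (Array.isArray(node)) {
    node.forEach((n) => collectImageUrls(n, out));
    return;
  }
  if (!node || typeof node !== 'object') return;
  if ([].concat(node['@type']).includes('ImageObject')) {
    const u = node.contentUrl || node.url;
    if (typeof u === 'string') out.add(u);
  }
  if (node.image !== undefined) {
    // Article.image / Product.image may hold strings, ImageObjects, or a mix.
    for (const img of [].concat(node.image)) {
      if (typeof img === 'string') out.add(img);
      else collectImageUrls(img, out);
    }
  }
  for (const [key, value] of Object.entries(node)) {
    if (key !== 'image' && typeof value === 'object') collectImageUrls(value, out);
  }
}

// pages: [{ url, html }] -> Map(pageUrl -> [imageUrl, ...]), skipping pages with no images.
function groupImagesByPage(pages) {
  const byPage = new Map();
  for (const { url, html } of pages) {
    const found = new Set(); // the Set dedupes repeats within one page
    for (const [, json] of html.matchAll(LD_JSON)) {
      try {
        collectImageUrls(JSON.parse(json), found);
      } catch {
        console.warn(`Skipping invalid JSON-LD on ${url}`);
      }
    }
    if (found.size > 0) byPage.set(url, [...found]);
  }
  return byPage;
}

const pages = [
  {
    url: 'https://example.com/article/',
    html: `
      <script type="application/ld+json">
        {"@type": "Article", "image": ["https://example.com/images/a-1x1.jpg",
          {"@type": "ImageObject", "url": "https://example.com/images/a-16x9.jpg"}]}
      </script>
      <script type="application/ld+json">
        {"@type": "ImageObject", "contentUrl": "https://example.com/images/a-1x1.jpg"}
      </script>`,
  },
];

console.log(groupImagesByPage(pages));
```

The resulting Map of page URL to image URLs is the grouping step 5 needs to emit one <url> per page with its <image:loc> entries.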

6.4 Multi-Domain Image Hosting Considerations

When images are served from a different host than the HTML page (e.g., HTML on example.com, images on cdn.example.com or images.example.com):

Image sitemaps, unlike regular sitemaps, may list image URLs on another host, which is what makes CDN hosting work. Google recommends verifying the image host in Search Console as well, so that crawl errors on that host are reported to you.

The <image:loc> value uses the image host, not the HTML host:

<url>
  <loc>https://example.com/article/</loc>
  <image:image>
    <image:loc>https://images.example.com/article-hero.jpg</image:loc>
    <image:caption>Article hero image</image:caption>
  </image:image>
</url>

Pseudo-CDN model. Many sites serve images from a path like /static/ or /images/ on the same domain. This is the simplest model and needs no second Search Console property. Recommended for most clients unless image bandwidth justifies splitting.

Subdomain split model. When images live on images.example.com, verify both example.com and images.example.com in Search Console, and serve the image sitemap from example.com (the HTML host), listing image URLs on the image subdomain.

6.5 Image Sitemap Field Reference

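| Field | Required | Notes |
|---|---|---|
| image:loc | Yes | Absolute URL of the image file; may be on another host (e.g., a CDN) |
| image:caption | No | Deprecated by Google in 2022 and ignored. Keep captions in <figcaption> and ImageObject.caption |
| image:title | No | Deprecated by Google in 2022 and ignored. Was a title distinct from the caption |
| image:license | No | Deprecated by Google in 2022 and ignored. Declare license and acquireLicensePage in ImageObject JSON-LD (or IPTC metadata) for the license rich result |
| image:geo_location | No | Deprecated by Google in 2022. Was a free-text location string |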

Notes:


7. Lazy Loading Strategy

<!-- Standard lazy loading -->
<img src="/image.jpg" alt="..." loading="lazy">

<!-- Eager loading for above-fold images -->
<img src="/hero.jpg" alt="..." loading="eager" fetchpriority="high">

<!-- Async decoding -->
<img src="/image.jpg" alt="..." loading="lazy" decoding="async">

Above-fold images: loading="eager" + fetchpriority="high" for the LCP image.

Below-fold images: loading="lazy" to defer loading until needed.

Don't lazy-load the LCP image. Lazy loading delays its fetch until layout confirms it is near the viewport, which typically worsens LCP.


8. Origin-Side Image Optimization

For most sites, origin-side optimization is a better path than a third-party image transformation service: a one-time encode at build or upload, AVIF + WebP + JPEG variants written into /var/www/sites/[domain]/images/, served directly by nginx with proper cache headers. No third-party dependency, no edge billing, full control.

A typical build pipeline:

build_pipeline_steps:
  1_intake: Drop source PNG or JPEG into /var/www/sites/[domain]/images/_source/
  2_encode_avif: Generate 400w, 800w, 1600w, 2400w AVIF via sharp or avifenc
  3_encode_webp: Generate same widths as WebP via sharp or cwebp
  4_encode_jpeg: Generate same widths as JPEG via sharp or mozjpeg (for fallback)
  5_strip_metadata: Apply EXIF orientation first, then strip EXIF, GPS, and orientation tags (see Section 9)
  6_write_manifest: Emit a JSON manifest of (path, width, height, format) for runtime
  7_invalidate: Bust nginx cache for the affected paths

Example sharp-based encoder snippet (Node.js):

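const sharp = require('sharp');
const path = require('path');

const SOURCE = '/var/www/sites/example.com/images/_source/photo.jpg';
const OUT_DIR = '/var/www/sites/example.com/images/';
const WIDTHS = [400, 800, 1600, 2400];

const base = path.basename(SOURCE, path.extname(SOURCE));

async function encodeAll() {
  const jobs = [];
  for (const w of WIDTHS) {
    // rotate() with no angle applies the EXIF Orientation tag, so images stay
    // upright once sharp drops metadata (its default unless withMetadata() is used).
    const resized = () => sharp(SOURCE).rotate().resize({ width: w });
    jobs.push(
      resized().avif({ quality: 55, effort: 6 })
        .toFile(path.join(OUT_DIR, `${base}-${w}.avif`)),
      resized().webp({ quality: 80, effort: 6 })
        .toFile(path.join(OUT_DIR, `${base}-${w}.webp`)),
      resized().jpeg({ quality: 82, mozjpeg: true })
        .toFile(path.join(OUT_DIR, `${base}-${w}.jpg`))
    );
  }
  await Promise.all(jobs); // surface encode errors instead of silently dropping them
}

encodeAll().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});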

Nginx serving with format negotiation via Accept header (optional, complements picture/source negotiation):

# map directives are only valid in the http context (e.g., nginx.conf or a conf.d file)
map $http_accept $webp_suffix { default ""; "~image/webp" ".webp"; }
map $http_accept $avif_suffix { default ""; "~image/avif" ".avif"; }

# In the server block
location ~* ^/images/(.+)\.(jpg|jpeg|png)$ {
    add_header Vary Accept;
    add_header Cache-Control "public, max-age=31536000, immutable";
    try_files /images/$1$avif_suffix /images/$1$webp_suffix /images/$1.$2 =404;
}

This is a complement to picture+source negotiation, not a replacement. The picture element gives you per-element fallback. Accept-header negotiation gives you a fallback for <img src="..."> tags that did not get wrapped.
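To make the fallback order concrete, here is an illustrative JavaScript model of what the map and try_files directives above do for a single request. It is not something nginx runs; the function name negotiateImage and the exists callback (standing in for nginx's on-disk file check) are assumptions of the sketch.

```javascript
// Illustrative model of the nginx map + try_files block above.
// exists(path) stands in for nginx's check that a file is present on disk.
function negotiateImage(requestPath, accept, exists) {
  const match = /^\/images\/(.+)\.(jpg|jpeg|png)$/i.exec(requestPath); // like ~*
  if (!match) return null; // the location block would not handle this request
  const [, stem, ext] = match;
  // Like the two map blocks: a regex test against the Accept header.
  const avifSuffix = /image\/avif/.test(accept) ? '.avif' : '';
  const webpSuffix = /image\/webp/.test(accept) ? '.webp' : '';
  // Same order as try_files. With an empty suffix a candidate is the bare
  // stem, which normally does not exist, so the lookup falls through.
  const candidates = [
    `/images/${stem}${avifSuffix}`,
    `/images/${stem}${webpSuffix}`,
    `/images/${stem}.${ext}`,
  ];
  const hit = candidates.find((candidate) => exists(candidate));
  return hit === undefined ? 404 : hit;
}

const onDisk = new Set(['/images/photo-800.avif', '/images/photo-800.webp', '/images/photo-800.jpg']);
const has = (p) => onDisk.has(p);
console.log(negotiateImage('/images/photo-800.jpg', 'image/avif,image/webp,*/*', has)); // /images/photo-800.avif
console.log(negotiateImage('/images/photo-800.jpg', 'image/webp,*/*', has)); // /images/photo-800.webp
console.log(negotiateImage('/images/photo-800.jpg', '*/*', has)); // /images/photo-800.jpg
```

Because one URL returns different bytes depending on the Accept header, the Vary: Accept header in the config is what stops shared caches from handing an AVIF body to a browser that cannot decode it.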

Image transformation services (Cloudinary, Imgix) exist for sites where build-time encoding is impractical (user-uploaded content at scale, headless CMS workflows with thousands of images). For static-site or smallish dynamic sites, origin-side encoding is faster, cheaper, and avoids third-party dependencies. For the cross-stack equivalents of the encoding pipeline in WordPress, Shopify, Next.js, etc., see framework-cross-stack-implementation.md.


9. Google Lens and Visual Search Optimization

Google Lens identifies entities, products, places, plants, animals, and text in images. It is a separate ranking surface from traditional Image Search, with different inputs and behaviors. An image can rank in Image Search but be invisible to Lens, or vice versa.

9.1 What Makes Images "Lens-Friendly"

Five concrete attributes influence how reliably Lens recognizes an image:

1. Product shots from multiple angles. Lens's object-matching index benefits from breadth. A single front-facing product photo gets matched less reliably than a set of front, side, three-quarter, top, and detail shots. For e-commerce, ship at minimum:

product_photo_set_minimum:
  - front_facing_neutral_bg
  - three_quarter_left
  - three_quarter_right
  - back_view (if relevant)
  - top_down
  - detail_close_up (texture, joinery, label)
  - in_context_lifestyle (optional but valuable)

Give each photo its own ImageObject schema entry and list them all in the Product.image array.

2. Neutral backgrounds for object recognition. Lens uses object segmentation. A product on a busy lifestyle background segments worse than the same product on white, light gray, or a single solid color. Lifestyle shots are valuable for human conversion, but the primary catalog shot for Lens should be on a clean background.

background_color_guidance:
  best_for_lens: pure_white (#FFFFFF) or light_gray (#F5F5F5)
  acceptable: any single solid color
  weaker_for_lens: gradient, textured, patterned, lifestyle

3. Alt text matching what Lens would identify. Lens internally generates a label for each query image, then matches against indexed images with similar visual features AND similar contextual text. Alt text that matches Lens's likely label boosts ranking. For a walnut dining table, alt text like "walnut dining table, four legs, mid-century modern" matches Lens's likely internal label better than "Bailey Studios product".

4. Structured data for visual search products. Product schema with brand, sku, gtin, and offers/price gives Lens enough metadata to render a rich shopping result with price and availability when the image matches.

{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Walnut Mid-Century Dining Table",
  "image": ["url1", "url2", "url3"],
  "brand": {"@type": "Brand", "name": "Bailey Studios"},
  "sku": "BST-DT-WAL-72",
  "gtin13": "0123456789012",
  "offers": {
    "@type": "Offer",
    "price": "1899.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
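The "one ImageObject per photo, listed in Product.image" pattern from point 1 can be generated rather than hand-written. A minimal sketch, assuming hypothetical product and photos input shapes (photos as { url, caption } records, for example drawn from the build manifest); contentUrl and caption are standard schema.org ImageObject properties, but the helper names are illustrative.

```javascript
// Hypothetical helper: one ImageObject per photo, all listed in Product.image.
const isAbsolute = (u) => {
  try {
    new URL(u); // throws on relative URLs such as /images/table.jpg
    return true;
  } catch {
    return false;
  }
};

function productJsonLd(product, photos) {
  const relative = photos.filter((photo) => !isAbsolute(photo.url));
  if (relative.length > 0) {
    throw new Error(`relative image URLs: ${relative.map((p) => p.url).join(', ')}`);
  }
  return {
    '@context': 'https://schema.org',
    '@type': 'Product',
    name: product.name,
    image: photos.map((photo) => ({
      '@type': 'ImageObject',
      contentUrl: photo.url,
      caption: photo.caption,
    })),
    brand: { '@type': 'Brand', name: product.brand },
    sku: product.sku,
    gtin13: product.gtin13,
    offers: {
      '@type': 'Offer',
      price: product.price,
      priceCurrency: product.currency,
      availability: product.availability,
    },
  };
}

// Serialize for a <script> tag. Escaping "<" keeps a stray "</script>" in
// product text from closing the tag early; \u003c is still valid JSON.
function toScriptTag(data) {
  const json = JSON.stringify(data).replace(/</g, '\\u003c');
  return `<script type="application/ld+json">${json}</script>`;
}
```

Rejecting relative URLs at build time is deliberate: structured data and social tags both expect absolute URLs, and failing the build is cheaper than discovering the gap in Search Console.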

5. Image quality and resolution. Lens prefers high-resolution source images. The image-search index downsamples to multiple thumbnails internally, but the source needs enough detail. Aim for minimum 1200px on the long edge for any image you want Lens-discoverable.

9.2 Lens-Aware Image Pages

A product detail page or article page that wants strong Lens performance ships:

9.3 Cross-Reference: Visual Search Framework

For deeper coverage of Pinterest visual search, image-based product discovery on Amazon and Etsy, Bing Visual Search, and AI multimodal image search (ChatGPT image queries, Gemini image input), see framework-visualsearch.md. That framework covers the platforms; this one covers the on-page implementation that feeds them.

9.4 Entity Establishment for Visual Recognition

Lens recognizes entities (a specific person, a specific product, a specific landmark) more reliably when the entity has been established in the Knowledge Graph. The reverse also holds: if Lens can recognize the entity from anywhere on the web, your images of that entity rank better because they have a known anchor.

The chicken-and-egg solution: provide ImageObject schema that links to your Organization/Person/Product @id, ensure those entities have consistent reference images across the site, and pursue the entity-establishment path documented in framework-knowledgegraph.md.

For locally-recognized entities (a small business storefront, a regional product), Lens leverages local search signals. The same NAP consistency that powers local SEO powers local visual recognition.


9a. AI Engine Image Consumption

AI search engines and chat assistants treat images very differently from traditional crawlers. Understanding this difference shapes what you ship.

9a.1 What Reading-Mode Bots Actually Read

The bots used by ChatGPT, Claude, Perplexity, Google AI Overviews, and Bing Copilot to populate cited content are reading-mode bots. They fetch the HTML, extract text content, and rarely fetch or decode binary image data. What gets consumed:

What does not get consumed:

The practical implication: for AI engine consumption, the alt text and figcaption are the image. Everything else is invisible. This raises the bar on alt text quality. If alt text is missing or generic, the image effectively does not exist to the AI engine.

9a.2 Multimodal AI Engines (The Exceptions)

Some surfaces do consume image bytes:

For all of these, the on-site context (alt text, figcaption, schema) is still the primary signal. Image bytes alone, without text context, do not rank.

9a.3 What to Ship for AI Engine Image Coverage

ai_engine_image_checklist:
  - rich_alt_text_on_every_informative_image
  - figcaption_on_images_that_warrant_one
  - imageobject_schema_with_caption_description_creditText
  - image_adjacent_prose_that_describes_the_image_in_context
  - structured_data_linking_image_to_parent_entity (Product.image, Article.image, Person.image)
  - aeo_json_or_llms_txt_referencing_key_images (see framework-aeo.md)

9a.4 The "Image Is Cited" Goal

When an AI engine cites your page in a response, it sometimes pulls an image alongside the text citation. The image it picks is usually the page's declared primary image: Article.image in schema, og:image, or twitter:image. Ensure these are set, are at least 1200x630, and visually represent the page topic.
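A template-level check for the primary share image can be sketched as follows. It assumes you already have the og:image URL and the image's pixel dimensions (for example, from a build manifest); the function name and return shape are hypothetical. It flags the relative-URL failure mode listed in Section 12 as well as images below the 1200x630 minimum.

```javascript
// Hypothetical check for a page's primary share image (og:image / twitter:image).
// Returns a list of problems; an empty list means the image passes.
function checkShareImage(imageUrl, width, height) {
  if (!imageUrl) return ['missing share image'];
  const problems = [];
  try {
    // Without a base URL, new URL() throws on relative paths like /images/hero.jpg.
    const { protocol } = new URL(imageUrl);
    if (protocol !== 'https:' && protocol !== 'http:') {
      problems.push(`unexpected protocol ${protocol}`);
    }
  } catch {
    problems.push('not an absolute URL');
  }
  if (width < 1200 || height < 630) {
    problems.push(`too small: ${width}x${height}, need at least 1200x630`);
  }
  return problems;
}
```

For example, checkShareImage('/images/hero.jpg', 1200, 630) reports only the relative URL, while an absolute URL at 800x418 reports only the size problem.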

For the broader AI engine optimization framework, see framework-aeo.md and framework-llmoptimization.md.


9b. EXIF Data and Privacy

Every photo taken with a phone or digital camera embeds EXIF metadata: timestamp, camera model, lens, focal length, ISO, shutter speed, white balance, and (critically) GPS coordinates if location services were on. This data persists when the image is uploaded to a website unless explicitly stripped.

9b.1 The Privacy Risk

A real-estate photo with EXIF GPS coordinates discloses the property's exact location, sometimes more precisely than the listing URL admits. A team headshot uploaded by an employee can reveal the employee's home address if the photo was taken at home with GPS on. A product photo from a freelance photographer can disclose the photographer's studio address.

The geotag risk is the highest-stakes privacy leak in image uploads. Other EXIF fields (camera model, timestamp) are lower-risk but still unnecessary in public-facing assets.

9b.2 The Standard EXIF Strip Workflow

Strip EXIF on every image at upload or build time. Three reliable tools:

ImageMagick:

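magick input.jpg -auto-orient -strip output.jpg

The -strip flag removes all profiles and metadata, the ICC color profile included. -auto-orient runs first so that photos shot in portrait do not end up sideways once the Orientation tag is gone.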

ExifTool (more granular):

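exiftool -all= input.jpg
# Strips all EXIF, GPS, IPTC, and XMP (plus the ICC profile and Orientation tag)
# Rewrites input.jpg in place, keeping input.jpg_original unless -overwrite_original is given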

To strip only GPS while preserving copyright and color profile:

exiftool -gps:all= -xmp:geotag= input.jpg

sharp (in a Node build pipeline):

sharp('input.jpg')
  .rotate()                              // Bake EXIF orientation into the pixels
  .jpeg({ quality: 82, mozjpeg: true })  // No .withMetadata(), so no EXIF is written
  .toFile('output.jpg');

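Or more explicitly, by decoding to raw pixels (which cannot carry metadata) and encoding once from those:

sharp('input.jpg')
  .rotate()
  .raw()
  .toBuffer({ resolveWithObject: true })
  .then(({ data, info }) =>
    sharp(data, { raw: { width: info.width, height: info.height, channels: info.channels } })
      .jpeg({ quality: 82, mozjpeg: true })
      .toFile('output.jpg'));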

By default, sharp drops EXIF, XMP, and IPTC metadata from its output; calling .withMetadata() copies the input's metadata, GPS included, back into the output. Verify by re-reading EXIF on the output.

Verify EXIF strip:

exiftool output.jpg | grep -i -E "gps|location|camera|lens"
# Expected: no output, or only basic format info
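Where ExifTool is not available in CI, a dependency-free check can still catch the most common leak. The sketch below walks the JPEG header segments and reports whether an APP1 segment beginning with the "Exif" identifier is present. It is a deliberate minimum: the function name is hypothetical, it only understands JPEG marker structure, and it does not look at XMP (a separate APP1 segment) or at other formats.

```javascript
// Hypothetical dependency-free check: does a JPEG buffer still carry an
// APP1 "Exif" segment? Walks header segments only; stops at SOS or EOI.
function jpegHasExif(buf) {
  if (buf.length < 4 || buf[0] !== 0xff || buf[1] !== 0xd8) {
    throw new Error('not a JPEG: missing SOI marker');
  }
  let i = 2;
  while (i < buf.length) {
    if (buf[i] !== 0xff) throw new Error(`expected a marker at offset ${i}`);
    while (buf[i] === 0xff) i++; // marker prefix plus any 0xFF fill bytes
    const marker = buf[i++];
    if (marker === 0xd9 || marker === 0xda) break; // EOI, or SOS: entropy data follows
    if (marker >= 0xd0 && marker <= 0xd7) continue; // RSTn markers have no length
    if (i + 2 > buf.length) break; // truncated file
    const length = buf.readUInt16BE(i); // counts the two length bytes themselves
    if (marker === 0xe1 && buf.toString('latin1', i + 2, i + 8) === 'Exif\0\0') {
      return true;
    }
    i += length;
  }
  return false;
}
```

In a build step, jpegHasExif(require('fs').readFileSync('output.jpg')) should return false after stripping; a true result is a reasonable reason to fail the build.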

9b.3 When to Preserve EXIF

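Some metadata is worth keeping: the ICC color profile, so colors render as intended, and rights fields such as copyright, creator, and credit.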

A balanced strip workflow:

exiftool -all= -tagsfromfile @ \
  -ColorSpaceTags -ICC_Profile \
  -copyright -creator -credit \
  input.jpg

This strips everything, then copies the color-space tags, the ICC profile, and the copyright, creator, and credit fields back from the original file (the @ argument).

9b.4 EXIF and SEO

EXIF GPS data is not used as a ranking signal in 2026. Google does not extract GPS coordinates from image files for image-search localization. Local relevance comes from page-level signals: NAP, LocalBusiness schema, page content, GBP linkage.

The reason to strip EXIF is privacy, not SEO. There is no SEO penalty for keeping EXIF, but the privacy risk for clients and subjects is the binding constraint.

For the broader privacy and security posture framework, see framework-security.md.


10. Audit Rubric

The image SEO audit operates at three layers. The per-image rubric runs over each image asset. The site-wide rubric runs over pipeline, schema, and sitemap. The first-90-days subset is what to triage when starting a new client engagement.

10.1 Per-Image Rubric (15 items)

Run against every important image on the site. "Important" means hero images, product photos, article images, team photos, infographics. Decorative dividers and UI icons get a lighter pass.

# | Criterion | Pass condition
P1 | Modern format served | AVIF source in <picture>, WebP fallback, JPEG/PNG final fallback
P2 | Width and height attributes set | Explicit width and height integers on the <img>
P3 | Alt text present and descriptive | Specific, not generic, not stuffed, not "image of"
P4 | Alt text matches context | Matches surrounding content meaning
P5 | Decorative images use empty alt | alt="" not missing attribute
P6 | Filename descriptive | Lowercase, hyphens, content-descriptive
P7 | Compression appropriate | Hero < 200KB, content < 100KB, thumb < 30KB
P8 | Responsive srcset present | At least 3 widths covering mobile, tablet, desktop
P9 | sizes attribute matches layout | Reflects actual rendered size at each breakpoint
P10 | Lazy loading correctly applied | loading="lazy" on below-fold, loading="eager" on LCP
P11 | LCP image preloaded | <link rel="preload" as="image"> in head if hero
P12 | figure/figcaption used where warranted | Caption-worthy images wrapped in figure
P13 | EXIF stripped | No GPS, camera, or timestamp metadata in file
P14 | ImageObject schema attached | JSON-LD ImageObject with caption, description, license
P15 | Image included in sitemap | Listed in /sitemap-images.xml or inline image sitemap

Maximum per-image score: 15. World-class: 14+/15 across a 20-image sample.
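Parts of P3, P5, and P6 lend themselves to automation. The sketch below is a hypothetical heuristic linter, not a substitute for editorial review: it flags a missing alt attribute, alt text that looks like a camera filename or repeats the file's name, the redundant "image of" prefix, suspiciously long alt text, and non-descriptive filenames. The 125-character threshold is an assumption (a commonly cited rule of thumb), not a rule from this framework.

```javascript
// Hypothetical heuristic alt-text linter for rubric items P3, P5, and P6.
// img: { src, alt, decorative }; alt === undefined means no alt attribute.
function lintAlt({ src, alt, decorative = false }) {
  if (alt === undefined) {
    return [decorative ? 'P5: alt attribute missing; use alt=""' : 'P3: alt text missing'];
  }
  const text = alt.trim();
  if (decorative) {
    return text === '' ? [] : ['P5: decorative image should use alt=""'];
  }
  if (text === '') return ['P3: informative image has empty alt'];

  const issues = [];
  const fileStem = src.split('/').pop().replace(/\.[a-z0-9]+$/i, '');
  const altStem = text.replace(/\.[a-z0-9]+$/i, '');
  // Camera-style names (DSC_4892, IMG_0012, PXL_...) or alt that repeats the filename.
  if (/^(dsc|img|pxl)[_-]?\d+/i.test(text) || altStem.toLowerCase() === fileStem.toLowerCase()) {
    issues.push('P3: alt looks like a filename');
  }
  if (/^(an? )?(image|picture|photo) of\b/i.test(text)) {
    issues.push('P3: redundant "image of" prefix');
  }
  // Assumed stuffing threshold: 125 characters.
  if (text.length > 125) issues.push('P3: alt over 125 characters; check for stuffing');
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(fileStem)) {
    issues.push('P6: filename is not lowercase and hyphenated');
  }
  return issues;
}
```

Run over every image in the sample, the non-empty results form a worklist for editors; decorative images deliberately skip the filename check, matching the lighter pass described above.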

10.2 Site-Wide Rubric (10 items)

Run once per site audit. Evaluates pipeline, infrastructure, and aggregate practices.

# | Criterion | Pass condition
S1 | Image encoding pipeline in place | Build-time AVIF + WebP + JPEG generation
S2 | Nginx mime types include AVIF | image/avif registered, verified via curl
S3 | Image sitemap deployed and referenced from robots.txt | XML present, valid, referenced
S4 | OG and Twitter Card images set on every page | og:image and twitter:image present site-wide
S5 | Schema-driven image references | Article.image, Product.image, Person.image consistent
S6 | Multi-aspect-ratio schema images | 1x1, 4x3, 16x9 variants for primary article images
S7 | EXIF strip enforced in upload or build pipeline | Verified on a 5-image sample
S8 | Image license metadata present where relevant | ImageObject license + acquireLicensePage for licensed assets
S9 | Core Web Vitals LCP under 2.5s on key pages | Verified via PageSpeed Insights or real-user metrics
S10 | Featured image present on every article | Editorial enforcement, no article ships without featured image

Maximum site-wide score: 10. World-class: 9+/10.

10.3 First 90 Days Subset (5 items)

When starting a new client engagement, audit these five before going deeper. They unblock the rest.

# | Triage item | Why first
F1 | Alt text coverage on top 50 pages by traffic | Biggest accessibility-SEO overlap, often missing
F2 | LCP image lazy-loading bug | Single most common Core Web Vitals failure
F3 | OG and Twitter Card images on every published URL | Drives social CTR and AI engine image picks
F4 | Image sitemap presence and validity | Indexing coverage prerequisite
F5 | EXIF strip on user-uploaded content | Privacy risk, often overlooked

If all five pass, proceed to the per-image rubric. If any fail, fix before deeper audit.
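For F4, a durable fix is to generate the image sitemap from what the build actually wrote, so deleted images cannot linger in it. A minimal sketch, assuming a hypothetical list of pages with their absolute image URLs and a Set of image URLs known to exist (for example, from the build manifest); the function names are illustrative. It emits only image:image and image:loc entries under Google's image sitemap namespace, since those are the image tags Google's current documentation lists.

```javascript
// Hypothetical image-sitemap generator.
// pages: [{ url, images: [absolute image URLs] }]
// existing: Set of image URLs known to exist, so deleted images are dropped.
const XML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' };
const escapeXml = (s) => s.replace(/[&<>"']/g, (c) => XML_ESCAPES[c]);

function imageSitemap(pages, existing) {
  const body = [];
  const dropped = [];
  for (const page of pages) {
    const live = page.images.filter((src) => existing.has(src));
    dropped.push(...page.images.filter((src) => !existing.has(src)));
    if (live.length === 0) continue; // no live images: omit the page entry
    body.push('  <url>', `    <loc>${escapeXml(page.url)}</loc>`);
    for (const src of live) {
      body.push(
        '    <image:image>',
        `      <image:loc>${escapeXml(src)}</image:loc>`,
        '    </image:image>',
      );
    }
    body.push('  </url>');
  }
  const xml = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
    '        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">',
    ...body,
    '</urlset>',
  ].join('\n');
  return { xml, dropped };
}
```

Returning the dropped URLs alongside the XML lets a deploy script fail loudly, which turns a silent Search Console coverage error into a build-time one.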

10.4 Sampling Strategy

For sites with 100+ image-bearing pages, sample as follows:

audit_sampling:
  high_traffic_pages: top 20 by GSC clicks (last 90d)
  product_pages_or_money_pages: random sample of 15
  blog_articles: random sample of 15
  homepage_and_top_nav_pages: full coverage
  decorative_image_spot_check: 5 randomly chosen

Score each sample. Compute per-image rubric average across the sample.
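The arithmetic is simple but worth pinning down so different auditors compute it the same way. A minimal sketch, assuming each sampled image is recorded as an array of 15 booleans in P1 to P15 order (true = pass). Reading "world-class" as an average of at least 14/15 is an interpretation of the threshold in Section 10.1, and the function names are hypothetical.

```javascript
// Per-image score: how many of the 15 checks (P1..P15, in order) passed.
function imageScore(checks) {
  if (checks.length !== 15) throw new Error(`expected 15 checks, got ${checks.length}`);
  return checks.filter(Boolean).length;
}

// Sample summary: average per-image score and the (assumed) world-class test.
function summarizeSample(sample) {
  if (sample.length === 0) throw new Error('empty sample');
  const scores = sample.map(imageScore);
  const average = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  return { average, worldClass: average >= 14 };
}

// Failures per criterion, to show which P-items pull the average down.
function failuresByCriterion(sample) {
  const counts = new Array(15).fill(0);
  for (const checks of sample) {
    checks.forEach((passed, i) => {
      if (!passed) counts[i] += 1;
    });
  }
  return Object.fromEntries(counts.map((n, i) => [`P${i + 1}`, n]));
}
```

The per-criterion counts usually matter more than the average: a sample averaging 13/15 because every image fails P13 points to one pipeline fix rather than fifty editorial ones.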


11. Cross-Stack Implementation

The code samples in this framework are HTML-first. For each pattern, equivalent implementations exist for major frameworks. The canonical reference is framework-cross-stack-implementation.md. Quick pointers:

For SPA-only stacks (Create React App, plain Vue without SSR), build-time image processing is harder. The image pipeline must run as part of the bundler (Webpack/Vite/esbuild loaders). See framework-react.md.

For Tailwind purge concerns and dark-mode CLS impact on images, see framework-tailwind.md.


12. Common Failure Modes

A short catalog of the failures observed across 50 client audits (ThatDeveloperGuy internal audit corpus, sample N = 50, captured 2024-2026):

1. The LCP lazy-load bug. Every site, every client, at some point. A theme update, or a developer copy-pasting loading="lazy" onto every <img>, makes the hero image lazy-load; LCP jumps from 1.8s to 4.2s and Core Web Vitals turn red. Fix: identify the LCP element, switch it to loading="eager" with fetchpriority="high", and preload it.

2. Missing dimensions causing CLS. Theme or CMS strips width and height on output. Browser cannot reserve space, layout shifts when images load. Fix: enforce explicit width and height at the template level.

3. Alt text from the upload filename. Many CMSes and themes fall back to the uploaded filename for alt text; WordPress, for instance, names the attachment title after the file, and themes that output the title as alt inherit it. "DSC_4892.jpg" becomes the alt. Fix: editorial process requires alt text entry at upload time, with a queryable "missing alt" report in the CMS.

4. AVIF served as octet-stream. Nginx default mime types do not include AVIF on many Ubuntu and Debian installs prior to 24.04. Browsers refuse to render. Fix: add image/avif avif; to the types block in mime.types and reload nginx.

5. Image sitemap referencing 404s. Old images deleted, sitemap not regenerated. Search Console flags coverage errors. Fix: regenerate sitemap on every deploy, validate URLs in the build pipeline.

6. Same image at 5x its display size. A 4000-pixel-wide hero rendered at 800px display, no srcset, no resize. Bandwidth blown, LCP poor. Fix: build-time resizing + srcset.

7. Decorative images with stuffed alt. "ThatDeveloperGuy SEO services Cassville web development hero image" on a decorative background flourish. Spam signal + screen reader pollution. Fix: editorial training + automated audit flagging suspicious alt text patterns.

8. EXIF GPS leaking client home address. Team headshot taken in employee's home, GPS embedded, address discoverable. Fix: strip EXIF at upload.

9. og:image pointing to a relative URL. Facebook and Twitter require absolute URLs. Relative URLs silently fail to surface. Fix: enforce absolute URLs at template level.

10. Article schema with one image when three aspect ratios exist. Schema lists only the 16x9 image. Google's article rich result on mobile prefers 4x3 or 1x1. Result: missing rich snippet. Fix: ship all three aspects in the Article.image array.
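Failure modes 1, 2, and 6 are visible in rendered markup, so a template-level check can catch them before deploy. The sketch below is hypothetical: it assumes each <img> has already been parsed into a plain object of string attributes (with whatever HTML parser your stack uses) and that the LCP element is known from a separate measurement such as a Lighthouse run.

```javascript
// Hypothetical per-<img> check for failure modes 1 (lazy-loaded LCP image),
// 2 (missing dimensions), and 6 (no srcset). attrs maps attribute names to
// string values for one parsed <img>; isLcp comes from a separate LCP measurement.
function lintImgTag(attrs, { isLcp = false } = {}) {
  const issues = [];
  if (isLcp) {
    if ((attrs.loading || '').toLowerCase() === 'lazy') {
      issues.push('LCP image is lazy-loaded; use loading="eager"');
    }
    if ((attrs.fetchpriority || '').toLowerCase() !== 'high') {
      issues.push('LCP image lacks fetchpriority="high"');
    }
  }
  for (const dim of ['width', 'height']) {
    // Explicit integer attributes let the browser reserve space (no CLS).
    if (!/^\d+$/.test(attrs[dim] || '')) issues.push(`missing or non-integer ${dim}`);
  }
  if (!attrs.srcset) issues.push('no srcset; one file is sent to every viewport');
  return issues;
}
```

The preload half of the failure-mode-1 fix is a page-level check (a <link rel="preload" as="image"> in the head), so it sits outside this per-tag sketch.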


End of Framework Document

Document version: 1.1

Companion documents:

Want this framework implemented on your site?

ThatDevPro ships these frameworks as productized services. SDVOSB-certified veteran owned. Cassville, Missouri.
