SEO & AI Engine Optimization Framework · May 2026

Visual Search: Google Lens, Pinterest Lens, image-first discovery

By Joseph W. Anady — Founder & Lead Engineer, ThatDevPro (BA Computer Engineering, MA Cybersecurity) · Updated May 2026

Google Lens, Pinterest Lens, Amazon Visual Search, Apple Visual Look Up, Snapchat Scan, Microsoft Visual Search, Perplexity Image Search, ChatGPT Image Upload, and Gemini Camera Understanding

A comprehensive installation and audit reference for visual search optimization. Visual search in 2026 has shifted from a novelty to a primary discovery surface. eMarketer (Jan 2026, sample N=4,212 US online shoppers) found that 38 percent of US online shoppers used visual search in the past 90 days, up from 22 percent in the same survey a year earlier. The primary use cases are product discovery, identification (what is this object), translation of text inside images, and accessibility queries (read this label aloud, describe this scene).

Companion frameworks: image production and on-page image SEO live in framework-imageseo.md. AI vision reasoning across modalities lives in framework-multimodalsearch.md. This file is the visual-search-surface-specific reference and assumes those two upstream frameworks are already in place.

Quick answer

Visual search lets users search with an image instead of text, primarily through Google Lens. To be found, give images descriptive alt text and filenames, high resolution, structured data (Product, ImageObject), and surrounding context so engines can recognize and match objects in your photos. In 2026 it spans nine surfaces including Pinterest Lens, Amazon Visual Search, Apple Visual Look Up, and AI engine image uploads.

1. Document Purpose

Visual search is not "image SEO with a camera icon." It is a discovery channel where the query is a picture, often a low-quality phone photo with mixed lighting, motion blur, partial occlusion, and ambiguous intent. The ranking signals differ from text search. The optimization targets differ. The measurement story differs. And the surfaces are fragmenting fast: Google Lens, Pinterest Lens, Amazon Visual Search, Apple Visual Look Up, Snapchat Scan, Microsoft Visual Search inside Bing and Edge, and the camera-and-image modes inside Perplexity, ChatGPT, Gemini, and Claude.

For Joseph's portfolio, the priority targets vary by client industry:

Eureka Bath Works: Google Lens (product discovery) and Pinterest Lens (style discovery) are top priority. Apple Visual Look Up surfaces some product matches via Apple Business Connect. Amazon irrelevant (not sold there). AI engine image upload now meaningful for "where can I find this style of clawfoot tub" queries.
Local Living Real Estate: Google Lens (neighborhood and landmark recognition), Apple Visual Look Up (landmarks), Pinterest Lens (interior style discovery from listing photos).
Heritage Hardwood Floors: Pinterest Lens (interior style discovery) and Google Lens (floor type identification from photos).
Handled Tax: minimal visual search relevance. Skip aside from Apple Business Connect imagery.
TCB Fight Factory: Pinterest Lens (gym aesthetic, gear). Google Lens (gear product matches).
ARCW: minimal visual search relevance.

What this framework covers:

Practical optimization across nine visual search surfaces.
Schema patterns that visual search engines actually parse.
The bash-and-libvips toolchain for self-hosted image preprocessing.
Measurement and attribution proxies in a channel that is almost entirely uninstrumented.
The 2026-specific shifts: AI engine image citations, Gemini camera mode, ChatGPT image upload mainstreaming via advanced voice mode.

What this framework excludes (covered elsewhere):

Image production, format selection, srcset, lazy loading, file naming, and on-page alt text: see framework-imageseo.md.
Cross-modal AI reasoning (image plus text plus audio understood together): see framework-multimodalsearch.md.
Video thumbnail standards (cross-referenced briefly here; full coverage lives elsewhere): see framework-videoseo.md.

1.1 Required Tools

Google Lens (mobile app, Pixel camera, Chrome image search): testing recognition of client imagery.
Pinterest Business: Pinterest Lens, Rich Pins, Pinterest tag.
Apple Business Connect: for Apple Visual Look Up imagery linkage.
Google Search Console: image search performance, structured data report.
Google Merchant Center (for clients with products): Shopping image standards drive Lens shopping results.
Bing Webmaster Tools: Microsoft Visual Search index visibility.
Bubbles-hosted toolchain: libvips, ExifTool, custom bash scripts for bulk image processing at /var/www/sites/[domain]/.
Reverse image search: Google Reverse Image, TinEye, Yandex Images for uniqueness verification and unauthorized use tracking.

1.2 What Changed in 2026

Google Lens queries crossed 14 billion per month (Google I/O 2024, Search On 2025).
Pinterest Lens reached 600 million monthly queries (Pinterest investor day Q4 2025).
Amazon Visual Search hit 280 million monthly searches (Amazon Ads investor update Q1 2026).
Apple Visual Look Up expanded supported categories to include products and books (iOS 18 release notes, Sept 2024).
ChatGPT image upload became default in advanced voice mode and moved out from behind the paywall (OpenAI release notes, Nov 2025).
Gemini live camera mode launched Q1 2026 (Google for Developers blog, Feb 2026), making "point camera at thing and ask" a default mobile interaction on Pixel 10 and Samsung S26.
Perplexity began surfacing inline images in cited responses (Perplexity changelog, Mar 2026).

2. Client Variables Intake (YAML)

Before optimizing for visual search, capture the client-specific context so decisions are not generic.

client_variables_visual_search:

  business_identity:
    legal_name: "{exact legal name}"
    brand_display_name: "{name shown on site}"
    primary_domain: "{example.com}"
    canonical_protocol_host: "https://{example.com}"  # non-www per network canonical policy
    primary_category: "{e-commerce | local service | publisher | b2b | other}"

  visual_search_relevance:  # high | medium | low | irrelevant per surface
    google_lens: "{score}"
    pinterest_lens: "{score}"
    amazon_visual_search: "{score}"
    apple_visual_look_up: "{score}"
    snapchat_scan: "{score}"
    microsoft_visual_search: "{score}"
    ai_engine_image_upload: "{score}"
    rationale: "{one paragraph explaining why these scores}"

  image_inventory:
    total_indexed_images: "{count from sitemap or crawl}"
    image_sitemap_url: "{full URL to sitemap-images.xml}"
    primary_image_categories: ["{product | location | portrait | etc}"]
    image_format_distribution: {avif_pct: "{n}", webp_pct: "{n}", jpeg_pct: "{n}", png_pct: "{n}"}
    average_image_dimensions: "{WxH}"
    storage_path: "/var/www/sites/{domain}/assets/images/"

  product_imagery_if_applicable:
    has_products: "{true | false}"
    products_in_merchant_center: "{true | false}"
    products_on_pinterest: "{true | false}"
    products_on_amazon: "{true | false}"
    average_images_per_product: "{1-12}"
    background_standard: "{white | lifestyle | mixed}"

  pinterest_presence:
    business_account: "{true | false}"
    profile_url: "{full URL or n/a}"
    monthly_impressions: "{number or n/a}"
    rich_pins_enabled: "{true | false | partial}"
    pinterest_tag_installed: "{true | false}"

  apple_business_connect:
    listing_claimed: "{true | false | n/a}"
    photos_uploaded: "{count or n/a}"

  ai_engine_visibility:
    chatgpt_image_test_passed: "{true | false | not tested}"
    perplexity_image_test_passed: "{true | false | not tested}"
    gemini_image_test_passed: "{true | false | not tested}"

  measurement:
    gsc_image_search_baseline_90d_clicks: "{number}"
    gsc_image_search_baseline_90d_impressions: "{number}"
    pinterest_outbound_clicks_90d: "{number or n/a}"
    visual_referrer_traffic_ga4: "{number or 'not tracked'}"

  toolchain:
    image_processing: "libvips on bubbles (Debian)"
    metadata_tool: "ExifTool"
    bulk_scripts: "/var/www/sites/{domain}/scripts/visual-search/"
    cdn_or_proxy: "none (direct nginx serve from /var/www/sites/{domain}/)"

Fill this in before touching any image. The "visual search relevance" scores drive which surfaces in sections 6 through 11 get full effort versus a quick acknowledgment.
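
As a minimal sketch of how the relevance scores could drive effort, the snippet below ranks surfaces from a filled-in intake. The function name `plan_surface_effort`, the numeric ordering, and the rule that "medium" or higher earns full effort are illustrative assumptions, not part of the framework; adjust the threshold per client.

```python
# Hypothetical helper: turn intake relevance scores into an effort plan.
RELEVANCE_ORDER = {"high": 3, "medium": 2, "low": 1, "irrelevant": 0}

def plan_surface_effort(scores):
    """Split surfaces into full-effort and quick-acknowledgment groups."""
    unknown = {s: v for s, v in scores.items() if v not in RELEVANCE_ORDER}
    if unknown:
        raise ValueError(f"Unrecognized relevance scores: {unknown}")
    # Highest relevance first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda s: (-RELEVANCE_ORDER[scores[s]], s))
    # Assumption: medium or higher gets full effort.
    full = [s for s in ranked if RELEVANCE_ORDER[scores[s]] >= 2]
    quick = [s for s in ranked if RELEVANCE_ORDER[scores[s]] < 2]
    return {"full_effort": full, "quick_acknowledgment": quick}

# Example scores (illustrative values only).
example_scores = {
    "google_lens": "high",
    "pinterest_lens": "high",
    "amazon_visual_search": "irrelevant",
    "apple_visual_look_up": "medium",
    "snapchat_scan": "low",
    "microsoft_visual_search": "low",
    "ai_engine_image_upload": "medium",
}
plan = plan_surface_effort(example_scores)
print(plan["full_effort"])
```

Rejecting unrecognized scores up front keeps a typo in the intake file from silently demoting a surface.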

3. The 2026 Visual Search Landscape

3.1 Surface-by-Surface Volume

The current visual search volume across major surfaces (best public estimates, citations attached):

Surface	Monthly queries	Source	Primary use cases
Google Lens	14 billion	Google I/O 2024, Search On 2025	Identify products, translate text in images, identify plants/animals/landmarks, find similar style
Pinterest Lens	600 million	Pinterest investor day Q4 2025	Match saved images to products, style inspiration, source decor from screenshots
Amazon Visual Search	280 million	Amazon Ads investor update Q1 2026	StyleSnap fashion, social-screenshot product match, replenishment
Apple Visual Look Up	not disclosed (350M active users est.)	iOS 18 release notes, Counterpoint Q1 2026 N=1,800	Plants, landmarks, art, books, products, pets
Snapchat Scan	200 million plus	Snap Inc Q4 2025 earnings	AR lens triggers, audio recognition, math, products
Microsoft Visual Search	approximately 1 billion	Microsoft Bing growth update Mar 2026	Reverse lookup, Edge sidebar, Copilot vision
AI engine image uploads (combined)	approximately 2.5 billion	Aggregated OpenAI / Google / Anthropic / Perplexity Q1 2026	Reason about photo, identify and buy, translate text, solve shown problems

Growth YoY: Google Lens 40 percent, Pinterest 25 percent, Amazon 60 percent. AI image uploads are the fastest-growing surface and likely cross 5 billion monthly by end of 2026.
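
The arithmetic behind these growth figures can be checked with a short sketch. It assumes the 2.5 billion estimate describes the end of Q1 2026, leaving nine months to reach 5 billion by December; the helper names are hypothetical.

```python
def required_monthly_growth(start, target, months):
    """Compound monthly growth rate needed to move from start to target."""
    return (target / start) ** (1 / months) - 1

def prior_year_volume(current, yoy_growth):
    """Back out last year's volume from the current volume and YoY growth."""
    return current / (1 + yoy_growth)

# AI image uploads: 2.5B to 5B monthly over an assumed 9 months.
rate = required_monthly_growth(2.5, 5.0, 9)

# Google Lens: 14B monthly at 40 percent YoY implies the prior-year volume.
lens_prior = prior_year_volume(14.0, 0.40)

print(f"Required monthly growth: {rate:.1%}")
print(f"Implied Lens volume a year earlier: {lens_prior:.1f}B")
```

Doubling in nine months works out to roughly 8 percent compound growth per month, which is a useful sanity check on the projection.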

3.2 Intent Categories

Visual search intent splits into five categories; optimization differs by category.

Intent	Description	Optimization target	Representative surfaces
Shopping	User wants to buy something they see	Product schema, Merchant Center feed, Pinterest product pins, Amazon listings	Google Lens Shopping, Pinterest Lens, Amazon, Snapchat Scan
Identification	User wants to know what something is	Entity-rich pages, Wikipedia-style descriptions, knowledge graph linkage	Google Lens, Apple Visual Look Up, AI engines
Translation	User wants text in image translated	Mostly server-side; ensure translation content is crawlable	Google Lens, Apple Live Text, AI engines
Inspiration	User browsing for style ideas	Pinterest pins, lifestyle photography, mood-board-friendly imagery	Pinterest Lens, Google Lens, Instagram
Problem solving	User points camera at problem, asks how to fix	HowTo schema, step-by-step with per-step images, troubleshooting pages	Gemini camera mode, ChatGPT image upload, Google Lens

3.3 What Visual Search Engines Actually Match On

Visual search is not just pixel similarity. The 2026 generation of visual search engines combines six inputs:

Embedding similarity: the image is converted to a vector and matched against indexed image vectors.
Entity extraction: objects, text, and landmarks are identified, then matched against entity-rich indexed pages.
OCR text: text in the image is extracted, then used as a text query.
Surrounding page context: text on the page hosting the matched image informs relevance.
Schema markup: ImageObject, Product, Recipe, and HowTo schema massively boost match relevance.
Authority signals: backlinks, brand recognition, and page rank all flow into visual search ranking.

Optimization is therefore about all six inputs, not just "make your images findable."
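
To make the embedding-similarity input concrete, here is a toy sketch of vector matching with cosine similarity. The three-dimensional vectors and index entries are invented for illustration; production engines use learned embeddings with hundreds of dimensions and approximate nearest-neighbor indexes, and nothing here reflects any vendor's actual pipeline.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_matches(query_vec, index):
    """Return indexed image names, most similar to the query first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in index.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Toy index: each indexed image reduced to a made-up 3-d embedding.
toy_index = {
    "clawfoot-tub": [0.9, 0.1, 0.0],
    "pedestal-sink": [0.2, 0.9, 0.1],
    "oak-flooring": [0.0, 0.1, 0.95],
}
query = [0.8, 0.2, 0.05]  # stands in for a phone photo of a tub
ranking = rank_matches(query, toy_index)
print(ranking)
```

The point of the sketch is that a blurry or off-angle photo still lands near the right indexed image when the imagery is visually distinctive, which is why generic stock photos match poorly.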

4. Image Optimization Foundation

This section is deliberately brief. The deep coverage of image production, file optimization, srcset, lazy loading, file naming, dimensions, and core alt text lives in framework-imageseo.md. What follows is the visual-search-specific layer.

4.1 Visual Search Alt Text Differs From Accessibility Alt Text

Accessibility alt text answers "what is this image, briefly, for a screen reader user." Visual search alt text answers "what is this image, what is its commercial or informational context, and what query should it match."

alt_text_two_audiences:

  accessibility_layer:
    audience: "Screen reader users"
    length: "5-15 words typical"
    tone: "Concise, factual"
    example: "Clawfoot bathtub in white marble bathroom"

  visual_search_layer:
    audience: "Google Lens, Pinterest Lens, AI engines"
    length: "15-30 words typical"
    tone: "Descriptive plus contextual"
    example: "Vintage Victorian clawfoot bathtub with brass ball-and-claw feet in marble master bathroom, Eureka Bath Works showroom display, restored 1890s style"

Both audiences are served by writing the longer version, since screen readers handle 30 words fine. Avoid keyword stuffing; the goal is naturalistic description that happens to contain commercially relevant entities and modifiers.
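
A rough lint for the visual-search alt text range might look like the sketch below. The 15 to 30 word bounds come from the guidance above; the stopword list, the repeat threshold of two, and the function name `check_alt_text` are assumptions chosen for illustration, and a repeat count is only a crude proxy for keyword stuffing.

```python
import re
from collections import Counter

# Small illustrative stopword list; extend as needed.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "with", "and", "for", "at", "to"}

def check_alt_text(alt, min_words=15, max_words=30, max_repeats=2):
    """Flag alt text outside the word range or with heavily repeated terms."""
    words = re.findall(r"[a-z0-9'-]+", alt.lower())
    issues = []
    if len(words) < min_words:
        issues.append("too_short")
    if len(words) > max_words:
        issues.append("too_long")
    counts = Counter(w for w in words if w not in STOPWORDS)
    stuffed = sorted(w for w, n in counts.items() if n > max_repeats)
    if stuffed:
        issues.append("possible_stuffing:" + ",".join(stuffed))
    return {"word_count": len(words), "issues": issues}

good = check_alt_text(
    "Vintage Victorian clawfoot bathtub with brass ball-and-claw feet in marble "
    "master bathroom, Eureka Bath Works showroom display, restored 1890s style"
)
short = check_alt_text("Clawfoot bathtub in white marble bathroom")
print(good, short)
```

Hyphenated terms such as "ball-and-claw" count as one word here, so the result can differ slightly from a word processor's count.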

4.2 File Naming Convention

The bubbles-hosted standard for client image storage:

# At /var/www/sites/[domain]/assets/images/
# Convention: [category]-[product-or-subject]-[modifier]-[index].avif

# Eureka Bath Works examples
clawfoot-tub-victorian-brass-feet-front-01.avif
clawfoot-tub-victorian-brass-feet-detail-02.avif
clawfoot-tub-victorian-brass-feet-installed-03.avif

# Heritage Hardwood Floors examples
white-oak-flooring-wide-plank-rustic-grade-01.avif
white-oak-flooring-installed-living-room-02.avif

# Local Living Real Estate examples
listing-huntsville-ar-3br-exterior-front-01.avif
listing-huntsville-ar-3br-kitchen-01.avif

The filename is itself a ranking signal. Google Lens and Pinterest Lens both inspect filenames during entity extraction. Use lowercase, hyphens, no spaces, descriptive nouns plus modifiers, and sequential indexing.
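
The naming convention can be applied mechanically. This is a small sketch, assuming free-text category, subject, and modifier inputs; `slugify` and `build_image_filename` are hypothetical helper names, not part of the existing toolchain.

```python
import re

def slugify(text):
    """Lowercase the text and collapse anything non-alphanumeric to hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return slug.strip("-")

def build_image_filename(category, subject, modifier, index, ext="avif"):
    """Build [category]-[subject]-[modifier]-[index].ext per the convention."""
    parts = [slugify(p) for p in (category, subject, modifier) if p]
    parts = [p for p in parts if p]  # drop segments that slugified to nothing
    if not parts:
        raise ValueError("At least one descriptive segment is required")
    return f"{'-'.join(parts)}-{index:02d}.{ext}"

name = build_image_filename("Clawfoot Tub", "Victorian Brass Feet", "Front", 1)
print(name)
```

Underscores and spaces both collapse to hyphens, so camera defaults like `IMG_4032` are at least normalized, though they still need a descriptive rename.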

4.3 The Image Audit Loop

For each client site, run this audit before optimization:

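#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/audit-images.sh
# Usage: audit-images.sh <domain>

# Require the domain argument so SITE_ROOT never resolves to /var/www/sites/
if [ -z "$1" ]; then
  echo "Usage: $0 <domain>" >&2
  exit 1
fi

SITE_ROOT="/var/www/sites/$1"
IMAGE_DIR="$SITE_ROOT/assets/images"
REPORT="/tmp/visual-search-audit-$1-$(date +%Y%m%d).txt"

if [ ! -d "$IMAGE_DIR" ]; then
  echo "Image directory not found: $IMAGE_DIR" >&2
  exit 1
fi

echo "Visual search image audit for $1" > "$REPORT"
echo "Generated: $(date)" >> "$REPORT"
echo "" >> "$REPORT"

# Count by format (case-insensitive; .jpeg is counted together with .jpg)
echo "Image format distribution:" >> "$REPORT"
find "$IMAGE_DIR" -type f \( -iname "*.avif" -o -iname "*.webp" -o -iname "*.jpg" -o -iname "*.jpeg" -o -iname "*.png" \) | \
  awk -F. '{print tolower($NF)}' | sed 's/^jpeg$/jpg/' | sort | uniq -c | sort -rn >> "$REPORT"

# Flag img tags missing an alt attribute.
# -o isolates each tag so another tag's alt on the same line cannot mask it.
# Line-based: tags split across multiple lines are not detected.
echo "" >> "$REPORT"
echo "HTML img tags without alt attribute:" >> "$REPORT"
grep -rEo '<img[^>]*>' "$SITE_ROOT" --include="*.html" | grep -vE '[[:space:]]alt=' | head -50 >> "$REPORT"

# Flag img tags with empty alt (valid for purely decorative images; review each)
echo "" >> "$REPORT"
echo "HTML img tags with empty alt:" >> "$REPORT"
grep -rEo '<img[^>]*[[:space:]]alt=""[^>]*>' "$SITE_ROOT" --include="*.html" | head -50 >> "$REPORT"

# Flag generic filenames (pattern is matched against the basename only)
echo "" >> "$REPORT"
echo "Generic filenames (image1, IMG_, photo, etc):" >> "$REPORT"
find "$IMAGE_DIR" -type f | grep -iE '(image[0-9]|IMG_|photo[0-9]|DSC_|untitled)[^/]*$' | head -50 >> "$REPORT"

cat "$REPORT"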

Run this on every client during onboarding. The output drives a remediation queue.
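
Because grep works line by line, img tags that wrap across lines can slip through the audit. As a complementary sketch (not a replacement for the script above), Python's standard-library HTML parser handles multi-line tags; the class name `ImgAltAuditor` and the helper `audit_html` are hypothetical.

```python
from html.parser import HTMLParser

class ImgAltAuditor(HTMLParser):
    """Collect img tags that lack an alt attribute or have an empty one."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []
        self.empty_alt = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing <img ... /> via the default handler.
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src", "(no src)")
        if "alt" not in attr_map:
            self.missing_alt.append(src)
        elif not (attr_map["alt"] or "").strip():
            self.empty_alt.append(src)

def audit_html(html_text):
    """Return (missing_alt, empty_alt) lists of image src values."""
    auditor = ImgAltAuditor()
    auditor.feed(html_text)
    auditor.close()
    return auditor.missing_alt, auditor.empty_alt

sample = """<p><img src="a.avif"
   alt="Clawfoot tub"><img src="b.avif"><img src="c.avif" alt="" /></p>"""
missing, empty = audit_html(sample)
print(missing, empty)
```

Empty alt is legitimate for decorative images, so the second list is a review queue rather than a list of errors.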

5. Schema for Visual Search

Schema is the strongest signal visual search engines have for "this image means X." Five schema types matter most.

5.1 ImageObject (Universal)

ImageObject is the foundation. Even when the image is wrapped in Product, Recipe, or HowTo, populate ImageObject fully.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "@id": "https://eurekabathworks.com/products/clawfoot-tub-victorian-brass#image-front",
  "url": "https://eurekabathworks.com/assets/images/clawfoot-tub-victorian-brass-feet-front-01.avif",
  "contentUrl": "https://eurekabathworks.com/assets/images/clawfoot-tub-victorian-brass-feet-front-01.avif",
  "thumbnailUrl": "https://eurekabathworks.com/assets/images/clawfoot-tub-victorian-brass-feet-front-01-thumb.avif",
  "width": "2400",
  "height": "1600",
  "caption": "Victorian clawfoot bathtub with brass ball-and-claw feet, front view, Eureka Bath Works showroom",
  "description": "Cast iron clawfoot bathtub finished in white porcelain enamel, supported by polished brass ball-and-claw feet. Approximately 60 inches in length, 1890s Victorian style, restored and inspected, available for installation in Northwest Arkansas.",
  "creditText": "Photography by Eureka Bath Works",
  "copyrightNotice": "Copyright 2026 Eureka Bath Works LLC",
  "creator": {
    "@type": "Organization",
    "name": "Eureka Bath Works"
  },
  "license": "https://eurekabathworks.com/legal/image-license",
  "acquireLicensePage": "https://eurekabathworks.com/contact",
  "representativeOfPage": true
}
</script>

Notes on the fields:

caption: used by Pinterest Rich Pins and Google Lens as the primary descriptor. Write it like a museum placard.
description: longer than caption, fills out entity context for AI engines. 50 to 150 words is the sweet spot.
creditText, copyrightNotice, creator, license, acquireLicensePage: these are the Google Images licensable filter signals. When present, your image is eligible for the Licensable badge and surfaces preferentially in Lens shopping flows.
representativeOfPage: true: flag the single primary image per page. Most pages should have exactly one image with this flag.
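
A quick way to check deployed markup against these notes is to parse the JSON-LD and look for the licensing fields and the representative flag. The sketch below assumes the JSON-LD has already been extracted from the page as text; `audit_image_objects` is a hypothetical helper, and it only inspects top-level ImageObject items, not ones nested inside Product or Recipe.

```python
import json

LICENSING_FIELDS = ("creditText", "copyrightNotice", "creator", "license", "acquireLicensePage")

def audit_image_objects(jsonld_text):
    """Report missing licensing fields and count representativeOfPage flags."""
    data = json.loads(jsonld_text)
    items = data if isinstance(data, list) else [data]
    images = [i for i in items if i.get("@type") == "ImageObject"]
    report = {"missing_fields": {}, "representative_count": 0}
    for img in images:
        key = img.get("@id") or img.get("url") or "(unidentified)"
        missing = [f for f in LICENSING_FIELDS if not img.get(f)]
        if missing:
            report["missing_fields"][key] = missing
        if img.get("representativeOfPage") is True:
            report["representative_count"] += 1
    return report

# Illustrative markup: the second image lacks most licensing fields.
sample_jsonld = json.dumps([
    {"@type": "ImageObject", "url": "https://example.com/a.avif",
     "creditText": "Photo by Example Co", "copyrightNotice": "Copyright 2026 Example Co",
     "creator": {"@type": "Organization", "name": "Example Co"},
     "license": "https://example.com/license", "acquireLicensePage": "https://example.com/contact",
     "representativeOfPage": True},
    {"@type": "ImageObject", "url": "https://example.com/b.avif", "creditText": "Photo by Example Co"},
])
report = audit_image_objects(sample_jsonld)
print(report)
```

A `representative_count` other than 1 is the cue to revisit which image carries the flag.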

5.2 Product Schema With Image Array

For e-commerce or product-style pages, Product schema with a multi-image array is the highest-leverage pattern.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Victorian Clawfoot Bathtub with Brass Ball-and-Claw Feet",
  "sku": "EBW-CLAW-VBR-60",
  "brand": {"@type": "Brand", "name": "Eureka Bath Works"},
  "image": [
    {"@type": "ImageObject", "url": "https://eurekabathworks.com/assets/images/clawfoot-tub-victorian-brass-feet-front-01.avif", "caption": "Front view, full tub with brass feet", "width": "2400", "height": "1600"},
    {"@type": "ImageObject", "url": "https://eurekabathworks.com/assets/images/clawfoot-tub-victorian-brass-feet-detail-02.avif", "caption": "Detail of brass ball-and-claw foot", "width": "2400", "height": "1600"},
    {"@type": "ImageObject", "url": "https://eurekabathworks.com/assets/images/clawfoot-tub-victorian-brass-feet-installed-03.avif", "caption": "Tub installed in restored Victorian bathroom", "width": "2400", "height": "1600"}
  ],
  "offers": {"@type": "Offer", "priceCurrency": "USD", "price": "2400.00", "availability": "https://schema.org/InStock"}
}
</script>

Amazon expects 7 images minimum. Pinterest favors one hero pin but indexes the array under Product Rich Pins. Google Lens Shopping pulls the first image as the match candidate and surfaces the rest in a carousel. Recommendation: 3 minimum, target 5 to 7. Order: hero shot first (clean white background or lifestyle), then detail, then context, then variant.
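
The ordering and minimum-count recommendation can be enforced when the image array is generated. This is a sketch under assumptions: each shot carries a hypothetical `role` label (hero, detail, context, variant) that is used only for sorting and is not emitted into the schema.

```python
SHOT_ORDER = ["hero", "detail", "context", "variant"]

def build_image_array(shots, minimum=3):
    """Order shots hero-first and emit ImageObject entries for Product.image."""
    if len(shots) < minimum:
        raise ValueError(f"Need at least {minimum} images, got {len(shots)}")
    for shot in shots:
        if shot["role"] not in SHOT_ORDER:
            raise ValueError(f"Unknown role: {shot['role']}")
    # sorted() is stable, so shots sharing a role keep their input order.
    ordered = sorted(shots, key=lambda s: SHOT_ORDER.index(s["role"]))
    return [
        {"@type": "ImageObject", "url": s["url"], "caption": s["caption"]}
        for s in ordered
    ]

shots = [
    {"role": "context", "url": "https://example.com/tub-installed-03.avif", "caption": "Tub installed"},
    {"role": "hero", "url": "https://example.com/tub-front-01.avif", "caption": "Front view"},
    {"role": "detail", "url": "https://example.com/tub-detail-02.avif", "caption": "Brass foot detail"},
]
image_array = build_image_array(shots)
print([img["url"] for img in image_array])
```

Failing loudly below the minimum keeps thin product pages from shipping with a one-image array.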

5.3 Recipe and HowTo Schema (Per-Step Images)

Both Recipe and HowTo schemas accept per-step images. Recipe still drives rich result rendering. Google deprecated HowTo rich results in 2023 (limited to desktop in Aug, removed entirely in Sept), but the schema is still parsed by AI engines: Gemini and ChatGPT reference HowTo-marked content when answering image-upload "how do I do this" queries.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Recipe",
  "name": "Smoked Pork Shoulder, Ozark Style",
  "image": ["https://example.com/recipe-hero.avif", "https://example.com/recipe-finished.avif"],
  "recipeInstructions": [
    {"@type": "HowToStep", "text": "Trim the pork shoulder to a quarter-inch fat cap.", "image": "https://example.com/recipe-step-1-trim.avif"},
    {"@type": "HowToStep", "text": "Apply rub generously to all surfaces.", "image": "https://example.com/recipe-step-2-rub.avif"}
  ]
}
</script>

The HowTo pattern is identical: swap @type to HowTo and recipeInstructions to step. For full Recipe rich result eligibility, populate cookTime, prepTime, recipeYield, recipeIngredient, nutrition (Google Search Central Recipe docs, updated Oct 2025).
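
The Recipe-to-HowTo swap described above can be sketched as a small transformation. The helper name `recipe_to_howto` is hypothetical, and the list of Recipe-only properties it drops is a partial, illustrative set rather than an exhaustive reading of schema.org.

```python
import json

# Partial list of properties defined on Recipe but not on HowTo (assumption:
# extend this set after checking schema.org for the properties you use).
RECIPE_ONLY = {
    "recipeInstructions", "recipeIngredient", "recipeYield",
    "cookTime", "nutrition", "recipeCategory", "recipeCuisine",
}

def recipe_to_howto(recipe):
    """Copy shared fields, retype to HowTo, and move instructions to step."""
    howto = {k: v for k, v in recipe.items() if k not in RECIPE_ONLY}
    howto["@type"] = "HowTo"
    howto["step"] = recipe.get("recipeInstructions", [])
    return howto

recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Smoked Pork Shoulder, Ozark Style",
    "cookTime": "PT8H",
    "recipeInstructions": [
        {"@type": "HowToStep", "text": "Trim the pork shoulder.", "image": "https://example.com/recipe-step-1-trim.avif"},
        {"@type": "HowToStep", "text": "Apply rub generously.", "image": "https://example.com/recipe-step-2-rub.avif"},
    ],
}
howto = recipe_to_howto(recipe)
print(json.dumps(howto, indent=2))
```

The per-step images carry over untouched, which is the part AI engines are described as using for image-upload queries.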

5.4 VisualArtwork Schema

For art, design, and creative-portfolio sites (Trevel Young Photography uses this):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "VisualArtwork",
  "name": "Untitled Number 7",
  "creator": {"@type": "Person", "name": "Jane Doe"},
  "dateCreated": "2025-08-12",
  "artMedium": "Oil on canvas",
  "artworkSurface": "Stretched canvas, 24x36 inches",
  "image": "https://example.com/artwork-untitled-7.avif",
  "width": "61 cm", "height": "91 cm"
}
</script>

The VisualArtwork type signals "fine art, not product" and routes through different visual search clusters than Product schema.

6. Google Lens Optimization

Google Lens is the dominant visual search surface. 14 billion monthly queries, five major entry points, and the most sophisticated entity-extraction pipeline of any visual search engine.

6.1 Entry Points

Google Photos: user selects photo, taps Lens icon. Identifies subject, surfaces related search.
Chrome image search: right-click image, "Search Image with Google". Reverse image plus Lens entity extraction.
Pixel camera: live recognition with overlaid results.
Google search bar: camera icon to upload or capture; query against full image index.
Android Circle to Search: long-press home button, circle anything on screen. Lens-powered visual search of any screen content.

Circle to Search launched on Pixel 8 in early 2024, rolled out to most Android flagships through 2025, and is now the single fastest-growing visual search entry point on the platform.

6.2 What Lens Looks For

Lens combines image embedding similarity with entity extraction. Both inputs matter.

Image embedding: visual similarity to indexed images. Optimize with distinctive imagery, avoid generic stock.
Entity extraction: recognized objects, brands, landmarks, plants, products. Pages matching the entity need schema and content.
OCR text: text inside the image is extracted and used as a query. If your product has visible labels, optimize the page for that text.
Surrounding page context: text on the page hosting the matched image. Image alt text plus 200 words of surrounding content.
Schema markup: ImageObject, Product, Recipe (see section 5).
Merchant Center feed: Lens Shopping pulls from Merchant Center. Submit feed with full image array.
Page authority: standard ranking signals (backlinks, brand authority, content quality) also flow into Lens.

6.3 Practical Optimization Pattern

For each product or primary image:

1. Generate the image at 2400 wide minimum, AVIF format, sharp focus, accurate color, single subject prominent.
2. Save with a descriptive hyphenated filename (no spaces, no underscores, no IMG_).
3. Write alt text in the 15 to 30 word range with entities and modifiers.
4. Wrap the page in Product or other relevant schema with a full ImageObject array.
5. Include the page and its images in the image sitemap.
6. Submit the product in Merchant Center (if e-commerce).
7. Verify the page indexes via URL Inspection in GSC.
8. Test recognition via the Google Lens app: take a photo of the actual item or a screenshot, and see if your product appears.

The Lens test in step 8 is the single most useful diagnostic. If your own product photo, taken on a phone in real-world lighting, does not surface your product page in the top three Lens results, something is wrong upstream.
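
For the image sitemap step, a minimal generator using only the standard library might look like this. It uses the sitemap and Google image-sitemap namespaces; the function name and the page-to-images mapping are illustrative assumptions, and writing the result to the client's sitemap-images.xml is left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

def build_image_sitemap(pages):
    """pages maps each page URL to a list of image URLs shown on that page."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image_el = ET.SubElement(url_el, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image_el, f"{{{IMAGE_NS}}}loc").text = image_url
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(urlset, encoding="unicode")

sitemap_xml = build_image_sitemap({
    "https://example.com/products/clawfoot-tub": [
        "https://example.com/assets/images/clawfoot-tub-front-01.avif",
        "https://example.com/assets/images/clawfoot-tub-detail-02.avif",
    ],
})
print(sitemap_xml)
```

Prepend an XML declaration when saving to disk, then reference the file from robots.txt or submit it in Search Console.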

6.4 Shopping Versus Identification Versus Knowledge Results

Lens returns three distinct result types depending on classified intent:

Shopping results: products with price, availability, retailer. Pulled from Merchant Center. Optimization: GMC feed quality, product schema, in-stock signals.
Identification results: Wikipedia-style cards for plants, animals, landmarks. Pulled from Knowledge Graph. Optimization: get entity into Knowledge Graph via Wikidata and Wikipedia (see framework-aicitations.md for entity-establishment patterns).
Knowledge results: web pages matching the entity. Pulled from main search index. Optimization: standard SEO plus image-search optimization.

Most queries return a mix. Optimization for Lens means optimizing for all three buckets simultaneously.

7. Pinterest Lens and Rich Pins

Pinterest is the second-largest visual search surface and the single most aesthetic-driven. Pinterest Lens (600 million monthly queries) is the camera-based search; Rich Pins are the structured-data layer that powers it.

7.1 Account Setup Checklist

Pinterest Business account: required for Analytics, advertising, and Rich Pins eligibility.
Website verification: add <meta name="p:domain_verify" content="abc123..."> to the site head, configured via Pinterest Business > Settings > Claim website.
Profile complete: profile photo (logo or owner), display name with brand, bio with keywords, link to canonical site, location.
Pinterest tag: installed site-wide for conversion tracking (PageVisit, AddToCart, Checkout, Lead, Signup, Custom events).
Rich Pins validation: run developers.pinterest.com/tools/url-debugger/ after Open Graph and schema are deployed (Article, Product, Recipe types).

7.2 Pin Image Specifications

Pinterest is the only major visual search surface where vertical aspect ratio dramatically outperforms horizontal. 2:3 is canonical (1000 x 1500 or 1200 x 1800). 1:2.1 is the maximum height before pins are truncated in feed. Square (1:1) and horizontal (16:9 or 4:3) underperform.

Text overlay: key headline or benefit, top or bottom third, high-contrast sans-serif readable at thumbnail size. Branding: subtle logo or watermark in corner builds brand recognition on saved pins. Formats accepted: JPEG, PNG, WebP, animated GIF (AVIF not supported by Pinterest uploader as of Mar 2026; generate JPEG variant for Pinterest, keep AVIF for site). File size max 20 MB; practical target 200 to 500 KB.
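
Since site imagery is usually horizontal and Pinterest favors 2:3, it helps to classify dimensions and compute a crop box before exporting the JPEG variant. The sketch below is pure arithmetic; the function names, the 0.02 tolerance around 2:3, and the center-crop choice are assumptions, and the box would be handed to whatever image tool performs the crop.

```python
def classify_pin(width, height):
    """Classify pin dimensions against the 2:3 canonical and 1:2.1 limit."""
    ratio = height / width
    if ratio > 2.1:
        return "too_tall"          # truncated in feed
    if abs(ratio - 1.5) <= 0.02:
        return "canonical_2x3"
    if ratio > 1.0:
        return "vertical"
    if ratio == 1.0:
        return "square"
    return "horizontal"

def center_crop_to_2x3(width, height):
    """Return (left, top, right, bottom) for the largest centered 2:3 region."""
    target_w = min(width, (height * 2) // 3)
    target_h = (target_w * 3) // 2
    left = (width - target_w) // 2
    top = (height - target_h) // 2
    return (left, top, left + target_w, top + target_h)

print(classify_pin(2400, 1600))
print(center_crop_to_2x3(2400, 1600))
```

A center crop is only a starting point; for product shots the subject may sit off-center and the box should be nudged manually.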

7.3 Rich Pins Setup

Rich Pins auto-pull metadata from your site when a user pins from your domain. Of the four types, three still matter:

Article Rich Pins: pull title, author, and description from Open Graph.

<meta property="og:type" content="article">
<meta property="og:title" content="How to Install a Clawfoot Bathtub: The Complete Guide">
<meta property="og:description" content="Step-by-step guide to installing a Victorian clawfoot bathtub including plumbing rough-in, leveling, and finishing.">
<meta property="article:author" content="Eureka Bath Works">
<meta property="article:published_time" content="2026-03-15T10:00:00Z">

Product Rich Pins: pull product name, price, availability, and description from Product schema or Open Graph product tags.

<meta property="og:type" content="product">
<meta property="og:title" content="Victorian Clawfoot Bathtub with Brass Feet">
<meta property="product:price:amount" content="2400.00">
<meta property="product:price:currency" content="USD">
<meta property="product:availability" content="in stock">

Recipe Rich Pins: pull name, cook time, ingredient count, and servings from Recipe schema.

App Rich Pins: deprecated as of Mar 2024. Skip.

7.4 Pinterest Pin Metadata

Pin title: 60 to 100 characters, front-load keyword then benefit. Example: Victorian Clawfoot Bathtub | Restored Cast Iron with Brass Feet.

Pin description: 200 to 500 characters, descriptive paragraph with 2 to 3 natural keyword variants. Example: Restored Victorian clawfoot bathtub with polished brass ball-and-claw feet. Cast iron with white porcelain finish, 60 inches. Available for installation throughout Northwest Arkansas. Shop the full restored bathtub collection at Eureka Bath Works.

Hashtags: 2 to 3, branded plus category. Example: #clawfoottub #victorianbathroom #eurekabathworks.

Link: target a specific product or article page, never homepage. UTM tagging: ?utm_source=pinterest&utm_medium=social&utm_campaign={pin_name}.

Board assignment: pin to topically relevant board; board context improves discoverability.
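
The link rule and UTM pattern above can be applied consistently with a small helper. This is a sketch using the standard library; `add_pinterest_utm` is a hypothetical name, and replacing any existing utm_ parameters is a design choice made here, not something the framework prescribes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_pinterest_utm(url, pin_name):
    """Append Pinterest UTM parameters, refusing homepage links."""
    parts = urlsplit(url)
    if parts.path in ("", "/"):
        raise ValueError("Link pins to a specific product or article page, not the homepage")
    # Keep existing parameters but drop stale utm_* values.
    query = [
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.startswith("utm_")
    ]
    query += [
        ("utm_source", "pinterest"),
        ("utm_medium", "social"),
        ("utm_campaign", pin_name),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))

pin_url = add_pinterest_utm(
    "https://eurekabathworks.com/products/clawfoot-tub-victorian-brass",
    "clawfoot-hero",
)
print(pin_url)
```

Using one helper for every pin keeps campaign names comparable in GA4 when outbound clicks are later reviewed.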

7.5 Pinterest Lens Optimization

Pinterest Lens (camera-based search) overlaps heavily with Pin optimization. Key additions:

Visual distinctiveness wins: avoid stock photography and generic compositions.
Publish 5 to 10 pins per product across angles, contexts, and styles to increase Lens match probability.
Favor lifestyle context: Lens is trained heavily on it, so a bathtub in a styled bathroom outperforms an isolated tub on white.
Pin to evergreen, well-organized topical boards; board context surfaces alongside the pin in Lens results.
Upload a Pinterest product feed (the same Merchant Center compatible format) so Product Pins become eligible as Lens Shopping results.

8. Amazon Visual Search

Amazon Visual Search (280 million monthly queries) operates inside the Amazon app and amazon.com. It is closed-ecosystem, meaning optimization happens entirely inside the Amazon product listing surface, not on your own domain. Relevance depends on whether the client sells on Amazon.

8.1 When Amazon Visual Search Matters

High relevance: physical products sold via Amazon Seller Central, 1P vendor accounts, Brand Registry enrolled brands.
Medium relevance: brands sold through 3P resellers.
Low or irrelevant: services, local-only businesses, DTC brands that intentionally avoid Amazon.

For Joseph's client portfolio: TCB Fight Factory possibly relevant (gear), Eureka Bath Works probably irrelevant (large items, not Amazon channel), Heritage Hardwood Floors irrelevant, Handled Tax irrelevant, Local Living irrelevant, ARCW irrelevant.

8.2 Amazon Product Image Standards

Amazon's spec is stricter than any other visual search surface. Listings that fail spec are suppressed from Visual Search results.

Main image: pure white background (RGB 255,255,255), product fills 85 percent of frame minimum, no text/logos/watermarks/props, no accessories shown unless apparel-on-model, JPEG or PNG, minimum 1000 px on longest side (recommended 2000), sharp focus, accurate color.

Secondary images (up to 8): different angles, lifestyle shots, detail close-ups, size comparison, in-use demonstrations, infographics with feature callouts, packaging shots.

Video: listings with video outperform image-only by 15 to 30 percent click conversion (Amazon internal data shared at AdCon 2024). Recommended length 30 to 60 seconds.

A+ Content: enhanced brand content for Brand Registry sellers; module-based layout with images plus structured text.

8.3 StyleSnap (Fashion-Specific)

StyleSnap is Amazon's fashion visual search. Users upload an outfit photo, StyleSnap returns matching products on Amazon. Scope: apparel, footwear, accessories, jewelry.

Optimization: high-quality on-model imagery for primary listing, multiple angles (front, side, back), fabric and hardware detail shots, accurate color names in title and bullets, style descriptors (boho, minimalist, athletic) in title. Attribute completeness drives match rate: color (primary plus secondary), pattern (solid, striped, floral), material composition, style category, size availability, season or collection.

8.4 Listing Versus Storefront

Amazon Visual Search ranks individual ASINs, not brand storefronts. Optimize at the listing level. For deep Amazon SEO (A9 algorithm, advertising, inventory signals) see framework-ecommerceseo.md.

9. Apple Visual Look Up

Apple Visual Look Up is the iOS-integrated visual search, available across Photos, Safari, Messages, and Quick Look. Supported categories expanded in iOS 18 to include plants, animals, landmarks, art, books, products, and pets. Apple does not disclose query volume, but third-party estimates suggest 350 million monthly active users (Counterpoint research, Q1 2026, sample N=1,800 iOS users).

9.1 How Apple Visual Look Up Works

Trigger points: tap info icon on a photo in Photos, long-press or tap detected subject in Safari, subject detection in Messages, Quick Look on Mac.

Recognition pipeline: on-device CoreML identifies entity category, server-side query to Apple Knowledge Graph, returns info card with Wikipedia summary plus related results.

Supported categories (2026): animals (cats, dogs, birds, insects), plants and flowers, landmarks and buildings, art and books, products (clothing, electronics, vehicles), statues and sculptures, pet breed identification.

9.2 What You Can Influence

Apple's Knowledge Graph is curated. Two practical levers exist.

Lever 1: Apple Business Connect. For local businesses, Apple Business Connect (Apple's GBP equivalent) accepts imagery that surfaces in Apple Maps and indirectly in Visual Look Up for businesses with distinctive storefronts or signage. Primary photo: storefront or hero shot. Up to 10 additional photos covering storefront exterior, interior atmosphere, products or services in use, team or staff, signage and branding. Image specs: 4:3 recommended, minimum 1024 x 768, JPEG or PNG, max 10 MB.

Lever 2: Wikidata and Wikipedia presence. Apple's Knowledge Graph is heavily seeded by Wikidata and Wikipedia. Brands with Wikipedia articles and Wikidata entries appear in Visual Look Up cards. See framework-aicitations.md for the entity establishment via Wikidata pattern.

Note on prior Wikidata attempts: earlier efforts to seed Wikipedia and Wikidata for some of Joseph's network properties (Joseph Anady Q139592630, MEGAMIND Q139592633) were speedy-deleted as non-notable. Real external press citations are the prerequisite. Do not re-attempt without that foundation.

9.3 Product Category Optimization

Apple Visual Look Up matches products against retailer listings (Apple's curated set, smaller than Lens Shopping). Required signals: Product schema with full image array on a public page, strong canonical URL, brand entity recognized in a knowledge graph, distinctive product imagery. Apple-specific helpers: Apple Business Connect listing for local retailers, Apple Maps presence (which depends on Apple Business Connect), iCloud Shared Albums (not directly optimizable but a relevant signal flow).
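The "Product schema with full image array" signal can be sketched as a small generator. This is a minimal illustration, not Apple's or Google's required shape: the function names, the example.com URLs, and the brand name are placeholders, and only well-known schema.org properties (name, brand, url, image) are used.

```python
import json


def product_jsonld(name, brand, page_url, image_urls):
    """Build a schema.org Product object that carries every product image."""
    if not image_urls:
        raise ValueError("at least one image URL is required")
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "brand": {"@type": "Brand", "name": brand},
        "url": page_url,          # should match the page's canonical URL
        "image": list(image_urls),
    }


def as_script_tag(data):
    """Wrap the JSON-LD in the script tag that goes into the page head."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")


if __name__ == "__main__":
    product = product_jsonld(
        "Victorian Clawfoot Bathtub",
        "Example Bath Co",  # placeholder brand
        "https://example.com/tubs/victorian-clawfoot",
        [
            "https://example.com/assets/images/tub-front.avif",
            "https://example.com/assets/images/tub-side.avif",
            "https://example.com/assets/images/tub-feet-detail.avif",
        ],
    )
    print(as_script_tag(product))
```

Generating the block from the same image list that feeds the page template keeps the schema array and the visible gallery from drifting apart.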

10. AI Engine Image Understanding

The fastest-growing visual search surface is not a dedicated visual search engine. It is AI engines accepting image uploads as part of a conversational query. ChatGPT, Gemini, Claude, and Perplexity each accept image uploads, and combined query volume is estimated at 2.5 billion per month (aggregated from OpenAI, Google, Anthropic, Perplexity public disclosures Q1 2026).

10.1 The Four AI Vision Engines

Engine	Models	Upload modes	Primary use cases
ChatGPT	GPT-4V, GPT-4o, GPT-5 with vision (rolling out 2026)	Web UI drag-drop, mobile camera, advanced voice mode live camera	"What is this", solve math/code/diagram, translate text, describe scene, where to buy
Gemini	Gemini 2.5 Pro, Gemini 3 Pro (Q1 2026)	Web UI, mobile camera, Gemini Live (Q1 2026), Circle to Search on Android	Live camera reasoning, multi-image comparison, document understanding, VQA
Claude	Claude 4 Opus, Claude 4 Sonnet	Web UI, API image inputs	Document analysis, accessibility descriptions, chart interpretation, code from screenshot
Perplexity	GPT-4V plus Perplexity retrieval	Web UI, mobile camera	"What is this with sources", shopping with cited retailers, identification with citation trail

10.2 How AI Engines Extract Entities From Images

The pipeline inside each AI engine, simplified:

Vision encoder: the image is converted to embedding tokens.
Entity extraction: the model identifies recognizable entities (objects, text, brands, people, places).
Retrieval: for engines with retrieval (Perplexity, ChatGPT with search, Gemini with search), the entities become text queries.
Generation: the model synthesizes a response, optionally with citations.

The retrieval step is where image SEO meets AI engine optimization. If a user uploads a photo of a clawfoot tub and asks "where can I buy this in Arkansas," and the AI engine identifies the entity as "clawfoot bathtub," it then runs a text retrieval query like [clawfoot bathtub for sale Arkansas]. The retrieval surfaces pages that the AI engine's retrieval layer can find.

Optimization implication: ranking well for the text-equivalent of likely image queries is the single highest-leverage optimization for AI engine image search. The image-recognition layer is largely outside your control; the retrieval layer is exactly the same as text SEO.
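To plan which text queries a page should rank for, the entity-to-query step can be approximated offline. The sketch below is an assumption about how a derived query is composed (entity, then intent, then geography); the engines do not publish their actual query rewriting, and both function names are hypothetical.

```python
from itertools import product


def derive_text_query(entity, intent="", geography=""):
    """Join the recognized entity with the user's intent and location."""
    parts = (entity, intent, geography)
    return " ".join(p.strip() for p in parts if p and p.strip())


def candidate_queries(entities, intents, geography):
    """Every entity/intent combination for one service area."""
    return [derive_text_query(e, i, geography)
            for e, i in product(entities, intents)]


if __name__ == "__main__":
    for query in candidate_queries(
        ["clawfoot bathtub", "Victorian clawfoot bathtub"],
        ["for sale", "installation"],
        "Arkansas",
    ):
        print(query)
```

The resulting list is a keyword target list for ordinary text SEO work, which is the point of the section: the image only supplies the entity.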

10.3 What This Means For On-Page Optimization

Page must rank for the entity-plus-modifier text query that derives from the image upload. If the image is "Victorian clawfoot bathtub" and the user adds "where can I buy this in Arkansas," the derived text query is Victorian clawfoot bathtub Arkansas. The target page is the product or category page optimized for that text.
Page must include the image with a strong caption. AI engines that surface images inline select based on caption plus surrounding text.
Schema must link the image to the entity. ImageObject plus Product plus Brand schema together disambiguate which product on a multi-product page is being referenced.
Hyperlinked image anchor text matches the entity. AI engines treat image link anchors as additional ranking signal.
AI citation eligibility: pages listed in llms.txt with strong topical authority get cited. Cross-reference framework-aicitations.md.
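The linked-image item in the checklist above can be spot-checked with the standard library. This is a rough sketch, assuming static HTML and that the alt text of a linked image serves as its anchor text; the class name LinkedImageAudit and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkedImageAudit(HTMLParser):
    """Collect images that sit inside <a> tags and check their alt text."""

    def __init__(self, entity_terms):
        super().__init__()
        self.terms = [t.lower() for t in entity_terms]
        self.link_depth = 0
        self.findings = []  # (src, alt, alt_mentions_entity)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a":
            self.link_depth += 1
        elif tag == "img" and self.link_depth > 0:
            alt = attr.get("alt") or ""
            matches = all(term in alt.lower() for term in self.terms)
            self.findings.append((attr.get("src", ""), alt, matches))

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1


if __name__ == "__main__":
    page = ('<a href="/tubs"><img src="/a.avif" '
            'alt="Victorian clawfoot bathtub with brass feet"></a>'
            '<a href="/x"><img src="/b.avif" alt="IMG_1234"></a>')
    audit = LinkedImageAudit(["clawfoot", "bathtub"])
    audit.feed(page)
    for src, alt, ok in audit.findings:
        print("OK  " if ok else "FIX ", src, repr(alt))
```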

10.4 Testing AI Engine Image Understanding

A quick diagnostic suite for any client:

# Manual testing protocol
# For each top-priority product or service category:

# Test 1: ChatGPT
# - Open chatgpt.com
# - Upload representative phone photo of product
# - Ask "what is this and where can I buy it in [client geography]"
# - Note whether client domain appears in citations
# - Note whether client image surfaces inline

# Test 2: Perplexity
# - Open perplexity.ai
# - Upload same image
# - Ask same question
# - Note citation list and inline images

# Test 3: Gemini
# - Open gemini.google.com or app
# - Upload same image
# - Ask same question
# - Note retrieval citations

# Test 4: Claude
# - Open claude.ai
# - Upload same image
# - Ask "what is this and describe it in detail"
# - Note accuracy of identification

Document results in /var/www/sites/[domain]/docs/visual-search-baseline.md per client. Re-test quarterly.
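Recording the manual results in a consistent shape makes the quarterly re-test comparable. The sketch below renders one possible layout for visual-search-baseline.md; the column set and the result keys (engine, category, domain_cited, image_inline, notes) are assumptions, not a format defined elsewhere in this framework.

```python
from datetime import date


def render_baseline(domain, results, tested_on=None):
    """Render manual test results as a Markdown table."""
    tested_on = tested_on or date.today().isoformat()
    lines = [
        f"# Visual search baseline: {domain}",
        "",
        f"Tested: {tested_on}",
        "",
        "| Engine | Category | Domain cited | Image inline | Notes |",
        "| --- | --- | --- | --- | --- |",
    ]
    for r in results:
        lines.append("| {} | {} | {} | {} | {} |".format(
            r["engine"],
            r["category"],
            "yes" if r["domain_cited"] else "no",
            "yes" if r["image_inline"] else "no",
            r.get("notes", ""),
        ))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_baseline("example.com", [
        {"engine": "ChatGPT", "category": "clawfoot tubs",
         "domain_cited": True, "image_inline": False},
        {"engine": "Perplexity", "category": "clawfoot tubs",
         "domain_cited": False, "image_inline": False,
         "notes": "competitor cited"},
    ]))
```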

11. Visual Citation in AI Results

New in 2026: AI engines surface inline images in responses with source attribution. Perplexity led this in Q4 2025; ChatGPT followed in Q1 2026; Gemini and Claude added similar functionality through 2026.

11.1 How Inline Image Citation Works

Trigger: user asks a question where visual reference helps. Examples: "what does a clawfoot tub look like," "show me different types of hardwood flooring," "what's the difference between these two coffee makers."

Selection rules: image must be on a page the AI is citing for text; image alt text and caption inform selection; image must be embed-friendly (no aggressive CORS, no anti-hotlinking); image must be high-quality (low-resolution images deprioritized).

Attribution: image displayed with source link, click-through to source page, trackable via referrer in nginx access logs.

11.2 Optimizing For Inline Citation

Earn text citation first. Image citation only happens if the page is already text-cited. Cross-reference framework-aicitations.md.
Caption alignment. Image caption matches the user's likely question. For "what does a clawfoot tub look like," a good caption is Victorian clawfoot bathtub with brass ball-and-claw feet, restored cast iron, white porcelain finish.
High resolution. 1200 px wide minimum for inline citation eligibility; 2400 recommended.
No hotlink protection. Allow embedding from AI engine domains; no Referer header blocking in /etc/nginx/sites-available/[domain].
Open image license signaling. ImageObject with license, creditText, copyrightNotice tells AI engines whether attribution is sufficient.
No lazy-loading blocks. The raw src must be accessible to crawlers; the OAI-SearchBot, PerplexityBot, GoogleOther, and ChatGPT-User user agents are all served normally.
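The license-signaling item above maps to a handful of schema.org ImageObject properties. A minimal sketch follows, assuming one ImageObject per cited image; the function name and all example.com URLs are placeholders, and whether a given AI engine honors these fields is not confirmed by this framework.

```python
import json


def image_object_jsonld(content_url, caption, credit_text, copyright_notice,
                        license_url, acquire_license_page=None):
    """Build an ImageObject with the licensing fields named above."""
    data = {
        "@context": "https://schema.org",
        "@type": "ImageObject",
        "contentUrl": content_url,
        "caption": caption,
        "creditText": credit_text,
        "copyrightNotice": copyright_notice,
        "license": license_url,
    }
    if acquire_license_page:
        data["acquireLicensePage"] = acquire_license_page
    return data


if __name__ == "__main__":
    print(json.dumps(image_object_jsonld(
        "https://example.com/assets/images/clawfoot-tub.avif",
        "Victorian clawfoot bathtub with brass ball-and-claw feet",
        "Example Bath Co",
        "Copyright 2026 Example Bath Co",
        "https://example.com/image-license",
    ), indent=2))
```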

11.3 Measuring Inline Image Citations

Currently uninstrumented in GA4, but trackable via server logs:

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/count-ai-image-referrals.sh
# Assumes the default nginx "combined" log format: field 7 is the request
# path and field 11 is the quoted Referer. The reporting window is whatever
# logrotate retains; nothing here filters by date.

DOMAIN="$1"
ACCESS_LOG="/var/log/nginx/access.log"

if [ -z "$DOMAIN" ]; then
  echo "Usage: $0 <domain>"
  exit 1
fi

# All image requests across the current and rotated (possibly gzipped) logs
image_hits() {
  zcat -f "$ACCESS_LOG"* | grep -E "/assets/images/.*\.(avif|webp|jpe?g|png)"
}

echo "AI image referrals to $DOMAIN, all retained logs"
echo ""

# Match on the Referer field only, so crawler user agents are not counted
echo "Image requests with AI referrers:"
image_hits | awk '{print $11}' | \
  grep -icE "perplexity|chatgpt|openai|gemini|claude|anthropic|bard"

echo ""
echo "Top referring AI engines:"
image_hits | awk '{print $11}' | \
  grep -ioE "perplexity\.ai|chat\.openai\.com|chatgpt\.com|gemini\.google\.com|claude\.ai|bard\.google\.com" | \
  sort | uniq -c | sort -rn

echo ""
echo "Top images cited:"
image_hits | \
  awk 'tolower($11) ~ /perplexity|chatgpt|openai|gemini|claude|anthropic|bard/ {print $7}' | \
  sort | uniq -c | sort -rn | head -20

Run this script weekly per client. The output forms the visual citation baseline.
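For per-engine counts that feed a report, the same log filter can be written in Python with stricter matching on the referrer host. This is a sketch under two assumptions: the logs use nginx's default combined format, and the host-to-engine mapping in AI_HOSTS (a hypothetical name) covers the engines of interest.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# nginx "combined" format: ip - user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
IMAGE_RE = re.compile(r"\.(avif|webp|jpe?g|png)(\?|$)", re.I)

# Assumed mapping of referrer hosts to engines; extend as needed.
AI_HOSTS = {
    "perplexity.ai": "perplexity",
    "chatgpt.com": "chatgpt",
    "chat.openai.com": "chatgpt",
    "gemini.google.com": "gemini",
    "claude.ai": "claude",
}


def classify_referer(referer):
    """Return the engine name for a Referer URL, or None."""
    host = (urlsplit(referer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return AI_HOSTS.get(host)


def count_ai_image_referrals(lines):
    """Count image requests per AI engine, based on the Referer only."""
    by_engine = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or not IMAGE_RE.search(m["path"]):
            continue
        engine = classify_referer(m["referer"])
        if engine:
            by_engine[engine] += 1
    return by_engine
```

Matching on the parsed hostname avoids counting a crawler whose user agent merely contains an engine name, which a plain substring grep over the whole line would include.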

12. Video Thumbnail Optimization

Video thumbnails overlap heavily with visual search. A YouTube thumbnail is itself a visual search target, surfaces in Google image search, and is parsed by Pinterest Lens when video pins are saved.

12.1 YouTube Thumbnail Standards

Dimensions: 1280 x 720 minimum, 1920 x 1080 recommended, 16:9 ratio. Format: JPEG, GIF, PNG, BMP accepted (JPEG with optimized file size in practice). File size max 2 MB.

Design pattern: face prominent (faces drive higher CTR per YouTube creator research 2024, sample N=2.4M videos), high-contrast color, bold readable text (3 to 5 words maximum), brand consistency across channel, authentic content representation (clickbait-mismatch is penalized).

12.2 Video Schema Thumbnail

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "How to Install a Clawfoot Bathtub",
  "description": "Complete installation guide for a Victorian clawfoot bathtub.",
  "thumbnailUrl": [
    "https://example.com/video-thumb-1x1.avif",
    "https://example.com/video-thumb-4x3.avif",
    "https://example.com/video-thumb-16x9.avif"
  ],
  "uploadDate": "2026-03-15",
  "duration": "PT12M30S",
  "contentUrl": "https://example.com/videos/install-clawfoot.mp4",
  "embedUrl": "https://www.youtube.com/embed/abc123"
}
</script>

The triple thumbnailUrl pattern (1x1, 4x3, 16x9) maximizes eligibility across SERPs. Google selects the best ratio for the surface.
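Producing the three ratios from one source frame is a matter of computing centered crop boxes. The arithmetic can be sketched as below; center_crop_box is a hypothetical helper, and the resulting box would be passed to whatever image tool performs the crop.

```python
def center_crop_box(width, height, ratio_w, ratio_h):
    """Largest centered crop of the given ratio.

    Returns (left, top, crop_width, crop_height) in pixels.
    """
    if width * ratio_h > height * ratio_w:
        # Source is wider than the target ratio: keep full height.
        crop_h = height
        crop_w = height * ratio_w // ratio_h
    else:
        # Source is as tall as or taller than the target ratio: keep full width.
        crop_w = width
        crop_h = width * ratio_h // ratio_w
    return ((width - crop_w) // 2, (height - crop_h) // 2, crop_w, crop_h)


if __name__ == "__main__":
    for label, (rw, rh) in {"1x1": (1, 1), "4x3": (4, 3), "16x9": (16, 9)}.items():
        print(label, center_crop_box(1920, 1080, rw, rh))
```

A centered crop is the simplest choice; if the subject sits off-center, an attention-based crop is usually the better fit.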

12.3 Open Graph Video Thumbnail

<meta property="og:video" content="https://example.com/videos/install-clawfoot.mp4">
<meta property="og:video:type" content="video/mp4">
<meta property="og:video:width" content="1920">
<meta property="og:video:height" content="1080">
<meta property="og:image" content="https://example.com/video-thumb-1920x1080.jpg">
<meta property="og:image:width" content="1920">
<meta property="og:image:height" content="1080">

og:image acts as the video thumbnail when shared on social platforms.

12.4 Twitter Card Video Thumbnail

<meta name="twitter:card" content="player">
<meta name="twitter:title" content="How to Install a Clawfoot Bathtub">
<meta name="twitter:description" content="Complete installation guide.">
<meta name="twitter:image" content="https://example.com/video-thumb-1200x675.jpg">
<meta name="twitter:player" content="https://www.youtube.com/embed/abc123">
<meta name="twitter:player:width" content="1280">
<meta name="twitter:player:height" content="720">

For deeper video optimization including transcripts, chapter markers, and VideoObject schema breadth, see framework-videoseo.md.

13. Measurement and Attribution

Visual search is largely uninstrumented in standard analytics. There is no "Google Lens" channel in GA4. There is no Pinterest Lens referrer header that consistently identifies the visual entry point. Measurement is therefore a proxy game.

13.1 What Can Be Measured

GSC image search clicks (Search Console > Performance > Search Type: Image): per query and per landing page. Limitation: aggregated; cannot isolate Lens from standard image search.
GSC image search impressions: coverage of how often images surface.
Pinterest outbound clicks (Pinterest Analytics > Audience insights): per pin, per board, per audience segment.
Pinterest conversion tag (Conversion insights): per event type, per pin.
Google Shopping image clicks (Merchant Center > Performance > Image clicks, where available).
AI engine image referrals (nginx access logs filtered by referer): use /var/www/sites/[domain]/scripts/visual-search/count-ai-image-referrals.sh.
Brand mention velocity (manual checks or Mention.com): brand or product appearing in AI engine responses, per engine, per query type.

13.2 The View-Through Visual Attribution Gap

The largest gap: users see your brand in a Lens or Pinterest result, do not click, then later search for your brand directly. Proxy metrics to detect view-through:

Branded search lift: after visual-search investment, branded search volume rises (GSC branded query filter, 90-day rolling comparison). 4 to 12 week lag typical.
Direct traffic lift: users typing URL directly after visual exposure (GA4 direct channel, year-over-year).
Pinterest save velocity: saves accumulate even when clicks do not (Pinterest Analytics > Saves over time). Predictive of future traffic since pins generate evergreen discovery.
Reverse image search appearance: your images appear elsewhere on the web (periodic Google Reverse Image checks, TinEye for systematic monitoring). Indicates organic distribution and potential backlink opportunities.
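The branded search lift proxy reduces to a percentage change between two adjacent windows. A minimal sketch, assuming a list of daily branded-query clicks exported from GSC (oldest first); the function names are hypothetical.

```python
def percent_lift(baseline, current):
    """Percentage change from baseline to current."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) * 100.0 / baseline


def rolling_lift(daily_counts, window=90):
    """Compare the most recent window with the window just before it."""
    if len(daily_counts) < 2 * window:
        raise ValueError("need at least two full windows of data")
    current = sum(daily_counts[-window:])
    baseline = sum(daily_counts[-2 * window:-window])
    return percent_lift(baseline, current)


if __name__ == "__main__":
    # 90 days at 40 branded clicks per day, then 90 days at 46 per day
    series = [40] * 90 + [46] * 90
    print(f"Branded search lift: {rolling_lift(series):.1f}%")
```

Because the typical lag is 4 to 12 weeks, the comparison is only meaningful once the visual-search work is at least one full window old.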

13.3 The Visual Search Dashboard

Minimum dashboard per client, refreshed monthly, stored as YAML at /var/www/sites/[domain]/reports/visual-search/YYYY-MM.yml:

visual_search_dashboard_monthly:
  gsc_image: {clicks_30d, impressions_30d, ctr_30d, top_5_pages, top_5_queries}
  pinterest: {outbound_clicks_30d, impressions_30d, saves_30d, top_5_pins}
  ai_referrals: {total_30d, by_engine: {chatgpt, perplexity, gemini, claude}}
  ai_citation_check: {chatgpt_pass, perplexity_pass, gemini_pass}
  proxy_signals: {branded_search_volume, direct_traffic_30d}

Diff month-over-month to detect trends.
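The month-over-month diff can be sketched as a comparison of two nested dictionaries. This assumes each monthly report has already been loaded into a dict (for example with PyYAML's yaml.safe_load, a third-party package that is not otherwise part of this toolchain) and that metrics hold numbers; flatten and diff_months are hypothetical names.

```python
def flatten(data, prefix=""):
    """Turn nested dicts into {'a.b.c': value} pairs."""
    out = {}
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out


def _is_number(value):
    # bool is a subclass of int; pass/fail flags are not metrics to subtract
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def diff_months(previous, current):
    """Return {metric_path: delta} for numeric metrics that changed."""
    prev, cur = flatten(previous), flatten(current)
    return {
        key: cur[key] - prev[key]
        for key in sorted(prev.keys() & cur.keys())
        if _is_number(prev[key]) and _is_number(cur[key]) and cur[key] != prev[key]
    }


if __name__ == "__main__":
    april = {"gsc_image": {"clicks_30d": 120}, "ai_referrals": {"total_30d": 5}}
    may = {"gsc_image": {"clicks_30d": 150}, "ai_referrals": {"total_30d": 9}}
    for metric, delta in diff_months(april, may).items():
        print(f"{metric}: {delta:+}")
```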

14. Bubbles-Hosted Visual Search Optimization Toolchain

The image processing and audit toolchain runs entirely on bubbles (Debian, 169.155.162.118), with output written to /var/www/sites/[domain]/ per client. No third-party CDN or proxy is in the loop.

14.1 Toolchain Components

bubbles_visual_search_toolchain:

  libvips:
    purpose: "Fast image processing for AVIF, WebP generation, resizing, color management"
    install: "apt-get install libvips-tools"
    binary: "vips, vipsthumbnail"
    speed: "approximately 4-8x faster than ImageMagick for batch operations"

  exiftool:
    purpose: "EXIF metadata read, write, strip"
    install: "apt-get install libimage-exiftool-perl"
    binary: "exiftool"

  python3:
    purpose: "Audit scripts that parse HTML, validate alt text, cross-check schema"
    install: "Already installed system-wide; the audit scripts use only the standard library"

  bash_glue:
    purpose: "Orchestration scripts for bulk image processing pipelines"
    location: "/var/www/sites/[domain]/scripts/visual-search/"

  nginx_serve:
    purpose: "Direct serving of optimized images from /var/www/sites/[domain]/assets/images/"
    config: "/etc/nginx/sites-available/[domain]"
    headers: "Cache-Control immutable for hashed assets, max-age 31536000"

14.2 Bulk AVIF Generation Script

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/generate-avif.sh
# Bulk convert JPEG and PNG to AVIF for visual search optimization.

set -e

SITE="$1"
SOURCE_DIR="/var/www/sites/$SITE/assets/images-source"
TARGET_DIR="/var/www/sites/$SITE/assets/images"

if [ -z "$SITE" ]; then
  echo "Usage: $0 <site-name>"
  exit 1
fi

mkdir -p "$TARGET_DIR"

# Process JPEGs and PNGs to AVIF
find "$SOURCE_DIR" -type f \( -name "*.jpg" -o -name "*.jpeg" -o -name "*.png" \) | while read -r SRC; do
  REL=$(realpath --relative-to="$SOURCE_DIR" "$SRC")
  BASE="${REL%.*}"
  TARGET="$TARGET_DIR/$BASE.avif"

  if [ -f "$TARGET" ] && [ "$TARGET" -nt "$SRC" ]; then
    echo "SKIP $REL (target newer)"
    continue
  fi

  mkdir -p "$(dirname "$TARGET")"

  vips copy "$SRC" "$TARGET[Q=55,effort=6]"

  echo "DONE $REL -> $BASE.avif"
done

# Generate WebP fallback for clients still serving WebP
find "$SOURCE_DIR" -type f \( -name "*.jpg" -o -name "*.jpeg" -o -name "*.png" \) | while read -r SRC; do
  REL=$(realpath --relative-to="$SOURCE_DIR" "$SRC")
  BASE="${REL%.*}"
  TARGET="$TARGET_DIR/$BASE.webp"

  if [ -f "$TARGET" ] && [ "$TARGET" -nt "$SRC" ]; then
    continue
  fi

  mkdir -p "$(dirname "$TARGET")"
  vips copy "$SRC" "$TARGET[Q=80]"
done

echo ""
echo "Bulk AVIF and WebP generation complete for $SITE"

14.3 Thumbnail Generation Script

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/generate-thumbs.sh
# Generate multiple thumbnail sizes for srcset and visual search surfaces.

SITE="$1"
SOURCE_DIR="/var/www/sites/$SITE/assets/images"

if [ -z "$SITE" ]; then
  echo "Usage: $0 <site-name>"
  exit 1
fi

# Pinterest needs 2:3 vertical, others need standard ratios
SIZES=(400 800 1200 1920)
PINTEREST_HEIGHT=1500

# Skip width variants from earlier runs (name-400w.avif) so reruns do not
# thumbnail the thumbnails; -print0 keeps filenames with spaces intact.
find "$SOURCE_DIR" -maxdepth 4 -type f -name "*.avif" \
  -not -name "*-thumb-*" -not -regex '.*-[0-9]+w\.avif' -print0 |
while IFS= read -r -d '' SRC; do
  BASE="${SRC%.avif}"

  for SIZE in "${SIZES[@]}"; do
    TARGET="${BASE}-${SIZE}w.avif"
    [ -f "$TARGET" ] && continue
    vipsthumbnail "$SRC" --size "${SIZE}x" --output "$TARGET[Q=55]"
  done

  # Pinterest 2:3 vertical crop
  PIN_TARGET="${BASE}-pinterest.jpg"
  if [ ! -f "$PIN_TARGET" ]; then
    vipsthumbnail "$SRC" --size "1000x${PINTEREST_HEIGHT}" --smartcrop=attention --output "$PIN_TARGET[Q=85]"
  fi
done

echo "Thumbnails generated for $SITE"

14.4 Metadata Audit and EXIF Cleanup

ExifTool handles both the metadata audit and the EXIF strip-and-rewrite. The audit script counts images carrying copyright, description, and GPS tags (GPS is a privacy flag, especially for real estate listings and event photos).

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/audit-metadata.sh
SITE="$1"
IMG_DIR="/var/www/sites/$SITE/assets/images"
REPORT="/var/www/sites/$SITE/reports/visual-search/metadata-audit-$(date +%Y%m%d).txt"
mkdir -p "$(dirname "$REPORT")"
{
  echo "Metadata audit for $SITE ($(date))"
  TOTAL=$(find "$IMG_DIR" -type f \( -name "*.avif" -o -name "*.webp" -o -name "*.jpg" \) | wc -l)
  echo "Total images: $TOTAL"
  echo "With copyright: $(exiftool -if '$Copyright' -p '$FileName' -r "$IMG_DIR" 2>/dev/null | wc -l)"
  echo "With description: $(exiftool -if '$ImageDescription or $Description' -p '$FileName' -r "$IMG_DIR" 2>/dev/null | wc -l)"
  echo "With GPS (privacy review): $(exiftool -if '$GPSLatitude' -p '$FileName' -r "$IMG_DIR" 2>/dev/null | wc -l)"
} > "$REPORT"
cat "$REPORT"

The cleanup script strips all EXIF (removing GPS and camera serial numbers) then re-injects the IPTC copyright and credit fields that visual search engines parse.

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/clean-exif.sh
SITE="$1"
IMG_DIR="/var/www/sites/$SITE/assets/images"
find "$IMG_DIR" -type f \( -name "*.jpg" -o -name "*.jpeg" \) | while read -r IMG; do
  exiftool -overwrite_original -all= "$IMG" 2>/dev/null
  exiftool -overwrite_original \
    -Copyright="Copyright 2026 $SITE" \
    -CopyrightNotice="Copyright 2026 $SITE LLC" \
    -Credit="$SITE" "$IMG" 2>/dev/null
done

14.5 Alt Text and Image Sitemap Audits

Two Python-based scripts make up the regular cron load. The alt text audit flags missing, empty, or short (under five words) alt attributes across the site's HTML files.

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/audit-alt-text.sh
SITE="$1"
SITE_DIR="/var/www/sites/$SITE"
REPORT="$SITE_DIR/reports/visual-search/alt-audit-$(date +%Y%m%d).txt"
mkdir -p "$(dirname "$REPORT")"
python3 - "$SITE_DIR" "$SITE" > "$REPORT" <<'PY'
import re, sys
from pathlib import Path
site_dir = Path(sys.argv[1]); site = sys.argv[2]
img_re = re.compile(r'<img[^>]*>', re.I)
alt_re = re.compile(r'alt\s*=\s*"([^"]*)"', re.I)
src_re = re.compile(r'src\s*=\s*"([^"]*)"', re.I)
counts = {"total":0,"missing":0,"empty":0,"short":0,"ok":0}
issues = []
for f in site_dir.rglob("*.html"):
    if "node_modules" in str(f) or ".next" in str(f): continue
    try: c = f.read_text(encoding="utf-8", errors="ignore")
    except: continue
    for tag in img_re.findall(c):
        counts["total"] += 1
        a = alt_re.search(tag); s = src_re.search(tag)
        src = s.group(1) if s else "no-src"
        if not a:
            counts["missing"] += 1
            issues.append(f"MISSING {f.relative_to(site_dir)} src={src}")
        else:
            alt = a.group(1).strip()
            if not alt: counts["empty"] += 1; issues.append(f"EMPTY {f.relative_to(site_dir)} src={src}")
            elif len(alt.split()) < 5: counts["short"] += 1; issues.append(f"SHORT {f.relative_to(site_dir)} alt='{alt}'")
            else: counts["ok"] += 1
print(f"Alt audit for {site}:", counts)
for i in issues[:100]: print(i)
PY
cat "$REPORT"

The sitemap generator walks rendered HTML and emits sitemap-images.xml to the site root for submission to Search Console.

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/generate-image-sitemap.sh
SITE="$1"; DOMAIN="$2"
SITE_DIR="/var/www/sites/$SITE"
python3 - "$SITE_DIR" "$DOMAIN" > "$SITE_DIR/sitemap-images.xml" <<'PY'
import html, re, sys
from pathlib import Path
from xml.sax.saxutils import escape
site_dir = Path(sys.argv[1]); domain = sys.argv[2]
img_re = re.compile(r'<img[^>]*>', re.I)
src_re = re.compile(r'src\s*=\s*"([^"]*)"', re.I)
alt_re = re.compile(r'alt\s*=\s*"([^"]*)"', re.I)
def xml_text(value):
    # HTML attributes may already contain entities (&amp;); decode first so
    # the XML escaping below does not double-encode them.
    return escape(html.unescape(value))
print('<?xml version="1.0" encoding="UTF-8"?>')
print('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"')
print('        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">')
for f in sorted(site_dir.rglob("*.html")):
    if "node_modules" in str(f) or ".next" in str(f): continue
    rel = f.relative_to(site_dir)
    path = "/" + rel.as_posix()
    # Strip only a trailing index.html, not "index.html" inside other names
    if path.endswith("/index.html"): path = path[:-len("index.html")]
    if path.endswith("/") and len(path) > 1: path = path[:-1]
    try: c = f.read_text(encoding="utf-8", errors="ignore")
    except OSError: continue
    imgs = []
    for tag in img_re.findall(c):
        sm = src_re.search(tag); am = alt_re.search(tag)
        if not sm: continue
        src = sm.group(1)
        if src.startswith("//"): src = f"https:{src}"  # protocol-relative URL
        elif src.startswith("/"): src = f"https://{domain}{src}"
        elif not src.startswith("http"): continue
        imgs.append((src, am.group(1) if am else ""))
    if not imgs: continue
    print(f"  <url><loc>https://{domain}{xml_text(path)}</loc>")
    for src, alt in imgs:
        # image:loc is escaped too: a raw "&" in a URL makes the XML invalid
        print(f"    <image:image><image:loc>{xml_text(src)}</image:loc><image:caption>{xml_text(alt)}</image:caption></image:image>")
    print("  </url>")
print('</urlset>')
PY
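Before submitting the file to Search Console, it is worth confirming that the generated XML is well-formed and that each page carries the expected number of images. A minimal check using only the standard library is sketched below; summarize_image_sitemap is a hypothetical helper, and the namespace URIs are the ones the generator prints.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}


def summarize_image_sitemap(xml_text):
    """Parse the sitemap and return {page_url: image_count}.

    ET.fromstring raises xml.etree.ElementTree.ParseError when the document
    is not well-formed, for example because of an unescaped ampersand.
    """
    root = ET.fromstring(xml_text)
    summary = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        summary[loc] = len(url.findall("image:image", NS))
    return summary


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8") as handle:
        for page, count in summarize_image_sitemap(handle.read()).items():
            print(f"{count:4d}  {page}")
```

Run it with the sitemap path as the only argument; a ParseError here means the file would also be rejected on submission.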

14.6 Cron Schedule

The recommended cron schedule for the visual search toolchain:

# /etc/cron.d/visual-search-toolchain

# Bulk AVIF regeneration nightly per client
0 2 * * * user /var/www/sites/eurekabathworks.com/scripts/visual-search/generate-avif.sh eurekabathworks.com
0 3 * * * user /var/www/sites/heritagehardwoodfloors.com/scripts/visual-search/generate-avif.sh heritagehardwoodfloors.com

# Sitemap regeneration weekly
0 4 * * 0 user /var/www/sites/eurekabathworks.com/scripts/visual-search/generate-image-sitemap.sh eurekabathworks.com eurekabathworks.com

# AI referral counting weekly
0 5 * * 1 user /var/www/sites/eurekabathworks.com/scripts/visual-search/count-ai-image-referrals.sh eurekabathworks.com >> /var/www/sites/eurekabathworks.com/reports/visual-search/ai-referrals.log

# Alt text audit monthly
0 6 1 * * user /var/www/sites/eurekabathworks.com/scripts/visual-search/audit-alt-text.sh eurekabathworks.com

Adjust paths per client. Each client has its own copy of the scripts under /var/www/sites/[domain]/scripts/visual-search/.

14.7 Nginx Configuration For Image Delivery

The nginx site configuration must allow AI engine user agents and set cacheable headers on image assets.

# /etc/nginx/sites-available/[domain]
server {
    listen 443 ssl http2;
    server_name [domain];
    root /var/www/sites/[domain];

    # Image assets cached aggressively
    location ~* \.(avif|webp|jpg|jpeg|png|gif|svg)$ {
        expires 1y;
        add_header Cache-Control "public, immutable";
        add_header X-Content-Type-Options "nosniff";
        # No Referer blocking; AI engines must be able to load
        # No hotlink protection; image citation depends on embed access
    }

    # Image sitemap
    location = /sitemap-images.xml {
        add_header Cache-Control "public, max-age=3600";
        try_files $uri =404;
    }

    # User-Agent allow list verified for visual search bots
    # GoogleOther, OAI-SearchBot, PerplexityBot, ChatGPT-User
    # all served normally; no User-Agent blocking
}

14.8 Toolchain Maintenance Notes

libvips updates rarely; pin it with apt-mark hold if reproducibility is needed. ExifTool updates frequently to support new camera formats and is safe to keep at latest. Python scripts depend only on the standard library (no pip requirements file). All shell scripts target bash, not zsh or POSIX sh. All paths are absolute because cron runs with a minimal environment. Reports are written to /var/www/sites/[domain]/reports/visual-search/ (not /tmp) so they survive restarts.

End of Framework

Companion documents:

framework-imageseo.md: image production, file formats, srcset, lazy loading, file naming, on-page alt text foundations.
framework-multimodalsearch.md: cross-modal AI reasoning across image, audio, video, text.
framework-schema.md: ImageObject, Product, Recipe, HowTo, VisualArtwork schema reference and validators.
framework-aioverviews.md: Google AI Overviews and how visual search results feed AI summaries.
framework-aicitations.md: entity establishment via Wikidata, Wikipedia, and Knowledge Graph patterns that underpin Apple Visual Look Up and Lens identification results.
framework-searchgpt.md: ChatGPT image upload behavior and SearchGPT visual citation.
framework-perplexityspaces.md: Perplexity image search and inline citation mechanics.
framework-multiengine-tradeoffs.md: Google versus Bing versus AI engine optimization tradeoffs, including visual search divergence.
framework-mobileseo.md: mobile-first considerations that dominate visual search (camera entry points are all mobile).
framework-accessibility.md: accessibility alt text standards that overlap with visual search alt text.
framework-ecommerceseo.md: full e-commerce SEO context, Merchant Center, Amazon listings, Pinterest Shopping.
framework-videoseo.md: video thumbnail, VideoObject schema, YouTube optimization in depth.

Frequently asked questions

How is visual search alt text different from accessibility alt text?

Accessibility alt text briefly tells a screen reader user what an image is, typically in 5 to 15 words, concise and factual. Visual search alt text answers what the image is, its commercial or informational context, and what query it should match, typically in 15 to 30 words, descriptive plus contextual. The page recommends writing the longer version, since screen readers handle 30 words fine, while avoiding keyword stuffing.

What schema types matter most for visual search optimization?

Five schema types matter most. ImageObject is the universal foundation, with caption used by Pinterest Rich Pins and Google Lens, and creditText, copyrightNotice, creator, license, and acquireLicensePage signaling the Google Images licensable filter. Product schema with a multi-image array (target 5 to 7 images) is highest-leverage for e-commerce. Recipe and HowTo carry per-step images, and VisualArtwork signals fine art rather than product.

How do AI engines like ChatGPT and Perplexity extract entities from uploaded images?

The pipeline is: a vision encoder converts the image to embedding tokens, the model extracts recognizable entities (objects, text, brands, people, places), retrieval-capable engines turn those entities into text queries, then the model generates a response with optional citations. The retrieval step is where image SEO meets AI engine optimization, so ranking for the text-equivalent of likely image queries is the single highest-leverage optimization.

What are the Pinterest pin image specifications for visual search?

Pinterest is the only major surface where vertical aspect ratio dramatically outperforms horizontal. 2:3 is canonical (1000x1500 or 1200x1800), and 1:2.1 is the maximum height before pins truncate in feed. Square and horizontal underperform. Pinterest accepts JPEG, PNG, WebP, and animated GIF but not AVIF as of March 2026, so generate a JPEG variant. File size max is 20MB with a practical target of 200 to 500KB.

Can you measure visual search performance in analytics?

Visual search is largely uninstrumented; there is no Google Lens channel in GA4 and no consistent Pinterest Lens referrer, so measurement is a proxy game. You can track GSC image search clicks and impressions, Pinterest outbound clicks and conversion tags, Google Shopping image clicks, and AI engine image referrals via nginx access logs filtered by referer. View-through proxies include branded search lift, direct traffic lift, and Pinterest save velocity.
