SEO & AI Engine Optimization Framework · May 2026

Visual Search: Google Lens, Pinterest Lens, image-first discovery

Google Lens, Pinterest Lens, Amazon Visual Search, Apple Visual Look Up, Snapchat Scan, Microsoft Visual Search, Perplexity Image Search, ChatGPT Image Upload, and Gemini Camera Understanding

A comprehensive installation and audit reference for visual search optimization. Visual search in 2026 has shifted from a novelty to a primary discovery surface. eMarketer (Jan 2026, sample N=4,212 US online shoppers) found that 38 percent of US online shoppers used visual search in the past 90 days, up from 22 percent in the same survey a year earlier. The primary use cases are product discovery, identification (what is this object), translation of text inside images, and accessibility queries (read this label aloud, describe this scene).

Companion frameworks: image production and on-page image SEO live in framework-imageseo.md. AI vision reasoning across modalities lives in framework-multimodalsearch.md. This file is the visual-search-surface-specific reference and assumes those two upstream frameworks are already in place.


1. Document Purpose

Visual search is not "image SEO with a camera icon." It is a discovery channel where the query is a picture, often a low-quality phone photo with mixed lighting, motion blur, partial occlusion, and ambiguous intent. The ranking signals differ from text search. The optimization targets differ. The measurement story differs. And the surfaces are fragmenting fast: Google Lens, Pinterest Lens, Amazon Visual Search, Apple Visual Look Up, Snapchat Scan, Microsoft Visual Search inside Bing and Edge, and the camera-and-image modes inside Perplexity, ChatGPT, Gemini, and Claude.

For Joseph's portfolio, the priority targets vary by client industry:

What this framework covers:

  1. Practical optimization across nine visual search surfaces.
  2. Schema patterns that visual search engines actually parse.
  3. The bash-and-libvips toolchain for self-hosted image preprocessing.
  4. Measurement and attribution proxies in a channel that is almost entirely uninstrumented.
  5. The 2026-specific shifts: AI engine image citations, Gemini camera mode, ChatGPT image upload mainstreaming via advanced voice mode.

What this framework excludes (covered elsewhere):

1.1 Required Tools

1.2 What Changed in 2026


2. Client Variables Intake (YAML)

Before optimizing for visual search, capture the client-specific context so decisions are not generic.

client_variables_visual_search:

  business_identity:
    legal_name: "{exact legal name}"
    brand_display_name: "{name shown on site}"
    primary_domain: "{example.com}"
    canonical_protocol_host: "https://{example.com}"  # non-www per network canonical policy
    primary_category: "{e-commerce | local service | publisher | b2b | other}"

  visual_search_relevance:  # high | medium | low | irrelevant per surface
    google_lens: "{score}"
    pinterest_lens: "{score}"
    amazon_visual_search: "{score}"
    apple_visual_look_up: "{score}"
    snapchat_scan: "{score}"
    microsoft_visual_search: "{score}"
    ai_engine_image_upload: "{score}"
    rationale: "{one paragraph explaining why these scores}"

  image_inventory:
    total_indexed_images: "{count from sitemap or crawl}"
    image_sitemap_url: "{full URL to sitemap-images.xml}"
    primary_image_categories: ["{product | location | portrait | etc}"]
    image_format_distribution: {avif_pct: "{n}", webp_pct: "{n}", jpeg_pct: "{n}", png_pct: "{n}"}
    average_image_dimensions: "{WxH}"
    storage_path: "/var/www/sites/{domain}/assets/images/"

  product_imagery_if_applicable:
    has_products: "{true | false}"
    products_in_merchant_center: "{true | false}"
    products_on_pinterest: "{true | false}"
    products_on_amazon: "{true | false}"
    average_images_per_product: "{1-12}"
    background_standard: "{white | lifestyle | mixed}"

  pinterest_presence:
    business_account: "{true | false}"
    profile_url: "{full URL or n/a}"
    monthly_impressions: "{number or n/a}"
    rich_pins_enabled: "{true | false | partial}"
    pinterest_tag_installed: "{true | false}"

  apple_business_connect:
    listing_claimed: "{true | false | n/a}"
    photos_uploaded: "{count or n/a}"

  ai_engine_visibility:
    chatgpt_image_test_passed: "{true | false | not tested}"
    perplexity_image_test_passed: "{true | false | not tested}"
    gemini_image_test_passed: "{true | false | not tested}"

  measurement:
    gsc_image_search_baseline_90d_clicks: "{number}"
    gsc_image_search_baseline_90d_impressions: "{number}"
    pinterest_outbound_clicks_90d: "{number or n/a}"
    visual_referrer_traffic_ga4: "{number or 'not tracked'}"

  toolchain:
    image_processing: "libvips on bubbles (Debian)"
    metadata_tool: "ExifTool"
    bulk_scripts: "/var/www/sites/{domain}/scripts/visual-search/"
    cdn_or_proxy: "none (direct nginx serve from /var/www/sites/{domain}/)"

Fill this in before touching any image. The "visual search relevance" scores drive which surfaces in sections 6 through 11 get full effort versus a quick acknowledgment.
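
A quick way to enforce "fill this in before touching any image" is to list the keys that still hold a template placeholder. The sketch below is a minimal example, not part of the framework's toolchain: the function name unfilled_intake_fields and the intake file path are hypothetical, and it assumes one key per line with placeholders written in curly braces, as in the template above.

```shell
#!/bin/bash
# Hypothetical helper: print every intake key whose value still contains
# a {placeholder}. Assumes one "key: value" pair per line and ignores
# anything after a "#" comment marker.
unfilled_intake_fields() {
  local intake_file="$1"
  grep -E '^[[:space:]]*[a-z0-9_]+:[^#]*\{[^}]*\}' "$intake_file" \
    | sed -E 's/^[[:space:]]*([a-z0-9_]+):.*/\1/'
}
```

Usage: unfilled_intake_fields client-variables.yaml prints one key per line; empty output means every placeholder has been replaced.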


3. The 2026 Visual Search Landscape

3.1 Surface-by-Surface Volume

The current visual search volume across major surfaces (best public estimates, citations attached):

Surface | Monthly queries | Source | Primary use cases
Google Lens | 14 billion | Google I/O 2024, Search On 2025 | Identify products, translate text in images, identify plants/animals/landmarks, find similar style
Pinterest Lens | 600 million | Pinterest investor day Q4 2025 | Match saved images to products, style inspiration, source decor from screenshots
Amazon Visual Search | 280 million | Amazon Ads investor update Q1 2026 | StyleSnap fashion, social-screenshot product match, replenishment
Apple Visual Look Up | not disclosed (350M active users est.) | iOS 18 release notes, Counterpoint Q1 2026 N=1,800 | Plants, landmarks, art, books, products, pets
Snapchat Scan | 200 million plus | Snap Inc Q4 2025 earnings | AR lens triggers, audio recognition, math, products
Microsoft Visual Search | approximately 1 billion | Microsoft Bing growth update Mar 2026 | Reverse lookup, Edge sidebar, Copilot vision
AI engine image uploads (combined) | approximately 2.5 billion | Aggregated OpenAI / Google / Anthropic / Perplexity Q1 2026 | Reason about photo, identify and buy, translate text, solve shown problems

Growth YoY: Google Lens 40 percent, Pinterest 25 percent, Amazon 60 percent. AI image uploads are the fastest-growing surface and are likely to cross 5 billion monthly queries by end of 2026.

3.2 Intent Categories

Visual search intent splits into five categories; optimization differs by category.

Intent | Description | Optimization target | Representative surfaces
Shopping | User wants to buy something they see | Product schema, Merchant Center feed, Pinterest product pins, Amazon listings | Google Lens Shopping, Pinterest Lens, Amazon, Snapchat Scan
Identification | User wants to know what something is | Entity-rich pages, Wikipedia-style descriptions, knowledge graph linkage | Google Lens, Apple Visual Look Up, AI engines
Translation | User wants text in image translated | Mostly server-side; ensure translation content is crawlable | Google Lens, Apple Live Text, AI engines
Inspiration | User browsing for style ideas | Pinterest pins, lifestyle photography, mood-board-friendly imagery | Pinterest Lens, Google Lens, Instagram
Problem solving | User points camera at problem, asks how to fix | HowTo schema, step-by-step with per-step images, troubleshooting pages | Gemini camera mode, ChatGPT image upload, Google Lens

3.3 What Visual Search Engines Actually Match On

Visual search is not just pixel similarity. The 2026 generation of visual search engines combines:

  1. Embedding similarity: image converted to vector, matched against indexed image vectors.
  2. Entity extraction: objects, text, landmarks identified, then matched against entity-rich indexed pages.
  3. OCR text: text in image extracted, then used as text query.
  4. Surrounding page context: text on the page hosting the matched image informs relevance.
  5. Schema markup: ImageObject, Product, Recipe, HowTo schema massively boost match relevance.
  6. Authority signals: backlinks, brand recognition, page rank all flow into visual search ranking.

Optimization is therefore about all six inputs, not just "make your images findable."


4. Image Optimization Foundation

This section is deliberately brief. The deep coverage of image production, file optimization, srcset, lazy loading, file naming, dimensions, and core alt text lives in framework-imageseo.md. What follows is the visual-search-specific layer.

4.1 Visual Search Alt Text Differs From Accessibility Alt Text

Accessibility alt text answers "what is this image, briefly, for a screen reader user." Visual search alt text answers "what is this image, what is its commercial or informational context, and what query should it match."

alt_text_two_audiences:

  accessibility_layer:
    audience: "Screen reader users"
    length: "5-15 words typical"
    tone: "Concise, factual"
    example: "Clawfoot bathtub in white marble bathroom"

  visual_search_layer:
    audience: "Google Lens, Pinterest Lens, AI engines"
    length: "15-30 words typical"
    tone: "Descriptive plus contextual"
    example: "Vintage Victorian clawfoot bathtub with brass ball-and-claw feet in marble master bathroom, Eureka Bath Works showroom display, restored 1890s style"

Both audiences are served by writing the longer version, since screen readers handle 30 words fine. Avoid keyword stuffing; the goal is naturalistic description that happens to contain commercially relevant entities and modifiers.
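
The two length ranges above can be checked mechanically during an audit. This is a minimal sketch under stated assumptions: the function name alt_text_layer and its labels are illustrative, and the boundary choice (exactly 15 words counts as the visual search layer) is an assumption, since the two ranges overlap at 15.

```shell
#!/bin/bash
# Hypothetical helper: classify alt text by word count against the
# 5-15 (accessibility) and 15-30 (visual search) ranges described above.
alt_text_layer() {
  local words
  # tr strips the padding some wc implementations add around the count
  words=$(printf '%s' "$1" | wc -w | tr -d '[:space:]')
  if   [ "$words" -lt 5 ];  then echo "too-short"
  elif [ "$words" -lt 15 ]; then echo "accessibility"
  elif [ "$words" -le 30 ]; then echo "visual-search"
  else echo "too-long"
  fi
}
```

Word count says nothing about quality; it only flags alt text that is too thin to carry entities and modifiers, or long enough to suggest keyword stuffing.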

4.2 File Naming Convention

The bubbles-hosted standard for client image storage:

# At /var/www/sites/[domain]/assets/images/
# Convention: [category]-[product-or-subject]-[modifier]-[index].avif

# Eureka Bath Works examples
clawfoot-tub-victorian-brass-feet-front-01.avif
clawfoot-tub-victorian-brass-feet-detail-02.avif
clawfoot-tub-victorian-brass-feet-installed-03.avif

# Heritage Hardwood Floors examples
white-oak-flooring-wide-plank-rustic-grade-01.avif
white-oak-flooring-installed-living-room-02.avif

# Local Living Real Estate examples
listing-huntsville-ar-3br-exterior-front-01.avif
listing-huntsville-ar-3br-kitchen-01.avif

The filename is itself a ranking signal. Google Lens and Pinterest Lens both inspect filenames during entity extraction. Use lowercase, hyphens (in filenames only, not as sentence punctuation), no spaces, descriptive nouns plus modifiers, sequential indexing.
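
The convention can be expressed as a single pattern. The sketch below is one possible reading of it, not the official check: the name is_conventional_image_name is hypothetical, and it assumes at least two hyphen-separated words before a two-digit index and an .avif extension, as in the examples above.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) when a filename follows
# [category]-[subject]-[modifier]-[index].avif, lowercase with hyphens.
# Assumption: two or more descriptive words, then a two-digit index.
is_conventional_image_name() {
  local LC_ALL=C   # keep [a-z] from matching uppercase in some locales
  local name re='^[a-z0-9]+(-[a-z0-9]+)+-[0-9]{2}\.avif$'
  name="$(basename "$1")"
  [[ "$name" =~ $re ]]
}
```

Usage: is_conventional_image_name "$file" || echo "rename: $file" inside a find loop yields a rename queue.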

4.3 The Image Audit Loop

For each client site, run this audit before optimization:

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/audit-images.sh
# Usage: audit-images.sh <domain>

if [ -z "$1" ]; then
  echo "Usage: $0 <domain>" >&2
  exit 1
fi

SITE_ROOT="/var/www/sites/$1"
REPORT="/tmp/visual-search-audit-$1-$(date +%Y%m%d).txt"

echo "Visual search image audit for $1" > "$REPORT"
echo "Generated: $(date)" >> "$REPORT"
echo "" >> "$REPORT"

# Count by format (case-insensitive, so .JPG and .jpeg are included)
echo "Image format distribution:" >> "$REPORT"
find "$SITE_ROOT/assets/images" -type f \( -iname "*.avif" -o -iname "*.webp" -o -iname "*.jpg" -o -iname "*.jpeg" -o -iname "*.png" \) | \
  awk -F. '{print tolower($NF)}' | sort | uniq -c | sort -rn >> "$REPORT"

# Flag img tags with no alt attribute.
# -o isolates each tag, so an alt= elsewhere on the same line cannot hide it.
# Tags that span several lines are still missed by this line-based check.
echo "" >> "$REPORT"
echo "HTML img tags without alt attribute:" >> "$REPORT"
grep -rnoiE '<img[^>]*>' "$SITE_ROOT" --include="*.html" | grep -viE '[[:space:]]alt=' | head -50 >> "$REPORT"

# Flag images with empty alt (valid for purely decorative images; review each hit)
echo "" >> "$REPORT"
echo "HTML img tags with empty alt:" >> "$REPORT"
grep -rnE 'alt=""' "$SITE_ROOT" --include="*.html" | head -50 >> "$REPORT"

# Flag generic filenames
echo "" >> "$REPORT"
echo "Generic filenames (image1, IMG_, photo, etc):" >> "$REPORT"
find "$SITE_ROOT/assets/images" -type f | grep -iE '(image[0-9]|IMG_|photo[0-9]|DSC_|untitled)' | head -50 >> "$REPORT"

cat "$REPORT"

Run this on every client during onboarding. The output drives a remediation queue.


5. Schema for Visual Search

Schema is the strongest signal visual search engines have for "this image means X." Five schema types matter most.

5.1 ImageObject (Universal)

ImageObject is the foundation. Even when the image is wrapped in Product, Recipe, or HowTo, populate ImageObject fully.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "@id": "https://eurekabathworks.com/products/clawfoot-tub-victorian-brass#image-front",
  "url": "https://eurekabathworks.com/assets/images/clawfoot-tub-victorian-brass-feet-front-01.avif",
  "contentUrl": "https://eurekabathworks.com/assets/images/clawfoot-tub-victorian-brass-feet-front-01.avif",
  "thumbnailUrl": "https://eurekabathworks.com/assets/images/clawfoot-tub-victorian-brass-feet-front-01-thumb.avif",
  "width": "2400",
  "height": "1600",
  "caption": "Victorian clawfoot bathtub with brass ball-and-claw feet, front view, Eureka Bath Works showroom",
  "description": "Cast iron clawfoot bathtub finished in white porcelain enamel, supported by polished brass ball-and-claw feet. Approximately 60 inches in length, 1890s Victorian style, restored and inspected, available for installation in Northwest Arkansas.",
  "creditText": "Photography by Eureka Bath Works",
  "copyrightNotice": "Copyright 2026 Eureka Bath Works LLC",
  "creator": {
    "@type": "Organization",
    "name": "Eureka Bath Works"
  },
  "license": "https://eurekabathworks.com/legal/image-license",
  "acquireLicensePage": "https://eurekabathworks.com/contact",
  "representativeOfPage": true
}
</script>

Notes on the fields:

5.2 Product Schema With Image Array

For e-commerce or product-style pages, Product schema with a multi-image array is the highest-leverage pattern.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Victorian Clawfoot Bathtub with Brass Ball-and-Claw Feet",
  "sku": "EBW-CLAW-VBR-60",
  "brand": {"@type": "Brand", "name": "Eureka Bath Works"},
  "image": [
    {"@type": "ImageObject", "url": "https://eurekabathworks.com/assets/images/clawfoot-tub-victorian-brass-feet-front-01.avif", "caption": "Front view, full tub with brass feet", "width": "2400", "height": "1600"},
    {"@type": "ImageObject", "url": "https://eurekabathworks.com/assets/images/clawfoot-tub-victorian-brass-feet-detail-02.avif", "caption": "Detail of brass ball-and-claw foot", "width": "2400", "height": "1600"},
    {"@type": "ImageObject", "url": "https://eurekabathworks.com/assets/images/clawfoot-tub-victorian-brass-feet-installed-03.avif", "caption": "Tub installed in restored Victorian bathroom", "width": "2400", "height": "1600"}
  ],
  "offers": {"@type": "Offer", "priceCurrency": "USD", "price": "2400.00", "availability": "https://schema.org/InStock"}
}
</script>

Amazon expects 7 images minimum. Pinterest favors one hero pin but indexes the array under Product Rich Pins. Google Lens Shopping uses the first image as the match candidate and surfaces the rest in a carousel. Recommendation: 3 images minimum, with a target of 5 to 7. Order: hero shot first (clean white background or lifestyle), then detail, then context, then variant.
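
The image-count recommendation can be spot-checked against a rendered page. The sketch below is a rough text match, not a JSON-LD parser: the function name product_image_count_status and its labels are hypothetical, and it assumes images are marked up as ImageObject entries as in the example above (plain URL strings in the image array are not counted).

```shell
#!/bin/bash
# Hypothetical helper: count "@type": "ImageObject" occurrences in a file
# and compare against the 3-minimum / 5-to-7-target guidance.
# Caveat: this counts every ImageObject in the file, including any that
# are declared outside the Product block.
product_image_count_status() {
  local html_file="$1" count
  count=$(grep -oE '"@type"[[:space:]]*:[[:space:]]*"ImageObject"' "$html_file" \
    | wc -l | tr -d '[:space:]')
  if   [ "$count" -lt 3 ]; then echo "$count below-minimum"
  elif [ "$count" -lt 5 ]; then echo "$count acceptable"
  else echo "$count on-target"
  fi
}
```

Run it across product templates to find pages that ship with only one or two marked-up images.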

5.3 Recipe and HowTo Schema (Per-Step Images)

Both Recipe and HowTo schemas accept per-step images. Recipe still drives rich result rendering; HowTo lost SERP rich results in Aug 2023, but the schema is still parsed by AI engines (Gemini and ChatGPT reference HowTo-marked content when answering image-upload "how do I do this" queries).

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Recipe",
  "name": "Smoked Pork Shoulder, Ozark Style",
  "image": ["https://example.com/recipe-hero.avif", "https://example.com/recipe-finished.avif"],
  "recipeInstructions": [
    {"@type": "HowToStep", "text": "Trim the pork shoulder to a quarter-inch fat cap.", "image": "https://example.com/recipe-step-1-trim.avif"},
    {"@type": "HowToStep", "text": "Apply rub generously to all surfaces.", "image": "https://example.com/recipe-step-2-rub.avif"}
  ]
}
</script>

The HowTo pattern is identical: swap @type to HowTo and recipeInstructions to step. For full Recipe rich result eligibility, populate cookTime, prepTime, recipeYield, recipeIngredient, and nutrition (Google Search Central Recipe docs, updated Oct 2025).

5.4 VisualArtwork Schema

For art, design, and creative-portfolio sites (Trevel Young Photography uses this):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "VisualArtwork",
  "name": "Untitled Number 7",
  "creator": {"@type": "Person", "name": "Jane Doe"},
  "dateCreated": "2025-08-12",
  "artMedium": "Oil on canvas",
  "artworkSurface": "Stretched canvas, 24x36 inches",
  "image": "https://example.com/artwork-untitled-7.avif",
  "width": "61 cm", "height": "91 cm"
}
</script>

The VisualArtwork type signals "fine art, not product" and routes through different visual search clusters than Product schema.


6. Google Lens Optimization

Google Lens is the dominant visual search surface, with 14 billion monthly queries, four major entry points, and the most sophisticated entity-extraction pipeline of any visual search engine.

6.1 Entry Points

Circle to Search launched on Pixel 8 in early 2024, rolled out to most Android flagships through 2025, and is now the single fastest-growing visual search entry point on the platform.

6.2 What Lens Looks For

Lens combines image embedding similarity with entity extraction. Both inputs matter.

6.3 Practical Optimization Pattern

For each product or primary image:

  1. Generate the image at 2400 wide minimum, AVIF format, sharp focus, accurate color, single subject prominent.
  2. Save with descriptive hyphenated filename (no spaces, no underscores, no IMG_).
  3. Write alt text in the 15 to 30 word range with entities and modifiers.
  4. Wrap the page in Product or relevant schema with full ImageObject array.
  5. List the page and its images in the image sitemap.
  6. Submit the product in Merchant Center (if e-commerce).
  7. Verify the page indexes via URL Inspection in GSC.
  8. Test recognition via Google Lens app: take a photo of the actual item or screenshot, see if your product appears.

The Lens test in step 8 is the single most useful diagnostic. If your own product photo, taken on a phone in real-world lighting, does not surface your product page in the top three Lens results, something is wrong upstream.
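
For step 5, an image sitemap entry pairs a page URL with the images it hosts, using Google's image sitemap extension (the image:image and image:loc elements). The sketch below generates one such entry; the function names are hypothetical, and the surrounding urlset element with its two namespace declarations (the sitemaps.org 0.9 schema and the Google sitemap-image 1.1 schema) still has to wrap the output.

```shell
#!/bin/bash
# Escape the characters that are unsafe inside XML text nodes.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Hypothetical helper: emit one <url> entry for a page and its images.
# Usage: image_sitemap_entry PAGE_URL IMAGE_URL [IMAGE_URL...]
image_sitemap_entry() {
  local page_url="$1" img
  shift
  printf '  <url>\n    <loc>%s</loc>\n' "$(xml_escape "$page_url")"
  for img in "$@"; do
    printf '    <image:image><image:loc>%s</image:loc></image:image>\n' \
      "$(xml_escape "$img")"
  done
  printf '  </url>\n'
}
```

Each product page gets one entry listing every image in its schema array, so the sitemap and the markup stay in sync.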

6.4 Shopping Versus Identification Versus Knowledge Results

Lens returns three distinct result types depending on classified intent:

Most queries return a mix. Optimization for Lens means optimizing for all three buckets simultaneously.


7. Pinterest Lens and Rich Pins

Pinterest is the second-largest visual search surface and the most aesthetics-driven. Pinterest Lens (600 million monthly queries) is the camera-based search; Rich Pins are the structured-data layer that powers it.

7.1 Account Setup Checklist

  1. Pinterest Business account (required for Analytics, advertising, Rich Pins eligibility).
  2. Website verification: add <meta name="p:domain_verify" content="abc123..."> to site head, configured via Pinterest Business > Settings > Claim website.
  3. Profile complete: profile photo (logo or owner), display name with brand, bio with keywords, link to canonical site, location.
  4. Pinterest tag installed site-wide for conversion tracking (PageVisit, AddToCart, Checkout, Lead, Signup, Custom events).
  5. Rich Pins validation via developers.pinterest.com/tools/url-debugger/ after Open Graph and schema deployed (Article, Product, Recipe types).

7.2 Pin Image Specifications

Pinterest is the only major visual search surface where vertical aspect ratio dramatically outperforms horizontal. 2:3 is canonical (1000 x 1500 or 1200 x 1800). 1:2.1 is the maximum height before pins are truncated in feed. Square (1:1) and horizontal (16:9 or 4:3) underperform.

Text overlay: key headline or benefit, top or bottom third, high-contrast sans-serif readable at thumbnail size. Branding: subtle logo or watermark in corner builds brand recognition on saved pins. Formats accepted: JPEG, PNG, WebP, animated GIF (AVIF not supported by Pinterest uploader as of Mar 2026; generate JPEG variant for Pinterest, keep AVIF for site). File size max 20 MB; practical target 200 to 500 KB.
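
The aspect-ratio rules above reduce to integer comparisons on pixel dimensions, which makes them easy to run over a folder of pin candidates. A minimal sketch, with a hypothetical function name and labels; width and height are assumed to be supplied by whatever tool reads the image header.

```shell
#!/bin/bash
# Hypothetical helper: classify pin dimensions against the 2:3 canonical
# ratio and the 1:2.1 truncation limit, using integer math only.
# Usage: pin_aspect_status WIDTH HEIGHT
pin_aspect_status() {
  local w="$1" h="$2"
  if   [ $((h * 10)) -gt $((w * 21)) ]; then echo "too-tall"   # beyond 1:2.1
  elif [ $((h * 2)) -eq $((w * 3)) ];   then echo "canonical-2x3"
  elif [ "$h" -gt "$w" ];               then echo "vertical"
  else echo "square-or-horizontal"
  fi
}
```

Anything reported as square-or-horizontal or too-tall is a candidate for a re-crop before the JPEG variant is generated.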

7.3 Rich Pins Setup

Rich Pins auto-pull metadata from your site when a user pins from your domain. Four types exist, of which three still matter:

Article Rich Pins: pull title, author, description from Open Graph.

<meta property="og:type" content="article">
<meta property="og:title" content="How to Install a Clawfoot Bathtub: The Complete Guide">
<meta property="og:description" content="Step-by-step guide to installing a Victorian clawfoot bathtub including plumbing rough-in, leveling, and finishing.">
<meta property="article:author" content="Eureka Bath Works">
<meta property="article:published_time" content="2026-03-15T10:00:00Z">

Product Rich Pins: pull product name, price, availability, description from Product schema or Open Graph product tags.

<meta property="og:type" content="product">
<meta property="og:title" content="Victorian Clawfoot Bathtub with Brass Feet">
<meta property="product:price:amount" content="2400.00">
<meta property="product:price:currency" content="USD">
<meta property="product:availability" content="in stock">

Recipe Rich Pins: pull name, cook time, ingredients count, servings from Recipe schema.

App Rich Pins: deprecated as of Mar 2024. Skip.

7.4 Pinterest Pin Metadata

Pin title: 60 to 100 characters, front-load keyword then benefit. Example: Victorian Clawfoot Bathtub | Restored Cast Iron with Brass Feet.

Pin description: 200 to 500 characters, descriptive paragraph with 2 to 3 natural keyword variants. Example: Restored Victorian clawfoot bathtub with polished brass ball-and-claw feet. Cast iron with white porcelain finish, 60 inches. Available for installation throughout Northwest Arkansas. Shop the full restored bathtub collection at Eureka Bath Works.
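
The title and description length targets can be linted before pins are scheduled. This is a minimal sketch with a hypothetical function name; it counts characters as bash sees them in the current locale, which is assumed to match how the pin text is actually encoded.

```shell
#!/bin/bash
# Hypothetical helper: check pin text length against the targets above
# (title 60-100 characters, description 200-500 characters).
# Usage: pin_text_length_status title|description "TEXT"
pin_text_length_status() {
  local kind="$1" text="$2" min max len
  case "$kind" in
    title)       min=60;  max=100 ;;
    description) min=200; max=500 ;;
    *) echo "unknown-kind"; return 2 ;;
  esac
  len=${#text}
  if   [ "$len" -lt "$min" ]; then echo "$len too-short"
  elif [ "$len" -gt "$max" ]; then echo "$len too-long"
  else echo "$len ok"
  fi
}
```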

Hashtags: 2 to 3, branded plus category. Example: #clawfoottub #victorianbathroom #eurekabathworks.

Link: target a specific product or article page, never homepage. UTM tagging: ?utm_source=pinterest&utm_medium=social&utm_campaign={pin_name}.
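
Building the tagged link by hand invites typos, so a small generator helps. The sketch below applies the UTM pattern above; the function name pin_link is hypothetical, and lowercasing and hyphenating the pin name into a campaign slug is an assumption about how campaigns should be named.

```shell
#!/bin/bash
# Hypothetical helper: append the Pinterest UTM parameters to a target URL.
# Usage: pin_link TARGET_URL "Pin name"
pin_link() {
  local url="$1" pin_name="$2" slug sep='?'
  # Slugify the pin name: lowercase, non-alphanumerics collapsed to hyphens
  slug=$(printf '%s' "$pin_name" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')
  # Use & when the URL already carries a query string
  case "$url" in *\?*) sep='&' ;; esac
  printf '%s%sutm_source=pinterest&utm_medium=social&utm_campaign=%s\n' \
    "$url" "$sep" "$slug"
}
```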

Board assignment: pin to topically relevant board; board context improves discoverability.

7.5 Pinterest Lens Optimization

Pinterest Lens (camera-based search) overlaps heavily with Pin optimization. Key additions:

  1. Visual distinctiveness wins; avoid stock photography and generic compositions.
  2. Publishing 5 to 10 pins per product across angles, contexts, and styles increases Lens match probability.
  3. Lens is trained heavily on lifestyle context: a bathtub in a styled bathroom outperforms an isolated tub on white.
  4. Pin to evergreen, well-organized topical boards; board context surfaces alongside the pin in Lens results.
  5. Upload a Pinterest product feed (the same Merchant Center compatible format) so that Product Pins become eligible as Lens Shopping results.


8. Amazon Visual Search

Amazon Visual Search (280 million monthly queries) operates inside the Amazon app and amazon.com. It is closed-ecosystem, meaning optimization happens entirely inside the Amazon product listing surface, not on your own domain. Relevance depends on whether the client sells on Amazon.

8.1 When Amazon Visual Search Matters

High relevance: physical products sold via Amazon Seller Central, 1P vendor accounts, Brand Registry enrolled brands. Medium relevance: brands sold through 3P resellers. Low or irrelevant: services, local-only businesses, DTC brands that intentionally avoid Amazon.

For Joseph's client portfolio: TCB Fight Factory possibly relevant (gear), Eureka Bath Works probably irrelevant (large items, not Amazon channel), Heritage Hardwood Floors irrelevant, Handled Tax irrelevant, Local Living irrelevant, ARCW irrelevant.

8.2 Amazon Product Image Standards

Amazon's spec is stricter than any other visual search surface. Listings that fail spec are suppressed from Visual Search results.

Main image: pure white background (RGB 255,255,255), product fills 85 percent of frame minimum, no text/logos/watermarks/props, no accessories shown unless apparel-on-model, JPEG or PNG, minimum 1000 px on longest side (recommended 2000), sharp focus, accurate color.

Secondary images (up to 8): different angles, lifestyle shots, detail close-ups, size comparison, in-use demonstrations, infographics with feature callouts, packaging shots.

Video: listings with video outperform image-only by 15 to 30 percent click conversion (Amazon internal data shared at AdCon 2024). Recommended length 30 to 60 seconds.

A+ Content: enhanced brand content for Brand Registry sellers; module-based layout with images plus structured text.

8.3 StyleSnap (Fashion-Specific)

StyleSnap is Amazon's fashion visual search. Users upload an outfit photo, StyleSnap returns matching products on Amazon. Scope: apparel, footwear, accessories, jewelry.

Optimization: high-quality on-model imagery for primary listing, multiple angles (front, side, back), fabric and hardware detail shots, accurate color names in title and bullets, style descriptors (boho, minimalist, athletic) in title. Attribute completeness drives match rate: color (primary plus secondary), pattern (solid, striped, floral), material composition, style category, size availability, season or collection.

8.4 Listing Versus Storefront

Amazon Visual Search ranks individual ASINs, not brand storefronts. Optimize at the listing level. For deep Amazon SEO (A9 algorithm, advertising, inventory signals) see framework-ecommerceseo.md.


9. Apple Visual Look Up

Apple Visual Look Up is the iOS-integrated visual search, available across Photos, Safari, Messages, and Quick Look. Supported categories expanded in iOS 18 to include plants, animals, landmarks, art, books, products, and pets. Apple does not disclose query volume, but third-party estimates suggest 350 million monthly active users (Counterpoint research, Q1 2026, sample N=1,800 iOS users).

9.1 How Apple Visual Look Up Works

Trigger points: tap info icon on a photo in Photos, long-press or tap detected subject in Safari, subject detection in Messages, Quick Look on Mac.

Recognition pipeline: on-device CoreML identifies entity category, server-side query to Apple Knowledge Graph, returns info card with Wikipedia summary plus related results.

Supported categories (2026): animals (cats, dogs, birds, insects), plants and flowers, landmarks and buildings, art and books, products (clothing, electronics, vehicles), statues and sculptures, pet breed identification.

9.2 What You Can Influence

Apple's Knowledge Graph is curated. Two practical levers exist.

Lever 1: Apple Business Connect. For local businesses, Apple Business Connect (Apple's GBP equivalent) accepts imagery that surfaces in Apple Maps and indirectly in Visual Look Up for businesses with distinctive storefronts or signage. Primary photo: storefront or hero shot. Up to 10 additional photos covering storefront exterior, interior atmosphere, products or services in use, team or staff, signage and branding. Image specs: 4:3 recommended, minimum 1024 x 768, JPEG or PNG, max 10 MB.

Lever 2: Wikidata and Wikipedia presence. Apple's Knowledge Graph is heavily seeded by Wikidata and Wikipedia. Brands with Wikipedia articles and Wikidata entries appear in Visual Look Up cards. See framework-aicitations.md for the entity establishment via Wikidata pattern.

Note from the Wikidata Q-IDs context: previous attempts to seed Wikipedia and Wikidata for some of Joseph's network properties (Joseph Anady Q139592630, MEGAMIND Q139592633) were speedy-deleted as non-notable. Real external press citations are the prerequisite. Do not re-attempt without that foundation.

9.3 Product Category Optimization

Apple Visual Look Up matches products against retailer listings (Apple's curated set, smaller than Lens Shopping). Required signals: Product schema with full image array on a public page, strong canonical URL, brand entity recognized in a knowledge graph, distinctive product imagery. Apple-specific helpers: Apple Business Connect listing for local retailers, Apple Maps presence (which depends on Apple Business Connect), iCloud Shared Albums (not directly optimizable but a relevant signal flow).


10. AI Engine Image Understanding

The fastest-growing visual search surface is not a dedicated visual search engine. It is AI engines accepting image uploads as part of a conversational query. ChatGPT, Gemini, Claude, and Perplexity each accept image uploads, and combined query volume is estimated at 2.5 billion per month (aggregated from OpenAI, Google, Anthropic, Perplexity public disclosures Q1 2026).

10.1 The Four AI Vision Engines

Engine | Models | Upload modes | Primary use cases
ChatGPT | GPT-4V, GPT-4o, GPT-5 with vision (rolling out 2026) | Web UI drag-drop, mobile camera, advanced voice mode live camera | "What is this", solve math/code/diagram, translate text, describe scene, where to buy
Gemini | Gemini 2.5 Pro, Gemini 3 Pro (Q1 2026) | Web UI, mobile camera, Gemini Live (Q1 2026), Circle to Search on Android | Live camera reasoning, multi-image comparison, document understanding, VQA
Claude | Claude 4 Opus, Claude 4 Sonnet | Web UI, API image inputs | Document analysis, accessibility descriptions, chart interpretation, code from screenshot
Perplexity | GPT-4V plus Perplexity retrieval | Web UI, mobile camera | "What is this with sources", shopping with cited retailers, identification with citation trail

10.2 How AI Engines Extract Entities From Images

The pipeline inside each AI engine, simplified:

  1. Vision encoder: image converted to embedding tokens.
  2. Entity extraction: model identifies recognizable entities (objects, text, brands, people, places).
  3. Retrieval: for engines with retrieval (Perplexity, ChatGPT with search, Gemini with search), the entities become text queries.
  4. Generation: model synthesizes response, optionally with citations.

The retrieval step is where image SEO meets AI engine optimization. If a user uploads a photo of a clawfoot tub and asks "where can I buy this in Arkansas," and the AI engine identifies the entity as "clawfoot bathtub," it then runs a text retrieval query like [clawfoot bathtub for sale Arkansas]. The retrieval surfaces pages that the AI engine's retrieval layer can find.

Optimization implication: ranking well for the text-equivalent of likely image queries is the single highest-leverage optimization for AI engine image search. The image-recognition layer is largely outside your control; the retrieval layer is exactly the same as text SEO.
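
Since the retrieval layer runs on text, it helps to write down the text-equivalent queries for each priority entity and track rankings for them. The sketch below generates candidates; only the first pattern comes from the example above, and the other two are assumed variants, as is the function name.

```shell
#!/bin/bash
# Hypothetical helper: list text queries an AI engine might run after
# identifying an entity in an uploaded photo.
# Usage: text_equivalent_queries "ENTITY" "GEOGRAPHY"
text_equivalent_queries() {
  local entity="$1" geo="$2"
  printf '%s for sale %s\n' "$entity" "$geo"        # pattern from the example above
  printf 'where to buy %s in %s\n' "$entity" "$geo" # assumed variant
  printf '%s near %s\n' "$entity" "$geo"            # assumed variant
}
```

Feed the output into whatever rank tracker the client already uses; the list doubles as the question set for the manual engine tests.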

10.3 What This Means For On-Page Optimization

10.4 Testing AI Engine Image Understanding

A quick diagnostic suite for any client:

# Manual testing protocol
# For each top-priority product or service category:

# Test 1: ChatGPT
# - Open chatgpt.com
# - Upload representative phone photo of product
# - Ask "what is this and where can I buy it in [client geography]"
# - Note whether client domain appears in citations
# - Note whether client image surfaces inline

# Test 2: Perplexity
# - Open perplexity.ai
# - Upload same image
# - Ask same question
# - Note citation list and inline images

# Test 3: Gemini
# - Open gemini.google.com or app
# - Upload same image
# - Ask same question
# - Note retrieval citations

# Test 4: Claude
# - Open claude.ai
# - Upload same image
# - Ask "what is this and describe it in detail"
# - Note accuracy of identification

Document results in /var/www/sites/[domain]/docs/visual-search-baseline.md per client. Re-test quarterly.
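
Quarterly re-tests are only comparable if results are recorded the same way each time. The sketch below appends one row per engine test to the baseline file; the function name and the column layout are assumptions, not an established format for visual-search-baseline.md.

```shell
#!/bin/bash
# Hypothetical helper: append one manual test result to the baseline file,
# writing a table header first when the file is missing or empty.
# Usage: log_ai_image_test FILE ENGINE DOMAIN_CITED(yes|no) IMAGE_INLINE(yes|no)
log_ai_image_test() {
  local baseline_file="$1" engine="$2" domain_cited="$3" image_inline="$4"
  if [ ! -s "$baseline_file" ]; then
    printf '| Date | Engine | Domain cited | Image inline |\n|---|---|---|---|\n' \
      > "$baseline_file"
  fi
  printf '| %s | %s | %s | %s |\n' \
    "$(date +%Y-%m-%d)" "$engine" "$domain_cited" "$image_inline" \
    >> "$baseline_file"
}
```

Usage: log_ai_image_test /var/www/sites/[domain]/docs/visual-search-baseline.md chatgpt yes no after each of the four tests.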


11. Visual Citation in AI Results

New in 2026: AI engines surface inline images in responses with source attribution. Perplexity led this in Q4 2025; ChatGPT followed in Q1 2026; Gemini and Claude added similar functionality through 2026.

11.1 How Inline Image Citation Works

Trigger: user asks a question where visual reference helps. Examples: "what does a clawfoot tub look like," "show me different types of hardwood flooring," "what's the difference between these two coffee makers."

Selection rules: image must be on a page the AI is citing for text; image alt text and caption inform selection; image must be embed-friendly (no aggressive CORS, no anti-hotlinking); image must be high-quality (low-resolution images deprioritized).

Attribution: image displayed with source link, click-through to source page, trackable via referrer in nginx access logs.
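
The selection rules above can be turned into a pre-flight check per image. This is a sketch only: the ImageCandidate fields are hypothetical audit inputs, and the 1200-pixel width threshold is an assumption chosen for illustration, since no engine publishes its cutoffs.

```python
from dataclasses import dataclass


@dataclass
class ImageCandidate:
    # All fields are hypothetical audit inputs gathered by your own crawler.
    page_is_cited: bool    # page already appears as a text citation
    alt_text: str
    caption: str
    hotlink_blocked: bool  # anti-hotlinking or restrictive CORS in place
    width_px: int


MIN_WIDTH_PX = 1200  # assumed threshold, not a documented engine cutoff


def citation_blockers(img: ImageCandidate) -> list[str]:
    """Return the reasons an image is unlikely to be cited inline."""
    blockers = []
    if not img.page_is_cited:
        blockers.append("page not cited for text")
    if not img.alt_text.strip() and not img.caption.strip():
        blockers.append("no alt text or caption")
    if img.hotlink_blocked:
        blockers.append("not embed-friendly")
    if img.width_px < MIN_WIDTH_PX:
        blockers.append("low resolution")
    return blockers
```

An empty list means none of the four rules is violated; it does not guarantee citation.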

11.2 Optimizing For Inline Citation

11.3 Measuring Inline Image Citations

Currently uninstrumented in GA4, but trackable via server logs:

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/count-ai-image-referrals.sh
# Counts image requests whose log line mentions an AI engine.
# Assumes the nginx "combined" log format, where field 7 is the request path.
# The grep matches the whole line, so AI crawler user agents are counted too.

ACCESS_LOG="/var/log/nginx/access.log"
DOMAIN="$1"

if [ -z "$DOMAIN" ]; then
  echo "Usage: $0 <domain>"
  exit 1
fi

# Reads the current log plus every rotated file; the actual window
# depends on logrotate retention, not on a fixed 30 days.
echo "AI image referrals to $DOMAIN, all retained logs"
echo ""

# Image asset hits with AI engine referrers
echo "Image requests with AI referrers:"
zcat -f "$ACCESS_LOG"* | \
  grep -E "/assets/images/.*\.(avif|webp|jpe?g|png)" | \
  grep -iE "perplexity|chatgpt|openai|gemini|claude|anthropic|bard" | \
  wc -l

echo ""
echo "Top referring AI engines:"
zcat -f "$ACCESS_LOG"* | \
  grep -E "/assets/images/.*\.(avif|webp|jpe?g|png)" | \
  grep -ioE "perplexity\.ai|chat\.openai\.com|chatgpt\.com|gemini\.google\.com|claude\.ai|bard\.google\.com" | \
  sort | uniq -c | sort -rn

echo ""
echo "Top images cited:"
zcat -f "$ACCESS_LOG"* | \
  grep -E "/assets/images/.*\.(avif|webp|jpe?g|png)" | \
  grep -iE "perplexity|chatgpt|openai|gemini|claude" | \
  awk '{print $7}' | sort | uniq -c | sort -rn | head -20

Run this script weekly per client. The output forms the visual citation baseline.
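
For a stricter count that looks only at the Referer field rather than the whole log line, the same idea can be sketched in Python. This assumes the nginx "combined" log format; the hostname list and the function names are illustrative assumptions, not a published list of AI engine referrers.

```python
import re
from collections import Counter
from typing import Optional
from urllib.parse import urlparse

# nginx "combined": ... "METHOD path proto" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)
IMAGE_RE = re.compile(r"\.(avif|webp|jpe?g|png)(\?|$)", re.I)

# Hostname -> engine label. The list is an assumption; extend as needed.
AI_HOSTS = {
    "perplexity.ai": "perplexity",
    "chatgpt.com": "chatgpt",
    "chat.openai.com": "chatgpt",
    "gemini.google.com": "gemini",
    "claude.ai": "claude",
}


def engine_for(referer: str) -> Optional[str]:
    """Map a Referer URL to an engine label, or None if it is not an AI host."""
    host = (urlparse(referer).hostname or "").lower()
    for known, engine in AI_HOSTS.items():
        if host == known or host.endswith("." + known):
            return engine
    return None


def count_ai_image_referrals(lines) -> Counter:
    """Count image requests per AI engine, judged by the Referer field only."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m or not IMAGE_RE.search(m.group("path")):
            continue
        engine = engine_for(m.group("referer"))
        if engine:
            counts[engine] += 1
    return counts
```

Matching on the parsed hostname keeps crawler fetches (which carry the engine name in the user agent, not the referrer) out of the citation count.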


12. Video Thumbnail Optimization

Video thumbnails overlap heavily with visual search. A YouTube thumbnail is itself a visual search target, surfaces in Google image search, and is parsed by Pinterest Lens when video pins are saved.

12.1 YouTube Thumbnail Standards

Dimensions: 1280 x 720 minimum, 1920 x 1080 recommended, 16:9 ratio. Format: JPEG, GIF, PNG, BMP accepted (JPEG with optimized file size in practice). File size max 2 MB.

Design pattern: face prominent (faces drive higher CTR per YouTube creator research 2024, sample N=2.4M videos), high-contrast color, bold readable text (3 to 5 words maximum), brand consistency across channel, authentic content representation (clickbait-mismatch is penalized).

12.2 Video Schema Thumbnail

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "How to Install a Clawfoot Bathtub",
  "description": "Complete installation guide for a Victorian clawfoot bathtub.",
  "thumbnailUrl": [
    "https://example.com/video-thumb-1x1.avif",
    "https://example.com/video-thumb-4x3.avif",
    "https://example.com/video-thumb-16x9.avif"
  ],
  "uploadDate": "2026-03-15",
  "duration": "PT12M30S",
  "contentUrl": "https://example.com/videos/install-clawfoot.mp4",
  "embedUrl": "https://www.youtube.com/embed/abc123"
}
</script>

The triple thumbnailUrl pattern (1x1, 4x3, 16x9) maximizes eligibility across SERPs. Google selects the best ratio for the surface.
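
When many videos need this markup, the block can be generated rather than hand-written. The sketch below assumes thumbnails follow the <base>-<ratio>.avif naming used in the example above; the function name and its parameters are hypothetical.

```python
import json


def video_object_jsonld(name, description, thumb_base, upload_date,
                        duration, content_url, embed_url):
    """Build VideoObject JSON-LD with 1x1, 4x3, and 16x9 thumbnails.

    thumb_base is a hypothetical URL prefix; files are assumed to be
    named <thumb_base>-1x1.avif, -4x3.avif, and -16x9.avif.
    """
    ratios = ["1x1", "4x3", "16x9"]
    data = {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "description": description,
        "thumbnailUrl": [f"{thumb_base}-{r}.avif" for r in ratios],
        "uploadDate": upload_date,
        "duration": duration,
        "contentUrl": content_url,
        "embedUrl": embed_url,
    }
    body = json.dumps(data, indent=2)
    # Escape "</" so a string value cannot close the script element early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Generating the array from one base URL keeps the three ratios in sync when a thumbnail is replaced.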

12.3 Open Graph Video Thumbnail

<meta property="og:video" content="https://example.com/videos/install-clawfoot.mp4">
<meta property="og:video:type" content="video/mp4">
<meta property="og:video:width" content="1920">
<meta property="og:video:height" content="1080">
<meta property="og:image" content="https://example.com/video-thumb-1920x1080.jpg">
<meta property="og:image:width" content="1920">
<meta property="og:image:height" content="1080">

og:image acts as the video thumbnail when shared on social platforms.

12.4 Twitter Card Video Thumbnail

<meta name="twitter:card" content="player">
<meta name="twitter:title" content="How to Install a Clawfoot Bathtub">
<meta name="twitter:description" content="Complete installation guide.">
<meta name="twitter:image" content="https://example.com/video-thumb-1200x675.jpg">
<meta name="twitter:player" content="https://www.youtube.com/embed/abc123">
<meta name="twitter:player:width" content="1280">
<meta name="twitter:player:height" content="720">

For deeper video optimization including transcripts, chapter markers, and VideoObject schema breadth, see framework-videoseo.md.


13. Measurement and Attribution

Visual search is largely uninstrumented in standard analytics. There is no "Google Lens" channel in GA4. There is no Pinterest Lens referrer header that consistently identifies the visual entry point. Measurement is therefore a proxy game.

13.1 What Can Be Measured

13.2 The View-Through Visual Attribution Gap

The largest gap: users see your brand in a Lens or Pinterest result, do not click, then later search for your brand directly. Proxy metrics to detect view-through:

13.3 The Visual Search Dashboard

Minimum dashboard per client, refreshed monthly, stored as YAML at /var/www/sites/[domain]/reports/visual-search/YYYY-MM.yml:

visual_search_dashboard_monthly:
  gsc_image: {clicks_30d, impressions_30d, ctr_30d, top_5_pages, top_5_queries}
  pinterest: {outbound_clicks_30d, impressions_30d, saves_30d, top_5_pins}
  ai_referrals: {total_30d, by_engine: {chatgpt, perplexity, gemini, claude}}
  ai_citation_check: {chatgpt_pass, perplexity_pass, gemini_pass}
  proxy_signals: {branded_search_volume, direct_traffic_30d}

Diff month-over-month to detect trends.
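
A minimal sketch of that month-over-month diff, assuming both months have already been loaded into nested dicts with numeric values (parsing the YAML files needs a third-party library such as PyYAML and is left out here). The function names are hypothetical.

```python
def flatten(metrics: dict, prefix: str = "") -> dict:
    """Flatten nested dashboard sections into dotted keys."""
    flat = {}
    for key, value in metrics.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def month_over_month(previous: dict, current: dict) -> dict:
    """Return percent change per numeric metric present in both months."""
    prev, curr = flatten(previous), flatten(current)
    changes = {}
    for name, new in curr.items():
        old = prev.get(name)
        if isinstance(new, bool) or isinstance(old, bool):
            continue  # pass/fail checks are not percent-comparable
        if not isinstance(old, (int, float)) or not isinstance(new, (int, float)):
            continue  # skip lists, strings, and metrics missing last month
        if old == 0:
            continue  # percent change is undefined from a zero baseline
        changes[name] = round((new - old) / old * 100, 1)
    return changes
```

Metrics that start from zero or are pass/fail flags are skipped on purpose; report those as raw values instead of percentages.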


14. Bubbles-Hosted Visual Search Optimization Toolchain

The image processing and audit toolchain runs entirely on bubbles (Debian, 169.155.162.118), with output written to /var/www/sites/[domain]/ per client. No third-party CDN or proxy is in the loop.

14.1 Toolchain Components

bubbles_visual_search_toolchain:

  libvips:
    purpose: "Fast image processing for AVIF, WebP generation, resizing, color management"
    install: "apt-get install libvips-tools"
    binary: "vips, vipsthumbnail"
    speed: "approximately 4-8x faster than ImageMagick for batch operations"

  exiftool:
    purpose: "EXIF metadata read, write, strip"
    install: "apt-get install libimage-exiftool-perl"
    binary: "exiftool"

  python3:
    purpose: "Audit scripts that parse HTML, validate alt text, cross-check schema"
    install: "Already installed system-wide; the audit scripts use only the standard library"

  bash_glue:
    purpose: "Orchestration scripts for bulk image processing pipelines"
    location: "/var/www/sites/[domain]/scripts/visual-search/"

  nginx_serve:
    purpose: "Direct serving of optimized images from /var/www/sites/[domain]/assets/images/"
    config: "/etc/nginx/sites-available/[domain]"
    headers: "Cache-Control immutable for hashed assets, max-age 31536000"

14.2 Bulk AVIF Generation Script

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/generate-avif.sh
# Bulk convert JPEG and PNG to AVIF for visual search optimization.

set -e

SITE="$1"
SOURCE_DIR="/var/www/sites/$SITE/assets/images-source"
TARGET_DIR="/var/www/sites/$SITE/assets/images"

if [ -z "$SITE" ]; then
  echo "Usage: $0 <site-name>"
  exit 1
fi

mkdir -p "$TARGET_DIR"

# Process JPEGs and PNGs to AVIF
find "$SOURCE_DIR" -type f \( -name "*.jpg" -o -name "*.jpeg" -o -name "*.png" \) | while read -r SRC; do
  REL=$(realpath --relative-to="$SOURCE_DIR" "$SRC")
  BASE="${REL%.*}"
  TARGET="$TARGET_DIR/$BASE.avif"

  if [ -f "$TARGET" ] && [ "$TARGET" -nt "$SRC" ]; then
    echo "SKIP $REL (target newer)"
    continue
  fi

  mkdir -p "$(dirname "$TARGET")"

  vips copy "$SRC" "$TARGET[Q=55,effort=6]"

  echo "DONE $REL -> $BASE.avif"
done

# Generate WebP fallback for clients still serving WebP
find "$SOURCE_DIR" -type f \( -name "*.jpg" -o -name "*.jpeg" -o -name "*.png" \) | while read -r SRC; do
  REL=$(realpath --relative-to="$SOURCE_DIR" "$SRC")
  BASE="${REL%.*}"
  TARGET="$TARGET_DIR/$BASE.webp"

  if [ -f "$TARGET" ] && [ "$TARGET" -nt "$SRC" ]; then
    continue
  fi

  mkdir -p "$(dirname "$TARGET")"
  vips copy "$SRC" "$TARGET[Q=80]"
done

echo ""
echo "Bulk AVIF and WebP generation complete for $SITE"

14.3 Thumbnail Generation Script

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/generate-thumbs.sh
# Generate multiple thumbnail sizes for srcset and visual search surfaces.

SITE="$1"
SOURCE_DIR="/var/www/sites/$SITE/assets/images"

if [ -z "$SITE" ]; then
  echo "Usage: $0 <site-name>"
  exit 1
fi

# Pinterest needs 2:3 vertical, others need standard ratios
SIZES=(400 800 1200 1920)
PINTEREST_HEIGHT=1500

# Skip previously generated width variants (name-400w.avif) so reruns do not
# produce name-400w-400w.avif. Reading find output line by line keeps paths
# that contain spaces intact.
find "$SOURCE_DIR" -maxdepth 4 -type f -name "*.avif" -not -regex '.*-[0-9]+w\.avif' | while IFS= read -r SRC; do
  BASE="${SRC%.avif}"

  for SIZE in "${SIZES[@]}"; do
    TARGET="${BASE}-${SIZE}w.avif"
    [ -f "$TARGET" ] && continue
    vipsthumbnail "$SRC" --size "${SIZE}x" --output "$TARGET[Q=55]"
  done

  # Pinterest 2:3 vertical crop
  PIN_TARGET="${BASE}-pinterest.jpg"
  if [ ! -f "$PIN_TARGET" ]; then
    vipsthumbnail "$SRC" --size "1000x${PINTEREST_HEIGHT}" --smartcrop=attention --output "$PIN_TARGET[Q=85]"
  fi
done

echo "Thumbnails generated for $SITE"
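
To see how much of a source image a 2:3 Pinterest crop discards, the geometry can be worked out ahead of time. The sketch below computes a plain centered crop box; it is not the attention-based smartcrop that vipsthumbnail applies, only the arithmetic behind the target ratio. The function name is hypothetical.

```python
def center_crop_box(width: int, height: int, ratio_w: int = 2, ratio_h: int = 3):
    """Return (left, top, crop_width, crop_height) for the largest centered
    crop with the given aspect ratio. A plain center crop, not smartcrop."""
    if width * ratio_h > height * ratio_w:
        # Source is too wide: keep full height, trim the sides.
        crop_h = height
        crop_w = height * ratio_w // ratio_h
    else:
        # Source is too tall (or already exact): keep full width.
        crop_w = width
        crop_h = width * ratio_h // ratio_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, crop_w, crop_h


if __name__ == "__main__":
    # A 1920x1080 landscape frame keeps only a 720x1080 vertical strip.
    print(center_crop_box(1920, 1080))
```

A landscape source loses well over half its width in a 2:3 crop, which is why product shots intended for Pinterest are better photographed vertically than cropped after the fact.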

14.4 Metadata Audit and EXIF Cleanup

ExifTool handles both the metadata audit and the EXIF strip-and-rewrite. The audit script counts images carrying copyright, description, and GPS tags (GPS is a privacy flag, especially for real estate listings and event photos).

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/audit-metadata.sh
SITE="$1"
IMG_DIR="/var/www/sites/$SITE/assets/images"
REPORT="/var/www/sites/$SITE/reports/visual-search/metadata-audit-$(date +%Y%m%d).txt"
mkdir -p "$(dirname "$REPORT")"
{
  echo "Metadata audit for $SITE ($(date))"
  TOTAL=$(find "$IMG_DIR" -type f \( -name "*.avif" -o -name "*.webp" -o -name "*.jpg" \) | wc -l)
  echo "Total images: $TOTAL"
  echo "With copyright: $(exiftool -if '$Copyright' -p '$FileName' -r "$IMG_DIR" 2>/dev/null | wc -l)"
  echo "With description: $(exiftool -if '$ImageDescription or $Description' -p '$FileName' -r "$IMG_DIR" 2>/dev/null | wc -l)"
  echo "With GPS (privacy review): $(exiftool -if '$GPSLatitude' -p '$FileName' -r "$IMG_DIR" 2>/dev/null | wc -l)"
} > "$REPORT"
cat "$REPORT"

The cleanup script strips all EXIF (removing GPS and camera serial numbers) then re-injects the IPTC copyright and credit fields that visual search engines parse.

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/clean-exif.sh
SITE="$1"
IMG_DIR="/var/www/sites/$SITE/assets/images"
find "$IMG_DIR" -type f \( -name "*.jpg" -o -name "*.jpeg" \) | while read -r IMG; do
  exiftool -overwrite_original -all= "$IMG" 2>/dev/null
  exiftool -overwrite_original \
    -Copyright="Copyright 2026 $SITE" \
    -CopyrightNotice="Copyright 2026 $SITE LLC" \
    -Credit="$SITE" "$IMG" 2>/dev/null
done

14.5 Alt Text and Image Sitemap Audits

Two Python audits run on the regular cron schedule. The alt text audit flags missing, empty, or short (fewer than five words) alt attributes across the rendered HTML.

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/audit-alt-text.sh
SITE="$1"
SITE_DIR="/var/www/sites/$SITE"
REPORT="$SITE_DIR/reports/visual-search/alt-audit-$(date +%Y%m%d).txt"
mkdir -p "$(dirname "$REPORT")"
python3 - "$SITE_DIR" "$SITE" > "$REPORT" <<'PY'
import re, sys
from pathlib import Path
site_dir = Path(sys.argv[1]); site = sys.argv[2]
img_re = re.compile(r'<img[^>]*>', re.I)
alt_re = re.compile(r'alt\s*=\s*"([^"]*)"', re.I)
src_re = re.compile(r'src\s*=\s*"([^"]*)"', re.I)
counts = {"total":0,"missing":0,"empty":0,"short":0,"ok":0}
issues = []
for f in site_dir.rglob("*.html"):
    if "node_modules" in str(f) or ".next" in str(f): continue
    try: c = f.read_text(encoding="utf-8", errors="ignore")
    except: continue
    for tag in img_re.findall(c):
        counts["total"] += 1
        a = alt_re.search(tag); s = src_re.search(tag)
        src = s.group(1) if s else "no-src"
        if not a:
            counts["missing"] += 1
            issues.append(f"MISSING {f.relative_to(site_dir)} src={src}")
        else:
            alt = a.group(1).strip()
            if not alt: counts["empty"] += 1; issues.append(f"EMPTY {f.relative_to(site_dir)} src={src}")
            elif len(alt.split()) < 5: counts["short"] += 1; issues.append(f"SHORT {f.relative_to(site_dir)} alt='{alt}'")
            else: counts["ok"] += 1
print(f"Alt audit for {site}:", counts)
for i in issues[:100]: print(i)
PY
cat "$REPORT"
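
The regex approach above only sees double-quoted attributes. As an alternative sketch, the same MISSING, EMPTY, SHORT, and OK classification can be done with the standard library HTML parser, which also handles single-quoted and unquoted attributes. The class and function names are hypothetical; the five-word threshold mirrors the audit script.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Collect (status, src) for every <img> tag in a document."""

    def __init__(self, min_words: int = 5):
        super().__init__()
        self.min_words = min_words
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)  # attribute names arrive lowercased
        src = attr.get("src") or "no-src"
        if "alt" not in attr:
            status = "MISSING"
        else:
            alt = (attr["alt"] or "").strip()
            if not alt:
                status = "EMPTY"
            elif len(alt.split()) < self.min_words:
                status = "SHORT"
            else:
                status = "OK"
        self.results.append((status, src))


def audit_html(html: str):
    """Return a list of (status, src) tuples for one HTML document."""
    parser = AltAudit()
    parser.feed(html)
    parser.close()
    return parser.results
```

Because attributes are parsed by name, a data-alt attribute is not mistaken for alt, which the regex version would do.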

The sitemap generator walks rendered HTML and emits sitemap-images.xml to the site root for submission to Search Console.

#!/bin/bash
# /var/www/sites/[domain]/scripts/visual-search/generate-image-sitemap.sh
SITE="$1"; DOMAIN="$2"
SITE_DIR="/var/www/sites/$SITE"
python3 - "$SITE_DIR" "$DOMAIN" > "$SITE_DIR/sitemap-images.xml" <<'PY'
import re, sys
from html import unescape
from pathlib import Path
from xml.sax.saxutils import escape
site_dir = Path(sys.argv[1]); domain = sys.argv[2]
img_re = re.compile(r'<img[^>]*>', re.I)
src_re = re.compile(r'src\s*=\s*"([^"]*)"', re.I)
alt_re = re.compile(r'alt\s*=\s*"([^"]*)"', re.I)
print('<?xml version="1.0" encoding="UTF-8"?>')
print('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"')
print('        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">')
for f in sorted(site_dir.rglob("*.html")):
    if "node_modules" in str(f) or ".next" in str(f): continue
    rel = f.relative_to(site_dir)
    path = "/" + str(rel).replace("index.html", "")
    if path.endswith("/") and len(path) > 1: path = path[:-1]
    try: c = f.read_text(encoding="utf-8", errors="ignore")
    except OSError: continue
    imgs = []
    for tag in img_re.findall(c):
        sm = src_re.search(tag); am = alt_re.search(tag)
        if not sm: continue
        # HTML attribute values may carry entities such as &amp;; decode first
        src = unescape(sm.group(1))
        if src.startswith("/"): src = f"https://{domain}{src}"
        elif not src.startswith("http"): continue
        imgs.append((src, unescape(am.group(1)) if am else ""))
    if not imgs: continue
    print(f"  <url><loc>https://{domain}{escape(path)}</loc>")
    for src, alt in imgs:
        # escape() handles &, <, and > so URLs with query strings stay valid XML
        print(f"    <image:image><image:loc>{escape(src)}</image:loc><image:caption>{escape(alt)}</image:caption></image:image>")
    print("  </url>")
print('</urlset>')
PY
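
String-built XML is easy to break with a single unescaped character. As an alternative sketch, the sitemap can be assembled with xml.etree.ElementTree, which escapes text automatically. The input shape (page URL mapped to a list of image URLs) and the function name are assumptions for illustration, and the sketch emits only image:loc entries.

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages: dict) -> str:
    """Build an image sitemap from {page_url: [image_url, ...]}.

    The input mapping is a hypothetical shape; ElementTree escapes
    &, <, and > in element text when serializing.
    """
    ET.register_namespace("", SM_NS)
    ET.register_namespace("image", IMG_NS)
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    for page_url, image_urls in sorted(pages.items()):
        url = ET.SubElement(urlset, f"{{{SM_NS}}}url")
        ET.SubElement(url, f"{{{SM_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMG_NS}}}image")
            ET.SubElement(image, f"{{{IMG_NS}}}loc").text = image_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Parsing the output back with ET.fromstring before writing it to disk is a cheap way to confirm the file is well-formed.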

14.6 Cron Schedule

The recommended cron schedule for the visual search toolchain:

# /etc/cron.d/visual-search-toolchain

# Bulk AVIF regeneration nightly per client
0 2 * * * user /var/www/sites/eurekabathworks.com/scripts/visual-search/generate-avif.sh eurekabathworks.com
0 3 * * * user /var/www/sites/heritagehardwoodfloors.com/scripts/visual-search/generate-avif.sh heritagehardwoodfloors.com

# Sitemap regeneration weekly
0 4 * * 0 user /var/www/sites/eurekabathworks.com/scripts/visual-search/generate-image-sitemap.sh eurekabathworks.com eurekabathworks.com

# AI referral counting weekly
0 5 * * 1 user /var/www/sites/eurekabathworks.com/scripts/visual-search/count-ai-image-referrals.sh eurekabathworks.com >> /var/www/sites/eurekabathworks.com/reports/visual-search/ai-referrals.log

# Alt text audit monthly
0 6 1 * * user /var/www/sites/eurekabathworks.com/scripts/visual-search/audit-alt-text.sh eurekabathworks.com

Adjust paths per client. Each client has its own copy of the scripts under /var/www/sites/[domain]/scripts/visual-search/.

14.7 Nginx Configuration For Image Delivery

The nginx site configuration must allow AI engine user agents and set cacheable headers on image assets.

# /etc/nginx/sites-available/[domain]
server {
    listen 443 ssl http2;
    server_name [domain];
    root /var/www/sites/[domain];

    # Image assets cached aggressively
    location ~* \.(avif|webp|jpg|jpeg|png|gif|svg)$ {
        expires 1y;
        add_header Cache-Control "public, immutable";
        add_header X-Content-Type-Options "nosniff";
        # No Referer blocking; AI engines must be able to load
        # No hotlink protection; image citation depends on embed access
    }

    # Image sitemap
    location = /sitemap-images.xml {
        add_header Cache-Control "public, max-age=3600";
        try_files $uri =404;
    }

    # User-Agent allow list verified for visual search bots
    # GoogleOther, OAI-SearchBot, PerplexityBot, ChatGPT-User
    # all served normally; no User-Agent blocking
}

14.8 Toolchain Maintenance Notes

libvips updates rarely; pin it with apt-mark hold if reproducibility is needed. ExifTool updates frequently to support new camera formats and is safe to keep at latest. The Python scripts depend only on the standard library (no pip requirements file). All bash scripts target bash, not zsh or POSIX sh. All paths are absolute because cron runs with a minimal environment. Logs go to /var/www/sites/[domain]/reports/visual-search/ (not /tmp) so they survive restarts.


End of Framework
