Pinecone Vector Search

The platform vectorizes all managed site data into Pinecone, enabling semantic search and AI-powered queries across the entire site portfolio.


What's Indexed

Each managed site is represented as a single vector built from the following data:

Data              Example
Site name         "ScentLok Retail"
URL               "https://scentlok.com"
Platform          "WordPress"
Server            "WP Engine"
Organization      "Nexus Outdoors"
Service tier      "Premium"
E-commerce flag   Yes/No
SSL status        Valid/Invalid
Google indexed    Yes/No
Debug mode        On/Off
Plugin count      34

This data is combined into a text summary and converted to a 768-dimensional vector using Google's Gemini embedding model.
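
The summarize-then-embed step can be sketched as follows. This is a minimal illustration, not the platform's actual code: the field names, the summary layout, and both function names are assumptions, and the embedding call is passed in as a plain function so the sketch does not depend on any particular Gemini client.

```python
from typing import Callable, Sequence

EMBEDDING_DIMENSIONS = 768  # dimension stated in this document


def build_site_summary(site: dict) -> str:
    """Flatten one site's indexed fields into a single line of text.

    The keys used here are hypothetical; the real schema may differ.
    """
    def yes_no(flag: bool) -> str:
        return "Yes" if flag else "No"

    parts = [
        f"Site name: {site['name']}",
        f"URL: {site['url']}",
        f"Platform: {site['platform']}",
        f"Server: {site['server']}",
        f"Organization: {site['organization']}",
        f"Service tier: {site['service_tier']}",
        f"E-commerce: {yes_no(site['ecommerce'])}",
        f"SSL status: {'Valid' if site['ssl_valid'] else 'Invalid'}",
        f"Google indexed: {yes_no(site['google_indexed'])}",
        f"Debug mode: {'On' if site['debug_mode'] else 'Off'}",
        f"Plugin count: {site['plugin_count']}",
    ]
    return "; ".join(parts)


def embed_summary(summary: str, embed: Callable[[str], Sequence[float]]) -> list:
    """Run the caller-supplied embedding function and check the vector size."""
    vector = list(embed(summary))
    if len(vector) != EMBEDDING_DIMENSIONS:
        raise ValueError(f"expected {EMBEDDING_DIMENSIONS} values, got {len(vector)}")
    return vector


# Illustrative values only; the boolean fields are made up for the example.
example_site = {
    "name": "ScentLok Retail",
    "url": "https://scentlok.com",
    "platform": "WordPress",
    "server": "WP Engine",
    "organization": "Nexus Outdoors",
    "service_tier": "Premium",
    "ecommerce": True,
    "ssl_valid": True,
    "google_indexed": True,
    "debug_mode": False,
    "plugin_count": 34,
}
```

In practice the `embed` argument would wrap the Gemini embedding request; checking the length up front catches a model or configuration mismatch before anything is written to the index.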


Auto-Sync Behavior

When Does It Sync?

Vectors update automatically whenever a site's data changes:

  1. Cron cycle — Every 5 minutes, site-updates.php processes 15 sites
  2. Site updated — The suma_site_updated hook fires
  3. Pinecone auto-sync — Captures the hook and re-vectorizes the site
  4. Vector upserted — Updated vector replaces the old one in Pinecone
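
The four steps above can be modeled with a small in-memory sketch. Only the hook name suma_site_updated and the batch size of 15 come from this page; the hook registry, the placeholder embedding, and the dictionary standing in for Pinecone are assumptions made for illustration.

```python
from collections import defaultdict

_hooks = defaultdict(list)  # hook name -> list of callbacks
vector_store = {}           # stand-in for the Pinecone namespace


def add_action(name, callback):
    """Register a callback for a named hook."""
    _hooks[name].append(callback)


def do_action(name, site):
    """Fire a hook, calling every registered callback with the site."""
    for callback in _hooks[name]:
        callback(site)


def fake_embed(text):
    """Placeholder embedding: NOT the Gemini model, just deterministic numbers."""
    return [float(len(text)), float(sum(map(ord, text)) % 997)]


def pinecone_auto_sync(site):
    """Step 3 and 4: re-vectorize the site and upsert (insert or replace)."""
    summary = f"{site['name']} {site['platform']}"
    vector_store[site["id"]] = fake_embed(summary)


add_action("suma_site_updated", pinecone_auto_sync)


def run_cron_cycle(queue, batch_size=15):
    """Step 1 and 2: process one batch, firing the hook for each site.

    Returns the sites still waiting in the queue.
    """
    batch, rest = queue[:batch_size], queue[batch_size:]
    for site in batch:
        do_action("suma_site_updated", site)
    return rest
```

Because the store is keyed by site ID, a later update for the same site overwrites its earlier vector rather than adding a second one, which is the behavior step 4 describes.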

Timing

  • After a site sync: vector updates within seconds
  • Full portfolio sync: 4–6 hours for every site to pass through the cron queue
  • A site's vector is always as fresh as its last sync
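
As a rough check on these figures: at 15 sites every 5 minutes the queue handles 180 sites per hour, so a 4–6 hour full cycle would correspond to roughly 720–1,080 sites if the queue always ran at that rate. The portfolio size is not stated here, so treat the following as a back-of-the-envelope sketch; the function name and its defaults are illustrative.

```python
import math


def full_cycle_minutes(site_count, batch_size=15, interval_minutes=5):
    """Estimate how long the cron queue needs to touch every site once."""
    cycles = math.ceil(site_count / batch_size)
    return cycles * interval_minutes


# 900 sites -> 60 cycles of 5 minutes each
minutes_for_900 = full_cycle_minutes(900)
```

The estimate ignores retries, slow sites, and skipped cycles, so real timings may be longer.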

Manual Sync

Sync All Sites

Triggers re-vectorization of every active site:

  1. Navigate to Suma Management → Pinecone Settings
  2. Click Sync All
  3. Wait for completion (processes sequentially — may take several minutes)
  4. Completion message shows count of synced vectors
Note: Syncing all sites makes API calls for each site (embedding generation plus a Pinecone upsert). With 50+ sites, this takes 5–10 minutes.
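
A sequential full sync of this kind might look like the sketch below. It assumes that a failure on one site is recorded and the loop continues with the next one; the page does not say whether the real Sync All behaves that way, and the names `sync_all` and `sync_one` are hypothetical.

```python
from typing import Callable, Iterable


def sync_all(sites: Iterable[dict], sync_one: Callable[[dict], None]) -> dict:
    """Sync sites one at a time and report how many succeeded.

    sync_one is expected to embed the site and upsert its vector,
    raising an exception if either call fails.
    """
    synced = 0
    failed = []
    for site in sites:
        try:
            sync_one(site)
            synced += 1
        except Exception as exc:  # keep going so one bad site does not stop the run
            failed.append((site["id"], str(exc)))
    return {"synced": synced, "failed": failed}
```

Processing sites one after another keeps the request rate to the embedding and vector APIs low, at the cost of a run time that grows linearly with the number of sites.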

Sync Single Site

Re-vectorize one specific site:

  1. Navigate to the site's detail page
  2. Click the Pinecone Sync button
  3. The page reports success or failure immediately

Viewing Stats

The Pinecone Settings page displays:

Stat            Description
Total Vectors   Number of site vectors in the namespace
Namespace       manage-rhinogroup
Dimensions      768
Connection      Whether Pinecone is reachable

Testing Connection

  1. Navigate to Pinecone Settings
  2. Click Test Connection
  3. The system verifies:
    • API key is valid
    • Index is accessible
    • Namespace exists
    • Can read stats

Success: Shows vector count and index info.
Failure: Shows error message (invalid key, network issue, etc.)
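
The checks on index, namespace, and stats can be illustrated with a function that inspects an index-stats response. The dictionary shape used here (a `dimension` field and a `namespaces` map with a `vector_count` per namespace) is an assumption modeled on the kind of data Pinecone's index stats report; the function name and return format are hypothetical.

```python
def check_connection(stats: dict,
                     namespace: str = "manage-rhinogroup",
                     expected_dimension: int = 768) -> dict:
    """Interpret an index-stats response and list anything that looks wrong."""
    problems = []
    namespaces = stats.get("namespaces", {})

    if stats.get("dimension") != expected_dimension:
        problems.append(
            f"index dimension is {stats.get('dimension')}, expected {expected_dimension}"
        )
    if namespace not in namespaces:
        problems.append(f"namespace '{namespace}' not found")

    vector_count = namespaces.get(namespace, {}).get("vector_count", 0)
    return {"ok": not problems, "vector_count": vector_count, "problems": problems}
```

An invalid API key or a network failure would surface earlier, as an exception from the stats request itself, before a function like this is reached.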


Understanding Embeddings

What Is a Vector Embedding?

A vector embedding is a numerical representation of text that captures its meaning. Similar concepts have similar vectors, enabling "semantic search" — finding things by meaning rather than exact keywords.

How It Works Here

Site Data (text) → Gemini Embedding API → 768 numbers → Pinecone Storage

Query (text) → Gemini Embedding API → 768 numbers → Similarity Search → Most Similar Sites
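
The similarity search step can be illustrated with cosine similarity on toy vectors. This is a sketch only: the three-dimensional vectors are invented stand-ins for real 768-dimensional embeddings, and cosine is assumed as the metric because this page does not say which one the index uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, vectors, top_k=2):
    """Return the top_k (site_id, score) pairs, best match first."""
    scored = [(site_id, cosine_similarity(query, vec)) for site_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Invented toy embeddings: axis 0 ~ "hunting", axis 1 ~ "e-commerce", axis 2 ~ "documentation"
toy_vectors = {
    "scentlok": [0.9, 0.8, 0.1],
    "blocker-outdoors": [0.8, 0.9, 0.2],
    "rhino-docs": [0.1, 0.0, 0.9],
}
query_vector = [1.0, 1.0, 0.0]  # stands in for the embedded query text

top_matches = most_similar(query_vector, toy_vectors)
```

With these toy numbers the two hunting and e-commerce sites rank above the documentation site, because their vectors point in nearly the same direction as the query.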

Practical Example

Searching for "hunting apparel with e-commerce" would find:

  • ScentLok (hunting clothing, WooCommerce)
  • Blocker Outdoors (hunting gear, BigCommerce)
  • Not: Rhino Group Documentation (no e-commerce, no hunting)

Metadata & Filtering

Each vector stores metadata that enables filtered searches:

Filter                          Use Case
platform = "WordPress"          Only search WordPress sites
ecommerce = true                Only e-commerce sites
organization = "GSM Outdoors"   Only one client's sites
ssl = false                     Find sites with SSL issues
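
Filters like these are typically sent to Pinecone as a dictionary of field conditions using its `$eq` operator, with several fields combined as an implicit AND. The sketch below builds such a dictionary and evaluates it locally against a metadata record; the helper names are hypothetical, and the metadata keys are taken from the table above.

```python
def build_filter(**conditions) -> dict:
    """Turn keyword arguments into an equality filter, one condition per field."""
    return {field: {"$eq": value} for field, value in conditions.items()}


def matches(metadata: dict, metadata_filter: dict) -> bool:
    """Local check that mirrors how an all-equality filter would be applied."""
    return all(
        metadata.get(field) == condition["$eq"]
        for field, condition in metadata_filter.items()
    )


wordpress_shops = build_filter(platform="WordPress", ecommerce=True)
```

A filter narrows the candidate set before similarity ranking, so a query such as "hunting apparel" combined with `wordpress_shops` would only ever return WordPress e-commerce sites.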

Who Uses This Data?

AI Assistant (Burt)

The osTicket AI assistant queries Pinecone to find relevant site context when answering questions about managed sites.

Future Search Features

The vector index enables planned features like:

  • Natural language site search ("find BigCommerce sites on WP Engine")
  • AI-powered recommendations
  • Automated site categorization

Settings

Setting      Default             Description
API Key      (none)              Pinecone API key from dashboard
Host         (none)              Index host URL
Namespace    manage-rhinogroup   Isolation namespace
Auto-Sync    Enabled             Sync on every site update
Gemini Key   (none)              For embedding generation (falls back to Suma Gemini key)
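
The defaults and the Gemini key fallback can be expressed as a small settings object. This is a sketch under assumptions: the class, its attribute names, and the error raised when no key is available are all hypothetical; only the default values and the fallback rule come from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PineconeSettings:
    api_key: str
    host: str
    namespace: str = "manage-rhinogroup"  # default from the settings table
    auto_sync: bool = True                # Auto-Sync is enabled by default
    gemini_key: Optional[str] = None      # optional dedicated key for embeddings

    def resolve_gemini_key(self, suma_gemini_key: Optional[str]) -> str:
        """Prefer the dedicated key, otherwise fall back to the Suma Gemini key."""
        key = self.gemini_key or suma_gemini_key
        if not key:
            raise ValueError("no Gemini API key configured")
        return key
```

Resolving the key in one place means the embedding code never needs to know which of the two settings supplied it.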

Troubleshooting

"Connection Failed" on Test

  • Verify API key is correct (copy from Pinecone dashboard)
  • Check Host URL includes full domain (e.g., rhino-tickets-xxx.svc.aped-xxx.pinecone.io)
  • Ensure network allows outbound HTTPS to Pinecone

Vectors Not Updating

  • Check Auto-Sync is enabled in settings
  • Verify the suma_site_updated hook is firing (check debug.log)
  • Try manual sync for one site to isolate the issue
  • Check Gemini API key hasn't expired

Vector Count Doesn't Match Site Count

  • Archived sites have their vectors deleted
  • New sites need one sync cycle before vectorization
  • Development sites are included (toggle in settings)
  • Check for failed syncs in debug.log
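
When the counts disagree, comparing site IDs against vector IDs shows which of the causes above applies. The sketch below assumes each site record has `id`, `status`, and `is_development` fields and that vector IDs equal site IDs; none of that is confirmed by this page, and the function name is hypothetical.

```python
def explain_count_gap(sites, vector_ids, include_development=True):
    """Compare the sites that should have vectors with the vectors that exist."""
    expected = {
        site["id"]
        for site in sites
        if site["status"] != "archived"
        and (include_development or not site["is_development"])
    }
    existing = set(vector_ids)
    return {
        # sites still waiting for a sync cycle, or whose sync failed
        "missing_vectors": sorted(expected - existing),
        # vectors for archived or excluded sites that were not deleted
        "orphan_vectors": sorted(existing - expected),
    }
```

IDs listed under `missing_vectors` are good candidates for a manual single-site sync, followed by a look at debug.log if that sync fails.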