
The Technical Architecture Behind PickItBox's DINOv2 Biometric Engine

A deep dive into how we use Hugging Face, DINOv2, and Supabase pgvector to execute sub-second similarity searches across millions of pets.

Processing Images at Scale

Biometric pet identification differs entirely from standard classification tasks. We aren't trying to determine "is this a dog?"—we need to determine "is this *specifically* Max the Golden Retriever?"

The DINOv2 Pipeline

We use an adaptation of Meta's DINOv2, a self-supervised vision transformer. When an image arrives at our endpoints via the Hugging Face Inference API, the model strips away background noise and extracts a dense 768-dimensional vector embedding representing the unique topological features of the animal's face or nose.
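Each embedding is persisted alongside the pet it belongs to. Here is a minimal sketch of the storage table, assuming the `biometrics` table and column names used by our matching function below; the real schema carries additional metadata:

```sql
-- Hypothetical minimal schema for storing DINOv2 embeddings.
-- Requires the pgvector extension for the vector(768) column type.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE biometrics (
  id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  pet_id     uuid NOT NULL,
  embedding  vector(768) NOT NULL,  -- one DINOv2 embedding per capture
  created_at timestamptz NOT NULL DEFAULT now()
);
```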

Vector Search with pgvector

Storing millions of floating-point arrays is useless without fast retrieval. PickItBox runs PostgreSQL extended with pgvector, using HNSW (Hierarchical Navigable Small World) indexing to perform approximate nearest neighbor searches.
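A sketch of the index definition: since our matching queries use cosine distance (the `<=>` operator), the index must be built with `vector_cosine_ops`. The `m` and `ef_construction` values shown are pgvector's defaults, not our production tuning:

```sql
-- HNSW index for approximate nearest neighbor search over embeddings.
-- vector_cosine_ops pairs with the <=> (cosine distance) operator.
CREATE INDEX biometrics_embedding_hnsw
  ON biometrics
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
```

Higher `m` and `ef_construction` improve recall at the cost of build time and memory; the right trade-off depends on the corpus size and latency budget.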

-- Our core matching RPC function
CREATE OR REPLACE FUNCTION match_biometrics(
  query_embedding vector(768),
  match_threshold float,
  match_count int
)
RETURNS TABLE (
  pet_id uuid,
  similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    biometrics.pet_id,
    1 - (biometrics.embedding <=> query_embedding) AS similarity
  FROM biometrics
  WHERE 1 - (biometrics.embedding <=> query_embedding) > match_threshold
  ORDER BY biometrics.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;
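Calling the function from SQL looks like the following sketch. A real query passes a fresh 768-dimensional embedding from the inference pipeline; here we borrow a stored embedding so the example is self-contained, and the threshold is illustrative rather than our production value:

```sql
-- Illustrative invocation of the matching RPC.
SELECT *
FROM match_biometrics(
  (SELECT embedding FROM biometrics LIMIT 1),  -- stand-in for a freshly computed DINOv2 embedding
  0.85,  -- illustrative similarity threshold
  5      -- return at most five candidate matches
);
```

Because Supabase exposes Postgres functions as RPC endpoints, the same function can be invoked directly from a client SDK without additional backend code.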