
The Technical Architecture Behind PickItBox's DINOv2 Biometric Engine

A deep dive into how we use Hugging Face, DINOv2, and Supabase pgvector to execute sub-second similarity searches across millions of pets.

Processing Images at Scale

Biometric pet identification differs entirely from standard classification tasks. We aren't trying to determine "is this a dog?"—we need to determine "is this *specifically* Max the Golden Retriever?"

The DINOv2 Pipeline

We use an adaptation of Meta's DINOv2, a self-supervised vision transformer. When an image arrives at our endpoints via the Hugging Face Inference API, the model strips away background noise and extracts a dense 768-dimensional vector embedding representing the unique topological features of the animal's face or nose.
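Each embedding is persisted alongside the pet it belongs to. Here is a minimal sketch of the storage table, assuming the `biometrics` table and column names used by our matching function below; the real schema carries additional metadata:

```sql
-- Hypothetical minimal schema for storing DINOv2 embeddings.
-- Requires the pgvector extension for the vector(768) column type.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE biometrics (
  id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  pet_id     uuid NOT NULL,
  embedding  vector(768) NOT NULL,  -- one DINOv2 embedding per capture
  created_at timestamptz NOT NULL DEFAULT now()
);
```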

Vector Search with pgvector

Storing millions of floating-point arrays is useless without fast retrieval. PickItBox runs PostgreSQL extended with pgvector, using HNSW (Hierarchical Navigable Small World) indexing to perform approximate nearest neighbor searches.
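A sketch of the index definition: since our matching queries use cosine distance (the `<=>` operator), the index must be built with `vector_cosine_ops`. The `m` and `ef_construction` values shown are pgvector's defaults, not our production tuning:

```sql
-- HNSW index for approximate nearest neighbor search over embeddings.
-- vector_cosine_ops pairs with the <=> (cosine distance) operator.
CREATE INDEX biometrics_embedding_hnsw
  ON biometrics
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
```

Higher `m` and `ef_construction` improve recall at the cost of build time and memory; the right trade-off depends on the corpus size and latency budget.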

-- Our core matching RPC function
CREATE OR REPLACE FUNCTION match_biometrics(
  query_embedding vector(768),
  match_threshold float,
  match_count int
)
RETURNS TABLE (
  pet_id uuid,
  similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    biometrics.pet_id,
    1 - (biometrics.embedding <=> query_embedding) AS similarity
  FROM biometrics
  WHERE 1 - (biometrics.embedding <=> query_embedding) > match_threshold
  ORDER BY biometrics.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;
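Calling the function from SQL looks like the following sketch. A real query passes a fresh 768-dimensional embedding from the inference pipeline; here we borrow a stored embedding so the example is self-contained, and the threshold is illustrative rather than our production value:

```sql
-- Illustrative invocation of the matching RPC.
SELECT *
FROM match_biometrics(
  (SELECT embedding FROM biometrics LIMIT 1),  -- stand-in for a freshly computed DINOv2 embedding
  0.85,  -- illustrative similarity threshold
  5      -- return at most five candidate matches
);
```

Because Supabase exposes Postgres functions as RPC endpoints, the same function can be invoked directly from a client SDK without additional backend code.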