RAG & Semantic Search
Combine Django/DRF, a vector store (pgvector/FAISS/Weaviate), and an LLM to return sourced, context-aware answers.
DRF · Postgres+pgvector · OpenAI/Mistral · ROI: high (support, self-service) · Effort: medium+ · Complexity: medium
Reference architecture
Client → DRF /api/query → Guard (allowlist + rate limit) → Embed(q) → VectorDB (pgvector IVFFlat) kNN → top-K docs → Rerank (optional) → Prompt template (citations) → LLM → Answer + sources → Cache
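The "Prompt template (citations)" step of the pipeline above could look roughly like the sketch below. The function name `build_prompt`, the instruction wording, and the `max_chars` cut-off are illustrative assumptions, not part of the original design; the only idea taken from the pipeline is that retrieved documents are numbered so the LLM can cite them.

```python
from typing import List


def build_prompt(question: str, docs: List[dict], max_chars: int = 800) -> str:
    """Assemble a prompt whose context passages are numbered for citation.

    Each doc is expected to carry 'title', 'body' and optionally 'url'
    (hypothetical shape, mirroring the fields of the Document model).
    """
    blocks = []
    for i, doc in enumerate(docs, start=1):
        excerpt = doc["body"][:max_chars]  # keep each passage short
        source = doc.get("url") or "no url"
        blocks.append(f"[{i}] {doc['title']} ({source})\n{excerpt}")
    context = "\n\n".join(blocks)
    return (
        "Answer using only the numbered sources below. "
        "Cite them as [n]. If the sources are insufficient, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the passages lets the response handler map each `[n]` in the model output back to a source entry, which is what makes the "answers with sources" KPIs measurable.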
Quick Wins
Pitfalls & Anti-patterns
SQL — extension + index sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE ia_document (
    id SERIAL PRIMARY KEY,
    title TEXT, url TEXT, body TEXT,
    embedding vector(1536)
);
-- IVFFlat: tune `lists` to the data volume; build the index once the table holds data
CREATE INDEX ON ia_document USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
ANALYZE ia_document;
models.py — Document (pgvector.django) python
from django.db import models
from pgvector.django import VectorField


class Document(models.Model):
    title = models.CharField(max_length=300)
    url = models.URLField(blank=True, null=True)
    body = models.TextField()
    # Must match vector(1536) in the SQL schema; stays null until the indexing command runs
    embedding = VectorField(dimensions=1536, null=True)
    created_at = models.DateTimeField(auto_now_add=True)
KPIs to track
- kNN latency (p95)
- Recall@k / MRR
- Share of answers returned with sources
- CSAT / thumbs-up
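As an illustration of the retrieval KPIs above, here is a minimal sketch of Recall@k and MRR computed over a labelled evaluation set. The data shape (a list of `(retrieved_ids, relevant_ids)` pairs) and the function names are assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple


def recall_at_k(retrieved: List[int], relevant: Set[int], k: int) -> float:
    """Fraction of the relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: List[int], relevant: Set[int]) -> float:
    """1 / rank of the first relevant result, or 0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: List[Tuple[List[int], Set[int]]], k: int = 6) -> Dict[str, float]:
    """Average both metrics over all queries of the evaluation set."""
    if not runs:
        raise ValueError("evaluation set is empty")
    n = len(runs)
    return {
        "recall_at_k": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }
```

The default `k=6` mirrors the `LIMIT 6` used by the query endpoint, so the offline metric describes what the endpoint actually returns.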
management command — indexing python
from django.core.management.base import BaseCommand
from ia.models import Document


class Command(BaseCommand):
    help = 'Index/refresh embeddings'

    def handle(self, *args, **opts):
        # Process at most 1000 documents without an embedding per run
        for doc in Document.objects.filter(embedding__isnull=True)[:1000]:
            # text = chunker(doc.body)
            # text-embedding-3-large returns 3072 dimensions by default; request 1536 to match the column
            # emb = openai.embeddings.create(model='text-embedding-3-large', input=text, dimensions=1536).data[0].embedding
            emb = [0.0] * 1536  # placeholder
            doc.embedding = emb
            doc.save(update_fields=['embedding'])  # write only the changed column
views.py — endpoint /api/query python
from rest_framework.decorators import api_view
from rest_framework.response import Response
from django.db import connection


@api_view(['POST'])
def semantic_query(request):
    # Coerce to str so a non-string payload cannot break the slice
    q = str(request.data.get('q') or '').strip()[:500]
    if len(q) < 3:
        return Response({'error': 'query too short'}, status=400)
    # emb = client.embeddings.create(model='text-embedding-3-large', input=q, dimensions=1536).data[0].embedding
    emb = [0.0] * 1536  # placeholder (cosine distance to a zero vector is undefined)
    with connection.cursor() as cur:
        cur.execute(
            'SELECT id, title, url, body '
            'FROM ia_document '
            'WHERE embedding IS NOT NULL '
            # The Python list is sent as an array, so cast it to vector explicitly
            'ORDER BY embedding <=> %s::vector '
            'LIMIT 6',
            [emb],
        )
        rows = cur.fetchall()
    docs = [
        {'id': r[0], 'title': r[1], 'url': r[2], 'body': r[3][:800]}
        for r in rows
    ]
    answer = 'Answer (demo) based on the nearest documents.'
    # 'confidence' is a hard-coded demo value, not a computed score
    return Response({'answer': answer, 'sources': docs, 'confidence': 0.62})
KPIs to track
- % of answers with ≥2 sources
- Rate of legitimate refusals
- End-to-end latency (p95)
- Cost per 1k requests
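Three of these KPIs can be derived directly from per-request logs. The sketch below assumes each log record is a dict with hypothetical `n_sources`, `latency_ms` and `cost` fields; the refusal rate is left out because it needs human labels to decide which refusals were legitimate.

```python
import math
from typing import Dict, List


def p95(values: List[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def summarize(logs: List[dict]) -> Dict[str, float]:
    """Aggregate operational KPIs over a batch of request logs."""
    if not logs:
        raise ValueError("no logs")
    n = len(logs)
    return {
        "pct_with_2plus_sources": 100.0 * sum(1 for r in logs if r["n_sources"] >= 2) / n,
        "p95_latency_ms": p95([r["latency_ms"] for r in logs]),
        "cost_per_1k_requests": 1000.0 * sum(r["cost"] for r in logs) / n,
    }
```

Nearest-rank is only one of several percentile definitions; on small batches it can differ noticeably from interpolated variants.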
Next steps: add cross-encoder reranking, metadata filters, and offline evaluation (Recall@k, MRR) on a gold set.
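The reranking step could be wired in along these lines. This is a sketch under assumptions: `rerank` and `overlap_scores` are hypothetical names, and the scorer is passed in as a callable so the example runs without any model download. `overlap_scores` is a toy token-overlap stand-in, not a cross-encoder.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer receives (query, passage) pairs and returns one score per pair
ScoreFn = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, docs: List[dict], score_pairs: ScoreFn, top_n: int = 3) -> List[dict]:
    """Reorder retrieved docs by scorer output, highest first, and keep top_n."""
    pairs = [(query, d["body"]) for d in docs]
    scores = score_pairs(pairs)
    # sorted() is stable, so ties keep their original retrieval order
    ranked = sorted(zip(docs, scores), key=lambda item: item[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]


def overlap_scores(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Toy scorer: number of lowercase tokens shared by query and passage."""
    return [
        float(len(set(q.lower().split()) & set(text.lower().split())))
        for q, text in pairs
    ]
```

In a real deployment, the `predict` method of a sentence-transformers `CrossEncoder` instance accepts a list of text pairs and could be passed as `score_pairs`. That wiring is an assumption here and should be checked against the library version in use.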
