Stage 2: Re-ranking Methods
===========================

This section covers methods for the second stage of the RAG pipeline: precisely scoring the candidate documents retrieved in Stage 1.

.. toctree::
   :maxdepth: 2
   :caption: Stage 2 Topics:

   cross_encoders
   llm_rerankers
   rerankers_survey
   literature_survey/index

Overview
--------

Stage 2 re-ranking focuses on **precision over speed**. Since the candidate set is small (typically 10-1000 documents), we can afford more expensive computations to get highly accurate relevance scores.

Why Re-ranking is Needed
------------------------

**The Stage 1 Limitation**

Bi-encoders (Stage 1) encode query and document *independently*:

* No interaction between query and document tokens
* Can't perform complex reasoning about relevance
* Limited to similarity in embedding space

**The Stage 2 Solution**

Re-rankers encode query and document *jointly*:

* Full attention between all query-document token pairs
* Can perform complex relevance reasoning
* Much higher accuracy at the cost of speed

The Accuracy Gain
^^^^^^^^^^^^^^^^^

Typical improvements when adding Stage 2:

.. code-block:: text

   Dataset: MS MARCO Dev (1000 candidates from Stage 1)
   Bi-encoder only:  MRR@10 = 0.311
   + Cross-encoder:  MRR@10 = 0.389  (+25% improvement!)

   Dataset: Natural Questions
   Bi-encoder only:  Top-10 Accuracy = 0.68
   + Cross-encoder:  Top-10 Accuracy = 0.81  (+19% improvement!)

**Key Insight**: Re-ranking the top-100 candidates with a cross-encoder provides massive gains for just 100 extra forward passes (~5 seconds).

Architecture Types
------------------

Cross-Encoders
^^^^^^^^^^^^^^

**Most Common**: BERT-based cross-encoder

* Concatenates: ``[CLS] query [SEP] document [SEP]``
* Self-attention across all tokens
* Classification head predicts relevance
* Highest accuracy

**Variants:**

* MonoBERT: BERT cross-encoder for binary relevance classification
* MonoT5: T5 model generates a "true"/"false" token
* RankT5: T5 generates a relevance score directly
* RankLlama: Large language model fine-tuned for ranking

Poly-Encoders
^^^^^^^^^^^^^

**Middle Ground**: Faster than a cross-encoder, more accurate than a bi-encoder

* Document → multiple learned "codes" (e.g., 64 codes)
* Query attends to the codes
* Much faster than a cross-encoder (codes can be pre-computed)

LLM Re-rankers
^^^^^^^^^^^^^^

**Latest Trend**: Zero-shot re-ranking with instruction-tuned LLMs

* Prompt the LLM: "Is this passage relevant to this query?"
* No training needed
* Can provide explanations
* Expensive but highly effective
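To make the zero-shot idea concrete, here is a minimal pointwise sketch. The ``call_llm`` argument is a hypothetical placeholder for whatever chat-completion client you use, and the prompt wording is illustrative rather than the exact prompt used by RankGPT or RankLlama.

.. code-block:: python

   from typing import Callable, List, Tuple

   PROMPT = (
       "Query: {query}\n"
       "Passage: {passage}\n"
       "On a scale from 0 to 10, how relevant is the passage to the query? "
       "Answer with a single number."
   )

   def llm_rerank(
       query: str,
       candidates: List[str],
       call_llm: Callable[[str], str],  # hypothetical: your chat-completion client
       top_k: int = 10,
   ) -> List[Tuple[str, float]]:
       """Pointwise zero-shot re-ranking: grade each candidate independently."""
       scored = []
       for passage in candidates:
           answer = call_llm(PROMPT.format(query=query, passage=passage))
           try:
               score = float(answer.strip().split()[0])  # parse the graded relevance
           except (ValueError, IndexError):
               score = 0.0  # unparseable answer -> rank last
           scored.append((passage, score))
       # Highest graded relevance first; keep only the top_k passages
       return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

Listwise approaches such as RankGPT instead place several candidates in a single prompt and ask the model to output an ordering; the trade-offs are covered in :doc:`llm_rerankers`.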
Organization of This Section
----------------------------

**Cross-Encoders** (:doc:`cross_encoders`)

* Traditional BERT-based re-rankers
* MonoT5 and RankT5
* Training strategies
* Implementation guide

**LLM Re-rankers** (:doc:`llm_rerankers`)

* Zero-shot prompting approaches
* RankGPT, RankLlama
* Listwise vs. pointwise ranking
* Cost-performance trade-offs

When to Use Stage 2
-------------------

✅ **You Need Stage 2 When:**

* Top-10 accuracy is critical (user sees only the first page)
* False positives are costly (e.g., medical, legal)
* You can afford 1-10 seconds of latency
* Final answer quality >> speed

❌ **You Can Skip Stage 2 When:**

* Latency must be < 100 ms (real-time autocomplete)
* Top-100 recall is all that matters (no precision needed)
* Queries are very simple (BM25 or a bi-encoder is sufficient)
* Candidates from Stage 1 are already very precise

The Two-Stage Pipeline
----------------------

**Standard Configuration:**

.. code-block:: python

   from sentence_transformers import SentenceTransformer, CrossEncoder, util

   # Stage 1: fast retrieval (100-1000 candidates) with a bi-encoder
   bi_encoder = SentenceTransformer('BAAI/bge-base-en-v1.5')
   corpus_embeddings = bi_encoder.encode(corpus, convert_to_tensor=True)
   query_embedding = bi_encoder.encode(query, convert_to_tensor=True)
   hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=100)[0]
   candidates = [corpus[hit['corpus_id']] for hit in hits]

   # Stage 2: precise re-ranking of the candidates (keep the top 10)
   cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
   reranked = cross_encoder.rank(query, candidates, top_k=10, return_documents=True)

**Cost Analysis:**

* Stage 1: one ANN query over 10M docs ≈ 0.1 s (FAISS; sub-linear, does not score every document)
* Stage 2: 100 docs × 0.05 s = 5 s (cross-encoder)
* **Total: ~5 s** (vs. ~140 hours if the cross-encoder scored the full corpus!)

Next Steps
----------

* See :doc:`cross_encoders` for traditional BERT-based re-rankers
* See :doc:`llm_rerankers` for modern LLM-based approaches
* See :doc:`rerankers_survey` for a comprehensive survey of 22+ re-ranking methods
* See :doc:`../stage1_retrieval/late_interaction` for ColBERT (which can replace both stages)
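As a closing sanity check on the cost analysis above, the sketch below reproduces its arithmetic. The Stage 1 query time and per-document cross-encoder time are the illustrative figures from that section, not measurements.

.. code-block:: python

   # Back-of-envelope figures from the cost analysis above (assumed, not measured)
   CORPUS_SIZE = 10_000_000        # documents in the corpus
   CANDIDATES = 100                # Stage 1 candidates passed to Stage 2
   STAGE1_SEARCH_S = 0.1           # one ANN query over the corpus (FAISS, sub-linear)
   CROSS_ENCODER_S_PER_DOC = 0.05  # one cross-encoder forward pass

   two_stage = STAGE1_SEARCH_S + CANDIDATES * CROSS_ENCODER_S_PER_DOC
   brute_force = CORPUS_SIZE * CROSS_ENCODER_S_PER_DOC

   print(f"Two-stage pipeline:       {two_stage:.1f} s per query")            # ~5.1 s
   print(f"Cross-encoder on corpus:  {brute_force / 3600:.0f} hours per query")  # ~139 hours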