Literature Survey: Retrieval Methods
This section contains detailed surveys and analyses of individual papers related to Stage 1 retrieval methods, including dense retrieval, sparse retrieval, hybrid approaches, and hard negative mining strategies.
Papers:
- Hard Negative Mining: A Deep Dive for Building a Unified Mining Library
  - Motivation: The Gap in the Ecosystem
  - How ColBERTv2 Uses Hard Negative Mining
  - Existing Implementations Analyzed
  - Theoretically-Grounded Mining Methods
  - Principled Sampling Methods
  - Synthetic Generation Methods
  - Quantitative Impact of Mining Strategies
  - MUVERA: Multi-Vector Efficiency (Not Mining)
  - Proposed Library Design
  - Detailed Library Implementation Analysis
  - Summary
  - References
Overview
The papers in this section provide in-depth technical analysis of key contributions to the retrieval literature. Each survey includes:
- Problem formulation and mathematical foundations
- Algorithmic innovations with theoretical guarantees
- Empirical results on standard benchmarks
- Practical considerations for deployment
- Connections to other methods in the retrieval-reranking pipeline
Featured Papers
- Hard Negative Mining: A Deep Dive for Building a Unified Mining Library
Comprehensive analysis of hard negative mining strategies, including ColBERTv2’s training pipeline, theoretically-grounded methods (ANCE, ADORE, RocketQA), and existing implementations (Contrastors, PyLate). Includes a proposed API design for a unified mining library analogous to the rerankers library.
Topics Covered
- Dense Retrieval: DPR, ANCE, Contriever, and embedding-based methods
- Sparse Retrieval: BM25, SPLADE, and learned sparse representations
- Hard Negative Mining: Dynamic mining, curriculum learning, false negative handling
- Hybrid Methods: Combining dense and sparse retrieval for robustness
- Pre-training: ICT, contrastive pre-training, and domain adaptation
Contributing
To add a new paper survey to this section:
1. Create a new .rst file following the structure of existing surveys
2. Include: problem statement, core innovation, theoretical analysis, empirical results
3. Add the file to the toctree above
4. Ensure proper citations and links to related papers
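As a rough illustration of the toctree step, the entry might look like the sketch below. It assumes a standard Sphinx `toctree` directive in this section's index file. Both document names (`hard_negative_mining` and `my_new_survey`) and the `:maxdepth:` value are hypothetical placeholders, not names taken from this project.

```rst
.. Hypothetical sketch: both document names below are placeholders.

.. toctree::
   :maxdepth: 2

   hard_negative_mining
   my_new_survey
```

Each entry is the new file's name without the `.rst` extension, given relative to the index file that contains the directive.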