Ragify – Retrieval‑Augmented Generation for Smarter Student Queries

Overview

Ragify is a Retrieval‑Augmented Generation (RAG) system built to answer questions about SNDT Women's University by combining real‑time document retrieval with the generative power of GPT‑4o. The project formed the capstone of a bachelor thesis at the Usha Mittal Institute of Technology, Mumbai.

Team

Problem Statement

Students frequently struggle to locate up‑to‑date information scattered across internal PDFs, spreadsheets, and web pages. Traditional keyword search lacks context‑awareness, leading to incomplete or irrelevant answers.

Solution

  1. Embedding Pipeline Documents (PDF, Excel, HTML) are chunked and embedded with OpenAI's text‑embedding‑ada‑002 model.

  2. Vector Indexing Embeddings are stored in a FAISS index (IVF‑PQ) for millisecond‑level similarity search at scale.

  3. Query Flow

graph LR
  Q[User question] --> R[Retrieve top‑k chunks<br/>from FAISS]
  R --> C[Compose context<br/>window]
  C --> G[GPT‑4o generates answer]
  G --> A[Response to user]
  1. CLI Prototype A Python CLI orchestrates retrieval and generation, streaming answers directly in the terminal.

Results

  • Precision@5: 0.89
  • Recall@10: 0.93
  • Mean Reciprocal Rank: 0.84
  • Latency: < 100 ms per query on a 1 M‑chunk corpus.
🚀

Deployed as an internal tool, Ragify cut average information‑search time from 3 minutes to <15 seconds.

Future Work

  • Migrate the vector store to Pinecone for multi‑region replication.
  • Add conversational memory to support follow‑up questions.
  • Wrap the CLI in a web interface for campus‑wide access.

Acknowledgements

Grateful recognition to the faculty of SNDT Women's University for access to institutional documents, and to the OpenAI Researcher Access Program for API credits.

Pranav Ghoghari