Retrieval Augmented SQL Generator for Hard-to-Find Data

Challenge

It is very common for large organizations to accumulate unnecessary data. Tables can have hundreds of columns with unclear names, data is often siloed across departments, and there may be no subject matter expert to ask. As a result, crafting even a simple query can take a very long time.

Approach

This tool uses Retrieval Augmented Generation (RAG), with an architecture similar to the diagram on the right. In essence, RAG combines a language model with an external knowledge source: it first retrieves documents or data relevant to the user’s query, and the model then uses that retrieved context to generate a more accurate response. In this case, retrieval gives the model access to the relevant database schemas without the need for fine-tuning.
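As a rough illustration of the retrieve-then-generate flow, here is a minimal sketch rather than the tool's actual implementation. The schema descriptions, the function names, and the toy bag-of-words `embed` function are all assumptions made for the example; a real system would use a proper embedding model and would send the resulting prompt to a language model.

```python
import math
import re
from collections import Counter

# Hypothetical schema descriptions; a real deployment would load these
# from the organization's own databases.
SCHEMA_DOCS = [
    "TABLE orders (order_id, cust_id, order_dt, total_amt): one row per customer order",
    "TABLE customers (cust_id, full_name, region_cd): one row per customer",
    "TABLE hr_staff (emp_id, dept_cd, hire_dt): one row per employee",
]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k schema descriptions most similar to the question."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(question: str, docs: list[str]) -> str:
    """Place the retrieved schemas in front of the question for the model."""
    context = "\n".join(retrieve(question, docs))
    return (
        "Use only the schemas below to write one SQL query.\n"
        f"Schemas:\n{context}\n"
        f"Question: {question}\nSQL:"
    )
```

Calling `build_prompt("total amount of each customer order", SCHEMA_DOCS)` would produce a prompt containing only the two most relevant schemas, which is what lets the model work with unfamiliar tables without being fine-tuned on them.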

Contribution

  • Designed and implemented the frontend with HTML, CSS, and TypeScript
  • Secured endpoints by generating and distributing secret tokens and validating them on the backend
  • Implemented a feedback loop across the frontend and backend. Collected feedback was vector-embedded and stored in MongoDB, indexed by the user’s prompt. A RAG system retrieves the relevant feedback and warns the model about mistakes it made before.
  • Designed a sign-in system paired with the tokens to allow for user-specific tracking
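The token-based endpoint protection mentioned in the list above might look roughly like the following sketch. The in-memory dictionary and the function names are hypothetical and only illustrate the idea; the project's actual storage, token format, and sign-in flow are not described here.

```python
import hmac
import secrets

# Hypothetical in-memory store mapping user IDs to issued tokens; a real
# backend would persist these (ideally hashed) in a database.
_issued_tokens: dict[str, str] = {}


def issue_token(user_id: str) -> str:
    """Generate a random URL-safe secret token and remember it for the user."""
    token = secrets.token_urlsafe(32)
    _issued_tokens[user_id] = token
    return token


def is_authorized(user_id: str, presented: str) -> bool:
    """Check a presented token against the one issued to this user."""
    expected = _issued_tokens.get(user_id)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected.encode(), presented.encode())
```

Because each token is tied to a user ID, the same check that protects an endpoint can also attribute each request to a specific user.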
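The feedback loop described in the list above can be sketched as follows. This is an illustrative assumption, not the project's code: an in-memory list stands in for the MongoDB collection, the bag-of-words `embed` function stands in for a real embedding model, and the `min_score` threshold is an arbitrary example value.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Feedback:
    prompt: str  # the user prompt that led to a bad query
    note: str    # what the user reported was wrong


def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class FeedbackStore:
    """In-memory stand-in for a vector-indexed feedback collection."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, Feedback]] = []

    def add(self, fb: Feedback) -> None:
        # Index each piece of feedback by the embedding of its prompt.
        self._items.append((embed(fb.prompt), fb))

    def similar(self, prompt: str, k: int = 3, min_score: float = 0.3) -> list[Feedback]:
        """Return up to k feedback entries whose prompts resemble this one."""
        q = embed(prompt)
        scored = [(cosine(q, vec), fb) for vec, fb in self._items]
        scored = [pair for pair in scored if pair[0] >= min_score]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fb for _, fb in scored[:k]]


def warnings_for(store: FeedbackStore, prompt: str) -> str:
    """Build a warning block to prepend to the model's prompt."""
    past = store.similar(prompt)
    if not past:
        return ""
    lines = [f"- For a similar request, a user reported: {fb.note}" for fb in past]
    return "Avoid these earlier mistakes:\n" + "\n".join(lines)
```

The returned text would be added to the model's prompt alongside the retrieved schemas, so that feedback on an earlier, similar request can steer the next generation.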

Interface

* All information and generated SQL in this image were fabricated for demonstration purposes.