Retrieval Augmented SQL Generator for Hard-to-Find Data
Introduction
It is very common for large organizations to accumulate unnecessary data. Tables can have hundreds of columns with unclear names, data is often siloed across departments, and sometimes no subject matter expert exists. As a result, crafting even a simple query can take a very long time. During my most recent rotation at Scotiabank, I had the opportunity to work on an application that helps users find data. I designed and implemented the entire frontend, secured the endpoints, and created a feedback loop.
How does it work?
This tool uses Retrieval Augmented Generation (RAG), with an architecture similar to the diagram on the right. In essence, RAG combines a language model with an external knowledge source: it first retrieves documents or data relevant to the user’s query, and the model then uses that retrieved context to generate a more accurate response. In this case, retrieval gives the model access to the relevant database schemas without the need for fine-tuning.
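To make the retrieve-then-generate flow concrete, here is a minimal Python sketch. It is not the application's actual code: the table names, schema descriptions, and helper functions (`retrieve`, `build_prompt`) are hypothetical, and a simple bag-of-words cosine similarity stands in for whatever embedding model and vector store a real system would use. The sketch stops at building the prompt; sending it to a language model is left out.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical schema descriptions (assumption: a real system would load
# these from a data catalog and index them in a vector store).
SCHEMA_DOCS = {
    "customer_accounts": "customer_accounts(account_id, customer_id, open_date, balance): "
                         "one row per customer account with current balance",
    "branch_locations": "branch_locations(branch_id, city, province, postal_code): "
                        "physical branch addresses",
    "card_transactions": "card_transactions(txn_id, account_id, amount, merchant, txn_date): "
                         "credit card purchases per account",
}


def tokenize(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, docs, k=2):
    """Return the names of the k schema docs most similar to the question."""
    query = tokenize(question)
    ranked = sorted(docs, key=lambda name: cosine(query, tokenize(docs[name])), reverse=True)
    return ranked[:k]


def build_prompt(question, docs, k=2):
    """Put the retrieved schemas in front of the question for the language model."""
    context = "\n".join(docs[name] for name in retrieve(question, docs, k))
    return (
        "Use only these tables:\n"
        f"{context}\n\n"
        f"Write a SQL query that answers: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("total card purchases by merchant", SCHEMA_DOCS, k=1))
```

Because only the most relevant schemas are placed in the prompt, the model sees the table and column names it needs without being trained on them, which is why no fine-tuning is required.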

Interface
*All information and generated SQL in this image were fabricated for demonstration purposes.