RAG Revelations: When Embeddings Unleashed an Epic Saga

After a week of deep diving into the world of embeddings on open-webui, I’ve emerged with not only valuable technical insights but also an unexpected creative awakening. What started as a frustrating technical challenge has evolved into both an enlightening exploration of RAG capabilities and the birth of an ambitious storytelling project.

My journey began with a simple goal: get the embedding features working properly in open-webui. Initial attempts were met with failure, and even consulting Perplexity offered little help. So I fell back on trial and error, manually adjusting settings until something clicked. Success finally came when I switched to the mxbai-embed-large embedding model. I tested several alternatives afterward, but none seemed to offer significant improvements.

To properly evaluate accuracy, I needed content that would be genuinely novel to the models—something they couldn’t possibly have seen in their training data. This is where the spark of creation first ignited. I worked with Claude 3.7 Sonnet and generated a fresh sci-fi story specifically for testing purposes, but found myself unexpectedly invested in the narrative I was crafting.

Tuning the retrieval settings involved considerable experimentation. I eventually settled on 4000-token chunks with 800-token overlaps, a Top K of 8, and a minimum score threshold of 0.15. These parameters provided a good balance of context retention and retrieval efficiency.
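For anyone curious what those four numbers actually control, here is a minimal sketch in Python. It is not open-webui's code: the function names are my own, plain cosine similarity is an assumption about how scores are computed, and the toy vectors stand in for real mxbai-embed-large embeddings.

```python
import math

def chunk_tokens(tokens, chunk_size=4000, overlap=800):
    """Split a token list into windows that share `overlap` tokens with their neighbor."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each new chunk starts 3200 tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed scoring function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec, chunk_vecs, top_k=8, min_score=0.15):
    """Return up to top_k (score, chunk_index) pairs scoring at least min_score, best first."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(reverse=True)
    return kept[:top_k]
```

With these defaults, a 10,000-token story becomes three chunks, and the last 800 tokens of each chunk reappear at the start of the next one. The threshold drops weak matches before the Top K cut, so a query can return fewer than 8 chunks.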

What followed was a revealing comparison across model sizes. I started with smaller language models like llama3.2:3b and the newer gemma3:4b, but accuracy peaked at around 50%, meaning they answered only about half of my test questions correctly. The real breakthrough came when I switched to more capable models. Both DeepSeek Reasoning and Claude 3.7 Sonnet delivered a perfect 100% accuracy rate.
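An accuracy figure like this can be computed with a very small harness. The sketch below shows one possible approach, not the exact method used here: `ask` is a hypothetical stand-in for whatever sends a question through the RAG pipeline to a model, the keyword check is a crude grading rule, and the sample questions are invented placeholders rather than details from the actual story.

```python
def accuracy(questions, ask):
    """Fraction of questions whose answer contains every expected keyword."""
    if not questions:
        return 0.0
    correct = 0
    for item in questions:
        answer = ask(item["question"]).lower()
        # Count the answer as correct only if all expected keywords appear in it.
        if all(keyword.lower() in answer for keyword in item["keywords"]):
            correct += 1
    return correct / len(questions)

# Invented placeholder questions; a real test set would come from the story itself.
sample_questions = [
    {"question": "Who commands the ship?", "keywords": ["Captain Vale"]},
    {"question": "Where does the crew land?", "keywords": ["Kessra"]},
]

def stub_model(question):
    """Hypothetical model that gets only the first question right."""
    if "commands" in question:
        return "The ship is commanded by Captain Vale."
    return "I could not find that in the provided context."

score = accuracy(sample_questions, stub_model)
```

Keyword matching is easy to automate but can miss correct paraphrases, so for a small test set it is worth reading the answers by hand as well.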

The technical takeaway from my testing was clear: no matter how well I optimized the embedding and retrieval parameters, the inference model's capabilities remained the limiting factor. It's like having a perfectly tuned instrument that still needs a skilled musician to produce beautiful music.

But the more profound discovery was how this technical exercise had unleashed something dormant within me. The test story I created wasn’t just a means to an end—it was the beginning of something larger. Emboldened by both my technical discoveries and this creative spark, I’ve embarked on an ambitious new project—creating a saga with the scope and depth of the Lord of the Rings trilogy.

Five days in, I’m still refining the outline, but I’m increasingly captivated by the world and characters taking shape in my imagination. What began as a simple test case has blossomed into a passionate creative endeavor. I find myself thinking about plot arcs and character development during moments when I would normally be contemplating system architectures or model parameters.

This unexpected turn from technologist to epic worldbuilder feels natural, almost inevitable in retrospect. The analytical skills that help me optimize RAG systems translate surprisingly well to constructing coherent narratives and intricate story worlds. I’m even contemplating self-publication once this saga reaches completion—a goal that wasn’t even on my radar a week ago.

My week’s journey highlights something profound about the tech-creativity interface: sometimes our technical explorations lead us to discover new parts of ourselves. As my saga continues to evolve alongside my RAG implementations, I’m embracing this dual path of technological and creative discovery with equal enthusiasm.

The tools may be new, but the tinkerer’s spirit that drives innovation and the storyteller’s heart that brings worlds to life are timeless companions on this unexpected journey.
