Been expanding my studies on LLMs using the Qwen3 models on my own consumer-grade hardware (AMD 8845HS, 32 GB of RAM, and an RTX 5060 Ti w/ 16 GB VRAM). Been focusing on RAG these past few weeks, and this is how it's presently implemented:
ChromaDB -> Embedding (Qwen3-0.6b) -> Reranker (BGE-v2-m3) -> Ollama (Qwen3-14b)
When a query comes in, the app pulls 10 results from ChromaDB (using the embedding model), re-ranks them, reduces them to the top 3, and finally feeds those into the LLM as context, alongside the query itself. This is typical RAG optimization.
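In sketch form, the retrieve → re-rank → truncate step looks like this. Note that `embed()`, `rerank_score()`, and `FakeCollection` are placeholders standing in for the Qwen3-0.6b embedding call, the BGE-v2-m3 cross-encoder, and the ChromaDB collection; this is a minimal illustration of the flow, not the actual library APIs.

```python
def embed(text):
    # Placeholder: real code would call the embedding model here.
    return text.lower()

def rerank_score(query, doc):
    # Placeholder score: word overlap between query and document.
    # A real reranker (e.g. BGE-v2-m3) scores the (query, doc) pair jointly.
    return len(set(query.lower().split()) & set(doc.lower().split()))

class FakeCollection:
    """Stand-in for a ChromaDB collection (illustrative only)."""
    def __init__(self, docs):
        self.docs = docs
    def query(self, query_embedding, n_results=10):
        # Real ChromaDB would run a vector similarity search here.
        return self.docs[:n_results]

def retrieve_context(query, collection, k_retrieve=10, k_keep=3):
    """Fetch k_retrieve candidates, re-rank them, keep the best k_keep."""
    candidates = collection.query(embed(query), n_results=k_retrieve)
    scored = [(rerank_score(query, d), d) for d in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return [d for _, d in scored[:k_keep]]
```

The top-3 list comes back ordered best-to-worst, which is exactly the ordering the experiment below plays with.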
The interesting part here is what I think is the "recall problem", where LLMs have difficulty recalling information, especially when the context gets large.
As a concrete example, after re-ranking and reducing the context to 3 documents, I concatenated them in descending order by score, i.e., the document with the highest score is at the start of the concatenation, while the lowest-scoring one is at the tail end. Here is the result when I asked it about the RTO policy:
> The information provided in the context does not mention Company's **Recovery Time Objective (RTO)** policy or any details related to disaster recovery, system restoration timelines, or business continuity planning.
>
> To obtain accurate information about Company's RTO policy, you would need to consult Company's official documentation, service level agreements (SLAs), or contact their support team directly. The context shared here focuses on topics like debit priorities, financial services, and operational considerations, but not on technical resilience or recovery metrics.
Now, I know for a fact that the context (i.e., the top 3 documents) DOES contain RTO information, but the LLM "forgot" about it. So I created a new context by concatenating the same 3 documents, BUT in reverse order, i.e., with the lowest score at the start and the top score at the end... and here's the result:
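The reordering itself is trivial: take the same top-3 list (best-to-worst from the reranker) and reverse it so the highest-scoring document sits at the end of the context, right before the question. A minimal sketch, with made-up document strings:

```python
def build_context(ranked_docs, best_last=True):
    """ranked_docs: list ordered best-to-worst by reranker score.

    With best_last=True, the highest-scoring document lands at the
    END of the concatenation, closest to the question in the prompt.
    """
    ordered = list(reversed(ranked_docs)) if best_last else list(ranked_docs)
    return "\n\n".join(ordered)

top3 = ["doc A (highest score)", "doc B", "doc C (lowest score)"]
reversed_ctx = build_context(top3)               # C, B, A -- best at the tail
baseline_ctx = build_context(top3, best_last=False)  # A, B, C -- best at the head
```

Same three documents, same query; only the concatenation order changes between the two runs.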
> The provided context does not explicitly outline Company's formal **Return to Office (RTO)** policy. However, it highlights the following key points about the RTO transition and employee experiences:
>
> 1. **Employee Experiences**:
>    - Employees shared **mixed feedback** about the RTO transition, including both **positive experiences** and **personal challenges** (e.g., logistical, cultural, or work-life balance concerns).
>    - Managers are described as addressing employee inquiries with **empathy and flexibility**, suggesting a focus on adaptability during the transition.
> 2. **Attendance Tracking Concerns**:
>    - On platforms like Blind, employees raised concerns about **attendance tracking practices**, such as worries that employees might "badge in" (log into the office) without actively working. This implies potential scrutiny around compliance with RTO expectations, though specific policy details (e.g., required days in the office, hybrid models, etc.) are not mentioned.
> 3. **No Formal Policy Details**:
>    - The context does not specify whether Company’s RTO policy is fully remote, hybrid, or fully in-office, nor does it mention timelines, eligibility criteria, or other structural details.
>
> ### Recommendation:
> For precise details about Company’s RTO policy, employees or stakeholders should refer to **official internal communications**, **HR resources**, or **public statements** from Company leadership. The context provided reflects employee sentiment and managerial responses rather than the policy’s formal structure.
I thought that was really interesting, and context engineering is probably a real thing.
Updated... added a simple web interface to my RAG app. It doesn't keep previous chat conversations yet (which is what would make it conversational), but I'll add that feature later on.
