From Concept To Execution: Operationalizing LLM and GenAI

Published

27th May 2024

Author

Team Elevation


We recently spoke with Debdoot Mukherjee (Chief Data Scientist, Head of AI and Demand Engineering at Meesho) about operationalizing Large Language Models (LLMs) in business contexts.

Debdoot has spent 16+ years crafting AI products and building out AI teams and organizations at companies such as IBM Research, ShareChat, Myntra, and Hike.

Drawing from real-world experiences, he explored the immense potential of LLMs while shedding light on the practical challenges of deploying them at scale.

From enhancing search functionality to building customer support chatbots, Debdoot discussed various use cases and shared actionable strategies for effectively integrating LLMs in business operations.

Here are some of the key highlights and takeaways from the session, drawing from all the deployments at Meesho.

1. LLMs are game-changers: They can act as powerful companions for exploring knowledge and understanding complex topics. They enable question-answering capabilities far more powerful than keyword search. LLMs are bridging the gap between languages, making knowledge and information accessible in local languages. They will disrupt entertainment with their ability to generate creative content like videos.

2. LLM applications span all areas: At Meesho, LLMs have been deployed across many areas including user growth & marketing, search, cataloging, customer support, fulfillment & logistics, seller growth, and productivity improvement. LLMs excel at scaling human-intensive workflows efficiently and show promise in improving the efficacy of AI-powered products.

3. LLMs are a boon for interpreting vernacular text: In search query understanding, LLMs provided significant gains in handling long-tail queries (particularly for Meesho users from Tier 3+ cities), correcting misspellings, removing redundant words, and translating queries from Indian languages to English. This improved retrieval and ranking efficacy. LLMs also proved far superior to previous best-in-class models for address translation. They can intelligently figure out when to translate versus transliterate names, and they handle code-mixed addresses well.

4. Customer review synthesis: LLMs provide comprehensive and efficient summaries of customer reviews, allowing users to quickly gather information about products, surpassing previous NLP technologies. They extract key points and sentiments around different aspects of products from thousands of reviews.

5. Fine-grained image comparison: Multimodal models can effectively compare images, enabling efficient product quality checks and reducing errors. Models like GPT-4o can do nuanced comparisons of color shades, embroidery patterns, etc., and catch differences that even humans miss at scale.

6. Challenges in operationalizing LLMs: Meesho initially tried using GPT-3.5 and GPT-4 completions directly with some examples and SOPs for customer support chatbots. While the responses were human-like, out-of-the-box models did not adhere well to SOPs, were inconsistent, and were prone to hallucination and jailbreaking.

7. Hybrid approach for better results: To address the limitations of LLMs, Meesho adopted a hybrid approach, combining the strengths of closed-domain task-oriented dialogue systems and LLMs. A separate layer was introduced to identify user intents, so that the LLM focuses on generating responses for the specific intent detected. GPT-3.5 still struggled, so they added retrieval-augmented generation (RAG) with examples of previously resolved cases and targeted fine-tuning using data from failure cases. This significantly enhanced performance.

8. Lessons learned: Extensive data cleaning, testing for jailbreak resilience, incremental rollout, and A/B testing were important for a production-ready system. Caching responses, load balancing across regions, and fallback models enabled better availability and reduced costs. Decomposing complex tasks into smaller, manageable parts also improved efficiency and accuracy.

9. Tackling latency concerns: To deploy LLMs in latency-sensitive applications like search, the models are run asynchronously, and the results are cached since many queries repeat over time. The cached results are served in real time.

10. Cost optimization: To balance performance and cost, consider smaller models, fine-tuning, and multi-agent systems. Multi-agent systems allow multiple specialized LLMs to collaborate and reason effectively for complex tasks.

11. LLMs for marketing content: LLMs generate creative marketing content like banner images, notification text, and personalized calls to action. Stable Diffusion with prompt engineering is popular for image generation.
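To make the hybrid approach from point 7 more concrete, here is a minimal Python sketch of the general pattern: a separate layer detects the intent first, and the LLM is then prompted only with the SOP and a previously resolved case for that intent. This is an illustration under stated assumptions, not Meesho's implementation. The intent names, SOP text, resolved cases, keyword matcher, and the `llm` callable are all hypothetical placeholders; a production system would likely use a trained intent classifier and a proper retrieval index.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical SOP snippets keyed by intent (assumed for illustration).
SOPS = {
    "refund_status": "Share the refund timeline. Never promise a specific date.",
    "order_tracking": "Share the latest tracking status. Do not speculate on delays.",
}

# Hypothetical previously resolved cases: (intent, user message, agent reply).
RESOLVED_CASES = [
    ("refund_status", "Where is my refund?", "Your refund was initiated and is being processed."),
    ("order_tracking", "My order has not arrived", "Your order is in transit with the courier."),
]

# Toy keyword lists standing in for a real intent classifier.
INTENT_KEYWORDS = {
    "refund_status": {"refund", "money"},
    "order_tracking": {"order", "track", "arrived", "delivery"},
}


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def detect_intent(message: str) -> Optional[str]:
    """Return the intent with the most keyword hits, or None if nothing matches."""
    tokens = tokenize(message)
    best, best_score = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(tokens & keywords)
        if score > best_score:
            best, best_score = intent, score
    return best


def retrieve_examples(intent: str, message: str, k: int = 1) -> List[Tuple[str, str, str]]:
    """Return up to k resolved cases for this intent, ranked by word overlap."""
    tokens = tokenize(message)
    candidates = [case for case in RESOLVED_CASES if case[0] == intent]
    candidates.sort(key=lambda case: len(tokens & tokenize(case[1])), reverse=True)
    return candidates[:k]


def build_prompt(intent: str, message: str) -> str:
    """Assemble a prompt restricted to the detected intent's SOP and examples."""
    lines = [f"Intent: {intent}", f"SOP: {SOPS[intent]}", "Resolved examples:"]
    for _, user_text, agent_text in retrieve_examples(intent, message):
        lines.append(f"  User: {user_text}")
        lines.append(f"  Agent: {agent_text}")
    lines.append(f"User: {message}")
    lines.append("Agent:")
    return "\n".join(lines)


def answer(message: str, llm: Callable[[str], str]) -> str:
    """Route by intent first; only call the LLM when an intent is recognized."""
    intent = detect_intent(message)
    if intent is None:
        return "ESCALATE_TO_HUMAN"  # hypothetical sentinel for unhandled intents
    return llm(build_prompt(intent, message))
```

Passing any function that maps a prompt string to a reply as `llm` is enough to try it out. The design choice worth noting is that messages with no recognized intent never reach the model, which narrows the surface for off-SOP answers and jailbreak attempts.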
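The caching, fallback, and asynchronous patterns from points 8 and 9 can be sketched in a similar spirit. In the Python example below, the real-time path only reads from a cache and never waits on a model, while a separate asynchronous job fills the cache for queries that missed, trying a list of models in order. The class name, the in-memory dictionary, and the two stub model functions are assumptions made for illustration; the talk did not describe the actual storage layer or model APIs used.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Set

RewriteFn = Callable[[str], Awaitable[str]]


class QueryRewriteCache:
    """Serve cached LLM query rewrites in real time; fill misses asynchronously."""

    def __init__(self, rewrite_fns: List[RewriteFn]) -> None:
        self._cache: Dict[str, str] = {}
        self._pending: Set[str] = set()
        self._rewrite_fns = rewrite_fns  # ordered: primary model first, then fallbacks

    @staticmethod
    def _key(query: str) -> str:
        """Normalize case and whitespace so repeated queries share one entry."""
        return " ".join(query.lower().split())

    def lookup(self, query: str) -> str:
        """Real-time path: return the cached rewrite, else the raw query."""
        key = self._key(query)
        if key in self._cache:
            return self._cache[key]
        self._pending.add(key)  # remember the miss for the async job
        return query

    async def _rewrite_with_fallback(self, key: str) -> Optional[str]:
        """Try each model in order; return None if every one of them fails."""
        for rewrite in self._rewrite_fns:
            try:
                return await rewrite(key)
            except Exception:
                continue  # this model failed, so move on to the next fallback
        return None

    async def refresh(self) -> None:
        """Off the request path: rewrite all pending queries and cache the results."""
        pending = list(self._pending)
        self._pending.clear()
        results = await asyncio.gather(
            *(self._rewrite_with_fallback(key) for key in pending)
        )
        for key, result in zip(pending, results):
            if result is not None:
                self._cache[key] = result


# Hypothetical stand-ins for real model calls.
async def primary_model(query: str) -> str:
    raise TimeoutError("primary model unavailable")


async def fallback_model(query: str) -> str:
    return query.replace("sari", "saree")
```

With `QueryRewriteCache([primary_model, fallback_model])`, the first `lookup` of a new query returns it unchanged, and after `asyncio.run(cache.refresh())` later lookups of the same normalized query return the cached rewrite. Queries for which every model fails are simply not cached, so they can be retried the next time they appear.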

The journey of operationalizing large language models, as exemplified by Debdoot's experience at Meesho, underscores the immense potential of these technologies to drive transformative gains across industries. We hope these insights offer valuable guidance for organizations seeking to harness the power of LLMs across various domains.

If you are building in AI, we'd love to chat more! To engage with our SaaS practice, please reach out to akarsh@elevationcapital.com and poorvi@elevationcapital.com.
