Sumit Aryal
Computer Engineer
About Me
I am a Machine Learning Engineer specializing in NLP and LLMs, with expertise in architecting production-grade AI systems including RAG workflows, document processing pipelines, and recommendation engines.
I have a proven ability to design, containerize, and scale solutions using cloud-native stacks (AWS/Kubernetes/Docker). I am passionate about bridging research and operations, from developing novel NLP models for low-resource languages to implementing MLOps practices for AI product delivery.
I also aim to contribute to AI research by leveraging my expertise in Natural Language Processing, Computer Vision, and Large Language Models to create impactful solutions for real-world challenges.
Work Experience
Software Engineer I, Machine Learning
Smart Data Solutions
November 2025 - Present
Eagan, MN, USA
- Structured Data Extraction: Developed document processing pipelines using Vision Language Models (VLMs) with custom processing logic for structured data extraction from variable-layout PDFs and scanned images; optimized high-throughput inference with vLLM for low-latency deployment.
- Customer-Facing Chatbot: Building a Retrieval-Augmented Generation (RAG) application that powers a customer-facing chatbot, integrating vector retrieval, conversational memory, and LLM reasoning to provide accurate, context-aware responses to client inquiries.
Machine Learning Engineer
Root Level AI
January 2025 - October 2025
Kathmandu, Nepal
- Agentic RAG System: Designed and studied retrieval-augmented generation for multi-turn support dialogs. Built a reproducible benchmark suite with synthetic and human-labeled queries; evaluated hybrid dense-plus-sparse retrieval with cross-encoder reranking and conversational memory; reported Recall@k, MRR, and end-to-end latency.
- Retrieval Systems at Scale: Operated a multitenant, sharded Qdrant cluster with replication and write consistency. Studied shard key strategies and tenant-scoped payload indexes for fast filters.
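The retrieval metrics named above (Recall@k and MRR) can be illustrated with a minimal sketch. This is not the benchmark suite itself: the function names, the `(retrieved, relevant)` data layout, and the sample document ids are assumptions made for illustration.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the top-k retrieved ids."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(runs):
    """Average of 1/rank of the first relevant hit over all queries.

    `runs` is a list of (retrieved_ids, relevant_ids) pairs (hypothetical layout).
    A query with no relevant hit contributes 0.
    """
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        relevant = set(relevant)
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)


# Two toy queries with made-up document ids.
runs = [
    (["a", "b", "c"], ["b"]),       # first relevant hit at rank 2
    (["x", "y", "z"], ["x", "z"]),  # first relevant hit at rank 1
]
mrr = mean_reciprocal_rank(runs)
recall = recall_at_k(["x", "y", "z"], ["x", "z"], k=2)
```

Here `mrr` averages the reciprocal ranks 1/2 and 1, and `recall` counts one of the two relevant ids within the top 2.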
Machine Learning Engineer
DoriIT
April 2024 - January 2025
Kathmandu, Nepal
- LLM Integration: Created RAG assistants using OpenAI and Gemini; automated content generation tasks like summaries and descriptions using LLMs.
- Text Analytics & Mentoring: Developed sentiment analysis and NER systems using HuggingFace Transformers; mentored 2 interns through dataset creation to BERT fine-tuning and evaluation.
AI Fellow
Fusemachines
January 2023 - August 2023
Kathmandu, Nepal
- Completed an intensive ML/DL curriculum covering regression, neural networks, and transformer-based architectures (BERT); built sentiment analysis, text classification, and image segmentation models.
QA Trainee
Bajra Technologies
September 2022 - December 2022
Kathmandu, Nepal
- Executed test cases for web applications, automated end-to-end testing with Cypress, and conducted API and load testing using Postman and JMeter.
Education
Pulchowk Campus, IOE, Tribhuvan University
Lalitpur, Kathmandu
Bachelor's in Computer Engineering
November 2019 - April 2024
Publications
C = Conference, J = Journal, S = In Submission, T = Thesis
BERT-Based Nepali Grammatical Error Detection and Correction Leveraging a New Corpus
Sumit Aryal, et al. (2024). Presented at IEEE INSPECT-2024, ABV-IIITM, Gwalior, India, December 07-08, 2024.
Nepali Grammar Correction
Sumit Aryal, et al. (2024). Undergraduate Thesis, Pulchowk Engineering Campus, Institute of Engineering, Tribhuvan University.
Selected Open Source Contributions
2025
- Qdrant Sharding: Enabled distributed Qdrant deployments for LlamaIndex with configurable sharding and replication; write consistency controls in QdrantVectorStore; custom shard-key across create/add/delete/query; cluster-aware fixtures; sync/async tests. PR #19652.
- Payload field indexes: Implemented payload_indexes creation in both sync and async flows with fixtures and tests. PR #19743.
- Memory Component: Added missing synchronous wrapper for put_messages in the memory component to align sync and async interfaces. PR #19746.
2025
- Python Agents SDK example: Ready-to-run example setting up an agent with local gpt-oss models, connecting a filesystem MCP server, defining a custom weather tool, and streaming agent responses. PR #14.
Projects
Research Agent
Async research agent with planning, search, reflection, and synthesis. Features an async state machine running concurrent searches with adaptive concurrency and reflection cadence, citation extraction from grounding metadata, and adaptive thinking budgets for synthesis.
RAG Project
Retrieval pipeline with OCR ingest and hybrid vector search. FastAPI service with Postgres and Qdrant, OCR text extraction via Gemini, structured chunking, and dual dense plus sparse embeddings. Hybrid search fuses results using Reciprocal Rank Fusion, returning grounded answers with cited snippets.
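As a rough illustration of the fusion step, Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every ranked list it appears in. The sketch below is not the project's code: the function name, the sample ids, and the smoothing constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with ranks starting at 1. `k` is a smoothing constant (assumed value).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense and a sparse retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents returned by both retrievers (`doc_a`, `doc_b`) rise above those returned by only one, which is why the method works without having to calibrate dense and sparse scores against each other.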
Nepali Grammatical Error Correction
As part of my bachelor's thesis, I built a BERT-based Nepali grammatical error correction (Nepali GEC) pipeline. I curated a large parallel corpus for the task and implemented a system that ingests Nepali text, detects grammatical errors, and proposes context-aware corrections.
HTML Parser using LLM
Developed an API for extracting e-commerce attributes from HTML content. Uses `meta-llama/Meta-Llama-3-8B-Instruct` from Hugging Face's Inference API. Extracts attributes like name, price, description, and images from HTML.
Travel Recommendation System
Developed a travel recommendation web application that generates personalized itineraries for travelers to Nepal based on their preferences and budget using React, Django, Python, and Flask. Implemented collaborative filtering to enhance recommendations.
ML and DL Repository
Developed and maintained a repository of machine learning and deep learning algorithms, including CNN, Linear and Logistic Regression, Decision Trees, and advanced applications like Image Segmentation and Reconstruction.
8 Puzzle Visualizer
Implemented and visualized several search algorithms, including A*, BFS, DFS, IDDFS, and greedy search, to solve the 8-puzzle problem using Python and Tkinter.
Bachiyo Game
Mario-like platformer game with various levels and sound effects created using C++ and SFML.
Image Compression
Compressed images using Huffman coding in C++.
Stadium Modeling
Modeled a stadium using Python, Pygame, and Blender.
Skills
Core Languages
ML/DL Stack
Natural Language Processing
ML Infrastructure & MLOps
Data Systems
API & Deployment
Specializations
Soft Skills
Honors & Awards
Best Project Award
Pulchowk Campus, IOE
December 2024
Recognized for excellence in developing the "Nepali Grammatical Error Detection and Correction System", an NLP system that addresses the challenge of automated grammar correction in the Nepali language using BERT-based models and a novel corpus.
Certifications
Professional Memberships
Nepal Engineering Council
Professional Engineering Body
October 2024 - Present
Active member of Nepal's premier engineering professional body, committed to maintaining high standards of engineering practice and professional development.