I work on Embodied AI and Cognitive Robotics, building AI systems that connect perception, language, and decision-making in real-world agents. My research develops robust, scalable approaches to intelligent scene understanding with vision-language models (VLMs) and to decentralized multi-agent learning and control, enabling reliable action in complex environments.
Python
PyTorch
Habitat-Sim
ROS 2
A framework that aligns remote sensing imagery with ground-level visual priors to improve robotic search efficiency using test-time adaptation.
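As a rough illustration of the test-time adaptation step, the sketch below minimizes prediction entropy on unlabeled test imagery, in the spirit of TENT; treating the alignment model as a generic classifier and adapting only normalization parameters are assumptions for illustration, not this project's exact recipe.

```python
import torch.nn.functional as F

def tta_step(model, batch, optimizer):
    """One test-time adaptation step on an unlabeled batch: minimize the
    entropy of the model's predictions. Only the parameters handed to
    `optimizer` (e.g. normalization layers) get updated."""
    logits = model(batch)                     # (B, num_classes)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return entropy.item()
```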
Research on generating 3D maps of complex indoor environments that violate the Manhattan-world assumption, using only a sparse LiDAR sensor.
This project develops a high-fidelity framework for embodied object navigation by leveraging incremental 3D Scene Graphs and foundation Vision-Language Models (VLMs). By moving beyond flat occupancy maps, the system builds a hierarchical semantic representation of the world that captures objects, rooms, and their relationships. It uses Knowledge-Augmented Generation (KAG) to predict structural and semantic properties of unobserved regions, enabling robots to perform complex, cross-modal search missions, specified by category, natural-language description, or visual exemplar, in completely unknown environments.
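A minimal sketch of such a hierarchy, with hypothetical `ObjectNode`/`RoomNode`/`SceneGraph` names; the VLM-derived room prior is stubbed as a plain dictionary rather than the system's actual knowledge source.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    label: str                        # e.g. "mug"
    position: tuple                   # (x, y, z) in the world frame
    room_id: int | None = None        # parent room, once known

@dataclass
class RoomNode:
    room_id: int
    category: str                     # e.g. "kitchen"
    objects: list = field(default_factory=list)
    neighbors: set = field(default_factory=set)   # adjacent room ids

class SceneGraph:
    """Object/room hierarchy grown incrementally from observations."""
    def __init__(self):
        self.rooms: dict[int, RoomNode] = {}

    def add_observation(self, obj: ObjectNode, room: RoomNode):
        node = self.rooms.setdefault(room.room_id, room)
        obj.room_id = node.room_id
        node.objects.append(obj)

    def candidate_rooms(self, target: str, prior: dict) -> list:
        # Rank rooms by a semantic prior, e.g. a VLM's estimate of
        # P(target object | room category), highest first.
        return sorted(self.rooms.values(),
                      key=lambda r: prior.get((target, r.category), 0.0),
                      reverse=True)
```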
This project explores how foundation Vision-Language Models (VLMs) and Large Language Models (LLMs) can reshape the cognitive architecture of intelligent navigation. We investigate a framework that pairs the zero-shot reasoning of internet-scale foundation models with a structured spatial-semantic memory. This enables embodied agents to perform complex, language-driven semantic search tasks, such as finding specific objects described in everyday natural language, by reasoning over environmental uncertainty and past observations in completely novel environments.
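A hedged sketch of the decision loop this suggests: `query_llm` stands in for any LLM backend, and the prompt format and memory layout are illustrative assumptions, not the framework's actual interface.

```python
def choose_next_frontier(goal: str, memory: list[dict], frontiers: list[dict],
                         query_llm) -> dict:
    """Ask an LLM to rank unexplored frontiers given past observations."""
    observed = "; ".join(f"{m['label']} in {m['room']}" for m in memory)
    options = "\n".join(f"{i}: near {f['context']}" for i, f in enumerate(frontiers))
    prompt = (
        f"Task: find '{goal}'.\n"
        f"Seen so far: {observed or 'nothing yet'}.\n"
        f"Unexplored frontiers:\n{options}\n"
        "Answer with the index of the most promising frontier."
    )
    reply = query_llm(prompt)                 # e.g. "0"
    return frontiers[int(reply.strip().split()[0])]

# Usage with a stub LLM that always picks frontier 0:
frontier = choose_next_frontier(
    "coffee mug",
    memory=[{"label": "sink", "room": "kitchen"}],
    frontiers=[{"context": "kitchen counter"}, {"context": "hallway"}],
    query_llm=lambda p: "0",
)
```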
This project develops a decentralized, learning-based framework for visibility-based pursuit-evasion in challenging outdoor environments. We focus on enabling teams of mobile agents to systematically clear contaminated spaces and capture adversarial evaders within high-density urban terrains. By integrating multi-agent reinforcement learning with advanced spatial reasoning, the system addresses the critical challenges of building-induced occlusions and limited sensor ranges, allowing for real-time coordinated maneuvers without the need for a central controller.
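The sketch below illustrates the decentralized structure on a toy grid world: each pursuer acts only on its own line-of-sight test and teammate messages. The greedy chase rule stands in for the learned policy, and the message format is hypothetical.

```python
import numpy as np

def visible(grid: np.ndarray, a: tuple, b: tuple) -> bool:
    """Line-of-sight check: sample cells between a and b, fail on buildings."""
    (r0, c0), (r1, c1) = a, b
    n = max(abs(r1 - r0), abs(c1 - c0))
    for t in np.linspace(0.0, 1.0, n + 1):
        r, c = int(round(r0 + t * (r1 - r0))), int(round(c0 + t * (c1 - c0)))
        if grid[r, c] == 1 and (r, c) not in (a, b):    # 1 = building cell
            return False
    return True

def local_step(grid: np.ndarray, me: tuple, evader: tuple, msgs: list) -> tuple:
    """One decision by one pursuer, using only local sensing and messages."""
    if visible(grid, me, evader):
        target = evader                        # direct pursuit
    elif msgs:
        target = msgs[-1]["evader_seen"]       # teammate's last reported sighting
    else:
        return me                              # no information: hold position
    step = (me[0] + int(np.sign(target[0] - me[0])),
            me[1] + int(np.sign(target[1] - me[1])))
    return step if grid[step] == 0 else me     # never step into a building
```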
This project introduces MARVEL (Multi-Agent Reinforcement Learning for Constrained Field-of-View Multi-Robot Exploration in Large-Scale Environments), a framework for high-performance, decentralized multi-robot coordination. By leveraging Graph Attention mechanisms, MARVEL enables robot teams to reason about teammate intent and spatial dependencies under restricted sensing constraints. Our approach uses information-theoretic action pruning to optimize coverage and mission efficiency, facilitating complex collaborative maneuvers in completely unknown topographies without a central controller.
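A compact PyTorch sketch of the two ingredients: single-head graph attention over robot/frontier nodes, and a top-k information-gain filter as one simple instance of action pruning. Dimensions and the pruning rule are illustrative, not MARVEL's published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttention(nn.Module):
    """Single-head attention over a robot/frontier graph."""
    def __init__(self, dim: int):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim),
                                  nn.Linear(dim, dim),
                                  nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) node features; adj: (N, N) adjacency with self-loops.
        scores = self.q(x) @ self.k(x).T / x.shape[-1] ** 0.5
        scores = scores.masked_fill(adj == 0, float("-inf"))
        return F.softmax(scores, dim=-1) @ self.v(x)   # attended node features

def prune_actions(candidates: list, info_gain: list, keep: int = 4) -> list:
    """Keep only the top-k candidate viewpoints by expected information
    gain (e.g. predicted reduction in map entropy)."""
    ranked = sorted(zip(info_gain, candidates), key=lambda p: p[0], reverse=True)
    return [c for _, c in ranked[:keep]]
```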
This project introduces STAR (Swarm Technology for Aerial Robotics), a modular, open-source infrastructure designed to bridge the gap between simulation and high-fidelity physical deployments. STAR integrates decentralized task allocation with robust vision-based landmark localization to manage fleets of nano-quadrotors (e.g., Crazyflies) in cluttered environments. The framework provides a high-throughput ROS 2-based communication layer and a hardware-in-the-loop (HIL) sim-to-real pipeline, enabling researchers to validate complex multi-agent algorithms, reactive obstacle avoidance, and swarm behaviors on physical robotic collectives.
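To make the communication layer concrete, here is a hedged sketch of an auction-style bidder each quadrotor might run. It uses the standard rclpy API, but the topic names, JSON message layout, and distance-based bid are assumptions, not STAR's actual allocation protocol.

```python
import json
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class BidderNode(Node):
    """Each robot bids on broadcast tasks; no central controller assigns them."""
    def __init__(self, robot_id: str, position: list):
        super().__init__(f"bidder_{robot_id}")
        self.robot_id, self.position = robot_id, position
        self.pub = self.create_publisher(String, "/swarm/bids", 10)
        self.create_subscription(String, "/swarm/tasks", self.on_task, 10)

    def on_task(self, msg: String):
        task = json.loads(msg.data)            # e.g. {"id": 3, "goal": [x, y, z]}
        # Bid = negative distance to the goal, so closer robots bid higher.
        cost = sum((p - g) ** 2 for p, g in zip(self.position, task["goal"])) ** 0.5
        bid = {"task": task["id"], "robot": self.robot_id, "bid": -cost}
        self.pub.publish(String(data=json.dumps(bid)))

def main():
    rclpy.init()
    rclpy.spin(BidderNode("cf1", position=[0.0, 0.0, 0.5]))

if __name__ == "__main__":
    main()
```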
This project develops a decentralized framework for context-aware navigation, enabling embodied agents to perform complex path-finding and search tasks in unknown environments without prior maps. By leveraging Graph Attention Networks to encode environmental context, the framework allows robots to reason about the global structure of a space from local observations. This enables a suite of navigation capabilities—from zero-shot exploration to adaptive prior-based path-finding—that outperform traditional geometric planners in both computational efficiency and success rate.
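A small sketch of the zero-shot exploration loop such a planner supports; `context_score` is a crude novelty heuristic standing in for the trained graph-attention policy, and the frontier format is hypothetical.

```python
import math

def travel_cost(a, b):
    return math.dist(a, b)

def context_score(frontier: dict, visited: list) -> float:
    # Stub for the learned score: prefer large frontiers far from
    # already-visited space (a crude novelty heuristic).
    novelty = min(travel_cost(frontier["pos"], v) for v in visited)
    return frontier["size"] + novelty

def next_waypoint(robot_pos, frontiers, visited, w: float = 0.5):
    """Pick the frontier that best trades off learned context against
    travel cost; with no map prior this degrades to zero-shot exploration."""
    return max(frontiers,
               key=lambda f: context_score(f, visited) - w * travel_cost(robot_pos, f["pos"]))

wp = next_waypoint(
    robot_pos=(0.0, 0.0),
    frontiers=[{"pos": (3.0, 1.0), "size": 2.5}, {"pos": (-1.0, 4.0), "size": 1.0}],
    visited=[(0.0, 0.0), (1.0, 1.0)],
)
```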