Robot navigation in an unknown environment is a challenging task due to the lack of spatial awareness and semantic understanding of the environment. Previous works mostly rely on learning-based approaches, which require large amounts of training data and lack generalization ability. The emergence of Large Language Models (LLMs) provides a new way to achieve semantic understanding. This paper proposes LLM-based Frontiers Exploration for visual semantic Navigation (LFENav), which leverages the rich semantic prior knowledge of LLMs to find the next subgoals given a natural language instruction. First, a semantic map is incrementally constructed and frontiers are redefined from the observed RGB-D images. A prompt mechanism is designed to exploit the Chain-of-Thought (CoT) capability of LLMs. Geometric costs are used to compensate for the limited understanding LLMs have of the spatial layout of scenes. On this basis, a novel exploration policy is designed that integrates LLM scores and geometric costs to select frontiers worth exploring. Experiments on the Habitat-Matterport 3D dataset show that the success rate of this method reaches 0.638, outperforming existing methods.
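The abstract does not give the exact scoring rule, so the following minimal Python sketch only illustrates the general idea of ranking frontiers by combining an LLM-derived semantic score with a normalized geometric cost. All names, the Euclidean travel cost, the `llm_score_fn` interface, and the linear weight `alpha` are assumptions for illustration, not the paper's actual formulation.

```python
import math

def geometric_cost(agent_xy, frontier_xy):
    """Euclidean distance as a simple geometric cost (assumed form)."""
    return math.dist(agent_xy, frontier_xy)

def select_frontier(frontiers, agent_xy, llm_score_fn, alpha=0.5):
    """Pick the frontier with the best combined exploration score.

    frontiers    : list of dicts with 'xy' (map coordinates) and 'context'
                   (nearby semantic labels) -- assumed representation
    llm_score_fn : callable returning an LLM relevance score in [0, 1] for a
                   frontier's context under the instruction-conditioned prompt
                   (assumed interface)
    alpha        : trade-off between semantic relevance and travel cost (assumed)
    """
    best, best_score = None, -float("inf")
    # Normalize distances so the two terms are on comparable scales.
    max_cost = max(geometric_cost(agent_xy, f["xy"]) for f in frontiers) or 1.0
    for f in frontiers:
        semantic = llm_score_fn(f["context"])                 # LLM semantic score
        cost = geometric_cost(agent_xy, f["xy"]) / max_cost   # normalized geometric cost
        score = alpha * semantic - (1.0 - alpha) * cost       # combined score
        if score > best_score:
            best, best_score = f, score
    return best
```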