Talk title: Advancing Exploration in Reinforcement Learning: Toward Practical Embodied Control
Time: 14:00, December 22, 2025
Venue: Conference Room B405
Speaker: 余亮豪
Speaker nationality: China
Speaker affiliation: University of Macau

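Speaker bio: 余亮豪 is an associate professor in the Department of Computer and Information Science at the University of Macau, director of the Centre for Data Science, and equipment manager of the State Key Laboratory of Internet of Things for Smart City. The speaker's research spans traffic data optimization, spatio-temporal databases, large-scale data visualization, graph neural network learning, multi-agent reinforcement learning, and crowdsourced computing, at the intersection of artificial intelligence and data engineering. The team has published more than 80 papers in top journals and conferences, including SIGMOD, VLDB, ICDE, NeurIPS, AAAI, ICLR, IJCAI, and KDD, more than 50 of them as first or corresponding author. In recent years the team has taken part in several regional and national key R&D projects, including: the National Key R&D Program project "Efficient Fusion and Dynamic Cognition Technologies and Platform for Urban Public Services", for which it proposed large-scale data visualization techniques and built open-source tools; the Macao Science and Technology Fund key R&D project "Key Technologies and Platform for Collaborative-Intelligence-Driven Autonomous Driving"; and the 2024 project "Urban Traffic Perception Fusion and Intelligent Inference Technologies and Applications", which received a second prize for technological invention. Team members have also long been active in the international academic community, serving as program committee chairs, local organizing committee members, or committee members for leading international conferences such as BigData, IJCAI, ICDE, DASFAA, and PAKDD. Since 2020, roles held have also included membership of the Information and Electronic Science Committee of the China Association of Young Scientists and Technologists and of the Urban Planning Committee of the Macao SAR, with the aim of bringing research and urban development policy closer together.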
Abstract: Exploration remains a key barrier to deploying reinforcement learning in realistic embodied settings, where agents must act under high-dimensional visual observations, sparse and delayed rewards, and often overactuated control interfaces. This talk presents a line of research that makes exploration more practical and scalable by progressively introducing structure into both representation and intrinsic motivation. We first revisit metric-based intrinsic bonuses and propose an effective discrepancy metric with adaptive scaling to improve robustness on hard exploration benchmarks. We then move beyond raw novelty by learning compact representations in a behavioral metric space and rewarding value-diverse, behaviorally distinct trajectories for scalable exploration in high-dimensional environments.
To address long-horizon embodied tasks, we introduce latent “foresight” via diffusion-based self-prediction and a latent-space exploration reward, demonstrating gains in navigation/manipulation and real-world indoor deployment.
Finally, for overactuated musculoskeletal control, we discover disentangled synergy patterns and learn policies entirely in a synergy-aware latent action space, improving efficiency and generalization.
Host: 王皓
