Research Engineer - Vision Language Action Models for Intelligent Cyber Physical Systems (f/m/div.)

Robert Bosch GmbH

Renningen, Baden-Württemberg, Deutschland

Published Mar 4, 2026

Full-time

Job Summary

As a Research Engineer at Bosch, you will lead the development of cutting-edge Vision-Language-Action (VLA) architectures, enabling AI agents to interpret human instructions and act autonomously within complex physical environments. Your day-to-day work involves connecting multimodal representation learning with long-term control and planning to move beyond reactive AI toward cognitive intelligence. You will be responsible for building scalable infrastructure for training and deployment, including simulation tools and evaluation methods. This role is unique as it bridges the gap between fundamental research and practical industrial application, allowing you to implement advanced AI methods in robotics, automated driving, and smart building systems. You will collaborate with interdisciplinary teams to shape Bosch's long-term strategy for intelligent automation, ensuring that research prototypes are transformed into robust, explainable, and semantically grounded solutions for real-world cyber-physical systems.

Required Skills

Education

Excellent Master's degree in Computer Science, Machine Learning, Robotics, or related technical fields; PhD in Multimodal AI, Robotics, Reinforcement Learning, or Generative AI preferred.

Experience

Multiple years of experience in developing and deploying machine learning solutions in distributed software development teams
Demonstrated industrial software development experience through code contributions in large-scale machine learning projects or benchmarks
Proven track record of academic excellence with publications in leading AI and robotics conferences (e.g., NeurIPS, CVPR, ICRA)
Experience in designing and training multimodal architectures such as Flamingo, GPT-4V, or RT-2
Hands-on experience with visual grounding, cross-modal attention, and instruction-following architectures
Experience with cloud infrastructure and multi-GPU training pipelines

Languages

German (Basic)
English (Fluent)

Additional

Applicants are asked to submit links to their GitHub or Kaggle profiles. The role involves turning research into practical innovation across interdisciplinary teams.
