Introduction to Robotics: A Comprehensive Guide to Robotics for Beginners (2024)

Introduction to Robotics: Robotics development has been underway for decades, yet more than two decades into the 21st century we have still not unlocked the field's true potential or realized the sci-fi dream of robots seamlessly integrating into our daily lives and performing complex tasks.

One sci-fi scene that springs to mind is from the anime PLUTO, where North No. 2, a war robot, tries to learn piano from Paul Duncan, a blind retired maestro. North No. 2 could recognize and play the correct notes, but Duncan always felt it was missing the emotional depth and expressiveness that transforms a mere performance into a moving musical experience. This struggle of North No. 2 to master the nuances of music captures the problem we currently face in the field of robotics.

Looking at AI development, it has advanced many times over in the same time frame compared to robotics. We have watched the NLP community come up with ChatGPT and LLMs, and the vision community build models that generate photorealistic images and videos. But in robotics, robots are still struggling with tasks as simple as opening a door.

This makes you ask yourself: where are we going wrong in robotics development? It turns out this is a well-known phenomenon, Moravec's Paradox:

What seems hard to us is actually easy, and what seems easy to us is actually hard.

In ongoing research there is an idea called the "embodiment hypothesis": the only form of generally intelligent system that has ever existed in nature is embodied intelligence. Humans have undergone millions of years of evolution, and that evolutionary process has honed our abilities in sensorimotor control. The fact that you and I can simply pick up and play with an object feels trivial to us. Whereas if you ask people whether playing chess or Go is easy, most of them will say no; the fact that it is hard for us makes it clear that it is not what we evolved to do. The problems we call "hard" are hard in a different sense: they are computationally hard, while the sensorimotor skills we take for granted turn out to be the truly difficult part for machines.

In 2009, during the Robotics: Science and Systems (RSS) conference in Atlanta, a group of universities and industry leaders, with support from the Computing Community Consortium (CCC), advocated for establishing a national roadmap to boost robotics development in the United States. It was during this time that the idea of celebrating robotics advancements through National Robotics Week (held annually during the second full week of April) was conceived. The objective of National Robotics Week (RoboWeek) is simple — to spark interest in robotics among students and to share the excitement of robotics with people of all ages.

To contribute to National Robotics Week, we thought of starting a series of articles that will serve as a beginner’s guide to robotics. In this series, we will explore tools and concepts while building a project on an AMR (Autonomous Mobile Robot) designed for a warehouse environment.

We will mostly focus on ROS2 and Gazebo Ignition for development and testing, and for now we will not touch the hardware side of things. The roadmap of this series looks like the following:

  • Introduction to Robotics: A Comprehensive Guide to Robotics for Beginners (this article)
  • Getting Started with ROS2 and Gazebo Ignition: Building an Autonomous AMR Robot
  • Introduction to SLAM: Robot Localization in an Unknown Environment
  • Introduction to Planning and Control: Getting the Robot from Start to Goal
  1. Why Should You Learn Robotics in 2024 – Introduction To Robotics
  2. Life Cycle of a Robotics Project
  3. Core Components of Robotic Automation
    1. Hardware
    2. Sensors
    3. Robot Perception
    4. Deep Learning for Robot Vision
    5. Motion Planning
    6. Robot Control
  4. Robotics Tools
  5. Job Opportunities in Robotics
  6. Advanced Research in Robotics
  7. Conclusion

Why Should You Learn Robotics in 2024 – Introduction To Robotics

At some point in our lives, we all had a bit of Bob the Builder in us, and each of us has likely entertained the idea of building a robot. Whether it’s extracting a motor from a toy car to install in a toy boat or constructing a remote-controlled car capable of running errands, the spirit of innovation has stirred within us all.

Some of us followed through and built robots, possibly during high school or college, while others did not. For those who didn't build a robot, it might have been because of limited resources, a lack of knowledge, or insufficient time to dedicate to a project. Now, in 2024, the barriers to entry are lower than ever before. The knowledge gap has narrowed, and computing and hardware no longer cost an arm and a leg.

Additionally, the opportunities in robotics are more significant than ever. If you’re still on the fence, let me try to convince you why you should consider pursuing a career as a robotics developer or researcher.

2024 is the Year of Robotics:

Robotics has been one of the fields in deep tech that has always made investors think twice before investing. Apart from that, the adoption of robotic solutions in households and workplaces has not been that prevalent because of low per capita income. But as a country's economy grows, average household income is expected to grow as well, which might result in more automation work being done by robots. I remember a talk by Jonathan Hurst at the ICRA Legged Robots Workshop, where he said,

By the time I retire, past that, by the time I really need help around the house, I hope it's an Agility Robotics robot that comes in and helps around the home as I'm 90 years old.

Well, we definitely need help around the house, even without being 90 years old, for chores that are exhausting and repetitive in nature. Fast forward to 2024, and we are witnessing significant growth in the robotics industry. Everyone is excited about the next big breakthrough in the field of robotics. Researchers and developers are working on integrating AI and foundation models with robotics, which is paving the way for a new era in robotics.

Foxglove DAUs are up 102% in Q1. The robotics industry has hit an inflection point over the past few months. Jensen's GTC keynote was all about robotics. Tesla is all in on robotics. Google DeepMind has been a pioneer in robotics for years. Microsoft is pumping funding into…

— Adrian Macneil (@adrianmacneil), April 1, 2024

Not only that, but since the beginning of 2024, we have already seen many significant investments made in the robotics industry. For instance, 1X Technologies and Figure AI, both well-known humanoid companies, raised investments worth $100 million and $675 million in January and March, respectively. Additionally, Wayve, a renowned self-driving car company from the UK, received a substantial investment of $1.05 billion in May for its work on autonomous mobility through embodied intelligence.

As the year progresses, we can expect to see a continued increase in investments. This financial boost will not only bring more robots into households for general-purpose tasks but also fuel the growth of the AI and robotics industries. As a whole, the outlook on robotics growth among developers, researchers, and investors is positive, and we hope 2024 turns out to be a transformative year in robotics, potentially setting a new standard for what robots can achieve and how they are integrated into our world.

Influence of NVIDIA GTC 2024: Making Robotics Mainstream

GTC '24 gave a significant boost to robot development, especially in the humanoid industry. It was the first time that the growth and development of robotics were prioritized on a global stage, and multiple robotics-related announcements were made. For instance, NVIDIA introduced a new general-purpose foundation model (Project GR00T) aimed at enhancing humanoid robot development. They also launched a new chip (Jetson Thor) tailored for humanoid robots and made major enhancements to their robotics simulation platform (Isaac). Additionally, NVIDIA announced new sets of robotics models, libraries, and hardware designed to further enrich robot functionality and integration.

Big Companies Joining The Robotics Race

  • Apple is considering home robots as its next big product line.
  • Tesla is more than halfway through developing its humanoid robot, Optimus.
  • Hugging Face is hiring a Robotics Engineer from the Tesla Optimus Team.

Recently, the Open Source Robotics Foundation (OSRF) launched the Open Source Robotics Alliance (OSRA), a new initiative that aims to strengthen the governance and oversight of open-source robotics software projects. OSRA plans to follow in the footsteps of the Linux Foundation and the Eclipse Foundation through a mixed-membership and meritocratic model.

The Robotics Industry Needs More Engineers

The robotics industry, traditionally facing a talent shortage, is expanding as more companies launch robotics divisions and deep tech startups to meet the growing demand for robots. However, recruiting skilled personnel remains a challenge. This need is evident from the numerous inquiries from both current and prospective learners at OpenCV University seeking comprehensive robotics courses.

Growing Robotics Communities and Resources

Despite the long history of autonomous vehicle development, the field only became well known to the public after the DARPA Grand Challenge in 2005, when researchers shifted their focus to developing autonomous vehicles; during 2010-2015, a lot of companies got funded to build self-driving cars for public use. As time progressed, available computation and the applications of robots increased, which fueled further growth in research and open-source development. Conferences like ICRA, IROS, and CASE became more popular, and new conferences dedicated to particular subfields were organized. Open-source communities around ROS, OpenCV, Arduino, TensorFlow, and PyTorch started to surface.

One good thing about having a community is that it becomes easier to get help when you are stuck. Robotics competitions like FIRST Robotics, RoboCup, and the DARPA Challenges inspired innovation and practical problem-solving skills. Many lectures and tutorials on robotics are being openly published by institutes and communities. On top of that, the post-pandemic era has seen huge growth in robotics resource development.

Life Cycle of a Robotics Project

The life cycle of a robotics project involves several distinct phases, each crucial to the successful development and deployment of a robotic system. Here’s a breakdown of the typical stages:

  • System Architecture: This stage involves defining the robot’s overall structure and design, including selecting its major hardware and software components and laying out its communication systems and dataflow.
  • Component Selection: Key components such as sensors, actuators, controllers, and computational hardware are selected based on requirements like performance, power consumption, cost, and compatibility.
  • Modelling in 3D: The physical design of the robot is modeled using 3D CAD software such as Fusion 360, Solidworks, etc. This helps visualize the robot’s structure, optimize the design, and prepare for fabrication.
  • Building the Hardware: This phase involves the actual assembly of the robot’s hardware components, including the mechanical build and the electronic circuit integration.
  • Working on Perception, Planning, and Control: Development of the software that enables the robot to perceive its environment, plan its actions, and control its movements. This includes implementing algorithms for calibration, localization, object detection, path planning, and motion control.
  • Simulations: The robot is simulated within the Gazebo simulation environment before physical testing. The robot’s 3D model is converted into a URDF file that is used in the simulation software. This allows testing the robot’s performance in a controlled virtual environment to predict how it will act in the real world.
  • Testing:
    • Unit testing: individual components or modules of the robot are tested to ensure each function works as intended in isolation.
    • Integration testing is conducted to ensure that integrated components function together as expected. This phase checks for data flow and interaction errors between modules.
    • System testing covers both hardware and software. The complete system is tested to verify that it meets all specified requirements, including performance, safety, and reliability testing.
  • Containerization and Deployment: The software components are packaged into containers using tools like Docker to ensure consistency across different development, testing, and production environments. The robot is then deployed in its target environment where it will operate.

This structured approach ensures a thorough development process, from initial design to deployment, making it possible to efficiently handle complex robotics projects while ensuring high performance and reliability standards.

Core Components of Robotic Automation

There are four important components of robotic automation:

  • Hardware and Sensors
  • Robot Perception
  • Motion Planning
  • Robot Control

We will be going through each of them in depth in the upcoming sections.

Hardware

The technical aspects of robotics design are multifaceted, covering a range of hardware components tailored to specific tasks and environments. Here’s a detailed overview focusing on the architecture design for processing units, application-specific hardware components and actuators/motors:

  • Architecture Design for CPU, GPU, and DSP: Robotics systems often require complex computational architectures that include CPUs, GPUs, and DSPs (Digital Signal Processors). The CPU handles general-purpose processing tasks and orchestrates the operation of other hardware components. GPUs are crucial for processing intensive parallel tasks quickly, especially for image and video analyses in robotics. DSPs are used for real-time processing of audio, video, and control sensor data, optimizing tasks that require high-speed numeric calculations. A well-integrated architecture ensures that these components work harmoniously to deliver the desired performance, balancing computational power, energy efficiency, and real-time processing capabilities.
  • Hardware for Application-Specific Components: Robots designed for specific applications like Unmanned Aerial Vehicles (UAVs) and Autonomous Vehicles (AVs) incorporate specialized hardware. UAVs, for example, include lightweight, high-strength materials and components like GPS modules, altimeters, and gyroscopes for navigation and stability. AVs incorporate a suite of sensors such as LiDAR, radar, and cameras, coupled with advanced computational hardware to support autonomous navigation systems that process vast amounts of data in real time to make split-second decisions.
  • Actuators / Motors: Actuators and motors are the muscles of robots, converting electrical energy into mechanical motion. Precision servomotors are commonly used for their ability to accurately control angular or linear position, velocity, and acceleration. Hydraulic or pneumatic actuators are employed for robots requiring more substantial force, albeit at the cost of control complexity and setup bulkiness. The selection of actuators and motors largely depends on the required force, speed, accuracy, and power efficiency for the robot’s intended tasks.


Sensors

As human beings, we have multiple sensory organs that assist us in sensing different aspects of the world: vision, hearing, smell, taste, and touch. Beyond these five fundamental senses, we can also feel changes in temperature and balance ourselves on different terrains. Another sense, proprioception, is mediated by proprioceptors, mechanosensory neurons located within the muscles, tendons, and joints.

So, now the question is – how are robots able to sense the real-world like humans? Perception through stereo vision is one way, but what about the other sensory inputs?

This brings us to sensors: devices with built-in mechanical, electrical, or chemical features that allow them to sense different aspects of the real-world environment. Based on the type of sensory mechanism, sensors can be classified as:

  • Proximity Sensors: These sensors detect the presence or absence of objects near the robot. They often use infrared, ultrasonic, or laser technologies to measure the distance to an object. It can be further classified into two sub-sections:
    • Inductive Proximity Sensors – Inductive proximity sensors are devices used to detect the presence of metallic objects without physical contact. They operate on the principle of electromagnetic induction, generating an oscillating magnetic field that changes when a metal object comes near.
    • Capacitive Proximity Sensors – These sensors are designed to detect both metallic and non-metallic objects, including liquids and granular materials, through changes in capacitance caused by the presence of an object within their sensing field.
  • Vision Sensors: These include cameras and computer vision systems that allow robots to interpret visual information. They can be used for object recognition, navigation, and environment mapping.
  • Force Sensors: These sensors measure the force and torque applied in different directions. They are commonly used in robotic arms to adjust the strength of movement and ensure safety during interactions with objects and humans.
  • Inertial Measurement Unit (IMU): IMUs combine accelerometers and gyroscopes (and often magnetometers) to measure linear acceleration and angular velocity. They are fundamental in robotics for tasks requiring precise movement and orientation, such as in autonomous vehicles, drones, and humanoid robots.
  • 3D LiDAR (Light Detection and Ranging): It uses laser pulses to capture the three-dimensional features of environments and objects accurately. It emits thousands to millions of laser pulses per second, measuring the time it takes for each pulse to return after striking an object. These time-of-flight (ToF) measurements enable LiDAR systems to calculate distances precisely, creating detailed 3D maps of the surroundings (a small worked example follows this list).
  • Global Positioning System (GPS): The Global Positioning System (GPS) is a satellite-based navigation system consisting of a network of at least 24 satellites that orbit the Earth, providing time and location information in all weather conditions, anywhere on or near the Earth where there is an unobstructed line of sight to four or more GPS satellites.
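To make the time-of-flight idea concrete, here is a minimal Python sketch of the range calculation a LiDAR performs for each pulse; the function name and the example timing value are illustrative, not taken from any particular sensor's datasheet.

```python
# Time-of-flight ranging: the pulse travels to the target and back,
# so the one-way distance is (speed of light x round-trip time) / 2.
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def tof_to_range(round_trip_time_s: float) -> float:
    """Convert a laser pulse's measured round-trip time into a one-way distance in meters."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

# A pulse returning after roughly 66.7 nanoseconds corresponds to a target about 10 m away.
print(f"{tof_to_range(66.7e-9):.2f} m")
```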

Robot Perception

“Build robot partners that augment the human workforce, ultimately enabling humans to be more human” – Agility Robotics

To achieve this, robots must be capable of operating effectively in real-world environments designed and constructed by humans for humans rather than relying on highly controlled or specialized settings.

Below are the two crucial parts of robot perception, and we’ll explore both in this article as well as in future articles:

  • Robot Navigation/Localization: The problem of estimating the robot’s pose (x, y, z, roll, pitch, yaw) on the map at each timestep. Roll, pitch, and yaw represent rotations about the X, Y, and Z axes, respectively.
  • Deep Learning: Understanding the semantic information coming from a camera mounted on a robot by performing object detection, tracking, segmentation, classification, etc

What is Robot Localization?

Robot localization is a fundamental aspect of robotics, involving the process by which a robot determines its own position and orientation within a given environment. A robot operating in an unfamiliar environment needs to understand/sense its surroundings and accurately determine its current location on a map in order to navigate toward its target location effectively.

Addressing these challenges requires a robust perception module. Localization, which determines the vehicle’s position relative to a global frame, cannot rely solely on GPS, because GPS does not provide centimeter-level accuracy. To achieve that, one needs to fuse LiDAR or cameras as primary sensors with IMUs, wheel encoders, radar, thermal cameras, and GPS as secondary sensors. If the vehicle is equipped with LiDAR, LiDAR SLAM (Simultaneous Localization and Mapping) is typically used. For vehicles with a stereo camera, Visual SLAM is applied.

Important Terms in Robot Localization:

Below are a few localization terms that are used extensively in the literature:

  • Pose: The robot’s actual location on the map, consisting of translation and rotation.
  • Odometry: Odometry involves estimating the robot’s movement relative to its previous position by using sensors that monitor the output of actuators, such as motor encoders (a minimal dead-reckoning sketch follows this list).
  • Localization: Localization is more intricately tied to map building and figuring out where you are on a given map.
  • Mapping: Mapping is the process of creating a 3D reconstruction of the surroundings from raw sensor data (camera, LiDAR).
  • Drift: Drift in SLAM, or localization in general, refers to the gradual accumulation of errors in a robot’s estimated position and orientation over time, which causes a shift in the estimated location. This is a big issue in SLAM/localization.
  • Loop Closure: Loop closure in SLAM (Simultaneous Localization and Mapping) refers to the process of recognizing a previously visited place within the map being constructed, thereby closing a loop. If we omit loop closures, SLAM essentially becomes odometry. Loop closure is used to mitigate drift and the problem of multiple registration of the same area.
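As a quick illustration of the odometry term above, here is a minimal dead-reckoning sketch for a differential-drive robot in plain Python; the function name, units, and the midpoint-heading approximation are illustrative assumptions rather than any specific library's API.

```python
import math

def update_odometry(x, y, theta, d_left, d_right, wheel_base):
    """One dead-reckoning step for a differential-drive robot.

    d_left / d_right are the wheel travel distances since the last update
    (typically derived from encoder ticks) and wheel_base is the distance
    between the two wheels; all lengths in meters, angles in radians.
    """
    d_center = (d_left + d_right) / 2.0          # forward motion of the robot body
    d_theta = (d_right - d_left) / wheel_base    # change in heading
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    theta = (theta + d_theta + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
    return x, y, theta

# Example: both wheels advance 0.10 m on a robot with a 0.30 m wheel base.
print(update_odometry(0.0, 0.0, 0.0, 0.10, 0.10, 0.30))  # straight-line motion: (0.1, 0.0, 0.0)
```

Because every step adds a little encoder and wheel-slip error, the estimate drifts over time, which is exactly the drift problem that loop closure helps correct.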

Past and present of SLAM:

The solution to the Simultaneous Localization and Mapping (SLAM) problem has been considered one of the notable achievements of the robotics community over the past decade. SLAM has been formulated and solved as a theoretical problem in various forms and has been implemented across a range of domains, from indoor robots to outdoor, underwater, and airborne systems. At a theoretical and conceptual level, SLAM can now be regarded as a solved problem. However, significant challenges remain in realizing more general SLAM solutions, particularly in building and utilizing perceptually rich maps that can handle diverse real-world environments.

Past Developments in SLAM:

The concept of SLAM was first introduced at a robotics conference in 1986, during a period when the use of probability in robotics and AI was just beginning. Early work by researchers like Smith, Cheeseman, and Durrant-Whyte laid the foundations for SLAM by developing a statistical approach that helped understand how robots could determine their location and map their environment despite uncertainties in measurements. They and others also showed how sensors like sonar and cameras could aid robot navigation.

Despite early progress, SLAM research encountered significant hurdles. Initially, it was believed that mapping errors in robot-created maps would keep growing, leading researchers to seek temporary fixes. However, a significant discovery in 1995 showed that these errors could be controlled and reduced. It was found that understanding the relationship between different landmarks in the environment was key to improving a robot’s accuracy in estimating its position. This breakthrough, detailed in a survey paper that also coined the term ‘SLAM’, led to advancements in making these systems more efficient and better at integrating new data with existing maps.

Figure 10 shows the first real-time implementation of visual SLAM using stereo cameras, by Andrew Davison of the Robotics Research Group at Oxford. The video was recorded in the Thom Building at Oxford.

Present Developments in SLAM:

LiDAR SLAM: SLAM (Simultaneous Localization and Mapping) can be performed using different sensors, with cameras and LiDAR being the most effective. These sensors are used either individually or fused with others to improve accuracy. In LiDAR SLAM, there are various approaches, including 2D, 3D, and deep learning-based methods. Notable papers in these categories include:

  • Cartographer for 2D SLAM,
  • LOAM (Lidar Odometry and Mapping) and FLOAM (Fast LOAM) for 3D SLAM,
  • SuMa++ for deep learning-based SLAM.

Difficulties: Challenges in LiDAR SLAM include high equipment costs, performance issues in degraded environments like long corridors and dusty or dynamic areas, as well as low frequency and increased computational demands.

Visual SLAM and Deep Visual SLAM: Visual SLAM, which uses cameras, can be categorized into mono/stereo SLAM, RGB-D SLAM, Dense, Semi Dense, Sparse SLAM and Deep Learning based SLAM.

  • Deep learning methods: Cube SLAM, Kimera and PoseNet have recently advanced visual SLAM, achieving results comparable or superior to traditional methods.
  • In visual SLAM, ORB-SLAM and ORB-SLAM2 are widely-used methodologies.
  • In Semi Dense and Dense Visual SLAM, SVO and DSO are important methodologies, respectively.
  • Kintinuous is a standout method in RGB-D SLAM.
  • The state-of-the-art Gaussian Splatting in 3D reconstructions inspired the Gaussian Splatting SLAM paper at CVPR 2024.

Difficulties: Visual SLAM faces challenges such as feature miss-identification, scale drift, dynamic object disruptions, constrained views, and high computational demands, while deep Visual SLAM struggles with low texture, high dynamic range, motion blur, dynamic changes, and latency issues.

Figure 12: Visual SLAM and Deep Visual SLAM

Sensor Fusion SLAM: Fusion techniques, which combine different types of sensors, have also been developed to enhance SLAM’s accuracy and reliability. Examples include Visual Inertial SLAM, LiDAR Inertial SLAM, and more complex setups like LiDAR Visual Inertial SLAM and Thermal Inertial SLAM. Noteworthy fusion-based SLAM methods include,

  • VLOAM for Laser-Visual SLAM,
  • LIO-mapping for Laser-IMU SLAM,
  • KTIO for thermal-IMU SLAM,
  • VINS-MONO for Visual-Inertial SLAM and
  • OpenVins for Visual-Inertial SLAM.

These methods integrate data from multiple sensors to create more robust and accurate mapping and localization systems.

Difficulties: The key challenges in Multi Sensor Fusion SLAM include lack of adaptability, vulnerability to risks and constraints, reliable data association, and reliable synchronization.

Figure 13: Examples of Different Sensor Fusion SLAM Algorithms

Deep Learning for Robot Vision

Apart from localization, challenges such as object detection, identifying road regulatory elements, drivable area detection, and occupancy prediction can often be addressed using deep learning. However, deep learning struggles with generalization, and the transition from development to deployment is notably challenging. Large models can be slow to predict and inherently uncertain. To improve model generalization and reduce perception failures, active learning is employed, training the model on unique scenarios encountered in operation.

Yet, validating deep learning failures and selecting a small yet diverse dataset for effective training remains problematic. Techniques like pruning and distillation are used to reduce inference time, but deep learning models are still often underutilized due to their unreliability and slow adoption.

Deep learning has a wide range of applications in robot vision, from segmenting out the drivable region to detecting potential obstacles and identifying traffic signs and road markings. But deep learning solutions bring a number of research challenges that can be categorized along three conceptually orthogonal axes:

  • Learning
  • Embodiment
  • Reasoning

Learning

In robotic vision, deploying deep machine learning in dynamic, open-set environments presents unique challenges compared to controlled lab settings. Key to success in these conditions are incremental and active learning, which enable machine learning models to continually absorb new information and skills over time without forgetting previously acquired knowledge.

Embodiment

Embodiment, which involves understanding and leveraging both temporal and spatial aspects, is crucial in differentiating robotic vision from computer vision. It presents unique challenges that enhance perception, enable active vision, and utilize environmental manipulation to improve overall visual processing.

Figure 16: Embodiment Challenges for Robotic Vision (Source)

Reasoning:

Inspired by biological mechanisms, there are three fundamental challenges in reasoning for robotic vision systems: reasoning about the semantics and geometry of a scene, reasoning about objects within the scene, and jointly reasoning about both scene and object properties.

Figure 17: Reasoning Challenges for Robotic Vision (Source)

Deep Learning Components in Tesla Autopilot:

Here is one of the earliest videos showcasing what Tesla’s autopilot observes while on the road. The detection and segmentation tasks depicted are primarily performed using deep learning techniques. Here’s a breakdown of “This Is What Tesla’s Autopilot Sees On The Road.”

  • Drivable Area Segmentation: Tesla Vision detects the drivable area on the road and the different lane lines. Detecting the lane lines helps the car stay in its lane and not go off track.
  • Road Regulatory Element Detection: If you observe carefully, the model also detects regulatory elements such as road signs (for example, stop signs at intersections), traffic lights, and road markings (highlighted in orange). This helps the autopilot anticipate what is ahead on the road.
  • Vehicle Detection and Distance Estimation: One critical function is detecting incoming vehicles, estimating their distance, and predicting their future trajectory. The Tesla perception stack appears to excel in this area. Notice how the distance between the ego vehicle and the car ahead continues to decrease as it approaches the vehicle in front.
  • Tesla SLAM: The video below might be challenging to interpret at first glance, but if you look closely at the bottom right corner, you’ll see the SLAM output visualized. This component is crucial for any autonomous system, as it helps determine the vehicle’s current location and its intended destination.
  • Road Condition Identification and Planning Output: The text displayed on the image provides information about road conditions, such as whether the road is wet, if there is ongoing construction, if the road ahead is foggy, or if there are any obstacles like bumpers on the road.

Motion Planning

Given an initial state x(0) = x_start and a desired final state x_goal, find a time T and a set of controls such that the motion satisfies x(T) = x_goal and is collision-free for all t in [0, T].

Here, x_start = start position, x_goal = goal position, T = total time to reach the goal, and t = current time.

The motion planning problem can take various forms, such as planning a full trajectory with timing constraints or simply devising a collision-free geometric path. When motion planning is specifically used for trajectory generation, it is referred to as Path Planning. This involves navigating a search space to create a route from a starting point to a desired destination, effectively addressing the complexities of movement and timing in dynamic environments. This process is crucial in ensuring that robots can perform tasks efficiently and safely in their operational settings.

Important Terms in Motion Planning:

There are several key terms one needs to understand in motion planning:

  • Workspace: This is the physical environment in which the robot operates.
  • Configuration: This refers to a specific position and orientation of the robot within the workspace.
  • Configuration Space: This space describes all possible valid movements of the robot within the workspace. The configuration space corresponds to the robot’s degrees of freedom.
  • Trajectory: This term describes the robot’s location at any given time, including its movement speed and the patterns of acceleration or deceleration needed to effectively follow a designated path.
  • Obstacle Space: This space comprises areas within the workspace where the robot is unable to move, typically due to the presence of physical barriers.
  • Free Space: This is the set of all configurations that do not result in a collision with obstacles, allowing the robot to move freely.
  • Target Space: A subset of the free space, this is the goal region where the robot aims to reach. It defines the destination points within the workspace that the robot is trying to access.

Mathematics Behind Motion Planning:

Imagine you have a map of an environment as your workspace, and you know both your current location and your desired destination within that map. Your task is to find the shortest path to your goal. However, there's a catch: you must avoid colliding with any obstacles along the way.

Given the scenario described, how would you mathematically represent the problem and proceed to solve it?

The above problem can simply be thought of as an optimization problem, where we have a cost function that we want to minimize or maximize subject to a set of constraints. To understand the concept better, let's look at the general form of the problem.

x* = argmin_x Cost(x), subject to constraints c and d

Here,

  • x represents the current planned trajectory or path.
  • Cost(x) is the cost function defined over this path.
  • The objective is to find an optimal plan, x*, that minimizes this cost function, subject to a set of constraints, denoted as c and d.

These constraints can be thought of as representing various physical, operational, or other requirements that the optimal plan must satisfy.

With this knowledge, let's apply it to the problem of finding the shortest path, where the distance between the start and end points is minimized. We can use the Euclidean distance as a cost function, taking the current and goal positions as inputs. Additionally, to enforce the constraint of avoiding obstacles, we can introduce a penalty whenever the robot gets too close to or collides with an obstacle. Let's write this mathematically:

Cost(x) = EuclideanDistance(position_current, position_goal)

Constraint = Artificial Potential Field (APF), where the APF acts as an obstacle-collision penalty

And we will be minimizing the Cost Function, based on the APF constraint.
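Here is a small Python sketch of this formulation: the path cost is the Euclidean length of the path plus a standard repulsive-potential (APF-style) penalty near obstacles. The gain, influence radius, and function names are illustrative assumptions, not values from a specific planner.

```python
import math

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def apf_penalty(point, obstacles, influence_radius=1.0, gain=10.0):
    """Repulsive potential: large when close to an obstacle, zero beyond the influence radius."""
    penalty = 0.0
    for obs in obstacles:
        d = euclidean(point, obs)
        if d < influence_radius:
            penalty += gain * (1.0 / max(d, 1e-6) - 1.0 / influence_radius) ** 2
    return penalty

def path_cost(path, goal, obstacles):
    """Sum of step lengths, plus distance to goal at the path's end, plus APF obstacle penalties."""
    steps = sum(euclidean(a, b) for a, b in zip(path, path[1:]))
    return steps + euclidean(path[-1], goal) + sum(apf_penalty(p, obstacles) for p in path)

# Compare two candidate paths around a single obstacle at (1, 0).
goal = (2.0, 0.0)
obstacles = [(1.0, 0.0)]
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # passes right through the obstacle
detour   = [(0.0, 0.0), (1.0, 1.2), (2.0, 0.0)]   # goes around it
print(path_cost(straight, goal, obstacles), path_cost(detour, goal, obstacles))
```

In this toy comparison, the straight path through the obstacle receives a huge penalty, while the detour costs only a little extra distance, so a planner minimizing this cost picks the detour.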

Types of Path Planners:

The path planning module ensures the robot navigates safely in dynamic settings. Its main goal is to direct the robot from its starting location to the destination without colliding with any obstacles, subject to vehicle motion constraints. Typically, path planning is divided into two parts,

  • Global Planner
  • Local Planner

What is a Global Planner?

Using a static global map along with specified start and end positions for the robot, the global planner creates a collision-free route from the start point to the destination. However, this generated global path does not accommodate moving objects and remains unchanged.

What is a Local Planner?

The local path planner enhances the global path by adjusting each segment (small parts of the path) to account for dynamically moving objects. This hierarchical approach to path planning brings numerous practical benefits. Given a map and the robot's start and goal positions, a global path can be estimated. However, it is insufficient on its own due to several dynamic factors, such as:

  • Moving obstacles
  • Replacement of static elements
  • Road regulatory element changes
  • Uneven road surfaces and so on

Classification of Planning Algorithms:

There are different algorithms introduced for path planning such as:

Graph Search Based Planning Algorithms:

Graph-search-based algorithms can be divided into depth-first search, breadth-first search, and best-first search.

  • The depth-first search algorithm builds a search tree as deep and as fast as possible from origin to destination until a proper path is found.
  • Breadth-first search, on the other hand, grows the search tree as broadly and as quickly as possible toward the target goal.
  • The best-first search algorithm assigns a numerical value or cost to each node and edge within the search tree, using these values to guide the search by determining whether the tree should expand and which branch to extend next.
  • Examples: Dijkstra's algorithm, A*, DFS, BFS, Bidirectional Search, etc. (a compact A* sketch follows this list).
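As a concrete illustration of best-first search with a heuristic, here is a minimal A* sketch on a 4-connected occupancy grid in Python; the grid, unit step costs, and Manhattan-distance heuristic are illustrative choices, not part of any particular planning library.

```python
import heapq
import itertools

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free, 1 = obstacle),
    with Manhattan distance as an admissible heuristic for unit step costs."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    tie = itertools.count()                    # tie-breaker so the heap never compares nodes
    open_set = [(h(start), next(tie), 0, start, None)]
    came_from, g_score = {}, {start: 0}
    while open_set:
        _, _, g, current, parent = heapq.heappop(open_set)
        if current in came_from:               # already expanded with an equal or better cost
            continue
        came_from[current] = parent
        if current == goal:                    # walk parent links back to the start
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        r, c = current
        for nbr in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nbr[0] < rows and 0 <= nbr[1] < cols and grid[nbr[0]][nbr[1]] == 0:
                ng = g + 1
                if ng < g_score.get(nbr, float("inf")):
                    g_score[nbr] = ng
                    heapq.heappush(open_set, (ng + h(nbr), next(tie), ng, nbr, current))
    return None                                # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))  # path goes around the obstacle row
```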

Sampling Based Planning Algorithms:

Sampling-based path planners randomly connect points within the state space, forming a graph to identify obstacle-free paths. These algorithms achieve efficiency by not requiring exploration of the entire configuration space. Users have the ability to adjust the number of iterations for generating small branches, which impacts the optimization of paths. However, they face challenges in navigating tight spaces, where random sampling struggles to establish connectivity. Below are various types of sampling-based path planners.

  • Examples: Rapidly-exploring Random Tree (RRT), Probabilistic Roadmap Method (PRM), RRT Star (RRT*), Batch Informed Trees Star (BIT*) and so on.
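To show what randomly connecting points within the state space looks like in practice, below is a bare-bones RRT sketch in Python for a 2D world with circular obstacles; the step size, goal bias, and world bounds are arbitrary illustrative values, not a production planner.

```python
import math
import random

def rrt(start, goal, obstacles, step=0.5, max_iters=2000, goal_tol=0.5, bounds=(0.0, 10.0)):
    """Bare-bones RRT in a 2D box world with circular obstacles [(cx, cy, radius), ...]."""
    def collides(p):
        return any(math.hypot(p[0] - cx, p[1] - cy) <= r for cx, cy, r in obstacles)

    nodes, parents = [start], {0: None}
    for _ in range(max_iters):
        # Sample a random point, with a small bias toward the goal.
        sample = goal if random.random() < 0.1 else (
            random.uniform(*bounds), random.uniform(*bounds))
        # Find the nearest existing node and steer toward the sample by one step.
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0:
            continue
        new = (near[0] + step * (sample[0] - near[0]) / d,
               near[1] + step * (sample[1] - near[1]) / d)
        if collides(new):
            continue
        nodes.append(new)
        parents[len(nodes) - 1] = i_near
        if math.dist(new, goal) < goal_tol:      # close enough: trace the branch back
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parents[i]
            return path[::-1]
    return None

path = rrt(start=(1.0, 1.0), goal=(9.0, 9.0), obstacles=[(5.0, 5.0, 1.5)])
print(len(path) if path else "no path found")
```

A real implementation would also check each edge (not just the new node) for collisions and use a smarter nearest-neighbor structure; RRT* additionally rewires the tree toward shorter paths. The core sample-nearest-steer loop, however, is the same.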

Intelligent bionic-based Planning Algorithms:

Another crucial category of global path planning techniques is the intelligent bionic-based method, which is a type of intelligent algorithm that mimics the evolutionary behaviors of insects. These bionic-based methods draw inspiration from natural phenomena and the collective intelligence exhibited by various species in nature. They employ a population-based approach, where a set of candidate solutions is iteratively refined through processes that mimic natural selection, swarm intelligence, or foraging behavior.

  • Examples: Genetic Algorithm (GA), Ant Colony Optimization (ACO), Artificial Bee Colony Algorithm (ABC), and Particle Swarm Optimization Algorithm (PSO).

Another approach is ML-based path planning, where models leverage machine learning algorithms to generate optimal paths for robots or autonomous vehicles navigating complex environments. ML-based methods include supervised learning models such as SVMs, LSTMs, and CNNs, as well as search techniques like MCTS. Supervised learning algorithms like CNNs are competent only at static obstacle avoidance via one-step prediction, so they cannot cope with time-sequential obstacle avoidance. RL algorithms, e.g., value-based RL, fit time-sequential tasks; typical examples include Q-learning, Nature DQN, Double DQN, and Dueling DQN. Later, policy-gradient RL methods were introduced, with algorithms such as DDPG, PPO, and TRPO.
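To ground the RL part of this, here is a tiny tabular Q-learning sketch that learns to reach a goal cell on a toy grid; the grid size, rewards, and hyperparameters are made-up illustrative values, and practical planners use function approximation (DQN and its variants) rather than a table.

```python
import random
from collections import defaultdict

# Tabular Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]       # right, left, down, up
GOAL, SIZE = (3, 3), 4
alpha, gamma, epsilon = 0.1, 0.95, 0.2
Q = defaultdict(float)

def step(state, action):
    nxt = (min(max(state[0] + action[0], 0), SIZE - 1),
           min(max(state[1] + action[1], 0), SIZE - 1))
    return nxt, (10.0 if nxt == GOAL else -1.0), nxt == GOAL

for _ in range(2000):                               # training episodes
    s = (0, 0)
    for _ in range(100):                            # cap episode length
        a = random.choice(ACTIONS) if random.random() < epsilon else \
            max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next, r, done = step(s, a)
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next
        if done:
            break

# Greedy rollout from the start after training.
s, path = (0, 0), [(0, 0)]
while s != GOAL and len(path) < 20:
    s, _, _ = step(s, max(ACTIONS, key=lambda act: Q[(s, act)]))
    path.append(s)
print(path)                                         # with enough training, this typically ends at GOAL
```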

Robot Control

Control in robotics is all about programming and setting up systems that help robots act on their own, or with just a little help from us. It’s crucial for making sure robots move smoothly and do their jobs right, often using feedback systems like closed-loop control. This means the robot constantly adjusts its actions based on what its sensors are telling it. Techniques like PID (Proportional, Integral, Derivative) control are the secret sauce that lets robots hit their marks with great precision and stability.

In the world of modern robotics, we’re also weaving in some smart tech like machine learning and adaptive control to tackle more complicated and ever-changing environments. This boosts the robot’s ability to learn from what’s around it and tweak its actions over time. All of this is powered by robust computing systems that quickly crunch sensor data and whip up control commands, ensuring that robots can keep up with rapid changes around them.

What is PID?

PID is a widespread and effective control strategy used in robotics and various engineering applications to maintain a desired output or process condition. The controller's objective is to minimize the error between a desired setpoint and the actual process variable by adjusting control inputs. A minimal implementation sketch follows the three terms below.

  • Proportional (P): This component produces an output proportional to the current error value. The proportional response can be adjusted by changing the proportional gain. A higher gain increases the control system’s response speed but can also lead to an unstable system with excessive overshooting if set too high.
  • Integral (I): The integral term is concerned with the accumulation of past errors. If the process variable has been away from the setpoint for a period of time, the integral term increases, thus helping to eliminate the residual steady-state error that occurs with a pure proportional controller. However, too much integral action can lead to instability and oscillatory behavior.
  • Derivative (D): The derivative part of the PID controller addresses the rate of change of the error, providing a prediction of future error based on its current rate. This helps in dampening the system’s response and reduces the overshoot and settling time. However, derivative control is sensitive to noise in the error measurement, so it must be carefully tuned to avoid excessive response to transient errors.
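Putting the three terms together, here is a minimal PID controller sketch in Python driving a toy first-order plant; the gains, timestep, and plant model are illustrative assumptions, not a tuned real-world controller.

```python
class PID:
    """Minimal PID controller: output = Kp*e + Ki*integral(e) + Kd*de/dt."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt                      # I: accumulate past error
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt  # D: rate of change
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Drive a toy first-order "wheel speed" plant toward a 1.0 m/s setpoint.
pid = PID(kp=2.0, ki=1.0, kd=0.1)
speed, dt = 0.0, 0.05
for _ in range(400):                                     # simulate 20 seconds
    command = pid.update(setpoint=1.0, measurement=speed, dt=dt)
    speed += (command - speed) * dt                      # toy plant dynamics, not a real motor model
print(round(speed, 3))                                   # should settle close to the 1.0 setpoint
```

Tuning the three gains is the practical work: the proportional term sets responsiveness, the integral term removes steady-state error, and the derivative term damps overshoot.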

Robotics Tools

As a developer, it's essential to grasp both hardware and software. We will not go into depth about the technical skills required on the hardware side, but it's essential to know the primary electrical components and tools, such as resistors, capacitors, wires, diodes, LEDs, power supplies (batteries), microcontrollers (Arduino, Raspberry Pi, ESP32, etc.), different kinds of motors, relays, switches, breadboards, multimeters, soldering tools, sensors, AC-DC converters, etc. Having a grasp of the concept of power rating is also essential.

  • Robot Operating System (ROS): The Robot Operating System is middleware that enables communication between software and hardware. It is developed by the open-source organization Open Robotics, and ROS development has been going on for well over a decade now. The project maintains a range of robotics software and libraries that are in active development and extensively used in industry and academia (a minimal ROS2 node in Python is sketched after this list).
  • Linux: Experience working on a Linux-based OS, such as Ubuntu, Arch, Manjaro, etc., is essential.
  • Programming languages: It's very important to know C++ and Python, because you will most often use ROS through these two languages. Recently, many companies and open-source organizations have focused on building Rust-based tools, and there is also a Rust client library for ROS.
  • Computer Vision and Deep Learning tools: OpenCV, Boost, Eigen, Open3D, PCL, and LibTorch/PyTorch are fundamental tools. Boost provides general-purpose C++ utilities (including multithreading), Eigen handles linear algebra, OpenCV is the computer vision library, PCL handles point clouds, and LibTorch (C++) / PyTorch is used for deep learning model inference. OpenVINO and TensorRT can be used for model quantization and optimization.
  • Perception Tools: The SLAM robotics community extensively uses GMapping, ORB-SLAM2, slam_toolbox, etc. For LiDAR localization and mapping, LOAM and FLOAM are used extensively. The robot_calibration ROS package provides tools to calibrate the intrinsic and extrinsic parameters of various sensors mounted on a robot, as well as the kinematics of robotic arms.
  • Motion Planning Tool: ROS2 has its own planning library called Navigation2. There is also MoveIt 2 for robotic arm manipulation. OMPL (Open Motion Planning Library) is often used in conjunction with ROS2 via integration tools like MoveIt 2 for complex motion planning tasks. Behavior Trees (BTs) are a powerful and flexible tool used in robotics, AI, and game development for controlling the execution of actions and decision-making processes.
  • Control Tools: ros2_control is a highly versatile framework designed for controlling hardware in ROS2.
  • Modelling Software: Different robot 3D modeling software is used in industry, such as Solidworks, AutoCAD, Unity, etc.
  • Simulation Software: Gazebo is one of the most popular software when it comes to robotics simulations, used extensively in academia and industry. Carla, SUMO are very popular for Autonomous Vehicles. For Reinforcement Learning simulators, there are Pybullet, MuJoCo, Isaac Sim etc.
  • Visualization Tools: People extensively use Rviz2, Foxglove, Rerun, PlotJuggler, etc., to visualize sensor data or instructions coming from the robot.
  • AI + Robotics: There has been significant progress in integrating AI with robotics, giving rise to a distinct field of research known as Robot Learning. TDMPC, ALOHA, and Mobile ALOHA are among the significant works in the field, focusing on high-precision manipulation tasks. Recently, Hugging Face open-sourced a library called LeRobot, which implements these and is currently under active development.
  • Testing and Deployment Software: Pytest and CppUTest are extensively used for unit testing, typically run in CI (for example, with GitHub Actions) after code is pushed to GitHub. Docker plays a significant role in robotics deployment, and many teams use Ansible for automating deployment tasks. Grafana and Prometheus are used for resource monitoring, such as CPU usage, memory consumption, and network traffic.
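For reference, here is what a minimal ROS2 node looks like in Python using rclpy: a publisher that sends a string message once per second. It assumes a sourced ROS2 installation with rclpy and std_msgs available; the node and topic names are arbitrary.

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import String


class Talker(Node):
    def __init__(self):
        super().__init__('talker')
        self.publisher = self.create_publisher(String, 'chatter', 10)
        self.timer = self.create_timer(1.0, self.tick)   # fire once per second
        self.count = 0

    def tick(self):
        msg = String()
        msg.data = f'hello from the AMR #{self.count}'
        self.publisher.publish(msg)
        self.count += 1


def main():
    rclpy.init()
    node = Talker()
    try:
        rclpy.spin(node)
    finally:
        node.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()
```

Run it inside a sourced ROS2 workspace and inspect the output with `ros2 topic echo /chatter`; the same Node/publisher/timer pattern underlies most of the perception, planning, and control nodes discussed above.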


Job Opportunities in Robotics

Robotics has fairly well-defined boundaries between robotics development and robotics research. As a developer, you mainly need to know the algorithms you are using, when to use which, and how to integrate them with the rest of the system. It is the researcher's job to develop new algorithms and figure out solutions for unique problems.

Robotics has a wide range of applications in different fields, such as Automobile, Agriculture, Aerospace, Military, Healthcare, Manufacturing, Logistics, Service robots etc. And the job roles differ from company to company based on their product. Generally, people working on robotics projects have to either work on the hardware or the software part.

Exploring job opportunities in robotics offers a variety of exciting roles. The responsibilities of an Embedded Engineer include C/C++ programming for embedded systems, firmware development, Linux, and communication protocols like I2C and SPI. An Electronic Engineer focuses on circuit design, PCB layout, and sensor integration. A Robotics Software Engineer develops frameworks and deploys robotics stacks, while a Perception Engineer handles sensor calibration, SLAM, and computer vision tasks. A Motion Planning Engineer specializes in motion planning algorithms and reinforcement learning, and a Control-System Engineer executes planned paths using control algorithms like PID controllers. Firmware Engineers are responsible for proficiency in C/C++, ARM architecture, and electronics schematics. Each role offers unique challenges and opportunities in the dynamic field of robotics.

People Working at the core of robotics can be classified into:

  • Hardware:
    • Embedded Engineer
    • Mechanical Engineer
    • Electronic Engineer
  • Software:
    • Robotics Software Engineer
      • C++/CUDA Developer
      • Devops Engineer
    • Perception Engineer
    • SLAM/Localization Engineer
    • Deep Learning & Computer Vision Engineer
    • Motion Planning Engineer
    • Control-System Engineer
    • Locomotion Engineer
    • Research Scientist
    • Firmware Engineer
    • Simulation Engineer
    • Design Engineer

Advanced Research in Robotics

Large language models have emerged as one of the most significant research fields in recent years, solving a lot of adjacent problems and bringing us a little closer to AGI. Models such as GPT-3, GPT-4, Llama, and Mistral have been trained on massive corpora of text, images, and videos, giving the models both a breadth-wise and a depth-wise view of the world. This development has sparked new thinking in robotics.

For machines, the things that are easy for us are harder to learn, and the things that are hard for us are easier to learn. For example, it is much easier for a robot to learn to solve a quadratic equation than to open a door. This is one of the motivations behind robot learning.

Let's look at how this development has progressed over time.

The first iteration of work used LLMs for human-robot collaboration through language, in the SayCan paper published by the Google robotics team. The idea was to plan sub-tasks for a given high-level task, taking advantage of the reasoning capability of an LLM. It was a success, and through this work a few things became clear:

  • Strength of LLMs, Reasoning: Large models are quite adept at reasoning, and reasoning is crucial for robotics.
  • Semantic Knowledge: Large models act as the common-sense knowledge bases robots need to act in the world, aligning the local and global context of a scene.

But there were quite a few challenges as well, such as:

  • Embodied Grounding: What is the robot capable of? What state is the environment in?
  • Interactive Decision Making: Robotics is naturally interaction-rich.
  • Low Amount and Quality of Data: Expert data in robotics is hard to come by.
  • Vision and Sensing: Robotic sensing modalities don't align with many foundation models.
  • Safety Critical: Safety in robotics is crucial and often involves hard constraints.

Then came PaLM-E, a multimodal embodied language model that takes multimodal sentences interleaving visual inputs, continuous state estimates, and text, and performs multiple embodied tasks, including sequential robotic manipulation planning, visual question answering, and captioning. At this point we are not only relying on language models; we are also integrating large vision models into the loop.

Later, LM-Nav was introduced, which combined three different foundation models for robot navigation: a visual navigation model (VNM), a vision-language model (VLM), and a large language model (LLM).

One of the notable works by Dieter Fox from that time addresses object picking and manipulation in unconstrained, cluttered environments with the help of SAM, the segmentation foundation model from Meta.

Figure 26: Contact-GraspNet Paper by Nvidia

Vision and text could provide the semantic and reasoning capability, but they carry no information about basic physics.

That means if the instruction is to pick up a bottle, the model needs to know how far the actuators should move, how tightly the gripper should hold the object, and so on.

To embed this information into the model, a lot of data-collection work has started and new models are being published. This kind of vision-language model augmented with action information is called a VLA (vision-language-action) model; 3D-VLA by Gan et al. is one notable work.

Conclusion

As we continue to push the boundaries of what’s possible in robotics, the importance of continuous learning and adaptation becomes evident. The field is set for significant growth and innovation, driven by ongoing research, investment, and collaboration across the tech community.

Join us as we continue to explore this fascinating field, uncovering new technologies and methodologies that will shape the future of robotics and automation. Let’s embrace the challenges and opportunities that lie ahead, steering towards a future where robots and humans collaborate seamlessly to enhance capabilities and improve lives.

