Getting-real-world-coordinates-from-image-frame
== Why do we need it? ==
Our goal was to convert coordinates on the image into real-world coordinates, i.e. we wanted to know an object's position relative to the robot's position. This is necessary for the robot to understand where the objects are relative to it. More generally speaking, this is also needed to figure out the absolute coordinates on the football field.
== Pinhole camera model ==
We used the pinhole camera model.
I am not going to describe all the theory for that; the main formula below shows how objects are projected onto the screen.
<br>
[[File:Pinhole_Camera_Model.png]]
<br>
To learn about this model, there are good enough resources available (start with [http://en.wikipedia.org/wiki/Pinhole_camera_model wiki] and [https://www.youtube.com/watch?v=uhP3jrxraMk udacity]); here we focus more on the overall idea, the problems we had and the tools we used.
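For reference, the projection formula shown in the image above is conventionally written as follows (our rendering of the standard pinhole model, where s is a scale factor, (u, v) the screen point, K the camera matrix and [R|t] the rotation-translation matrix):
<br>
<math>s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = K \, [R \mid t] \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}</math>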
== How to map coordinates from 3D to 2D? ==
To be able to convert between the two coordinate systems, we need to know the intrinsic and extrinsic parameters of the camera. The former describes how any real-world object is projected onto the camera's light sensors; it consists of parameters such as the camera's focal length, principal point and skew of the image axes. The latter gives information about the camera's pose in the observed environment (3 rotations and 3 translations, as we live in a 3-dimensional world). To convert a 2-dimensional screen point into a 3-dimensional world point, we also need to make an extra assumption: all the objects that interest us lie on a specific plane (e.g. on the floor, where Z=0).
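For concreteness, the intrinsic parameters listed above are conventionally collected into the camera matrix K; a standard layout (with focal lengths f<sub>x</sub>, f<sub>y</sub> in pixels, principal point (c<sub>x</sub>, c<sub>y</sub>) and skew γ) looks like this:
<br>
<math>K = \begin{pmatrix} f_x & \gamma & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}</math>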
== How did we find out the camera parameters and the pose? ==
We based our calibration system on the OpenCV implementation of an algorithm that matches screen coordinates with known real-world coordinates and estimates the required parameters from many different observations. Because we only need to find those parameters once, we did not see a need to develop anything more complex for that. We found useful OpenCV functions specially designed for finding the camera matrix (intrinsic parameters) and the rotation-translation matrix (extrinsic parameters) (see the [http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html documentation] for calibrateCamera(), findChessboardCorners(), drawChessboardCorners() and projectPoints(); the [http://docs.opencv.org/doc/tutorials/calib3d/camera_calibration/camera_calibration.html tutorial] might also be useful). If you are interested in the inner workings of these algorithms, you can go and read the documentation or the source code.
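To illustrate how those functions fit together, here is a minimal calibration sketch. The function name, board dimensions and square size are our own placeholder assumptions, not something fixed by OpenCV:
<syntaxhighlight lang="cpp">
// A minimal calibration sketch, assuming a chessboard with 9x6 inner corners
// and 25 mm squares; calibrate() and its parameters are our placeholders.
#include <opencv2/opencv.hpp>
#include <vector>

void calibrate(const std::vector<cv::Mat>& frames,   // captured camera images
               cv::Mat& cameraMatrix, cv::Mat& distCoeffs,
               std::vector<cv::Mat>& rvecs, std::vector<cv::Mat>& tvecs)
{
    const cv::Size patternSize(9, 6);
    const float square = 25.0f;  // board square size in mm

    // Known real-world corner positions on the board plane (Z = 0).
    std::vector<cv::Point3f> boardPoints;
    for (int y = 0; y < patternSize.height; ++y)
        for (int x = 0; x < patternSize.width; ++x)
            boardPoints.push_back(cv::Point3f(x * square, y * square, 0.0f));

    // Detect the same board in every view; each detection is one observation.
    std::vector<std::vector<cv::Point3f> > objectPoints;
    std::vector<std::vector<cv::Point2f> > imagePoints;
    for (size_t i = 0; i < frames.size(); ++i) {
        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(frames[i], patternSize, corners)) {
            // (cv::drawChessboardCorners() is handy here for visual checking.)
            objectPoints.push_back(boardPoints);
            imagePoints.push_back(corners);
        }
    }

    // Estimates the camera matrix plus one rotation/translation per view.
    cv::calibrateCamera(objectPoints, imagePoints, frames[0].size(),
                        cameraMatrix, distCoeffs, rvecs, tvecs);
}
</syntaxhighlight>
cv::Rodrigues() can then convert any of the returned rvecs into a full 3×3 rotation matrix whenever the rotation-translation matrix is needed explicitly.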
With these parameters in hand, we were able to project real-world 3D points onto the image plane. But as this wasn't our goal (we wanted 2D -> 3D), we had to reverse this equation.
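For example, a quick sanity check of the calibration in the forward (3D -> 2D) direction might look like this (a usage sketch reusing the outputs of the calibration sketch above; the sample point is made up):
<syntaxhighlight lang="cpp">
// Project a known floor point (Z = 0, made-up coordinates in mm) back onto
// the image using the first view's pose from the sketch above.
std::vector<cv::Point3f> world(1, cv::Point3f(100.0f, 200.0f, 0.0f));
std::vector<cv::Point2f> screen;
cv::projectPoints(world, rvecs[0], tvecs[0], cameraMatrix, distCoeffs, screen);
// screen[0] should land where that floor point actually appears in the image.
</syntaxhighlight>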
== How to map 2D points to 3D? ==
There was a bit of chaos and many "wasted" days spent reversing this equation. We had problems with inverting matrices. We found that OpenCV's Mat::inv() didn't give good enough results for most of our matrices – maybe because OpenCV's pseudo-inverse was buggy, or we were using it wrongly. We tried the same numbers with Octave's pinv() and got pretty reasonable results, so we recommend being cautious with pseudo-inverting matrices in OpenCV.
TODO: Dig deeper into what exactly prevented us from inverting those matrices.<br>
TODO: A picture of the previous formula, altered.
In the end, it turned out that it was really easy to solve the equation.<br><br>
When we assume that Z=0 and let (a)<sub>3,4</sub> be the 3×4 matrix obtained by multiplying the camera matrix with the rotation-translation matrix, we actually get the following system of equations.
<br>
[[File:Inv_eq.png]]
<br>
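In case the image is unavailable, the system has the following shape (our reconstruction: since Z=0, the third column of the 3×4 matrix drops out, and a<sub>ij</sub> denotes its entries):
<br>
<math>s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{14} \\ a_{21} & a_{22} & a_{24} \\ a_{31} & a_{32} & a_{34} \end{pmatrix} \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix}</math>
<br>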
We don't know the values of s, X and Y. As we have three unknowns and three equations, we can just solve the equation system for the known screen coordinates u and v and obtain the world coordinates X and Y. We implemented it using [http://en.wikipedia.org/wiki/Cramers_rule Cramer's rule].
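A minimal sketch of that solver, assuming P already holds the 3×4 product of the camera matrix and the rotation-translation matrix (the function and helper names are our own):
<syntaxhighlight lang="cpp">
// Invert the projection on the Z = 0 plane with Cramer's rule.
#include <opencv2/opencv.hpp>

// Determinant of the 3x3 matrix whose columns are a, b and c.
static double det3(const cv::Vec3d& a, const cv::Vec3d& b, const cv::Vec3d& c)
{
    return a[0] * (b[1] * c[2] - b[2] * c[1])
         - b[0] * (a[1] * c[2] - a[2] * c[1])
         + c[0] * (a[1] * b[2] - a[2] * b[1]);
}

// Returns the world point (X, Y) on the Z = 0 plane that projects to (u, v).
cv::Point2d screenToWorld(const cv::Matx34d& P, double u, double v)
{
    // Columns 1, 2 and 4 of P; the Z column drops out because Z = 0.
    cv::Vec3d p1(P(0, 0), P(1, 0), P(2, 0));
    cv::Vec3d p2(P(0, 1), P(1, 1), P(2, 1));
    cv::Vec3d p4(P(0, 3), P(1, 3), P(2, 3));

    // Rearranging s*(u, v, 1)^T = p1*X + p2*Y + p4 into A*(X, Y, s)^T = -p4,
    // where the third column of A is -(u, v, 1)^T.
    cv::Vec3d q(-u, -v, -1.0);
    cv::Vec3d b(-p4[0], -p4[1], -p4[2]);

    double d = det3(p1, p2, q);   // must be nonzero for a unique solution
    double X = det3(b, p2, q) / d;
    double Y = det3(p1, b, q) / d;
    return cv::Point2d(X, Y);
}
</syntaxhighlight>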
== Changing plane rotation while robot is walking ==
We noticed some quite heavy tilting happening on the image while Nao was moving: in numbers, it was approximately +5...-5 degrees around the axis orthogonal to the image frame (TODO: something more convincing needed as a figure). Visually, it looked a lot like [https://www.youtube.com/watch?v=9PlHgYVYTgQ this].
The angle that Nao's torso rotates by can be easily measured with [http://www.aldebaran-robotics.com/documentation/naoqi/core/almemory-api.html?highlight=memoryproxy#ALMemoryProxy::getData__ssCR AL::ALValue ALMemoryProxy::getData("device")]. Next, we made the assumption that Nao's torso rotates the same amount as its head, i.e. we treated the connection between torso and head as fixed.
Having obtained the angle, we had to make the camera's pose depend on the robot's rotation. After some thinking, we found that we had to modify the camera's rotation matrix according to Nao's torso rotation in a slightly more involved way, as the two do not have a linear correlation.
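As a starting point, the simplest correction one can sketch is composing the calibrated rotation matrix with a rotation built from the measured angle. This is only a hedged sketch: the axis, sign and multiplication order are assumptions, and as noted above the real correction was more involved than a single fixed-axis rotation:
<syntaxhighlight lang="cpp">
// Fold a measured torso tilt (in radians) into the calibrated extrinsics.
#include <opencv2/opencv.hpp>
#include <cmath>

cv::Matx33d compensateTilt(const cv::Matx33d& R, double angle)
{
    // Rotation about the camera's horizontal axis by the measured tilt.
    cv::Matx33d tilt(1.0, 0.0,             0.0,
                     0.0, std::cos(angle), -std::sin(angle),
                     0.0, std::sin(angle),  std::cos(angle));
    return tilt * R;  // apply the tilt on top of the calibrated rotation
}
</syntaxhighlight>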
== Performance ==
TODO: