Getting-real-world-coordinates-from-image-frame


== Why do we need it? ==
Our goal was to convert coordinates in the image to real-world coordinates, i.e. we wanted to know an object's position relative to the robot's position. This is necessary to make the robot understand where objects are relative to it. More generally speaking, this is also needed to figure out the absolute coordinates on the football field.


== Pinhole camera model ==
We used the pinhole camera model.
I am not going to describe all of the theory here; this is the main formula that describes how objects are projected onto the image plane:
<br>
[[File:Pinhole_Camera_Model.png]]
<br>
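For reference, written in the notation of the OpenCV documentation, the projection is:

<math>s\begin{bmatrix}u\\ v\\ 1\end{bmatrix} = A\,[R|t]\begin{bmatrix}X\\ Y\\ Z\\ 1\end{bmatrix}, \qquad A = \begin{bmatrix}f_x & 0 & c_x\\ 0 & f_y & c_y\\ 0 & 0 & 1\end{bmatrix}</math>

where (X, Y, Z) is a point in world coordinates, (u, v) is its projection on the image in pixels, A is the camera (intrinsic) matrix with focal lengths f<sub>x</sub>, f<sub>y</sub> and principal point (c<sub>x</sub>, c<sub>y</sub>), [R|t] is the rotation-translation (extrinsic) matrix and s is a scale factor.
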
There are good enough resources available for learning about this model (start with [http://en.wikipedia.org/wiki/Pinhole_camera_model wiki] and [https://www.youtube.com/watch?v=uhP3jrxraMk udacity]); here we focus more on the overall idea, the problems we had and the tools we used.


== How to map coordinates from 3D to 2D? ==
To be able to convert between the two coordinate systems, we need to know the intrinsic parameters and extrinsic parameters of the camera. The former describe how any real-world object is projected onto the camera's image sensor; they consist of parameters such as the camera's focal length, principal point and skew of the image axes. The latter give information about the camera's pose in the observed environment (3 rotations and 3 translations, as we live in a 3-dimensional world). To convert a 2-dimensional screen point into a 3-dimensional world point, we also need the extra assumption that all the objects that interest us lie on a specific plane (e.g. on the floor, where Z=0).
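Under that plane assumption the projection equation simplifies: the third column of the rotation matrix drops out, and only the plane coordinates X and Y (plus an unknown scale factor s) remain on the world side:

<math>s\begin{bmatrix}u\\ v\\ 1\end{bmatrix} = A\,[R|t]\begin{bmatrix}X\\ Y\\ 0\\ 1\end{bmatrix} = A\,[r_1\; r_2\; t]\begin{bmatrix}X\\ Y\\ 1\end{bmatrix}</math>

where r<sub>1</sub> and r<sub>2</sub> are the first two columns of the rotation matrix R. This is the relation that is later inverted to go from image coordinates back to the floor plane.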
 
== How did we find out the camera parameters and the pose? ==
We based our calibration system on the OpenCV implementation of an algorithm that matches screen coordinates with known real-world coordinates and estimates the required parameters from many different observations. Because we only need to find those parameters once, we did not see a need to develop anything more complex for that. OpenCV has functions specially designed for finding the camera matrix (intrinsic parameters) and the rotation-translation matrix (extrinsic parameters), which we found useful (see the [http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html documentation] for calibrateCamera(), findChessboardCorners(), drawChessboardCorners() and projectPoints(); the [http://docs.opencv.org/doc/tutorials/calib3d/camera_calibration/camera_calibration.html tutorial] might also be useful). If you are interested in the inner workings of these algorithms, you can read the documentation or the source code.
After obtaining these parameters, we were able to project real-world 3D points onto the image plane. But as this wasn't our goal (we wanted 2D -> 3D), we had to reverse this equation.
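Below is a minimal sketch of the calibration step using the functions listed above (the chessboard dimensions, square size and image file names are made-up placeholders, not our actual setup):

<syntaxhighlight lang="cpp">
#include <opencv2/core/core.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <vector>

int main()
{
    // Placeholder pattern: 8x6 inner corners, 2.5 cm squares.
    cv::Size boardSize(8, 6);
    float squareSize = 0.025f;

    // Known 3D corner positions on the chessboard plane (Z = 0).
    std::vector<cv::Point3f> boardCorners;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            boardCorners.push_back(cv::Point3f(x * squareSize, y * squareSize, 0.f));

    std::vector<std::vector<cv::Point3f> > objectPoints; // 3D points, one set per view
    std::vector<std::vector<cv::Point2f> > imagePoints;  // detected 2D corners per view
    cv::Size imageSize;

    // Collect corner detections from several calibration images.
    for (int i = 0; i < 20; ++i) {
        cv::Mat img = cv::imread(cv::format("calib_%02d.png", i), 0); // grayscale
        if (img.empty()) continue;
        imageSize = img.size();

        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(img, boardSize, corners)) {
            cv::cornerSubPix(img, corners, cv::Size(11, 11), cv::Size(-1, -1),
                cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.01));
            cv::drawChessboardCorners(img, boardSize, corners, true); // visual check
            imagePoints.push_back(corners);
            objectPoints.push_back(boardCorners);
        }
    }

    // calibrateCamera() estimates the intrinsic parameters (camera matrix,
    // distortion) plus one rotation and translation per view; the return
    // value is the RMS reprojection error.
    cv::Mat cameraMatrix, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     cameraMatrix, distCoeffs, rvecs, tvecs);

    // Sanity check: reproject the known board corners for the first view.
    if (!rvecs.empty()) {
        std::vector<cv::Point2f> reprojected;
        cv::projectPoints(boardCorners, rvecs[0], tvecs[0],
                          cameraMatrix, distCoeffs, reprojected);
    }
    return rms < 1.0 ? 0 : 1; // arbitrary acceptance threshold
}
</syntaxhighlight>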
 
== How to map 2D points to 3D? ==
There was a bit of chaos and many “wasted” days when reversing this equation. We had problems with inverting matrices: we found that OpenCV's Mat::inv() didn't give good enough results for most matrices; maybe OpenCV's pseudo-inverse was buggy, or maybe we were using it wrongly. We tried the same numbers with Octave's pinv and got pretty reasonable results, so we recommend being cautious when pseudo-inverting matrices in OpenCV.
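For clarity, this is the kind of pseudo-inverse call we are referring to (shown only as an illustration of the route we eventually abandoned; we are not certain this exact variant was the culprit):

<syntaxhighlight lang="cpp">
#include <opencv2/core/core.hpp>

// P is the 3x4 projection matrix (camera matrix * [R|t]), type CV_64F.
// cv::invert with DECOMP_SVD computes a pseudo-inverse for non-square input;
// this is the route that gave us numbers we did not trust compared to
// Octave's pinv, so we ended up not using it.
cv::Mat pseudoInverse(const cv::Mat& P)
{
    cv::Mat Pinv;
    cv::invert(P, Pinv, cv::DECOMP_SVD); // 4x3 pseudo-inverse
    return Pinv;
}
</syntaxhighlight>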


TODO: Dig deeper, what was the problem with not being able to invert those matrices.<br>
TODO: A picture of previous formula altered.


In the end, it turned out that it was really easy to solve the equation.<br><br>


When we assume that Z=0 and (a)<sub>3,4</sub> is the camera matrix multiplied by the rotation-translation matrix, we get the following system of equations.
<br>
[[File:Inv_eq.png]]
<br>
We don't know the values of s, X and Y. As we have three unknowns and three equations, we can just solve the equation system and obtain our world coordinates X and Y. We implemented it using [http://en.wikipedia.org/wiki/Cramers_rule Cramer's rule].
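A sketch of that back-projection step (here P stands for the 3×4 matrix obtained by multiplying the camera matrix with the rotation-translation matrix; variable names are illustrative):

<syntaxhighlight lang="cpp">
#include <opencv2/core/core.hpp>

// Determinant of a 3x3 matrix given by its rows.
static double det3(const cv::Vec3d& a, const cv::Vec3d& b, const cv::Vec3d& c)
{
    return a[0] * (b[1] * c[2] - b[2] * c[1])
         - a[1] * (b[0] * c[2] - b[2] * c[0])
         + a[2] * (b[0] * c[1] - b[1] * c[0]);
}

// Back-project an image point (u, v) to the ground plane Z = 0.
// P is the 3x4 projection matrix (camera matrix * [R|t]), type CV_64F.
// The unknowns are X, Y and the scale s; with Z = 0 the projection equation
//   s*[u v 1]^T = P * [X Y 0 1]^T
// becomes three linear equations:
//   P(0,0)*X + P(0,1)*Y - u*s = -P(0,3)
//   P(1,0)*X + P(1,1)*Y - v*s = -P(1,3)
//   P(2,0)*X + P(2,1)*Y - 1*s = -P(2,3)
// which we solve with Cramer's rule.
cv::Point2d imageToGround(const cv::Mat& P, double u, double v)
{
    cv::Vec3d r0(P.at<double>(0,0), P.at<double>(0,1), -u);
    cv::Vec3d r1(P.at<double>(1,0), P.at<double>(1,1), -v);
    cv::Vec3d r2(P.at<double>(2,0), P.at<double>(2,1), -1.0);
    cv::Vec3d rhs(-P.at<double>(0,3), -P.at<double>(1,3), -P.at<double>(2,3));

    double D = det3(r0, r1, r2);

    // Cramer's rule: replace the column of each unknown with the right-hand side.
    double Dx = det3(cv::Vec3d(rhs[0], r0[1], r0[2]),
                     cv::Vec3d(rhs[1], r1[1], r1[2]),
                     cv::Vec3d(rhs[2], r2[1], r2[2]));
    double Dy = det3(cv::Vec3d(r0[0], rhs[0], r0[2]),
                     cv::Vec3d(r1[0], rhs[1], r1[2]),
                     cv::Vec3d(r2[0], rhs[2], r2[2]));

    return cv::Point2d(Dx / D, Dy / D); // world coordinates (X, Y) on the plane
}
</syntaxhighlight>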


== Changing plane rotation while robot is walking ==
We noticed some quite heavy tilting in the image while Nao was moving; in numbers it was approximately +5...-5 degrees around the axis orthogonal to the image frame (TODO: something more convincing needed as a figure). Visually, it looked a lot like this - have a [https://www.youtube.com/watch?v=9PlHgYVYTgQ look].


The angle of Nao's torso can be easily measured with [http://www.aldebaran-robotics.com/documentation/naoqi/core/almemory-api.html?highlight=memoryproxy#ALMemoryProxy::getData__ssCR AL::ALValue ALMemoryProxy::getData("device")]. Next we assumed that Nao's head rotates by the same amount as its torso, i.e. we treated the connection between the torso and the head as fixed.
Having obtained the angle, we had to make the camera's pose dependent on the robot's rotation. After some thinking we found that the camera's rotation matrix has to be modified according to Nao's torso rotation in a somewhat more involved way, as the two do not have a simple linear correlation.
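A rough sketch of the compensation, assuming the tilt is read from the inertial sensor key in ALMemory (the exact key, axis and sign depend on the setup, and the simple one-axis rotation below is only the starting point, not the full correction described above):

<syntaxhighlight lang="cpp">
#include <alproxies/almemoryproxy.h>
#include <opencv2/core/core.hpp>
#include <cmath>

// Read the torso tilt angle (radians) from ALMemory.
// The inertial sensor key below is an assumption; substitute whichever
// device key you actually read.
double torsoTilt(AL::ALMemoryProxy& memory)
{
    float angle = memory.getData("Device/SubDeviceList/InertialSensor/AngleY/Sensor/Value");
    return angle;
}

// Rotate the calibrated extrinsic rotation matrix R (3x3, CV_64F) by the
// current tilt around one camera axis. This is only the naive correction;
// as noted above, the real relation between the torso angle and the needed
// camera rotation is not a simple linear one.
cv::Mat compensateTilt(const cv::Mat& R, double tilt)
{
    double c = std::cos(tilt), s = std::sin(tilt);
    cv::Mat Rx = (cv::Mat_<double>(3, 3) <<
                  1, 0,  0,
                  0, c, -s,
                  0, s,  c);
    return Rx * R;
}
</syntaxhighlight>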


== Performance ==
TODO:
