A while ago I built a computer input device using a laser pointer and regular usb web camera.

It was a pretty simple setup, and I used a lot of existing tools as a jumping off point. Here’s a writeup of my work and details for how to replicate it and what I learned.

First, a video overview:

Materials

At a minimum:

A web camera
A laser pointer

Optionally:

A projector

Technically speaking, the laser is completely optional. In fact, during testing I just had a desktop computer with the camera pointed at a sheet of paper taped to a wall, and I drew with the laser pointer on that sheet of paper and used that as an input device. With the projector, you can turn the setup into a more direct input, as your point translates directly to a point on a screen. But that’s just a bonus. This can be done without the projector.

Physical Setup

Take the camera, point it at something. Shine the laser inside the area where the camera can see. That’s it in a nutshell. However, there are some additional considerations.

First, the more direct the camera, the more accurate it will be. If the camera is off to the side, it will see a skewed wall, and because one side is closer than the other, it’s impossible to focus perfectly, and one side of the wall will be more precise than the far side of the wall. Having the camera point as directly as possible at the surface is the best option.

Second, the distance to the surface matters. A camera that is too far from the surface may not be able to see a really small laser point. A camera that is too close will see a very large laser point and will not have great resolution. It’s a tradeoff, and you just have to experiment to find the best distance.

Third, if using a projector, the camera should be able to see slightly more than the projected area. A border of a few inches up to a foot is preferable, as this space can actually be used for input even if it’s not in the projected space.

Fourth, it’s important to control light levels. If there are sources of light in the view of the camera, such as a lamp or window, then it is very likely the algorithm will see those points as above the threshold, and will try to consider them part of the laser pointer (remember white light is made up of red, green, and blue, so it will still be above the red threshold). Also, if using a projector, the laser pointer has to be brighter than the brightest the projector can get, and the threshold has to be set so that the projector itself isn’t bright enough to go over the threshold. And the ambient light in the room can’t be so bright that the threshold has to be really high and thus the laser pointer isn’t recognized. Again, there are a lot of tradeoffs with the light levels in a room.

Software Packages

I wrote my software in Java. There are two libraries that I depended on heavily:

http://java.sun.com/javase/technologies/desktop/media/jmf/2.1.1/download.html Java Media Framework (JMF) 2.1.1e – This is used for capturing the individual frames of the camera.
https://jai.dev.java.net/binary-builds.html Java Advanced Imaging (JAI) – This is used for doing the affine transforms that translate from camera coordinates to another coordinate system.

The JAI library is not entirely essential, as you could decide not to translate your coordinates, or you could perform your affine transform math to do it and eschew the large library that will go mostly unused. The neat thing about this transform, though, is that it allows for the camera to be anywhere, and as long as it can see the desired area, it will take care of transforming to the correct coordinates. This is very convenient.

The JMF library exists for Windows, Linux, and Mac. I was able to get it working in Windows, but wasn’t able to get it completely working in Linux (Ubuntu Jaunty as of this writing), and I don’t have a Mac to test on.

Basic Theory

The basic theory behind the project is the following; a laser pointer shines on a surface. The web camera is looking at that surface. Software running on a computer is analyzing each frame of that camera image and looking for the laser pointer. Once it finds that pointer, it converts the camera coordinates of that point into screen coordinates and fires an event to any piece of software that is listening. That listening software can then do something with that event. The simplest example is a mouse emulator, which merely moves the mouse to the correct coordinates based on the location of the laser.

Implementation Theory

To implement this, I have the JMF library looking at each frame. I used this frameaccess.java example code as a starting point. When looking at each frame, I only look at the 320×240 frame, and specifically only at the red value. Each pixel has a value for red, green, and blue, but since this is a red laser I’m looking at, I don’t really care about anything but red. I traverse the entire frame and create a list of any pixels above a certain threshold value. These are the brightest pixels in the frame and very likely a laser pointer. Then I average the locations of these points and come up with a single number. This is very important, and I’ll describe some of the effects that this has later. I take this point and perform the affine transform to convert it to screen coordinates. Then I have certain events that get fired depending on some specific things:

Laser On: the laser is now visible but wasn’t before.
Laser Off: the laser is no longer visible.
Laser Stable: the laser is on but hasn’t moved.
Laser Moved: the laser has changed location.
Laser Entered Space: the laser has entered the coordinate space (I’ll explain this later)
Laser Exited Space: the laser is still visible, but it no longer maps inside the coordinate space.

For most of these events, the raw camera coordinates and the transformed coordinates are passed to the listeners. The listeners then do whatever they want with this information.

Calibration

Calibration is really only necessary if you are using the coordinate transforms. Essentially, the calibration process consists of identifying four points and mapping camera coordinates to the other coordinates. I wrote a small application that shows a blank screen and prompts the user to put the laser point at each of the prompted circles, giving the system a mapping at known locations. This writes the data to a configuration file which is used by all other applications. As long as the camera and projector don’t move, calibration does not need to be done again.

Here is a video of the calibration process.

The Code

Here is the camera.zip (3.7mb). It includes the JAI library, the base laser application, the calibrator, and an example application that just acts as a mouse emulator.

Below are a couple snippets of the important stuff.

This first part is the code used to parse each frame and find the laser point, then fire the appropriate events.

/**
* Callback to access individual video frames. This is where almost all of the work is done.
*/
void accessFrame(Buffer frame) {
/***************************************************************************************/
/********************************Begin Laser Detection Code*****************************/
/***************************************************************************************/
// Go through all the points and set them to an impossible number
for (int i = 0;i<points.length;i++){
points[i].x = -1;
points[i].y = -1;
}
int inc = 0; //set our incrementer to 0
byte[] data = (byte[])frame.getData(); //grab the frame data
for (int i = 0;i<data.length;i+=3){//go through the whole buffer (jumping by three because we only want the red out of the RGB
//if(unsignedByteToInt(data[i+2])>THRESHOLD && unsignedByteToInt(data[i+1])<LOWERTHRESHOLD && unsignedByteToInt(data[i+0])<LOWERTHRESHOLD && inc<points.length){//if we are above the threshold and our incrementer is below the maximum number of points
if(unsignedByteToInt(data[i+2])>THRESHOLD && inc<points.length){//if we are above the threshold and our incrementer is below the maximum number of points
points[inc].x = (i%(3*CAMERASIZEX))/3; //set the x value to that coordinate
points[inc].y = i/(3*CAMERASIZEX); //set the y value to the right line
inc++;
}
}
//calculate the average of the points we found
ave.x = 0;
ave.y = 0;
for (int i=0;i<inc;i++){
if (points[i].x!=-1){
ave.x+=points[i].x;
}
if (points[i].y!=-1){
ave.y+=points[i].y;
}
//System.out.println(points[i].x + "," + points[i].y);
}
//System.out.println("-------------------");
if (inc>3){//if we found enough points that we probably have a laser pointer on the screen
ave.x/=inc;//finish calculating the average
ave.y/=inc;
PerspectiveTransform mytransform = PerspectiveTransform.getQuadToQuad(mapping[0].getX(), mapping[0].getY(),
mapping[1].getX(), mapping[1].getY(), mapping[2].getX(), mapping[2].getY(), mapping[3].getX(), mapping[3].getY(),
correct[0].getX(), correct[0].getY(), correct[1].getX(), correct[1].getY(), correct[2].getX(), correct[2].getY(), correct[3].getX(), correct[3].getY());
Point2D result = mytransform.transform(new Point(ave.x,ave.y),null);
in_space = !(result.getX()<0 || result.getY() < 0 || result.getX() > SCREENSIZEX || result.getY() > SCREENSIZEY);
if (!on){
fireLaserOn(new LaserEvent(result, new Point(ave.x, ave.y), last_point, last_raw_point,in_space));
on = true;
}
if (in_space && !last_in_space){
fireLaserEntered(new LaserEvent(result, new Point(ave.x, ave.y), last_point, last_raw_point,true));
}
//                System.out.println(result.getX() + "," + result.getY());
//                System.out.println(last_point.getX() + "," + last_point.getY());
//                System.out.println("----------------------");
if (result.getX()!=last_point.getX() || result.getY()!=last_point.getY()){
fireLaserMoved(new LaserEvent(result, new Point(ave.x, ave.y), last_point, last_raw_point,in_space));
}
else{
fireLaserStable(new LaserEvent(result, new Point(ave.x, ave.y), last_point, last_raw_point,in_space));
}
if (!in_space && last_in_space){
fireLaserExited(new LaserEvent(result, new Point(ave.x, ave.y), last_point, last_raw_point,false));
}
last_time = 0;
last_point = new Point2D.Double(result.getX(), result.getY());
}
else if (last_time==5){//if it's been five frames since we last saw the pointer, then it must have disappeared
if (in_space){
fireLaserExited(new LaserEvent(-1,-1, ave.x, ave.y, (int)last_point.getX(), (int)last_point.getY(), (int)last_raw_point.getX(), (int)last_raw_point.getY(),in_space));
}
fireLaserOff(new LaserEvent(-1,-1, ave.x, ave.y, (int)last_point.getX(), (int)last_point.getY(), (int)last_raw_point.getX(), (int)last_raw_point.getY(),in_space));
on = false;
in_space = false;
}
if (ave.x>0 || ave.y>0 && ave.x<CAMERASIZEX && ave.y<CAMERASIZEY)
fireLaserRaw(new LaserEvent(-1,-1, ave.x, ave.y, -1,-1, (int)last_raw_point.getX(), (int)last_raw_point.getY(),in_space));
last_time++;//increment the last_time. usually it gets set to 0 every frame if the laser is there
last_raw_point = new Point(ave.x,ave.y);//set the last_point no matter what
last_in_space = in_space;
/**************************************************************************************/
/********************************End Laser Detection Code*****************************/
/*************************************************************************************/
}
public int unsignedByteToInt(byte b) {
return (int) b & 0xFF;
}

This next part is pretty standard code for adding event listeners. You can see which laser events are getting passed. I intentionally made it similar to how mouse listeners are used.

Vector<LaserListener> laserListeners = new Vector<LaserListener>();
public void addLaserListener(LaserListener l){
laserListeners.add(l);
}
public void removeLaserListener(LaserListener l){
laserListeners.remove(l);
}
private void fireLaserOn(LaserEvent e){
Enumeration<LaserListener> en = laserListeners.elements();
while(en.hasMoreElements()){
LaserListener l = (LaserListener)en.nextElement();
l.laserOn(e);
}
}
private void fireLaserOff(LaserEvent e){
Enumeration<LaserListener> en = laserListeners.elements();
while(en.hasMoreElements()){
LaserListener l = (LaserListener)en.nextElement();
l.laserOff(e);
}
}
private void fireLaserMoved(LaserEvent e){
Enumeration<LaserListener> en = laserListeners.elements();
while(en.hasMoreElements()){
LaserListener l = (LaserListener)en.nextElement();
l.laserMoved(e);
}
}
private void fireLaserStable(LaserEvent e){
Enumeration<LaserListener> en = laserListeners.elements();
while(en.hasMoreElements()){
LaserListener l = (LaserListener)en.nextElement();
l.laserStable(e);
}
}
private void fireLaserEntered(LaserEvent e){
Enumeration<LaserListener> en = laserListeners.elements();
while(en.hasMoreElements()){
LaserListener l = (LaserListener)en.nextElement();
l.laserEntered(e);
}
}
private void fireLaserExited(LaserEvent e){
Enumeration<LaserListener> en = laserListeners.elements();
while(en.hasMoreElements()){
LaserListener l = (LaserListener)en.nextElement();
l.laserExited(e);
}
}
private void fireLaserRaw(LaserEvent e){
Enumeration<LaserListener> en = laserListeners.elements();
while(en.hasMoreElements()){
LaserListener l = (LaserListener)en.nextElement();
l.laserRaw(e);
}
}

This algorithm is extremely basic and not robust at all. By just averaging the points above the threshold, I don’t take into consideration if there are multiple lasers on the screen. I also don’t filter out errant pixels that are above the threshold by accident, and I don’t filter out light sources that aren’t moving. A more robust algorithm would do a better job and possibly identify multiple laser pointers.
I’m not the first person that has done this, though from what I can tell this is the first post that goes into so much detail and provides code. I have seen other people do this using other platforms, and I have seen other people try to sell this sort of thing. In fact, this post is sort of a response to some people who think they can get away with charging thousands of dollars for something that amounts to a few lines of code and less than $40 in hardware.
Something I’d like to see in the future is a projector with a built in camera that is capable of doing this sort of thing natively, perhaps even using the same lens system so that calibration would be moot.
You may have seen references to it in this post already, but one thing I mention is having the camera see outside the projected area and how that can be used for additional input. Because the laser pointer doesn’t have buttons, its input abilities are limited. One way to get around this is to take advantage of the space outside the projected area. For example, you could have the laser act as a mouse while inside the projector area, but if the laser moves up and down the side next to the projected area it could act as a scroll wheel. In a simple paint application, I had the space above and below the area change the brush color, and the sides changed the brush thickness or changed input modes. This turns out to be extremely useful as a way of adding interactivity to the system without requiring new hardware or covering up the projected area. As far as I can tell, I haven’t seen anyone else do this.
I have seen laser pointers with buttons that go forward and backward in a slideshow and have a dongle that plugs into the computer. These are much more expensive than generic laser pointers but could be reused to make the laser pointer much more useful.
Just like a new mouse takes practice, the laser pointer takes a lot of practice. Not just smooth and accurate movement, but turning it on and off where you want to. Typically releasing the power button on the laser pointer will cause the point to move a slight amount, so if you’re trying to release the laser over a small button, that has to be considered so that the laser pointer goes off while over the right spot.
This was a cool project. It took some time to get everything working, and I’m throwing it out there. Please don’t use this as a school project without adding to it. I’ve had people ask me to give them my source code so they could just use it and not do anything on their own. That’s weak and you won’t learn anything, and professors are just as good at searching the web.
If you do use this, please send me a note at laser@bobbaddeley.com. It’s always neat to see other people using my stuff.
If you plan to sell this, naturally I’d like a cut. Kharma behooves you to contact me to work something out, but one of the reasons I put this out there is I don’t think it has a lot of commercial promise. There are already people trying to, and there are also people like me putting out open source applications.
If you try to patent any of the concepts described in this post, I will put up a fight. I have youtube videos, code, and witnesses dating back a few years, and there is plenty of prior art from other people as well.