I like my giant blog post titles. Nyah.
As a remote worker for almost 5 years now, I live in video conferences. I feel really strongly about the power of seeing someone's face rather than just being a voice on a scratchy speaker phone. I've built an AutoAnswer Kiosk for Lync with some friends that you can get for free at http://lyncautoanswer.com (and read about the code here), I've got a BusyLight so the kids know I'm on a call, and the Holy Grail for the last few years has been a reliable Pan Tilt Zoom camera that I could control remotely.
Related Reading
A few years ago I super-glued a LifeCam camera to an Eagletron TrackerPod and built a web interface to it. I wanted to do this on the cheap as I can't afford (and my boss isn't into) a $1500 Panasonic IP Camera.
The Solution...er, the Problem
I have found my camera and built my solution. The Logitech BCC950 Conference Cam is the best balance between cost and quality and it's got Pan Tilt and (digital) Zoom functionality. The Zoom is less interesting to me than the motorized Pan Tilt.
Let's think about the constraints.
- A Logitech BCC950 PTZ camera is installed on a Windows machine in my office in Seattle.
- I'm anywhere. I'm usually in Portland but could be in a hotel.
- I may or may not be VPN'ed into work. This means I want to be able to communicate with the camera across networks, traverse NATs and generally not worry about being able to connect.
- I want to be able to control the camera in a number of ways, Web API, whatever, but ideally with cool buttons that are (or look) integrated with my corporate instant messaging system.
There's three interesting parts here, then.
- Can I even control the camera's PTZ functions programmatically?
- Can I relay messages across networks to the camera?
- Can I make a slick client interface easily?
Let's figure them out one at a time.
Can I even control the camera's PTZ functions programmatically?
I looked all over and googled my brains out trying to find an API to talk to the Logitech camera. I emailed the Logitech people and they told me that the camera would respond to DirectShow APIs. This means I can control the camera without any drivers!
MSDN showed me PROPSETID_VIDCAP_CAMERACONTROL which has an enumeration that includes things like:
This led me to this seven year old DirectShow .NET library that wraps the hardest parts of the DirectShow COM API. There's a little utility called GraphEdt.exe (GraphEdit) that you can get in the Windows SDK that lets you look at all the DirectShow-y things and devices and filters on your system.
This utility let me control the camera's Zoom but Pan and Tilt were grayed out! Why?
Turns out that this Logitech Camera supports only relative Pan and Tilt, not absolute. Whatever code creates this Properties dialog was never updated to support relative pan and tilt, but the API supports it via KSPROPERTY_CAMERACONTROL_PAN_RELATIVE!
That means I can send a start message quickly followed by a stop message to pan. It's not super exact, but it should work.
Here's the C# code for my move() method. Note the scandalous Thread.Sleep call.
private void MoveInternal(KSProperties.CameraControlFeature axis, int value)
{
    // Create and prepare data structures
    var control = new KSProperties.KSPROPERTY_CAMERACONTROL_S();
    IntPtr controlData = Marshal.AllocCoTaskMem(Marshal.SizeOf(control));
    IntPtr instData = Marshal.AllocCoTaskMem(Marshal.SizeOf(control.Instance));
    try
    {
        control.Instance.Value = value;
        //TODO: Fix for Absolute
        control.Instance.Flags = (int)CameraControlFlags.Relative;
        // fDeleteOld is false: the buffers are freshly allocated, so there is nothing old to destroy
        Marshal.StructureToPtr(control, controlData, false);
        Marshal.StructureToPtr(control.Instance, instData, false);
        // Start the motor along the requested axis
        var hr2 = _ksPropertySet.Set(PROPSETID_VIDCAP_CAMERACONTROL, (int)axis,
            instData, Marshal.SizeOf(control.Instance), controlData, Marshal.SizeOf(control));

        //TODO: It's a DC motor, no better way?
        Thread.Sleep(20);

        control.Instance.Value = 0; //STOP!
        control.Instance.Flags = (int)CameraControlFlags.Relative;
        Marshal.StructureToPtr(control, controlData, false);
        Marshal.StructureToPtr(control.Instance, instData, false);
        var hr3 = _ksPropertySet.Set(PROPSETID_VIDCAP_CAMERACONTROL, (int)axis,
            instData, Marshal.SizeOf(control.Instance), controlData, Marshal.SizeOf(control));
    }
    finally
    {
        // Free the unmanaged buffers even if one of the Set calls throws
        if (controlData != IntPtr.Zero) { Marshal.FreeCoTaskMem(controlData); }
        if (instData != IntPtr.Zero) { Marshal.FreeCoTaskMem(instData); }
    }
}
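The start-then-stop idea in that method can be looked at separately from the COM plumbing. What follows is only a minimal sketch, not the code from the PTZDevice library: `RelativePulse`, `Send` and the `setRelativeValue` delegate are hypothetical names, with the delegate standing in for the `_ksPropertySet.Set` call. The one design choice it illustrates is putting the stop message in a `finally` block, so the motor is told to stop even if something goes wrong after it starts.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class RelativePulse
{
    // Sends a non-zero relative value, waits, then always sends 0 to stop.
    // setRelativeValue is a hypothetical stand-in for the property-set call.
    public static void Send(Action<int> setRelativeValue, int direction, TimeSpan duration)
    {
        if (setRelativeValue == null) throw new ArgumentNullException("setRelativeValue");
        if (direction == 0) return; // nothing to start, so nothing to stop

        try
        {
            setRelativeValue(direction); // start moving
            Thread.Sleep(duration);      // let the motor run briefly
        }
        finally
        {
            setRelativeValue(0);         // STOP, even if the start call threw
        }
    }
}

static class PulseDemo
{
    static void Main()
    {
        // Record what would have been sent to the camera instead of touching hardware.
        var sent = new List<int>();
        RelativePulse.Send(sent.Add, -1, TimeSpan.FromMilliseconds(20));
        Console.WriteLine(string.Join(",", sent)); // prints "-1,0"
    }
}
```

Passing the setter as a delegate also makes the timing logic testable without a camera attached.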
All the code for this PTZDevice wrapper is here. Once that library was working, creating a little console app to move the camera around with a keyboard was trivial.
var p = PTZDevice.GetDevice(ConfigurationManager.AppSettings["DeviceName"], PTZType.Relative);
while (true)
{
    ConsoleKeyInfo info = Console.ReadKey();
    if (info.Key == ConsoleKey.LeftArrow)
    {
        p.Move(-1, 0);
    }
    else if (info.Key == ConsoleKey.RightArrow)
    {
        p.Move(1, 0);
    }
    else if (info.Key == ConsoleKey.UpArrow)
    {
        p.Move(0, 1);
    }
    else if (info.Key == ConsoleKey.DownArrow)
    {
        p.Move(0, -1);
    }
    else if (info.Key == ConsoleKey.Home)
    {
        p.Zoom(1);
    }
    else if (info.Key == ConsoleKey.End)
    {
        p.Zoom(-1);
    }
}
Also easy was a simple WebAPI. (I put the name of the camera to look for in a config file in both these cases.)
[HttpPost]
public void Move(int x, int y)
{
    var p = PTZDevice.GetDevice(ConfigurationManager.AppSettings["DeviceName"], PTZType.Relative);
    p.Move(x, y);
}

[HttpPost]
public void Zoom(int value)
{
    var p = PTZDevice.GetDevice(ConfigurationManager.AppSettings["DeviceName"], PTZType.Relative);
    p.Zoom(value);
}
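For illustration, here is roughly what calling that Move action from another .NET program might look like. This is a sketch under assumptions: the `api/camera/move` route, the `CameraApiClient` class and the localhost address are all hypothetical, since the post doesn't show the routing. It relies on ASP.NET Web API's default behavior of binding simple parameters like `int x` from the URI, so the POST carries no body.

```csharp
using System;
using System.Globalization;
using System.Net.Http;
using System.Threading.Tasks;

static class CameraApiClient
{
    // Builds the POST URI. The "api/camera/move" route is an assumption.
    public static Uri BuildMoveUri(Uri baseUri, int x, int y)
    {
        string relative = string.Format(
            CultureInfo.InvariantCulture, "api/camera/move?x={0}&y={1}", x, y);
        return new Uri(baseUri, relative);
    }

    // Posts with no body; x and y travel in the query string.
    public static async Task MoveAsync(HttpClient client, Uri baseUri, int x, int y)
    {
        using (var response = await client.PostAsync(BuildMoveUri(baseUri, x, y), null))
        {
            response.EnsureSuccessStatusCode();
        }
    }
}

static class CameraApiDemo
{
    static void Main()
    {
        // Hypothetical address of the machine hosting the Web API.
        var baseUri = new Uri("http://localhost:8080/");
        Console.WriteLine(CameraApiClient.BuildMoveUri(baseUri, -1, 0).AbsoluteUri);
    }
}
```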
At this point I've got the camera moving LOCALLY. Next, I mail it to Damian (my office buddy) in Seattle and he hooks it up to my office computer. But I need something to control it running on THAT machine...and talking to what?
Can I relay messages across networks to the camera?
Here's the architecture. Since I can't talk point to point between wherever I am and wherever the camera is, I need a relay. I could use a Service Bus Relay which would be great for something larger but I wanted to see if I could make something even simpler.
Since Azure lets me have 10 free websites and automatically supports SSL via a wildcard cert for sites at the *.azurewebsites.net domain, it was perfect for what I needed. I need SSL because it's the only way to guarantee that my traffic won't be affected by corporate proxy servers.
There's three parts. Let's start in the middle. What's the Relay look like? I'm going to use SignalR because it will let me not only call methods easily and asynchronously but, more importantly, it will abstract away the connection details from me. I'm looking to relay messages over a pseudo-persistent connection.
So what's the code look like for a complex relay system like this? ;)
using System;
using SignalR.Hubs;

namespace PTZSignalRRelay
{
    public class RelayHub : Hub
    {
        public void Move(int x, int y, string groupName)
        {
            Clients[groupName].Move(x, y); //test
        }

        public void Zoom(int value, string groupName)
        {
            Clients[groupName].Zoom(value);
        }

        public void JoinRelay(string groupName)
        {
            Groups.Add(Context.ConnectionId, groupName);
        }
    }
}
Crazy, eh? That's it. Clients call JoinRelay with a name. The name is the name of the computer with the camera attached. (More on this later.) This means that this single relay can handle effectively any number of clients. When a client calls the Relay with a message and a group name, the relay then broadcasts to the clients that have joined that group name.
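To make the group idea concrete without any SignalR in the picture, here's a minimal in-memory model of the same behavior. It is only a sketch: `InMemoryRelay` and the machine names are hypothetical, and plain delegates stand in for connected clients. It shows that a message sent to one group name reaches every member of that group and nobody else.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the hub; no SignalR involved.
sealed class InMemoryRelay
{
    private readonly Dictionary<string, List<Action<int, int>>> _groups =
        new Dictionary<string, List<Action<int, int>>>();

    // Registers a callback under a group name, like a client joining the relay.
    public void JoinRelay(string groupName, Action<int, int> onMove)
    {
        List<Action<int, int>> members;
        if (!_groups.TryGetValue(groupName, out members))
        {
            members = new List<Action<int, int>>();
            _groups[groupName] = members;
        }
        members.Add(onMove);
    }

    // Broadcasts to every member of the group; returns how many were notified.
    public int Move(int x, int y, string groupName)
    {
        List<Action<int, int>> members;
        if (!_groups.TryGetValue(groupName, out members)) return 0;
        foreach (var member in members) member(x, y);
        return members.Count;
    }
}

static class RelayDemo
{
    static void Main()
    {
        var relay = new InMemoryRelay();
        relay.JoinRelay("OFFICE-PC", (x, y) => Console.WriteLine("office camera: Move({0},{1})", x, y));
        relay.JoinRelay("LAB-PC", (x, y) => Console.WriteLine("lab camera: Move({0},{1})", x, y));

        // Only the OFFICE-PC member hears this one.
        relay.Move(1, 0, "OFFICE-PC");
    }
}
```

One thing the model makes visible: anything that joins a group hears every broadcast to that group, so a sender that has also joined would receive its own messages.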
Can I make a slick client interface easily?
I created a super basic WPF app that's just a transparent window with buttons. In fact, the background isn't white or black, it's transparent. It's a SolidColorBrush that is all but invisible. It's not totally transparent or I wouldn't be able to grab it with the mouse!
The buttons use the .NET SignalR library and call it like this.
HubConnection connection = null;
IHubProxy proxy = null;
string remoteGroup;
string url;

private void MainWindow_MouseDown(object sender, MouseButtonEventArgs e)
{
    if (e.ChangedButton == MouseButton.Left)
        this.DragMove();
}

private async void MoveClick(object sender, RoutedEventArgs e)
{
    var ui = sender as Control;
    Point p = Point.Parse(ui.Tag.ToString());
    await proxy.Invoke("Move", p.X, p.Y, remoteGroup);
}

private async void ZoomClick(object sender, RoutedEventArgs e)
{
    var ui = sender as Control;
    int z = int.Parse(ui.Tag.ToString());
    await proxy.Invoke("Zoom", z, remoteGroup);
}

private async void MainWindow_Loaded(object sender, RoutedEventArgs e)
{
    url = ConfigurationManager.AppSettings["relayServerUrl"];
    remoteGroup = ConfigurationManager.AppSettings["remoteGroup"];
    connection = new HubConnection(url);
    proxy = connection.CreateProxy("RelayHub");
    await connection.Start();
    await proxy.Invoke("JoinRelay", remoteGroup);
}
The client app just needs to know the name of the computer with the camera it wants to control. That's the "GroupName" or in this case, from the client side, the "RemoteGroup." Then it knows the Relay Server URL, like https://foofooserver.azurewebsites.net. The .NET client uses async and await to make the calls non-blocking so the UI remains responsive.
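For reference, the two settings the buttons app reads could live in an App.config along these lines. The key names come from the code above and the URL is the example address from the text; the `OFFICE-PC` value is a made-up machine name, so treat this as a sketch rather than the actual file.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <appSettings>
    <!-- URL of the SignalR relay (example address from the text) -->
    <add key="relayServerUrl" value="https://foofooserver.azurewebsites.net" />
    <!-- Hypothetical name of the computer that has the camera attached -->
    <add key="remoteGroup" value="OFFICE-PC" />
  </appSettings>
</configuration>
```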
Here's a bunch of traffic going through the Relay while I was testing it this afternoon, as seen by the Azure Dashboard.
The client calls the Relay and the Relay broadcasts to connected clients. The Remote Camera Listener responds to the calls. We get the machine name, join the relay and set up two methods that will respond to Move and Zoom.
The only hard thing we ran into (Thanks David Fowler!) was that the calls to the DirectShow API actually have to be on a UI thread rather than a background thread, so we have to get the current SynchronizationContext and post our messages with it. This results in a little indirection but it's not too hard to read. Note the comments.
private async void MainWindow_Loaded(object sender, RoutedEventArgs e)
{
    var deviceName = ConfigurationManager.AppSettings["DeviceName"];
    device = PTZDevice.GetDevice(deviceName, PTZType.Relative);
    url = ConfigurationManager.AppSettings["relayServerUrl"];
    remoteGroup = Environment.MachineName; //They have to hardcode the group, but for us it's our machine name
    connection = new HubConnection(url);
    proxy = connection.CreateProxy("RelayHub");

    //Can't do this here because DirectShow has to be on the UI thread!
    // This would cause an obscure COM casting error with no clue what's up. So, um, ya.
    //proxy.On("Move",(x,y) => device.Move(x, y));
    //proxy.On("Zoom", (z) => device.Zoom(z));

    // Captured here, on the UI thread, so the callbacks below can post back to it
    magic = SynchronizationContext.Current;
    proxy.On("Move", (x, y) => {
        //Toss this over the fence from this background thread to the UI thread
        magic.Post((_) => {
            Log(String.Format("Move({0},{1})", x, y));
            device.Move(x, y);
        }, null);
    });
    proxy.On("Zoom", (z) => {
        magic.Post((_) =>
        {
            Log(String.Format("Zoom({0})", z));
            device.Zoom(z);
        }, null);
    });

    try {
        await connection.Start();
        Log("After connection.Start()");
        await proxy.Invoke("JoinRelay", remoteGroup);
        Log("After JoinRelay");
    }
    catch (Exception pants) {
        // Only a WebException carries a response body to log; a hard cast would
        // throw InvalidCastException and hide the real error for anything else
        var foo = pants.GetBaseException() as WebException;
        if (foo != null && foo.Response != null) {
            using (var r = new StreamReader(foo.Response.GetResponseStream())) {
                string yousuck = r.ReadToEnd();
                Log(yousuck);
            }
        }
        throw;
    }
}
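The capture-and-Post trick is easier to see in isolation. Here's a minimal console sketch, assuming a hand-rolled single-threaded context: `PumpContext` is a hypothetical stand-in for the WPF dispatcher's SynchronizationContext, and `Demo` is made up for illustration. A background task posts callbacks, and every one of them runs on the thread that pumps the queue, which is the property the DirectShow calls need.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical single-threaded context standing in for the WPF UI thread.
sealed class PumpContext : SynchronizationContext
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();

    // Post only queues the callback; it runs later on whichever thread calls RunPump.
    public override void Post(SendOrPostCallback d, object state)
    {
        _queue.Add(() => d(state));
    }

    public void Complete() { _queue.CompleteAdding(); }

    // Blocks, running queued callbacks in order, until Complete is called.
    public void RunPump()
    {
        foreach (var work in _queue.GetConsumingEnumerable()) work();
    }
}

static class Demo
{
    // Posts 'count' callbacks from a background task and returns the ids of the
    // threads they actually ran on.
    public static List<int> RunCallbacksOnPumpThread(int count, out int pumpThreadId)
    {
        var context = new PumpContext();
        var seen = new List<int>(); // only touched on the pump thread
        pumpThreadId = Thread.CurrentThread.ManagedThreadId;

        var producer = Task.Run(() =>
        {
            for (int i = 0; i < count; i++)
            {
                // Toss the work over the fence to the pump thread
                context.Post(_ => seen.Add(Thread.CurrentThread.ManagedThreadId), null);
            }
            context.Complete();
        });

        context.RunPump(); // this thread plays the role of the UI thread
        producer.Wait();
        return seen;
    }

    static void Main()
    {
        int pumpId;
        var ids = RunCallbacksOnPumpThread(3, out pumpId);
        Console.WriteLine("pump thread: " + pumpId);
        foreach (var id in ids) Console.WriteLine("callback ran on thread: " + id);
    }
}
```

In the WPF listener there's no need to write a context like this, since SynchronizationContext.Current on the UI thread already posts to the dispatcher; the sketch only makes the hand-off visible.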
Now I've got all the parts. Buttons that call a Relay that then calls back, across NAT and networks, to the Remote Camera Listener, which uses the Camera library to move it.
It works like a champ. And, because the buttons are transparent, I can put them over the Lync window and pretend it's all integrated.
TODO: I'm hoping that someone who knows more about Windows Internals will volunteer to create some code that will automatically move the buttons as the Lync Window moves and position them over the video window in the corner. Ahem.
You can set this up yourself, but I haven't gotten around to making an install or anything. If you have a Logitech BCC950 you are welcome to use my Relay until it costs me something. There's a preliminary download up here so you'd only need the Listener on one side and the Buttons on the other. No drivers are needed since we're using DirectShow itself.
This was great fun, and more importantly, I use this PanTiltZoom System every day and it makes my life better. The best part was that I was able to do the whole thing in C#. From client UI to cloud-based relay to device control to COM wrapper, it was all C#. It makes me feel very empowered as a .NET developer to be able to make systems like this with a minimal amount of code.
Lync Developer Resources
- CodeLync
- Developing Lync (Tom Morgan)
- Justin Morris on UC
- Lync Development by Michael Greenlee
- Lync'd Up (Tom Arbuthnot)
- The Modality Systems blog
Related Links
- Hanselminutes Podcast 242 - The Plight of the Remote Worker with Pete Brown
- Building an Embodied Social Proxy or Crazy Webcam Remote Cart Thing
- Virtual Camaraderie - A Persistent Video "Portal" for the Remote Worker
- Working Remotely from Home, Telepresence and Video Conferencing: One Year Later
© 2012 Scott Hanselman. All rights reserved.