Sketchy: Drawing with a humanoid robot

17 May 2017

About a 4-minute read

Drawing with a humanoid robot

This quarter for Stanford’s CS225a, I’m working on “sketchy,” a project to draw pictures with a humanoid robot arm. We’re a group of four working with a Rethink Robotics Sawyer robot.

The end goal of the project is to have the robot “look” at something (via a camera) and draw a version of what it sees using markers. The project goal lends itself easily to a two-part structure:

  1. Capture images of the world and interpret them artistically to plan a drawing
  2. Use the plan to draw the picture with an appropriate control system

With a team of four, we can work most effectively if the project comprises smaller, independent and parallelizable tasks. Splitting the project into trajectory planning using the images of the world versus trajectory control using the output of the planner allows us to work on both parts simultaneously.

Software overview

Now for a bit of project architecture: this system isn’t terribly complex on the whole, but to make trajectory generation and trajectory control two independent development efforts, we need an intermediate representation or API that each half of the whole understands. Since we know what the drawing will look like before it starts, we can pregenerate the entire trajectory and save it to a file that the controller can load when it starts.

From the controls perspective, the task is to hold a marker and move it through the operational space at the correct position and orientation to draw a picture on the paper. At the very least, the position of the marker tip in 3D space matters, so the control task requires at least 3 degrees of freedom. We also want to avoid holding the marker upside-down, which means constraining another 2 degrees of freedom. The third degree of freedom for orientation controls the marker’s rotation about its central axis.

Trajectory specification

Before dividing the development effort, the planning and controls teams need to know what intermediate format to target. The controller needs to understand what the path planner is saying. We want the planner to pregenerate the trajectory and save it to a file that the controller can load when it starts.

Specifying the format of this file also means deciding what matters for a drawing tool trajectory and what doesn’t. For example, does the speed of the marker matter? We might also want to control the angle and the pressure, for varying the stroke style. At the very least, we need control over the position of the marker on the paper.

For our first implementation (and likely our final one, because the quarter is so short) we assume that only the position of the marker matters. We also want to allow multiple colors of marker. So the trajectory file needs to encode the color of the marker and the positions on the page it should draw.

If you’ve worked with 3D printers or CNC tools, this spec probably sounds like the beginnings of GCODE, and it is. We could use GCODE for our trajectory. However, GCODE is complex and supports way more features than we need. It also has a bigger problem (for us): it tightly couples the planning software (traditionally, the CAM program) to the control software (the CNC machine). When it’s time to mill a part, first you must specify every detail of the CNC tools to the CAM software, and then it generates a 3D path by computing all the tool offsets, keepouts, tool changes, etc. On the one hand, the planner’s knowledge of the tool allows it to make arbitrary parts, which is great. The downside is that it takes a lot of extra configuration to specify all the machine parameters, and the trajectory specification starts to encode a lot of information.

We’re just drawing pictures. They’re two-dimensional, and all of the markers are the same shape. So, we can decouple the planner from the controller by pushing some of the complexity into the controller. Instead of the planner doing all the work, it will only decide a 2D path for each marker through a hypothetical rectangular image plane. It will use normalized device coordinates (NDC), which essentially just means that it has no knowledge of physical units. The bottom-left of the image is (0, 0) and the top-right is (1, 1), and every point in the image falls somewhere in between.

The controller will handle things not related to the image itself: it knows about physical units, the size of the paper, and the positions of the markers on the tool carousel. It knows how to change tools when the image specification calls for a new color. It also knows how to project 2D image coordinates (in NDC) to 3D space. This separation of the image specification from the controls allows us to scale the drawing, change the position of the paper, and redesign the end-effector without regenerating the trajectory plans.

Trajectory file format

We still need a spcific format in which to store the trajectory information. We don’t want to invent a new format, and CS225a is already using jsoncpp, a C++ library for parsing JSON files. JSON is nice because it can store arbitrary key/value pairs, nested in objects. It also supports arrays. Assuming we can encode the trajectory compactly enough, it would be a convenient format, and we get the parser for free (since we already use it in other code).

Imagining just one marker drawing one continuous line, a trajectory is a list of points. Remember that we only care about specifying the action in the image plane, so each point in the list is two-dimensional and each coordinate takes on values from 0 to 1. Let’s also add a way to specify the color of the marker, and call this a “tool path”.

To draw a picture, we just need a list of tool paths. When one ends and another begins, we’ll assume that the controller can handle moving away from the drawing and changing tools. To interrupt a line and start drawing somewhere else with the same color (i.e. to lift the marker temporarily), we can simply start a new tool path with the same color.

The result is a JSON trajectory file like the following:

  "sequence": [
      "tool": 0,
      "points": [
        [0, 0],
        [1, 0],
        [1, 1],
        [0, 1],
        [0, 0]
      "tool": 4,
      "points": [
        [0.1, 0.1],
        [0.9, 0.1],
        [0.9, 0.9],
        [0.1, 0.9],
        [0.1, 0.1]

This file specifies two concentric squares, the first drawn using tool 0 and the second with tool 4. Specifying the tool by number raises a (small) red flag, because it means the planner still needs to know what tools the controller has in each position. For complete separation of concerns, the tool specifier should probably be the desired color instead of a tool number. However, the planner does need to know what colors are available anyway, so some cross-domain knowledge is appropriate here. We might change this interface as we finalize what colors the robot will use.

Next step: controls

In the next post, I’ll talk about the controls we’re using to realize these planned trajectories!