Implementing Project Category Pages with Gatsby

6 September 2021

About a 6-minute read

In this post, we’ll look at how to add category tags to posts in Gatsby and generate listing pages for the posts in each category. I just switched from Hexo to Gatsby to generate this site, and unlike Hexo, Gatsby doesn’t come with the opinionated distinction between posts and pages, and it also doesn’t include a tagging or category system out of the box. For this site, I wanted to group my blog posts by the projects they’re about, in the same way a category system typically works on Hexo or WordPress.

To configure Gatsby, you can either install plugins or write code in one of the relevant config files. I looked for a “category” plugin and found only a deprecated one that isn’t maintained anymore, which I find surprising (doesn’t anyone else want categories? no one?). Not finding a plugin, I dove into the config docs to add categories myself.

There are config files for different functional areas of the site, including gatsby-config.js for global settings and plugins and gatsby-node.js for code that hooks into the build process. All the work of adding category pages goes in gatsby-node.js.
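
For orientation, here is a rough sketch of what the gatsby-config.js side might look like for a Markdown and MDX site. The site title and the ./content directory are placeholders I made up for illustration, and my real config has more in it; the three plugin names are the standard filesystem, MDX, and Remark plugins.

```javascript
// Sketch of a gatsby-config.js for a Markdown/MDX site.
// Assumptions: the title and the ./content directory are placeholders.
const config = {
  siteMetadata: {
    title: 'Example Site', // hypothetical value
  },
  plugins: [
    {
      // Reads files from disk and creates File nodes
      resolve: 'gatsby-source-filesystem',
      options: {
        name: 'content',
        path: `${process.cwd()}/content`,
      },
    },
    'gatsby-plugin-mdx', // turns File nodes into Mdx nodes
    'gatsby-transformer-remark', // turns File nodes into MarkdownRemark nodes
  ],
};

// In a real gatsby-config.js this object is the export:
// module.exports = config;
```

Nothing in this file knows about projects or categories; that logic all lives in gatsby-node.js.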

Before we dive into the code to add category pages for projects, let’s review the Gatsby build process.

The Gatsby Site Lifecycle

Out of the box, Gatsby can read files to create “nodes,” which represent arbitrary groups of information in its internal GraphQL API, like blog posts or author bios. I’m using some markdown plugins with Gatsby, and those create nodes of type Mdx and MarkdownRemark for various Markdown flavors.

When the build server starts, it sources the nodes from the sourcing plugins (like the filesystem source plugin that reads from files, or the Markdown plugins transforming files to rendered Markdown), applies transformations to the sourced nodes, makes the nodes available in the GraphQL API, and finally generates pages by querying the API. Each page exports its query, indicating what Gatsby needs to do to generate the page.

Gatsby provides lifecycle hooks so you can write code to augment its abilities. To manage the categories of project posts for this site, I wanted to make Project nodes top-level entities in the GraphQL API. Out of the box, Gatsby (understandably) has no concept of projects, so we have to create these nodes as we see project tags on blog posts. To create the project nodes, we can add a handler to the onCreateNode lifecycle hook and observe the creation of the nodes for the blog posts and the project labels associated with them.
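
As a minimal sketch of that idea, the handler below reacts only to Mdx nodes and records the project name from their frontmatter. The seenProjects set is a hypothetical stand-in for actually creating Project nodes, and the two calls at the bottom simulate Gatsby invoking the hook with hand-written nodes.

```javascript
// Hypothetical stand-in for "create a Project node here".
const seenProjects = new Set();

const onCreateNode = ({ node }) => {
  // The hook fires for every node of every type, so filter first.
  if (node.internal.type !== 'Mdx') return;

  const project = node.frontmatter && node.frontmatter.project;
  if (project) seenProjects.add(project);
};

// In gatsby-node.js this would be: exports.onCreateNode = onCreateNode;

// Simulate Gatsby calling the hook with two hand-written nodes.
onCreateNode({ node: { internal: { type: 'File' } } });
onCreateNode({
  node: { internal: { type: 'Mdx' }, frontmatter: { project: 'RC Airplane' } },
});
```

The File node is ignored and only the Mdx node contributes a project name, which is the filtering every real handler needs before doing anything else.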

Markdown and Child Nodes

My site uses plain Markdown (handled by the markdown-remark source plugin) and MDX, the React-friendly extension of plain Markdown (handled by the MDX source plugin). The MDX source plugin creates one MDX node for each Markdown file in my content directory. The Markdown files have optional metadata in their frontmatter, like the title and the project name. In the onCreateNode handler, we can read the project names from every MDX node created and create corresponding project nodes.

When the Gatsby development process starts, the MDX source plugin creates each MDX node once. Nodes are immutable, so if I modify one of the Markdown files, the development server deletes the corresponding file node and MDX node (and any further children) and recreates the file and MDX nodes (and any further children).

Tagging Nodes with Projects

We’re almost to the code, but we have to decide how to model the project-post relationship in the GraphQL API. By the time the hooks we declare in gatsby-node.js run, the MDX plugin and other source plugins have already created their nodes. The MDX frontmatter will contain a project name if it was specified in the Markdown, as a string. Given the project name string, it’s possible to link project pages to nodes and vice versa, but Gatsby additionally provides a way for nodes to link to each other with foreign keys, making their peers available inline in GraphQL rather than requiring a separate call to perform a join.

There are two options for creating this foreign key relationship between MDX nodes and project nodes:

  1. “Modify” the existing MDX nodes to add the project link to them. The nodes are immutable, but Gatsby provides the createNodeField action to associate additional fields with them in the special fields subobject.
  2. Create a new node from each MDX node, supplying the project foreign key as part of the node’s definition. The new node is the MDX node’s child, so when the MDX changes, the child will be deleted and recreated.
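
For comparison, the first option might look roughly like the sketch below. It assumes the Project node’s ID is simply the project name, and it uses a stub in place of Gatsby’s real createNodeField action so the snippet runs on its own; linkProjectField and stubActions are names I made up for illustration.

```javascript
// Option 1 sketch: attach the project link to the existing Mdx node.
// Assumption: the Project node's ID is the project name.
const linkProjectField = ({ node, actions }) => {
  const frontmatter = node.frontmatter || {};
  if (node.internal.type !== 'Mdx' || !frontmatter.project) return;

  actions.createNodeField({
    node,
    name: 'project___NODE', // the ___NODE suffix marks the value as a node ID
    value: frontmatter.project,
  });
};

// Stub that records calls instead of talking to Gatsby.
const fieldCalls = [];
const stubActions = { createNodeField: (args) => fieldCalls.push(args) };

linkProjectField({
  node: {
    id: 'mdx-1',
    internal: { type: 'Mdx' },
    frontmatter: { project: 'RC Airplane' },
  },
  actions: stubActions,
});
```

With this approach the link ends up under the MDX node’s fields subobject rather than on a node type of its own.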

I opted for the second option, to create a new derived node for each MDX node. Deriving new nodes from the MDX also lets us make new types of nodes, each with its own field set, rather than piling everything into extra fields attached to the MDX nodes.

While we’re making new node types, we might as well differentiate between different types of content, even if it is all Markdown underneath. I added an additional type field to each post’s frontmatter to distinguish between blog posts and static pages that should not appear in the list of blog posts. Those are Post and Page nodes respectively.

Okay, too much talk and not enough code! Here is the snippet of gatsby-node.js that creates the posts and pages:

exports.onCreateNode = ({
  node,
  getNode,
  getNodesByType,
  actions: { createNode, createNodeField, createParentChildLink, deleteNode },
  createNodeId,
  createContentDigest,
}) => {
  // Field definitions omitted for brevity. `node` is the parent MDX node.
  const newNode = {
    id: createNodeId(node.id), // derive a stable ID from the parent ID
    parent: node.id, // add the parent to the child's `parent` field
    internal: {
      type: postCategory === 'posts' ? 'Post' : 'Page', // frontmatter indicates the type
      content: '', // I left the content on the parent instead of copying it
      contentDigest: createContentDigest(''),
    },
    project___NODE: node.frontmatter.project, // project foreign key
    category: postCategory, // redundant with `type`, but might as well include
    slug: slug, // the generated post URL
    title: String(node.frontmatter.title || slug || ''),
    lede: String(node.frontmatter.lede || ''),
    date: date,
  };

  // Create the new Post / Page
  createNode(newNode);
  // Add the child to the parent's `children` array
  createParentChildLink({ parent: node, child: newNode });
};

The important aspects of this design are:

  • The child Post and Page nodes have stable IDs, generated by passing the parent ID to Gatsby’s createNodeId helper (line 11).
  • The derived Post and Page nodes point to their MDX parent, so when it is deleted the derived nodes will also be deleted (line 12).
  • The internal.type field determines which GraphQL resolver will serve these nodes (line 14). More on that later.
  • The internal.content (line 15) and internal.contentDigest (line 16) fields are empty, leaving the content on the MDX parent. These nodes could duplicate the content, but the parent has additional fields like timeToRead and body that are resolved dynamically in GraphQL and are not available at this stage of the build, so we will need to look at the parent node anyway.
  • The special project___NODE (line 18) field is a foreign key to the node representing the project for this post or page. The reserved ___NODE suffix causes Gatsby to treat the field as a Node ID and resolve it to the Node’s content in the GraphQL API.

Rendering Posts and Pages from the GraphQL API

The above onCreateNode hook creates Post and Page nodes corresponding to MDX nodes from Markdown files. Now instead of querying the allMdx and mdx GraphQL resolvers to list MDX and get details, we can query two new pairs of automatically generated resolvers:

  • allPost and post for Post nodes
  • allPage and page for Page nodes

I find these dedicated resolvers convenient, but it’s also possible to filter plain MDX nodes based on their frontmatter.
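
To make the comparison concrete, here are the two styles of query as plain strings. In a real Gatsby page they would be wrapped in the graphql tag from the gatsby package. The field names on Post come from the node definition above; the frontmatter type value of "posts" in the second query is my assumption about how the frontmatter is written.

```javascript
// Dedicated resolver: every Post with its linked Project inline.
const allPostsQuery = `
  query {
    allPost {
      nodes {
        title
        slug
        date
        project {
          name
          slug
        }
      }
    }
  }
`;

// Alternative: filter plain Mdx nodes on their frontmatter.
// Assumption: posts are marked with `type: posts` in the frontmatter.
const postsFromMdxQuery = `
  query {
    allMdx(filter: { frontmatter: { type: { eq: "posts" } } }) {
      nodes {
        frontmatter {
          title
          project
        }
      }
    }
  }
`;
```

In the second query the project is just a string, so getting project details would take a separate lookup.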

Creating Project Nodes

Now that we have Post and Page nodes with foreign keys to project nodes, we need to create the project nodes. Creating the project nodes is straightforward with one caveat: when a post node changes, its project might change. If the changed post node was the only one for its project, the previous project will have no posts anymore and we should delete it.

I chose to use the project name, like “RC Airplane,” as its ID. Another option would be to derive a UUID from the name or slugify it. Here’s the code to create the project node, from the same onCreateNode function in gatsby-node.js:

  const projectNode = {
    id: project,
    internal: {
      type: `Project`,
      content: '',
      contentDigest: createContentDigest(''),
    },
    name: project,
    slug: getProjectPath(project),
  };
  if (!getNode(projectNode.id)) {
    createNode(projectNode);
  }

The project node has empty internal.content, uses the project name as its ID, and has a custom slug field representing its URL in slugified form, like /projects/rc-airplane for the example above.
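
The getProjectPath helper isn’t shown above. One plausible version, sketched here as an assumption rather than the exact code from my site, lowercases the name and collapses anything that isn’t a letter or digit into hyphens:

```javascript
// One plausible getProjectPath; a sketch, not the site's actual helper.
const getProjectPath = (name) => {
  const slug = String(name)
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, '-') // runs of non-alphanumerics become one hyphen
    .replace(/^-+|-+$/g, ''); // drop leading and trailing hyphens
  return `/projects/${slug}`;
};

console.log(getProjectPath('RC Airplane')); // prints "/projects/rc-airplane"
```

Note that two different names can collapse to the same slug this way (for example "RC Airplane" and "RC-Airplane"), so a real site may want to check for collisions.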

To clean up the orphaned projects with no posts, we add one additional step to the end of onCreateNode:

  // Clean up orphaned projects if node changes have left them
  const projectNodes = getNodesByType('Project');
  projectNodes.forEach((project) => {
    const projectPosts = (project.fields && project.fields.posts___NODE) || [];
    if (
      projectPosts.length === 1 &&
      projectPosts.includes(newNode.id) &&
      newNode.project___NODE !== project.id
    ) {
      deleteNode(project);
    }
  });

We use Gatsby’s getNodesByType helper to get all known projects, then look for a project whose only post is the node being created. If that node now refers to a different project, the old project is orphaned and should be deleted.
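
The orphan check is easier to reason about as a small pure function. The sketch below mirrors the condition in the loop and adds a guard for projects that have no fields yet, which is an assumption on my part about what a freshly created project looks like. The sample nodes are made up.

```javascript
// True when `project`'s only post is `newNode` and `newNode` now
// points at a different project.
const isOrphanedBy = (project, newNode) => {
  // Assumption: a project without `fields` simply has no posts yet.
  const posts = (project.fields && project.fields.posts___NODE) || [];
  return (
    posts.length === 1 &&
    posts[0] === newNode.id &&
    newNode.project___NODE !== project.id
  );
};

// A post that just moved from "RC Airplane" to "Weather Station".
const movedPost = { id: 'post-1', project___NODE: 'Weather Station' };

const oldProject = { id: 'RC Airplane', fields: { posts___NODE: ['post-1'] } };
const busyProject = {
  id: 'Robot Arm',
  fields: { posts___NODE: ['post-1', 'post-2'] },
};
const freshProject = { id: 'Weather Station' };

console.log(isOrphanedBy(oldProject, movedPost)); // prints true
console.log(isOrphanedBy(busyProject, movedPost)); // prints false
console.log(isOrphanedBy(freshProject, movedPost)); // prints false
```

Only the project whose single post has moved away is flagged; a project with other posts, or one with no posts recorded yet, is left alone.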

By adding and deleting projects on the fly as nodes are created, we support the Gatsby build process and also the live-updating development mode. As we edit source files, nodes retrigger the onCreateNode hook and we add and remove projects as needed. As a result, the development and production builds of the site should match exactly, minimizing surprises after we deploy the site. The same code creates projects in development and production builds, but nodes aren’t deleted when building for production, so the code to delete projects only ever runs in development mode.

Here’s what our shiny new allProject query looks like in Gatsby’s GraphiQL explorer:

[Screenshot: allProject query]

Each project contains an array of its posts, and each of those posts can resolve to details like the post title right in the same query! Rather than querying for all post IDs and then querying a second time to get all the post details, we have everything in one trip to the API.
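
With allProject available, generating one listing page per project is a matter of looping over the projects in the createPages hook. The sketch below is an outline under a few assumptions: the template path is hypothetical (normally something like path.resolve('./src/templates/project.js')), and the stubs at the bottom stand in for the graphql, actions, and reporter arguments that Gatsby passes in.

```javascript
// Sketch of a createPages hook that makes one listing page per project.
// `templatePath` is hypothetical; Gatsby expects an absolute path.
const makeCreatePages = (templatePath) => async ({ graphql, actions, reporter }) => {
  const result = await graphql(`
    query {
      allProject {
        nodes {
          id
          slug
        }
      }
    }
  `);

  if (result.errors) {
    reporter.panicOnBuild('Error loading projects', result.errors);
    return;
  }

  result.data.allProject.nodes.forEach((project) => {
    actions.createPage({
      path: project.slug,
      component: templatePath,
      context: { id: project.id }, // available as $id in the template's query
    });
  });
};

// In gatsby-node.js: exports.createPages = makeCreatePages(templatePath);

// Stubs standing in for what Gatsby passes to the hook.
const createdPages = [];
const pagesDone = makeCreatePages('/abs/src/templates/project.js')({
  graphql: async () => ({
    data: {
      allProject: {
        nodes: [{ id: 'RC Airplane', slug: '/projects/rc-airplane' }],
      },
    },
  }),
  actions: { createPage: (page) => createdPages.push(page) },
  reporter: {
    panicOnBuild: (message) => {
      throw new Error(message);
    },
  },
});
```

The template component can then use the id from the page context to query that one project and its posts.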

Thanks for reading! If you’re looking to add category or project pages to your Gatsby site, I hope you found this long and probably confusing post helpful. Feel free to leave a comment!
