Slug::Engine

While developing my blog in Rails, I resolved to include a mechanism for assigning arbitrary, SEO-friendly URLs to my content, similar to what the Path module in Drupal does. Such a feature is common in popular blogging engines like WordPress and Blogger, so it didn’t seem such a far-fetched feature.

Implementing the functionality turned out to be much more difficult than I had first imagined. The problems I encountered, which led me deep into the bowels of Rails to understand and resolve, have left me with a vastly improved understanding of how a Rails application works. Below, I detail some of those problems and how I overcame them.

Challenges

URL Pattern

One of the biggest challenges I had to deal with was how to route (and serve) requests for content when I couldn’t be assured that the request would follow any significant pattern. It would be easy to implement a standard slug pattern like :year/:month/:id-:slug that would apply universally to my content, but what if I want a slug to look like foo/bar/baz?

Content Type

To make matters worse, an arbitrary slug wouldn’t guarantee anything about the type of content it represented. It might be a blog post, a photo or some other type of content. How was I going to get the requests routed to the right controller?

Requirements

In addition to allowing arbitrary “slugs” to be attached to arbitrary content, I also wanted to ensure that the implementation was not a simple HTTP 301 redirect. A slug of /2011/11/slug-engine, for example, had to display the content at that URL rather than bounce the browser somewhere else. For most of my content, I would not be exposing the normal, RESTful URL patterns (like /posts/123) unless absolutely necessary.

Solutions

I knew Drupal maintained a ‘router’ table in its database, which is used to test for matches with incoming requests. The ‘router’ table essentially worked as a dictionary, telling Drupal how to re-write and forward the request on the server side. This seemed like a good way to start my solution.

Model

I created a Permalink model with attributes for the slug, target content_id and target content_type. Permalink would track the relationship between a given piece of content and the slug assigned to it. Since the content can be anything, the belongs_to relationship is polymorphic.

    class Permalink < ActiveRecord::Base
      belongs_to :content, :polymorphic => true
    end

In content classes, such as Post, I would need to declare the has_one side of the relationship. Since the belongs_to side is polymorphic, the :as option is required here to properly set up the association.

    class Post < ActiveRecord::Base
      has_one :permalink, :as => :content, :dependent => :destroy, :autosave => true
    end

I later moved the code for the content side of the relationship into a reusable module and added features like an overridable default slug, automatic lazy instantiation of the associated Permalink, and finder methods for locating content by Permalink.

Controller

My PermalinksController was initially quite simple, containing only a #show method which looked up content by the given :slug. I was able to easily route requests to this controller using route globbing.

    get '*slug' => 'permalinks#show'

I quickly discovered two problems with this route.

First, arbitrary slugs can potentially collide with other, higher priority routes! If this happened, the content would never get rendered. For example, if a higher priority route such as get 'about' => 'about#index' existed alongside a slug of about, the slug’s content would be unreachable, since the higher priority route would always match first.

Second, because the glob matches the entire path, the route became a black hole: it swallowed every request that reached it, so nothing defined after it could ever match. This meant that the route should be the lowest priority in my routes, but I couldn’t control routes added by gems and third-party engines. The routes added by the high_voltage gem, for example, were completely obscured.
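
Both problems fall out of first-match routing, which can be illustrated with a toy router in plain Ruby. This is only a sketch and not how the Rails router is implemented; ROUTES, recognize and the pages#show entry are hypothetical stand-ins chosen for the example.

```ruby
# A toy first-match router: routes are tried in order and the first pattern
# that matches wins, mirroring the priority order of config/routes.rb.
ROUTES = [
  [%r{\A/about\z},              "about#index"],     # higher priority, fixed route
  [%r{\A/(?<slug>.+)\z},        "permalinks#show"], # glob route, matches any path
  [%r{\A/pages/(?<id>[^/]+)\z}, "pages#show"]       # defined later, e.g. by a gem
].freeze

# Returns [endpoint, params] for the first matching route, or nil.
def recognize(path)
  ROUTES.each do |pattern, endpoint|
    match = pattern.match(path)
    return [endpoint, match.named_captures] if match
  end
  nil
end

# Problem 1: a slug of "about" can never reach permalinks#show.
p recognize("/about")
# Problem 2: the pages route is shadowed by the glob above it.
p recognize("/pages/home")
```

In this sketch the request for /pages/home is claimed by the glob route, so the route defined after it never gets a chance to match.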

Solving the first problem was just a matter of validating that the given slug did not match any other recognizable route, and it turned out to be fairly straightforward once I figured out how to access the routes at runtime.

    class Permalink < ActiveRecord::Base
      belongs_to :content, :polymorphic => true
      validate :not_system_slug

      private

      # Reject slugs that an existing, non-permalink route would already claim.
      def not_system_slug
        route = Rails.application.routes.recognize_path "/#{slug}"
        errors.add :slug, "is a reserved system route" unless route[:controller] == "permalinks"
      rescue ActionController::RoutingError
        # No route matches, so that's probably good
      end
    end

The second problem was much more difficult. Somehow, I had to convince Rails to pretend like the route didn’t match if a corresponding slug did not exist in the database. I needed a way to “pre-filter” the request and skip the route entirely if it would result in a 404.

Deep in the heart of Rails’ configuration I found Rack, and deeper still, Rack::Mount. This trio actually sets up a Rack::Mount::RouteSet as the sort of endpoint of the rack (hey, Rails itself is just a Rack app!) wherein each route is actually a Rack app in and of itself. The RouteSet tests each route in config/routes.rb successively until a route’s “app” returns a response that does not include an X-Cascade: pass HTTP header. So that’s interesting. There must be a way to take advantage of this, I mused.
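
The cascade behaviour can be sketched in a few lines of plain Ruby. This is an illustration of the idea only, not Rack::Mount's actual code; dispatch, PASS, SLUG_APP and PAGES_APP are hypothetical names, and the apps are bare lambdas that follow the Rack convention of returning [status, headers, body].

```ruby
# The response an app returns when it wants to abstain from a request.
PASS = [404, { "X-Cascade" => "pass" }, ["Not Found"]].freeze

# Handles exactly one known slug and abstains on everything else.
SLUG_APP = lambda do |env|
  if env["PATH_INFO"] == "/foo/bar/baz"
    [200, { "Content-Type" => "text/plain" }, ["content for foo/bar/baz"]]
  else
    PASS
  end
end

# A lower priority app that handles anything under /pages/.
PAGES_APP = lambda do |env|
  if env["PATH_INFO"].start_with?("/pages/")
    [200, { "Content-Type" => "text/plain" }, ["a static page"]]
  else
    PASS
  end
end

# Try each app in priority order; the first response without an
# "X-Cascade: pass" header is the one that gets returned.
def dispatch(apps, env)
  apps.each do |app|
    status, headers, body = app.call(env)
    return [status, headers, body] unless headers["X-Cascade"] == "pass"
  end
  PASS # every app abstained
end

p dispatch([SLUG_APP, PAGES_APP], { "PATH_INFO" => "/pages/about" })
```

Because SLUG_APP abstains on /pages/about, the request falls through to PAGES_APP even though SLUG_APP has the higher priority.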

Attempt #1: Rack App

I first tried implementing my “filter” as a simple Rack app, which you can easily route to in Rails.

    get '*slug' => SlugApp

This method proved to be incredibly fast and efficient at checking for the existence of a slug and abstaining from handling the request if no match was found. Rendering any semblance of a view, though, was extremely difficult without the niceties of the Rails framework and especially ActionController.
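
A minimal sketch of such a Rack app follows. It is an assumption about the general shape rather than the code I actually wrote: the store argument is a hypothetical stand-in for the Permalink lookup (a plain Hash here), and the slug is read from PATH_INFO to keep the example free of Rails.

```ruby
class SlugApp
  # `store` is a hypothetical stand-in for the database lookup; anything
  # that responds to #key? and #[] will do for this sketch.
  def initialize(store)
    @store = store
  end

  # Rack entry point: returns [status, headers, body].
  def call(env)
    slug = env["PATH_INFO"].to_s.sub(%r{\A/}, "")

    # Abstain so that lower priority routes get a chance to respond.
    return [404, { "X-Cascade" => "pass" }, []] unless @store.key?(slug)

    # Without ActionController, the body has to be built by hand.
    body = @store[slug]
    headers = {
      "Content-Type"   => "text/plain",
      "Content-Length" => body.bytesize.to_s
    }
    [200, headers, [body]]
  end
end

app = SlugApp.new({ "foo/bar/baz" => "Hello from a slug" })
p app.call({ "PATH_INFO" => "/foo/bar/baz" })
p app.call({ "PATH_INFO" => "/missing" })
```

The existence check is cheap, but the hand-built response in the last few lines hints at why rendering a real view from here was so painful.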

Attempt #2: ActionController::Metal

I improved my Rack app by extending from ActionController::Metal, the most minimalist controller possible (while still a Rack app!). Routing was still simple. ActionController::Metal.action returns a Rack app for the given action.

    get '*slug' => SlugMetal.action(:show)

This improved the existence check for a slug by offering access to the params hash (which contained the captured :slug parameter), and rendering was improved by including the ActionController::Rendering module, but it still had problems:

  1. My Metal controller was not a descendant of ApplicationController, so some helper methods were inaccessible
  2. Various view functionality still didn’t work ‘out of the box’
  3. Route helpers in the view incorrectly interpreted some slugs as an env['SCRIPT_NAME'] prefix, which resulted in mutated paths. Even helpers like root_path pointed to the wrong location when rendering content via its slug.
  4. Route parameters were not parsed properly for lower priority routes [1]

All of these issues could be worked out with some tinkering, but I was wasting a lot of time trying to figure it out, and everything I did to overcome these issues felt terribly hackish. I further discovered that there are two similar yet different ways to route requests to a Rack app inside Rails. I discuss those subtle differences in a separate post.

Attempt #3: Middleware?

Next I tried making my slug filter into a piece of Rack Middleware, but that turned out to be even further from the right approach. If the middleware “passed”, it would pass the entire Rails endpoint. The middleware would essentially run as an ‘around’ filter to Rails and, at best, the middleware could take care of the response on its own, which led me back to the rendering problems I had with ActionController::Metal.
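
One way to picture the problem is a middleware that wraps the whole application, sketched below in plain Ruby. The class name SlugMiddleware, the Hash-backed store and the RAILS lambda are hypothetical stand-ins, so treat this as an illustration rather than the code I tried.

```ruby
class SlugMiddleware
  # `app` is the entire wrapped application; `store` is a hypothetical
  # stand-in for the Permalink lookup.
  def initialize(app, store)
    @app = app
    @store = store
  end

  def call(env)
    slug = env["PATH_INFO"].to_s.sub(%r{\A/}, "")
    if @store.key?(slug)
      # The middleware must build the response itself: no controller,
      # no views, no helpers.
      [200, { "Content-Type" => "text/plain" }, [@store[slug]]]
    else
      # The only alternative is handing off to the whole application.
      @app.call(env)
    end
  end
end

# A stand-in for the Rails app, which would do its own routing internally.
RAILS = lambda do |env|
  [200, { "Content-Type" => "text/plain" }, ["rails handled #{env['PATH_INFO']}"]]
end

stack = SlugMiddleware.new(RAILS, { "about" => "slug content" })
p stack.call({ "PATH_INFO" => "/about" })
p stack.call({ "PATH_INFO" => "/posts/1" })
```

The middleware sits outside the route set entirely, so it cannot take a place in the priority order: in this sketch a slug of about wins over any about route the application defines, and whatever the middleware handles it has to render on its own.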

Attempt #4: Engine!

Finally, I took some more time to carefully read through the (limited) documentation on Rails::Engine. Maybe I had been hopped up on just the right amount of caffeine and energy drink, but it just made sense. An engine was perfect. I had been hesitant to try this solution at first because the idea of mounting one Rails app inside another seemed totally overkill. However, there are some real benefits to this approach:

  1. Isolated (or not) namespace: prevents or allows Engine code from interacting with Application code
  2. Dedicated routes: allows an engine to define exactly and only the routes it should concern itself with
  3. URL namespacing: allows my engine to be mounted at “/” or “/p” without affecting Engine code
  4. No middleware by default: keeps the nested Rails app light and fast (but the engine can have middleware)
  5. Support for X-Cascade: pass header: lets lower priority routes handle the request if the Engine abstains (the Engine is, after all, just a Rack app)

I get the additional benefits of working within the Rails framework, which gives me access to:

  1. Parameter parsing that works like it’s supposed to
  2. Full view rendering and helper support

My Slug::Engine ended up incorporating both a controller for handling slugs that existed and a piece of middleware for filtering out and abstaining on slugs that didn’t exist.

The controller, then, turned out fairly simple. It 1) looks up the Permalink by slug, 2) looks up the related content by Permalink, and 3) renders it using a partial. The #find_by_permalink method allows the content’s model class to impose additional restrictions, like scoping out unpublished Posts, for example. Rails already had built-in functionality to figure out which partial to render based on the class of the object passed in, so I just leveraged that.

I won’t go too deep into the implementation details, though, since source code generally speaks for itself.

View

There isn’t much to say about the view, since Permalink doesn’t actually have its own view code. Since the whole Engine relies on Rails’ ability to select the proper partial for a given object, it’s just a matter of ensuring that those partials have been provided.

I did take things one step further, though, and allowed partials to exist in contexts. For example, if the slug’s content turned out to be a Category, which is nothing more than a list of Post objects, it would be helpful to render a different partial for Posts in the context of a Category. Adding this little line to my controller solved that requirement:

    prepend_view_path "app/views/#{@content_type.pluralize.underscore}"

I could then have a template structure like this:

    app
      |_ views
         |_ posts
         |  |_ _post.html.erb
         |_ categories
            |_ _category.html.erb
            |_ posts
               |_ _post.html.erb

Then, when rendering a single Post, such as at /2011/11/my-first-post, the partial at app/views/posts/_post.html.erb would be used. However, when rendering a list of Posts in the Code category, the partial at app/views/categories/posts/_post.html.erb would get used.
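
The lookup order can be modelled in plain Ruby to show why the prepended path wins. This is a simplified sketch, not Rails' actual template resolver; TEMPLATES and resolve_partial are hypothetical, and the file list assumes the singular partial names Rails derives from a model class.

```ruby
# Template files assumed to exist for this sketch.
TEMPLATES = [
  "app/views/posts/_post.html.erb",
  "app/views/categories/_category.html.erb",
  "app/views/categories/posts/_post.html.erb"
].freeze

# Search the view paths in order and return the first file that provides
# the requested partial (given as "directory/name"), or nil.
def resolve_partial(view_paths, partial)
  dir, name = File.split(partial)
  view_paths.each do |root|
    candidate = File.join(root, dir, "_#{name}.html.erb")
    return candidate if TEMPLATES.include?(candidate)
  end
  nil
end

default_paths  = ["app/views"]
# Models the effect of prepend_view_path "app/views/categories".
category_paths = ["app/views/categories"] + default_paths

puts resolve_partial(default_paths, "posts/post")
puts resolve_partial(category_paths, "posts/post")
puts resolve_partial(category_paths, "categories/category")
```

With the category path searched first, a request for the posts/post partial finds the category-specific copy, while partials that only exist under app/views still resolve through the fallback path.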

Wrap Up

That about wraps up the overview of my Slug::Engine gem. Please check out the source code on GitHub and let me know what you think in the comments!


[1] Since a route had been ‘matched’, the framework parsed the path to capture any path variables, like :slug. However, those parameters are cached on the request object, and parsing is skipped whenever a cached copy already exists. This meant that although my Metal app could return control to the RouteSet, the damage was done. The parameters had been parsed, and any downstream controllers that might match would not get a params hash containing the parameters defined in their own routes, but would instead get the cached params hash containing the :slug parameter. I eventually figured out how to remove the cached params in my Metal app when no matching slug was found, but this just reeked of massive hackery.
