Associating code with data

Let’s say you want to build a page layout tool with configurable widgets that users can place on different pages. Because these widgets have configuration data associated with them (say, a background image uploaded by the user) it’s not a simple matter of inserting a static HTML string into a page—you have to first load the widget’s configuration data and then use it to render the widget’s template. This in itself isn’t a huge problem, but it does raise the question of how to associate widget configuration data, stored in the database, with code and a template, which are traditionally stored on the filesystem.

This is not quite as straightforward as rendering, say, a Rails controller action where the path of execution is usually defined by the routing and controller code. In our app the database needs to be queried before we know what code to execute and what template to render. The code must be located and executed dynamically.

Luckily, with scripting languages like Ruby and Python, the distinction between code and data is blurry. Arbitrary strings can be generated or fetched at runtime and executed as code, which means code can be stored anywhere strings can be stored. We’re not limited to keeping code in our app; we can keep it in…
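As a quick illustration of that blurriness, here is a minimal Ruby sketch. The strings are hard-coded here but could just as well have been fetched from anywhere; the variable names are mine, and of course you should only evaluate strings you trust.

```ruby
# Code held in a plain string; it could equally have come from a database row.
source = "3.times.map { |i| i * 2 }"

# Kernel#eval parses and runs the string as Ruby at runtime.
result = eval(source)

# A string can also add behaviour to an existing object:
# `def` inside instance_eval defines a singleton method on the receiver.
greeter = Object.new
greeter.instance_eval(<<~'RUBY')
  def greet(name)
    "Hello, #{name}!"
  end
RUBY

greeting = greeter.greet("Ada")
```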

1. The Database

Imagine we’ve designed our widget system with database-backed models like this:

Widget
  name
  description
  code
  template

WidgetPlacement
  widget_id
  page_id
  position
  settings

One great thing about keeping our widget logic and templates in the database, along with their user-specified configuration, is the simplicity of the resulting object. All data is stored in a single location, and a Widget object is capable of rendering itself anywhere. However, there are some serious drawbacks as well, including:

  • To test production widget code we have to duplicate part of our production database in our test environment. Because we don’t want to use production data for testing (generally a bad idea) our test database becomes a hybrid of live data and dummy data which all needs to be kept in sync so foreign key constraints aren’t violated and associations are kept intact. It’s an ugly situation.
  • Editing the code is somewhat annoying. Even if we use an in-browser code editor (like CodeMirror) the code will not be managed by a version control system, which makes bug tracking hard and which should make you feel generally queasy anyway.
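For reference, the self-rendering widget described above might look something like the following. This is only a sketch under my own assumptions: plain Structs stand in for the database-backed models, the placement holds its widget directly instead of a widget_id, and the render method and sample code and template strings are hypothetical.

```ruby
require "erb"

# Hypothetical stand-ins for database rows; a real app would use its ORM.
Widget = Struct.new(:name, :description, :code, :template)

WidgetPlacement = Struct.new(:widget, :page_id, :position, :settings) do
  def render
    # Evaluate the stored code in a fresh object so its methods
    # do not leak into the rest of the app.
    context = Object.new
    context.instance_eval(widget.code)
    context.instance_variable_set(:@settings, settings)
    # Render the stored template with that object as self, so the
    # template can call the methods the stored code defined.
    ERB.new(widget.template).result(context.instance_eval { binding })
  end
end

calendar = Widget.new(
  "Event Calendar",
  "Graphical calendar with listing of custom events.",
  'def title; @settings["title"].upcase; end',
  '<div class="calendar"><h2><%= title %></h2></div>'
)

placement = WidgetPlacement.new(calendar, 1, 0, { "title" => "Team events" })
html = placement.render
```

Everything needed to render lives in the two records, which is exactly the appeal, and exactly why the drawbacks above bite: the code and template strings are invisible to version control and to your test suite.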

So, that brings us back to storing our widget code on our old friend…

2. The Filesystem

This approach is familiar, but a robust implementation might not be as simple as you think. Here’s the problem: when rendering a page, we fetch the configured widget objects from the database, but since the rendering code isn’t in the database, we have to know where in the filesystem to look. Some common solutions to this problem are:

  • Use a single directory and name our template files using database IDs of widget objects.
  • Use a single directory and name our template files using filesystem-compatible slugs of widget objects.

However, the first solution is awful for the developer to navigate (you need to look at the database to know which file to open, since the names aren’t descriptive), and neither works well when you decide to move files to different directories and need to keep the database in sync. Since identifying information for each widget is duplicated, appearing in both the filesystem path and the database row, things can get out of sync really easily. This is why it’s generally a bad idea to tie code to things stored in the database.

As far as I know, there is no great all-encompassing solution to this problem, but one thing that can help is to treat the widgets database table as read-only except by a process that scans the filesystem for widget templates and builds or updates the table based on the code in a specified directory tree. This process would be run on every code deploy and every run of the test suite. We can use a serialized data format like YAML to store whatever data we want about our widget (description, available configuration settings, etc.). The only question is: where in our widget files do we put this data?
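To make the scanning process concrete, here is a rough sketch. It rests on several assumptions of mine: a Hash keyed by a uuid field stands in for the widgets table, widget files end in .rb, and the metadata extraction is passed in as a callable so the sketch does not depend on where in the file the YAML ends up. The name sync_widgets is hypothetical.

```ruby
require "yaml"
require "tmpdir"
require "fileutils"

# Rebuild the widget table from the files under `root`.
# `table` is a Hash keyed by uuid, standing in for the database table.
# `extract` is any callable that turns file contents into a metadata Hash.
def sync_widgets(root, table, extract)
  seen = []
  Dir.glob(File.join(root, "**", "*.rb")).sort.each do |path|
    data = extract.call(File.read(path))
    # Skip files that carry no widget metadata.
    next unless data.is_a?(Hash) && data["uuid"]
    # Insert or update the row, recording where the code currently lives.
    table[data["uuid"]] = data.merge("path" => path)
    seen << data["uuid"]
  end
  # Drop rows whose source file has disappeared.
  table.delete_if { |uuid, _row| !seen.include?(uuid) }
  table
end

# Demo with a throwaway directory and a trivial extractor that
# treats the whole file as YAML.
table = { "stale-id" => { "name" => "Removed widget" } }
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "calendars"))
  File.write(File.join(root, "calendars", "event.rb"),
             "uuid: abc-123\nname: Event Calendar\n")
  sync_widgets(root, table, ->(source) { YAML.safe_load(source) || {} })
end
```

Because rows are keyed by an identifier stored inside the file and the path is rewritten on every run, moving a file to another directory no longer leaves the table pointing at the wrong place.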

I can think of two precedents for this:

A. Use comments at the top of the file, like Sprockets does for CSS and JavaScript directives. For example:

# uuid: B8FA2B5D-167C-470D-9382-EE23ECA5C73C
# name: Event Calendar
# description: Graphical calendar with listing of custom events.

class Calendar < Widget
  ...
end

This is easy to parse using Ruby:

require 'yaml'

# Non-greedy match: stop at the first line that is not a comment.
yaml = File.read(filepath)[/\A(#.*?)\n[^#]/m, 1].to_s.gsub(/^# ?/, "")
data = YAML.load(yaml) || {}
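If you would rather not rely on a single regular expression, the same idea can be written as a line-by-line scan. This is a sketch with a method name of my own choosing (comment_metadata); it assumes the metadata is the unbroken run of comment lines at the very top of the file, and it uses YAML.safe_load so the header can only produce plain data.

```ruby
require "yaml"

# Collect the unbroken run of "#" lines at the top of the source,
# strip the comment markers, and parse what is left as YAML.
def comment_metadata(source)
  header = source.each_line.take_while { |line| line.start_with?("#") }
  yaml = header.map { |line| line.sub(/\A# ?/, "") }.join
  YAML.safe_load(yaml) || {}
end

source = <<~'RUBY'
  # uuid: B8FA2B5D-167C-470D-9382-EE23ECA5C73C
  # name: Event Calendar
  # description: Graphical calendar with listing of custom events.

  class Calendar < Widget
  end
RUBY

data = comment_metadata(source)
```

One caveat: any other comment at the top of the file, such as a magic comment, would be swept into the YAML too, so the metadata block needs to come first and be followed by a blank or non-comment line.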

B. Use Ruby’s __END__ construct. This is an odd trick which causes Ruby’s parser to stop before the end of a file. You can then access the string that comes between __END__ and the actual end of file through the DATA object. For example:

class Calendar < Widget
  ...
end

__END__

uuid: B8FA2B5D-167C-470D-9382-EE23ECA5C73C
name: Event Calendar
description: Graphical calendar with listing of custom events.

Sinatra uses this trick to allow embedding of templates in the same file as your application code. Unfortunately, DATA is a File object that Ruby defines only for the main script being run, not for files loaded with require, so it’s really only useful in single-file programs. Thus we cannot use it directly for our current purposes. However, if you like the purity of using __END__ (so Ruby will not spend any time on your data lines when it parses the code before execution, and you don’t have to prefix each line with #), it’s easy enough to parse on your own:

require 'yaml'

# Discard everything up to and including the __END__ line.
yaml = File.read(filepath).sub(/\A.*\n__END__\n/m, '')
data = YAML.load(yaml) || {}
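Here is a slightly more defensive variant, again only a sketch: the method name end_metadata is mine, it splits on the first __END__ line (which is where Ruby itself stops reading), and it returns an empty hash when a file has no __END__ marker instead of trying to parse the code as YAML.

```ruby
require "yaml"

# Split the source at the first line consisting of __END__ and
# parse everything after it as YAML.
def end_metadata(source)
  _code, marker, yaml = source.partition(/^__END__\n/)
  return {} if marker.empty? # no __END__ line, so no metadata
  YAML.safe_load(yaml) || {}
end

source = <<~'RUBY'
  class Calendar < Widget
  end

  __END__

  uuid: B8FA2B5D-167C-470D-9382-EE23ECA5C73C
  name: Event Calendar
  description: Graphical calendar with listing of custom events.
RUBY

data = end_metadata(source)
```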

This, then, has been an exploration of techniques for including executable code in objects as if it were data. I haven’t offered any definitive solutions because I haven’t found any. Hopefully this article will be a useful starting point for discussion of this common problem. If you have any other solutions, implemented or not, I’d love to hear about them.