About a year ago, several developers on our team locked ourselves in a room to figure out how we could efficiently get production-like data into our non-production Chorus environments. At that time, we had a rake task that would create some barebones communities and posts, and we had a shared staging server with a year-old copy of our production database. As a result, our team was spending a lot of time manually creating things like posts and photos before we could get to the actual development, design, or QA task at hand.
Our solution to this problem is what we call “data snapshots”, or, more informally, “slurps.” Since our production database was much too large to copy in full to other environments, we created a utility that semi-intelligently builds a representative subset of production data. (In specific terms, this means that we copy most tables in their entirety, but for entries and comments, two of our largest tables, we only copy rows from the last week.)
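To make the "whole table versus last week only" rule concrete, here is a minimal Ruby sketch of how a per-table dump command could be built. This is not our actual utility: the `RECENT_ONLY` map, the `created_at` column name, the `dump_command` helper, and the database name in the usage note are all assumptions made for illustration. The only external piece it relies on is mysqldump's `--where` option.

```ruby
require "date"

# Hypothetical map of tables that get trimmed to a recent window,
# keyed by table name, with the timestamp column to filter on.
# The column name is an assumption, not taken from the real schema.
RECENT_ONLY = {
  "entries"  => "created_at",
  "comments" => "created_at"
}.freeze

# Build the argument list for dumping a single table.
# Tables listed in RECENT_ONLY get a --where filter limiting them to the
# last `window_days` days; every other table is dumped in full.
def dump_command(database, table, today: Date.today, window_days: 7)
  options = []
  if (column = RECENT_ONLY[table])
    cutoff = (today - window_days).strftime("%Y-%m-%d")
    options << "--where=#{column} >= '#{cutoff}'"
  end
  ["mysqldump", *options, database, table]
end
```

Because the result is an argument array, it could be handed to `system(*dump_command("chorus_production", "entries"))` without going through a shell, which sidesteps quoting problems in the `--where` value.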
Using this utility, we are able to reduce our 120 GB production MySQL database to 650 MB of gzipped SQL, which ends up being only 5 GB on a developer’s laptop. Beyond data size reduction, the slurp routine has some other important responsibilities, including anonymizing user data and intelligently using (but not duplicating) production assets, like author-uploaded photos, that are stored on S3. Having our full 300+ Chorus sites (with real-world posts, comments, sidebar widgets, and so on) present in our development, QA, and staging environments has profoundly impacted the way our team works.
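As a rough illustration of the anonymization step, the sketch below scrubs a user record represented as a plain Ruby hash. The field names (`:email`, `:name`, `:password_digest`) and the `anonymize_user` helper are hypothetical, and the real routine may well work differently. The idea is only that identifying fields are replaced with predictable placeholders derived from the record's id, while everything else is left alone.

```ruby
# Return a copy of the user record with identifying fields replaced.
# Placeholders are derived from the id, so they stay unique and are
# the same on every run. Field names here are assumptions.
def anonymize_user(user)
  id = user.fetch(:id)
  user.merge(
    email: "user-#{id}@example.com",
    name: "User #{id}",
    password_digest: nil
  )
end
```

For example, `anonymize_user(id: 42, email: "jane@real.org", name: "Jane", bio: "Writer")` keeps the `id` and `bio` but swaps in `user-42@example.com` and `User 42`. Deriving the placeholders from the id keeps emails unique, which matters if the table has a uniqueness constraint on that column.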
I gave a talk about our data snapshot system at the DC Ruby Users Group a few weeks ago. Sadly, there is no video, but the slides from my talk can be found here.