asemanfar - a blog about programming

 

Posts tagged with "tips"

Using a bloom filter to optimize a SQL query

February 08, 2010

Several weeks ago, we started hitting some bottlenecks with our database. We have 3 MySQL machines: 1 master, 1 online slave and 1 offline slave. The online slave started lagging pretty badly, about 0.46 seconds per second. This post is about one of the things I did to alleviate the load on this machine and eliminate slave lag.

The slave database was only used for a couple of types of queries: comments and user search. The lag meant that people would post comments, see them (write-through to memcache) and then see them disappear the next time they came back (the comments didn't exist on the slave yet). In addition, our user search page would be outdated and individuals' details would be incorrect. In order to find out which of these queries was the bigger contributor to slave lag, I watched the output of 'show full processlist' and looked for queries that took a long time (you can alternatively enable the slow query log, but that has a bit of overhead and MySQL 5.0 requires a restart).

The offending query quickly became clear:

    SELECT * FROM users WHERE facebook_uid IN (<long list of integers>) AND installed = 1;
    # Installed means they have added our application on Facebook

This query is executed to find your list of Facebook friends who are also playing our game. MySQL executes it by looking up each value in the list of integers (for many people, several hundred to thousands of them) in the index on facebook_uid, and then checking the installed field on those that have a row. The biggest problem with this query was that a lot of time was wasted looking up people who were not in the database or had not installed the application. We couldn't avoid querying for the people who are playing the game, since we needed at least a little bit of information about them to show to the user.

We needed a way to efficiently reduce this huge list of users by removing the ones who would not be part of the result set. There are tens of millions of installed users, so we can't simply keep a set in memory and have it be faster than MySQL's index lookups. I decided to use a bloom filter, which is one of the most memory-efficient ways to check set membership. In this case, the set is defined as the Facebook user IDs whose installed field is equal to 1 in our database. Using a bloom filter, we were able to significantly reduce the number of users we checked in the database, and because bloom filters can't have false negatives, we'd never incorrectly exclude an installed user from that query.
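
To make the pre-filtering idea concrete, here is a minimal in-process sketch in Ruby. It is not the code from the bloom_filter project: the class name TinyBloomFilter, the bit and hash counts, and the MD5-based double hashing are all assumptions chosen for illustration. It shows the property that matters here: an ID that was added is never filtered out, while an ID that was never added is usually (but not always) dropped.

```ruby
require "digest/md5"

# Illustrative bloom filter; the name, sizes and hashing scheme are
# assumptions, not the API of the bloom_filter project.
class TinyBloomFilter
  def initialize(bit_count, hash_count)
    @bit_count = bit_count
    @hash_count = hash_count
    # One bit per slot, packed into a binary string of zero bytes.
    @bits = "\0".b * ((bit_count + 7) / 8)
  end

  # Set every bit position derived from the value.
  def add(value)
    positions(value).each do |i|
      @bits.setbyte(i >> 3, @bits.getbyte(i >> 3) | (1 << (i & 7)))
    end
    self
  end

  # false means "definitely never added"; true means "possibly added".
  def include?(value)
    positions(value).all? { |i| @bits.getbyte(i >> 3)[i & 7] == 1 }
  end

  private

  # Derive hash_count positions from the two 64-bit halves of an MD5 digest.
  def positions(value)
    h1, h2 = Digest::MD5.digest(value.to_s).unpack("Q<Q<")
    (0...@hash_count).map { |k| (h1 + k * h2) % @bit_count }
  end
end

# Hypothetical IDs: pretend 1001..1003 have installed the application.
installed = TinyBloomFilter.new(10_000, 4)
[1001, 1002, 1003].each { |uid| installed.add(uid) }

# Only the friends that pass the filter need to go into the IN (...) list.
friend_uids = [1001, 2002, 1003, 3003]
candidates = friend_uids.select { |uid| installed.include?(uid) }
```

Here candidates is guaranteed to contain 1001 and 1003. The other two IDs are dropped unless they happen to be false positives, which is harmless because the installed = 1 condition in the query still removes them.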

There are a couple of bloom filter implementations for Ruby, but neither seemed to provide a service interface, so I decided to write one: http://github.com/arya/bloom_filter. This bloom filter has been in production for a couple of weeks now. It currently performs adds in an average of 2.92 milliseconds, and filters lists averaging 317 integers in an average of 12.7 milliseconds.

The bloom filter can be used either as a service, or in process. Check out the readme for details.

Capistrano, meet Twitter: Keep your users updated through Twitter

June 04, 2008

I recently deployed a beta of an application I'm working on and I thought it'd be nice to keep users updated about any new features or updates through Twitter. So I wrote a spiffy Rails plugin that just adds a Capistrano recipe for updating Twitter statuses when you deploy.

capistrano_twitter just prompts you for a status update when you deploy and then updates the Twitter status for the user specified in config/deploy.rb. Of course, if you leave the status blank, no update will be made.

Check it out for yourself: http://github.com/arya/capistrano_twitter

Javascript Dependency Manager and Google AJAX Libraries

May 29, 2008

On Tuesday, Google released a new service where they host popular javascript libraries such as prototype and jQuery. This allows you to link to their copy of the libraries instead of your own so you can reduce the load on your server and speed up the client's experience.

They plan on archiving all stable versions of each library and allowing you to choose which you want to link to. They also provide compressed versions of all the libraries except scriptaculous and prototype (I'm not sure why only those two are left out).

Anyway, I've added support for using the Google copy of these libraries into my javascript dependency manager plugin. To use it, upgrade to the newest version of the plugin and create an initializer with the following code:

    JavascriptHelper::JavascriptHelperConfig.use_google_ajax_libs = true

Now when you include prototype, jQuery or any of the other supported libraries using the javascript dependency manager plugin, it will refer to the Google hosted file. By default the Google hosted libraries are not used, so users who want to stick to their own ways can continue doing so without any changes.
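
As a rough illustration of that switch (not the plugin's actual code), the sketch below picks between a local path and a Google-hosted URL. The LibrarySource module, its path_for method and the local /javascripts/ path are hypothetical; the Google URL follows the ajax.googleapis.com/ajax/libs/<name>/<version>/<file> pattern.

```ruby
# Hypothetical helper for illustration; not part of the plugin.
module LibrarySource
  GOOGLE_BASE = "http://ajax.googleapis.com/ajax/libs"

  # Returns the local path by default, or the Google-hosted URL when
  # use_google is true (mirroring the opt-in behaviour described above).
  def self.path_for(name, version:, file:, use_google: false)
    return "/javascripts/#{file}" unless use_google

    "#{GOOGLE_BASE}/#{name}/#{version}/#{file}"
  end
end

local_path = LibrarySource.path_for("jquery", version: "1.2.6", file: "jquery.min.js")
google_url = LibrarySource.path_for("jquery", version: "1.2.6", file: "jquery.min.js", use_google: true)
```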

You can specify library versions and a preference for compressed versions in the initializer as well; see the readme for details on how to do that.

ActiveRecord AssociationTypeMismatch: User expected, got User

May 13, 2008

Look familiar? Fortunately, ActiveRecord tells you when you try to assign an object of the wrong class to an association, so you'd get this kind of error if you tried to assign a String as the Comment for a Post (which is the correct behavior). But sometimes you'll get the error in a case like this:

    @comment = @question.comments.build(params[:comment])
    @comment.user = current_user # raises "User expected, got User"

Just earlier today, I got this "User expected, got User" error message and it stumped me for a few minutes. I then realized it's because I had changed the User class, which usually isn't a problem since Rails reloads your models in the development environment. But since the Comment class is in a plugin (acts_as_commentable) and plugins aren't reloaded on every request, the Comment class was expecting the User class as it was when mongrel was started (and the plugin was loaded).

Sooo... to save developers anywhere between 5 seconds and 5 hours, I added this message to the exception raised (only if the two classes have the same name):

    (did you change the User class? try restarting mongrel)
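
The situation described above can be reproduced in plain Ruby without Rails. The sketch below is not the actual patch: the reload is simulated with remove_const, and mismatch_message is a hypothetical helper. It shows how code that holds on to an old class object ends up rejecting instances of a new class with the same name, and how a hint can be appended only in that case.

```ruby
# Stand-in for a model that gets reloaded in development.
class User; end

# Code that is loaded only once (like a plugin) keeps a reference to the
# class object as it existed at load time.
expected_class = User

# Simulate a reload: drop the constant and define the class again.
Object.send(:remove_const, :User)
class User; end

current_user = User.new

same_name  = expected_class.name == current_user.class.name # both are "User"
same_class = current_user.is_a?(expected_class)             # different class objects

# Hypothetical helper: append the hint only when the names match.
def mismatch_message(expected, actual)
  message = "#{expected.name} expected, got #{actual.name}"
  if expected.name == actual.name
    message += " (did you change the #{expected.name} class? try restarting mongrel)"
  end
  message
end

message = mismatch_message(expected_class, current_user.class) unless same_class
```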

I wrote a patch that adds the helpful message. It's super-simple, but hey, I gotta start somewhere. I also made a ticket on the Rails Lighthouse project; hopefully it'll get pulled into Rails.

Update: I found this guide on contributing to Rails and see that I'm not supposed to fork Rails on GitHub unless it's for something big. So I deleted my fork.