Typhoeus: A High Speed, Parallel HTTP Library for Ruby

May 14, 2009

Source: http://www.pauldix.net/2009/05/breath-fire-over-http-in-ruby-with-typhoeus.html

Typhoeus is a mythical Greek god with 100 fire-breathing serpent heads. He’s also the father of the better-known Hydra. Like its namesake, Typhoeus is a fearsome Ruby library: it runs HTTP requests in parallel while cleanly encapsulating the handling logic. Specifically, it uses libcurl and libcurl-multi to run HTTP requests really fast. Further, it’s designed with a focus on building client libraries that work with web services. These could be external services like Twitter, systems like CouchDB and SimpleDB, or custom web services that you write yourself.

The libcurl interface is contained within the library. Rather than trying to get Curb to do what I wanted, I decided to start with a clean slate and write the C bindings myself. Besides the libcurl interface, it has a nice DSL for creating classes and client libraries for web services.

The inspiration for the library came from an interview with Amazon CTO Werner Vogels. In the interview he states that when a user visits the Amazon.com home page, it calls out to up to 100 different services to construct the single page before returning it to the user. I like Amazon’s approach of a services architecture, especially AWS, and wondered if I could do the same thing in Ruby. Typhoeus is the result of that effort.

I set up a benchmark to test how the parallel performance compares with Ruby’s built-in Net::HTTP. The setup was a local evented HTTP server that took a request, slept for 500 milliseconds, and then issued a blank response. I set up the client to call this 20 times. Here are the results:

                user     system      total        real
 net::http  0.030000   0.010000   0.040000 ( 10.054327)
 typhoeus   0.020000   0.070000   0.090000 (  0.508817)

We can see from this that Net::HTTP performs as expected, taking 10 seconds to run 20 requests of 500 ms each, one after another. Typhoeus takes only about 500 ms (the time of the slowest response).
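
The arithmetic behind those numbers can be reproduced without any HTTP at all. The following is only a rough sketch under stated assumptions: `fake_request`, `run_serial`, and `run_parallel` are hypothetical names, `sleep` stands in for the slow server, the delay is shortened so it runs quickly, and Ruby threads stand in for the parallelism (Typhoeus itself uses libcurl-multi, not threads).

```ruby
require "benchmark"

# Hypothetical stand-in for one call to the slow test server.
def fake_request(delay)
  sleep delay
  :ok
end

# Serial: each call waits for the previous one, as in the Net::HTTP run.
# Returns the elapsed wall-clock time in seconds.
def run_serial(count, delay)
  Benchmark.realtime do
    count.times { fake_request(delay) }
  end
end

# Parallel: all calls are in flight at once, so the total is roughly the
# time of the slowest call. Threads are only an illustration here.
def run_parallel(count, delay)
  Benchmark.realtime do
    threads = Array.new(count) { Thread.new { fake_request(delay) } }
    threads.each(&:join)
  end
end

# 20 calls with a 50 ms delay instead of 500 ms.
puts format("serial:   %.3f s", run_serial(20, 0.05))
puts format("parallel: %.3f s", run_parallel(20, 0.05))
```

The serial figure should land near count × delay and the parallel one near a single delay, which is the same shape as the 10 seconds versus 0.5 seconds in the benchmark.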

Hopefully I’ve whetted your appetite. Before I get to the code examples and the API, I’d like to put in a plug for my employer, kgb. They were nice enough to let me release this library as open source. We’re also hiring good front-end Rails developers, designers, and anyone with experience in search, information retrieval, and machine learning. Please drop me a line if you dominate code and you’re interested in joining an awesome team.

Finally, on to the codez. Here are some usage examples and notes from the readme (gist for easier reading).

require 'rubygems'
require 'typhoeus'
require 'json'

# Here's an example for Twitter search.
# Including Typhoeus adds http methods like get, put, post, and delete.
# What's more interesting though is the stuff to build up what I call
# remote_methods.
class Twitter
  include Typhoeus
  remote_defaults :on_success => lambda {|response| JSON.parse(response.body)},
                  :on_failure => lambda {|response| puts "error code: #{response.code}"},
                  :base_uri   => "http://search.twitter.com"

  define_remote_method :search, :path => '/search.json'
  define_remote_method :trends, :path => '/trends/:time_frame.json'
end

tweets = Twitter.search(:params => {:q => "railsconf"})

# if you look at the path argument for the :trends method, it has :time_frame.
# this tells it to add in a parameter called :time_frame that gets interpolated
# and inserted.
trends = Twitter.trends(:time_frame => :current)

# and then the calls don't actually happen until the first time you
# call a method on one of the objects returned from the remote_method
puts tweets.keys # it's a hash from parsed JSON

# you can also do things like override any of the default parameters
Twitter.search(:params => {:q => "hi"}, :on_success => lambda {|response| puts response.body})

# on_success and on_failure lambdas take a response object.
# It has four accessors: code, body, headers, and time

# here's an example of memoization
twitter_searches = []
10.times do
  twitter_searches << Twitter.search(:params => {:q => "railsconf"})
end

# this next part will actually make the call. However, it only makes one
# http request and parses the response once. The rest are memoized.
twitter_searches.each {|s| puts s.keys}

# you can also have it cache responses and do gets automatically
# here we define a remote method that caches the responses for 60 seconds
klass = Class.new do
  include Typhoeus

  define_remote_method :foo, :base_uri => "http://localhost:3001", :cache_responses => 60
end

klass.cache = some_memcached_instance_or_whatever
response = klass.foo
puts response.body # makes the request

second_response = klass.foo
puts second_response.body # pulls from the cache without making a request

# you can also pass timeouts on the define_remote_method or as a parameter
# Note that timeouts are in milliseconds.
Twitter.trends(:time_frame => :current, :timeout => 2000)

# you also get the normal get, put, post, and delete methods
class Remote
  include Typhoeus
end

Remote.put("http://", :body => "this is a request body")
Remote.post("http://",
  {:params => {:post => {:author => "paul", :title => "a title", :body => "a body"}}})
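
Two behaviours in the notes above are easier to see in isolation: the `:time_frame` placeholder being interpolated into the path, and calls being deferred until first use and then memoized. The sketch below is one way these could be modelled in plain Ruby. It is not how Typhoeus implements them; `interpolate_path`, `LazyResult`, and `FakeClient` are hypothetical names, and the "request" is faked so nothing touches the network.

```ruby
# Hypothetical helper: replace each ":name" segment in a path template with
# the matching entry from args, leaving unknown placeholders untouched.
def interpolate_path(template, args)
  template.gsub(/:(\w+)/) do |match|
    key = Regexp.last_match(1).to_sym
    args.key?(key) ? args[key].to_s : match
  end
end

# Hypothetical proxy: does no work until the first method call on it,
# then forwards every call to the stored value.
class LazyResult < BasicObject
  def initialize(&fetch)
    @fetch = fetch
    @loaded = false
  end

  def method_missing(name, *args, &block)
    unless @loaded
      @value = @fetch.call
      @loaded = true
    end
    @value.__send__(name, *args, &block)
  end
end

# Hypothetical client: identical searches share one memoized "response".
class FakeClient
  attr_reader :request_count

  def initialize
    @memo = {}
    @request_count = 0
  end

  def search(params)
    LazyResult.new { @memo[params] ||= perform_request(params) }
  end

  private

  # Stands in for the real HTTP call and JSON parsing.
  def perform_request(params)
    @request_count += 1
    { "query" => params[:q], "results" => [] }
  end
end

puts interpolate_path("/trends/:time_frame.json", { :time_frame => :current })

client = FakeClient.new
searches = Array.new(10) { client.search({ :q => "railsconf" }) }
puts client.request_count   # nothing has run yet
searches.each { |s| s.keys } # first use triggers the single fake request
puts client.request_count
```

Keying the memo on the params hash is what lets ten identical searches collapse into one request in this model; a different query would get its own entry.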

Important Update: I should have mentioned that some bits of C were pulled from Todd Fisher’s update to Curb for the multi interface. The Easy code is completely different and some of the C stuff in Multi has changed. However, a good chunk of the multi code comes straight from there. Thanks Todd. Sorry for not mentioning it earlier.

