Fixing a long standing bug in the Ruby Amazon S3 Library
The AWS::S3 library for Ruby has been around since the release of Amazon S3 in 2006; hundreds, if not thousands, of applications use it. Consequently, it is not usually “the suspect” when looking for the cause of intermittent access errors to S3. However, we recently found and fixed an error that has been present in the signature calculation method since the library was first released.
We use S3 as the backing store for Onehub Workspaces, and we do a lot of S3 operations. During routine log monitoring we noticed a slow, but persistent, stream of HTTP 403 (Unauthorized Access) errors from S3. These errors were not frequent enough to cause problems for our customers; applications using S3 should be designed anticipate errors, and retry. Still, we felt that further investigation was warranted.
To manage the logs generated by all of the Onehub services, we use Papertrail. Papertrail allows us to run a real-time search against our production logs, showing us requests to S3 like this:
https://s3.amazonaws.com/<bucket>/<object>?AWSAccessKeyId=<ouraccesskey>&Expires=1328127911&Signature=l74ewTX9hh0s2oiLoIY83V%2BlLuM%3D
The components used to calculate the signature are well documented by Amazon. When a signature fails, S3 will provide the components it attempted to use in the XML returned with the error message. We noticed that the signatures in these errors were different than those that should have been calculated for the provided Expires time. We monkey-patched the #canonical_signature method of AWS::S3::Authentication::Signature to handle a closure.
module AWS
module S3
class Authentication
class Signature
private
def encoded_canonical
digest = OpenSSL::Digest::Digest.new('sha1')
b64_hmac = [OpenSSL::HMAC.digest(digest, secret_access_key, canonical_string)].pack("m").strip
if options[:debug_proc]
options[:debug_proc].call(sprintf("AWS::S3::Authentication::Signature - request %s encoded canonical: %s %s canonical: [%s]", @request.path, b64_hmac, CGI.escape(b64_hmac), canonical_string))
end
url_encode? ? CGI.escape(b64_hmac) : b64_hmac
end
end
end
end
end
This enabled us to pass in an option to AWS::S3#url_for containing a closure with our debugging method.
options = options.merge({:debug_proc => lambda{|x| logger.warn(x)}})
the_url = AssetStore.url_for(key_name, options)
We put this through testing, and into production, then waited for the next error to appear.
AWS::S3::Authentication::Signature - request /<bucket>/<keyname> encoded canonical: l74ewTX9hh0s2oiLoIY83V+lLuM= l74ewTX9hh0s2oiLoIY83V%2BlLuM%3D canonical: [GET#012#012#0121328127912#012/<bucket>/<keyname>] https://s3.amazonaws.com/<bucket>/<object>?AWSAccessKeyId=<OURACCESSKEY>&Expires=1328127911&Signature=l74ewTX9hh0s2oiLoIY83V%2BlLuM%3D
From here we could see the error. The Expires time used to calculate the signature was different that the time provided in the URL. The value 1328127911 is in the URL, while 1328127912 was used to calculate the signature!
But why?
It took a bit of digging through the AWS::S3 source, but we found the culprit. When generating these S3 URLs, we pass an expires_in option to AWS::S3#url_for.
# Signature is the abstract super class for the Header and QueryString authentication methods. It does the job
# of computing the canonical_string using the CanonicalString class as well as encoding the canonical string. The subclasses
# parameterize these computations and arrange them in a string form appropriate to how they are used, in one case a http request
# header value, and in the other case key/value query string parameter pairs.
class Signature < String #:nodoc:
attr_reader :request, :access_key_id, :secret_access_key, :options
def initialize(request, access_key_id, secret_access_key, options = {})
super()
@request, @access_key_id, @secret_access_key = request, access_key_id, secret_access_key
@options = options
end
private
def canonical_string
options = {}
options[:expires] = expires if expires?
CanonicalString.new(request, options)
end
memoized :canonical_string
def encoded_canonical
digest = OpenSSL::Digest::Digest.new('sha1')
b64_hmac = [OpenSSL::HMAC.digest(digest, secret_access_key, canonical_string)].pack("m").strip
url_encode? ? CGI.escape(b64_hmac) : b64_hmac
end
def url_encode?
!@options[:url_encode].nil?
end
def expires?
is_a? QueryString
end
def date
request['date'].to_s.strip.empty? ? Time.now : Time.parse(request['date'])
end
end
# Provides query string authentication by computing the three authorization parameters: AWSAccessKeyId, Expires and Signature.
# More details about the various authentication schemes can be found in the docs for its containing module, Authentication.
class QueryString < Signature #:nodoc:
constant :DEFAULT_EXPIRY, 300 # 5 minutes
def initialize(*args)
super
options[:url_encode] = true
self << build
end
private
# Will return one of three values, in the following order of precedence:
#
# 1) Seconds since the epoch explicitly passed in the +:expires+ option
# 2) The current time in seconds since the epoch plus the number of seconds passed in
# the +:expires_in+ option
# 3) The current time in seconds since the epoch plus the default number of seconds (60 seconds)
def expires
return options[:expires] if options[:expires]
date.to_i + expires_in
end
def expires_in
options.has_key?(:expires_in) ? Integer(options[:expires_in]) : DEFAULT_EXPIRY
end
# Keep in alphabetical order
def build
"AWSAccessKeyId=#{access_key_id}&Expires=#{expires}&Signature=#{encoded_canonical}"
end
end
The #initialize method is the entry point, but most of the work is done by #build. The bug was immediately apparent once we looked at #expires. Because #build calls #expires, and then #encoded_canonical calls it later, the date used can change. The #date method uses Time.now, if these calls happened on different seconds, they would result in different values. The solution is to memoize the time; it could be done in #expires or #date.
def expires return options[:expires] if options[:expires] @expires ||= date.to_i + expires_in end
Interestingly, this error is only possible if the expires_in option is used. We suspect most people either use the library’s DEFAULT_EXPIRY or pass in an expires option, both of which cause #expires to avoid the call to #date.
After a bit of testing we put this code into production and have eliminated these errors, resulting in better performance for our customers. We have also submitted a pull request to the library maintainer.