Some tricks for using S3 in your web application

After the post on my dojo.io.bind experience, here's what I learned using S3. I'm using the S3 library for Ruby published by Amazon.

Uploads

When you save an entry with a file attached in MyOwnDB, the Rails action handling the form checks the size of the file. If it is within the limits allowed for the account, it saves the needed information locally in the database (let's say in the table "files"), and then saves the file on S3. The key identifying the file on S3 is based on the id of its entry in the database. As this id is only known after the row has been created, I needed to alias the original save method as _save, and call it from the save method I define, in which I also send the file to S3:
#This is in the model file.rb
alias _save save
def save
  #first create the row in the database, so it has an id assigned
  _save
  #then build the s3 key, based on the id assigned to the just saved row
  s3_key = build_s3_key
  #and write it in an attribute for performance reasons, saving the entry again
  write_attribute(:value, s3_key)
  _save

  #now we can send the file to S3
  res = @s3_conn.put(@@bucket_name, s3_key,
                     @attachment.read,
                     { 'Content-Type' => @attachment.content_type,
                       'Content-Length' => @attachment.size.to_s,
                       'Content-Disposition' => "attachment;filename=\"#{@attachment.original_filename}\"" })
  if res.http_response.code != "200" or res.http_response.message != "OK"
    #we have a problem
    raise StandardError.new("S3 error. Response code: " +
                            "#{res.http_response.code} and message: " +
                            "#{res.http_response.message}")
  end
end
Some notes:
  • The @s3_conn variable is the result of this call: S3::AWSAuthConnection.new(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  • we save the entry in the database normally, by calling the original save method, which we aliased as _save. This uses all of ActiveRecord's magic, and assigns an id to the entry
  • once we have called _save, we have access to the id, and can build the S3 key under which we'll save the file (a sketch of build_s3_key follows these notes)
  • when sending the file to S3, we set several headers: the Content-Type, which sets the mime-type so that at download time the browser proposes to open the file with the correct software, and the Content-Disposition, so that the original file name is proposed in the "Save As..." dialog (if I recall correctly, the file would otherwise be saved under a name based on the key identifying it on S3)
  • if we have an error when sending the file to S3, we raise an exception
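
The build_s3_key method itself isn't shown above. A minimal sketch could look like this; the naming scheme below is only an example, any scheme that is unique per row will do:
  #This lives in the model too; the scheme is just an illustration
  def build_s3_key
    #prefix the row id with the table name to avoid collisions
    #between models sharing the same bucket
    "#{self.class.table_name}/#{self.id}"
  end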

Downloads

A file attachment is displayed as an HTTP link, but this link doesn't point to S3 directly: it points to a MyOwnDB URL, which checks the validity of the request and then redirects to the S3 URL of the file. There are several reasons to go that way:
  • All validation is done on the request, such as: is the user logged in, does he have the right to access this file, etc.
  • Links to S3 generated by MyOwnDB have a validity limited in time. If we displayed links to S3 directly in web pages, users would have to refresh the page once the displayed links expired
  • S3 doesn't have precise accounting capabilities, so displaying direct links to S3 would make it impossible for MyOwnDB to know how much traffic a user generates. As stated in the Terms of Service, an initiated transfer is counted as a full transfer, even if interrupted. To be fair to users, transfer limits have been adapted to take this into account. Moreover, the maximum size of an attachment is small enough compared to the transfer limit that the impact of an interrupted transfer is very limited.
The URL the browser is redirected to is built like this:
     #generate a signed S3 URL that expires after DELAY_IN_SECONDS
     generator = S3::QueryStringAuthGenerator.new(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
     generator.expires_in = DELAY_IN_SECONDS
     return generator.get(@@bucket_name, s3_key)
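
To give an idea of how this fits together, here is a rough sketch of what such a redirecting Rails action could look like; the names used (FileAttachment, logged_in?, can_access?, record_transfer, s3_url) are placeholders, not MyOwnDB's actual code:
  class FilesController < ApplicationController
    def download
      attachment = FileAttachment.find(params[:id])
      #validate the request before handing out a signed S3 URL
      unless logged_in? and current_user.can_access?(attachment)
        redirect_to :action => 'access_denied' and return
      end
      #count the transfer against the account's bandwidth limit
      current_user.account.record_transfer(attachment.size)
      #s3_url would wrap the QueryStringAuthGenerator code shown above
      redirect_to attachment.s3_url
    end
  end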

Deletions

As with saving the entry, we need to do a bit more in the destroy method, but here we don't need to alias it:
  def destroy
    #do the standard thing
    super
    #and now delete the S3 file
    begin
      @s3_conn.delete(@@bucket_name, s3_key)
    rescue Exception => e
      #treat exceptions
    end
  end
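
The rescue block above is left empty; one plausible way to treat the exception (just an idea, not necessarily what MyOwnDB does) is to log the key so a periodic job can retry the deletion and avoid orphaned files on S3:
    begin
      @s3_conn.delete(@@bucket_name, s3_key)
    rescue Exception => e
      #log the orphaned key so a periodic cleanup job can retry later
      logger.error("S3 delete failed for #{s3_key}: #{e.message}")
    end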

Accounting

Although S3 is very cheap, some accounting is needed: it is not free, and MyOwnDB can't pay for unlimited storage for its users. The way I implemented it is the following (a rough sketch follows the list):
  • Each account may only have a limited number of files saved at a given time. This is possible as each upload and deletion is tracked.
  • Each account may only use a limited amount of bandwidth for file transfers. This is again possible because downloads also go through the MyOwnDB server. There is a 10% margin: downloads are still accepted even though the limit has been reached, but uploads are blocked as soon as it is reached.
  • The size of file attachments is limited.
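For illustration, such checks could be expressed roughly like this; the model, column and constant names (files_count, transfer_used, and the numbers themselves) are examples, not MyOwnDB's actual values:
  class Account < ActiveRecord::Base
    MAX_FILES           = 100               #files stored at a given time
    MAX_ATTACHMENT_SIZE = 2 * 1024 * 1024   #bytes per attachment
    TRANSFER_LIMIT      = 500 * 1024 * 1024 #bytes of transfer per period
    DOWNLOAD_MARGIN     = 1.1               #10% margin for downloads

    #uploads are refused as soon as any limit is reached
    def can_upload?(file_size)
      files_count < MAX_FILES and
        file_size <= MAX_ATTACHMENT_SIZE and
        transfer_used < TRANSFER_LIMIT
    end

    #downloads are still accepted up to 10% over the limit
    def can_download?
      transfer_used < TRANSFER_LIMIT * DOWNLOAD_MARGIN
    end
  end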

Conclusion

Although S3 lacks some accounting capabilities, notably usage per bucket, it is easy to work around this and propose a clear solution to the users of a service using S3 for storage.