Author Archive

Streaming data to S3 with ruby

Wednesday, March 29th, 2006

One of the downsides of the ruby S3 example code is that it doesn’t support streaming of data (it loads the entire file into memory). It turns out, however, that all that is needed to stream data is a tweak to the ‘request’ method in Net::HTTP.

  require 'net/http'
require 'S3'
require 'pp'

#
# Replace the request method in Net::HTTP to sniff the body type
# and set the stream if appropriate
#
module Net
  class HTTP
    alias __request__ request

    def request(req, body = nil, &block)
      if body != nil && body.respond_to?(:read)
        req.body_stream = body
        return __request__(req, nil, &block)
      else
        return __request__(req, body, &block)
      end
    end
  end
end

#
# Connect to s3 using the ruby API provided by Amazon
#
conn = S3::AWSAuthConnection.new("[PUBLIC]", "[PRIVATE]", false)

#
# Stream a testfile to S3
#
open("testfile") do |stream|
  pp response = conn.put('BUCKET_NAME',
                         "testfile",
                         stream,
                         {
                           "x-amz-acl" => "public-read",
                           "Content-Type" => "text/plain",
                           "Content-Length" =>  FileTest.size("testfile").to_s
                         }
                        )
end

#
# Send a testfile in memory to S3
#
pp response = conn.put('BUCKET_NAME',
                       "testfile",
                       File.read('testfile'),
                       {
                         "x-amz-acl" => "public-read",
                         "Content-Type" => "text/plain"
                       }
                      )

  

A few notes about the code

  • When streaming you have to supply the ‘Content-Length’ header
  • I had an error about S3.rb calling strip on non-strings, I changed line 49 to ‘interesting_headers[lk] = value.to_s.strip’
  • Make sure you replace PUBLIC, PRIVATE, and BUCKET_NAME with appropriate values

Offloading web traffic using Amazon’s S3 service

Tuesday, March 28th, 2006

We have a couple of fairly high traffic sites that have large images designed to be used for desktop backgrounds. To save a bit of bandwidth, we decided to give Amazon’s S3 webservice a spin.

Signing up was fairly painless. They required a credit card (so they could bill us $.15/G storage and $.20/G transfer). After I signed up I quickly received an email that contained a link to my public and secret keys.

This is a fairly new service and the client tools are just getting started. For my purposes of uploading several images, I decided to use jSh3ll to ‘browse’ my S3 storage and a custom ruby script to upload a large amount of files.

After downloading and installing jSh3ll, I created my first bucket:
(more…)

Long arguments and getops

Tuesday, March 7th, 2006

I recently had a need to adapt a script that recrawls a site with nutch. One of my design goals was to use the same command line options as the Fetchtool (one of the steps I had to take to recrawl a site).

It became apparent fairly quickly that bash’s built-in ‘getopts’ didn’t support long command line arguments, so I had to fall back on getopt.

Here is the portion of the script that parses the command line arguments:

set -- `getopt -n$0 -u -a --longoptions="depth: adddays: topN:" "h" "$@"` || usage
[ $# -eq 0 ] && usage

while [ $# -gt 0 ]
do
    case "$1" in
       --depth)   depth=$2;shift;;
       --adddays) adddays=$2;shift;;
       --topN)    topN=$2;shift;;
       -h)        usage;;
       --)        shift;break;;
       -*)        usage;;
       *)         break;;            #better be the crawl directory
    esac
    shift
done

Deconstructing this bit by bit:

(more…)

IE ‘Code’ bug

Thursday, February 16th, 2006

We had a problem with how code was being displayed on the blog. IE wasn’t respecting the width of the code block and it was throwing the entire layout off.

Here is how I fixed it:

pre code {
 padding: 5px;
 display: block;
 overflow: auto;
/*I had to add this so ie wouldn't put scrollbars both ways.  Seems to be ignored by firefox*/
 overflow-y: visible;
/*I had to add this for ie to respect the width.  I don't know why it didn't respect its container's width*/
 width: 450px;
}

Quick and dirty cryptfs

Wednesday, February 15th, 2006

Here is a quick rundown on how to create a cryptfs partition that mounts during boot with Fedora Core 4.

Configure the kernel (this may already be done for you):

Device Drivers --->
  Multiple devices driver support (RAID and LVM) --->
    [*] Multiple devices driver support (RAID and LVM)
      < >   RAID support
      <*>   Device mapper support
      <*>     Crypt target support

Cryptographic options --->
  <M>   MD5 digest algorithm
  <M>   SHA1 digest algorithm
  <M>   AES cipher algorithms
   .... 

Install the userland tools:

yum install cryptsetup

Create the filesystem and format it:

cryptsetup -c blowfish -s 64 create fs_name /dev/sda2
mkfs.ext3 /dev/mapper/fs_name

Create /etc/init.d/crytptinit:

if [ -b /dev/mapper/fs_name ]; then
  /sbin/cryptsetup remove fs_name
fi
/sbin/cryptsetup -c blowfish -s 64  create fs_name /dev/sda2

Run it at the runlevels you want:

cd /etc/rc3.d
ln -s ../init.d/cryptinit S08cryptinit

Create the mount point:

mkdir /mount_point

Edit /etc/fstab:

/dev/mapper/fs_name /mount_point               ext3   defaults 0 0

Reboot. You will be asked for your passphrase when the machine boots.

Information in this post was gleaned from several places. Here is one
that matches closely.

OSS Projects

Friday, February 10th, 2006

Mission Data has written a few open source packages that have been used in many projects. Sometimes in fairly large places.

  • SQLProcessor - Takes (some of) the tedium out of dealing with the JDBC API directly written by Darren Day and Leslie Hensley.
  • Lakeshore - A component based web framework written by Leslie Hensley and Steven Yelton.

Lakeshore even got a bit of press.