Archive for March, 2006

Streaming data to S3 with Ruby

Wednesday, March 29th, 2006

One of the downsides of the Ruby S3 example code is that it doesn’t support streaming data: it loads the entire file into memory. It turns out, however, that all that’s needed to stream data is a small tweak to the ‘request’ method in Net::HTTP, so that any body that responds to read (an open File, for example) is sent as a body stream instead of a String.

require 'net/http'
require 'S3'
require 'pp'

#
# Replace the request method in Net::HTTP to sniff the body type
# and set the stream if appropriate
#
module Net
  class HTTP
    alias __request__ request

    def request(req, body = nil, &block)
      if body != nil && body.respond_to?(:read)
        req.body_stream = body
        return __request__(req, nil, &block)
      else
        return __request__(req, body, &block)
      end
    end
  end
end

#
# Connect to s3 using the ruby API provided by Amazon
#
conn = S3::AWSAuthConnection.new("[PUBLIC]", "[PRIVATE]", false)

#
# Stream a testfile to S3
#
File.open("testfile", "rb") do |stream|
  pp response = conn.put('BUCKET_NAME',
                         "testfile",
                         stream,
                         {
                           "x-amz-acl" => "public-read",
                           "Content-Type" => "text/plain",
                           "Content-Length" =>  FileTest.size("testfile").to_s
                         }
                        )
end

#
# Send a testfile in memory to S3
#
pp response = conn.put('BUCKET_NAME',
                       "testfile",
                       File.read('testfile'),
                       {
                         "x-amz-acl" => "public-read",
                         "Content-Type" => "text/plain"
                       }
                      )

  

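A few notes about the code:

  • When streaming, you have to supply the ‘Content-Length’ header yourself; Net::HTTP refuses to send a body stream unless it has either a Content-Length or a chunked Transfer-Encoding
  • I got an error from S3.rb calling strip on non-strings, so I changed line 49 of S3.rb to ‘interesting_headers[lk] = value.to_s.strip’
  • Make sure you replace PUBLIC, PRIVATE, and BUCKET_NAME with your own access key, secret key, and bucket name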
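If you want to see what the patched request method boils down to, the request can also be built by hand with Net::HTTP. This is a minimal sketch, not part of the S3 library: the helper name build_streaming_put is hypothetical, and it deliberately leaves out S3’s Authorization signature (which S3.rb computes for you), so it only shows the streaming part.

```ruby
require 'net/http'

# Hypothetical helper: build a PUT request whose body is streamed from an
# already-open IO instead of being read into a String first.
# NOTE: S3 authentication headers are NOT added here; sign the request separately.
def build_streaming_put(path, io, content_type)
  req = Net::HTTP::Put.new(path)
  req.body_stream = io                # Net::HTTP reads from io while sending
  req.content_length = io.stat.size   # required for a body stream (unless chunked)
  req.content_type = content_type
  req
end

# Usage sketch (host and signing are up to you):
#   File.open("testfile", "rb") do |f|
#     req = build_streaming_put("/BUCKET_NAME/testfile", f, "text/plain")
#     # ...add the Authorization and Date headers here...
#     Net::HTTP.start("s3.amazonaws.com") { |http| http.request(req) }
#   end
```

Setting the length from io.stat.size mirrors the FileTest.size call in the example above; it only works for real files, not pipes or sockets.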

Offloading web traffic using Amazon’s S3 service

Tuesday, March 28th, 2006

We have a couple of fairly high-traffic sites with large images designed to be used as desktop backgrounds. To save a bit of bandwidth, we decided to give Amazon’s S3 web service a spin.

Signing up was fairly painless. They required a credit card (so they could bill us $0.15/GB for storage and $0.20/GB for transfer). After I signed up, I quickly received an email containing a link to my public and secret keys.

This is a fairly new service, and the client tools are just getting started. For uploading the images, I decided to use jSh3ll to ‘browse’ my S3 storage and a custom Ruby script to upload a large number of files.
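A bulk upload script along those lines might look like the following. This is a hedged sketch, not the script used for these sites: upload_images and the CONTENT_TYPES map are hypothetical, keys are assumed to be the plain file names, and conn is assumed to behave like the S3::AWSAuthConnection from the streaming post (conn.put(bucket, key, data, headers)). Passing open Files only streams if the Net::HTTP patch from that post is loaded.

```ruby
# Hypothetical extension-to-MIME-type map for the image formats we care about.
CONTENT_TYPES = {
  '.jpg' => 'image/jpeg', '.jpeg' => 'image/jpeg',
  '.png' => 'image/png',  '.gif'  => 'image/gif'
}

# Upload every image in dir to bucket, keyed by file name.
# Returns the list of keys that were uploaded.
def upload_images(conn, bucket, dir)
  uploaded = []
  Dir.glob(File.join(dir, '*')).sort.each do |path|
    type = CONTENT_TYPES[File.extname(path).downcase]
    next unless type && File.file?(path)   # skip non-images and directories
    key = File.basename(path)
    File.open(path, 'rb') do |stream|
      conn.put(bucket, key, stream,
               { 'x-amz-acl'      => 'public-read',
                 'Content-Type'   => type,
                 'Content-Length' => File.size(path).to_s })  # needed when streaming
    end
    uploaded << key
  end
  uploaded
end
```

Because conn is just duck-typed, the loop can be tried out against a stub object before pointing it at a real bucket.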

After downloading and installing jSh3ll, I created my first bucket:
(more…)

Ruby Oracle DBI ActiveRecord in 7 steps

Wednesday, March 22nd, 2006

Setting up Ruby to work with Oracle seems to be a pain for a lot of people. Here are the 7 steps I follow on a Linux box to go from nothing to a working ActiveRecord or DBI setup.

  1. Gather the installation sources you will need. You have to be registered with Oracle to get their Instant Client packages.
    Download the ruby-oci8 driver
    Download the Oracle Instant Client
    You want the following packages (these examples assume the zip format):

    • Instant Client Package - Basic or Instant Client Package - Basic Lite
    • Instant Client Package - SDK
    • Instant Client Package - SQL*Plus (optional but nice to have)
  2. (more…)

Credit card type and luhn check in ruby

Wednesday, March 15th, 2006

The other day I was looking at implementing a Luhn check and a credit card type check in Java, and I noticed that there seems to be a lack of code for doing this in Ruby. So I figured I would put something together for doing the checks in Ruby.

The following function performs a Luhn check on a given number (any number, not just credit card numbers). The Luhn algorithm is fairly simple; if you want to learn more about it, check here.

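# Returns true if the digits in ccNumber pass the Luhn (mod 10) check.
# Non-digit characters such as spaces and dashes are ignored.
def luhnCheck(ccNumber)
  ccNumber = ccNumber.gsub(/\D/, '')
  cardLength = ccNumber.length
  # Every second digit counting from the right is doubled; those are the
  # positions whose index has the same parity as the number's length.
  parity = cardLength % 2

  sum = 0
  for i in 0...cardLength
    # ccNumber[i, 1] is a one-character String on every Ruby version
    digit = ccNumber[i, 1].to_i

    if i % 2 == parity
      digit = digit * 2
    end

    # A doubled digit above 9 is replaced by the sum of its two digits
    if digit > 9
      digit = digit - 9
    end

    sum = sum + digit
  end

  return (sum % 10) == 0
end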
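The parity trick above is easier to see if you walk the digits from the right. Here is an equivalent sketch in more idiomatic Ruby; the name luhn_valid? is hypothetical, and unlike luhnCheck it assumes an input with no digits at all should be rejected rather than accepted.

```ruby
# Luhn check written right-to-left: after reversing, the digits to double
# are simply the ones at odd indexes.
def luhn_valid?(number)
  digits = number.gsub(/\D/, '').reverse.chars.map(&:to_i)
  return false if digits.empty?   # assumption: no digits means not valid

  sum = 0
  digits.each_with_index do |d, i|
    d *= 2 if i.odd?
    d -= 9 if d > 9               # same as adding the two digits of d
    sum += d
  end
  (sum % 10).zero?
end
```

For example, luhn_valid?("4111 1111 1111 1111") returns true, while changing the last digit to 2 makes it fail.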

Before running the Luhn check, you may want to verify that you have a valid card type, or at least one you want to accept. The following function does that based on the BIN ranges for the different card companies as of this writing (for more on this, see the following: credit card number information and BIN range information). N.B. BIN ranges change from time to time, so this will become dated; it should be easy enough to find the updated ranges.

# Returns the card type for ccNumber based on its prefix and length,
# or "NA" if it matches none of the known ranges.
def ccTypeCheck(ccNumber)
  ccNumber = ccNumber.gsub(/\D/, '')
  # Alternations are grouped so that ^ and $ anchor every branch.
  case ccNumber
  when /^3[47]\d{13}$/                     then return "AMEX"
  when /^4\d{12}(\d{3})?$/                 then return "VISA"
  when /^(5\d{15}|36\d{14})$/              then return "MC"
  when /^(6011\d{12}|650\d{13})$/          then return "DISC"
  when /^3(0[0-5]|8[0-1])\d{11}$/          then return "DINERS"
  when /^(39\d{12}|389\d{11})$/            then return "CB"
  when /^(3\d{15}|1800\d{11}|2131\d{11})$/ then return "JCB"
  else return "NA"
  end
end
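Since the BIN ranges will need updating, one option is to keep them in a single ordered table instead of a case statement. This is a sketch of that design choice, not a replacement for the function above: CARD_PATTERNS and card_type are hypothetical names, the patterns are the same March 2006 ranges, and order still matters because the first match wins.

```ruby
# Ordered [type, pattern] pairs; first match wins, so keep the order.
CARD_PATTERNS = [
  ['AMEX',   /\A3[47]\d{13}\z/],
  ['VISA',   /\A4\d{12}(\d{3})?\z/],
  ['MC',     /\A(5\d{15}|36\d{14})\z/],
  ['DISC',   /\A(6011\d{12}|650\d{13})\z/],
  ['DINERS', /\A3(0[0-5]|8[0-1])\d{11}\z/],
  ['CB',     /\A(39\d{12}|389\d{11})\z/],
  ['JCB',    /\A(3\d{15}|1800\d{11}|2131\d{11})\z/]
].freeze

# Returns the card type name, or 'NA' if no pattern matches.
def card_type(number)
  digits = number.gsub(/\D/, '')
  match = CARD_PATTERNS.find { |_type, pattern| pattern =~ digits }
  match ? match[0] : 'NA'
end
```

Updating a range then means editing one line of data, and the table can be checked against a list of known test numbers whenever it changes.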

GIS Geocoding experiments

Wednesday, March 8th, 2006

I’ve been evaluating a couple of different mapping software packages recently, and the other day I noticed that the same addresses geocoded (for those who don’t know what geocoding is, you can find out more about it here) to different locations. The results are mostly the same, but I figured the differences were interesting enough to do some more digging and see how different mapping services compared. I looked at the following services. Some of them are commercial services with open APIs (ESRI and MapQuest) and some are non-commercial services with open APIs (Yahoo and Google, although Google does not have a geocoding API).
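To put a number on how far apart two services place the same address, one common approach is the great-circle distance between their latitude/longitude pairs. The sketch below uses the haversine formula; it is not how the comparison in this post was done, the function name is hypothetical, and it assumes a spherical Earth with a mean radius of 6371 km, which is a reasonable approximation for this purpose.

```ruby
# Assumed mean Earth radius (spherical model).
EARTH_RADIUS_KM = 6371.0

# Great-circle distance in km between two points given in decimal degrees,
# using the haversine formula.
def haversine_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end
```

Multiplying the result by 1000 gives meters, which is the more natural unit when two geocoders disagree about a single street address.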

For Google, I looked at the latitude and longitude values generated by a search for the address. For Yahoo and ESRI, I used their REST geocoding APIs, and for MapQuest I used their Java API to their commercial service, since their OpenAPI service is currently only in beta. I took 5 addresses located at different points in the US and one in Canada and mapped the latitude and longitude returned by each service. Here are the results:
(more…)

Long arguments and getopt

Tuesday, March 7th, 2006

I recently needed to adapt a script that recrawls a site with Nutch. One of my design goals was to use the same command-line options as the Fetchtool (one of the steps I had to take to recrawl a site).

It became apparent fairly quickly that bash’s built-in ‘getopts’ doesn’t support long command-line arguments, so I had to fall back on the external getopt utility, whose enhanced (util-linux) version accepts them via --longoptions.

Here is the portion of the script that parses the command line arguments:

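# Check for an empty command line first: getopt always appends "--",
# so $# is never 0 after the set below.
[ $# -eq 0 ] && usage

# Capture getopt's output separately; "set -- `getopt ...` || usage" would
# test set's exit status (always 0) instead of getopt's.
args=`getopt -n$0 -u -a --longoptions="depth: adddays: topN:" "h" "$@"` || usage
set -- $args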

while [ $# -gt 0 ]
do
    case "$1" in
       --depth)   depth=$2;shift;;
       --adddays) adddays=$2;shift;;
       --topN)    topN=$2;shift;;
       -h)        usage;;
       --)        shift;break;;
       -*)        usage;;
       *)         break;;            #better be the crawl directory
    esac
    shift
done

Deconstructing this bit by bit:

(more…)