I used this to extract referrer URLs from the HTTP server log, piping the output through a deduplicating filter (FilterInputDeletingDuplicateLines) and then into sort. (Alternatively, sort first and deduplicate with uniq.) The variant appended below, extract-referrals, emits both the referred URL and its referrer.

----

  #!/usr/local/bin/scsh \
  -e main -s

  USAGE: extract-qurls.scm < INPUT > OUTPUT

  Copy quoted URLs from standard INPUT to standard OUTPUT,
  sans quotes, line by line.

  This INPUT line

    i577B5019.versanet.de - - [18/Nov/2007:01:36:44 +0100] "GET /pix/broad.jpg HTTP/1.1" 200 11371 "http://phat.xxx/fog.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9"

  would contribute the URL

    http://phat.xxx/fog.html

  to the OUTPUT.
  !#

  ;; Extract (heuristically) quoted URLs from (an HTTP log on) stdin;
  ;; copy them (sans quotes) to stdout, line by line.
  (define (extract-urls)
    (awk (read-line) (record) ()
      ;; A regexp test passes its match object to the => procedure.
      ((: "\"" (submatch (: (+ alphabetic) "://" (* any))) "\" ")
       => (lambda (match)
            (display (match:substring match 1))
            (newline)))))

  (define (main args)
    (if (= (length args) 1)
        (extract-urls)
        (begin
          ;; Report usage on stderr, so it does not land in OUTPUT.
          (format (current-error-port) "Usage: ~a < INPUT > OUTPUT~%" (first args))
          (exit 1))))

----

A variant:

  ;; Extract the referred local URL and its quoted referrer;
  ;; copy them to stdout (sans quotes, separated by a tab),
  ;; line by line.
  (define (extract-referrals)
    (awk (read-line) (record) ()
      ((: "GET " (submatch (+ (~ whitespace)))
          (* any)
          "\"" (submatch (: (+ alphabetic) "://" (* any))) "\" ")
       => (lambda (match)
            (format #t "~a\t~a~%"
                    (match:substring match 1)
                    (match:substring match 2))))))
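The heuristic is easy to check by hand before pointing it at a whole log. The following is a minimal sketch, assuming scsh 0.6's rx macro, regexp-search and match:substring; it applies the same two patterns used in extract-urls and extract-referrals to the sample line from the usage comment. The names sample-line, quoted-url-re and referral-re are hypothetical, introduced only for this check.

```scheme
;; Sketch only: assumes scsh 0.6 (rx, regexp-search, match:substring).
;; sample-line, quoted-url-re and referral-re are hypothetical names.

;; The sample log line from the usage comment.
(define sample-line
  (string-append
   "i577B5019.versanet.de - - [18/Nov/2007:01:36:44 +0100] "
   "\"GET /pix/broad.jpg HTTP/1.1\" 200 11371 "
   "\"http://phat.xxx/fog.html\" "
   "\"Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.9) "
   "Gecko/20071025 Firefox/2.0.0.9\""))

;; Same pattern as the awk test in extract-urls.
(define quoted-url-re
  (rx "\"" (submatch (: (+ alphabetic) "://" (* any))) "\" "))

;; Same pattern as the awk test in extract-referrals.
(define referral-re
  (rx "GET " (submatch (+ (~ whitespace)))
      (* any)
      "\"" (submatch (: (+ alphabetic) "://" (* any))) "\" "))

;; Print the quoted URL found by the first pattern.
(let ((m (regexp-search quoted-url-re sample-line)))
  (display (match:substring m 1))
  (newline))

;; Print the referred local URL and its referrer, one per line.
(let ((m (regexp-search referral-re sample-line)))
  (display (match:substring m 1))   ; referred local URL
  (newline)
  (display (match:substring m 2))   ; its referrer
  (newline))
```

Loaded into scsh, this should print http://phat.xxx/fog.html, then /pix/broad.jpg and http://phat.xxx/fog.html. Because (* any) is greedy, a user-agent string that itself contained a quote followed by a space could stretch the captured URL past its closing quote; writing (* (~ #\")) in its place would keep each match within one pair of quotes.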