find hyperlinks with lisp

October 23, 2010

here is a quick-and-dirty script to extract all the unique links from a web page.

it uses cl-ppcre to pull the hyperlink-like strings (the quoted values of href attributes) out of a target string. tested using drakma as the http client.

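;; load the http client and the regex library
(asdf:load-system :drakma)
(asdf:load-system :cl-ppcre)

;; capture the quoted value of an href attribute, stopping at the closing quote
(defparameter *url-re* "href *= *['\"]([^'\"\\s]+)['\"]")

(defun find-links (str)
  "return the unique href values found in STR, in order of first appearance."
  (let ((urls '()))
    (ppcre:do-register-groups
        (u) (*url-re* str nil :start 0 :sharedp t)
      ;; string= compares case-sensitively, so /Foo and /foo stay distinct
      (pushnew u urls :test #'string=))
    (nreverse urls)))

(print
  (find-links (drakma:http-request "https://lbolla.wordpress.com")))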

there are 139 links on this page…
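
to try the idea without touching the network, here is a minimal sketch that runs the same kind of pattern over a literal string. the names *sample-html* and find-links-in-string are made up for this example, and the pattern is a slightly stricter variant (a character class instead of \S+, so the capture cannot run past the closing quote); treat it as one possible way to check the approach, not as the definitive version.

```lisp
;; assumes asdf can find cl-ppcre on this machine; load this file as a script
(require :asdf)
(asdf:load-system :cl-ppcre)

;; hypothetical sample input: three anchors, two of them pointing at the same url
(defparameter *sample-html*
  "<a href=\"https://example.com/\">one</a>
<a href='/about'>two</a>
<a href = \"https://example.com/\">one again</a>")

(defun find-links-in-string (html)
  "return the unique href values in HTML, in order of first appearance."
  (let ((urls '()))
    ;; the single register group (url) is bound to the text between the quotes
    (ppcre:do-register-groups (url)
        ("href *= *['\"]([^'\"\\s]+)['\"]" html)
      ;; pushnew skips values already collected; string= is case-sensitive
      (pushnew url urls :test #'string=))
    ;; pushnew builds the list back to front, so reverse it once at the end
    (nreverse urls)))

(print (find-links-in-string *sample-html*))
```

this prints ("https://example.com/" "/about"): the duplicate is dropped and the relative link is returned as written, since the script does not resolve links against a base url.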