Here is a quick-and-dirty script to extract all the unique links from a web page.
It uses cl-ppcre to extract hyperlink-like strings from a target string, and was tested with drakma as the web client.
(asdf:load-system :drakma)
(asdf:load-system :cl-ppcre)
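;; Matches href="..." or href='...', allowing spaces around "=".
;; The register group captures the non-whitespace run inside the quotes.
(defparameter *url-re* "href *= *['\"](\\S+)['\"]")

(defun find-links (str)
  "Return the unique link targets found in STR, in order of first appearance."
  (let ((urls '()))
    (ppcre:do-register-groups (u)
        (*url-re* str nil :start 0 :sharedp t)
      ;; string= keeps URLs that differ only in case distinct (equalp would merge them)
      (pushnew u urls :test #'string=))
    (nreverse urls)))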
(print
 (find-links (drakma:http-request "https://lbolla.wordpress.com")))
There are 139 links on this page…
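To try the regex without hitting the network, the same kind of loop can be run on a literal string. This is only a minimal sketch, assuming cl-ppcre can be loaded through ASDF; `*sample-html*` and `collect-hrefs` are hypothetical names made up for this example, and the sample markup is invented rather than taken from a real page.

```lisp
;; Assumption: cl-ppcre is installed where ASDF can find it.
(asdf:load-system :cl-ppcre)

;; Hypothetical sample input: three anchors, two of which point to the same target.
(defparameter *sample-html*
  "<a href=\"/about\">About</a> <a href='/posts'>Posts</a> <a href = \"/about\">About again</a>")

(defun collect-hrefs (html)
  "Return the unique href values found in HTML, in order of first appearance."
  (let ((urls '()))
    ;; U is bound to the text captured by the single register group on each match.
    (ppcre:do-register-groups (u)
        ("href *= *['\"](\\S+)['\"]" html)
      (pushnew u urls :test #'string=))
    (nreverse urls)))

(print (collect-hrefs *sample-html*))
;; prints ("/about" "/posts")
```

The third anchor is written as `href = "/about"` on purpose: it shows that the optional spaces around `=` are accepted and that the duplicate target is dropped by `pushnew`.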