The Letters From Terra search facility
Constructed with surprising ease, the
LFT search facility is now online and ready for your
eager appreciation. Apart from being a (hopeful) enhancement
to the site, this utility is more a programming project
in itself; it is based on ASP scripts written by your fair
author, and all of the search algorithms and database construction
are my own. I've always been inexplicably distrustful
of the search tools offered on several webpage enhancement
sites; this is odd, since I use their counters and guestbooks
with indifference, and usually with my own modifications;
but there seems something appalling to me in allowing
the product of your hours of labour injecting content
into the barren framework of a website to be shredded
by an external spider and exhibited in combinations unknown
to and unapproved by you. In summary, then, this part
of this site is the product of an idle rainy day, and
my irrational protectiveness over my web content; perhaps
I am a little deluded over the utility this will hold
for the general viewer of my site, but it was certainly
fun to put together, and reinforced my seething hatred
of Microsoft Office, which can't be a bad thing.

The first problem I faced was the extraction of the written
content of the site; LFT contains around 100 proper
pages, each with the content largely confined to the
azure box which your eyes now rest upon, the entire
site being based around the same Dreamweaver template.
I built a simple parsing program (christened 'Marthe',
after Cincinnatus' wife in Invitation to a Beheading),
which extracted all words external to markup tags in the
body part of HTML files selected off the hard drive.
Common anomalies, such as the ubiquitous &nbsp; entity,
were eliminated through direct filters, and an alphanumeric
selector was used to ensure only real words were extracted.
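
As a rough illustration of the extraction step just described, here is a minimal Python sketch. It is not Marthe itself: the class and function names are hypothetical, the standard library's HTML parser stands in for the original hand-written filters (it decodes &nbsp; into a non-breaking space, which the word pattern then discards), and the ASCII-only word pattern is an assumption about what the alphanumeric selector accepted.

```python
import re
from html.parser import HTMLParser

# Assumed "alphanumeric selector": runs of ASCII letters/digits,
# optionally joined by inner apostrophes (so "it's" stays one word).
WORD = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


class BodyWordExtractor(HTMLParser):
    """Collects words found inside <body>, ignoring tags, scripts and styles."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &nbsp; before
        # the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.in_body = False
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and not self.skip_depth:
            self.words.extend(WORD.findall(data))


def extract_words(html):
    """Return the list of words in the body of an HTML document."""
    parser = BodyWordExtractor()
    parser.feed(html)
    parser.close()
    return parser.words
```

Calling `extract_words` on the text of each page yields a flat list of words, ready to be cut into segments.
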
This data was then written to a text file using the
CSV (comma separated values) format, but using the useless
squiggle (~) as a separator to leave the text as undamaged
as possible, with the body of the text divided into
segments of 25 words each (this seemed the most auspicious
value for flexibility without rendering the results
unreadable). After messing around for some hours with
the damned import tool (which, for the record, doesn't
bloody work properly) in MS Access, I managed to transfer
this data intact into an Access database, which opened
the way for the ASP utilities to be written.

The aim
is to create a search tool which produces results in
a similar manner to Google, with short excerpts from
the text being presented to justify the result (my theory
is that this, and not any accuracy or scale, is the
secret of Google's success). In order to function, the
search portion of the site sits on my wonderful free
ASP server, hosted by the kind people at 1asphost.com
(it really is an excellent service: if you want to try
your hand at ASP, check it out); but upon selecting
a result to view, you'll be transferred back to the Tiscali
server and the bulk of the LFT site.

It should be noted
that this search facility IN NO WAY PROVIDES ACCESS
TO THE METABOLITE DATABASE, as yet. There is a perfectly
good, and much more detailed, search specific to that
part of the site under the LFT metabolite database section,
which can be accessed through the main menu to the left
(under biochemistry, oddly).

Anyway, enjoy this facility;
if it's of any use whatsoever to one person, that justifies
my bored afternoon. Which is what life's about!
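
For the curious, the 25-word segmenting and the excerpt-style results described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ASP and Access code that runs the real search: the function names, the record layout (page, segment number, text) and the rule that every search term must appear as a whole word in the same segment are all hypothetical.

```python
def to_segments(page, words, size=25):
    """Yield '~'-separated records: page, segment number, up to `size` words."""
    for number, start in enumerate(range(0, len(words), size)):
        text = " ".join(words[start:start + size])
        # The extracted words are alphanumeric, so '~' cannot occur in the
        # text; the page name is assumed not to contain it either.
        yield "~".join([page, str(number), text])


def search(records, query):
    """Return (page, excerpt) pairs for segments containing every query term."""
    terms = [term.lower() for term in query.split()]
    hits = []
    for record in records:
        page, _number, text = record.split("~", 2)
        segment_words = {word.lower() for word in text.split()}
        # Assumed matching rule: all terms present, case-insensitive.
        if terms and all(term in segment_words for term in terms):
            hits.append((page, text))  # the segment doubles as the excerpt
    return hits
```

Because each record is already a short run of consecutive words, the matching segment can be shown directly as the excerpt that justifies the result, and the page name gives the link back to the full page.
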