The Letters From Terra search facility

Constructed with surprising ease, the LFT search facility is now online and ready for your eager appreciation. Apart from being a (hopeful) enhancement to the site, this utility is more a programming project in itself: it is based on ASP scripts written by your fair author, and all of the search algorithms and database construction are my own. I've always been inexplicably distrustful of the search tools offered on several webpage enhancement sites. This is odd, since I use their counters and guestbooks with indifference, usually with my own modifications; but there seems something appalling to me in allowing the product of my hours of labour, injecting content into the barren framework of a website, to be shredded by an external spider and exhibited in combinations unknown to and unapproved by myself. In summary, then, this part of the site is the product of an idle rainy day and my irrational protectiveness over my web content. Perhaps I am a little deluded over the utility this will hold for the general viewer of my site, but it was certainly fun to put together, and it reinforced my seething hatred of Microsoft Office, which can't be a bad thing.
The first problem I faced was the extraction of the written content of the site. LFT contains around 100 proper pages, each with the content largely confined to the azure box which your eyes now rest upon, the entire site being based around the same Dreamweaver template. I built a simple parsing program (christened 'Marthe', after Cincinnatus' wife in Invitation to a Beheading), which extracted all words outside the markup tags in the body part of HTML files selected off the hard drive. Common anomalies, such as the ubiquitous &nbsp;, were eliminated through direct filters, and an alphanumeric selector was used to ensure only real words were extracted.
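
For anyone curious what a Marthe-like extractor might look like, here is a minimal sketch in Python. It is not the original program (whose source and language aren't given here); the names BodyTextExtractor and extract_words are invented for illustration, and the choice to skip script and style blocks is an assumption. Decoding the entities takes care of &nbsp;, and a regular expression stands in for the alphanumeric selector.

```python
import re
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects text found inside <body>, ignoring script and style content."""

    SKIPPED = {"script", "style"}  # assumption: these never hold readable prose

    def __init__(self):
        # convert_charrefs=True decodes entities, so &nbsp; becomes U+00A0
        super().__init__(convert_charrefs=True)
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that sits in the body and outside skipped elements
        if self.in_body and self.skip_depth == 0:
            self.chunks.append(data)


def extract_words(html):
    """Return the list of 'real' words found in the body of an HTML page."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Alphanumeric selector: runs of letters/digits, allowing inner apostrophes.
    # Anything else (punctuation, the decoded non-breaking space) is dropped.
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", text)
```

Joining the text chunks with a space keeps words in neighbouring elements apart, at the cost of splitting a word whose letters are divided by an inline tag.
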
This data was then written to a text file in the CSV (comma separated values) format, but using the otherwise useless squiggle (~) as the separator to leave the text as undamaged as possible, with the body of the text divided into segments of 25 words each (this seemed the most auspicious value for flexibility without rendering the results unreadable). After messing around for some hours with the damned import tool in MS Access (which, for the record, doesn't bloody work properly), I managed to transfer this data intact into an Access database, which opened the way for the ASP utilities to be written.
The aim is to create a search tool which produces results in a similar manner to Google, with short excerpts from the text presented to justify each result (my theory is that this, and not any accuracy or scale, is the secret of Google's success). In order to function, the search portion of the site sits on my wonderful free ASP server, hosted by the kind people at 1asphost.com (it really is an excellent service: if you want to try your hand at ASP, check it out); but upon selecting a result to view, you'll be transferred back to the Tiscali server and the bulk of the LFT site.
It should be noted that this search facility IN NO WAY PROVIDES ACCESS TO THE METABOLITE DATABASE, as yet. There is a perfectly good, and much more detailed, search specific to that part of the site under the LFT metabolite database section, which can be accessed through the main menu to the left (under biochemistry, oddly).
Anyway, enjoy this facility; if it's of any use whatsoever to one person, that justifies my bored afternoon. Which is what life's about!
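
As a postscript for the technically curious, the 25-word segmenting, the squiggle-separated file, and the excerpt-style lookup can be sketched roughly as below. This is a Python illustration only: the real facility uses ASP scripts over an Access database, and the column names (page, segment_no, text), the function names, and the plain substring match are all assumptions of the sketch, not a description of the actual scripts.

```python
import csv
import io

SEGMENT_SIZE = 25  # words per segment, the value quoted in the text


def segment_words(words, size=SEGMENT_SIZE):
    """Split a list of words into space-joined segments of at most `size` words."""
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def write_segments(pages, out):
    """Write (page, words) pairs as '~'-separated rows, one row per segment."""
    writer = csv.writer(out, delimiter="~", lineterminator="\n")
    writer.writerow(["page", "segment_no", "text"])  # hypothetical column names
    for page, words in pages:
        for number, segment in enumerate(segment_words(words)):
            writer.writerow([page, number, segment])


def search(segments_file, term):
    """Return (page, excerpt) pairs for every segment containing `term`.

    Matching is a case-insensitive substring test, so 'art' would also
    match 'Marthe'; each matching 25-word segment doubles as the excerpt.
    """
    term = term.lower()
    reader = csv.DictReader(segments_file, delimiter="~")
    return [(row["page"], row["text"]) for row in reader
            if term in row["text"].lower()]
```

Because each stored segment is already a short run of words, a hit can be shown directly as the excerpt beside the link to its page, which is the Google-like presentation the facility aims for.
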

Letters from Terra | Updated 28th February 2004 | By Jonathan Ayling