<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:series="http://unfoldingneurons.com/"
	>

<channel>
	<title>Tom Gidden &#187; indexing</title>
	<atom:link href="http://gidden.net/tom/tag/indexing/feed/" rel="self" type="application/rss+xml" />
	<link>http://gidden.net/tom</link>
	<description></description>
	<lastBuildDate>Sun, 01 May 2011 10:35:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Inverted Index Searching as Stored Procedures</title>
		<link>http://gidden.net/tom/2008/06/17/inverted-index-searching-as-stored-procedures/</link>
		<comments>http://gidden.net/tom/2008/06/17/inverted-index-searching-as-stored-procedures/#comments</comments>
		<pubDate>Tue, 17 Jun 2008 09:38:20 +0000</pubDate>
		<dc:creator>Tom Gidden</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Techie]]></category>
		<category><![CDATA[fulltext]]></category>
		<category><![CDATA[Google Code]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[inverted-index]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[search-engine]]></category>
		<category><![CDATA[searching]]></category>
		<category><![CDATA[stored procedures]]></category>
		<category><![CDATA[triggers]]></category>

		<guid isPermaLink="false">http://gidden.net/tom/?p=52</guid>
		<description><![CDATA[An old colleague of mine has persuaded me to release an implementation of an "inverted index"-based search library, written solely as MySQL Stored Procedures.  Our combined work is now available on Google Code.

It's been a long time since my last post... I've been spending most of my time recovering (still!) and working on a new game project for some old friends.  On the way, I've learned ActionScript 3, Flash, Papervision3D, Box2DFlash, and a whole slew of other fun stuff.  As soon as the project sees the light of day, I'll be announcing it here.
I've also done a bit of consultancy work along the way, which catalysed the stored procedure work I'm releasing here.  As I mentioned in a previous post, I had an article published in php&#124;architect Magazine in which I presented some simple code to do database searches in PHP.
The basic concept of this method is to index content in a database table by separating it into words and storing the locations of those words in a separate table.  This is about the most basic form of search engine, other than the "wildcard search" that many database developers unfortunately seem to use.  Wildcard searches are incredibly inefficient and don't scale, because a pattern with a leading wildcard can't use an ordinary index and forces a scan of every row.  The other alternative is the MySQL-specific (and MyISAM-only) FULLTEXT approach, which I've never been particularly happy with.
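As a rough illustration of the idea described above, here is a minimal sketch in Python using an in-memory SQLite database.  It is not the MySQL code from the project; the table name word_index and the functions tokenize, build_index and search are made up for this example, and the tokenizer is deliberately naive.

```python
import re
import sqlite3

def tokenize(text):
    """Split text into lowercase words (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(conn, doc_id, text):
    """Store one row per word occurrence: (word, document, position)."""
    rows = [(word, doc_id, pos) for pos, word in enumerate(tokenize(text))]
    conn.executemany(
        "INSERT INTO word_index (word, doc_id, position) VALUES (?, ?, ?)", rows
    )

def search(conn, query):
    """Return the ids of documents that contain every word in the query."""
    words = sorted(set(tokenize(query)))
    if not words:
        return []
    placeholders = ",".join("?" * len(words))
    sql = (
        "SELECT doc_id FROM word_index "
        f"WHERE word IN ({placeholders}) "
        "GROUP BY doc_id "
        "HAVING COUNT(DISTINCT word) = ? "
        "ORDER BY doc_id"
    )
    return [row[0] for row in conn.execute(sql, words + [len(words)])]

# Hypothetical schema: the index on `word` is what makes lookups cheap.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word_index (word TEXT, doc_id INTEGER, position INTEGER)")
conn.execute("CREATE INDEX idx_word ON word_index (word)")

build_index(conn, 1, "The quick brown fox")
build_index(conn, 2, "A quick search engine")
build_index(conn, 3, "Brown paper packages")

print(search(conn, "quick"))        # [1, 2]
print(search(conn, "Quick BROWN"))  # [1]
```

The lookup is an exact match on an indexed column, so the database can jump straight to the matching rows rather than scanning every document as a wildcard search would.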
The Inverted Index technique is nothing new, and certainly not rocket science.  However, I find that many (esp. younger) programmers haven't heard of this method, and rely on external libraries, specific hacks (such as FULLTEXT), or, most often, the dreaded "string LIKE '%foo%'" construction, which can stop a MySQL server in its tracks.
It's no substitute for a proper search engine library, such as Apache Lucene, but it does have the benefit of being easily integrated into other queries.  The problem of searching documents is one thing, but sometimes you just need to search an address field, biography field, or something like that, while still searching on other columns as well.  While some external libraries will allow integration with MySQL through UDFs, that adds a whole extra maintenance load and is also not usually possible on shared database servers.
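To show what "integrated into other queries" might look like, here is a small sketch in Python with SQLite standing in for MySQL.  The people and bio_index tables and the find_people function are hypothetical names for this example only; the point is that the word lookup and the filter on another column happen in one ordinary SQL query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT, biography TEXT);
    CREATE TABLE bio_index (word TEXT, person_id INTEGER);
    CREATE INDEX idx_bio_word ON bio_index (word);
""")

people = [
    (1, "Alice", "Bristol", "Plays the cello and writes compilers"),
    (2, "Bob", "Leeds", "Writes compilers for fun"),
    (3, "Carol", "Bristol", "Collects vintage synthesisers"),
]
for person_id, name, city, bio in people:
    conn.execute("INSERT INTO people VALUES (?, ?, ?, ?)", (person_id, name, city, bio))
    # Index each distinct lowercase word of the biography (naive whitespace split).
    for word in set(bio.lower().split()):
        conn.execute("INSERT INTO bio_index VALUES (?, ?)", (word, person_id))

def find_people(conn, word, city):
    """People in `city` whose biography contains `word`, in a single query."""
    sql = """
        SELECT p.name
        FROM people AS p
        JOIN bio_index AS i ON i.person_id = p.id
        WHERE i.word = ? AND p.city = ?
        ORDER BY p.name
    """
    return [row[0] for row in conn.execute(sql, (word.lower(), city))]

print(find_people(conn, "compilers", "Bristol"))  # ['Alice']
```

Because the index is just another table, it joins against the data like any other, which is harder to arrange when the search lives in an external library.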
The approach is fairly easy to implement in an application language such as Perl or PHP.  I've been doing it for years.  However, it has always involved quite a lot of setup and maintenance.
Instead, I've written an implementation that does everything within the database itself using triggers and stored procedures.
This allows the programmer to treat the data table as a simple table, and then call a single stored procedure to perform a search.  More complicated queries can be constructed using the same data, but still keeping the same triggers in place to do the dirty work.
While this approach might not be the most efficient way of doing things, I think it's the most self-contained and simplest to use.  It should work on all MySQL storage engines, and I imagine it could be adapted to run on other RDBMSes altogether.
Anyway, I mentioned the code I'd written to my old colleague, Stig Palmquist, and he felt that it was valuable work that could do with being released as an open-source project.  Moreover, he was eager to contribute to the effort. So, it wasn't long before we started a Google Code project, and we've both significantly improved the code since.  It's open-sourced under the Apache License 2.0.]]></description>
		<wfw:commentRss>http://gidden.net/tom/2008/06/17/inverted-index-searching-as-stored-procedures/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search Engine Article in php&#124;architect Magazine</title>
		<link>http://gidden.net/tom/2006/09/21/search-engine-article-published/</link>
		<comments>http://gidden.net/tom/2006/09/21/search-engine-article-published/#comments</comments>
		<pubDate>Thu, 21 Sep 2006 20:58:25 +0000</pubDate>
		<dc:creator>Tom Gidden</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Techie]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[inverted-index]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[pdo]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[php-architect]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[search-engine]]></category>

		<guid isPermaLink="false">http://gidden.net/tom/2006/09/21/search-engine-article-published/</guid>
		<description><![CDATA[I just got the regular monthly email from php&#124;architect Magazine informing me that this month's issue is ready to download, and listing all the wonderful things inside.  Turns out they went ahead and published the article I wrote for them a couple of months ago.

The article was titled "How To Write Your Own Search Engine", although they've retitled it and updated it a bit.  It covers the use of the inverted index technique to write a search tool using MySQL.  I've been using this technique for a few years now, and I've had to explain it to colleagues so many times that it's nice to finally have an article I can give them instead.
I think it came out well, although I may have spotted a small bug introduced when they tweaked the code I wrote... I'm not sure, as I'm pretty tired and hopped up on morphine, amitriptyline and codeine (all prescribed!) to try to loosen me up after a fairly painful ten-hour trip to London and back for my six-month post-operative appointment with my orthopaedic surgeon.
Anyway, I *think* they've missed out a strtolower() call in Listing 3 around line 17 or 18, thus making searches case-sensitive, even though the indexer downcases everything.  As a result, I'm not sure the search will actually work unless you always type in lowercase.
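The effect of that kind of omission can be sketched as follows.  This is Python rather than the PHP from the article, and it is not the published listing; the function names and the in-memory dictionary index are assumptions made for the illustration.

```python
def index_document(index, doc_id, text):
    """Add a document to an in-memory inverted index, downcasing every word."""
    for word in text.lower().split():
        index.setdefault(word, set()).add(doc_id)

def search_buggy(index, term):
    """Looks the term up as typed, so 'MySQL' never matches the key 'mysql'."""
    return sorted(index.get(term, set()))

def search_fixed(index, term):
    """Applies the same downcasing as the indexer before the lookup."""
    return sorted(index.get(term.lower(), set()))

index = {}
index_document(index, 1, "MySQL stored procedures")
index_document(index, 2, "Searching with MySQL")

print(search_buggy(index, "MySQL"))  # []
print(search_fixed(index, "MySQL"))  # [1, 2]
```

The rule of thumb is that whatever normalisation the indexer applies to words must also be applied to the search terms.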
Well, not to worry... whoever did the code cleanup must've had a fairly tedious job of getting rid of some of my code style idiosyncrasies in an effort to make it more "standard", so I don't begrudge them the odd bug or two.
It was a bit of a surprise seeing the article, as I hadn't heard back from them since I'd submitted the original draft a while ago.  Other than switching the code from the mysqli extension I used to PDO, they haven't made many changes, so I guess my draft was okay as it was.]]></description>
		<wfw:commentRss>http://gidden.net/tom/2006/09/21/search-engine-article-published/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

