"Web search technology is a huge subject, encompassing:
- networking (spidering the web)
- string and markup-language manipulation (parsing HTML)
- proprietary file formats (searching Word, Excel, PDF, etc.)
- language and text-parsing (finding words & sentences in documents, stemming and other linguistic analysis)
- algorithms (finding matches, AND/OR queries, combining multiple word results)
- performance (both increasing spidering speed, and making large catalogs fast to search)
- user interface (presenting search input options, and results)

Searcharoo.net barely scratches the surface of any of these topics,
but it does attempt to introduce them with an open-source C# implementation of a search engine that you can download and use on your website."


