8/11/2006; 2:28:26 AM
An Open Source search engine ... for my FlowMining project ...
I have been looking around for a while for an Open Source search engine that can be used for my FlowMining project ... I'm thinking of giving this one a try.
ASPSeek 1.1.4 (Development). An Internet search engine. [freshmeat.net]
About three or four years ago, I was doing a lot of research into Proxy/Cache engines and the various applications that can be developed on top of them. Here at Novell we developed one of the most scalable Proxy/Cache engines available on Intel hardware - BorderManager - and then created an appliance version called ICS that has been licensed to a range of hardware vendors. Since then, we have spun off that division as a company called Volera. Although many people focused on the 'bandwidth savings' that a Proxy/Cache might deliver, I really saw two core capabilities that excited me - content distribution networks and community services. I'll comment on content distribution networks later ... this search engine relates to community services ...
One of my interests is community, and leveraging the power of humans operating in teams. All of this contributes to the efficient and productive operation of communities and other organizations. When a Proxy/Cache is shared by numerous people in an organization or community, it could be enhanced with additional services that provide value for that community. For myself, I would often find that after cruising the net, I wanted to get back to the web page where I had seen a particular comment or statement. This caused me to look for a solution where I had my own personal search engine ... something that would index the content that I was reading as I read it. I would then be able to go back and search through the pages that I had read ... not the entire Internet.
My leap to the concepts of FlowMining was when I realized that if a group of users were sharing a Proxy/Cache, we could have the content of that cache indexed automatically. As a team, we would then be populating the search engine with the web pages that our team had found and read. Doing this, we could leverage some of our 'human web crawling' capabilities. If I implemented this in a Proxy/Cache engine, then we would actually be 'Mining' the 'Flow' of content through the Proxy ... FlowMining.
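To make the idea a little more concrete, here is a rough sketch in Python of what 'mining the flow' might look like. It is only an illustration under assumptions: the class name FlowIndex, the observe() hook, the user names and the example.com URLs are all hypothetical, and a real Proxy/Cache would have its own way of exposing the responses it serves. The toy in-memory index stands in for a real search engine such as ASPSeek.

```python
import re
from collections import defaultdict

TAG_RE = re.compile(r"<[^>]+>")      # crude HTML tag stripper, good enough for a sketch
WORD_RE = re.compile(r"[a-z0-9]+")   # lowercase alphanumeric tokens


class FlowIndex:
    """Toy inverted index fed by pages as they pass through a proxy."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of URLs containing it
        self.readers = defaultdict(set)   # URL -> set of users who fetched it

    def observe(self, user, url, html):
        """Hypothetical hook: call once for each response the proxy serves."""
        self.readers[url].add(user)
        text = TAG_RE.sub(" ", html).lower()
        for word in WORD_RE.findall(text):
            self.postings[word].add(url)

    def search(self, query, user=None):
        """Return URLs containing every query word.

        If `user` is given, only pages that user has read are returned,
        which gives the 'personal search engine' view of the same index.
        """
        words = WORD_RE.findall(query.lower())
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        if user is not None:
            hits = {u for u in hits if user in self.readers[u]}
        return sorted(hits)


index = FlowIndex()
index.observe("alice", "http://example.com/a", "<p>Proxy cache scaling notes</p>")
index.observe("bob", "http://example.com/b", "<p>Taxonomy notes for the team</p>")

print(index.search("notes"))              # pages anyone on the team has read
print(index.search("notes", user="bob"))  # only the pages bob has read
```

The same index answers both questions: searching without a user covers everything the team has read, while passing a user narrows it to that person's own reading history.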
Articles like this one in KMWorld outline the issues related to the current trends and techniques for creating taxonomies of information. I'm thinking that a community of people might benefit from a taxonomy that they built themselves. And so I have now found the search engine ... and am going to try hooking it up to our Proxy/Cache engine and see what I can create ... this will be fun ...