Open Search Server
Open Search Server is both a modern crawler and search engine and a suite of high-powered full-text search algorithms. It is built on established open source technologies such as Lucene, ZK (zkoss), Tomcat, POI and TagSoup, and is a stable, high-performance piece of software.
- Multi-language indexing
- The crawlers go through web sites and file systems to rapidly and easily build your index.
- Numerous document formats are supported, such as XML, HTML/XHTML, Adobe PDF, Microsoft Word, PowerPoint, OpenOffice, etc.
- Quick integration thanks to an XML interface via HTTP queries (XML over HTTP) and PHP classes
- The web interface is built on the ZK (zkoss) framework and runs in the main Ajax-capable browsers. This RIA-type interface is as comfortable to use as that of a desktop ("heavy") client.
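The "XML over HTTP" integration mentioned in the list above can be pictured with a small Python sketch. Everything specific here is an assumption made for illustration: the base URL, the parameter names (`use`, `query`, `rows`) and the response layout are hypothetical and are not taken from the Open Search Server documentation. The sketch only shows the general pattern of building a query URL and parsing an XML result.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical response body; the real server's XML schema may differ.
SAMPLE_RESPONSE = """\
<response>
  <result numFound="2">
    <doc>
      <field name="url">http://example.com/a</field>
      <field name="title">First page</field>
    </doc>
    <doc>
      <field name="url">http://example.com/b</field>
      <field name="title">Second page</field>
    </doc>
  </result>
</response>
"""

def build_query_url(base_url, index, query, rows=10):
    """Build a search URL. Parameter names are illustrative assumptions."""
    params = {"use": index, "query": query, "rows": rows}
    return f"{base_url}?{urlencode(params)}"

def parse_results(xml_text):
    """Turn each <doc> element into a dict of field name -> text."""
    root = ET.fromstring(xml_text)
    docs = []
    for doc in root.iter("doc"):
        docs.append({f.get("name"): (f.text or "") for f in doc.findall("field")})
    return docs

if __name__ == "__main__":
    print(build_query_url("http://localhost:8080/search", "my_index", "open source"))
    for hit in parse_results(SAMPLE_RESPONSE):
        print(hit["title"], hit["url"])
```

In a real integration the XML text would come from an HTTP GET on the built URL (for example with `urllib.request.urlopen`) rather than from a hard-coded sample.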
Jx9 is an embeddable scripting engine that implements a Turing-complete programming language based on JSON. Jx9 is well suited to applications that require modern and efficient scripting support, such as games, database systems, text editors and network applications. Jx9 also natively supports multi-threading and the concept of separate engine handles and virtual machines.
ASPseek is Internet search engine software developed by SWsoft. ASPseek consists of an indexing robot, a search daemon, and a CGI search frontend. It can index as many as a few million URLs, search for words and phrases, use wildcards, and run Boolean searches. Search results can be limited to a given time period, site or Web space (set of sites), and sorted by relevance (PageRank is used) or by date.
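To illustrate what a Boolean search over indexed pages involves, here is a minimal Python sketch of an inverted index with AND and OR queries. It is not ASPseek's implementation (ASPseek is a separate daemon with its own storage); the function names and the sample documents are made up for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search_and(index, terms):
    """Documents containing every term (Boolean AND)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_or(index, terms):
    """Documents containing at least one term (Boolean OR)."""
    result = set()
    for t in terms:
        result |= index.get(t.lower(), set())
    return result

if __name__ == "__main__":
    docs = {
        1: "open source search engine",
        2: "web crawler and search daemon",
        3: "open web archive",
    }
    index = build_index(docs)
    print(search_and(index, ["open", "search"]))
    print(search_or(index, ["crawler", "archive"]))
```

A production engine would add tokenization, phrase and wildcard handling, and ranking on top of this basic set logic.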
UnQLite is an in-process software library which implements a self-contained, serverless, zero-configuration, transactional NoSQL database engine. UnQLite is a document store database as well as a standard Key/Value store. The database file format is cross-platform: you can freely copy a database between 32-bit and 64-bit systems or between big-endian and little-endian architectures.
Grub Next Generation is a distributed web crawling system (clients/servers) which helps to build and maintain an index of the Web. It uses a client-server architecture in which the client crawls the web and updates the server. The peer-to-peer grubclient software crawls during computer idle time.
An open source .NET web crawler written in C# using SQL Server 2005/2008. Arachnode.net is a complete and comprehensive .NET web crawler for downloading, indexing and storing Internet content including e-mail addresses, files, hyperlinks, images, and Web pages.
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix is designed to respect the robots.txt exclusion directives and META robots tags, and collect material at a measured, adaptive pace unlikely to disrupt normal website activity.
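The robots.txt behaviour described above can be sketched with Python's standard `urllib.robotparser` module. This is not Heritrix code (Heritrix is a Java project); it only shows the kind of check a polite crawler performs before fetching a URL. The sample robots.txt content, the `demo-bot` user agent and the helper names are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used instead of a network fetch.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def may_fetch(parser, user_agent, url):
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)

def polite_delay(parser, user_agent, default=1.0):
    """Seconds to wait between requests: the Crawl-delay value if given."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default

if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS)
    for url in ("http://example.com/public/page.html",
                "http://example.com/private/data.html"):
        print(url, may_fetch(parser, "demo-bot", url))
    print("delay:", polite_delay(parser, "demo-bot"))
```

A crawler would normally load the file with `set_url()` and `read()` and sleep for the returned delay between requests to the same host; META robots tags need a separate check on the fetched HTML.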
mnoGoSearch for UNIX consists of a command-line indexer and a search program which can be run under Apache Web Server or any other HTTP server supporting the CGI interface. mnoGoSearch for UNIX is distributed as source code and can be compiled with a number of databases, depending on the user's choice. It is known to work on a wide variety of modern Unix operating systems, including Linux, FreeBSD, Mac OS X, Solaris and others.
Search Spider
Saaral Soft Search Spider is written in Perl with GTK+ as the front end. It is basically a web spider that can find links from a given seed site and also search within the web pages it finds. In simple terms, it works like a web search engine without indexing any data. saaral-soft-search-spider is also a good example of Perl GTK2 usage and of using GTK2 widgets inside Perl threads.
How to run? To run the Windows binary: SearchSpider is also available as a standalone executable for Windows OS (Windows XP