Grub Next Generation is a distributed web-crawling system (clients/servers) that helps build and maintain an index of the Web. It uses a client-server architecture in which clients crawl the web and update the server; the peer-to-peer grubclient software crawls during computer idle time.
Open Search Server is both a modern crawler and search engine and a suite of high-powered full-text search algorithms. It is built on well-established open-source technologies such as Lucene, zkoss, Tomcat, POI, and TagSoup. Open Search Server is a stable, high-performance piece of software.
ASPseek is Internet search engine software developed by SWsoft. ASPseek consists of an indexing robot, a search daemon, and a CGI search frontend. It can index as many as a few million URLs and search for words and phrases, use wildcards, and perform Boolean searches. Search results can be limited to a given time period, site, or Web space (a set of sites), and sorted by relevance (PageRank is used) or date.
Pavuk is a UNIX program used to mirror the contents of WWW documents or files. It transfers documents from HTTP, FTP, and Gopher servers, and optionally from HTTPS (HTTP over SSL) servers. Pavuk has an optional GUI based on the GTK2 widget set.
A Python spider with easy customization.
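The blurb above is terse, so here is a generic illustration (not the project's actual code) of what the core of such a spider does: parse a fetched page and collect its outgoing links for the crawl queue. The `LinkParser` name and the sample HTML are hypothetical; a real crawl would fetch pages with `urllib.request` instead of using a hardcoded string.

```python
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect href targets from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    """Return every href found in the given HTML document."""
    parser = LinkParser()
    parser.feed(html)
    return parser.links

# In a real spider these URLs would be fetched and queued for further
# crawling; here we parse a hardcoded sample page for illustration.
sample = '<html><body><a href="/a.html">A</a> <a href="/b.html">B</a></body></html>'
print(extract_links(sample))  # → ['/a.html', '/b.html']
```

Customization in such a spider usually means subclassing the parser or swapping in a different link filter, which is why the extraction step is kept as a small standalone function.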
Andjing Web Crawler 0.01 pre-alpha. Andjing is a basic web crawler/spider written in PHP and run from the CLI. Requirements: PHP, MySQL. To do: switch the database from MySQL to SQLite to save CPU resources. What you can do: you can modify this application into a powerful email harvester and/or content crawler. Application usage: extract the files, create the database and table from the included SQL dump file, edit config.php as needed, then run: C:\andjing>php.exe andjing.php http://some
Starting from an original word, it crawls related words and saves them to a text file. For example, from the word "上海天气" ("Shanghai weather") it can crawl the following related queries: 上海天气预报, 上海一周天气, 上海一周天气预报, 上海天气预报查询, 上海明天天气, 上海的天气, 上海天气查询, 上海明天天气预报, 上海今日天气, 上海下周天气.
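The related-word crawl described above can be sketched in a few lines of Python. The project's actual suggestion source is not given, so the `fetch_related` callback below is a hypothetical stand-in that is injected rather than hardcoded; only the overall shape (expand from a seed, deduplicate, write to a text file) reflects the description.

```python
def crawl_related(seed, fetch_related, out_path, max_depth=1):
    """Breadth-first expansion: start from a seed word, ask the
    suggestion source for related terms, and save them to a text file."""
    seen = {seed}
    frontier = [seed]
    for _ in range(max_depth):
        next_frontier = []
        for word in frontier:
            for related in fetch_related(word):
                if related not in seen:
                    seen.add(related)
                    next_frontier.append(related)
        frontier = next_frontier
    # Persist everything except the seed itself, one term per line.
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(seen - {seed})))
    return seen - {seed}

# Usage with a stub suggestion source standing in for the real one:
stub = {"上海天气": ["上海天气预报", "上海一周天气"]}
found = crawl_related("上海天气", lambda w: stub.get(w, []), "related.txt")
print(sorted(found))  # → ['上海一周天气', '上海天气预报']
```

Injecting the suggestion source keeps the expansion logic testable without network access and lets the same loop run against whatever page or API the spider actually scrapes.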
Bayes-Swarm is a research project whose aim is to spider web sources (news portals, blogs, and online newspapers) and extract correlations between them. Important: Bayes-Swarm is no longer under active development. The site http://www.bayes-swarm.com no longer hosts the spider frontend interface. Feel free to browse the documentation and use the code for your own purposes, just don't expect many new features to be released. For any info, including access to spidered data (roughly 8Gb of tar g
Araña web testing library. Araña ("spider" in Spanish) is a simple web testing library written in C#. It can be used to integrate simple testing of web applications into unit testing, so the parts of your web application can be tested separately as well as how they work together. Araña can follow links, post forms and, through simple CSS selectors, ensure that the content on the pages of your web application is what you expect, and thus can be tested with unit test assert statements. Araña us
Aranya is a spider using a distributed architecture. The project aims to build a safe, efficient, and configurable Internet information collection system that, through configuration profiles, can provide effective data (pages, photos, etc.) to many kinds of search engines.