Semantic Vectors - Creating and Searching Semantic Vectors using Lucene
The Semantic Vectors package uses a Random Projection algorithm, a form of automatic semantic analysis. Other methods supported by the package include Latent Semantic Analysis (LSA) and Reflective Random Indexing. LSA is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. The library is used for semantic analysis and text mining.
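The random-projection idea can be illustrated with random indexing, a closely related technique: every word gets a sparse random "index vector", and a word's context vector is the sum of the index vectors of the words around it, so words used in the same contexts end up pointing in similar directions. This is a minimal plain-Python sketch, not the Semantic Vectors (Java/Lucene) API; the names `train`, `index_vector`, and `cosine` and the values of `DIM`, `NONZERO`, and `window` are illustrative assumptions.

```python
import math
import random
from collections import defaultdict

DIM = 512      # size of the reduced vector space (assumed value)
NONZERO = 8    # number of +1/-1 entries per random index vector (assumed value)


def index_vector(rng: random.Random) -> dict[int, int]:
    """Sparse ternary random vector: NONZERO random positions set to +1 or -1."""
    return {pos: rng.choice((1, -1)) for pos in rng.sample(range(DIM), NONZERO)}


def train(docs: list[list[str]], window: int = 2, seed: int = 0) -> dict[str, list[float]]:
    """Build each word's context vector by summing its neighbours' index vectors."""
    rng = random.Random(seed)
    index: dict[str, dict[int, int]] = {}
    context: dict[str, list[float]] = defaultdict(lambda: [0.0] * DIM)
    for tokens in docs:
        for tok in tokens:
            if tok not in index:
                index[tok] = index_vector(rng)
        for i, tok in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    for pos, sign in index[tokens[j]].items():
                        context[tok][pos] += sign
    return dict(context)


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norms if norms else 0.0


corpus = ["the cat chased the mouse", "the dog chased the mouse",
          "the stock market fell sharply", "investors sold the stock"]
vectors = train([doc.split() for doc in corpus])
```

Here "cat" and "dog" occur in identical contexts, so their context vectors coincide, while "stock" shares only "the" with them and scores noticeably lower.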
The S-Space Package is a collection of algorithms for building Semantic Spaces as well as a highly scalable library for designing new distributional semantics algorithms. Distributional algorithms process text corpora and represent the semantics of words as high-dimensional feature vectors.
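As a minimal illustration of the distributional idea (not the S-Space Package's own Java API), each word below is represented by a sparse, high-dimensional feature vector counting the words that appear within a small window around it; words used in similar contexts get similar vectors. The window size and the function names are illustrative assumptions, and real systems usually reweight raw counts (for example with PMI) before comparing vectors.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(docs: list[list[str]], window: int = 2) -> dict[str, Counter]:
    """Map each word to a sparse feature vector {context word: co-occurrence count}."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for tokens in docs:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[feature] for feature, count in u.items())  # missing keys count as 0
    norms = math.sqrt(sum(c * c for c in u.values()) * sum(c * c for c in v.values()))
    return dot / norms if norms else 0.0


def nearest(word: str, vectors: dict[str, Counter], k: int = 3) -> list[str]:
    """Return the k words whose context vectors are most similar to `word`'s."""
    others = [w for w in vectors if w != word]
    return sorted(others, key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)[:k]


corpus = ["i drink hot coffee", "i drink hot tea", "i read a book"]
vectors = cooccurrence_vectors([doc.split() for doc in corpus])
print(nearest("coffee", vectors, k=1))  # prints ['tea']
```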
Version 0.1 is out! For more info, read the doctest examples in the QueryLSA and UpdateLSSP modules, then complete interfacedb.py to use it on your own corpus database. A Python Latent Semantic Analysis (LSA) module for making search queries in a database corpus. I wrote this Python module while working on a school project at INSA in Rouen, France. The module interacts with a database using custom DB queries located in the interfacedb class. When using it with another DB model, only thi
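The core of an LSA query can be sketched in plain Python: factor the term-document matrix with a truncated SVD, fold documents and the query into the reduced space (Sigma_k^-1 U_k^T x), and rank documents by cosine similarity there. This is an illustrative sketch, not the module's QueryLSA API; the toy matrix, the helper names, and the use of power iteration with deflation (instead of a numerical library's SVD routine) are assumptions made to keep the example dependency-free.

```python
import math
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def truncated_svd(a, k, iters=300, seed=0):
    """Top-k (sigma, u, v) triplets of A (terms x docs) via power iteration on A^T A.
    Assumes k does not exceed the rank of A."""
    rng = random.Random(seed)
    n_terms, n_docs = len(a), len(a[0])
    # Gram matrix B = A^T A (docs x docs); its eigenvalues are sigma^2
    b = [[sum(a[t][i] * a[t][j] for t in range(n_terms)) for j in range(n_docs)]
         for i in range(n_docs)]
    triplets = []
    for _ in range(k):
        v = normalize([rng.random() for _ in range(n_docs)])
        for _ in range(iters):
            v = normalize(matvec(b, v))
        lam = sum(x * y for x, y in zip(v, matvec(b, v)))  # Rayleigh quotient = sigma^2
        sigma = math.sqrt(lam)
        u = [x / sigma for x in matvec(a, v)]               # u = A v / sigma
        triplets.append((sigma, u, v))
        # Deflate B so the next pass converges to the next singular vector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n_docs)] for i in range(n_docs)]
    return triplets


def fold(vec, triplets):
    """Project a term vector into the latent space: Sigma_k^-1 U_k^T vec."""
    return [sum(ui * x for ui, x in zip(u, vec)) / sigma for sigma, u, _ in triplets]


def cosine(x, y):
    norms = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / norms if norms else 0.0


# Toy term-document matrix: rows = terms, columns = documents
terms = ["car", "automobile", "engine", "flower", "petal"]
A = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
svd = truncated_svd(A, k=2)
docs = [fold([row[j] for row in A], svd) for j in range(len(A[0]))]
query = fold([1, 0, 0, 0, 0], svd)  # the one-word query "car"
scores = sorted(((cosine(query, d), j) for j, d in enumerate(docs)), reverse=True)
```

Document 1 contains "automobile" but not "car", so its raw term-space cosine with the query is zero; in the two-dimensional latent space it scores as high as document 0 because both share "engine".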
Ephyra is a modular and extensible framework for open domain question answering (QA). The system retrieves accurate answers to natural language questions from the Web and other sources. The goal is to give researchers the opportunity to develop new QA techniques without worrying about the end-to-end system.
GATE excels at text analysis of all shapes and sizes. It supports diverse language processing tasks through components such as parsers, morphology, tagging, Information Retrieval tools, and Information Extraction components for various languages, among many others. It also provides support for measuring, evaluating, modeling, and persisting its data structures. It can analyze both text and speech. It has built-in support for machine learning and supports other machine-learning implementations via plugins.
OpenPipe is an open-source, scalable platform for manipulating a stream of documents. A pipeline is an ordered set of steps (operations) performed on a document to convert it from its raw form into something ready to be put into the index. The operations performed on documents include language detection, field manipulation, POS tagging, entity extraction, and submitting the document to a search engine.
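The pipeline idea, an ordered list of steps applied to a document until it is ready for indexing, can be sketched in a few lines of Python. This is not OpenPipe's API; the step names, the dictionary-based document, and the toy language detector and entity extractor are hypothetical stand-ins chosen only to show the control flow.

```python
from typing import Callable

Document = dict
Step = Callable[[Document], Document]


def run_pipeline(doc: Document, steps: list[Step]) -> Document:
    """Apply each step in order; every step receives the previous step's output."""
    for step in steps:
        doc = step(doc)
    return doc


def normalize_body(doc: Document) -> Document:
    # Field manipulation: collapse runs of whitespace in the body
    return {**doc, "body": " ".join(doc["body"].split())}


def detect_language(doc: Document) -> Document:
    # Toy heuristic (hypothetical): look for a few common English function words
    words = set(doc["body"].lower().split())
    return {**doc, "lang": "en" if words & {"the", "and", "to", "of"} else "unknown"}


def extract_entities(doc: Document) -> Document:
    # Toy heuristic (hypothetical): treat capitalized tokens as entity mentions
    tokens = (t.strip(".,;:!?") for t in doc["body"].split())
    return {**doc, "entities": [t for t in tokens if t[:1].isupper()]}


def make_submitter(index: list[Document]) -> Step:
    # Final step: "submit" the document to a stand-in index (a plain list here)
    def submit(doc: Document) -> Document:
        index.append(doc)
        return doc
    return submit


index: list[Document] = []
steps = [normalize_body, detect_language, extract_entities, make_submitter(index)]
result = run_pipeline({"id": 1, "body": "  Alice moved   to Paris and loved the city. "}, steps)
```

Because each step returns a new dictionary instead of mutating its input, steps can be reordered, replaced, or tested in isolation.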
SMILA is an extensible framework for building search solutions to access unstructured information in the enterprise. Besides providing essential infrastructure components and services, SMILA also delivers ready-to-use add-on components, such as connectors to the most relevant data sources. Using the framework as a basis lets developers concentrate on creating higher-value solutions, such as semantics-driven applications.
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day.
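To make "build and walk parse trees" concrete for search queries, here is a small hand-written recursive-descent parser in Python for a hypothetical boolean query grammar (implicit AND, explicit `AND`/`OR`/`NOT`, parentheses). It is not ANTLR output and not Twitter's grammar; with ANTLR, rules like these would be declared in a grammar file and the parser generated from it. The tree is built from nested tuples, and the hypothetical `matches` function walks it against a document's set of words.

```python
import re

# Hypothetical grammar:
#   or_expr  := and_expr ("OR" and_expr)*
#   and_expr := term (["AND"] term)*          (adjacent terms mean AND)
#   term     := "NOT" term | "(" or_expr ")" | WORD


class Parser:
    def __init__(self, text: str):
        self.tokens = re.findall(r"\(|\)|[^\s()]+", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        tree = self.or_expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def or_expr(self):
        node = self.and_expr()
        while self.peek() == "OR":
            self.take()
            node = ("OR", node, self.and_expr())
        return node

    def and_expr(self):
        node = self.term()
        while self.peek() not in (None, "OR", ")"):
            if self.peek() == "AND":
                self.take()  # explicit AND; otherwise AND is implicit
            node = ("AND", node, self.term())
        return node

    def term(self):
        tok = self.take()
        if tok is None:
            raise SyntaxError("unexpected end of query")
        if tok == "NOT":
            return ("NOT", self.term())
        if tok == "(":
            node = self.or_expr()
            if self.take() != ")":
                raise SyntaxError("missing ')'")
            return node
        if tok in ("AND", "OR", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return ("WORD", tok.lower())


def matches(tree, words: set[str]) -> bool:
    """Walk the parse tree and evaluate it against a document's word set."""
    kind = tree[0]
    if kind == "WORD":
        return tree[1] in words
    if kind == "NOT":
        return not matches(tree[1], words)
    if kind == "AND":
        return matches(tree[1], words) and matches(tree[2], words)
    return matches(tree[1], words) or matches(tree[2], words)


tree = Parser("lucene (search OR index) NOT solr").parse()
```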
Pattern is a web mining module for the Python programming language. It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks). The module is bundled with 30+ example scripts. http://www.clips.ua.ac.be/pages/pattern The project has moved to GitHub: http://github.com
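The tf-idf weighting and cosine-similarity metric listed above can be sketched in plain Python as follows. This is not Pattern's own API; the whitespace tokenizer and the idf = log(N / df) variant are assumptions chosen for brevity.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """tf = term count / document length; idf = log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norms = math.sqrt(sum(w * w for w in u.values()) * sum(w * w for w in v.values()))
    return dot / norms if norms else 0.0


corpus = ["the cat sat on the mat", "the dog sat on the log", "stocks fell on wall street"]
vecs = tfidf_vectors(corpus)
```

Terms that occur in every document, such as "on" here, get an idf of log(1) = 0 and contribute nothing to the similarity, so the first two sentences match through "the" and "sat" while the third scores zero against them.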
Greensearch - Enterprise search platform built in Java
Green Search is an open-source enterprise search platform built in Java. It uses LSI (Latent Semantic Indexing) and the SDD (Semi-Discrete Decomposition) algorithm to enable latent semantic search (concept search). The software can index and search 1,000,000 Japanese Wikipedia pages.