Semantic Vectors - Creating and Searching Semantic Vectors using Lucene

The Semantic Vectors package uses a Random Projection algorithm, a form of automatic semantic analysis. The package also supports Latent Semantic Analysis (LSA) and Reflective Random Indexing. LSA is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. The library is used for semantic analysis and text mining.
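As a rough illustration of the random projection idea, the sketch below gives each document a sparse random vector and builds each term's vector as the sum of the vectors of the documents it occurs in. It is a toy in plain Python, not the Semantic Vectors API: the function names, the dimension of 200, the 10 non-zero entries, and the sample documents are all assumptions chosen for the example.

```python
import math
import random
from collections import defaultdict


def random_index_vector(dim, nonzero, rng):
    """Return a sparse ternary vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def build_term_vectors(docs, dim=200, nonzero=10, seed=0):
    """Sum each document's random vector into every term the document contains."""
    rng = random.Random(seed)
    term_vectors = defaultdict(lambda: [0.0] * dim)
    for text in docs:
        doc_vec = random_index_vector(dim, nonzero, rng)
        for term in text.lower().split():
            term_vec = term_vectors[term]
            for i, value in enumerate(doc_vec):
                term_vec[i] += value
    return dict(term_vectors)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical four-document corpus, for illustration only.
docs = [
    "lucene builds a search index",
    "a search index stores terms",
    "cats chase mice",
    "mice fear cats",
]
vectors = build_term_vectors(docs)
print(cosine(vectors["cats"], vectors["mice"]))
print(cosine(vectors["lucene"], vectors["cats"]))
```

Terms that occur in the same documents end up with similar vectors, while terms from unrelated documents are nearly orthogonal, because independent sparse random vectors rarely overlap.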



http://code.google.com/p/semanticvectors/



Related Products

S-Space - A scalable software library for semantic spaces

The S-Space Package is a collection of algorithms for building Semantic Spaces as well as a highly scalable library for designing new distributional semantics algorithms. Distributional algorithms process text corpora and represent the semantics of words as high-dimensional feature vectors.


Ultraslicklsa - A Python Latent Semantic Analysis (LSA) module for making search queries.

Version 0.1 is out! For more information, read the doctest examples in the QueryLSA and UpdateLSSP modules, then complete interfacedb.py to use it on your own corpus database. A Python Latent Semantic Analysis (LSA) module for making search queries in a database corpus. I wrote this Python module while working on a school project at INSA in Rouen, France. The module interacts with a database using custom DB queries located in the interfacedb class. When using with another DB model only thi


Ephyra - Question Answering System

Ephyra is a modular and extensible framework for open domain question answering (QA). The system retrieves accurate answers to natural language questions from the Web and other sources. The goal is to give researchers the opportunity to develop new QA techniques without worrying about the end-to-end system.


Gate - General Architecture for Text Engineering

GATE excels at text analysis of all shapes and sizes. It supports diverse language processing tasks, with parsers, morphology, tagging, Information Retrieval tools, Information Extraction components for various languages, and many others. It also provides support for measuring, evaluating, modeling, and persisting the data structures involved. It can analyze text or speech. It has built-in support for machine learning and supports other machine learning implementations via plugins.


OpenPipe - Document Pipeline

OpenPipe is an open source, scalable platform for manipulating a stream of documents. A pipeline is an ordered set of steps (operations) performed on a document to convert it from its raw form into something ready to be put into the index. The operations performed on documents include language detection, field manipulation, POS tagging, entity extraction, and submitting the document to a search engine.


SMILA - Unified information access architecture

SMILA is an extensible framework for building search solutions to access unstructured information in the enterprise. Besides providing essential infrastructure components and services, SMILA also delivers ready-to-use add-on components, such as connectors to the most relevant data sources. Building on the framework lets developers concentrate on creating higher-value solutions, such as semantics-driven applications.


ANTLR - ANother Tool for Language Recognition

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day.


Pattern-for-python - Pattern is a web mining module for the Python programming language.

Pattern is a web mining module for the Python programming language. It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks). The module is bundled with 30+ example scripts. http://www.clips.ua.ac.be/pages/pattern The project has moved to GitHub: http://github.com


Greensearch - Enterprise search platform built in Java. It uses LSI (Latent Semantic Indexing) an

Green Search is an open source enterprise search platform built in Java. It uses the LSI (Latent Semantic Indexing) and SDD (Semi-Discrete Decomposition) algorithms to provide latent semantic (concept) search. The software can index and search 1,000,000 Japanese Wikipedia pages.


Language Detection - Language Detection Library in Java

This is a language detection library implemented in plain Java. It detects the language of a text using a naive Bayesian filter. It achieves over 99% precision for 53 languages.
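To show how a naive Bayesian filter can pick a language, here is a minimal Python sketch, not the library's Java API. It assumes character trigrams as features, add-one smoothing, and a uniform prior over languages; the class name, method names, and the two training sentences are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageDetector:
    """Toy detector: scores each language by summed log-likelihood of n-grams."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}    # language -> Counter of n-gram frequencies
        self.vocab = set()  # every n-gram seen during training

    def train(self, language, text):
        grams = char_ngrams(text, self.n)
        self.counts.setdefault(language, Counter()).update(grams)
        self.vocab.update(grams)

    def detect(self, text):
        grams = char_ngrams(text, self.n)
        best_language, best_score = None, -math.inf
        for language, counter in self.counts.items():
            # Add-one smoothing keeps unseen n-grams from zeroing the score.
            denominator = sum(counter.values()) + len(self.vocab)
            score = sum(math.log((counter[g] + 1) / denominator) for g in grams)
            if score > best_score:
                best_language, best_score = language, score
        return best_language


# Hypothetical training data: one sentence per language.
detector = NaiveBayesLanguageDetector()
detector.train("en", "the quick brown fox jumps over the lazy dog "
                     "and the cat is in the house with the children")
detector.train("fr", "le renard brun rapide saute par dessus le chien paresseux "
                     "et le chat est dans la maison avec les enfants")
print(detector.detect("the dog is in the house"))
print(detector.detect("le chat est dans la maison"))
```

A real detector would train on far larger profiles per language; with one sentence each, the sketch only separates inputs that share many trigrams with the training text.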

