TagSoup - HTML/XML parser for Haskell
TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification and can parse either well-formed XML or unstructured, malformed HTML from the web. It provides a basic data type for a list of unstructured tags, a parser that converts HTML into this tag type, and functions and combinators for finding and extracting information, which makes it well suited to screen-scraping.
http://community.haskell.org/~ndm/tagsoup/
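The "flat list of tags" idea described above can be illustrated with a minimal sketch. It is written in Python using only the standard library's html.parser module, so it is not TagSoup's Haskell API; the names Tag, parse_tags, link_targets and text_of are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser
from typing import List, NamedTuple, Optional, Tuple


class Tag(NamedTuple):
    """One item of the flat tag list (hypothetical type for this sketch)."""
    kind: str    # "open", "close", or "text"
    value: str   # tag name for open/close items, raw text for text items
    attrs: Tuple[Tuple[str, Optional[str]], ...] = ()


class _TagCollector(HTMLParser):
    """Records every parser event as a Tag instead of building a tree."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.tags: List[Tag] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(Tag("open", tag, tuple(attrs)))

    def handle_endtag(self, tag):
        self.tags.append(Tag("close", tag))

    def handle_data(self, data):
        self.tags.append(Tag("text", data))


def parse_tags(html: str) -> List[Tag]:
    """Convert HTML, well-formed or not, into a flat list of tags."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    return collector.tags


def link_targets(tags: List[Tag]) -> List[str]:
    """Extract the href value of every <a> open tag that has one."""
    targets = []
    for t in tags:
        if t.kind == "open" and t.value == "a":
            href = dict(t.attrs).get("href")
            if href is not None:
                targets.append(href)
    return targets


def text_of(tags: List[Tag]) -> str:
    """Concatenate all text items, ignoring the markup around them."""
    return "".join(t.value for t in tags if t.kind == "text")


if __name__ == "__main__":
    # Deliberately malformed input: no tag is ever closed.
    soup = parse_tags('<p>Hello <a href="/x">world<br><a href="/y">again')
    print(link_targets(soup))
    print(text_of(soup))
```

Because nothing here assumes that tags are balanced, unclosed elements in the input do not need special handling; extraction is just filtering over the list.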
Related Products
TagSoup - SAX-compliant parser in Java
TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.
Omssa-parser - java based parser for omssa omx files
OMSSA Parser. Publications: Barsnes et al: Proteomics 2009 Jul;9(14):3772-4. If you use OMSSA Parser as part of a paper, please include the reference above. Sea
Xmlcc - A platform independent object-oriented C++ library for generating, writing and parsing XML a
XMLCC is a C++ library for handling XML using design patterns, especially the Composite pattern. About: XMLCC allows for generating XML structures using a hierarchical object-oriented model that can easily be written to an XML file. Parsing is available through several parsers: a DOM-like parser that builds the complete object-oriented model, which can be searched for XML tags afterwards, or a SAX-like parser that can be specialized to an XML structure by implementing an API. Both parsers are char by ch
Yghtmlparser - Rapid Java HTML Parser Project
Introduction: This is a private project to research and develop a Java HTML parser. There are a number of open-source HTML parsers written in Java, but most of them cannot parse some web pages correctly because of the ambiguity of HTML syntax, and some of them are too heavy to use. Developing an HTML parser definitely differs from developing an XML parser because an HTML parser MUST resolve the ambiguous syntax by itself. For example, the 'BR' tag is usually used only as an open tag but i
Php-mime-mail-parser - PHP Mime Mail Parser
This project strives to create a fast and efficient PHP Mime Mail Parser Class using PHP's MailParse Extension.
Example Usage:
<?php
require_once('MimeMailParser.class.php');
$path = 'path/to/mail.txt';
$Parser = new MimeMailParser();
$Parser->setPath($path);
$to = $Parser->getHeader('to');
$from = $Parser->getHeader('from');
$subject = $Parser->getHeader('subject');
$text = $Parser->getMessageBody('text');
$html = $Parser->getMessageBody('html');
$attachments = $Parser->getAttachments();
?>
There are three i
Polparser - Lightweight generic text parser in Obj-C
PolParser is a lightweight generic text parser in Obj-C for Mac OS X Leopard and later. PolParser creates a tree by parsing the input text. It currently supports various text formats like XML, RSS, Atom, HTML, Apple Property Lists, CSV... as well as source code for C-style languages like C, C++, Obj-C..., and it's quite easy to add support for new text formats or languages. The fact that PolParser generates a tree makes it much easier to use than NSScanner & friends for complex parsing and ba
Jssaxparser - A SAX 2 parser written in Javascript
JavaScript SAX 2 Parser: a lightweight JavaScript SAX 2 parser which reads XML text and triggers standardized SAX 2 events. Introduction: The parser is able to read XML and its associated DTD. It fires the events of contentHandler, errorHandler, dtdHandler, entityResolver, declarationHandler and lexicalHandler, conforming to the specification at http://www.saxproject.org/ . How to use it: Import the library: <script type="text/javascript" src="../jssaxparser/sax.js"></script><script type="text/javascript" sr
Arabica
Arabica is an XML and HTML processing toolkit, providing SAX, DOM, XPath, and partial XSLT implementations, written in Standard C++.
Amberalert404 - Amber Alert 404 page code. Make your error message useful too.
Amber Alert 404 page. About: This project aims to create a simple error page for Apache+PHP for Amber Alert (http://www.amberalertnederland.nl/). Based on an idea from: http://tech.bluesmoon.info/2010/02/missing-kids-on-your-404-page.html This idea seemed so worthwhile to me that I made a PHP variant for Amber Alert. Questions and comments can be sent to adrianus<at>warmenhoven<dot>nl. How to use: Usage is very simple: copy the code of 404.php into an editor,
Simple-lexing-parsers-4-scala - Scala's RegexParsers augmented with a lightweight built-in lexic
Simple Lexing Parsers for Scala: This software defines a lightweight (30Kb all-inclusive jar file) Scala trait SimpleLexingParsers that augments Scala's RegexParsers with a built-in lexical analyzer (or lexer). Parsers that extend SimpleLexingParsers have lexical analysis capabilities similar to those of Lex, Flex, and JLex. SimpleLexingParsers obviates the need for a bunch of utility classes (StandardTokenParsers, JavaTokenParsers, RegexParsers, etc.). This package includes the traits slp.Standar