TagSoup - HTML/XML parser for Haskell

TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification, and can be used to parse either well-formed XML, or unstructured and malformed HTML from the web. The library also provides useful functions to extract information from an HTML document, making it ideal for screen-scraping. The library provides a basic data type for a list of unstructured tags, a parser to convert HTML into this tag type, and useful functions and combinators for finding and extracting information.
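The three pieces named above are a flat tag type, a lenient parser, and extraction helpers. They can be illustrated with the small, self-contained Haskell sketch below. It is a simplified model written for illustration only, not TagSoup's implementation. `Tag` here is a cut-down hypothetical type, and `parseTagsToy`, `innerTextToy`, `linksToy` and their helpers are hypothetical names. The sketch ignores entities, comments, `<script>` contents and the HTML 5 tokenizer rules the real library follows. In TagSoup itself, the corresponding roles are played by the `Tag` type, `parseTags`, and helpers such as `innerText` and `fromAttrib` in `Text.HTML.TagSoup`.

```haskell
import Data.Char (isAlphaNum, isSpace, toLower)

-- Hypothetical, simplified tag type: a flat list of tags, with no tree
-- and no requirement that open and close tags match.
data Tag
  = TagOpen String [(String, String)] -- lower-cased name, attributes
  | TagClose String                   -- lower-cased name
  | TagText String                    -- raw text (entities are NOT decoded)
  deriving (Eq, Show)

isNameChar :: Char -> Bool
isNameChar c = isAlphaNum c || c `elem` "-_:"

lower :: String -> String
lower = map toLower

-- Lenient tokenizer: it never fails. Anything that does not look like
-- a tag (for example a stray '<') is kept as text.
parseTagsToy :: String -> [Tag]
parseTagsToy = mergeText . go
  where
    go [] = []
    -- Close tag: keep the name, skip anything else up to '>'.
    go ('<' : '/' : rest)
      | (name@(_ : _), rest') <- span isNameChar rest =
          TagClose (lower name) : go (drop 1 (dropWhile (/= '>') rest'))
    -- Open tag: name followed by attributes.
    go ('<' : rest)
      | (name@(_ : _), rest') <- span isNameChar rest =
          let (attrs, rest'') = parseAttrs rest'
           in TagOpen (lower name) attrs : go rest''
    -- Text: this character plus everything up to the next '<'.
    go (c : rest) =
      let (txt, rest') = break (== '<') rest
       in TagText (c : txt) : go rest'

-- Read attributes up to '>'. A '/' (as in <br/>) is skipped, and a
-- missing '>' simply ends the tag at end of input.
parseAttrs :: String -> ([(String, String)], String)
parseAttrs s = case dropWhile isSpace s of
  [] -> ([], [])
  '>' : rest -> ([], rest)
  '/' : rest -> parseAttrs rest
  s' ->
    let (key, afterKey) = span isNameChar s'
     in if null key
          then parseAttrs (drop 1 s') -- skip one junk character
          else
            let (val, rest) = attrValue (dropWhile isSpace afterKey)
                (more, rest') = parseAttrs rest
             in ((lower key, val) : more, rest')

-- Value after '=': double-quoted, single-quoted or bare.
-- An attribute with no '=' gets the empty string.
attrValue :: String -> (String, String)
attrValue ('=' : s) = case dropWhile isSpace s of
  '"' : r -> let (v, r') = break (== '"') r in (v, drop 1 r')
  '\'' : r -> let (v, r') = break (== '\'') r in (v, drop 1 r')
  r -> span (\c -> not (isSpace c) && c /= '>') r
attrValue s = ("", s)

-- Join adjacent text tags produced when a stray '<' splits a text run.
mergeText :: [Tag] -> [Tag]
mergeText (TagText a : TagText b : ts) = mergeText (TagText (a ++ b) : ts)
mergeText (t : ts) = t : mergeText ts
mergeText [] = []

-- Extraction helpers in screen-scraping style.
innerTextToy :: [Tag] -> String
innerTextToy ts = concat [t | TagText t <- ts]

linksToy :: [Tag] -> [String]
linksToy ts = [v | TagOpen "a" attrs <- ts, Just v <- [lookup "href" attrs]]
```

Loaded into GHCi, `linksToy (parseTagsToy "<a href=x>")` evaluates to `["x"]`. Because the output is a flat list rather than a tree, an unclosed `<br>` or a missing `</a>` needs no special handling. That design choice is what lets a tag-soup style parser accept malformed HTML without failing.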



http://community.haskell.org/~ndm/tagsoup/

Related Products

TagSoup - SAX-compliant parser in Java

TagSoup, a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.

Omssa-parser - Java-based parser for OMSSA OMX files

Publication: Barsnes et al., Proteomics 2009 Jul;9(14):3772-4. If you use OMSSA Parser as part of a paper, please include the reference above. Sea

Xmlcc - A platform-independent object-oriented C++ library for generating, writing and parsing XML a

XMLCC is a C++ library for handling XML using design patterns, especially the Composite pattern. XMLCC allows XML structures to be generated with a hierarchical object-oriented model that can easily be written to an XML file. Parsing is available through several parsers: a DOM-like parser that builds the complete object-oriented model, which can then be searched for XML tags, or a SAX-like parser that can be specialized to an XML structure by implementing an API. Both parsers are char by ch

Yghtmlparser - Rapid Java HTML Parser Project

This is a private project to research and develop a Java HTML parser. There are a number of open-source HTML parsers written in Java, but most of them cannot parse some web pages correctly because of the ambiguity of HTML syntax, and some are too heavy to use. Developing an HTML parser is definitely different from developing an XML parser, because an HTML parser MUST resolve the ambiguous syntax by itself. For example, the 'BR' tag is usually used only as an open tag but i

Php-mime-mail-parser - PHP Mime Mail Parser

This project strives to create a fast and efficient PHP MIME Mail Parser class using PHP's MailParse extension.

Example usage:

    <?php
    require_once('MimeMailParser.class.php');

    // Load a raw email message from a file
    $path = 'path/to/mail.txt';
    $Parser = new MimeMailParser();
    $Parser->setPath($path);

    // Read individual headers
    $to = $Parser->getHeader('to');
    $from = $Parser->getHeader('from');
    $subject = $Parser->getHeader('subject');

    // Read the plain-text and HTML bodies, then the attachments
    $text = $Parser->getMessageBody('text');
    $html = $Parser->getMessageBody('html');
    $attachments = $Parser->getAttachments();
    ?>

There are three i

Polparser - Lightweight generic text parser in Obj-C

PolParser is a lightweight, generic text parser in Obj-C for Mac OS X Leopard and later. PolParser builds a tree from the parsed input text. It currently supports various text formats, such as XML, RSS, Atom, HTML, Apple Property Lists and CSV, as well as source code for C-style languages such as C, C++, Obj-C and others, and it is quite easy to add support for new text formats or languages. Because PolParser generates a tree, it is much easier to use than NSScanner and friends for complex parsing and ba

Jssaxparser - A SAX 2 parser written in JavaScript

A lightweight JavaScript SAX 2 parser that reads XML text and fires standardized SAX 2 events. The parser can read XML and its associated DTD. It fires events for the contentHandler, errorHandler, dtdHandler, entityResolver, declarationHandler and lexicalHandler interfaces, conforming to the specification at http://www.saxproject.org/. To use it, import the library:

    <script type="text/javascript" src="../jssaxparser/sax.js"></script>
    <script type="text/javascript" sr

Arabica

Arabica is an XML and HTML processing toolkit, providing SAX, DOM, XPath, and partial XSLT implementations, written in Standard C++.

Amberalert404 - Amber Alert 404 page code. Make your error page useful too.

This project aims to create a simple error page for Apache+PHP for Amber Alert (http://www.amberalertnederland.nl/), based on an idea from http://tech.bluesmoon.info/2010/02/missing-kids-on-your-404-page.html. I found this idea so worthwhile that I made a PHP version for Amber Alert. Questions and comments can be sent to adrianus<at>warmenhoven<dot>nl. How to use: usage is very simple. Copy the code of 404.php into an editor,

Simple-lexing-parsers-4-scala - Scala's RegexParsers augmented with a lightweight built-in lexic

This software defines a lightweight (30 KB all-inclusive JAR file) Scala trait, SimpleLexingParsers, that augments Scala's RegexParsers with a built-in lexical analyzer (lexer). Parsers that extend SimpleLexingParsers have lexical-analysis capabilities similar to those of Lex, Flex and JLex. SimpleLexingParsers removes the need for a number of utility classes (StandardTokenParsers, JavaTokenParsers, RegexParsers, etc.). This package includes the traits slp.Standar
