Module Ez_html.XmlParser

type t

Abstract type for an Xml parser.

type source =
| SFile of string
| SChannel of Stdlib.in_channel
| SString of string
| SLexbuf of Stdlib.Lexing.lexbuf

Several kinds of resources can contain Xml documents.

val make : unit -> t

This function returns a new parser with default options.

val prove : t -> bool -> unit

This function enables or disables automatic DTD proving (validating the document against its DTD) by the parser. Note that Xml documents with no reference to a DTD are never proved when parsed, but you can prove them later using the Dtd module. By default, prove is true.

val resolve : t -> (string -> Dtd.checked) -> unit

When parsing an Xml document from a file using the Xml.parse_file function, the DTD file, if the Xml document declares one, has to be in the same directory as the xml file. When using other parsing functions, such as parsing from a string or from a channel, the parser always raises Xml.File_not_found if a DTD file is needed and prove is enabled. To enable loading of the DTD file, the user has to configure the Xml parser with a resolve function, which takes the DTD filename as its argument and returns a checked DTD. The user can then implement any kind of DTD loading strategy, and can use the Dtd module functions to parse and check the DTD file. By default, the resolve function raises Xml.File_not_found.
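A resolver could be installed along the following lines. This is a minimal sketch that assumes the ez_html library is linked; `install_resolver`, `dtd_dir` and `load_dtd` are hypothetical names introduced for illustration. `load_dtd` stands for whatever function of type string -> Dtd.checked the caller builds from the Dtd module, and is not provided by this module.

```ocaml
module P = Ez_html.XmlParser

(* [load_dtd] is supplied by the caller (hypothetical) and must return a
   checked DTD, i.e. have type string -> Dtd.checked. *)
let install_resolver p ~dtd_dir ~load_dtd =
  (* Look every DTD filename up in one fixed directory. *)
  P.resolve p (fun filename -> load_dtd (Filename.concat dtd_dir filename))
```

With such a resolver in place, documents parsed from a string or a channel can have their DTD loaded from `dtd_dir` instead of failing with Xml.File_not_found.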

val check_eof : t -> bool -> unit

When an Xml document is parsed, the parser checks that the end of the document has been reached, so for example parsing "<A/><B/>" fails instead of returning only the A element. You can turn off this check by setting check_eof to false. By default, check_eof is true.
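The effect of the flag can be observed with a small helper. This is a sketch that assumes the ez_html library is linked; `accepts` is a hypothetical helper name, and it treats any exception raised by the parser as a rejection.

```ocaml
module P = Ez_html.XmlParser

(* Returns true when the parser accepts [s]; any exception counts as rejection. *)
let accepts ~check_eof s =
  let p = P.make () in
  P.prove p false;            (* no DTD involved in this example *)
  P.check_eof p check_eof;
  match P.parse p (P.SString s) with
  | _ -> true
  | exception _ -> false

let strict = accepts ~check_eof:true "<A/><B/>"
let lenient = accepts ~check_eof:false "<A/><B/>"
```

Going by the description above, `strict` should be false (the trailing B element is rejected) and `lenient` should be true (only the A element is returned).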

val parse : t -> source -> Xml_types.xml

Once the parser is configured, you can run it on any kind of xml document source to parse its contents into an Xml data structure.
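A typical configure-then-parse sequence might look as follows. This is a sketch that assumes the ez_html library is linked; the helper names (`parser_without_dtd`, `from_string`, `from_file`, `from_channel`) are hypothetical, and only the functions and constructors documented in this module are called.

```ocaml
module P = Ez_html.XmlParser

(* A fresh parser with DTD proving switched off. *)
let parser_without_dtd () =
  let p = P.make () in
  P.prove p false;
  p

(* One wrapper per kind of source. *)
let from_string s = P.parse (parser_without_dtd ()) (P.SString s)
let from_file path = P.parse (parser_without_dtd ()) (P.SFile path)
let from_channel ic = P.parse (parser_without_dtd ()) (P.SChannel ic)
```

Each wrapper builds its own parser so that options set for one call cannot leak into another. Reusing a single configured parser for several documents is equally possible.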

val concat_pcdata : t -> bool -> unit

When several PCData elements are separated by a \n (or \r\n), you can either keep them as distinct PCData elements or merge them into one PCData with \n as the separator. The default behavior is to concatenate the PCData, but this can be changed for a given parser with this flag.
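The difference between the two forms can be modelled without the library. The sketch below uses a local stand-in type `node` and a hypothetical function `merge_pcdata`; it illustrates the merged result and is not the parser's own implementation.

```ocaml
(* Local stand-in for a parsed tree; not the library's own type. *)
type node =
  | Element of string * node list
  | PCData of string

(* Merge runs of adjacent PCData nodes, joining them with "\n". *)
let rec merge_pcdata (children : node list) : node list =
  match children with
  | PCData a :: PCData b :: rest ->
    merge_pcdata (PCData (a ^ "\n" ^ b) :: rest)
  | Element (tag, kids) :: rest ->
    Element (tag, merge_pcdata kids) :: merge_pcdata rest
  | x :: rest -> x :: merge_pcdata rest
  | [] -> []

(* Two lines of text kept as distinct PCData nodes (the split form). *)
let split_form = [ PCData "first line"; PCData "second line" ]

(* The same text as a single PCData node (the merged form). *)
let merged_form = merge_pcdata split_form
```

Here `merged_form` holds one PCData node whose text contains both lines separated by \n, which corresponds to the default behavior described above.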