A SAX based parser.
#include "ACEXML/parser/parser/Parser.h"
ACEXML_Parser::ACEXML_Parser (void)
ACEXML_Parser::~ACEXML_Parser (void) [virtual]
int ACEXML_Parser::check_for_PE_reference (void) [private]
Check for a parameter entity reference. This is used to check for the occurrence of a PE reference within markupDecl. Additionally, this function consumes any leading or trailing whitespace around the PE reference.
- Returns
- Number of whitespace characters skipped.
Dispatch errors to ErrorHandler.
void ACEXML_Parser::fatal_error (const ACEXML_Char *msg) [private]
Dispatch fatal errors to ErrorHandler.
int ACEXML_Parser::get_quoted_string (ACEXML_Char *&str) [private]
Get a quoted string. Quoted strings are used to specify attribute values and this routine will replace character and entity references on-the-fly. Parameter entities are not allowed (or replaced) in this function. (But regular entities are.)
- Parameters
-
str | returns the un-quoted string. |
- Return values
-
0 | on success, -1 otherwise. |
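The on-the-fly replacement described above can be sketched as follows. This is a minimal standalone illustration, not the ACEXML implementation: the name `get_quoted_string_sketch` is hypothetical, it works on `std::string` rather than the parser's input stream, and it assumes only the five predefined entities are known.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch; not the ACEXML implementation.
// `in` must start with the opening quote. On success the un-quoted value,
// with predefined entities replaced, is stored in `str`.
// Returns 0 on success, -1 otherwise.
inline int get_quoted_string_sketch (const std::string &in, std::string &str)
{
  str.clear ();
  if (in.empty () || (in[0] != '"' && in[0] != '\''))
    return -1;
  const char quote = in[0];
  for (std::size_t i = 1; i < in.size (); ++i)
    {
      const char c = in[i];
      if (c == quote)
        return 0;                 // closing quote: done
      if (c == '<')
        return -1;                // '<' may not appear in an attribute value
      if (c != '&')
        {
          str += c;               // ordinary character, copied as-is
          continue;
        }
      const std::size_t semi = in.find (';', i);
      if (semi == std::string::npos)
        return -1;                // reference without a terminating ';'
      const std::string name = in.substr (i + 1, semi - i - 1);
      if (name == "lt")        str += '<';
      else if (name == "gt")   str += '>';
      else if (name == "amp")  str += '&';
      else if (name == "apos") str += '\'';
      else if (name == "quot") str += '"';
      else
        return -1;                // entity unknown to this sketch
      i = semi;                   // continue after the ';'
    }
  return -1;                      // input ended before the closing quote
}
```

A real parser would also consult the entities declared in the DTD and handle character references; both are left out here to keep the sketch short.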
int ACEXML_Parser::getFeature (const ACEXML_Char *name) [virtual]
Look up the value of a feature. This method allows programmers to check whether a specific feature has been activated in the parser.
Implements ACEXML_XMLReader.
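As a rough illustration of name-based feature lookup, the sketch below maps three standard SAX2 feature names onto integer flags. The class `FeatureFlagsSketch`, its default values, and the use of -1 for unrecognized names are assumptions made for this example; they do not describe how ACEXML_Parser stores or reports features.

```cpp
#include <cstring>

// Hypothetical sketch; not the ACEXML implementation.
class FeatureFlagsSketch
{
private:
  typedef int FeatureFlagsSketch::*Flag;

  // Map a feature name to the member that stores it, or 0 if unknown.
  static Flag lookup (const char *name)
  {
    if (std::strcmp (name, "http://xml.org/sax/features/namespaces") == 0)
      return &FeatureFlagsSketch::namespaces_;
    if (std::strcmp (name, "http://xml.org/sax/features/namespace-prefixes") == 0)
      return &FeatureFlagsSketch::namespace_prefixes_;
    if (std::strcmp (name, "http://xml.org/sax/features/validation") == 0)
      return &FeatureFlagsSketch::validate_;
    return 0;
  }

  int namespaces_;
  int namespace_prefixes_;
  int validate_;

public:
  // Defaults are assumptions chosen for the example.
  FeatureFlagsSketch ()
    : namespaces_ (1), namespace_prefixes_ (0), validate_ (0) {}

  // Returns the flag value, or -1 if the name is not recognized.
  int getFeature (const char *name) const
  {
    const Flag flag = lookup (name);
    return flag ? this->*flag : -1;
  }

  // Returns 0 on success, -1 if the name is not recognized.
  int setFeature (const char *name, int boolean_value)
  {
    const Flag flag = lookup (name);
    if (!flag)
      return -1;
    this->*flag = boolean_value ? 1 : 0;
    return 0;
  }
};
```

Keeping the name-to-flag mapping in one place means getFeature and setFeature cannot disagree about which names are recognized.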
void * ACEXML_Parser::getProperty (const ACEXML_Char *name) [virtual]
Initialize the parser state.
- Return values
-
0 | if parser was initialized correctly else -1. |
Check if a character c is a whitespace.
- Return values
-
1 | if c is a valid white space character. 0 otherwise. |
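For reference, the XML 1.0 `S` production admits exactly four whitespace characters. A minimal sketch of such a check is shown below; the helper name `is_xml_whitespace` is hypothetical and is not a member of ACEXML_Parser.

```cpp
// Hypothetical helper; not the ACEXML implementation.
// XML 1.0 production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
inline int is_xml_whitespace (char32_t c)
{
  switch (c)
    {
    case 0x20: // space
    case 0x09: // horizontal tab
    case 0x0D: // carriage return
    case 0x0A: // line feed
      return 1;
    default:
      return 0; // includes vertical tab and form feed
    }
}
```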
Check if a character c is a BaseChar.
- Return values
-
1 | if c is a valid BaseChar character, 0 otherwise. |
Check if a character c is a valid Char.
- Return values
-
1 | if c is a valid character. 0 otherwise. |
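The `Char` production of XML 1.0 is a set of code point ranges, so the check reduces to a few comparisons. The sketch below uses a hypothetical helper name, `is_xml_char`, and operates on `char32_t` rather than `ACEXML_Char`.

```cpp
// Hypothetical helper; not the ACEXML implementation.
// XML 1.0 production [2]:
//   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
//          | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
inline int is_xml_char (char32_t c)
{
  return (c == 0x9 || c == 0xA || c == 0xD
          || (c >= 0x20 && c <= 0xD7FF)        // excludes the surrogate block
          || (c >= 0xE000 && c <= 0xFFFD)      // excludes #xFFFE and #xFFFF
          || (c >= 0x10000 && c <= 0x10FFFF)) ? 1 : 0;
}
```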
Check if a character c is a valid CharRef character.
- Return values
-
1 | if c is a valid character reference character, 0 otherwise. |
Check if a character c is a CombiningChar.
- Return values
-
1 | if c is a valid CombiningChar character, 0 otherwise. |
Check if a character c is a Digit.
- Return values
-
1 | if c is a valid Digit character, 0 otherwise. |
Check if a character c is an Extender.
- Return values
-
1 | if c is a valid Extender character, 0 otherwise. |
Check if a character c is an Ideographic.
- Return values
-
1 | if c is a valid Ideographic character, 0 otherwise. |
Check if a character c is a Letter.
- Return values
-
1 | if c is a valid Letter character, 0 otherwise. |
Check if a character is an acceptable NameChar.
- Return values
-
1 | if c is a valid NameChar character, 0 otherwise. |
Check if a character c is a Digit.
- Return values
-
1 | if c is a valid Digit character, 0 otherwise. |
Check if a character is a PubidChar.
- Return values
-
1 | if c is a valid PubidChar character, 0 otherwise. |
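The `PubidChar` production of XML 1.0 is ASCII-only, which makes it easy to illustrate. The helper name `is_pubid_char` below is hypothetical; the sketch takes a plain `char` rather than an `ACEXML_Char`.

```cpp
#include <cstring>

// Hypothetical helper; not the ACEXML implementation.
// XML 1.0 production [13]:
//   PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
inline int is_pubid_char (char c)
{
  if (c == 0x20 || c == 0x0D || c == 0x0A)
    return 1;
  if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
    return 1;
  // strchr also matches the terminating NUL, so reject it explicitly.
  return (c != '\0' && std::strchr ("-'()+,./:=?;!*#@$_%", c) != 0) ? 1 : 0;
}
```

Note that the tab character is whitespace in XML but is not a valid PubidChar.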
Very trivial, non-conformant normalization of a systemid.
void ACEXML_Parser::parse (const ACEXML_Char *systemId) [virtual]
Parse an XML document from a system identifier (URI).
Implements ACEXML_XMLReader.
int ACEXML_Parser::parse_attlist_decl (void) [protected]
Parse an "ATTLIST" decl. The first character this method expects is always the 'A' (the first char) in the word "ATTLIST".
- Return values
-
0 | on success, -1 otherwise. |
Parse an attribute name.
- Return values
-
str | String containing the value of the attribute name if successful. |
0 | otherwise. |
int ACEXML_Parser::parse_atttype (void) [protected]
Parse an AttType declaration.
  AttType        ::= StringType | TokenizedType | EnumeratedType
  StringType     ::= 'CDATA'
  TokenizedType  ::= 'ID'        [VC: ID] [VC: One ID per Element Type] [VC: ID Attribute Default]
                   | 'IDREF'     [VC: IDREF]
                   | 'IDREFS'    [VC: IDREF]
                   | 'ENTITY'    [VC: Entity Name]
                   | 'ENTITIES'  [VC: Entity Name]
                   | 'NMTOKEN'   [VC: Name Token]
                   | 'NMTOKENS'
  EnumeratedType ::= NotationType | Enumeration
  NotationType   ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
                     [VC: Notation Attributes] [VC: One Notation Per Element Type] [VC: No Notation on Empty Element]
  Enumeration    ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')' [VC: Enumeration]
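One way to read the grammar above is as a keyword table plus a special case for enumerations, which start with an open paren. The sketch below classifies a complete, already-tokenized keyword; the enum `AttTypeSketch`, the struct `AttTypeEntrySketch` and the function `classify_atttype` are hypothetical names for illustration, not ACEXML types.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sketch; not the ACEXML implementation.
enum AttTypeSketch
{
  ATT_CDATA, ATT_ID, ATT_IDREF, ATT_IDREFS, ATT_ENTITY, ATT_ENTITIES,
  ATT_NMTOKEN, ATT_NMTOKENS, ATT_NOTATION, ATT_ENUMERATION, ATT_INVALID
};

struct AttTypeEntrySketch
{
  const char *name;
  AttTypeSketch type;
};

// Classify one complete AttType token, e.g. "IDREFS" or "(a|b)".
inline AttTypeSketch classify_atttype (const char *token)
{
  static const AttTypeEntrySketch table[] = {
    { "CDATA", ATT_CDATA },       { "ID", ATT_ID },
    { "IDREF", ATT_IDREF },       { "IDREFS", ATT_IDREFS },
    { "ENTITY", ATT_ENTITY },     { "ENTITIES", ATT_ENTITIES },
    { "NMTOKEN", ATT_NMTOKEN },   { "NMTOKENS", ATT_NMTOKENS },
    { "NOTATION", ATT_NOTATION }
  };
  if (token[0] == '(')
    return ATT_ENUMERATION;       // Enumeration ::= '(' ... ')'
  for (std::size_t i = 0; i < sizeof (table) / sizeof (table[0]); ++i)
    if (std::strcmp (token, table[i].name) == 0)
      return table[i].type;
  return ATT_INVALID;
}
```

Comparing whole tokens avoids the prefix ambiguity between ID, IDREF and IDREFS that a character-at-a-time parser has to resolve explicitly.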
int ACEXML_Parser::parse_attvalue (ACEXML_Char *&str) [protected]
Parse an attribute value.
- Parameters
-
str | String containing the value of the attribute if successful. |
- Returns
- 0 if attribute value was read successfully, -1 otherwise.
int ACEXML_Parser::parse_cdata (void) [protected]
Parse a CDATA section. The first character should always be the first '[' in CDATA definition.
- Return values
-
int ACEXML_Parser::parse_char_reference (ACEXML_Char *buf, size_t &len) [protected]
Parse a character reference, i.e., a decimal reference of the form "&#N;" or a hexadecimal reference of the form "&#xN;". The first character encountered should be the '#' char.
- Parameters
-
buf | points to a character buffer for the result. |
len | In/out argument which initially specifies the size of the buffer and is later set to the number of characters in the reference. |
- Return values
-
0 | on success and -1 otherwise. |
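The decoding step can be sketched independently of the parser. The function below is a simplified stand-in: the name `parse_char_ref_sketch` is hypothetical, it reads from a NUL-terminated string instead of the input stream, and it returns the code point directly rather than filling a character buffer as `buf` and `len` do.

```cpp
#include <cstddef>

// Hypothetical sketch; not the ACEXML implementation.
// `in` must point at the '#', e.g. "#65;" or "#x41;".
// On success the referenced code point is stored in `code`.
// Returns 0 on success and -1 otherwise.
inline int parse_char_ref_sketch (const char *in, unsigned long &code)
{
  if (*in != '#')
    return -1;
  ++in;
  unsigned long base = 10;
  if (*in == 'x')
    {
      base = 16;
      ++in;
    }
  unsigned long value = 0;
  std::size_t digits = 0;
  for (; *in != ';'; ++in, ++digits)
    {
      unsigned long d;
      if (*in >= '0' && *in <= '9')
        d = static_cast<unsigned long> (*in - '0');
      else if (base == 16 && *in >= 'a' && *in <= 'f')
        d = static_cast<unsigned long> (*in - 'a' + 10);
      else if (base == 16 && *in >= 'A' && *in <= 'F')
        d = static_cast<unsigned long> (*in - 'A' + 10);
      else
        return -1;              // bad digit, or NUL before the ';'
      value = value * base + d;
      if (value > 0x10FFFF)
        return -1;              // outside the Unicode range
    }
  if (digits == 0)
    return -1;                  // "#;" or "#x;" has no digits
  code = value;
  return 0;
}
```

A conforming parser would additionally check that the resulting code point matches the `Char` production before accepting it.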
int ACEXML_Parser::parse_child (int skip_open_paren) [protected]
Parse a cp non-terminal. cp can either be a seq or a choice. This function calls itself recursively.
- Parameters
-
skip_open_paren | when non-zero, it indicates that the open paren of the seq or choice has already been removed from the input stream. |
- Return values
-
0 | on success, -1 otherwise. |
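The recursion can be illustrated with a small recognizer for the XML 1.0 `cp`, `seq` and `choice` productions. Everything in the `cp_sketch` namespace below is hypothetical: it reads from a NUL-terminated string, accepts only a simplified ASCII `Name`, and reports nothing to a DTD handler.

```cpp
#include <cctype>

// Hypothetical sketch; not the ACEXML implementation.
//   cp     ::= (Name | choice | seq) ('?' | '*' | '+')?
//   choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'
//   seq    ::= '(' S? cp ( S? ',' S? cp )* S? ')'
namespace cp_sketch
{
  inline void skip_ws (const char *&p)
  {
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
      ++p;
  }

  // Simplified ASCII-only Name.
  inline bool parse_name (const char *&p)
  {
    unsigned char c = static_cast<unsigned char> (*p);
    if (!(std::isalpha (c) || c == '_' || c == ':'))
      return false;
    for (++p; ; ++p)
      {
        c = static_cast<unsigned char> (*p);
        if (!(std::isalnum (c) || c == '_' || c == ':' || c == '-' || c == '.'))
          break;
      }
    return true;
  }

  // When skip_open_paren is non-zero, the '(' of the seq or choice has
  // already been consumed. Returns 0 on success, -1 otherwise.
  inline int parse_cp (const char *&p, int skip_open_paren)
  {
    if (skip_open_paren || *p == '(')
      {
        if (!skip_open_paren)
          ++p;
        char sep = 0;                 // ',' for a seq, '|' for a choice
        for (;;)
          {
            skip_ws (p);
            if (parse_cp (p, 0) != 0) // recurse for each member
              return -1;
            skip_ws (p);
            if (*p == ')')
              {
                ++p;
                break;
              }
            if (*p != ',' && *p != '|')
              return -1;
            if (sep == 0)
              sep = *p;
            else if (sep != *p)
              return -1;              // a group cannot mix ',' and '|'
            ++p;
          }
      }
    else if (!parse_name (p))
      return -1;
    if (*p == '?' || *p == '*' || *p == '+')
      ++p;                            // optional occurrence indicator
    return 0;
  }
}
```

The `skip_open_paren` flag lets a caller that has already read the '(' while deciding between "children" and "Mixed" hand the rest of the group to the same routine.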
int ACEXML_Parser::parse_children_definition (void) [protected]
Parse the "children" and "Mixed" non-terminals in contentspec.
The first character this function sees must be the first open paren '(' in children.
- Return values
-
0 | on success, -1 otherwise. |
int ACEXML_Parser::parse_comment (void) [protected]
Skip over a comment. The first character encountered should always be the first '-' in the comment prefix "<!--".
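A minimal sketch of comment skipping under the XML 1.0 rule that "--" may not occur inside a comment. The function `skip_comment_sketch` is a hypothetical name and works on a `std::string` with an explicit position instead of the parser's input stream.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch; not the ACEXML implementation.
// `pos` must index the first '-' of the "<!--" prefix. On success `pos`
// is advanced past the closing "-->".
// Returns 0 on success, -1 otherwise.
inline int skip_comment_sketch (const std::string &in, std::size_t &pos)
{
  if (pos > in.size () || in.compare (pos, 2, "--") != 0)
    return -1;
  // The first "--" after the prefix must be the start of "-->".
  const std::size_t end = in.find ("--", pos + 2);
  if (end == std::string::npos
      || end + 2 >= in.size ()
      || in[end + 2] != '>')
    return -1;
  pos = end + 3;
  return 0;
}
```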
int ACEXML_Parser::parse_conditional_section (void) [protected]
Parse a conditionalSect declaration.
Parse a content declaration.
int ACEXML_Parser::parse_defaultdecl (void) [protected]
Parse a DefaultDecl specification.
int ACEXML_Parser::parse_doctypedecl (void) [protected]
Parse the DOCTYPE declaration. The first character encountered should always be 'D' in the doctype prefix "<!DOCTYPE".
void ACEXML_Parser::parse_element (int is_root) [protected]
Parse an XML element. The first character encountered should be the first character of the element "Name".
- Parameters
-
is_root | If not 0, then we are expecting to see the "root" element now, and the next element's name needs to match the name defined in the DOCTYPE definition, i.e., this->doctype_. |
- Todo:
- Instead of simply checking for the root element based on the argument is_root, we should instead either pass in some sort of validator or allow the function to return the element name so it can be used in a validator.
int ACEXML_Parser::parse_element_decl (void) [protected]
Parse an "ELEMENT" decl. The first character this method expects is always the 'L' (the second char) in the word "ELEMENT".
- Return values
-
0 | on success, -1 otherwise. |
Parse the encoding name in an XML Prolog section.
- Parameters
-
str | String containing the encoding name if successful. |
- Returns
- 0 if the string was read successfully, -1 otherwise.
void ACEXML_Parser::parse_encoding_decl (void) [protected]
Parse an EncodingDecl declaration.
int ACEXML_Parser::parse_entity_decl (void) [protected]
Parse an "ENTITY" decl. The first character this method expects is always the 'N' (the second char) in the word "ENTITY".
- Return values
-
0 | on success, -1 otherwise. |
int ACEXML_Parser::parse_entity_reference (void) [protected]
int ACEXML_Parser::parse_entity_value (ACEXML_Char *&str) [protected]
int ACEXML_Parser::parse_external_dtd (void) [protected]
Parse an ExternalID or a reference to PUBLIC ExternalID. Possible cases are in the forms of:
  SYSTEM 'quoted string representing system resource'
  PUBLIC 'quoted name of public ID' 'quoted resource'
  PUBLIC 'quoted name we are referring to'
The first character this function sees must be either 'S' or 'P'. When the function finishes parsing, the input stream points at the first non-whitespace character.
- Parameters
-
publicId | returns the unquoted publicId read. If none is available, it will be reset to 0. |
systemId | returns the unquoted systemId read. If none is available, it will be reset to 0. |
- Return values
-
0 | on success, -1 otherwise. |
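The three forms listed above can be told apart with a keyword check followed by one or two quoted literals. The sketch below is illustrative only: the `extid_sketch` namespace and its functions are hypothetical, it uses `std::string` (empty meaning "none") where the documented function resets pointers to 0, and it is lenient about the whitespace that the XML grammar requires between tokens.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch; not the ACEXML implementation.
namespace extid_sketch
{
  inline void skip_ws (const std::string &s, std::size_t &i)
  {
    while (i < s.size ()
           && (s[i] == ' ' || s[i] == '\t' || s[i] == '\r' || s[i] == '\n'))
      ++i;
  }

  // Read a literal delimited by matching single or double quotes.
  inline bool quoted (const std::string &s, std::size_t &i, std::string &out)
  {
    if (i >= s.size () || (s[i] != '"' && s[i] != '\''))
      return false;
    const std::size_t end = s.find (s[i], i + 1);
    if (end == std::string::npos)
      return false;
    out = s.substr (i + 1, end - i - 1);
    i = end + 1;
    return true;
  }

  // Returns 0 on success, -1 otherwise.
  inline int parse_external_id (const std::string &s,
                                std::string &publicId,
                                std::string &systemId)
  {
    publicId.clear ();
    systemId.clear ();
    std::size_t i = 6;            // length of either keyword
    if (s.compare (0, 6, "SYSTEM") == 0)
      {
        skip_ws (s, i);
        return quoted (s, i, systemId) ? 0 : -1;
      }
    if (s.compare (0, 6, "PUBLIC") != 0)
      return -1;
    skip_ws (s, i);
    if (!quoted (s, i, publicId))
      return -1;
    skip_ws (s, i);
    // In the third form there is no system literal after the public ID.
    if (i < s.size () && (s[i] == '"' || s[i] == '\''))
      return quoted (s, i, systemId) ? 0 : -1;
    return 0;
  }
}
```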
int ACEXML_Parser::parse_external_subset (void) [protected]
Parse an external subset. This does the actual parsing of an external subset and is called by parse_external_dtd.
- See Also
- parse_external_dtd
int ACEXML_Parser::parse_ignoresect (void) [protected]
Parse an ignoreSect declaration.
int ACEXML_Parser::parse_includesect (void) [protected]
Parse an includeSect declaration.
int ACEXML_Parser::parse_internal_dtd (void) [protected]
Parse a "markupdecl" section. This includes both the "markupdecl" and "DeclSep" sections in the XML specification.
int ACEXML_Parser::parse_markup_decl (void) [protected]
Parse a markupDecl section.
Parse a name from the input CharStream. If ch != 0, then we have already consumed the first name character from the input CharStream; otherwise, parse_name will use this->get() to acquire the initial character.
- Returns
- A pointer to the string in the obstack, 0 if it's not a valid name.
Parse a NMTOKEN from the input stream.
- Returns
- A pointer to the string in the obstack, 0 if it's not a valid NMTOKEN.
int ACEXML_Parser::parse_notation_decl (void) [protected]
Parse a "NOTATION" decl. The first character this method expects is always the 'N' (the first char) in the word "NOTATION".
- Return values
-
0 | on success, -1 otherwise. |
int ACEXML_Parser::parse_PE_reference (void) [protected]
int ACEXML_Parser::parse_processing_instruction (void) [protected]
Parse a PI statement. The first character encountered should always be '?' in the PI prefix "<?".
- Return values
-
0 | on success, -1 otherwise. |
int ACEXML_Parser::parse_pubid_literal (ACEXML_Char *&str) [protected]
Parse a PubidLiteral.
- Parameters
-
str | String containing the PubidLiteral if successful. |
- Returns
- 0 if the string was read successfully, -1 otherwise.
ACEXML_Char * ACEXML_Parser::parse_reference_name (void) [protected]
Parse a reference name, i.e., foo in "&foo;" or "%foo;". The first character encountered should be the character following '&' or '%'. Effectively the same as parse_name, but we don't use the parser's obstack. Caller is responsible for deleting the memory.
- See Also
- parse_name
- Returns
- A pointer to the name of the reference, 0 otherwise.
Parse an SDDecl string.
- Parameters
-
str | String containing the standalone declaration value if successful. |
- Returns
- 0 if the string was read successfully, -1 otherwise.
int ACEXML_Parser::parse_system_literal (ACEXML_Char *&str) [protected]
Parse a SystemLiteral.
- Parameters
-
str | String containing the SystemLiteral if successful. |
- Returns
- 0 if the string was read successfully, -1 otherwise.
int ACEXML_Parser::parse_text_decl (void) [protected]
Parse a TextDecl declaration.
int ACEXML_Parser::parse_tokenized_type (void) [protected]
Parse a tokenized type attribute.
- Returns
- 0 if attribute type was read successfully, -1 otherwise.
Parse the version string in an XML Prolog section.
- Parameters
-
str | String containing the version number if successful. |
- Returns
- 0 if the string was read successfully, -1 otherwise.
void ACEXML_Parser::parse_version_info (void) [protected]
Parse a VersionInfo declaration.
int ACEXML_Parser::parse_version_num (ACEXML_Char *&str) [protected]
Parse the version number in a VersionInfo declaration.
void ACEXML_Parser::parse_xml_decl (void) [protected]
Parse an XMLDecl declaration.
void ACEXML_Parser::parse_xml_prolog (void) [protected]
size_t ACEXML_Parser::pop_context (int GE_ref) [private]
Pop the top element in the stack and replace current context with that.
Dispatch prefix mapping calls to the ContentHandler.
- Parameters
-
prefix | Namespace prefix |
uri | Namespace URI |
name | Local name |
start | 1 => startPrefixMapping, 0 => endPrefixMapping |
Push the current context on to the stack.
void ACEXML_Parser::reset (void) [private]
Allow an application to register a content event handler.
Implements ACEXML_XMLReader.
Allow an application to register a DTD event handler.
Implements ACEXML_XMLReader.
Allow an application to register an entity resolver.
Implements ACEXML_XMLReader.
Allow an application to register an error event handler.
Implements ACEXML_XMLReader.
void ACEXML_Parser::setFeature (const ACEXML_Char *name, int boolean_value) [virtual]
void ACEXML_Parser::setProperty (const ACEXML_Char *name, void *value) [virtual]
Skip an equal sign.
- Return values
-
0 | when succeeds, -1 if no equal sign is found. |
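A small sketch of the idea, following the XML 1.0 `Eq ::= S? '=' S?` production. The name `skip_equal_sketch` is hypothetical, and whether the real method also consumes the whitespace after the '=' is an assumption of this example.

```cpp
// Hypothetical sketch; not the ACEXML implementation.
// Skips optional whitespace, one '=', and optional whitespace.
// `p` is advanced only on success.
// Returns 0 on success, -1 if no equal sign is found.
inline int skip_equal_sketch (const char *&p)
{
  const char *q = p;
  while (*q == ' ' || *q == '\t' || *q == '\r' || *q == '\n')
    ++q;
  if (*q != '=')
    return -1;
  ++q;
  while (*q == ' ' || *q == '\t' || *q == '\r' || *q == '\n')
    ++q;
  p = q;
  return 0;
}
```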
Skip any whitespace until the first non-whitespace character is encountered.
- Returns
- The next non-whitespace character from the CharStream.
- See Also
- skip_whitespace_count
Skip any whitespace until the first non-whitespace character, which is not consumed. This method peeks into the input CharStream and is therefore more expensive than skip_whitespace.
- Parameters
-
peek | If non-null, peek points to an ACEXML_Char where skip_whitespace_count stores the first non-whitespace character it sees (the character is not removed from the stream). |
- Returns
- The number of whitespace characters consumed.
- See Also
- skip_whitespace
Create a new ACEXML_CharStream from systemId and publicId and replace the current input stream with the newly created stream.
Create a new ACEXML_InputSource from systemId and publicId and replace the current input source with the newly created InputSource.
Dispatch warnings to ErrorHandler.
Alternative obstack used to hold any strings when the original is in use.
Stack used to hold the Parser_Context.
Keeping track of the handlers. We do not manage the memory for handlers.
int ACEXML_Parser::external_dtd_ [private]
If set, the document has an external DTD subset.
int ACEXML_Parser::external_entity_ [private]
T => We are parsing an external entity value.
Set of external parsed general entities in the document.
Set of external parsed parameter entities in the document.
int ACEXML_Parser::external_subset_ [private]
T => We are parsing an external subset.
Set used to hold the general entity references that are active.
int ACEXML_Parser::has_pe_refs_ [private]
T => Internal DTD has parameter entity references.
int ACEXML_Parser::internal_dtd_ [private]
If set, the document has an internal DTD.
Set of internal parsed general entities in the document.
Set of internal parsed parameter entities in the document.
int ACEXML_Parser::namespace_prefixes_ [private]
If set, the parser should include namespace declarations in the list of attributes of an element.
int ACEXML_Parser::namespaces_ [private]
If set, the parser should allow access by namespace qualified names.
int ACEXML_Parser::nested_namespace_ [private]
T => We are processing a nested namespace.
Set of notations declared in the document.
Obstack used by the parser to hold all the strings parsed.
Set used to hold the parameter entity references that are active.
Set of predefined entities used by the parser.
State of the parser when it encounters a reference.
int ACEXML_Parser::simple_parsing_ [private]
Feature flag. If set, the parser should parse a document without a prolog.
int ACEXML_Parser::standalone_ [private]
If set, the document is a standalone XML document.
Set of unparsed entities in the document.
int ACEXML_Parser::validate_ [private]
If set, the parser should also validate.
Namespace stack used by the parser to implement support for Namespaces.