API reference
=============

Module-level functions
----------------------

.. function:: parse(xml: str) -> Document

   Parse an XML string and return a :class:`Document`.

   :param xml: A well-formed XML string.
   :raises XmlParseError: If the XML is syntactically malformed.
   :raises XmlWellFormednessError: If a well-formedness constraint is violated.

   .. code-block:: python

      from pyuppsala import parse

      doc = parse("<root>hello</root>")
      print(doc.document_element.text_content)  # "hello"

.. function:: parse_bytes(data: bytes) -> Document

   Parse XML from bytes with automatic encoding detection (UTF-8, UTF-16 LE/BE).

   :param data: Raw bytes of an XML document.
   :raises XmlParseError: If the XML is malformed.

   .. code-block:: python

      from pyuppsala import parse_bytes

      # UTF-8
      doc = parse_bytes(b"<root>hello</root>")

      # UTF-16 LE with BOM
      data = b"\xff\xfe<\x00r\x00o\x00o\x00t\x00/\x00>\x00"
      doc = parse_bytes(data)

Document
--------

.. class:: Document(xml: str)

   Parse an XML string into a DOM document.

   :param xml: A well-formed XML string.

   .. code-block:: python

      from pyuppsala import Document

      doc = Document("<catalog><book>XML Guide</book></catalog>")
      print(doc.document_element.tag.local_name)  # "catalog"

   .. staticmethod:: from_bytes(data: bytes) -> Document

      Parse XML from bytes with automatic encoding detection.

      .. code-block:: python

         doc = Document.from_bytes(b"<root/>")

   .. staticmethod:: empty() -> Document

      Create a new empty document with no document element.

      .. code-block:: python

         doc = Document.empty()
         root = doc.create_element("root")
         doc.append_child(doc.root, root)
         print(doc.to_xml())  # "<root/>"

   .. attribute:: root
      :type: Node

      The root node (the Document node itself). It is the parent of the document element and of any top-level processing instructions and comments.

   .. attribute:: document_element
      :type: Node | None

      The document element (top-level element), or ``None`` for empty documents.

   .. attribute:: input_text
      :type: str

      The original input text that was parsed to create this document.
      Returns an empty string for programmatically constructed documents.

      .. code-block:: python

         xml = "<root>hello</root>"
         doc = Document(xml)
         assert doc.input_text == xml

         empty = Document.empty()
         assert empty.input_text == ""

   .. method:: get_elements_by_tag_name(name: str) -> list[Node]

      Find all elements in the document with the given local tag name.

      .. code-block:: python

         doc = Document("<root><a/><b/><a/></root>")
         elements = doc.get_elements_by_tag_name("a")
         print(len(elements))  # 2

   .. method:: get_elements_by_tag_name_ns(namespace_uri: str, name: str) -> list[Node]

      Find all elements with the given namespace URI and local tag name.

      .. code-block:: python

         doc = Document('<root xmlns:x="urn:ex"><x:item/><x:item/></root>')
         items = doc.get_elements_by_tag_name_ns("urn:ex", "item")
         print(len(items))  # 2

   **Tree mutation**

   .. method:: create_element(local_name, namespace_uri=None, prefix=None) -> Node

      Create a new element node. The node is not yet attached to the tree; use :meth:`append_child`, :meth:`insert_before`, or :meth:`insert_after` to place it.

      .. code-block:: python

         doc = Document.empty()
         root = doc.create_element("root")
         doc.append_child(doc.root, root)

         # With namespace prefix
         child = doc.create_element("item", namespace_uri="urn:ex", prefix="x")
         doc.append_child(root, child)
         print(doc.to_xml())  # '<root><x:item xmlns:x="urn:ex"/></root>'

   .. method:: create_text(text: str) -> Node

      Create a new text node.

      .. code-block:: python

         doc = Document("<root/>")
         root = doc.document_element
         text = doc.create_text("hello world")
         doc.append_child(root, text)
         print(doc.to_xml())  # "<root>hello world</root>"

   .. method:: create_comment(text: str) -> Node

      Create a new comment node.

      .. code-block:: python

         doc = Document("<root/>")
         comment = doc.create_comment(" generated by pyuppsala ")
         doc.append_child(doc.document_element, comment)
         print(doc.to_xml())  # "<root><!-- generated by pyuppsala --></root>"

   .. method:: create_cdata(text: str) -> Node

      Create a new CDATA section node.

      .. code-block:: python

         doc = Document("<root/>")
         cdata = doc.create_cdata("function() { return 1 < 2; }")
         doc.append_child(doc.document_element, cdata)
         print(doc.to_xml())
         # <root><![CDATA[function() { return 1 < 2; }]]></root>

   .. method:: create_processing_instruction(target: str, data: str | None = None) -> Node

      Create a new processing instruction node.

      .. code-block:: python

         doc = Document("<root/>")
         pi = doc.create_processing_instruction("xml-stylesheet", 'type="text/xsl" href="style.xsl"')
         doc.insert_before(doc.root, pi, doc.document_element)
         print(doc.to_xml())
         # <?xml-stylesheet type="text/xsl" href="style.xsl"?><root/>

   .. method:: append_child(parent: Node, child: Node) -> None

      Append *child* as the last child of *parent*.

      .. code-block:: python

         doc = Document("<root><a/></root>")
         root = doc.document_element
         b = doc.create_element("b")
         doc.append_child(root, b)
         print(doc.to_xml())  # "<root><a/><b/></root>"

   .. method:: insert_before(parent: Node, new_child: Node, reference: Node) -> None

      Insert *new_child* before *reference* under *parent*.

      .. code-block:: python

         doc = Document("<root><b/></root>")
         root = doc.document_element
         a = doc.create_element("a")
         doc.insert_before(root, a, root.children[0])
         print(doc.to_xml())  # "<root><a/><b/></root>"

   .. method:: insert_after(parent: Node, new_child: Node, reference: Node) -> None

      Insert *new_child* after *reference* under *parent*.

      .. code-block:: python

         doc = Document("<root><a/></root>")
         root = doc.document_element
         b = doc.create_element("b")
         doc.insert_after(root, b, root.children[0])
         print(doc.to_xml())  # "<root><a/><b/></root>"

   .. method:: remove_child(parent: Node, child: Node) -> None

      Remove *child* from *parent*.

      .. code-block:: python

         doc = Document("<root><a/><b/></root>")
         root = doc.document_element
         a = root.children[0]
         doc.remove_child(root, a)
         print(doc.to_xml())  # "<root><b/></root>"

   .. method:: replace_child(parent: Node, new_child: Node, old_child: Node) -> None

      Replace *old_child* with *new_child* under *parent*.

      .. code-block:: python

         doc = Document("<root><old/></root>")
         root = doc.document_element
         new = doc.create_element("new")
         doc.replace_child(root, new, root.children[0])
         print(doc.to_xml())  # "<root><new/></root>"

   .. method:: detach(node: Node) -> None

      Detach *node* from its parent. The node remains valid and can be re-attached elsewhere.

      .. code-block:: python

         doc = Document("<root><a/><b/><c/></root>")
         root = doc.document_element
         b = root.children[1]
         doc.detach(b)
         print(len(root.children))  # 2

         # Re-attach at the end
         doc.append_child(root, b)
         print(doc.to_xml())  # "<root><a/><c/><b/></root>"

   **Serialization**

   .. method:: to_xml() -> str

      Serialize the document to a compact XML string.

      .. code-block:: python

         doc = Document("<root><a/> <b/></root>")
         print(doc.to_xml())  # "<root><a/> <b/></root>"

   .. method:: to_xml_with_options(indent=None, expand_empty_elements=False) -> str

      Serialize with formatting options.

      :param indent: Indentation string (e.g. ``"  "``), or ``None`` for compact output.
      :param expand_empty_elements: If ``True``, write ``<a></a>`` instead of ``<a/>``.

      .. code-block:: python

         doc = Document("<root><a/><b>text</b></root>")

         # Pretty-print with 2-space indent
         print(doc.to_xml_with_options(indent="  "))
         # <root>
         #   <a/>
         #   <b>text</b>
         # </root>

         # Expand empty elements (useful for HTML compatibility)
         print(doc.to_xml_with_options(expand_empty_elements=True))
         # "<root><a></a><b>text</b></root>"

   .. method:: write_to_file(path: str) -> None

      Write the serialized document to a file.

      .. code-block:: python

         doc = Document("<config><key>value</key></config>")
         doc.write_to_file("/tmp/config.xml")

   **XPath**

   .. method:: prepare_xpath() -> None

      Build the internal indices required for XPath evaluation. Call this once before using :class:`XPathEvaluator` on this document. If you modify the DOM afterwards, call it again.

      .. code-block:: python

         doc = Document('<root id="1"/>')
         doc.prepare_xpath()
         # Now XPath queries can access attribute nodes

Node
----

.. class:: Node

   A lightweight handle to a node in a :class:`Document`. Nodes do not own their data; the ``Document`` does. Do not use a ``Node`` after its owning ``Document`` has been garbage-collected.

   .. attribute:: kind
      :type: str

      The kind of this node. One of: ``"document"``, ``"element"``, ``"text"``, ``"comment"``, ``"processing_instruction"``, ``"cdata"``, ``"attribute"``.

      .. code-block:: python

         doc = Document("<root>text<!-- note --></root>")
         root = doc.document_element
         print(root.kind)              # "element"
         print(root.children[0].kind)  # "text"
         print(root.children[1].kind)  # "comment"

   .. attribute:: tag
      :type: QName | None

      The tag name for element nodes, or ``None`` for other kinds.

      .. code-block:: python

         doc = Document('<ns:root xmlns:ns="urn:ex"/>')
         root = doc.document_element
         print(root.tag.local_name)     # "root"
         print(root.tag.namespace_uri)  # "urn:ex"
         print(root.tag.prefix)         # "ns"
         print(root.tag.prefixed_name)  # "ns:root"

   .. attribute:: text
      :type: str | None

      The text content for text, comment, and CDATA nodes, or ``None`` for other kinds.

      .. code-block:: python

         doc = Document("<root>hello</root>")
         text_node = doc.document_element.children[0]
         print(text_node.text)              # "hello"
         print(doc.document_element.text)   # None (element, not text)

   .. attribute:: text_content
      :type: str

      Recursively collected text content of this node and all descendants.

      .. code-block:: python

         doc = Document("<p>Hello <b>world</b>!</p>")
         print(doc.document_element.text_content)  # "Hello world!"

   .. attribute:: element_text
      :type: str | None

      The text of the first Text or CDATA child, or ``None``. This is a fast way to get the text content of simple elements like ``<name>value</name>`` without recursing into descendants.

      .. code-block:: python

         doc = Document("<root><name>Alice</name><bio>A <b>bold</b> person</bio></root>")
         root = doc.document_element
         name = root.children[0]
         bio = root.children[1]
         print(name.element_text)  # "Alice"
         print(bio.element_text)   # "A " (only first text child, not recursive)
         print(bio.text_content)   # "A bold person" (recursive)

   .. attribute:: attributes
      :type: list[Attribute]

      The list of attributes for element nodes (empty list for non-elements).

      .. code-block:: python

         doc = Document('<root id="1" status="active"/>')
         for attr in doc.document_element.attributes:
             print(f"{attr.name}: {attr.value}")
         # id: 1
         # status: active

   .. attribute:: parent
      :type: Node | None

      The parent node, or ``None`` for the document root.

      .. code-block:: python

         doc = Document("<root><child/></root>")
         child = doc.document_element.children[0]
         print(child.parent.tag.local_name)  # "root"

   .. attribute:: children
      :type: list[Node]

      The child nodes, in document order.

      .. code-block:: python

         doc = Document("<root><a/><b/><c/></root>")
         for child in doc.document_element.children:
             print(child.tag.local_name)  # a, b, c

   .. attribute:: line
      :type: int

      The line number of this node in the original source (1-based).

   .. attribute:: column
      :type: int

      The column number of this node in the original source.

      .. code-block:: python

         xml = "<root>\n  <child/>\n</root>"
         doc = Document(xml)
         child = doc.document_element.children[0]
         print(f"line {child.line}, column {child.column}")  # line 2, column 3

   .. attribute:: source_range
      :type: tuple[int, int] | None

      The byte range ``(start, end)`` of this node in the original source. Returns ``None`` for programmatically created nodes.

      .. code-block:: python

         xml = "<root><child>text</child></root>"
         doc = Document(xml)
         child = doc.document_element.children[0]
         start, end = child.source_range
         print(xml[start:end])  # "<child>text</child>"

   .. attribute:: source
      :type: str | None

      The original source text of this node. Returns ``None`` for programmatically created nodes.

      .. code-block:: python

         xml = '<root><item id="1">hello</item></root>'
         doc = Document(xml)
         item = doc.document_element.children[0]
         print(item.source)  # '<item id="1">hello</item>'

         # Programmatically created nodes have no source
         new_elem = doc.create_element("new")
         print(new_elem.source)  # None

   .. method:: get_attribute(name: str, namespace_uri: str | None = None) -> str | None

      Get an attribute value by local name, optionally filtered by namespace.

      .. code-block:: python

         doc = Document('<item id="42" xml:lang="en"/>')
         item = doc.document_element
         print(item.get_attribute("id"))  # "42"

         # With namespace
         print(item.get_attribute("lang", namespace_uri="http://www.w3.org/XML/1998/namespace"))  # "en"

         # Missing attribute
         print(item.get_attribute("missing"))  # None

   .. method:: set_attribute(name, value, namespace_uri=None, prefix=None) -> str | None

      Set an attribute. Returns the previous value, or ``None`` if the attribute did not exist.

      .. code-block:: python

         doc = Document('<item id="1"/>')
         item = doc.document_element

         # Set a new attribute
         old = item.set_attribute("status", "active")
         print(old)  # None

         # Update an existing attribute
         old = item.set_attribute("id", "2")
         print(old)  # "1"
         print(doc.to_xml())  # '<item id="2" status="active"/>'

   .. method:: remove_attribute(name: str) -> str | None

      Remove an attribute by local name. Returns the old value, or ``None`` if the attribute did not exist.

      .. code-block:: python

         doc = Document('<item id="1" status="draft"/>')
         item = doc.document_element
         old = item.remove_attribute("status")
         print(old)  # "draft"
         print(doc.to_xml())  # '<item id="1"/>'

   .. method:: to_xml() -> str

      Serialize this node and its subtree to XML.

      .. code-block:: python

         doc = Document("<root><child>hello</child></root>")
         child = doc.document_element.children[0]
         print(child.to_xml())  # "<child>hello</child>"

   .. method:: to_xml_with_options(indent=None, expand_empty_elements=False) -> str

      Serialize this subtree with formatting options.

      .. code-block:: python

         doc = Document("<root><a><b/></a></root>")
         a = doc.document_element.children[0]
         print(a.to_xml_with_options(indent="  "))
         # <a>
         #   <b/>
         # </a>

   .. method:: get_elements_by_tag_name(name: str) -> list[Node]

      Find descendant elements by local tag name.

      .. code-block:: python

         doc = Document("<root><b/><a><b/></a></root>")
         root = doc.document_element
         bs = root.get_elements_by_tag_name("b")
         print(len(bs))  # 2 (finds nested elements too)

   .. method:: get_elements_by_tag_name_ns(namespace_uri: str, name: str) -> list[Node]

      Find descendant elements by namespace URI and local tag name.

      .. code-block:: python

         doc = Document('<root xmlns:x="urn:ex"><x:item/><item/><x:item/></root>')
         root = doc.document_element
         items = root.get_elements_by_tag_name_ns("urn:ex", "item")
         print(len(items))  # 2 (only the namespaced ones)

   .. method:: first_child_element_by_name_ns(namespace_uri: str, local_name: str) -> Node | None

      Find the first direct child element matching the given namespace URI and local name. Only searches direct children, not descendants.

      .. code-block:: python

         doc = Document("""\
         <root xmlns="urn:example">
           <name>Alice</name>
           <age>30</age>
           <name>Bob</name>
         </root>
         """)
         root = doc.document_element
         first_name = root.first_child_element_by_name_ns("urn:example", "name")
         print(first_name.element_text)  # "Alice"

         # Returns None if not found
         missing = root.first_child_element_by_name_ns("urn:example", "email")
         print(missing)  # None

   .. method:: child_elements_by_name_ns(namespace_uri: str, local_name: str) -> list[Node]

      Find all direct child elements matching the given namespace URI and local name. Only searches direct children, not descendants.

      .. code-block:: python

         doc = Document("""\
         <root xmlns="urn:example">
           <item>first</item>
           <other>skip</other>
           <item>second</item>
         </root>
         """)
         root = doc.document_element
         items = root.child_elements_by_name_ns("urn:example", "item")
         for item in items:
             print(item.element_text)
         # first
         # second

   .. method:: matches_name_ns(namespace_uri: str, local_name: str) -> bool

      Check whether this element matches the given namespace URI and local name. Returns ``False`` for non-element nodes.

      .. code-block:: python

         doc = Document('<root xmlns="urn:example">text</root>')
         root = doc.document_element
         print(root.matches_name_ns("urn:example", "root"))  # True
         print(root.matches_name_ns("urn:other", "root"))    # False

         # Text nodes always return False
         text_node = root.children[0]
         print(text_node.matches_name_ns("urn:example", "root"))  # False

   **Protocols**

   - ``len(node)`` returns the number of child nodes.
   - ``for child in node`` iterates over children.
   - ``node[i]`` returns the *i*-th child (supports negative indices).
   - ``bool(node)`` is always ``True``.
   - ``str(node)`` returns :meth:`to_xml`.
   - ``repr(node)`` returns a short description like ``Node()``.

   .. code-block:: python

      doc = Document("<root><a/><b/><c/></root>")
      root = doc.document_element
      print(len(root))                # 3
      print(root[0].tag.local_name)   # "a"
      print(root[-1].tag.local_name)  # "c"
      for child in root:
          print(child.tag.local_name, end=" ")  # a b c

QName
-----

.. class:: QName(local_name, namespace_uri=None, prefix=None)

   A qualified XML name.

   .. code-block:: python

      from pyuppsala import QName

      # Local name only
      q = QName("root")
      print(q.local_name)  # "root"

      # With namespace
      q = QName("item", namespace_uri="urn:example", prefix="ex")
      print(q.prefixed_name)  # "ex:item"
      print(q.namespace_uri)  # "urn:example"

   .. attribute:: local_name
      :type: str

   .. attribute:: namespace_uri
      :type: str | None

   .. attribute:: prefix
      :type: str | None

   .. attribute:: prefixed_name
      :type: str

      The prefixed form (e.g. ``"ns:item"``), or just the local name when there is no prefix.

   .. method:: matches(local_name: str, namespace_uri: str | None = None) -> bool

      Check whether this QName matches the given local name and optional namespace URI. Prefix is ignored.

      .. code-block:: python

         from pyuppsala import QName

         q = QName("item", namespace_uri="urn:example", prefix="ex")
         print(q.matches("item", namespace_uri="urn:example"))   # True
         print(q.matches("item", namespace_uri="urn:other"))     # False
         print(q.matches("item"))                                # False (namespace mismatch)
         print(q.matches("other", namespace_uri="urn:example"))  # False

         # QName without namespace
         q2 = QName("root")
         print(q2.matches("root"))                               # True
         print(q2.matches("root", namespace_uri="urn:example"))  # False

   Equality is by ``local_name`` and ``namespace_uri`` (prefix is ignored). QNames are hashable.

   .. code-block:: python

      from pyuppsala import QName

      # Same namespace, different prefix: equal
      a = QName("item", namespace_uri="urn:ex", prefix="a")
      b = QName("item", namespace_uri="urn:ex", prefix="b")
      print(a == b)  # True

      # Can be used in sets and as dict keys
      names = {a, b}
      print(len(names))  # 1

Attribute
---------

.. class:: Attribute

   An XML attribute.

   .. attribute:: name
      :type: QName

   .. attribute:: value
      :type: str

   .. code-block:: python

      doc = Document('<root id="1" status="active"/>')
      attrs = doc.document_element.attributes
      for attr in attrs:
          print(f"{attr.name.local_name}={attr.value}")
      # id=1
      # status=active

XPathEvaluator
--------------

.. class:: XPathEvaluator()

   XPath 1.0 expression evaluator.

   .. code-block:: python

      from pyuppsala import Document, XPathEvaluator

      doc = Document("<root><item>1</item><item>2</item><item>3</item></root>")
      doc.prepare_xpath()
      xpath = XPathEvaluator()

   .. method:: add_namespace(prefix: str, uri: str) -> None

      Register a namespace prefix for use in XPath expressions.

      .. code-block:: python

         xpath = XPathEvaluator()
         xpath.add_namespace("soap", "http://schemas.xmlsoap.org/soap/envelope/")
         xpath.add_namespace("m", "urn:example")

   .. method:: evaluate(doc, expr, context=None) -> list[Node] | bool | float | str

      Evaluate an XPath expression.
      The return type depends on the XPath result type:

      - **Node-set** -> ``list[Node]``
      - **Boolean** -> ``bool``
      - **Number** -> ``float``
      - **String** -> ``str``

      :param doc: The :class:`Document` to query (:meth:`~Document.prepare_xpath` must have been called on it).
      :param expr: An XPath 1.0 expression string.
      :param context: Optional context node. Defaults to the document root.
      :raises XPathError: If the expression is invalid.

      .. code-block:: python

         doc = Document("<root><item>A</item><item>B</item></root>")
         doc.prepare_xpath()
         xpath = XPathEvaluator()

         # Node-set
         nodes = xpath.evaluate(doc, "//item")
         print(len(nodes))  # 2

         # String
         text = xpath.evaluate(doc, "string(//item[1])")
         print(text)  # "A"

         # Number
         count = xpath.evaluate(doc, "count(//item)")
         print(count)  # 2.0

         # Boolean
         has_items = xpath.evaluate(doc, "boolean(//item)")
         print(has_items)  # True

         # With context node
         root = doc.document_element
         first_child = xpath.evaluate(doc, "string(item[1])", context=root)
         print(first_child)  # "A"

   .. method:: select(doc, expr, context=None) -> list[Node]

      Evaluate an XPath expression and return matching nodes. This is a convenience method equivalent to ``evaluate()`` when the result is a node-set.

      .. code-block:: python

         doc = Document('<root><a id="1"/><a id="2"/></root>')
         doc.prepare_xpath()
         xpath = XPathEvaluator()
         nodes = xpath.select(doc, "//a")
         for node in nodes:
             print(node.get_attribute("id"))  # "1", "2"

XsdValidator
------------

.. class:: XsdValidator(schema_xml: str)

   XSD schema validator. Supports XSD structures and datatypes, 44+ built-in types, facets, complex types, extensions, restrictions, list types, wildcards, substitution groups, identity constraints, and fixed-value constraints.

   :param schema_xml: An XSD schema as an XML string.

   .. code-block:: python

      from pyuppsala import XsdValidator

      # Minimal schema: a single <name> element holding a string
      schema = """\
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="name" type="xs:string"/>
      </xs:schema>
      """
      validator = XsdValidator(schema)

   .. staticmethod:: from_file(schema_xml: str, base_path: str) -> XsdValidator

      Create a validator that resolves ``xs:include``, ``xs:import``, and ``xs:redefine`` relative to *base_path*.

      .. code-block:: python

         import os

         schema_dir = "/path/to/schemas"
         with open(os.path.join(schema_dir, "main.xsd")) as f:
             schema_xml = f.read()
         validator = XsdValidator.from_file(schema_xml, schema_dir)

   .. method:: validate(doc: Document) -> list[ValidationError]

      Validate a parsed document. Returns a list of errors; an empty list means the document is valid.

      .. code-block:: python

         from pyuppsala import Document, XsdValidator

         # Minimal schema: a single <name> element holding a string
         schema = """\
         <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
           <xs:element name="name" type="xs:string"/>
         </xs:schema>
         """
         validator = XsdValidator(schema)
         doc = Document("<name>Alice</name>")
         errors = validator.validate(doc)
         print(len(errors))  # 0

   .. method:: validate_str(xml: str) -> list[ValidationError]

      Parse and validate an XML string in one step.

      .. code-block:: python

         errors = validator.validate_str("<name>Alice</name>")
         print(len(errors))  # 0

   .. method:: is_valid(doc: Document) -> bool

      Quick boolean check.

      .. code-block:: python

         doc = Document("<name>Alice</name>")
         print(validator.is_valid(doc))  # True

   .. method:: is_valid_str(xml: str) -> bool

      Quick boolean check from a string. Returns ``False`` for malformed XML instead of raising.

      .. code-block:: python

         print(validator.is_valid_str("<name>Alice</name>"))  # True
         print(validator.is_valid_str(""))                    # False
         print(validator.is_valid_str("<unclosed"))           # False (malformed XML)

   .. method:: … -> None

      Configure whether length facets on QName/NOTATION types are enforced. Enabled by default. See W3C Bug #4009.

ValidationError
---------------

.. class:: ValidationError

   A single XSD validation error.

   .. attribute:: message
      :type: str

   .. attribute:: line
      :type: int | None

   .. attribute:: column
      :type: int | None

   .. code-block:: python

      # Assumes `validator` was built from a schema that rejects this value
      errors = validator.validate_str("<age>-5</age>")
      for err in errors:
          print(f"Line {err.line}, Col {err.column}: {err.message}")
          print(repr(err))  # ValidationError('...', line=1, column=1)
          print(str(err))   # "1:1: ..."

XmlWriter
---------

.. class:: XmlWriter()

   An imperative XML builder for constructing XML fragments without a DOM.

   .. code-block:: python

      from pyuppsala import XmlWriter

      w = XmlWriter()
      w.write_declaration()
      w.start_element("root")
      w.text("hello")
      w.end_element("root")
      print(w.to_string())
      # the XML declaration followed by <root>hello</root>

   .. method:: write_declaration() -> None

      Write the default XML declaration.

   .. method:: write_declaration_full(version="1.0", encoding=None, standalone=None) -> None

      Write an XML declaration with custom parameters.

      .. code-block:: python

         w = XmlWriter()
         w.write_declaration_full("1.0", encoding="ISO-8859-1", standalone=True)
         # <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>

   .. method:: start_element(name, attrs=None) -> None

      Start an element. *attrs* is an optional list of ``(name, value)`` tuples.

      .. code-block:: python

         w = XmlWriter()
         w.start_element("div", [("class", "container"), ("id", "main")])
         w.text("content")
         w.end_element("div")
         print(w.to_string())  # '<div class="container" id="main">content</div>'

   .. method:: end_element(name: str) -> None

      Close the element *name*.

   .. method:: empty_element(name, attrs=None) -> None

      Write a self-closing element: ``<name/>``.

      .. code-block:: python

         w = XmlWriter()
         w.empty_element("br")
         w.empty_element("img", [("src", "photo.jpg"), ("alt", "Photo")])
         print(w.to_string())  # '<br/><img src="photo.jpg" alt="Photo"/>'

   .. method:: empty_element_expanded(name, attrs=None) -> None

      Write an expanded empty element: ``<name></name>``.

      .. code-block:: python

         w = XmlWriter()
         w.empty_element_expanded("script", [("src", "app.js")])
         print(w.to_string())  # '<script src="app.js"></script>'

   .. method:: text(content: str) -> None

      Write text content (auto-escaped).

      .. code-block:: python

         w = XmlWriter()
         w.start_element("p")
         w.text("Price: 5 < 10 & tax > 0")
         w.end_element("p")
         print(w.to_string())  # "<p>Price: 5 &lt; 10 &amp; tax &gt; 0</p>"

   .. method:: cdata(content: str) -> None

      Write a CDATA section.

      .. code-block:: python

         w = XmlWriter()
         w.start_element("script")
         w.cdata("if (a < b && c > d) { }")
         w.end_element("script")
         print(w.to_string())  # "<script><![CDATA[if (a < b && c > d) { }]]></script>"

   .. method:: comment(content: str) -> None

      Write a comment.

      .. code-block:: python

         w = XmlWriter()
         w.comment(" This is a comment ")
         print(w.to_string())  # "<!-- This is a comment -->"

   .. method:: processing_instruction(target, data=None) -> None

      Write a processing instruction.

      .. code-block:: python

         w = XmlWriter()
         w.processing_instruction("xml-stylesheet", 'type="text/xsl" href="style.xsl"')
         print(w.to_string())  # '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'

   .. method:: raw(xml: str) -> None

      Write raw XML content (not escaped). The caller is responsible for supplying well-formed markup.

      .. code-block:: python

         w = XmlWriter()
         w.start_element("root")
         w.raw("<b>escaped &amp; ready</b>")
         w.end_element("root")

   .. method:: to_string() -> str

      Return the accumulated XML as a string.

   .. method:: to_bytes() -> bytes

      Return the accumulated XML as bytes.

      .. code-block:: python

         w = XmlWriter()
         w.start_element("root")
         w.end_element("root")
         data = w.to_bytes()
         print(type(data))  # <class 'bytes'>

   **Protocols**

   - ``len(writer)`` returns the number of bytes written so far.
   - ``bool(writer)`` is ``True`` if any content has been written.
   - ``str(writer)`` returns :meth:`to_string`.

   .. code-block:: python

      w = XmlWriter()
      print(bool(w))  # False
      print(len(w))   # 0

      w.start_element("root")
      w.end_element("root")
      print(bool(w))  # True
      print(len(w))   # 13

XsdRegex
--------

.. class:: XsdRegex(pattern: str)

   XSD regular expression pattern matcher. XSD regexes are implicitly anchored: they must match the **entire** input string.

   Supported features: alternation (``|``), grouping, quantifiers (``*``, ``+``, ``?``, ``{n}``, ``{n,m}``), character classes with subtraction (``[a-z-[aeiou]]``), Unicode category escapes (``\p{Lu}``), Unicode block escapes (``\p{IsBasicLatin}``), and multi-character escapes (``\d``, ``\s``, ``\w``, ``\i``, ``\c``).

   :param pattern: An XSD regex pattern string.
   :raises ValueError: If the pattern is invalid.

   .. code-block:: python

      from pyuppsala import XsdRegex

      # Email-like pattern
      email = XsdRegex(r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}")
      print(email.is_match("user@example.com"))  # True
      print(email.is_match("not-an-email"))      # False

   .. method:: is_match(input: str) -> bool

      Test whether *input* fully matches the pattern.

      .. code-block:: python

         # US ZIP code
         zip_re = XsdRegex(r"[0-9]{5}(-[0-9]{4})?")
         print(zip_re.is_match("12345"))       # True
         print(zip_re.is_match("12345-6789"))  # True
         print(zip_re.is_match("abcde"))       # False

   .. attribute:: pattern
      :type: str

      The original pattern string.

      .. code-block:: python

         regex = XsdRegex(r"\d{3}-\d{4}")
         print(regex.pattern)  # "\\d{3}-\\d{4}"
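
The whole-string matching rule described for :class:`XsdRegex` can be illustrated without pyuppsala. The sketch below is only an analogy: it uses Python's standard ``re`` module, whose syntax differs from XSD regexes (for example, it has no character class subtraction and no ``\i`` or ``\c`` escapes), and ``xsd_like_match`` is a hypothetical helper name, not part of the library. It contrasts an anchored full match with an unanchored search on the same ZIP code pattern.

```python
import re


def xsd_like_match(pattern: str, text: str) -> bool:
    """Hypothetical helper: True only if the ENTIRE text matches the pattern.

    Uses the standard library's re.fullmatch as a stand-in for the
    implicit anchoring of XSD regexes; it does not call pyuppsala.
    """
    return re.fullmatch(pattern, text) is not None


zip_pattern = r"[0-9]{5}(-[0-9]{4})?"

# Whole-string matches succeed
print(xsd_like_match(zip_pattern, "12345"))       # True
print(xsd_like_match(zip_pattern, "12345-6789"))  # True

# Extra text around a valid ZIP code makes the anchored match fail
print(xsd_like_match(zip_pattern, "zip: 12345"))  # False

# An unanchored search would still find the ZIP code inside the string
print(re.search(zip_pattern, "zip: 12345") is not None)  # True
```

With implicit anchoring there is no need to wrap a pattern in ``^`` and ``$``; a pattern that should accept surrounding text has to say so explicitly, for example with a leading and trailing ``.*``.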