XML Tutorial for Beginners: Learn XML Step by Step
What Is XML?
XML (eXtensible Markup Language) is a text format for storing and exchanging structured data. Unlike HTML — which has fixed tags like <p>, <div>, <h1> for displaying web pages — XML lets you invent your own tags to describe what your data means:
<customer>
  <name>Alice Chen</name>
  <email>alice@example.com</email>
  <city>Paris</city>
</customer>
That's a complete XML document. Three rules and you've already understood the format:
- Every opening tag <name> needs a closing tag </name>
- Tags are nested but never overlap
- The whole document has one root element wrapping everything
Where XML Is Still Used in 2026
JSON has replaced XML in most modern web APIs, but XML is far from dead. Anywhere you find one of these, you'll find XML:
| Use Case | What You'll See |
|---|---|
| Configuration files | Spring (Java), Maven pom.xml, Android AndroidManifest.xml, .NET app configs |
| Office documents | Microsoft Word .docx, Excel .xlsx, PowerPoint .pptx: each is a ZIP archive of XML files |
| Web feeds | RSS, Atom — every podcast feed and most blog feeds are XML |
| Sitemaps | Every sitemap.xml you've ever submitted to Google |
| SVG graphics | Scalable Vector Graphics — used everywhere from icons to charts — is XML |
| SOAP web services | Banking, healthcare, government, and B2B integrations |
| Government & enterprise data | Tax filings, healthcare records (HL7 v3 and CDA), legal documents (LegalXML) |
| Digital publishing | EPUB ebooks are ZIPs of XML; print publishing uses DocBook, DITA |
The 5 Core Rules of XML
Rule 1: Every Opening Tag Needs a Closing Tag
<!-- Wrong: -->
<name>Alice
<!-- Right: -->
<name>Alice</name>
<!-- Empty element shortcut: -->
<br/>
Rule 2: Tags Must Be Properly Nested
<!-- Wrong: -->
<b><i>Bold and italic</b></i>
<!-- Right: -->
<b><i>Bold and italic</i></b>
Rule 3: One Root Element
Every XML document has exactly one outermost element that wraps all others:
<?xml version="1.0" encoding="UTF-8"?>
<catalog> <!-- root element -->
  <product>...</product>
  <product>...</product>
</catalog>
Rule 4: Attribute Values Must Be Quoted
<!-- Wrong: -->
<book category=programming>
<!-- Right (single or double quotes both work): -->
<book category="programming">
<book category='programming'>
Rule 5: XML Is Case-Sensitive
<!-- These are DIFFERENT elements: -->
<Name>Alice</Name>
<name>Alice</name>
<NAME>Alice</NAME>
<!-- And THIS is a syntax error (mismatched case): -->
<Name>Alice</name>
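The five rules above are what a parser enforces, so any XML parser can double as a well-formedness check. Here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The helper name `is_well_formed` and the sample labels are made up for this illustration; the broken samples are taken from the rules above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One good sample, then one broken sample per rule.
samples = {
    "closed tag": "<name>Alice</name>",
    "missing closing tag": "<name>Alice",
    "overlapping tags": "<b><i>Bold and italic</b></i>",
    "two root elements": "<product/><product/>",
    "unquoted attribute": "<book category=programming></book>",
    "mismatched case": "<Name>Alice</name>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is reported as well-formed. To see why a sample failed, print the caught `ParseError`, which includes the line and column of the problem.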
XML vs HTML: The Key Differences
| Feature | HTML | XML |
|---|---|---|
| Purpose | Display web pages | Store and exchange data |
| Tags | Predefined (<p>, <div>, etc.) | You invent them |
| Case sensitivity | Tolerates either | Strict — case matters |
| Closing tags | Some optional | Always required (or the self-closing form <br/>) |
| Attribute quotes | Recommended | Required |
| Browser display | Renders visually | Shows tree structure |
XML vs JSON: When to Use Which
The honest answer: it depends on the system you're working with.
| Use Case | Better Choice |
|---|---|
| Modern REST API | JSON |
| JavaScript/web frontend data | JSON |
| SOAP web service | XML (required) |
| Configuration files (Spring, Maven, Android) | XML (often required) |
| RSS/Atom feeds | XML (standardized) |
| SVG graphics | XML (it IS XML) |
| Document with mixed text and markup | XML |
| Pure data, no document structure | JSON |
| Document validation against a schema | XML (better tooling) |
| Government/healthcare/banking integrations | XML (often mandated) |
<!-- XML -->
<customer>
  <name>Alice</name>
  <age>30</age>
</customer>
// JSON
{ "customer": { "name": "Alice", "age": 30 } }
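To make the comparison concrete, the sketch below reads the XML version of the customer record with Python's standard library and prints it as JSON. The helper `element_to_dict` is a hypothetical name, and it only handles flat records like this one (children that contain text and nothing else).

```python
import json
import xml.etree.ElementTree as ET

xml_text = "<customer><name>Alice</name><age>30</age></customer>"


def element_to_dict(element: ET.Element) -> dict:
    """Map a flat element to {tag: {child_tag: child_text}}."""
    return {element.tag: {child.tag: child.text for child in element}}


record = element_to_dict(ET.fromstring(xml_text))
print(json.dumps(record))
```

Note that `age` comes out as the string "30", not the number 30. Element text in XML is untyped unless a schema says otherwise, while JSON distinguishes numbers from strings in the syntax itself.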
Attributes vs Elements: When to Use Which
You can model the same data two ways:
<!-- Using attributes: -->
<book title="Learning XML" author="Erik Ray" year="2003"/>
<!-- Using child elements: -->
<book>
  <title>Learning XML</title>
  <author>Erik Ray</author>
  <year>2003</year>
</book>
Use attributes for metadata about an element (id, type, lang, version) and for simple values that won't have substructure.
Use child elements for the main content, anything that might contain other elements, or anything that might repeat.
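The choice also changes how the data is read. The sketch below, using Python's standard `xml.etree.ElementTree`, reads the same title from both models and then shows why repeatable data belongs in child elements. "Second Author" is a placeholder value invented for this example.

```python
import xml.etree.ElementTree as ET

as_attributes = ET.fromstring(
    '<book title="Learning XML" author="Erik Ray" year="2003"/>'
)
as_elements = ET.fromstring(
    "<book><title>Learning XML</title>"
    "<author>Erik Ray</author><year>2003</year></book>"
)

# Attributes are read with .get(); child element text with .findtext().
title_from_attribute = as_attributes.get("title")
title_from_element = as_elements.findtext("title")
print(title_from_attribute == title_from_element)

# Child elements can repeat; an attribute name may appear only once per element.
two_authors = ET.fromstring(
    "<book><author>Erik Ray</author><author>Second Author</author></book>"
)
author_names = [author.text for author in two_authors.findall("author")]
print(author_names)
```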
Well-Formed vs Valid XML
Two different concepts that often confuse beginners:
- Well-formed XML follows the basic syntax rules above. Every parser can check this without any extra information.
- Valid XML is well-formed AND conforms to a schema (DTD or XSD) that defines exactly which elements and attributes are allowed.
Example: The XML below is well-formed — but if your schema requires every <book> to have a <price>, this XML is not valid:
<book>
  <title>Learning XML</title>
  <!-- missing <price> -->
</book>
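The two checks happen in sequence: parsing establishes well-formedness, and only then can the content be compared against rules. The sketch below imitates that second step by hand for the "every book needs a price" rule. It is a stand-in for illustration, not real schema validation: Python's standard `xml.etree.ElementTree` does not validate against DTD or XSD, so production code would use a schema-aware library such as lxml. `REQUIRED_CHILDREN` and `missing_children` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: every <book> must contain these child elements.
REQUIRED_CHILDREN = ("title", "price")


def missing_children(xml_text: str, required=REQUIRED_CHILDREN) -> list:
    """Parse xml_text, then list the required child tags that are absent.

    Raises ET.ParseError if the text is not even well-formed.
    """
    book = ET.fromstring(xml_text)
    return [tag for tag in required if book.find(tag) is None]


print(missing_children("<book><title>Learning XML</title></book>"))
```

An empty list means the document passed the rule; a non-empty list names what is missing, which here is `price`.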
DTD vs XSD: Two Schema Languages
Two ways to define the rules an XML document must follow:
DTD (Document Type Definition) — older, simpler
<!ELEMENT book (title, author, year, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT price (#PCDATA)>
XSD (XML Schema Definition) — modern, more powerful
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="year" type="xs:integer"/>
        <xs:element name="price" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Use XSD for new projects — it has data types (string, integer, date), namespaces, and more validation power.
You'll see DTD in older systems and document formats (DocBook, some HTML doctypes).
Common Beginner Mistakes
1. Mismatched Case
<Customer>...</customer> is broken — XML is case-sensitive. Open and close must match exactly.
2. Forgetting the XML Declaration
Recommended at the top of every file:
<?xml version="1.0" encoding="UTF-8"?>
It's optional but helps tools handle character encoding correctly.
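If you generate XML from code, most libraries can write the declaration for you. Here is a small sketch with Python's standard `xml.etree.ElementTree`; the customer name is an invented value, chosen to contain non-ASCII characters so the encoding matters.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("customer")
ET.SubElement(root, "name").text = "Zoë Müller"  # non-ASCII on purpose

# Write to an in-memory buffer; a file path would work the same way.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="UTF-8", xml_declaration=True)
document_bytes = buffer.getvalue()

# The first line is the declaration; the rest is UTF-8 encoded markup.
print(document_bytes.decode("utf-8"))

# Parsing the bytes back recovers the original text.
round_tripped = ET.fromstring(document_bytes).findtext("name")
```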
3. Special Characters Not Escaped
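XML predefines escape sequences for five characters. < and & must always be escaped in element content and attribute values; the other three are only needed in certain contexts (for example, a quote inside an attribute value delimited by the same quote), but escaping all five is always safe:
| Character | Escape Sequence |
|---|---|
| < | &lt; |
| > | &gt; |
| & | &amp; |
| " | &quot; |
| ' | &apos; |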
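When XML is produced by a program, escaping is usually delegated to a library rather than done by hand. The sketch below shows two standard-library routes in Python: `xml.sax.saxutils.escape` for text you paste into a string, and `xml.etree.ElementTree`, which escapes text on output. The sample text is invented for this example.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = 'Fish & Chips <"large">'

# escape() handles &, < and > by default; quotes need the extra mapping.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# A parser turns the escape sequences back into the original characters.
parsed_text = ET.fromstring(f"<item>{escaped}</item>").text

# Building the tree with ElementTree escapes the text automatically on output.
item = ET.Element("item")
item.text = raw
serialized = ET.tostring(item, encoding="unicode")
print(serialized)
```

In both cases the parsed text is identical to `raw`, which is the point of escaping: the markup changes, the data does not.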
4. Multiple Root Elements
<!-- Broken: two root elements -->
<customer>Alice</customer>
<customer>Bob</customer>
<!-- Fixed: wrap in a root -->
<customers>
  <customer>Alice</customer>
  <customer>Bob</customer>
</customers>
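This mistake often appears when a program writes one element per record in a loop. A minimal sketch of the fix in Python's standard `xml.etree.ElementTree`: create the root first, then attach each record to it. The `names` list is sample data.

```python
import xml.etree.ElementTree as ET

names = ["Alice", "Bob"]

# One <customers> root wraps every <customer>, so the output is a single document.
root = ET.Element("customers")
for name in names:
    ET.SubElement(root, "customer").text = name

document = ET.tostring(root, encoding="unicode")
print(document)

# The result parses cleanly and has exactly one root.
reparsed = ET.fromstring(document)
print(reparsed.tag, len(reparsed))
```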
5. Tag Names Starting with a Number
<1stcustomer> is invalid. XML element names must start with a letter or underscore.
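If tag names come from user input or column headers, it can help to check them before building a document. The sketch below uses a deliberately simplified, ASCII-only pattern. The real XML naming rules also allow many non-ASCII characters and use colons for namespace prefixes, so treat this as a conservative approximation. `looks_like_valid_name` is a hypothetical helper.

```python
import re

# Simplified ASCII-only rule: a letter or underscore first, then letters,
# digits, underscores, periods, or hyphens.
SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def looks_like_valid_name(name: str) -> bool:
    """Return True if name fits the simplified element-name pattern."""
    return SIMPLE_NAME.fullmatch(name) is not None


for candidate in ["1stcustomer", "customer1", "_id", "first-name", "my name"]:
    print(candidate, looks_like_valid_name(candidate))
```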
Complete Learning Path
Work through the 13 chapters below in order. Each chapter is short — about 30 minutes of reading and practice.
XML Quick Reference Cheat Sheet
| Pattern | Syntax |
|---|---|
| XML declaration | <?xml version="1.0" encoding="UTF-8"?> |
| Element with content | <name>Alice</name> |
| Empty element | <br/> |
| Element with attribute | <book id="123">...</book> |
| Comment | <!-- This is a comment --> |
| CDATA section (raw text) | <![CDATA[Anything < here is OK]]> |
| Namespace | <ns:book xmlns:ns="http://..."> |
| Reference to DTD | <!DOCTYPE root SYSTEM "schema.dtd"> |
| XSD root element | <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> |
| Escape less-than | &lt; |
| Escape ampersand | &amp; |
| Self-closing tag | <br/> or <br /> |
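To see how a parser treats several of these patterns, the sketch below combines them in one small document and reads it with Python's standard `xml.etree.ElementTree`. The document content and the namespace URI `http://example.com/ns` are invented for this example.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <!-- This is a comment -->
  <book id="123">
    <name>Alice</name>
    <note><![CDATA[Anything < here is OK]]></note>
    <br/>
  </book>
</catalog>"""

root = ET.fromstring(document.encode("utf-8"))
book = root.find("book")

print(book.get("id"))           # attribute value
print(book.findtext("name"))    # element content
print(book.findtext("note"))    # CDATA arrives as ordinary text
print(book.find("br").text)     # an empty element has no text
print(len(root))                # the comment is not counted as a child

# A prefixed name is stored as {namespace-uri}local-name.
namespaced = ET.fromstring('<ns:book xmlns:ns="http://example.com/ns"/>')
print(namespaced.tag)
```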
Frequently Asked Questions
What is XML used for in 2026?
XML is still widely used for configuration files (Spring, Maven, Android manifests), document formats (Microsoft Office .docx, .xlsx are XML internally), SOAP web services, RSS/Atom feeds, SVG graphics, sitemaps, and cross-system data exchange. JSON has replaced XML in many web APIs, but XML remains common in enterprise systems.
Is XML hard to learn?
XML is one of the easiest technologies to learn. The core syntax is just opening tags, closing tags, and attributes — anyone who has seen HTML can read XML in minutes. Schemas (DTD, XSD), namespaces, and XSLT add complexity later, but the basics take an hour.
What's the difference between XML and HTML?
HTML is a fixed set of predefined tags for displaying web pages. XML lets you define your own tags for storing structured data. HTML focuses on how things look; XML focuses on what data means.
Should I learn XML or JSON?
Learn both. JSON is preferred for modern web APIs and JavaScript work. XML is essential for configuration files, document formats, SOAP services, and any work touching enterprise systems.
How long does it take to learn XML?
Basic XML syntax: 1-2 hours. Reading and writing XML confidently: a few days. Understanding DTD and XSD schemas: 1-2 weeks. Mastering XSLT, XPath, and namespaces: a few months.
Do I need software to write XML?
No. XML is plain text — Notepad on Windows or TextEdit on Mac is enough. For production work, VS Code, Notepad++, or Sublime Text with XML support helps with syntax highlighting and auto-closing tags.
What is well-formed vs valid XML?
Well-formed XML follows basic syntax rules. Valid XML is well-formed AND conforms to a schema (DTD or XSD). All valid XML is well-formed; not all well-formed XML is valid.
Is XML still relevant in modern web development?
Yes, in specific contexts. APIs increasingly use JSON, but XML is still required for SOAP services, RSS feeds, sitemaps, SVG, Office formats, Android manifests, Spring config, and Maven build files.
Start Chapter 1: What is XML? →
Last updated: April 25, 2026.