What is an XML infoset and in what ways is it different to an XML document?
G

9

19

I've tried to read http://www.w3.org/TR/xml-infoset/ and the wikipedia entry. But frankly I'm still not sure what the difference is.

The quote:

An XML document has an information set if it is well-formed and satisfies the namespace constraints. There is no requirement for an XML document to be valid in order to have an information set.

from the Wikipedia entry does not seem to make sense. How can a non-valid document have any semantics, and thus how can it be an 'information' set?

What is this 'infoset' that

well-formed and satisfies the namespace constraints

XML has? And in what way is it useful in itself? In other words, why is it, semantically speaking, necessary to define the XML infoset? Is there any information that cannot be represented in XML? If so, I can see the limiting set of the XML Infoset, but if not, surely the XML Infoset is as meaningless as the term 'information'?

Thank you for the interesting answers. I still cannot grasp why the XML infoset has any purpose as opposed to the term infoset, but you guys have given me a direct answer to the question.

Gouache answered 8/5, 2009 at 10:34 Comment(1)
An old question, but I posted a new answer as I think it is useful.Reify
C
9

A useful way of thinking of the distinction between XML text and the XML infoset is to consider the Fast Infoset. This is a binary representation of the XML infoset.

So you have an abstract "infoset", which is a conceptual model representing XML data (nodes, elements, attributes, etc.). This can be physically represented as a text XML document, or as a Fast Infoset stream. Both represent the same data, but in radically different ways.

Cornwell answered 1/10, 2009 at 11:55 Comment(3)
Thank you, but I still have a problem comprehending what makes the XML infoset different from the general case of an infoset. I'll take a look at that and see.Gouache
I'll try to be clearer. Is it the case that XML => elements and attributes? In that case it makes sense; however, I originally perceived the concept of XML as a specialisation of the general case of the infoset (i.e. describing information). Now it seems to be the case that XML is the generalisation of that concept, in which case the XML infoset is THE infoset. Hence my inability to comprehend the semantics.Gouache
Thank you. It's finally sunk in.Gouache
P
21

XML is not text. XML "is" the XML infoset. This may then be serialized into text in an XML document, but it is the XML infoset that is the reality.

The infoset may exist in memory as a DOM tree, for instance. It exists in memory as the implementation of an abstract object model.

What if I serialized it as UTF-8 and then as UTF-16? Chances are the results would be two different sets of bits, but the same infoset.
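To make the encoding point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element name, attribute, and the `summarize` helper are made up for illustration, and ElementTree's object model is only an approximation of the Infoset, not an implementation of the spec.

```python
import xml.etree.ElementTree as ET

# Build a tiny tree in memory (hypothetical content, for illustration only).
root = ET.Element("greeting", {"lang": "de"})
root.text = "Grüße"

# Serialize the same tree twice, with two different character encodings.
utf8_bytes = ET.tostring(root, encoding="utf-8")
utf16_bytes = ET.tostring(root, encoding="utf-16")


def summarize(element):
    """Reduce an element to plain Python data so two trees can be compared."""
    return (
        element.tag,
        sorted(element.attrib.items()),
        element.text,
        [summarize(child) for child in element],
    )


# The byte streams differ, but parsing either one gives back the same content.
same_bytes = utf8_bytes == utf16_bytes
same_content = summarize(ET.fromstring(utf8_bytes)) == summarize(
    ET.fromstring(utf16_bytes)
)
print(same_bytes, same_content)
```

The comparison deliberately ignores anything about the bytes on disk; it only looks at what the parser reports.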

Consider also that with text it makes sense to do things like string concatenation. You don't want to concatenate a "<" into the middle of an XML element. You have to encode it first. Why would you have to do this if it were just text? If you used the DOM, for instance, you'd just say element.InnerText = "<"; When serialized, the "<" would be encoded into "&lt;". Yet it's the same infoset.
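The same idea can be sketched in Python, with `xml.etree.ElementTree` standing in for the DOM (the `note` element name is just an example). The text is set through the object model, and the escaping only appears when the tree is serialized:

```python
import xml.etree.ElementTree as ET

element = ET.Element("note")
element.text = "<"  # set through the object model; no manual escaping

# Serializing encodes the character as an entity reference.
serialized = ET.tostring(element, encoding="unicode")
print(serialized)  # <note>&lt;</note>

# Parsing the text form gives back the original character.
reparsed = ET.fromstring(serialized)
print(reparsed.text)  # <
```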

Plumber answered 8/5, 2009 at 10:43 Comment(7)
I cannot visualise this paradigm. In what way is XML not text? I'm not being facetious, but how does XML 'exist' without being represented with angle brackets?Gouache
Thank you. I appreciate the example. I did originally see the encoding aspect and the 'same information' aspect, but is this all an infoset is? What makes the XML Infoset distinct from any information definition?Gouache
+1 for examining the model independent of its bits. See also en.wikipedia.org/wiki/Theory_of_Forms Zaccaria
@Preet Sangha: The infoset is the abstract data. XML is just one way of representing that data. The data could be represented in a completely different way, one that does not even look like pointy brackets in a text file, and it would still be the same data. It is a common mistake to think that XML actually is the data it represents. It is merely the serialized form.Accommodating
@tomalak. In which case this is an infoset. What makes it the XML infoset then?Gouache
It's an XML infoset because it's an infoset represented in XML.Ursola
Sorry, but that's wrong. Plain XML 1.0 is defined in terms of a syntax; it does not have namespaces, but it does have CDATA sections, which the Infoset does not. In fact there are various slightly differing models of XML (plain XML, XML with Namespaces, XML Infoset, XPath Model, Canonical XML, XML 1.1 etc.).Subtonic
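The CDATA point in the comment above can be illustrated with a small sketch in Python's standard `xml.etree.ElementTree` (the element name `a` is arbitrary). A CDATA section and an escaped string are different syntax, but the parser reports the same character content for both, because the section boundaries are not part of what gets reported. ElementTree is used here only as a convenient stand-in, not as a conforming Infoset implementation.

```python
import xml.etree.ElementTree as ET

# Two different spellings of the same character content.
with_cdata = ET.fromstring("<a><![CDATA[1 < 2]]></a>")
with_escape = ET.fromstring("<a>1 &lt; 2</a>")

# After parsing, nothing distinguishes the two.
print(with_cdata.text == with_escape.text)
```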
M
2

A valid XML document fulfills the requirements of a DTD or XSD (or another schema language). A well-formed document can still be 'invalid' if it violates the rules in the given DTD or XSD.
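As a rough sketch of the well-formedness half of that distinction, the snippet below uses Python's standard `xml.etree.ElementTree`, which checks well-formedness but does not validate against a DTD or XSD. The sample documents and the `is_well_formed` helper are invented for illustration; checking validity would need a validating parser, which is not shown here.

```python
import xml.etree.ElementTree as ET

well_formed = "<order><item>tea</item></order>"  # no DTD or XSD involved
not_well_formed = "<order><item>tea</order>"     # <item> is never closed


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(well_formed))
print(is_well_formed(not_well_formed))
```

The first document parses into a tree even though no schema exists for it, which is the situation the quoted sentence about validity describes.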

Edit: I am new to this area of XML, but it looks like the infoset is the 'abstract level' description of the parts of an XML document, independent of the actual technical implementation, which could be, for example, a Document Object Model implementation.

Mountie answered 8/5, 2009 at 10:37 Comment(1)
But what makes it an infoset as opposed to a vanilla XML document?Gouache
Q
2

An XML infoset is an abstract set of concepts such as attributes and entities that can be used to describe a well-formed XML document. According to the specification, "An XML document's information set consists of a number of information items; the information set for any well-formed XML document will contain at least a document information item and several others."

Just because an XML document has an infoset does not mean it conforms to an XSD and is a valid XML document.

Quadrennial answered 8/5, 2009 at 21:52 Comment(2)
Thank you. So what you're saying is that describing something with attributes and entities, i.e. things and things about things, makes it an XML infoset? I refer you to the original question: then why even bother to define such a thing? What needs it?Gouache
It allows the other XML standards to be described in terms of this abstract model instead of in terms of their effect on some concrete implementation. Consider the fact that there may be many concrete implementations, and the benefit becomes much more clear. You would have to describe XSLT multiple times to account for the separate implementations instead of describing it once, in terms of the infoset.Plumber
F
2

Please see this link from MSDN. http://msdn.microsoft.com/en-us/library/aa468561.aspx

It is a really good explanation of the concepts and will hopefully make it clear to you.

Fidole answered 12/12, 2010 at 10:44 Comment(0)
R
0

A good example I've just come across is in David Chappell's WCF PDF. This is how it works when using TCP for example:

To allow optimal performance when both parties in a communication are built on WCF, the wire encoding used in this case is an optimized binary version of SOAP. Messages still conform to the data structure of a SOAP message, referred to as its Infoset, but their encoding uses a binary representation of that Infoset rather than the standard angle-brackets-and-text format of XML. Using this option would make sense for communicating with the call center client application, since it’s also built on WCF, and performance is a paramount concern.

Reify answered 1/10, 2009 at 11:50 Comment(1)
Cheers Rich, this is actually where my question originated. I cannot see what distinguishes the XML Infoset from the general case of the Infoset in the case of a thing with attributes. Actually I feel stupid in that I'm the only person who cannot seem to see why the XML in XML infoset matters.Gouache
B
0

XML is a language, and therefore it has a syntax; the XML Infoset is a specification of the data model. It exists because applications have needs that are based on the data model rather than on the syntax. XML came before the XML Infoset. Reference: protocol considerations for Web Linkbase Access

Berriman answered 25/5, 2017 at 14:38 Comment(1)
Can you elaborate on this answer please? What is the data model basically, and how does it differ from the term infoset?Gouache
V
0

XML Infoset is a requirement on how you should structure a serialised XML document.

Serialized XML can take different forms, such as a binary format (Fast Infoset) or text (the most popular form).

Basically, for the XML document format (text), each element and attribute should be defined in an XSD through the corresponding namespace.

Here you will find an example.

Voncile answered 4/9, 2017 at 13:26 Comment(0)
G
0

XML Information Set is a set of definitions for use in other specifications that need to refer to the information in an XML document.

XML Information Set's purpose is to provide a consistent set of definitions for use in other specifications that need to refer to the information in a well-formed XML document.

One of the ways to get an XML Information Set is by parsing an XML document.

An XML document's information set consists of a number of information items. The terms "information set" and "information item" are similar in meaning to the generic terms "tree" and "node".
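The tree and node analogy can be sketched by parsing a document and walking the result. The example below uses `xml.dom.minidom` from the Python standard library; the sample document and the `walk` helper are made up for illustration. The DOM is a related but not identical model (for instance, it groups consecutive characters into a single text node), so treat the node kinds as rough counterparts of information items rather than the spec's definitions.

```python
from xml.dom import minidom

# A hypothetical document, parsed into an in-memory tree.
document = minidom.parseString(
    '<book id="42"><title>XML</title><!-- draft --></book>'
)


def walk(node, depth=0):
    """Yield (depth, node kind, node name) for a node and everything below it."""
    yield depth, type(node).__name__, node.nodeName
    attributes = getattr(node, "attributes", None)
    if attributes is not None:
        for name, _value in attributes.items():
            yield depth + 1, "Attr", name
    for child in node.childNodes:
        yield from walk(child, depth + 1)


items = list(walk(document))
for depth, kind, name in items:
    print("  " * depth + f"{kind}: {name}")
```

Each line printed corresponds loosely to one kind of information item: the document, elements, an attribute, character data, and a comment.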

The details can be found on XML Information Set.

Gillam answered 2/10, 2021 at 6:28 Comment(0)
