Storing XML data in database - many tables vs dumping xml in a column
Asked Answered
H

6

7

I want to store an xml that I receive in my java web service. Reports would be run every 5 mins to pull some data in the xml elements.

I thought of two approaches to solve this problem.

  1. Create multiple tables in the database to capture the xml data. Basically each element will have its own column in the database.

  2. Dump the whole xml in a column that can store xml data. For reporting purposes parse the value in the query itself.

Which of the above approaches is better, particularly in terms of performance? This is critical since reports will be generated in very high frequency (every 5 mins).

The xml schema is pretty complicated and not a simple one.

Hereditary answered 13/7, 2011 at 21:28 Comment(0)
R
9

If data is going to be written once and queried many times, it will almost certainly be more efficient to parse the XML document once, store the data in a proper relational schema, and query the relational schema. Parsing XML is not cheap so the overhead of parsing potentially multiple XML documents every 5 minutes could be substantial.

Of course, as will all performance questions, your mileage may vary so it may be worth testing. If you are using Oracle 11.2 and you store the data as binary XML (in which case it is stored after being parsed) and you create appropriate XMLIndexes on the XMLTypes you are storing, the performance penalty for leaving the data in the XML document may be quite small. It should still be slower than a proper relational structure but the difference may not be meaningful to you.

Personally, I'd prefer the relational storage approach in general even ignoring the performance issues because it makes it easier for others to interact with the data. There are far more developers that can write decent SQL than can write decent XPath expressions and there are far more query tools that can generate reports off of relational tables than can generate reports off of XML stored in a database.

Rajiv answered 13/7, 2011 at 21:42 Comment(0)
D
5

Maximus, It really depends on what you want to do with the XML data.

When I use XML for control purposes, such as configuring how a page displays, I will store the whole XML in a single BLOB field. It's fast and extremely simple. It's a simple save and load routine. You can easily view the XML in the BLOB field, and edit it.

If you need to search for or report on values inside of the XML, such as how many customers have a specific attribute, you probably want to parse into individual attributes. This will generally mean that you will have to do some pre and post processing, but allows you to quickly get to individual attributes.

Deragon answered 13/7, 2011 at 21:43 Comment(0)
L
4

Adhoc Access

If you need to run efficent queries on the data contained in the XML in an adhoc or arbitrary manner you should parse it out into Tables and Columns that can logically be index and joined upon.

Limited Access

If you are just storing the data, and delivering it based on some other criteria such as an unique id or other key, and the XML is essentially an opaque BLOB then just store it in a BLOB column and be done with it.

Hybrid Model

What you will probably need is something in between, where the XML is stored in a BLOB and only relevant bits are stored in Tables and Columns so you can search for the XML payload effectively.

Leralerch answered 13/7, 2011 at 21:42 Comment(1)
Hybrid model: Do you mean store elements frequently queried in the tables and store the xml as blob? If required other elements can be queried from the xml.Hereditary
W
1

Without knowing bit more, it is hard to say for sure, but most likely you are missing one important part that can simplify life a lot.

  1. Bind from XML to POJOs (JAXB, MOXy or JibX)
  2. Store as normalized columns from POJO (use jDBI, Hibernate, or even simple JDBC templates)

Also, depending on exactly what kind of reports you produce, perhaps consider possibility of just keeping data in memory -- every 5 minutes does not sound like performance critical, but then again persistence is not always needed (or just is for historical data or backups).

Whelk answered 13/7, 2011 at 21:52 Comment(0)
C
1

If you need to keep and query more than a couple of xml documents you should use a XML database..

eXist is nice, keep those xmls in a column or disagrete them in many tables is a bad option I think..

Contrayerva answered 13/7, 2011 at 22:13 Comment(0)
S
0

You could also check out the XMLData column type which is in Sqlserver or Xml Type in Oracle http://msdn.microsoft.com/en-us/library/hh403385.aspx

You could create computed columns on your xml data column for those xml fields that are queried the most which would help in faster retrievals. To retrieve a certain value at a certain xpath, you just need to pass the xpath to sqlserver for it to return the value at that xpath to you.

Sea answered 23/7, 2014 at 17:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.