OLE DB vs OPEN XML SDK vs Excel.interop
Asked Answered
J

1

11

I need to read XLSX files and extract a maximum amount of content from it. Which of the API's should I use?

OLE DB, open XML SDK, or Excel Interop?

  • Which is the easiest to use?
  • Can you retrieve all the information using one or the other? i.e, date, times, merged cells, tables, pivottables, etc.
Judiciary answered 28/4, 2012 at 16:33 Comment(1)
Open XML SDK wouldn't be that straightforward at first, but the good thing there are sound open source libraries which makes the tasks trivial. My advice is if you dont mind using third party dlls, then Open XML is the way to go. Two excellent libraries I can suggest are ClosedXML and EPPlus. You can straightaway start querying with Linq, which is cool.Wilfredowilfrid
G
19

You can try all of them and choose the one that fits you most...

Depending on data you want to read, I'd suggest you to use Open XML over Interop or Ole DB.
I don't know an open XML SDK, although I've some experience with EPPlus library which I'm using a lot and can say only good words about it - it's fast, easy to learn, with good examples. The library is basing on Open Office XML format, so I suppose it's pretty much the same as the SDK you've mentioned, and is capable of easy read and write Excel 2007 and 2010 files.
On the linked web, you'll find a library itself, documentation and some example "Hello World" projects to download.

Why that library in the first place? Because with it you will be able to read not only cells values, but also their colors, fonts, widths and heights, merging and all that detailed stuff, that you can not only read, but modify as well. What's more, you don't need the Excel installed to do that.

On the second place - just in case you need to extract tabular data from worksheet - you may play with OLE DB. I'm afraid with that you won't be able to extract any info about formats, colors etc., as well as the data must be in a tabular organized worksheet, so you can treat is as a database's table.

The last one is Interop, because:
- it's a COM library, so you need to be very careful when playing with it via .NET, as it's easy to cause some ugly and hard to find memory leaks (confirmed by myself bad experience) - if you don't dispose their objects properly, it leaves the Excel.exe process opened,
- it's much slower than previous methods,
- basically, it has almost no more added value that one of the previous methods (EPPlus or OleDB) and requires Excel to be installed on client's machine, so why to use it?

Good luck, then.

Glyn answered 28/4, 2012 at 18:42 Comment(3)
Very comprehensive answer, thank you. I've tested the openxml SDK, and i must say it requires some bunch of boilerplate code. Let's have a look ath the EPP.Judiciary
One reason to go COM route will be the comprehensive API from Microsoft. While third party libraries must cover the use case of 99 pc developers they may not be still feature complete. Good answer +1. I like Open XML option the most here. Just sad you gotta write lots of boilerplate.Wilfredowilfrid
Microsoft itself does not recommend using Interop for Server DeploymentShulins

© 2022 - 2024 — McMap. All rights reserved.