How to get list of ONLY excel worksheet names in Excel using OLEDB; filter out non-worksheets that show up in metadata
Asked Answered
L

4

10

I have an issue getting worksheet names from an Excel spreadsheet using OLEDB. The problem is that when I use GetOleDbSchemaTable, the resulting DataTable has more than just the actual worksheet names; it has extra rows for "Tables" that I can only assume are used internally by Excel.

So for example, if I have a worksheet named myWorksheet, the code below might end up with a list that contains myWorksheet$, myWorksheet$PrintTable and myWorksheet$_. Only the first myWorksheet$ record is for the actual worksheet. The others are just garbage that I don't need. When you look at them in metadata they look just like regular tables, even with the type of TABLE.

For now I just manually filtered out anything with "$_" or "$Print" in the name, but who knows what other Excel feature might make these extra records turn up in a different format.

Does anyone know the best way to get ONLY actual worksheet names, and not these internal tables that aren't worksheets? Is there something in metadata that would differentiate them?

 private ArrayList getXlsWorksheetNames(OleDb.OleDbConnection conn)
    {
        ArrayList wsList = new ArrayList();
        DataTable schemaTable;

        try
        {
            conn.Open();
            schemaTable = conn.GetOleDbSchemaTable(OleDb.OleDbSchemaGuid.Tables, null);

            foreach (DataRow row in schemaTable.Rows)
            {
                //form.appendToResultsTxt("Adding worksheet to list: " + Environment.NewLine +
                //    "Name = " + row.Field<string>("TABLE_NAME") + "," + Environment.NewLine +
                //    "Type = " + row.Field<string>("TABLE_TYPE") + "," + Environment.NewLine + Environment.NewLine);
                wsList.Add(row.Field<string>("TABLE_NAME"));
            }
            conn.Close();
        }
        catch (Exception ex)
        {
            if (this.mode == Cps2TxtUtilModes.GUI_MODE)
            {
                this.form.appendToResultsTxt(ex.ToString());
            }
            throw;
        }

        return wsList;
    }

I read through the article at this link, but they don't seem to be doing anything differently than I am, and I don't see any filtering out of extra non-worksheet tables, so Microsoft doesn't seem to have provided the right answer.

http://support.microsoft.com/kb/318452

And I've also looked around alot of StackOverflow, like at the thread from the link below which was helpful, but doesn't solve this one problem.

Using Excel OleDb to get sheet names IN SHEET ORDER

Before anyone asks, I'd also like to say that I don't really have control over what features are used in the spreadsheet, so I can't just tell them "Don't turn on filtering" or "Don't use print tables".

Any ideas are much appreciated. Thanks!

Logorrhea answered 18/5, 2012 at 14:24 Comment(7)
Just filter out anything that doesn't with "$". I've been using that for a while and it's never failed, even when there are pivot tables, filtered lists, etc.Oid
Thank you; I'll do that. Like I said in my comment below, that's less than ideal because it relies on an implementation detail, but if that's what I've got to do, then so be it.Logorrhea
My answer to the question you have linked in above shows how you could do it using DAO, but if you don't want to use COM then that's not an option. However the addendum to the answer by Esen shows yet another way by renaming the .xlsx to .zip, opening the zip file contents and reading one of the xml files therein for the sheet names. I haven't tested to see if hidden sheets appear in there and it will only work on 2007 (.xlsx) files, but it might be worth a shot.Oid
Good info Sid, but unfortunately for me I have to support Excel 2003, and I definitely don't want to use COM / interop. It's very good to know though that XLSX files can be renamed to zips and opened to get some gems of metadata. Hope that tip helps others; it might help me at some point in the future, but not on this project.Logorrhea
Just FYI to everyone that might look at this for future reference. If a worksheet has a space in the name, the table name in metadata will have single quotes around it, meaning that the $ WILL NOT be the last character in the table name, even if it's a valid worksheet. So just looking for everything that has $ as the last character isn't good enough. You have to first remove any surrounding single qoutes, then evaluate whether $ is the last character. This, to say the least, is lame. Why doesn't Microsoft include something in metadata that signifies a user worksheet?Logorrhea
Indeed this is a problem, I did some "research" and if I create a file in Excel 2013 and save it in both V8 and V12 that is the older 2003 binary or 2010 OXML format, both files return just one table from the schema, however I exported some data from a table from SQL Server with the export wizard and that file had the Table and Table$ inside its schema. So it seems to vary between components used to create the file I guess.Spot
There is some info on how to retrieve specific info "only" here: msdn.microsoft.com/en-us/library/… Specifically for tables here: msdn.microsoft.com/en-us/library/… So in theory if you pass the second argument of GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null) as new object[]{null, null, null, "TABLE"} it is supposed to filter tables only, sadly, this does not work with excel. Maybe you already know this, just wanted to post if for the record.Spot
B
5

The question is old, but for those who found it now, the skipping can be done as Jim found...

// skip those that do not end correctly
foreach (DataRow row in schemTable.Rows)
{
    string sheetName = row["TABLE_NAME"].ToString();
    if (!sheetName.EndsWith("$") && !sheetName.EndsWith("$'"))
        continue;
    Console.WriteLine(sheetName);
}

That is the wanted are or those that end with $ or those that end with $'.

Bogoch answered 30/9, 2015 at 14:24 Comment(0)
E
2

From experience, it seems to be all those whose name ends in a dollar sign. I've come across scenarios from clients where extra worksheets seemed to be appearing which weren't present in the data - these later turned out to be hidden worksheets in Excel!

Exuberance answered 18/5, 2012 at 15:34 Comment(5)
Yes, these "hidden" tables are the things causing the problem. You're probably right that I could just look for '$' as the very last character in the name. However, some of these "hidden" tables do have $ in the names, and I was more looking for a definitive way to tell from some metadata property if a "Table" is a worksheet, or if it's some other type of internally used Excel table. Thanks for the answer anyways though. Anyone else know of anything in Excel metadata to differentiate?Logorrhea
I haven't found a way to only see which tables are visible as yet, I'm guessing we'd need to use Interop for something like that. There's nothing in the GetOleDbSchemaTable table to differentiate them. If you do find something, please do share and I'll update my program too!Exuberance
Definitely don't want to use interop! Thanks though for the info. I've just been wary that maybe there is some feature I don't know about that will result in a non-worksheet "table" that has a $ at the end of the name, but I guess it's good enough for now to just assume that isn't the case. Relying on implementation details sucks though.Logorrhea
Hi Jim, I am having the same kind of issue and its frustrating. Did you find out any proper solution for this specific problem?Extrusion
@Tejas, sorry no, the only solution I've had so far is the hacky way that relies on the implementation detail of the $ being either the last character, or if a single quote is the last character, then the $ will be the second to last character. I'm as aggravated about it as you! Why would we have to rely on an implementation detail like that? It's so stupid.Logorrhea
E
0

The first way that comes to my mind is the same way akash88 listed in your link to Using Excel OleDb to get sheet names IN SHEET ORDER link.

You can take akash88's approach and clean it up a little so the code is nicer to read.

        var wsList = from s in schemaTable
                     where s.Field<string>("TABLE_NAME").Contains("$")
                     select s.Field<string>("TABLE_NAME");
Eisegesis answered 5/10, 2012 at 18:56 Comment(1)
Please read the entire post and comments, and you'll see that your answer isn't really adequate; it will return extra records for Excel internal tables like Filter Tables. The original post specifically mentioned this and specifically asked for a way to get only actual user worksheets from metadata. I've found since posting this that there is no standard way to tell from metadata except by relying on the implementation detail that all user worksheets have a $ as either the very last character, or second to last char if the name has a space and thus is surrounded by double quotes.Logorrhea
T
0

You can test EndsWith("$") instead of Contains("$") like below:

List<String> lstsheetNames = new List<String>();
String sheetName;
foreach (DataRow row in schemaTable.Rows)
{
    sheetName = row.Field<string>("TABLE_NAME");
    String strTemp = sheetName.Split(' ');

    if(strTemp.Length == 1 && sheetName.EndsWith("$"))
       lstsheetNames.Add(sheetName.Substring(0, sheetName.Length - 1));

    else if(strTemp.Length > 1 && strTemp.GetValue(strTemp.Length - 1).ToString().EndsWith("$'"))
       lstsheetNames.Add(sheetName.Substring(1, sheetName.Length - 3));
}

I have used this code in a same problem and it works fine.

Edit : Sorry,I did not pay attention to this.I changed the code now.It might not the best or shortest way but it works.

Tripper answered 28/1, 2013 at 9:58 Comment(1)
Does nobody read the actual post? I've mentioned several times that this doesn't work if the table name has a space, because it will end with a quote instead of a $, and I also specifically asked for a real solution where the metadata indicates a user table and not an internal table, instead of just relying on an implementation detail that I very clearly already know exists because its in my OP.Logorrhea

© 2022 - 2024 — McMap. All rights reserved.