Read the content of the string intern pool
I would like to enumerate the strings that are in the string intern pool.

That is to say, I want to get the list of all the instances s of string such that:

string.IsInterned(s) != null

Does anyone know if it's possible?
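To make the predicate concrete, here is a minimal sketch of how string.IsInterned and string.Intern behave for a string built at run time. The names InternPoolDemo, IsInPool and MakeUniqueString are made up for illustration. It assumes the GUID-based string is not already in the pool.

```csharp
using System;

static class InternPoolDemo
{
    // True when a string with the same contents is already in the intern pool.
    public static bool IsInPool(string s)
    {
        return string.IsInterned(s) != null;
    }

    // Hypothetical helper: builds a string at run time from a fresh GUID,
    // so an equal string is assumed not to be in the pool yet.
    public static string MakeUniqueString()
    {
        return "pool-demo-" + Guid.NewGuid().ToString("N");
    }
}
```

Calling InternPoolDemo.IsInPool on the result of MakeUniqueString() should return false. After passing the same string to string.Intern, it should return true, and string.IsInterned should hand back the pooled reference.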

Malayalam answered 4/3, 2014 at 12:37 Comment(13)
Curious: why would you like to do that? - Calcariferous
Both research and fun :-) - Malayalam
Possibly related, though I don't think there's a direct answer specifically to your question (but considering this is for "research and fun", there's a lot of info): #2329245 - Lissa
Hmm, I guess it could be possible via the profiling API (msdn.microsoft.com/en-us/library/bb384493(v=vs.110).aspx). However, someone more knowledgeable in this field should provide a detailed answer. - Unflinching
So no, .NET does not provide access to the hashtable. It's hidden in internal calls to C++ files in the SSCLI. And it's only an implementation detail which could change whenever MS wants. I assume that this is also the reason why it's not exposed. - Moult
@TimSchmelter The callsite corroborates this. - Psalter
@TimSchmelter What if you walk all objects on the heap using the profiling API, select strings, and call IsInterned? - Unflinching
@OndrejTucny: I don't know, I have never used the profiling API before. However, I think that even if that could work, you would indirectly modify the intern pool by tracking the objects. You could, for example, prevent the garbage collector from removing strings from the pool, hence you'd impact the results. - Moult
This should prove interesting... https://mcmap.net/q/654448/-c-strings-with-same-contents - Oystercatcher
@PaulZahra: which proves my argument wrong that tracking the intern pool could prevent them from being garbage-collected (J. Skeet says that the intern pool is not garbage collected as long as the app domain lives). - Moult
@TimSchmelter I don't think that is strictly correct, they can live longer than that!... "First, the memory allocated for interned String objects is not likely to be released until the common language runtime (CLR) terminates. The reason is that the CLR's reference to the interned String object can persist after your application, or even your application domain, terminates." - Taken from msdn.microsoft.com/en-us/library/system.string.intern.aspx - Oystercatcher
You are 99% there by enumerating the strings in the assembly metadata, IMetaDataImport::EnumUserStrings(). - Livelong
@HansPassant, thanks, I'll dig in that direction. - Malayalam

Thanks to the advice of @HansPassant, I managed to get the list of string literals in an assembly, which is extremely close to what I originally wanted.

You need to read the assembly metadata and enumerate the user strings. This can be done with these three methods of IMetaDataImport:

[ComImport, Guid("7DAC8207-D3AE-4C75-9B67-92801A497D44")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IMetaDataImport
{
    void CloseEnum(IntPtr hEnum);

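    // SizeParamIndex is zero-based and must point at the count parameter (cchString), not at the array itself.
    uint GetUserString(uint stk, [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2)] char[] szString, uint cchString, out uint pchString);

    // Same here: the element count of rStrings is passed in cmax (index 2).
    uint EnumUserStrings(ref IntPtr phEnum, [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2)] uint[] rStrings, uint cmax, out uint pcStrings);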

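    // The interface also declares 62 other methods, omitted here for brevity.
    // A real declaration must list every method in vtable order for COM interop to work.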
}

To get an instance of IMetaDataImport, you need an IMetaDataDispenser:

[ComImport, Guid("809C652E-7396-11D2-9771-00A0C9B4D50C")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
[CoClass(typeof(CorMetaDataDispenser))]
interface IMetaDataDispenser
{
    uint OpenScope([MarshalAs(UnmanagedType.LPWStr)]string szScope, uint dwOpenFlags, ref Guid riid, [MarshalAs(UnmanagedType.Interface)] out object ppIUnk);

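    // The interface also declares 2 other methods, omitted here for brevity:
    // DefineScope (before OpenScope in the vtable) and OpenScopeOnMemory (after it).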
}

[ComImport, Guid("E5CB7A31-7512-11D2-89CE-0080C792E5D8")]
class CorMetaDataDispenser
{
}

Here is how it goes:

var dispenser = new IMetaDataDispenser();
var metaDataImportGuid = new Guid("7DAC8207-D3AE-4C75-9B67-92801A497D44");

object scope;
var hr = dispenser.OpenScope(location, 0, ref metaDataImportGuid, out scope);

var metaDataImport = (IMetaDataImport)scope;

where location is the path to the assembly file.

After that, calling EnumUserStrings() and GetUserString() is straightforward.
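For readers who would rather not declare the COM interfaces, the same user-string heap can also be walked with the managed System.Reflection.Metadata library. This is a different technique from the IMetaDataImport approach described above, shown only as a minimal sketch. The names UserStringDump and ReadUserStrings are made up for illustration, and the code assumes the file at assemblyPath is a managed assembly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;
using System.Reflection.PortableExecutable;

static class UserStringDump
{
    // Hypothetical helper: returns every entry of the #US (user string) heap
    // of the assembly at assemblyPath, in heap order.
    public static List<string> ReadUserStrings(string assemblyPath)
    {
        var result = new List<string>();
        using (var stream = File.OpenRead(assemblyPath))
        using (var peReader = new PEReader(stream))
        {
            MetadataReader reader = peReader.GetMetadataReader();

            // Offset 0 is the nil handle; GetNextHandle moves to the first real entry
            // and returns a nil handle again once the end of the heap is reached.
            UserStringHandle handle = reader.GetNextHandle(MetadataTokens.UserStringHandle(0));
            while (!handle.IsNil)
            {
                result.Add(reader.GetUserString(handle));
                handle = reader.GetNextHandle(handle);
            }
        }
        return result;
    }
}
```

The returned list can then be filtered with string.IsInterned to see which of these literals are actually in the pool of the running process. The heap may end with empty padding entries, so a caller may want to skip empty strings.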

Here is a blog post with more detail, and a demo project on GitHub.

Malayalam answered 13/3, 2014 at 13:5 Comment(0)

The SSCLI function that it's pointing to is:

STRINGREF* AppDomainStringLiteralMap::GetStringLiteral(EEStringData *pStringData)
{ 
    ... 
    DWORD dwHash = m_StringToEntryHashTable->GetHash(pStringData);
    if (m_StringToEntryHashTable->GetValue(pStringData, &Data, dwHash))
    {
        STRINGREF *pStrObj = NULL;
        pStrObj = ((StringLiteralEntry*)Data)->GetStringObject();
        _ASSERTE(!bAddIfNotFound || pStrObj);
        return pStrObj;
    }
    else { ... }

    return NULL; //Here, if this returns, the string is not interned
}

If you manage to find the native address of m_StringToEntryHashTable, you can enumerate the strings that exist.

Kermitkermy answered 6/3, 2014 at 22:14 Comment(0)