In Python, why do separate dictionary string values pass "in" equality checks? ( string Interning Experiment )

I am building a Python utility that will involve mapping integers to word strings, where many integers might map to the same string. From my understanding, Python interns short strings and most hard-coded strings by default, saving memory overhead as a result by keeping a "canonical" version of the string in a table. I thought that I could benefit from this by interning string values, even though string interning is built more for key hashing optimization. I wrote a quick test that checks string equality for long strings, first with just strings stored in a list, and then strings stored in a dictionary as values. The behavior is unexpected to me:

import sys

top = 10000

non1 = []
non2 = []
for i in range(top):
    s1 = '{:010d}'.format(i)
    s2 = '{:010d}'.format(i)
    non1.append(s1)
    non2.append(s2)

same = True
for i in range(top):
    same = same and (non1[i] is non2[i])
print("non: ", same) # prints False
del non1[:]
del non2[:]


with1 = []
with2 = []
for i in range(top):
    s1 = sys.intern('{:010d}'.format(i))
    s2 = sys.intern('{:010d}'.format(i))
    with1.append(s1)
    with2.append(s2)

same = True
for i in range(top):
    same = same and (with1[i] is with2[i])
print("with: ", same) # prints True

###############################

non_dict = {}
non_dict[1] = "this is a long string"
non_dict[2] = "this is another long string"
non_dict[3] = "this is a long string"
non_dict[4] = "this is another long string"

with_dict = {}
with_dict[1] = sys.intern("this is a long string")
with_dict[2] = sys.intern("this is another long string")
with_dict[3] = sys.intern("this is a long string")
with_dict[4] = sys.intern("this is another long string")

print("non: ",  non_dict[1] is non_dict[3] and non_dict[2] is non_dict[4]) # prints True ???
print("with: ", with_dict[1] is with_dict[3] and with_dict[2] is with_dict[4]) # prints True

I thought that the non-dict checks would result in a "False" print-out, but I was clearly mistaken. Would anyone know what is happening, and whether string interning would yield any benefits at all in my case? I could have many, many more keys than single value if I consolidate data from several input texts, so I am searching for a way to save memory space. (Maybe I will have to use a data-base, but that is outside the scope of this question.) Thank you in advance!

Recommended topics

Hot tags