Mapping result rows to namedtuple in python sqlite
Asked Answered
F

5

17

I am playing a bit with the python api for sqlite3, i have a little table for store languages with an id, name and creation_date fields. I am trying to map the raw query results into a namedtuple as the docs recommend, it that way i can manage rows in a more readable way, so here is my namedtuple.

LanguageRecord = namedtuple('LanguageRecord', 'id, name, creation_date')

The code that the docs suggest for the mapping is as follows:

for language in map(LanguageRecord._make, c.fetchall()):
  # do something with languages

This is fine when i want to return a collection of languages but in this case i want just to retrieve one language:

c.execute('SELECT * FROM language WHERE name=?', (name,))

So my first attempt it was something like this:

language = map(LanguageRecord._make, c.fetchone())

This code doesn't works because fetchone() returns a tuple instead a list with one tuple, so the map function tries to create three namedtuples one for each tuple field thought.

My first approach to solve this was to explicitly create a list and append to it the tuple result, something like:

languages = []
languages.append(c.fetchone())
for language in map(LanguageRecord._make, languages):
  # do something with language

My second approach was to use fetchall() although i just want one record. I can set the name field with a unique constrain in the database in order to garantize just one result.

for language in map(LanguageRecord._make, c.fetchall()):
  # do something with languages

Another approach could be use fetchall()[0] without the unique constrain to garantize just one result.

My question is which is the best and common way to deal with this problem, should i use always fetchall to maintain a common interface and let the database manage the uniqueness logic? or should i create a list explicitly as in approach 1? Is there a more easy way to accomplish this task?

Flophouse answered 2/5, 2013 at 10:29 Comment(1)
You can also iterate over the database cursor, there's no need to fetch all the records, unless you want to, so the code can be rewritten as map(LanguageRecord._make, c).Crista
R
39

There is a much easier way! Sqlite3 provides a way for the user to define "row factories". These row factories take the cursor and the tuple row and can return whatever type of object it wants.

Once you set the row factory with

con.row_factory = my_row_factory

then rows returned by the cursor will be the result of my_row_factory applied to the tuple-row. For example,

import sqlite3
import collections

LanguageRecord = collections.namedtuple('LanguageRecord', 'id name creation_date')
def namedtuple_factory(cursor, row):
    return LanguageRecord(*row)

con = sqlite3.connect(":memory:")
con.row_factory = namedtuple_factory
cur = con.cursor()
cur.execute("select 1,2,3")
print(cur.fetchone())

yields

LanguageRecord(id=1, name=2, creation_date=3)

For another example of how to define a namedtuple factory, see this post.


By the way, if you set

conn.row_factory = sqlite3.Row

then rows are returned as dicts, whose keys are the table's column names. Thus, instead of accessing parts of the namedtuple with things like row.creation_date you could just use the builtin sqlite3.Row row factory and access the equivalent with row['creation_date'].

Ringmaster answered 2/5, 2013 at 10:49 Comment(2)
sqlite3.Row is not a real dict, as it doesn't implement .get() or .__contains__().Thoron
Your solution could be even more concise. With an active cursor you can set the row factory with the following one liner; cursor.row_factory = lambda _, row: LanguageRecord(*row). I've used this pattern to create a wrapper class for a sqlite db that has several different return types.Redblooded
S
10

An improved row_factory is actually this, which can be reused for all sorts of queries:

from collections import namedtuple

def namedtuple_factory(cursor, row):
    """Returns sqlite rows as named tuples."""
    fields = [col[0] for col in cursor.description]
    Row = namedtuple("Row", fields)
    return Row(*row)

conn = sqlite3.connect(":memory:")
conn.row_factory = namedtuple_factory
cur = con.cursor()
Statutable answered 20/1, 2018 at 17:24 Comment(5)
what sets the cursor and row in the named_factory function. I don't see these values being passed to the functionIntensive
@GoldenLion row is a tuple with a single row's values. Row(*row) is where the values get passed to the named tuple that will be returned.Statutable
The row_factory points to the namedtuple_factory function. does the con.cursor invoke the point function?Intensive
The row_factory is used internally by the sqlite3 module. See docs.python.org/3/library/…Statutable
thanks, I realized that was the patternIntensive
A
1

There is another one row_factory on the top of namedtuple:

from collection import namedtuple

def namedtuple_factory(cursor, row, cls=[None]):
    rf = cls[0]
    if rf is None:
        fields = [col[0] for col in cursor.description]
        cls[0] = namedtuple("Row", fields)
        return cls[0](*row)
    return rf(*row)

One can generalize further in order to use other class factories:

def make_row_factory(cls_factory, **kw):
    def row_factory(cursor, row, cls=[None]):
        rf = cls[0]
        if rf is None:
            fields = [col[0] for col in cursor.description]
            cls[0] = cls_factory("Row", fields, **kw)
            return cls[0](*row)
        return rf(*row)
    return row_factory

These factory functions are useful for cases when all query results have same fields.

Examples:

  1. namedtuple_factory = make_row_factory(namedtuple)

  2. import dataclass

    row_factory = make_row_factory(dataclass.make_dataclass)

  3. pip3 install recordclass

    import recordclass

    row_factory = make_row_factory(recordclass.make_dataclass, fast_new=True)

Here are some performance counters to compare different ways (debian linux, 64 bit, python 3.9).

Script for creation test database:

N = 1000000
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('''CREATE TABLE test
             (id int, x real, y real, p int, q int)''')
gen = ((i, random(), random(), randint(0,N), randint(0,N)) for i in range(N))
c.executemany("INSERT INTO test VALUES (?,?,?,?,?)", gen)
conn.commit()
conn.close()

Default:

conn = sqlite3.connect('example.db')
c = conn.cursor()
%time res = [row for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')

CPU times: user 971 ms, sys: 92.1 ms, total: 1.06 s
Wall time: 1.06 s
80 Mb

sqlite3.Row:

conn = sqlite3.connect('example.db')
conn.row_factory = sqlite3.Row
c = conn.cursor()
%time res = [row for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
# print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')

CPU times: user 1.11 s, sys: 80.1 ms, total: 1.19 s
Wall time: 1.19 s

namedtuple:

from collections import namedtuple
Row = namedtuple("Row", "id x y p q")
conn = sqlite3.connect('example.db')
c = conn.cursor()
%time res = [Row(*row) for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')

CPU times: user 1.89 s, sys: 71.8 ms, total: 1.96 s
Wall time: 1.96 s
80 Mb

namedtuple-based row factory:

conn = sqlite3.connect('example.db')
conn.row_factory = make_row_factory(namedtuple)
c = conn.cursor()
%time res = [row for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')

CPU times: user 1.93 s, sys: 116 ms, total: 2.05 s
Wall time: 2.05 s
80 Mb

recordclass:

from recordclass import make_dataclass
Row = make_dataclass("Row", "id x y p q", fast_new=True)
conn = sqlite3.connect('example.db')
c = conn.cursor()
%time res = [Row(*row) for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')

CPU times: user 1 s, sys: 72.2 ms, total: 1.08 s
Wall time: 1.07 s
56 Mb

recordclass-based row factory:

conn = sqlite3.connect('example.db')
conn.row_factory = make_row_factory(make_dataclass, fast_new=True)
c = conn.cursor()
%time res = [row for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')

CPU times: user 1.11 s, sys: 76.2 ms, total: 1.19 s
Wall time: 1.19 s
56 Mb
Ataman answered 5/8, 2021 at 6:12 Comment(0)
I
0

I demonstrate how to take a resulting dataframe from a sql query and convert it into a namedtuple list. I did not dynamic bind the dataframe columns to the nametuple names, not sure if this is possible.

LanguageRecord=namedtuple('Generic',['id','name','creation_date'])
def map_to_language_record(row):
    return LanguageRecord(row.id, row.name, row.creation_date)

df=pd.DataFrame({'id':[1,2,3],'name':['bob','dick','jane'],'creation_date': 
  ['1/1/2021','1/2/2021','1/3/2021']})

languages = list(map(map_to_language_record, df.itertuples()))
print(languages)

output:

[Generic(id=1, name='bob', creation_date='1/1/2021'), Generic(id=2, name='dick', creation_date='1/2/2021'), Generic(id=3, name='jane', creation_date='1/3/2021')]
Intensive answered 1/10, 2021 at 15:19 Comment(0)
V
-2

I think better to use for language in map(LanguageRecord._make, c.fetchall()[:1]): Because it can cause IndexError with fetchall()[0].

If you need one result and there is already "WHERE" in query. As I understand query should return one row. Early optimization is evil. :)

Verdieverdigris answered 2/5, 2013 at 10:50 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.