I want to set up a mock database (as opposed to creating a test database if possible) to check if the data is being properly queried and than being converted into a Pandas dataframe. I have some experience with mock and unit testing and have set-up previous test successfully. However, I'm having difficulty in applying how to mock real-life objects like databases for testing.
Currently, I'm having trouble generating a result when my test is run. I believe that I'm not mocking the database object correctly, I'm missing a step involved or my thought process is incorrect. I put my tests and my code to be tested in the same script to simplify things.
- I've thoroughly read thorough the Python unittest and mock documentation so I know what it does and how it works (For the most part).
- I've read countless posts on mocking in Stack and outside of it as well. They were helpful in understanding general concepts and what can be done in those specific circumstances outlined, but I could not get it to work in my situation.
- I've tried mocking various aspects of the function including the database connection, query and using the 'pd_read_sql(query, con)' function to no avail. I believe this is the closest I got.
My Most Recent Code for Testing
import pandas as pd
import pyodbc
import unittest
import pandas.util.testing as tm
from unittest import mock
# Function that I want to test
def p2ctt_data_frame():
conn = pyodbc.connect(
r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};'
r'DBQ=My\Path\To\Actual\Database\Access Database.accdb;'
)
query = 'select * from P2CTT_2016_Plus0HHs'
# I want to make sure this dataframe object is created as intended
df = pd.read_sql(query, conn)
return df
class TestMockDatabase(unittest.TestCase):
@mock.patch('directory1.script1.pyodbc.connect') # Mocking connection
def test_mock_database(self, mock_access_database):
# The dataframe I expect as the output after query is run on the 'mock database'
expected_result = pd.DataFrame({
'POSTAL_CODE':[
'A0A0A1'
],
'DA_ID':[
1001001
],
'GHHDS_DA':[
100
]
})
# This is the line that I believe is wrong. I want to create a return value that mocks an Access table
mock_access_database.connect().return_value = [('POSTAL_CODE', 'DA_ID', 'GHHDS_DA'), ('A0A0A1', 1001001, 100)]
result = p2ctt_data_frame() # Run original function on the mock database
tm.assert_frame_equal(result, expected_result)
if __name__ == "__main__":
unittest.main()
I expect that the expected dataframe and the result after running the test using the mock database object is one and the same. This is not the case.
Currently, if I print out the result when trying to mock the database I get:
Empty DataFrame Columns: [] Index: []
Furthermore, I get the following error after the test is run:
AssertionError: DataFrame are different;
DataFrame shape mismatch
[left]: (0, 0)
[right]: (1, 3)