Suppose you have a table in a database constructed as follows:
create table data (v int, base int, w_td float);
insert into data values (99,1,4);
insert into data values (99,2,3);
insert into data values (99,3,4);
insert into data values (1234,2,5);
insert into data values (1234,3,2);
insert into data values (1234,4,3);
To be clear select * from data
should output:
v |base|w_td
--------------
99 |1 |4.0
99 |2 |3.0
99 |3 |4.0
1234|2 |5.0
1234|3 |2.0
1234|4 |3.0
Note that since the vectors are stored in a database, we need only store the non-zero entries. In this example, we only have two vectors $v_{99} = (4,3,4,0)$ and $v_{1234} = (0,5,2,3)$ both in $\mathbb{R}^4$.
The cosine similarity of those vectors should be $\displaystyle \frac{23}{\sqrt{41 \cdot 38}} = 0.5826987807288609$.
How do you compute the cosine similarity using nearly only SQL
?
I say nearly because you will need the sqrt
function which is not always provided in basic SQL
implementations, for example it is not in sqlite3
!
sqrt
in sqlite3 by compiling and using math extension library. See here for details. – Anthracoid