Join operation with NOSQL
Asked Answered
A

4

54

I have gone through some articles regarding Bigtable and NOSQL. It is very interesting that they avoid JOIN operations.

As a basic example, let's take Employee and Department table and assume the data is spread across multiple tables / servers.

Just want to know, if data is spread across multiple servers, how do we do JOIN or UNION operations?

Arethaarethusa answered 3/1, 2010 at 15:4 Comment(2)
You want a SQL JOIN or UNION operation in a product called NoSQL?Luing
Simple, you use playOrm and join partitions(partitions are usually less than 1 million rows but the table can be infinite) and it performs well.Mcnamee
G
53

When you have extremely large data, you probably want to avoid joins. This is because the overhead of an individual key lookup is relatively large (the service needs to figure out which node(s) to query, and query them in parallel and wait for responses). By overhead, I mean latency, not throughput limitation.

This makes joins suck really badly as you'd need to do a lot of foreign key lookups, which would end up going to many,many different nodes (in many cases). So you'd want to avoid this as a pattern.

If it doesn't happen very often, you could probably take the hit, but if you're going to want to do a lot of them, it may be worth "denormalising" the data.

The kind of stuff which gets stored in NoSQL stores is typically pretty "abnormal" in the first place. It is not uncommon to duplicate the same data in all sorts of different places to make lookups easier.

Additionally most nosql don't (really) support secondary indexes either, which means you have to duplicate stuff if you want to query by any other criterion.

If you're storing data such as employees and departments, you're really better off with a conventional database.

Galasyn answered 3/1, 2010 at 22:26 Comment(1)
This link was helpful when I was trying to understand this: allbuttonspressed.com/blog/django/2010/09/…Durance
B
7

You would have to do multiple selects, and join the data manually in your application. See this SO post for more information. From that post:

Bigtable datasets can be queried from services like AppEngine using a language called GQL ("gee-kwal") which is a based on a subset of SQL. Conspicuously missing from GQL is any sort of JOIN command. Because of the distributed nature of a Bigtable database, performing a join between two tables would be terribly inefficient. Instead, the programmer has to implement such logic in his application, or design his application so as to not need it.

Breeches answered 3/1, 2010 at 15:13 Comment(0)
G
4

Natively, unfortunately, is not possible to perform a Join into a NoSQL database. This is actually one of the biggest differences between SQL and NoSQL DBs.

As @kaleb said, you would have to do multiple selections and then join the needed information "manually".

Luckily, there are ORMs frameworks such as Prisma that will allow you to "fake" the native SQL join feature.

Note: you're still performing multiple db calls under the hood, increasing the read-ops, and everything that's concerned.

" A key feature of Prisma Client is the ability to query relations between two or more models. " -> https://www.prisma.io/

example:

 const getUser = await prisma.user.findUnique({
  where: {
    id: 19,
  },
  select: {
    name: true,
    posts: {
      select: {
        title: true,
      },
    },
  },
})

In this case, the posts are stored in a different table, but Prisma is able to fetch them and join them into the User object.

Granuloma answered 22/5, 2021 at 10:3 Comment(4)
Some NoSQL databases can do joins, so the blanket statement is falseStreptococcus
Please expand your answer. Aren't NoSQL able to perform JOIN queries only via aggregation queries that bring time complexity of the query itself to O(log(N))+Nlog(M)? Where M and N are the involved documents?Granuloma
Not all databases use documents either. Neo4j, for example, does Graph traversal through relations, thereby "joining" nodes into paths through edges. Couchbase does joins with indicies. Apache Hive/Impala/Trino do joins on large semi-structured filesets via MapReduce-like operations.Streptococcus
Regading your answer here, though, Prisma only lists support for SQL databases, and doesn't address the question about BigTableStreptococcus
D
3

Kaleb's right. You write custom code with a NoSQL solution if your data doesn't fit well into a key-value store. Map-reduce/async processing and custom view caches are common. Brian Aker gave a very funny (and satirical and biased) presentation at the Nov 2009 OpenSQLCamp http://www.youtube.com/watch?v=LhnGarRsKnA. Skip in 40 seconds to hear about joins.

Drumm answered 3/1, 2010 at 15:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.