Groupings of queries

Asked 27/10, 2020 at 22:59 Answered 5/11, 2020 at 19:58

Solved sql database mdx graph-databases database-theory

I would like to understand the highest-level groupings that query languages can be broken up into, and why one grouping might be fundamentally different from another. For example, the groupings that I have come up with so far (for general-purpose usage) are:

Relational
- Example: SQL
Document
- Example: XQuery, JSONPath, MQL (MongoDB)
Graph
- Example: Cypher (Neo4j)
Other possibilities (?)
- Dataframe/pandas? Multidimensional (MDX)?

What might be the best high-level grouping to describe various query languages?

Dieldrin answered 27/10, 2020 at 22:59 Comment(3)

Your list is a mix of different types of data. Relational and non-relational are database types, SQL and NoSQL are language types, and document, graph, column, etc. are just sub-categories of NoSQL-based (non-relational) databases – Birecree 6/11, 2020 at 9:29

@SvenEschlbeck understood, but I think that's too broad and doesn't help much. It's almost like categorizing Stack Overflow tags into "Database" and "Not-Database". Also, don't SQL Server, Postgres and Oracle support JSON and XML types? Spatial data? Graph relationships? (SQL Server even has two table categories for graph data, "Nodes" and "Edges".) – Dieldrin 6/11, 2020 at 20:5

Yes, of course, those types exist. But you asked for "grouping of [...] query languages". Otherwise, what you really want to know is the way data can be categorized inside of SQL or NoSQL. If you follow some of my links below, you can learn more about that. – Birecree 6/11, 2020 at 22:6

One option is to group query languages by database category.

relational (Microsoft SQL Server, Oracle, MySQL, MariaDB)
object-relational (PostgreSQL)
NoSQL
- Key-value (Riak, Redis, Couchbase Server, MemcacheDB)
- Columnar (HBase)
- Document (MongoDB, CouchDB)
- Graph (Neo4j)

So far, so good, but in reality the borderline between the categories becomes thinner and thinner.

For example, Microsoft SQL Server has graph support, and T-SQL offers syntax like the following:

-- Find Restaurants that John's friends like
SELECT Restaurant.name 
FROM Person person1, Person person2, likes, friendOf, Restaurant
WHERE MATCH(person1-(friendOf)->person2-(likes)->Restaurant)
AND person1.name='John';
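To illustrate how thin that borderline is, the same "friends of John" pattern can be written with ordinary joins over node and edge tables. This is only a sketch using Python's built-in sqlite3 module; the column names (id, person_id, friend_id, restaurant_id) and the sample rows are assumptions made up for the example, not the schema SQL Server uses internally.

```python
import sqlite3

# In-memory database with hypothetical node tables (Person, Restaurant)
# and edge tables (friendOf, likes).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Restaurant (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friendOf (person_id INTEGER, friend_id INTEGER);
    CREATE TABLE likes (person_id INTEGER, restaurant_id INTEGER);

    INSERT INTO Person VALUES (1, 'John'), (2, 'Mary'), (3, 'Alice');
    INSERT INTO Restaurant VALUES (1, 'Taco Dell'), (2, 'Noodle Land');
    INSERT INTO friendOf VALUES (1, 2), (2, 3);
    INSERT INTO likes VALUES (2, 1), (3, 2);
""")

# person1 -(friendOf)-> person2 -(likes)-> Restaurant, spelled out as joins.
rows = conn.execute("""
    SELECT Restaurant.name
    FROM Person AS person1
    JOIN friendOf ON friendOf.person_id = person1.id
    JOIN Person AS person2 ON person2.id = friendOf.friend_id
    JOIN likes ON likes.person_id = person2.id
    JOIN Restaurant ON Restaurant.id = likes.restaurant_id
    WHERE person1.name = 'John'
""").fetchall()

restaurant_names = [name for (name,) in rows]
print(restaurant_names)
```

The MATCH syntax is mostly a more readable way to state the traversal; the underlying data is still tables.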

In MongoDB, we have graph queries too, using $graphLookup:

{
   $graphLookup: {
      from: <collection>,
      startWith: <expression>,
      connectFromField: <string>,
      connectToField: <string>,
      as: <string>,
      maxDepth: <number>,
      depthField: <string>,
      restrictSearchWithMatch: <document>
   }
}
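To get a feeling for what such a stage computes, here is a rough plain-Python sketch of the recursive lookup. The function graph_lookup, the employees collection and its field names are all hypothetical and only imitate the startWith / connectFromField / connectToField / maxDepth idea; this is not MongoDB's implementation.

```python
def graph_lookup(collection, start_value, connect_from, connect_to, max_depth=None):
    """Breadth-first lookup: repeatedly match connect_to against the
    connect_from values of the documents found in the previous round."""
    found, seen_ids = [], set()
    frontier = [start_value]
    depth = 0
    while frontier and (max_depth is None or depth <= max_depth):
        next_frontier = []
        for doc in collection:
            if doc[connect_to] in frontier and doc["_id"] not in seen_ids:
                seen_ids.add(doc["_id"])
                found.append({**doc, "depth": depth})
                next_frontier.append(doc[connect_from])
        frontier = next_frontier
        depth += 1
    return found


# Hypothetical collection: each employee points at a manager by name.
employees = [
    {"_id": 1, "name": "Dev", "reportsTo": None},
    {"_id": 2, "name": "Eliot", "reportsTo": "Dev"},
    {"_id": 3, "name": "Ron", "reportsTo": "Eliot"},
]

# Reporting chain above Ron: start with Ron's reportsTo value.
chain = graph_lookup(employees, "Eliot", "reportsTo", "name")
chain_names = [doc["name"] for doc in chain]

# Limiting the depth to 0 returns only the direct manager.
direct = graph_lookup(employees, "Eliot", "reportsTo", "name", max_depth=0)
direct_names = [doc["name"] for doc in direct]
print(chain_names, direct_names)
```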

So, maybe the highest-level grouping is just the group of database management systems following the American National Standards Institute (ANSI) standards (relational and object-relational) and the others.

Overpass answered 1/11, 2020 at 18:55 Comment(1)

Your answer is right but your list is confusing. It is indeed possible to differentiate the languages based on the database type. However, you cannot mix up relational and NoSQL: one is a mathematical model, the other is a query language. Furthermore, he asked for a high-level grouping. In that case, object-relational is still relational and not its own category. – Birecree 6/11, 2020 at 9:32

I will try to answer this question from an analytics perspective.

Relational Database (DBMS):

SQL is one of the most common declarative query languages, and it is used to work with the relationships between tables.

In terms of data analytics, we use the GROUP BY clause to summarize our data:

"An important component for Analyst to summarize the data such as sales, profit, cost, and salary. Data Summarization is very helpful for the Analyst to create a visualization, conclude findings, and report writing. In SQL, GROUP BY Clause is one of the tools to summarize or aggregate the data series. For example, sum up the daily sales and combine in a single quarter and show it to senior management. Similarly, if you want to count how many employees in each department of the company. It groups the databases based on one or more column and aggregates the results." (GROUP BY and HAVING Clause in SQL, by Avinash Navlani)

In more detail:

Grouping in SQL is used to arrange identical data into groups with the help of some functions, i.e., if a particular column has the same values in different rows, then it will arrange these rows in a group. (1)

Simple Syntax

SELECT column1, function_name(column2)
FROM table_name
WHERE condition
GROUP BY column1
ORDER BY column1;

function_name: Name of the aggregate function used, for example SUM() or AVG().
table_name: Name of the table.
condition: Condition used.
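
As a concrete instance of that template, here is a small sketch using Python's built-in sqlite3 module. The sales table, its columns and the numbers are invented for the illustration.

```python
import sqlite3

# Hypothetical sales table held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (department TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("books", 10), ("books", 15), ("toys", 7), ("toys", 3), ("games", 50)],
)

# WHERE filters rows before grouping; SUM() is then applied per department.
totals = conn.execute("""
    SELECT department, SUM(amount) AS total
    FROM sales
    WHERE amount > 5
    GROUP BY department
    ORDER BY department
""").fetchall()
print(totals)
```

Note that the toys row with amount 3 is dropped by WHERE before the groups are formed, so it does not contribute to the total.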

Documents

Our example here will be about MongoDB.

When talking about grouping in MongoDB, we have to mention the aggregation process, which is what we use when dealing with multiple documents.

Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result. In SQL, count(*) with GROUP BY is the equivalent of MongoDB aggregation. (2)
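
To make the comparison with GROUP BY more tangible, the sketch below shows a $group pipeline as it could be passed to an aggregate call, next to a plain-Python emulation of what that stage computes. The orders documents and the helper group_sum are hypothetical, and no MongoDB server is involved; the pipeline is only shown for comparison.

```python
from collections import defaultdict

# Hypothetical documents, shaped like records in a MongoDB collection.
orders = [
    {"customer": "ann", "amount": 20},
    {"customer": "bob", "amount": 5},
    {"customer": "ann", "amount": 30},
]

# A $group stage roughly equivalent to
#   SELECT customer, SUM(amount), COUNT(*) FROM orders GROUP BY customer
# It is not executed here; it is only shown for comparison.
pipeline = [
    {"$group": {"_id": "$customer",
                "total": {"$sum": "$amount"},
                "count": {"$sum": 1}}}
]


def group_sum(docs, key, field):
    """Emulate the $group stage above in plain Python (illustration only)."""
    acc = defaultdict(lambda: {"total": 0, "count": 0})
    for doc in docs:
        bucket = acc[doc[key]]
        bucket["total"] += doc[field]
        bucket["count"] += 1
    # Sorted by key only to make the output deterministic.
    return [{"_id": k, **v} for k, v in sorted(acc.items())]


result = group_sum(orders, "customer", "amount")
print(result)
```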

What's the difference between `GROUPing` in tables and in documents?

Answering this question comes down to three key points: (3)

1- What kind of data are you using?
- If you're using data that is connected, the best approach is SQL.
2- What type of processing do you want to do?
- SQL databases are better for multi-row transactions; NoSQL databases are better for unstructured data like documents or JSON.
3- How does your data need to scale?
- SQL databases are vertically scalable; NoSQL databases are horizontally scalable. This means that, for high-level grouping, SQL is the stronger choice for heavy and in-depth grouping, besides being more flexible with normalization.

Graph

Example: Cypher (4)

Cypher is, like SQL, a declarative, textual query language, but for graphs.

It consists of clauses, keywords and expressions like predicates and functions, many of which will be familiar (like WHERE, ORDER BY, SKIP, LIMIT, AND, p.unitPrice > 10).

Unlike SQL, Cypher is all about expressing graph patterns.
Grouping in Cypher focuses on the visualization aspect of the data, to give you the big picture, but it is less useful for the processing aspect. With heavy data, it will not be as efficient as relational tables, but on the other hand, the data will be visualized.
For high-level grouping, Cypher is not recommended.

Other possibilities

Example: Dataframe/pandas

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

The pandas DataFrame.groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names. (5)

Syntax

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)

Parameters :

by: mapping, function, str, or iterable

axis: int, default 0

level: If the axis is a MultiIndex (hierarchical), group by a particular level or levels

as_index: For aggregated output, return an object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output

sort: Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. groupby preserves the order of rows within each group.

group_keys: When calling apply, add group keys to an index to identify pieces

squeeze: Reduce the dimensionality of the return type if possible, otherwise return a consistent type (deprecated in pandas 1.1 and removed in 2.0)

Returns: GroupBy object
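
A minimal sketch of the same kind of summary in pandas, assuming pandas is installed. The DataFrame contents are invented, and only the by, as_index and sort parameters from the list above are used.

```python
import pandas as pd

# Hypothetical sales data.
df = pd.DataFrame({
    "department": ["books", "books", "toys", "toys", "games"],
    "amount": [10, 15, 7, 3, 50],
})

# as_index=False keeps "department" as a regular column ("SQL-style" output);
# sort=True orders the result by the group keys.
totals = df.groupby("department", as_index=False, sort=True)["amount"].sum()
records = totals.to_dict("records")
print(totals)
```

This roughly corresponds to SELECT department, SUM(amount) FROM sales GROUP BY department ORDER BY department in SQL.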

If we compare pandas with the other methods mentioned above in terms of data analytics, pandas is definitely the winning card.
- The scalability of pandas is huge, provided the data fits in memory.
- It is lightweight compared to the query languages above.
- It works well with large amounts of data.

Conclusion

In the end, which of these methods works best depends on the three things I mentioned before:

What kind of data are you using?
What type of processing do you want to do?
How does your data need to scale?

Reference

References have been attached to each section, where they are easy to reach.

Scheld answered 31/10, 2020 at 9:31 Comment(0)

You probably already have the answer...

I mean this grouping is also what I can think of.

I haven't used graph databases, but as for the other two, relational and NoSQL: SQL, or relational languages in general, are used to query multiple types of relationships, as the name suggests. That is their distinctive feature, and they also have a fixed schema.

In the case of document-based or NoSQL databases, the distinctive feature is that the schema is very flexible, and related data is usually stored inside the same document.

As for graph databases, I have little idea about them. But as far as I know, they are just NoSQL with the ability to query relationships, combining the distinctive features of RDBMS and non-RDBMS (NoSQL).

Dataframes are usually used for fast operations, as required in data processing. They are in-memory data stores. They don't have the ability to fetch relations on their own; we have to perform operations on them from scratch.

Oldfangled answered 31/10, 2020 at 6:12 Comment(0)

At the highest level, you can ask what a database really is. Is it every form of accumulated data? Most people agree that a database is a collection of data that is organized or structured in some way.

You can differentiate between data lakes, data vaults and data warehouses. Data lakes are an inhomogeneous group of data stored in one or more databases. Their purpose is to store a large amount of information quickly. However, the data is not pre-structured, so searching or analyzing it might take longer. Data vaults consist of multiple databases or database schemas, each containing data of a certain value, purpose or type. Setting up the vault takes a lot of initial work, but it is highly effective when it comes to analyzing data. It can also process a lot of information in parallel and is often used with cloud computing. If you want to access data quickly based on certain tags, filters or topics, this is the way to go. Finally, data warehouses are made up of many databases and can be described as a super database, which is why data warehousing is often seen as the supreme discipline of database management.

The databases themselves can mainly be categorized into relational/non-relational or sequential/non-sequential. Relational databases aim for every table or entity to be linkable, in some way, to any other table or entity. This makes it easy to see the relations or dependencies between various entries. Searching, filtering or debugging data also becomes easier. Nevertheless, it takes a lot of effort to keep track of all the relations, and database administrators or developers often have a hard time considering all the combinations and links when editing code or documents. Furthermore, relational databases use complex database management systems (DBMS) which involve some heavy algebra. Examples of relational databases are Oracle, PostgreSQL and MySQL. They all depend on the Structured Query Language (SQL). With small differences, they all use the same basic commands to alter, edit, search or write data. There are further sub-categories like type-relational, object-relational, etc., but the differences are rather insignificant.

Non-relational databases are less complex, easier to maintain and not as sensitive to logical or mathematical errors as relational ones. But they can be less useful for big amounts of data or for purposes like data mining, quick search or personal information storage. Data is mostly stored in the form of different data types. Instead of strictly sticking to the table-row concept, they can contain users, bookings and documents of various shapes and forms. The greatest weakness of those databases is their lack of "smart connection". Because of non-existent links between documents, specific queries or search efforts can take a long time. Also, double entries, missing entries or mistakes are less likely to be detected immediately by the software system. Non-relational databases can be sub-categorized into key-value-pair types, broad row tables, document stores, search engine bases or graph/image databases. Examples include Neo4j, Datastax Enterprise Graph, some NoSQL bases like Couchbase and MongoDB, or Scylla and Cassandra. As you might guess, they don't use SQL but NoSQL. You get data in easily and quickly, but out slowly and sometimes with complications.

So, to specifically answer your question, relational and non-relational are the two (only) big and official types (by big I mean serious mathematical differences in data processing). Accordingly, SQL and NoSQL are the biggest query languages, with vast differences. Document, graph, etc. are just forms of data structures that are often associated with NoSQL databases, but they are not a separate type of language or base! In the same way, the forms of databases (e.g. symmetrical, snowflake, tree, star, etc.) are only a way to describe their basic hierarchy or structure. They, too, don't form their own categories... Dataframes, data lakes and data vaults (ultimately data warehouses) consist of many databases and can be relational, non-relational or a mix of both!

I want to make clear that it comes down to relational and non-relational. Especially with databases, I hear a lot of nonsense from people who get lost in details and mix up forms, shapes, languages, database names and whatnot. Document, MongoDB or snowflake are neither languages nor mathematical models.

PS: I have added some links in case you want to learn more.

https://www.oracle.com/database/what-is-a-relational-database/

https://www.pluralsight.com/blog/software-development/relational-vs-non-relational-databases

https://www.oracle.com/database/what-is-database.html

https://www.guru99.com/data-warehousing.html

Birecree answered 5/11, 2020 at 19:58 Comment(0)