Relationship between catalog, schema, user, and database instance
Asked Answered
G

4

74

To compare databases of different vendors (Oracle, SQL Server, DB2, MySQL, and PostgreSQL) how can I identify any object uniquely and do I need a catalog? For instance, In Java's DatabaseMetadata I should specify catalog and schema fooPattern at least.

Is it true that catalog is just an abstraction of data storage?

Gules answered 30/10, 2011 at 0:11 Comment(1)
#7023255Nadenenader
P
152

In Oracle:

  • server instance == database == catalog == all data managed by same execution engine
  • schema == namespace within database, identical to user account
  • user == schema owner == named account, identical to schema, who can connect to database, who owns the schema and use objects possibly in other schemas
  • to identify any object in running server, you need (schema name + object name)

In PostgreSQL:

  • server instance == db cluster == all data managed by same execution engine
  • database == catalog == single database within db cluster, isolated from other databases in same db cluster
  • schema == namespace within database, by default public is used
  • user == named account, who can connect to database, own and use objects in each allowed database separately
  • to identify any object in running server, you need (database name + schema name + object name)

In MySQL:

  • server instance == not identified with catalog, just a set of databases
  • database == schema == catalog == a namespace within the server.
  • user == named account, who can connect to server and use (but can not own - no concept of ownership) objects in one or more databases
  • to identify any object in running server, you need (database name + object name)

In Microsoft SQL Server:

  • server instance == set of managed databases
  • database == namespace qualifier within the server, rarely referred to as catalog
  • schema == owner == namespace within the database, tied to database roles, by default dbo is used
  • user == named account, who can connect to server and use (but can not own - schema works as owner) objects in one or more databases
  • to identify any object in running server, you need (database name + owner + object name)

So I think answer to your questions is:

  1. It depends on implementation, whether catalog name is needed to identify objects. The meaning of catalog, schema and database vary from one implementation to another.

  2. Yes, a catalog is an abstraction of data storage. I think it should be also defined as a self-contained isolated namespace, but not all SQL engines do it.

  3. Database and schema are pretty well defined by all vendors. Catalog is sometimes synonymous to "database" (at least in Oracle and Postgres), sometimes synonymous to "schema", and sometimes synonymous to both. The term catalog also often means metadata collection (aka system tables).

  4. Schema is what programmers should use to organize artifacts in SQL database as it represents a logical namespace with access control layer.

Popinjay answered 30/10, 2011 at 10:48 Comment(6)
Well, mostly. I would clarify Oracle a bit more. The "database" in Oracle parlance is actually the fileset which can be accessed by one or more "instances". This is how RAC works AFAIK, significantly different from other engines clustering. Also, given that most database engines can normally only access one "catalog" on a given connection, I'm not sure where catalog would be useful. In JDBC it's the schema field that identifies this level I think in all popular engines, certainly MySQL, PostgreSQL, Oracle and SQL Server. Each of these requires a different jdbc url for a different "catalog".Rejoice
For MySQL, when using JDBC, it seems to identify the schema as a catalog: TABLE_CAT:iserver, TABLE_SCHEM:null, TABLE_NAME:accountgroups, TABLE_TYPE:TABLE, REMARKS:. The TABLE_SCHEM field is blank. Based on this output, I would have expected what I see in the catalog field to be in the schema field. Is it possible that JDBC definition is off, or perhaps is MYSQL catalog the same as schema?Mooned
Yep I agree @SamGoldberg. Another evidence is this SHOW SCHEMAS is a synonym for SHOW DATABASES (dev.mysql.com/doc/refman/5.1/en/show-databases.html). Updated the answer above to reflect this.Balthazar
@Popinjay - is there any documentation/paper explaining what resulted to these differences and why they didn't decide to use the same terminology ? (eg. why calling getCatalog in MySql is returning schema/namespace and in Oracle is returning the Database)Sharanshard
@Victor, it's history. Database software was developed since 1970's. Industry standards were emerging slowly. Some decisions are extremely hard to revert / change :-)Popinjay
I have picturized this answer in the DBMS organization.pdf file on github.com/iwis/SQL-notes. The meaning of the used colors is described here. The details are in the DBMS organization - DB, schema, user, connecting, SET ROLE, referencing objects, synonym, tablespace.txt file.Eskill
G
5

For DB2, schema is used as namespaces. So if you want to uniquely identify an object in a database you would say *schema.object_name*. This is a very handy way to achieve multitenancy. You can have a separate schema for each tenant in your database. This provides for good separation of concerns from both security as well as management aspects. You can have 32K schemas in a single DB2 database.

A catalog in DB2 is simply a collection of system tables that contain metadata about the database. In general, it is considered a bad practice to access catalog objects directly. It is best to use the facilities provided by your API (e.g. JDBC) to explore the catalog and the metadata it contains.

DB2 also has other abstraction layers. You can have multiple instances of DB2 running on the same machine. Each instance can manage 256 separate databases (each with 32K schemas). The number of DB2 instances on a server is limited only by the amount of memory you have available. At one point in time we had 120 instances of DB2 (each with one database and 10 connections) running on Amazon EC2 m1.large. You can also have multiple installs of DB2 on a single server. it is useful when testing a new version you plan to migrate to. I do find it confusing though often forgetting to switch to the right install.

Garibold answered 2/11, 2011 at 1:56 Comment(3)
However there are considerations around transaction logs to be had when putting multiple instances on a machine, and with SQL Server, even multiple databases. One of the major benefits of having a separate transaction log is that it writes serially to the storage device, which with spindle based storage is much much faster than having to perform seeks between writes/reads. A background process gathers dirty database blocks and writes them to disk, optimizing the write order unless a checkpoint is forced, causing all dirty blocks to be flushed.Rejoice
This answer was about Db2 LUW, which has a separate transaction log path for each databaseTran
Also I don't believe there is a limit on the number of schemas in a database. I can't see why anyone would want so many, but I've just created 33000 schemas on my local Db2 LUW 11.1.3.3 database, and got no complaints from the DBMS. db2 -x "select count(*) from syscat.schemata" returned 33069Tran
L
2

I'm sharing my results in the point of client(driver)'s view.

  • product: result of the getProductName()
  • c_term: result of the getCataglogTerm()
  • s_term: result of the getSchemaTerm()
  • T_CAT: distinct values of TABLE_CAT from the result of getTables(null, null, "%", null)
  • T_SCHEM: distinct values of TABLE_SCHEM from the result of getTables(null, null, "%", null)
product c_term s_term T_CAT T_SCHEM
Apache Derby CATALOG SCHEMA <empty> SYS SYSIBM
H2 catalog schema TEST INFORMATION_SCHEMA
HSQL Database Engine CATALOG SCHEMA PUBLIC INFORMATION_SCHEMA, SYSTEM_LOBS
SQLite catalog schema null null
MySQL database <empty> performance_schema, information_schema null
MariaDB database schema information_schema null
PostgreSQL database schema pg_catalog, information_schema pg_toast
SQLServer database schema
Oracle XE schema
Oracle(Free) schema
Lawrence answered 27/1, 2023 at 12:42 Comment(3)
What client is it? Could you post code snippet which produces this data? Thanks!Popinjay
@Popinjay Those clients are just JDBC drivers.Lawrence
nice, very informative. It would be great if you could add Oracle JDBC driver to this tablePopinjay

© 2022 - 2025 — McMap. All rights reserved.