Could someone explain me the concept of PROJECTION in vertica database with an example query?
Vertica doesn't use indexes to find the data.
Conceptually, you still access tables using SQL. But underneath the hood, the data in the table are stored in projections, which you can optimize for different queries.
I like to think of it as a table representing a deck of cards. If playing poker you may still say something like
Select * from CardDeck limit 5;
Let's say you have a table defined with the following columns:
FaceValue int (let's just assume face values are ints),
Suit varchar(10)
I can then create my projections (I'm omitting details about partitioning, super-projections, buddy-projections, etc.).
create projection CardDeck_p1
(
FaceValue ENCODING RLE,
Suit
)
as
select FaceValue, Suit from CardDeck order by FaceValue;
create projection CardDeck_p2
(
FaceValue,
Suit
)
as
select FaceValue, Suit from CardDeck order by Suit;
Now, each column can get a different type of encoding which is defined in the projection. And the database designer, which I haven't used much since I've been on an older version, can help design the projections for you.
So getting back to the deck of card analogies, imagine you want to access a deck of cards but you want to have different shuffles of the cards. Projections in Vertica gives you the different shuffles. Tables are really a construct that allows you to access the data which is stored in projections. But if you are writing SQL, you access tables.
I want to emphasize the point made in geoff's answer -- projections are physical structures on disk. Defining multiple projections for a table can improve query performance, but at the cost of increased space on disk and slower load times (since your rows have to be placed into each projection).
There are super-projections which store all columns in a table as well as partial projections. You would use a partial projection when the query you're seeking to support/optimize only needs a subset of columns from the table. Each table needs at least one super-projection. If you don't define one, Vertica will provision a default one, which can have very poor performance.
Recommended practice is to have the Database Designer tool help you analyze your table with test data and test queries, after which it can suggest a projection for you. I've personally not had great results going this way, but knowing how to use the DBD tool should be part of the curriculum for anyone training in Vertica.
It seems like you are familiar with views. Projections are much like views in concept, they both cache something but in different levels. In a nutshell, views cache the query statements, while projections cache the query results.
Views cache the query statements. You give names to the predefined queries then invoke them thereafter. View queries are not executed when they are created. When you perform the queries using views, they won’t get any performance improvement since they are just ordinary queries.
Projections cache the query results. The projection queries are executed when you create them, and the results are persisted on storage. When you perform any query that can utilize the query result, Vertica will use those projections to respond to the query hence improve the query performance. After the projections are created, there is nothing special you need to do, Vertica will select the projections automatically if it can benefit the query. The projections can be used for the query because the query uses the subset columns of projections, have the same sort order, and etc.
Projections like views, you can select a subset of table columns, perform join with other tables, order by specific columns. However, different projections will occupy their own space to save the query result, the more projections are created, the more space they will be consumed. Projections will got updated automatically while the related source tables are updated. The update process are executed in the background and may takes a long while depending on the query complexity and data size. Therefore, projections are more suitable for many reads rather than many writes. From the use case perspective, projections are more suitable for reporting compared to real time web dashboard.
In the implementation detail, tables in Vertica are all logical. All data in tables are stored in each associated super projection. The super projection contains all columns in the table and is created automatically by default. All other projections are derived from the super projections.
Vertica will decide what projections will be used for the query, but you also can specify the name of the projections directly to force Vertica to use them:
-- List all projections
SELECT projection_name FROM projections;
-- Force to use super projection, _super is the suffix of the super projection
SELECT * FROM FACT_TABLE_super;
You can use the explain statement to see what projections are used in the query plan. This will help you to improve the performance of your query.
From the Concepts Guide.pdf (around page 23) of the Vertica Documentation.
Projections store data in a format that optimizes query execution. They are similar to materialized views in that they store result sets on disk rather than compute them each time they are used in a query.
also
Projections are transparent to end-users of SQL. The Vertica query optimizer automatically picks the best projections to use for any query.
All that needs to be done for a projection to improve query performance is to create the projection. Vertica will automatically select the best projection to use for that query. (Note: It is possible to force a specific projection by querying against it specifically instead of a table)
I don't know where your understanding of projections is at, but more specific questions about projections will allow greater elaboration on specific points. If you are looking to get a general view of concepts I'd recommend getting and reading the Concepts Guide.pdf. http://my.vertica.com
Views
in Vertica are, essentially, queries. (From the SQL Reference Manual.pdf) "[in] a query that contains a view, the view name is replaced by the view's defining query" –
Aroma © 2022 - 2024 — McMap. All rights reserved.