PROJECTION in vertica database [closed]

Asked 18/4, 2012 at 14:41 Answered 13/5, 2017 at 9:14

Could someone explain me the concept of PROJECTION in vertica database with an example query?

Aimeeaimil answered 18/4, 2012 at 14:41 Comment(2)

There are some nice explanations of Vertica's projections at vertica.com blog and advises of how to use it: - Part #1 vertica.com/2011/09/01/the-power-of-projections-part-1 - Part #2 vertica.com/2011/09/02/the-power-of-projections-part-2 - Part #3 vertica.com/2011/09/06/the-power-of-projections-part-3 Power of Vertica Projections: ( from the archives ) Part #1 : web.archive.org/web/20130920020028/http://184.106.12.19:80/2011/… Part #2 : web.archive.org/web/20140828131329/http://184.106.12.19/2011/09/… – Donela 26/1, 2014 at 21:39

Great question that could help a lot of people. It is unfortunate that this answer has to be closed. – Grath 5/6, 2018 at 14:8

Vertica doesn't use indexes to find the data.

Conceptually, you still access tables using SQL. But underneath the hood, the data in the table are stored in projections, which you can optimize for different queries.

I like to think of it as a table representing a deck of cards. If playing poker you may still say something like

Select * from CardDeck limit 5;

Let's say you have a table defined with the following columns:

FaceValue int (let's just assume face values are ints),
Suit varchar(10)

I can then create my projections (I'm omitting details about partitioning, super-projections, buddy-projections, etc.).

create projection CardDeck_p1
(
FaceValue ENCODING RLE,
Suit
)
as 
select FaceValue, Suit from CardDeck order by FaceValue;

create projection CardDeck_p2
(
FaceValue,
Suit
)
as
select FaceValue, Suit from CardDeck order by Suit;

Now, each column can get a different type of encoding which is defined in the projection. And the database designer, which I haven't used much since I've been on an older version, can help design the projections for you.

So getting back to the deck of card analogies, imagine you want to access a deck of cards but you want to have different shuffles of the cards. Projections in Vertica gives you the different shuffles. Tables are really a construct that allows you to access the data which is stored in projections. But if you are writing SQL, you access tables.

Heise answered 18/4, 2012 at 21:44 Comment(0)

I want to emphasize the point made in geoff's answer -- projections are physical structures on disk. Defining multiple projections for a table can improve query performance, but at the cost of increased space on disk and slower load times (since your rows have to be placed into each projection).

There are super-projections which store all columns in a table as well as partial projections. You would use a partial projection when the query you're seeking to support/optimize only needs a subset of columns from the table. Each table needs at least one super-projection. If you don't define one, Vertica will provision a default one, which can have very poor performance.

Recommended practice is to have the Database Designer tool help you analyze your table with test data and test queries, after which it can suggest a projection for you. I've personally not had great results going this way, but knowing how to use the DBD tool should be part of the curriculum for anyone training in Vertica.

Si answered 19/4, 2012 at 5:54 Comment(0)

It seems like you are familiar with views. Projections are much like views in concept, they both cache something but in different levels. In a nutshell, views cache the query statements, while projections cache the query results.

Views cache the query statements. You give names to the predefined queries then invoke them thereafter. View queries are not executed when they are created. When you perform the queries using views, they won’t get any performance improvement since they are just ordinary queries.

Projections cache the query results. The projection queries are executed when you create them, and the results are persisted on storage. When you perform any query that can utilize the query result, Vertica will use those projections to respond to the query hence improve the query performance. After the projections are created, there is nothing special you need to do, Vertica will select the projections automatically if it can benefit the query. The projections can be used for the query because the query uses the subset columns of projections, have the same sort order, and etc.

Projections like views, you can select a subset of table columns, perform join with other tables, order by specific columns. However, different projections will occupy their own space to save the query result, the more projections are created, the more space they will be consumed. Projections will got updated automatically while the related source tables are updated. The update process are executed in the background and may takes a long while depending on the query complexity and data size. Therefore, projections are more suitable for many reads rather than many writes. From the use case perspective, projections are more suitable for reporting compared to real time web dashboard.

In the implementation detail, tables in Vertica are all logical. All data in tables are stored in each associated super projection. The super projection contains all columns in the table and is created automatically by default. All other projections are derived from the super projections.

Vertica will decide what projections will be used for the query, but you also can specify the name of the projections directly to force Vertica to use them:

-- List all projections
SELECT projection_name FROM projections;

-- Force to use super projection, _super is the suffix of the super projection
SELECT * FROM FACT_TABLE_super;

You can use the explain statement to see what projections are used in the query plan. This will help you to improve the performance of your query.

Expansile answered 13/5, 2017 at 9:14 Comment(0)

From the Concepts Guide.pdf (around page 23) of the Vertica Documentation.

Projections store data in a format that optimizes query execution. They are similar to materialized views in that they store result sets on disk rather than compute them each time they are used in a query.

also

Projections are transparent to end-users of SQL. The Vertica query optimizer automatically picks the best projections to use for any query.

All that needs to be done for a projection to improve query performance is to create the projection. Vertica will automatically select the best projection to use for that query. (Note: It is possible to force a specific projection by querying against it specifically instead of a table)

I don't know where your understanding of projections is at, but more specific questions about projections will allow greater elaboration on specific points. If you are looking to get a general view of concepts I'd recommend getting and reading the Concepts Guide.pdf. http://my.vertica.com

Aroma answered 18/4, 2012 at 15:45 Comment(3)

Thanks Nija. Your reply is very much useful. I have few more questions on this. My understanding is, PROJECTIONS are similar to VIEWS except that projections store result sets on disk whereas views compute the data from the table each time they are used in a query. Can I use VIEW as an alternate to PROJECTION? If so, will I get any performance issues? – Aimeeaimil 19/4, 2012 at 6:12

@Murali: Views in Vertica are, essentially, queries. (From the SQL Reference Manual.pdf) "[in] a query that contains a view, the view name is replaced by the view's defining query" – Aroma 19/4, 2012 at 16:58

Thank you. btw, I could not download the Concepts Guide.pdf from my.vertica.com. only the authorised user could download, it seems. I signed up, but yet to receive the activation mail. Is there any other way to get the .pdf guide? – Aimeeaimil 19/4, 2012 at 17:54

Recommended topics

Hot tags