Relational algebra to count rows
Asked Answered
B

4

7

I wasn't sure quite what to call this problem but it's not exactly counting rows. Let's say we have the relation:

Competition(compId, sport, playerName, medal)

And let's say the attribute medal can be either gold, silver, bronze, or null. So we have the following data:

(193, Tennis, John Doe, Gold)
(931, Skiing, Mary White, Bronze)
(193, Tennis, Arnold Black, null)
(182, Bobsledding, John Doe, Gold)
(901, Ping-Pong, Adam Brown, Silver)
(248, Bobsledding, Mary White, Silver)

I am having a very hard time figuring out how to answer this question: Get the names of all players who have won more than one medal. In this data the answers would be John Doe and Mary White. How could I get that answer on arbitrary data for this relation using relational algebra?

(This is a simplified version of the actual homework problem, and this simplification represents (I hope) the part of that problem I'm struggling with. There are an arbitrary and unknown number of competitions, sports, and players, but only 4 possibilities for medal)

Barbiturism answered 11/9, 2013 at 23:16 Comment(4)
Don't think about counting rows. Think about the possibility of a self-join.Buckling
Would grouping work for you in this case? Aggregate on player names and then filter. For example, aggregation operation over a schema (A1, A2, ... An) will be: G1, G2, ..., Gm g f1(A1'), f2(A2'), ..., fk(Ak') (r) Reference: en.wikipedia.org/wiki/Relational_algebraHuihuie
Not relational algebra, but in SQL, it looks like this: SELECT playerName, count() FROM Competition GROUP BY playerName HAVING COUNT() > 1Thirtyeight
There's no single "relational algebra". You need to give the one you were told to use. There are different notions of σ, ⋈ and even "relation".Giga
G
2

Get the names of all players who have won more than one medal.

(It's not clear what this means. Have won than one kind of medal? Or have received more than one medal? Your example answer suggests the latter. Also, it treats "null" as just another kind of medal, not specially as in SQL.)

-- rows where
    THERE EXISTS compId,sport,medal,compId1,compId2,medal2 SUCH THAT
        in competition [compId] of sport [sport] player [playerName] won [medal]
    AND in competition [compId2] of sport [sport2] player [playerName] won [medal2]
    AND (compId <> compId2 OR sport <> sport2 OR medal <> medal2)

Using statement shorthand:

-- rows where
    THERE EXISTS compId,sport,medal,compId1,compId2,medal2 SUCH THAT
        Competition(compId, sport, playerName, medal)
    AND Competition(compId2, sport2, playerName, medal2)
    AND (compId <> compId2 OR sport <> sport2 OR medal <> medal2)

Rearranging (anticipating the limitations of one comparison per σ and one attribute set per ∪):

-- rows where
    THERE EXISTS compId,sport,medal,compId1,compId2,medal2 SUCH THAT
        (   Competition(compId, sport, playerName, medal)
        AND Competition(compId2, sport2, playerName, medal2)
        AND compId <> compId2)
    OR (   Competition(compId, sport, playerName, medal)
        AND Competition(compId2, sport2, playerName, medal2)
        AND sport <> sport2)
    OR (   Competition(compId, sport, playerName, medal)
        AND Competition(compId2, sport2, playerName, medal2)
        AND medal <> medal2)

Now to get the algebra replace:

  • every statement by its table/relation
  • every AND of table/relation by ⋈ (natural join)
  • every OR of table/relation (which must have the same columns/attributes) by ∪ (union)
  • every AND NOT (which must have the same columns/attributes) by \ (difference)
  • every AND comparison by σ comparison (select/restrict)
  • every EXISTS names to drop by π names to keep (projection)
  • every column/attribute renaming by ρ (rename).

    π playerName (
        σ compId <> compId2 (Competition
            ⋈ ρ compID2/compID ρ sport2/sport ρ medal2/medal Competition)
    ∪   σ sport <> sport2 (Competition
            ⋈ ρ compID2/compID ρ sport2/sport ρ medal2/medal Competition)
    ∪   σ medal <> medal2 (Competition
            ⋈ ρ compID2/compID ρ sport2/sport ρ medal2/medal Competition)
    )
    

(For more see this answer.)

Giga answered 22/9, 2014 at 23:7 Comment(7)
would you show your answer working in this tool? dbis-uibk.github.io/relax/calc.htmRamos
Why do you ask? Why don't you? Except relax is a certain RA & it is not the RA in my answer & it is not necessarily the RA in the question. PS My answer says that it treats null like a medal & a normal value--not as a mark of no medal & not as a value operated on specially as in SQL--because the question doesn't say it does. But I should probably write more about it. Nowadays I wouldn't answer this until the asker clarified.Giga
Look at what relax does supply. Go to its help link. It has my rho/rename column using the same greek letter rho with trivially different syntax. (The help is uncler re arrows, it's new<-old or old->new.) Anyway, I told you what my rho/rename does, so just find a subexpression that does that. Eg their pi/projection supports attribute renaming.Giga
I did do this however, there is this message -> could not find column "compId" in schema [Competition.compId2 : number, Competition.sport2 : string, Competition.playerName : string, Competition.medal2 : string]Ramos
There is that message on what input expression? PS My code should have selections of joins, not joins taking selections. I fixed it.Giga
π playerName ( ( Competition ⋈ σ compId <> compId2 ρ compId->compId2 ρ sport->sport2 ρ medal->medal2 Competition) ∪ ( Competition ⋈ σ sport <> sport2 ρ compId->compId2 ρ sport->sport2 ρ medal->medal2 Competition) ∪ ( Competition ⋈ σ medal <> medal2 ρ compId->compId2 ρ sport->sport2 ρ medal->medal2 Competition) )Ramos
Thanks. Your error was because my code was in error. PS When you get an error drop parts until there's no error then add back a part. Here the error is in a join taking a select: Competition ⋈ σ compId <> compId2 ρ compId->compId2 ρ sport->sport2 ρ medal->medal2 Competition has an error but not ρ compId->compId2 ρ sport->sport2 ρ medal->medal2 Competition Which was because my code was wrong--thanks. See the new code. PS Remember, my current answer treats 'null' like a medal. It returns names that appear in 2 rows. (But I'll edit to address null as a marker of no win.)Giga
C
0

There is a much simpler way to solve this problem, in my opinion:

enter image description here

Essentially, you find the relation of records where medal isn't null, and then join this record with itself, joining on name. The resulting records will be the ones where there are duplicate names.

Cheder answered 17/4, 2018 at 18:15 Comment(1)
The approach of a self-join is good but this code is wrong. You need a certain restriction of Result rows, ie your join condition is wrong. But anyway we don't know which "relational algebra" they mean. (You don't explain the approach, you just use words to try to repeat what the code says. Also "the ones where there are duplicate names" is not clear.) PS Please use text, not images/links, for text (including code, tables & ERDs). Use a link/image only for convenience to supplement text and/or for what cannot be given in text.Giga
R
0

I know this is a very old post, but I'm learning this subject, and I did find another way to represent what the exercise asks. Maybe it can be usefull for someone searching for this.

The answer:

π Lc.playerName (
    ρ Lc σ medal ≠ 'null' Competition
            ⨝ Lc.playerName = Rc.playerName AND Lc.sport ≠ Rc.sport
    ρ Rc σ medal ≠ 'null' Competition)

I'm assuming that nobody can have two medals in the same sport, if somone have more than one medal, it has to be in different sports. I'm also assuming that each tuple in the relation represents an awarding, so ...

The expression above manages to get the playerName that appears more than once for different sports.


Edited: I treated null... you can see it in action with this tool https://dbis-uibk.github.io/relax/calc.htm

to set the data, use this this code in the Group Editor tab

group:Competition

Competition = {
    compId:number, sport:string , playerName:string, medal:string
    193          , 'Tennis'     , 'John Doe'       , 'Gold'      
    931          , 'Skiing'     , 'Mary White'     , 'Bronze'    
    193          , 'Tennis'     , 'Arnold Black'   , 'null'      
    182          , 'Bobsledding', 'John Doe'       , 'Gold'      
    901          , 'Ping-Pong'  , 'Adam Brown'     , 'Silver'    
    248          , 'Bobsledding', 'Mary White'     , 'Silver'    
}
Ramos answered 3/5, 2019 at 18:25 Comment(4)
There's no need to make constraint assumptions, we can always query just knowing what the tables mean. (Not that the question clearly says.) Your assumption of one medal per sport per person has no basis & the example data has multiple competitions in the same sport, which suggests your assumption is not correct. Anyway all you need to do is get people that appear with more than one medal. PS You need to say what algebra you are using. Give name & edition & page, but summarize your operators & how null is treated, to make your post self-contained.Giga
You--like the asker--do not say what role (string) 'null' plays in what a row says when it is in the given table--please do. You say "each tuple in the relation represents an awarding". What does it mean if a null is present? Also you seem to use 'null' as something to do with a mark of no medal having been won. Also your added select does not seem to deal with that correctly--what if Lc.medal ≠ 'null' but Rc.medal = 'null'? PS You can push a select into a (generalized) theta-join. (This is after your edit to your code. Using the same assumption of one medal per person sport.)Giga
Yes, you are right, the condition was wrong, I oversight this. I edited my aswer. However you are enphasizing too much about this thing 'null'. The goal of the asker as mine is learning, so if null is a special kind or a type of medal, doesn't matter, as long as we learn how to solve a problem using RA. I personally think makes no sense to consider a null as a type of medal. However if you do, I don't think that will increase the complexity.Ramos
I am not overemphasizing. My issue is not re which interpretation of null you should take. I would agree that for an exercise it doesn't matter what particular thing we take the question base table rows to say. But to compose an answer you have to take them to mean some particular thing & to justify your answer you have to say what that particular thing was. PS The typical use of null would give Competition = rows where "competition [ci] of sport [s] had player [pn] and either they won medal [m] in it or they didn't win a medal in it & [m]='null'".Giga
P
0

I have a similar problem. I resolved thinking if I match every person with himself, and then filter only when the attribute is different, I get the people with more that one value for that attribute, in this case, medal.

Player1 = π compId,playerName,medal (ρ Player1 (Competition))
Player2 = π compId,playerName,medal (ρ Player2 (Competition))
π Player1.playerName (σ Player1.medal<>Player.medal (Player1 ⨝ Player1.compId=Player2.compId Player2))
Propylene answered 5/5, 2024 at 0:14 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.