Gremlin - how do you merge vertices to combine their properties without listing the properties explicitly?

Background: I'm trying to implement a time-series versioned DB using this approach, with Gremlin (TinkerPop v3).

[Image: diagram of the versioning approach; not reproduced here]

I want to get the latest state node (in red) for a given identity node (in blue). The two are linked by a 'state' edge that carries a timestamp range. I want to return a single aggregated object containing the id (cid) from the identity node and all the properties from the state node, without having to list those properties explicitly. (8640000000000000 is my way of indicating no 'to' date, i.e. the edge is current, which is slightly different from the image shown.)

I've got this far:

:> g.V().hasLabel('product').
     as('cid').
     outE('state').
     has('to', 8640000000000000).
     inV().
     as('name').
     as('price').
     select('cid', 'name','price').
     by('cid').
     by('name').
     by('price')

=>{cid=1, name="Cheese", price=2.50}
=>{cid=2, name="Ham", price=5.00}

But as you can see, I have to list the properties of the 'state' node: in the example above, the name and price properties of a product. This will apply to any domain object, so I don't want to list the properties every time. I could run a query beforehand to get the property keys, but I shouldn't need two queries and the overhead of two round trips. I've looked at 'aggregate', 'union', 'fold' etc., but nothing seems to do this.

Any ideas?

===================

Edit: Based on Daniel's answer (which doesn't quite do what I want at the moment), I'm going to use his example graph. In the 'modern' graph, people create software (person -created-> software). If I run:

> g.V().hasLabel('person').valueMap()
==>[name:[marko], age:[29]]
==>[name:[vadas], age:[27]]
==>[name:[josh], age:[32]]
==>[name:[peter], age:[35]]

then the result is a list of entities with their properties. What I want, on the assumption that a person can only ever create one piece of software (although hopefully we will see later how this could be opened up to lists of created software), is to include the created software's 'lang' property in the returned entity, to get:

> <run some query here>
==>[name:[marko], age:[29], lang:[java]]
==>[name:[vadas], age:[27], lang:[java]]
==>[name:[josh], age:[32], lang:[java]]
==>[name:[peter], age:[35], lang:[java]]

The best suggestion so far produces the following:

> g.V().hasLabel('person').union(identity(), out("created")).valueMap().unfold().group().by {it.getKey()}.by {it.getValue()}
==>[name:[marko, lop, lop, lop, vadas, josh, ripple, peter], lang:[java, java, java, java], age:[29, 27, 32, 35]]

I hope that's clearer. If not please let me know.

Tman answered 10/2, 2017 at 17:16 Comment(0)

Since you didn't provide a sample graph, I'll use TinkerPop's toy graph to show how it's done.

Assume you want to merge marko and lop:

gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V(1).valueMap()
==>[name:[marko],age:[29]]
gremlin> g.V(1).out("created").valueMap()
==>[name:[lop],lang:[java]]

Note that there are two name properties, and in theory you can't predict which name makes it into the merged result; however, that doesn't seem to be an issue in your graph.

Get the properties for both vertices:

gremlin> g.V(1).union(identity(), out("created")).valueMap()
==>[name:[marko],age:[29]]
==>[name:[lop],lang:[java]]

Merge them:

gremlin> g.V(1).union(identity(), out("created")).valueMap().
           unfold().group().by(select(keys)).by(select(values))
==>[name:[lop],lang:[java],age:[29]]
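
To make the key collision concrete, here is a minimal plain-Java analogue of merging the two property maps. It does not use the TinkerPop API: the class name MergeSketch and the method mergeLastWins are made up for illustration, and the "later value wins" policy is only an assumption chosen to mirror the output shown above, not a statement about how group() resolves collisions in any particular TinkerPop version.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogue of merging two valueMap() results; not TinkerPop API.
final class MergeSketch {

    // Copies every entry of each map into one result, in argument order.
    // Assumption: on a key collision the later map's value replaces the earlier one.
    @SafeVarargs
    static Map<String, List<Object>> mergeLastWins(Map<String, List<Object>>... maps) {
        Map<String, List<Object>> merged = new LinkedHashMap<>();
        for (Map<String, List<Object>> m : maps) {
            merged.putAll(m);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> marko = new LinkedHashMap<>();
        marko.put("name", List.of("marko"));
        marko.put("age", List.of(29));

        Map<String, List<Object>> lop = new LinkedHashMap<>();
        lop.put("name", List.of("lop"));
        lop.put("lang", List.of("java"));

        // Prints {name=[lop], age=[29], lang=[java]}: "name" from marko is overwritten.
        System.out.println(mergeLastWins(marko, lop));
    }
}
```

If both names matter, the colliding keys would need to be renamed or collected into a list before merging.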

UPDATE

Thank you for the added sample output. That makes it a lot easier to come up with a solution (although I think your output contains errors; vadas didn't create anything).

gremlin> g.V().hasLabel("person").
           filter(outE("created")).map(
             union(valueMap(),
                   outE("created").limit(1).inV().valueMap("lang")).
             unfold().group().by {it.getKey()}.by {it.getValue()})
==>[name:[marko], lang:[java], age:[29]]
==>[name:[josh], lang:[java], age:[32]]
==>[name:[peter], lang:[java], age:[35]]
Calceolaria answered 10/2, 2017 at 23:4 Comment(9)
Thanks for this! I think the identity() call was the one I had missed in the docs. Unfortunately, I tried this on both my graph and the 'modern' graph and I get 'No such property: keys for class: groovysh_evaluate' in both. Any ideas? All queries up to the last one work as per your answer above. - Tman
Which TinkerPop version are you using? It's probably older than the one I've used for testing. IIRC older versions had .mapKeys() and .mapValues(); try to use those instead. - Calceolaria
Ah yes... TinkerPop 3.0.1-incubating, as I'm using Titan. You were right about mapKeys() etc., but it's not a straight replacement, as I get: gremlin> g.V(1).union(identity(), out("created")).valueMap().unfold().group().by(select(mapKeys())).by(select(mapValues())) No signature of method: static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.select() is applicable for argument types: (org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal) values: [[MapKeysStep]] - Tman
I also tried 'by(mapKeys()).by(mapValues())' after reading the issue that led to the origins of 'select', but no joy. I'm not clear on the types that these operators expect or return, and I'm not sure where to look: the docs don't seem to go to this level of detail, or else I'm missing something. Any ideas? (Sorry for the hassle! :-( ) - Tman
Yea, the 3.0.1 implementation was really weak in this area. I can't find a way w/o lambdas. However, w/ lambdas it's: ....group().by {it.getKey()}.by {it.getValue()}. - Calceolaria
Thanks. That runs, but I'm not getting the expected results from the original question. If I change 'g.V(1)' to g.V().hasLabel('person') I get '==>[name:[marko, lop, lop, lop, vadas, josh, ripple, peter], lang:[java, java, java, java], age:[29, 27, 32, 35]]', which is a list of values against each property key. This isn't what I want. I'll update the original question to hopefully make this clearer. - Tman
Thanks for your continued help on this... I'm not in a position to try this at the moment (hopefully later this week or next). When I do, I will definitely mark it as the accepted answer. It certainly looks promising! :-) - Tman
I've finally managed to revisit this! (My apologies.) When I run your edited code I get a list of lists for the properties: ==>[name:[[marko]],lang:[[java]],age:[[29]]] ==>[name:[[josh]],lang:[[java]],age:[[32]]] ==>[name:[[peter]],lang:[[java]],age:[[35]]] I added [0] to the last getValue() and it's all golden! :-) MANY THANKS! - Tman
This does not work for multiple columns; a local step is required. - Lemos

Merging edge and vertex properties using gremlin java DSL:

g.V().has("User", "id", userDbId).outE(Edges.TWEETS)
    .union(__.identity().valueMap(), __.inV().valueMap())
    .unfold().group().by(__.select(Column.keys)).by(__.select(Column.values))
    .map(v -> converter.toTweet((Map) v.get())).toList();
Nole answered 9/6, 2018 at 10:55 Comment(1)
This will not work for multiple rows. A local step is required to avoid merging two rows into one. - Lemos

Thanks to Daniel Kuppitz and youhans for their answers, which gave me the basic idea for a solution. Later, though, I found that the solution does not work for multiple rows: a local step is required to handle them. The modified Gremlin query looks like this:

g.V()
.local(
    __.union(__.valueMap(), __.outE().inV().valueMap())
      .unfold().group().by(__.select(Column.keys)).by(__.select(Column.values))
)

This will limit the scope of union and group by to a single row.
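
The difference between grouping over the whole stream and grouping per row can be sketched in plain Java. This is only an analogue under stated assumptions: it does not call TinkerPop, and the names LocalScopeSketch, groupAll, groupPerRow and props are hypothetical helpers invented for the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogue of global vs. per-row grouping; not TinkerPop API.
final class LocalScopeSketch {

    // Global grouping: every row feeds one shared map, so values from
    // different rows pile up under the same key.
    static Map<String, List<Object>> groupAll(List<List<Map<String, Object>>> rows) {
        Map<String, List<Object>> grouped = new LinkedHashMap<>();
        for (List<Map<String, Object>> row : rows) {
            for (Map<String, Object> props : row) {
                props.forEach((k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
            }
        }
        return grouped;
    }

    // Per-row grouping: each row gets its own map, which is the effect
    // the local(...) step is used for in the query above.
    static List<Map<String, List<Object>>> groupPerRow(List<List<Map<String, Object>>> rows) {
        List<Map<String, List<Object>>> result = new ArrayList<>();
        for (List<Map<String, Object>> row : rows) {
            result.add(groupAll(List.of(row)));
        }
        return result;
    }

    // Builds an insertion-ordered two-entry map (helper for the demo data).
    static Map<String, Object> props(String k1, Object v1, String k2, Object v2) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(k1, v1);
        m.put(k2, v2);
        return m;
    }

    public static void main(String[] args) {
        // Each row holds a vertex's own properties plus those of one neighbour.
        List<List<Map<String, Object>>> rows = List.of(
            List.of(props("name", "marko", "age", 29), props("name", "lop", "lang", "java")),
            List.of(props("name", "josh", "age", 32), props("name", "ripple", "lang", "java")));

        // Prints {name=[marko, lop, josh, ripple], age=[29, 32], lang=[java, java]}
        System.out.println(groupAll(rows));
        // Prints [{name=[marko, lop], age=[29], lang=[java]}, {name=[josh, ripple], age=[32], lang=[java]}]
        System.out.println(groupPerRow(rows));
    }
}
```

The first line of output has the same shape as the unwanted result from the question; the second is one merged map per starting vertex.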

If you can work with a custom DSL, you can add a step like this one in Java:

// Flattens one level of nesting: the entries of any nested map are lifted
// into the top-level result; all other values keep their original key.
public default GraphTraversal<S, LinkedHashMap> unpackMaps() {
    return map(x -> {
        LinkedHashMap mapSource = (LinkedHashMap) x.get();
        LinkedHashMap mapDest = new LinkedHashMap();

        for (Object key : mapSource.keySet()) {
            Object obj = mapSource.get(key);
            if (obj instanceof LinkedHashMap) {
                // Nested map (e.g. a valueMap() result): copy its entries up.
                mapDest.putAll((LinkedHashMap) obj);
            } else {
                // Plain value: keep it under its original key.
                mapDest.put(key, obj);
            }
        }

        return mapDest;
    });
}

and use it like this:

g.V().as("s")

.valueMap().as("value_map_0")
.select("s").outE("INFO1").inV().valueMap().as("value_map_1")
.select("s").outE("INFO2").inV().valueMap().as("value_map_2")
.select("s").outE("INFO3").inV().valueMap().as("value_map_3")

.select("s").local(__.outE("INFO1").count()).as("value_1")
.select("s").outE("INFO1").inV().values("name").as("value_2")


.project("val_map1","val_map2","val_map3","val1","val2")
.by(__.select("value_map_1"))
.by(__.select("value_map_2"))
.by(__.select("value_map_3"))
.by(__.select("value_1"))
.by(__.select("value_2"))
.unpackMaps()

which results in rows of the form

 map1_val1, map1_val2,.... ,map2_va1, map2_val2....,value1, value2

This can handle a mix of values and valueMaps in a natural Gremlin way.
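
For a feel of what the flattening does to one projected row, here is a small plain-Java sketch. It is an analogue only, not the DSL step itself: the class UnpackSketch, its unpack method and the sample row are hypothetical, and it checks for any Map rather than LinkedHashMap specifically.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java analogue of the flattening done inside unpackMaps(); not TinkerPop API.
final class UnpackSketch {

    // Lifts the entries of any nested map to the top level (one level only);
    // non-map values stay under their original key.
    static Map<Object, Object> unpack(Map<?, ?> row) {
        Map<Object, Object> flat = new LinkedHashMap<>();
        for (Map.Entry<?, ?> e : row.entrySet()) {
            if (e.getValue() instanceof Map) {
                flat.putAll((Map<?, ?>) e.getValue());
            } else {
                flat.put(e.getKey(), e.getValue());
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        // Hypothetical projected row: one nested property map plus two plain values.
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("name", "lop");
        nested.put("lang", "java");

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("val_map1", nested);
        row.put("val1", 1L);
        row.put("val2", "lop");

        // Prints {name=lop, lang=java, val1=1, val2=lop}
        System.out.println(unpack(row));
    }
}
```

Note that keys shared between nested maps overwrite each other here, so the nested maps should use distinct keys if every value needs to survive.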

Lemos answered 12/8, 2022 at 22:37 Comment(0)
