Elasticsearch relationship mappings (one-to-one and one-to-many)

On my Elasticsearch server I have one index, http://localhost:9200/blog.
The blog index contains multiple types.

e.g.: http://localhost:9200/blog/posts, http://localhost:9200/blog/tags.

In the tags type I have created more than 1000 tags, and in the posts type I have created 10 posts.

e.g.: posts

{
    "_index":"blog",
    "_type":"posts",
    "_id":"1",
    "_version":3,
    "found":true,
    "_source" : {
        "catalogId" : "1",
        "name" : "cricket",
        "url" : "http://www.wikipedia/cricket"
    }
}

e.g.: tags

{   
    "_index":"blog",
    "_type":"tags",
    "_id":"1",
    "_version":3,
    "found":true,
    "_source" : {
        "tagId" : "1",
        "name" : "game"
    }
}

I want to assign existing tags to blog posts (i.e. define a relationship between the two in the mapping).

How do I map the tags to the posts?

Unlade answered 1/5, 2014 at 6:31 Comment(0)

There are four approaches you can use within Elasticsearch for managing relationships. They are outlined well in the Elasticsearch blog post "Managing Relations Inside Elasticsearch". I would recommend reading the entire article for details on each approach, and then selecting the one that best meets your business needs while remaining technically appropriate.

Here are the highlights of the four approaches.

Inner Object

  • Easy, fast, performant
  • Only applicable when one-to-one relationships are maintained
  • No need for special queries
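
To illustrate why inner objects suit one-to-one relationships only, here is a minimal Python sketch that roughly mimics how object fields end up flattened into dotted field names at index time. The `flatten` helper and the `tag`/`tags` fields on the post are assumptions made for illustration; they are not part of the mapping in the question, and the real indexing is done by Lucene, not by code like this.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Roughly mimic object-field indexing: dotted path -> list of values."""
    flat = defaultdict(list)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Inner objects contribute their fields under "parent.child".
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat[sub_path].extend(sub_values)
            else:
                flat[path].append(item)
    return dict(flat)


# One-to-one: a post with a single inner tag object (hypothetical field "tag").
post_one_tag = {
    "catalogId": "1",
    "name": "cricket",
    "tag": {"tagId": "1", "name": "game"},
}

# One-to-many: an array of inner objects (hypothetical field "tags").
post_many_tags = {
    "catalogId": "1",
    "name": "cricket",
    "tags": [
        {"tagId": "1", "name": "game"},
        {"tagId": "2", "name": "sport"},
    ],
}

flat_one = flatten(post_one_tag)
flat_many = flatten(post_many_tags)

# No single tag has tagId "1" AND name "sport", yet the pooled values match.
false_match = "1" in flat_many["tags.tagId"] and "sport" in flat_many["tags.name"]

print(flat_one)
print(flat_many)
print(false_match)
```

With a single inner object each dotted field holds one value, so nothing is lost. With an array, the values of different tags are pooled per field, which is the cross-matching problem that the nested type is meant to avoid.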

Nested

  • Nested docs are stored in the same Lucene block as each other, which helps read/query performance. Reading a nested doc is faster than the equivalent parent/child.
  • Updating a single field in a nested document (parent or nested children) forces ES to reindex the entire nested document. This can be very expensive for large nested docs
  • “Cross referencing” nested documents is impossible
  • Best suited for data that does not change frequently
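
As a sketch of the nested approach applied to the question's data, the request bodies below are built as plain Python dicts and printed as JSON. The `tags` field on the post is an assumption, and the mapping uses the typed layout of the releases current when this was asked (properties under the `posts` type name); more recent Elasticsearch versions have removed mapping types and put `properties` directly under `mappings`, so treat this as illustrative and check it against your version.

```python
import json

# Mapping body: declare the (hypothetical) "tags" field as nested.
# Other fields are left to dynamic mapping to keep the sketch short.
posts_mapping = {
    "mappings": {
        "posts": {
            "properties": {
                "tags": {"type": "nested"}
            }
        }
    }
}

# A post document that carries its tags as nested documents.
post_doc = {
    "catalogId": "1",
    "name": "cricket",
    "url": "http://www.wikipedia/cricket",
    "tags": [
        {"tagId": "1", "name": "game"},
    ],
}

# Nested query: both conditions must hold within the SAME nested tag.
nested_query = {
    "query": {
        "nested": {
            "path": "tags",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"tags.tagId": "1"}},
                        {"match": {"tags.name": "game"}},
                    ]
                }
            },
        }
    }
}

for body in (posts_mapping, post_doc, nested_query):
    print(json.dumps(body, indent=2))
```

Note that adding or renaming a tag here means re-sending the whole post, which is the reindexing cost mentioned in the list above.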

Parent/Child

  • Children are stored separately from the parent, but are routed to the same shard. So parent/child is slightly less performant on read/query than nested
  • Parent/child mappings have a bit of extra memory overhead, since ES maintains a “join” list in memory
  • Updating a child doc does not affect the parent or any other children, which can potentially save a lot of indexing on large docs
  • Sorting/scoring can be difficult with Parent/Child since the Has Child/Has Parent operations can be opaque at times
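
The following Python sketch illustrates two of the points above: the child mapping and the idea that a child is routed by its parent's ID so both land on the same shard. Several things here are assumptions: a child has exactly one parent, so the sketch treats `tags` as the parent type and `posts` as the child (one tag, many posts); the shard count is made up; and `zlib.crc32` is only a stand-in for Elasticsearch's own routing hash. The `_parent` mapping is the syntax of the releases current when this was asked; later versions replaced it with a join field.

```python
import json
import zlib

NUM_SHARDS = 5  # hypothetical shard count


def shard_for(routing_key, num_shards=NUM_SHARDS):
    """Pick a shard from a routing key (crc32 is a stand-in hash)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def route(doc_id, parent_id=None):
    """Children are routed by the parent's ID, parents by their own ID."""
    return shard_for(parent_id if parent_id is not None else doc_id)


# Child type "posts" declares "tags" as its parent type.
child_mapping = {"posts": {"_parent": {"type": "tags"}}}

# Find posts whose parent tag is named "game".
has_parent_query = {
    "query": {
        "has_parent": {
            "parent_type": "tags",
            "query": {"match": {"name": "game"}},
        }
    }
}

tag_shard = route("1")                    # tag with _id "1"
post_shard = route("42", parent_id="1")   # hypothetical post "42", child of tag "1"

print(json.dumps(child_mapping))
print(json.dumps(has_parent_query))
print(tag_shard == post_shard)
```

Because the post is stored as its own document, it can be updated without touching the tag or any sibling posts, at the cost of the join work at query time.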

Denormalization

  • You get to manage all the relations yourself!
  • Most flexible, most administrative overhead
  • May be more or less performant depending on your setup
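
To make the administrative overhead of denormalization concrete, here is a small in-memory Python sketch. It assumes tag names are copied into every post under hypothetical `tagIds` and `tagNames` fields; the helper names are illustrative, and in a real setup the rewrite step would be a reindex of each affected post in Elasticsearch.

```python
tags_by_id = {
    "1": {"tagId": "1", "name": "game"},
    "2": {"tagId": "2", "name": "sport"},
}


def denormalize(post, tag_ids, tags_by_id):
    """Return a copy of the post with tag IDs and tag names copied in."""
    enriched = dict(post)
    enriched["tagIds"] = list(tag_ids)
    enriched["tagNames"] = [tags_by_id[t]["name"] for t in tag_ids]
    return enriched


def rename_tag(posts, tags_by_id, tag_id, new_name):
    """Rename a tag and rewrite every post carrying it; return posts touched."""
    tags_by_id[tag_id]["name"] = new_name
    touched = 0
    for post in posts:
        if tag_id in post["tagIds"]:
            post["tagNames"] = [tags_by_id[t]["name"] for t in post["tagIds"]]
            touched += 1
    return touched


posts = [
    denormalize({"catalogId": "1", "name": "cricket"}, ["1", "2"], tags_by_id),
    denormalize({"catalogId": "2", "name": "chess"}, ["1"], tags_by_id),
    denormalize({"catalogId": "3", "name": "rowing"}, ["2"], tags_by_id),
]

# Renaming one tag forces a rewrite of every post that references it.
touched = rename_tag(posts, tags_by_id, "1", "games")
print(touched)
print(posts[0]["tagNames"])
```

Searching stays simple because each post already contains its tag names, but keeping the copies in sync is entirely up to the application.
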
Patrolman answered 1/5, 2014 at 12:8 Comment(4)
Nice concise listing of pros and cons. - Stralsund
Saying parent-child is "slightly" less performant for querying than nested could be misleading. From what we've seen, it's 1-2 orders of magnitude worse, depending on the index size. - Introduce
What is the difference between denormalization and inner object? - Axis
The inner-object approach means that you store your parent document once and add its children into an array within that document. Denormalization means that you only have "children" and save the parent data with each of them. - Lewison
