maximum size of attributes on AWS SimpleDB

I am in the process of building a mobile application (iPhone/Android) and want to store the application data in Amazon SimpleDB, because we do not want to host our own server to provide these services. I've been going through the documentation, and the maximum size of an attribute value is 1,024 bytes.

In my case we need to store anywhere from 1 KB up to 10 KB of text data.

I was hoping to find out how other projects use SimpleDB when they have larger storage needs like ours. I've read that one can store pointers to files that are then stored in S3. I'm not sure whether that is a good solution.

I am not sure SimpleDB is the right solution. Could anyone comment on what they have done, or suggest a different way to think about this problem?

Footmark answered 11/6, 2009 at 12:18 Comment(4)
What are your requirements for retrieving the data? Do you have to search on it, separate it by fields, etc.? – Ultramicrochemistry
I just need to display the text data. I plan on tagging this data so it can be queried against, and then display the text that is greater than 1,024 bytes to the user. For example, I might have city/state/description information; one would query against the city and state, and I would display the description to the user. – Footmark
This sounds like a great use for SimpleDB. You just need to add a routine to split the text up when you store the item, and another to put it back together from your select results. "SELECT desc FROM Domain001 WHERE city = ? INTERSECTION state = ?" – Iberia
possible duplicate of Amazon SimpleDB – Instructions

There are ways to store your 10 KB of text data, but whether they are acceptable will depend on what else you need to store and how you plan to use it.

If you need to store arbitrarily large data (especially binary data) then the S3 file pointer can be attractive. The value that SimpleDB adds in this scenario is the ability to run queries against the file metadata that you store in SimpleDB.

For text data limited to 10 KB, I would recommend storing it directly in SimpleDB. It will easily fit in a single item, but you'll have to spread it across multiple attributes. There are basically two ways to do this, each with some drawbacks.

One way is more flexible and search-friendly, but requires you to touch your data. You split your data into chunks of about 1,000 bytes and store each chunk as a value in a multi-valued attribute. There is no ordering imposed on multi-valued attributes, so you have to prepend each chunk with a sequence number for ordering (e.g. 01).

The fact that you have all the text stored in one attribute makes queries easy, with a single attribute name in the predicate. You can give each item a different amount of text, anywhere from 1 KB to 200+ KB, and it gets handled appropriately. But be aware that your prepended sequence numbers can show up as false positives in your queries (e.g. if you search for 01, every item will match).
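This chunk-and-prefix routine can be sketched in plain Python. The chunk size and the two-digit prefix width here are assumptions for illustration; note that slicing by characters only respects the byte limit for ASCII text:

```python
def split_for_simpledb(text, chunk_size=1000):
    """Split text into prefixed chunks for one multi-valued attribute.

    Assumes chunk_size characters fit the 1,024-byte value limit; for
    non-ASCII text you would need to chunk by encoded byte length instead.
    """
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    # Prepend a zero-padded sequence number so order survives storage,
    # since SimpleDB does not preserve the order of multi-valued attributes.
    return ["%02d%s" % (n, chunk) for n, chunk in enumerate(chunks)]

def join_from_simpledb(values):
    """Reassemble the original text from the (unordered) attribute values."""
    ordered = sorted(values, key=lambda v: int(v[:2]))
    return "".join(v[2:] for v in ordered)
```

A two-digit prefix covers up to 100 chunks, which is plenty given SimpleDB's per-item limits.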

The second way to store the text within SimpleDB does not require you to place arbitrary ordering data within your text chunks. Instead, you impose the ordering by placing each text chunk in a differently named attribute; for example, you could use the attribute names desc01, desc02, ..., desc10, and place each chunk in the appropriate attribute. You can still do full-text search with both methods, but searches will be slower with this one because you need to specify many predicates, and SimpleDB ends up searching a separate index for each attribute.
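The named-attribute variant can be sketched the same way. The attribute base name "desc" and the chunk size are assumptions taken from the example above:

```python
def text_to_attributes(text, name="desc", chunk_size=1000):
    """Spread text across numbered attributes: desc01, desc02, ..."""
    return {
        "%s%02d" % (name, n + 1): text[i:i + chunk_size]
        for n, i in enumerate(range(0, len(text), chunk_size))
    }

def attributes_to_text(attrs, name="desc"):
    """Rejoin the numbered attributes; zero-padded names sort correctly."""
    keys = sorted(k for k in attrs if k.startswith(name))
    return "".join(attrs[k] for k in keys)
```

Here no ordering data is mixed into the text itself, at the cost of the slower multi-predicate searches described above.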

It may be easy to think of this type of workaround as a hack, because with databases we are used to having this sort of low-level detail handled for us inside the database. SimpleDB is specifically designed to push this sort of thing out of the database and into the client as a means of providing availability as a first-class feature.

If you found out that a relational database was splitting your text into 1 KB chunks on disk as an implementation detail, it wouldn't seem like a hack. The problem is that the current state of SimpleDB clients is such that you have to implement a lot of this data formatting yourself. Ideally this is the type of thing that a smart client will handle for you; there just aren't any smart clients freely available yet.

Iberia answered 11/6, 2009 at 13:49 Comment(2)
Had a nice little answer written up and was about to submit when Mocky posted this one. Great summation; I agree completely with it. Given the speed and pricing of SimpleDB, it's definitely worth a shot, especially once you start to realize that the limitations of a traditional DB no longer apply. – Ultramicrochemistry
Yes, great answer, thank you for that. Breaking up the data will require a lot more thought and work on my part, but I feel it will be easier than hosting a database and server. Thank you. – Footmark

If you are concerned about cost, you might find that it is cheaper to put the text in S3 and metadata with pointers in SimpleDB.
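This pattern can be sketched with boto3-style S3 and SimpleDB clients (created elsewhere with `boto3.client("s3")` and `boto3.client("sdb")`). The bucket, domain, and key layout below are hypothetical placeholders, not names from the thread:

```python
def store_description(s3, sdb, item_id, city, state, text):
    """Store full text in S3 and queryable metadata plus a pointer in SimpleDB."""
    key = "descriptions/%s.txt" % item_id
    # The bulky text goes to S3, where there is no 1,024-byte limit ...
    s3.put_object(Bucket="my-app-texts", Key=key, Body=text.encode("utf-8"))
    # ... while the searchable fields and the S3 key go to SimpleDB.
    sdb.put_attributes(
        DomainName="my-app-items",
        ItemName=item_id,
        Attributes=[
            {"Name": "city", "Value": city, "Replace": True},
            {"Name": "state", "Value": state, "Replace": True},
            {"Name": "s3_key", "Value": key, "Replace": True},
        ],
    )
```

Queries then run against city/state in SimpleDB, and the matching items' `s3_key` values are used to fetch the text from S3.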

Wigwag answered 12/6, 2009 at 17:49 Comment(1)
This is the technique I am looking to use. Good for start-ups. – Bradfordbradlee

You could put the 10 KB of text in S3, then create an attribute that has all the unique words of the text as multiple values. Then searches would be fast. No phrase searching, though.

How many values can you store in one attribute of one 'row' (item name)? I looked in the docs, but no answer popped out at me.

--Tom

Starflower answered 12/2, 2010 at 22:23 Comment(1)
OK, I figured it out. To do word-only searching on SimpleDB, create a set of all unique words (lowercased) and load as many words as will fit into 1,024 bytes per attribute value. For 10 KB of typical English text, that might amount to 3 or 4 attributes. Then store the actual text in S3, with the key stored in SimpleDB. You get 256 attribute-value pairs per item with SimpleDB. – Starflower
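The word-index construction Starflower describes can be sketched as follows; the tokenizing regex and the space-separated packing are assumptions for illustration:

```python
import re

def word_index_attributes(text, max_bytes=1024):
    """Pack the unique lowercased words of `text` into as few values as
    fit under SimpleDB's 1,024-byte-per-value limit (space separated)."""
    words = sorted(set(re.findall(r"[a-z0-9']+", text.lower())))
    values, current = [], ""
    for word in words:
        candidate = (current + " " + word).strip()
        if len(candidate.encode("utf-8")) > max_bytes and current:
            values.append(current)  # current value is full; start a new one
            current = word
        else:
            current = candidate
    if current:
        values.append(current)
    return values
```

Each returned string would be stored as one value of a multi-valued index attribute, and a word query would match with a `LIKE '%word%'`-style predicate (with the false-positive caveat that substrings of longer words can also match).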

The upcoming release of Simple Savant (a C# persistence library for SimpleDB which I created) will support both attribute spanning as described by Mocky and full-text searches of SimpleDB data using Lucene.NET.

I realize you are probably not building your app in C#, but since your question is a top result when searching for SimpleDB and full-text indexing it seemed worth mentioning.

UPDATE: The Simple Savant release I mentioned above is now available.

Noetic answered 29/1, 2010 at 16:23 Comment(1)
That is perfect; this is what I need, because managing this in my own code is something I did not want to do. – Footmark

SimpleDB is, well, simple. Everything in it is a string. The documentation is very straightforward. And there are lots of usage restrictions, such as:

  • You can only do a SELECT * FROM ___ WHERE ItemName() IN (...) with 20 item names in the IN clause.
  • You can only PUT (update) 25 records at a time.
  • All reads are bounded by computation time. If you do a SELECT with a LIMIT of 1000, it may return something like 800 results (or even none) along with a nextToken, which you must pass in an additional request. That next SELECT may itself return up to the limit count, so the total rows returned across the two SELECTs can exceed your original limit. This matters if you are selecting a lot. SELECT COUNT(*) hits a similar problem: it returns a partial count along with a nextToken, and you need to keep iterating over those nextTokens, summing the returned counts, to get the true (total) count.
  • All of these computation times are heavily affected by how much data is in the store.
  • If you end up with a large number of records, you will likely have to shard them across multiple domains.
  • Amazon will throttle your requests if you make too many against a single domain.
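The nextToken behavior above implies every select must run as a drain loop. A minimal sketch, assuming a hypothetical `client.select(expression, token)` call that returns an `(items, next_token)` pair (real SDK method names differ):

```python
def select_all(client, expression):
    """Keep reissuing the select, passing nextToken, until it is exhausted.

    `client.select` here is a stand-in for whatever SimpleDB client call
    your SDK provides; each call may return fewer rows than the LIMIT.
    """
    items, token = [], None
    while True:
        batch, token = client.select(expression, token)
        items.extend(batch)
        if token is None:  # no nextToken means the result set is complete
            return items
```

The same loop shape applies to SELECT COUNT(*), except you would sum the partial counts instead of concatenating rows.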

So if you plan to store large amounts of string data, or have a lot of records, you may want to look elsewhere. SimpleDB is very, very reliable and works as documented, but it can cause lots of headaches.

In your case I'd recommend something like MongoDB. It has its own share of problems as well, but it may be better for this case. Be aware, though, that if you have lots of records (millions and up) and try to add indexes to too many of them, you may run into trouble if the data is on spinning disks rather than SSDs.

Tights answered 10/3, 2012 at 1:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.