For argument's sake, let's say that a browser allows 4GB of memory in WebAssembly applications. Ignoring compression and other data-storage considerations, if a user had a 3GB local CSV file, we could query that data entirely in memory using WebAssembly (or JavaScript, of course). For example, if the user's data was of the following format:
| ID | Country | Amount |
|---|---|---|
| 1 | US | 12 |
| 2 | GB | 11 |
| 3 | DE | 7 |
Then, in a few lines of code, we could write a basic algorithm to filter to `ID=2`, i.e., the SQL equivalent of `SELECT * FROM table WHERE id=2`.
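To make the in-memory case concrete, here is a minimal sketch of that filter in C++. The `Row` struct and the `parse_row` and `filter_by_id` names are made up for illustration, and it assumes a plain comma-separated file with a header row and no quoted fields:

```cpp
#include <cstdint>
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One row of the example table (ID | Country | Amount). Hypothetical type.
struct Row {
    std::int64_t id;
    std::string country;
    std::int64_t amount;
};

// Parse a line such as "2,GB,11". Returns false if the line is malformed.
// Assumes no quoted fields or embedded commas.
bool parse_row(const std::string& line, Row& out) {
    std::istringstream fields(line);
    std::string id, country, amount;
    if (!std::getline(fields, id, ',') ||
        !std::getline(fields, country, ',') ||
        !std::getline(fields, amount, ',')) {
        return false;
    }
    try {
        out.id = std::stoll(id);
        out.amount = std::stoll(amount);
    } catch (const std::exception&) {
        return false;  // non-numeric ID or Amount
    }
    out.country = country;
    return true;
}

// Rough equivalent of: SELECT * FROM table WHERE id = wanted_id
std::vector<Row> filter_by_id(std::istream& csv, std::int64_t wanted_id) {
    std::vector<Row> matches;
    std::string line;
    std::getline(csv, line);  // skip the header row
    while (std::getline(csv, line)) {
        Row row;
        if (parse_row(line, row) && row.id == wanted_id) {
            matches.push_back(row);
        }
    }
    return matches;
}
```

Since this reads one line at a time from a `std::istream`, only the matching rows are kept, not the whole table.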
Now, my question is whether it's possible in any browser (possibly with experimental flags and/or certain user preferences selected) to run a query on a file that would not fit in memory, even if properly compressed. For example, in this blog post, a ~500GB file is loaded and then queried. I know that the 500GB of data is not loaded entirely in memory, and there's probably a column-oriented data structure so that only certain columns need to be read, but either way the OS has access to the file system, so files much larger than available memory can be used.
Is this possible to do in any way within a WebAssembly browser application? If so, what would be an outline of how it could be done? I know this question might require some research, so when it's available for a bounty I can add a 500-point bounty to it to encourage answers. (Note that the underlying language being used is C++ compiled to wasm, but I don't think that should matter for this question.)
I suppose one possibility might be something along the lines of: https://rreverser.com/webassembly-shell-with-a-real-filesystem-access-in-a-browser/.
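To illustrate the kind of outline I have in mind, here is a rough C++ sketch of scanning a file in fixed-size chunks so that memory use stays bounded by the chunk size plus the longest line. Everything here is an assumption for illustration: `ReadRange` is a hypothetical callback that returns a byte range of the file, and `for_each_line` is a made-up helper. In a browser build the callback would presumably have to be backed by JavaScript (for example something built on `Blob.slice()`), and making that call synchronous from wasm is exactly the part I'm unsure about.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical callback: returns the bytes in [offset, offset + length).
// Assumed to be synchronous; in a browser it would be imported from JS.
using ReadRange =
    std::function<std::string(std::uint64_t offset, std::size_t length)>;

// Calls on_line once per line, reading the file chunk by chunk.
// Only one chunk plus one partial line is held in memory at a time.
void for_each_line(const ReadRange& read_range,
                   std::uint64_t file_size,
                   std::size_t chunk_size,
                   const std::function<void(const std::string&)>& on_line) {
    assert(chunk_size > 0);
    std::string carry;  // partial line left over from the previous chunk
    for (std::uint64_t offset = 0; offset < file_size; offset += chunk_size) {
        const std::size_t length = static_cast<std::size_t>(
            std::min<std::uint64_t>(chunk_size, file_size - offset));
        carry += read_range(offset, length);

        // Emit every complete line currently in the buffer.
        std::size_t start = 0;
        std::size_t newline;
        while ((newline = carry.find('\n', start)) != std::string::npos) {
            on_line(carry.substr(start, newline - start));
            start = newline + 1;
        }
        carry.erase(0, start);  // keep only the unfinished tail
    }
    if (!carry.empty()) {
        on_line(carry);  // last line without a trailing newline
    }
}
```

A row filter like the `id=2` example could then run inside `on_line`, so the full file never has to sit in wasm memory. A column-oriented format would go further by only requesting the byte ranges of the columns it needs.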
- "I know that the 500GB of data is not loaded entirely in memory, and there's probably a column-oriented data structure so that only certain columns need to be read," As you don't need to read the entire file, what exactly is the problem? – Poulterer
- [...] `FileReader` and `Blob`. Since those APIs are asynchronous, you are also going to need to export/import a bunch of the `Promise` API. In the end, you'll be doing a lot of marshalling work for something that is much simpler to do in JS. If WASM had its own file API, things might be different... – Autoionization
- [...] `FileReader` from WASM (via Rust). – Autoionization