perl6: How to give more memory to MoarVM?
I have to run data analysis on about 2 million lines of data, each line about 250 bytes long, so about 500 megabytes in total. I am running the latest Rakudo on VirtualBox Linux with 4 GB of memory.

After about 8 hours, I got a MoarVM panic due to running out of memory. How do I give more memory to MoarVM? Unfortunately I cannot break the 2 million lines into chunks and write them to files first, because part of the data analysis requires all 2 million lines at once.

Brister answered 3/8, 2018 at 16:49 Comment(3)
It might be worth tracking the memory usage and seeing if some part of the code is using memory excessively. – Heave
@Brister I'm real curious to hear what happened... – Riccardo
Hi raiph, thank you for your help. I was reading about the profiler, but am not yet proficient at it. In the interest of getting things done in time, what I ended up doing was to isolate the one data array that required some info from all the 2M lines and handle it first; then I wrote the 2M lines to disk and read from the file. It is working so far. I am still reading about the profiler. Thanks !!! – Brister

I suggest you tackle your problem in several steps:

  • Prepare two small sample files if you haven't already. Keep them very small: I suggest a 2,000-line file and a 20,000-line one. If you already have sample files of around those lengths, those will do. Run your program on each file, noting how long each run took and how much memory it used.

  • Update your question with your notes about duration and RAM use, plus links to your source code and to the sample files, if possible.

  • Run the two sample files again but using the profiler as explained here. See what there is to see and update your question.

If you don't know how to do any of these things, ask in the comments.

If all the above is fairly easy, repeat for a 100,000 line file.

Then we should have enough data to give you better guidance.
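One way to gather those duration and memory numbers on Linux is GNU time plus Rakudo's built-in profiler. The script and sample file names below are made up for illustration; substitute your own:

```shell
# Wall time plus peak memory ("Maximum resident set size"):
/usr/bin/time -v perl6 analyse.p6 sample-2k.txt

# The same run under the built-in profiler, writing an HTML report
# you can open in a browser:
perl6 --profile=profile-2k.html analyse.p6 sample-2k.txt
```

Repeat for the 20,000-line file and compare: roughly linear growth in time and memory is what you hope to see.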

Riccardo answered 3/8, 2018 at 23:27 Comment(5)
Thank you very much raiph ! My codes work without bugs for 5, 100, 200K lines of data, but Moar panicked at the whole 2M lines. I guess it is time to re-code. Thanks ! – Brister
"My codes work without bugs" I already assumed that. "I guess it is time to re-code." NOOOO!!! Focus on the ONE problem you have. Which is OOM!!! Don't CREATE BUGS by changing WORKING code when BLIND. Instead you MUST gather and share DATA about your code's MEMORY USE. If you don't know how YOU MUST ASK QUESTIONS to find out. You must LEARN how to use the PROFILER. You need to change just a TINY PART of your code. But WHICH PART? Never ever GUESS!!! Instead YOU MUST PROFILE until you KNOW WHERE THE PROBLEM LIES FIRST. Only THEN do you / we pay attention to jnthn's excellent tips. – Riccardo
"My codes work without bugs for 5, 100, 200K lines of data". How much RAM was used when you ran the 5K file? And the 100K file? Please update your question to say so. (And while you're about it, how long each took too.) You will have much more fun, learn more, and fix your problem much faster if you focus first on jnthn's swap space configuration tip and/or on gathering and sharing (in your question) data about how your existing code runs, including learning to use the built-in profiler as explained in the link in my answer, than if instead you look at your code and fiddle with it. – Riccardo
Thank you raiph !!! The profiler is completely new to me, and I am starting to read the documentation now. Let me experiment with it. Thanks !!! – Brister
Good to hear, but what about the earlier steps? Imo your goals, in order, should be something like 1) know what memory your program is using overall for the 5K run and 100K run; 2) let us know, and the times too; 3) configure VirtualBox to up your swap space (which adds to the 4GB); 4) learn the profiler; 5) let us know some basic stats about what you see for the 5K and 100K runs under the profiler. – Riccardo

MoarVM doesn't have its own upper limit on memory (unlike, for example, the JVM). Rather, it gives an "out of memory" or "memory allocation failed" error only when memory is requested from the operating system and that request is refused. That may be because of configured memory limits, or it may really be that there just isn't that much available RAM/swap space to satisfy the request that was made (likely if you haven't configured limits).
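On a Linux guest you can check whether configured limits or a swap shortage are in play with standard tools (nothing Rakudo-specific):

```shell
# Per-process virtual memory limit for the current shell;
# "unlimited" means no configured cap.
ulimit -v

# Total and available RAM and swap on the VirtualBox guest.
free -h
```

If `ulimit -v` reports a cap below what the run needs, or `free -h` shows little or no swap, that alone can explain an out-of-memory failure at the 2M-line scale.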

It's hard to provide specific advice on what to try next given there are few details of the program in the question, but some things that might help are:

  • If you are processing the data in the file into some other data structure, and it's possible to do so, read the file lazily (for example, for $fh.lines { ... } only needs to keep the Str for the line currently being processed in memory, while my @lines = $fh.lines; for @lines { } keeps all of the Str objects around).
  • Is that data in the file ASCII or Latin-1? If so, pass an :enc<ascii> or similar when opening the file. This may lead to a smaller memory representation.
  • If you keep large arrays of integers, numbers, or strings, consider using natively typed arrays. For example, if you have my int8 @a and store a million elements, it takes 1 MB of memory; do that with my @a and they will all be boxed objects, each inside a Scalar container, which on a 64-bit machine could eat over 70 MB. The same applies if you have an object you make many instances of: you might be able to make some of its attributes native.
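A minimal sketch combining those three suggestions; the file name and the per-line computation (recording each line's length) are placeholders for illustration:

```raku
# 'data.txt' stands in for the real data file.
my $fh = open 'data.txt', :enc<ascii>;  # ASCII decoding: smaller Str representation

my int32 @lengths;                      # natively typed: 4 bytes per element, unboxed

for $fh.lines -> $line {                # lazy: only the current line's Str is in memory
    @lengths.push: $line.chars;
}
$fh.close;

say "processed @lengths.elems() lines";
```

Contrast this with slurping the whole file into my @lines first, which keeps all two million boxed Str objects alive at once.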
Rinaldo answered 4/8, 2018 at 16:18 Comment(2)
Thank you very much Jonathan Worthington ! I have arrays and hashes of these 2M lines. That may be some codes to revise. Thanks. – Brister
If by "codes to revise" you mean YOUR code, code that WORKS, then I urge you to step back and think again. Yes, there's a small chance you need to change your code. But there's a bigger chance "there just isn't that much available RAM/swap space to satisfy the request that was made" which is "likely if you haven't configured limits". Have you configured limits? That would be the first thing to do of jnthn's suggestions. If you don't know how, you must ask. Don't change your code unless you have to!!! And if you know you've maxed out your RAM setup, read my LOUD!!! comment on my answer. :) – Riccardo