Self-restarting MathKernel - is it possible in Mathematica?

Asked 23/10, 2011 at 6:7 Answered 10/3, 2016 at 16:52

Solved wolfram-mathematica mathematica-frontend

This question comes from the recent question "Correct way to cap Mathematica memory use?"

Is it possible to restart MathKernel programmatically while keeping the current FrontEnd process connected to the new MathKernel process, and then evaluate some code in the new MathKernel session? I mean a "transparent" restart that lets the user continue working with the FrontEnd while a fresh MathKernel process evaluates some code carried over from the previous kernel.

The motivation is to automate restarting MathKernel when it takes too much memory, without breaking the computation. In other words, the computation should continue automatically in a new MathKernel process without user interaction (while the user keeps the ability to interact with Mathematica as before). The details of what code should be evaluated in the new kernel are of course specific to each computational task. I am looking for a general way to continue the computation automatically.

Overtrick answered 23/10, 2011 at 6:7 Comment(3)

One possible way to solve the problem is to programmatically launch a new kernel from the FrontEnd, evaluate code in this new kernel, and then close the old kernel while keeping the new one running. – Overtrick 23/10, 2011 at 6:44

How about driving a kernel (call it B) from another (call it A), and using A as a supervisor? Of course that requires reorganization of the code. But surely you've thought about that and discarded the approach? – Lowspirited 23/10, 2011 at 11:48

@Lowspirited This idea was the first (and the only) one I have tried. I have already implemented such functionality through pure MathLink, but my way is really very hard and extremely inelegant, and it relies on a great number of undocumented features that are potentially version-specific. And the code is really huge! I hope that the new ScheduledTasks functionality can give a much more elegant way to solve this problem. – Overtrick 23/10, 2011 at 15:21

From a comment by Arnoud Buzing yesterday, on Stack Exchange Mathematica chat, quoting entirely:

In a notebook, if you have multiple cells you can put Quit in a cell by itself and set this option:

SetOptions[$FrontEnd, "ClearEvaluationQueueOnKernelQuit" -> False]

Then if you have a cell above it and below it and select all three and evaluate, the kernel will Quit but the frontend evaluation queue will continue (and restart the kernel for the last cell).

-- Arnoud Buzing
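A minimal sketch of how the three cells might look, using Put and Get to carry a value across the restart. The file name checkpoint.m and the variable counter are hypothetical and only illustrate the layout; each commented section is meant to be a separate notebook cell.

```mathematica
(* Evaluate once per front-end session: keep queued cells after the kernel quits *)
SetOptions[$FrontEnd, "ClearEvaluationQueueOnKernelQuit" -> False]

(* Cell 1: do part of the work and save what the next kernel needs.
   The file name and the variable counter are hypothetical. *)
counter = 41;
Put[counter, FileNameJoin[{$HomeDirectory, "checkpoint.m"}]]

(* Cell 2: Quit, alone in its own cell *)
Quit

(* Cell 3: evaluated in the fresh kernel; reloads the saved value *)
counter = Get[FileNameJoin[{$HomeDirectory, "checkpoint.m"}]];
counter + 1
```

Select all three cells and evaluate them together, as described in the quote. The path is rebuilt in the last cell because nothing defined in the old kernel survives the restart.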

Delanos answered 6/12, 2012 at 9:29 Comment(0)

The following approach runs one kernel to open a front-end with its own kernel, which is then closed and reopened, renewing the second kernel.

This file is the MathKernel input, C:\Temp\test4.m

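(* Load J/Link, which provides UseFrontEnd and CloseFrontEnd *)
Needs["JLink`"];
$FrontEndLaunchCommand = "Mathematica.exe";

(* First pass: open the notebook and evaluate its first cell *)
UseFrontEnd[
  nb = NotebookOpen["C:\\Temp\\run.nb"];
  SelectionMove[nb, Next, Cell];
  SelectionEvaluate[nb];
];
Pause[8];          (* fixed wait for the cell to finish *)
CloseFrontEnd[];   (* quit the front end before the second pass *)
Pause[1];

(* Second pass: reopen the notebook and evaluate the second input cell.
   The 12 moves step over the first input cell and the 10 Print
   outputs that were saved with it. *)
UseFrontEnd[
  nb = NotebookOpen["C:\\Temp\\run.nb"];
  Do[SelectionMove[nb, Next, Cell], {12}];
  SelectionEvaluate[nb];
];
Pause[8];
CloseFrontEnd[];
Print["Completed"]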

The demo notebook, C:\Temp\run.nb contains two cells:

x1 = 0;
Module[{}, 
 While[x1 < 1000000, 
  If[Mod[x1, 100000] == 0, Print["x1=" <> ToString[x1]]]; x1++];
 NotebookSave[EvaluationNotebook[]];
 NotebookClose[EvaluationNotebook[]]]

Print[x1]
x1 = 0;
Module[{}, 
 While[x1 < 1000000, 
  If[Mod[x1, 100000] == 0, Print["x1=" <> ToString[x1]]]; x1++];
 NotebookSave[EvaluationNotebook[]];
 NotebookClose[EvaluationNotebook[]]]

The initial kernel opens a front-end and runs the first cell, then it quits the front-end, reopens it and runs the second cell.

The whole thing can be run either by pasting (in one go) the MathKernel input into a kernel session, or it can be run from a batch file, e.g. C:\Temp\RunTest2.bat

@echo off
setlocal
PATH = C:\Program Files\Wolfram Research\Mathematica\8.0\;%PATH%
echo Launching MathKernel %TIME%
start MathKernel -noprompt -initfile "C:\Temp\test4.m"
ping localhost -n 30 > nul
echo Terminating MathKernel %TIME%
taskkill /F /FI "IMAGENAME eq MathKernel.exe" > nul
endlocal

It's a little elaborate to set up, and in its current form it depends on knowing how long to wait before closing and restarting the second kernel.
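One way the fixed Pause might be avoided is to have the notebook cell write a marker file when it finishes, and to let the controlling kernel poll for that file. This is only a sketch under that assumption: the helper waitForFile and the marker path are hypothetical, and it has not been tested against the setup above.

```mathematica
Needs["JLink`"];

(* Hypothetical helper: wait until file exists or timeout seconds pass.
   Returns True if the file appeared, False on timeout. *)
waitForFile[file_String, timeout_: 600] :=
  Module[{start = AbsoluteTime[]},
    While[! FileExistsQ[file] && AbsoluteTime[] - start < timeout,
      Pause[0.5]];
    FileExistsQ[file]]

(* In the notebook cell, just before NotebookClose (hypothetical marker path):
     Put["done", "C:\\Temp\\cell1.done"]  *)

(* In the controlling kernel, in place of Pause[8]: *)
flag = "C:\\Temp\\cell1.done";
If[FileExistsQ[flag], DeleteFile[flag]];  (* clear any stale marker first *)
(* ... the UseFrontEnd[...] block that starts the evaluation goes here ... *)
If[waitForFile[flag],
  CloseFrontEnd[],
  Print["Timed out waiting for ", flag]]
```

The timeout keeps the controlling kernel from waiting forever if the notebook evaluation fails before it writes the marker.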

Delanos answered 23/10, 2011 at 12:12 Comment(1)

+1 Awesome! But I feel that there should be a simpler way through the not-well-documented FrontEnd tokens that are available through the "Evaluation" menu in the FrontEnd: "Start kernel" (starts a new kernel that is already configured in the "Kernel configuration options" dialog) and "Default kernel" (sets another default kernel for the current Notebook). The real problem is how to configure a new kernel through the "Kernel configuration options" dialog programmatically, and how to evaluate in the new kernel some code from the original kernel. – Overtrick 23/10, 2011 at 15:11

Perhaps the parallel computation machinery could be used for this? Here is a crude set-up that illustrates the idea:

Needs["SubKernels`LocalKernels`"]

doSomeWork[input_] := {$KernelID, Length[input], RandomReal[]}

getTheJobDone[] :=
  Module[{subkernel, initsub, resultSoFar = {}}
  , initsub[] :=
      ( subkernel = LaunchKernels[LocalMachine[1]]
      ; DistributeDefinitions["Global`"]
      )
  ; initsub[]
  ; While[Length[resultSoFar] < 1000
    , DistributeDefinitions[resultSoFar]
    ; Quiet[ParallelEvaluate[doSomeWork[resultSoFar], subkernel]] /.
        { $Failed :> (Print@"Ouch!"; initsub[])
        , r_ :> AppendTo[resultSoFar, r]
        }
    ]
  ; CloseKernels[subkernel]
  ; resultSoFar
  ]

This is an over-elaborate setup to generate a list of 1,000 triples of numbers. getTheJobDone runs a loop that continues until the result list contains the desired number of elements. Each iteration of the loop is evaluated in a subkernel. If the subkernel evaluation fails, the subkernel is relaunched. Otherwise, its return value is added to the result list.

To try this out, evaluate:

getTheJobDone[]

To demonstrate the recovery mechanism, open the Parallel Kernel Status window and kill the subkernel from time to time. getTheJobDone will feel the pain and print Ouch! whenever the subkernel dies. However, the overall job continues and the final result is returned.

The error-handling here is very crude and would likely need to be bolstered in a real application. Also, I have not investigated whether really serious error conditions in the subkernels (like running out of memory) would have an adverse effect on the main kernel. If so, then perhaps subkernels could kill themselves if MemoryInUse[] exceeded a predetermined threshold.
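The self-termination idea could be sketched as a wrapper around doSomeWork. The names $memoryLimit and guardedWork are hypothetical, and the sketch assumes that a subkernel which quits is reported to the main kernel in the same way as one killed from the status window, which I have not verified.

```mathematica
(* Hypothetical threshold in bytes; tune it for the machine and the task *)
$memoryLimit = 2*10^9;

doSomeWork[input_] := {$KernelID, Length[input], RandomReal[]}

(* Hypothetical wrapper: if the subkernel has grown past the limit,
   quit it before doing more work so the supervising loop relaunches it *)
guardedWork[input_] :=
  ( If[MemoryInUse[] > $memoryLimit, Quit[]]
  ; doSomeWork[input]
  )
```

In getTheJobDone, the call ParallelEvaluate[doSomeWork[resultSoFar], subkernel] would then become ParallelEvaluate[guardedWork[resultSoFar], subkernel].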

Update - Isolating the Main Kernel From Subkernel Crashes

While playing around with this framework, I discovered that any use of shared variables between the main kernel and subkernel rendered Mathematica unstable should the subkernel crash. This includes the use of DistributeDefinitions[resultSoFar] as shown above, and also explicit shared variables using SetSharedVariable.

To work around this problem, I transmitted the resultSoFar through a file. This eliminated the synchronization between the two kernels with the net result that the main kernel remained blissfully unaware of a subkernel crash. It also had the nice side-effect of retaining the intermediate results in the event of a main kernel crash as well. Of course, it also makes the subkernel calls quite a bit slower. But that might not be a problem if each call to the subkernel performs a significant amount of work.

Here are the revised definitions:

Needs["SubKernels`LocalKernels`"]

doSomeWork[] := {$KernelID, Length[Get[$resultFile]], RandomReal[]}

$resultFile = "/some/place/results.dat";

getTheJobDone[] :=
  Module[{subkernel, initsub, resultSoFar = {}}
  , initsub[] :=
      ( subkernel = LaunchKernels[LocalMachine[1]]
      ; DistributeDefinitions["Global`"]
      )
  ; initsub[]
  ; While[Length[resultSoFar] < 1000
    , Put[resultSoFar, $resultFile]
    ; Quiet[ParallelEvaluate[doSomeWork[], subkernel]] /.
        { $Failed :> (Print@"Ouch!"; CloseKernels[subkernel]; initsub[])
        , r_ :> AppendTo[resultSoFar, r]
        }
    ]
  ; CloseKernels[subkernel]
  ; resultSoFar
  ]

Roid answered 29/10, 2011 at 6:12 Comment(0)

I had a similar requirement when running a CUDAFunction in a long loop, where CUDALink ran out of memory (a similar report: https://mathematica.stackexchange.com/questions/31412/cudalink-ran-out-of-available-memory). The memory leak is still there even in the latest Mathematica version, 10.4. I figured out a workaround and hope you find it useful. The idea is to use a bash script that calls a Mathematica program (run in batch mode) multiple times, passing parameters from the bash script. Here are the detailed instructions and a demo (for Windows):

To use a bash script on Windows you need to install Cygwin (https://cygwin.com/install.html).
Convert your Mathematica notebook to a package (.m) so that it can be used in script mode. If you save your notebook using "Save As...", all the commands will be converted to comments (this was noted by Wolfram Research), so it is better to create a package (File -> New -> Package) and then copy and paste your commands into it.
Write the bash script using the Vi editor (instead of Notepad or gedit for Windows) to avoid the "\r" line-ending problem (http://www.linuxquestions.org/questions/programming-9/shell-scripts-in-windows-cygwin-607659/).

Here is a demo of the test.m file

str=$CommandLine;
len=Length[str];
Do[
If[str[[i]]=="-start",
start=ToExpression[str[[i+1]]];
Pause[start];
Print["Done in ",start," second"];
];
,{i,2,len-1}];

This Mathematica code reads the parameter from the command line and uses it in the calculation. Here is the bash script (script.sh) that runs test.m several times with different parameters.

#!/bin/bash
for ((i=2;i<10;i+=2))
do
math -script test.m -start $i
done

In the Cygwin terminal, type "chmod a+x script.sh" to make the script executable; you can then run it by typing "./script.sh".

Preordain answered 10/3, 2016 at 16:52 Comment(1)

An advantage of this workaround over Chris Degnen's approach is that you don't have to deal with the wait time. – Preordain 10/3, 2016 at 17:6

You can programmatically terminate the kernel using Exit[]. The front end (notebook) will automatically start a new kernel when you next try to evaluate an expression.

Preserving "some code from the previous kernel" is going to be more difficult. You have to decide what you want to preserve. If you think you want to preserve everything, then there's no point in restarting the kernel. If you know what definitions you want to save, you can use DumpSave to write them to a file before terminating the kernel, and then use << to load that file into the new kernel.
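As a rough sketch of the DumpSave route, with hypothetical symbols partialResults and nextIndex standing in for whatever state needs to survive, and a hypothetical file location:

```mathematica
(* Hypothetical state to carry over to the new kernel *)
partialResults = {1, 4, 9};
nextIndex = 4;

(* Hypothetical file location; .mx files can only be read back on the
   same kind of system that wrote them *)
stateFile = FileNameJoin[{$HomeDirectory, "kernel-state.mx"}];
DumpSave[stateFile, {partialResults, nextIndex}];
Exit[]

(* In the new kernel: stateFile went away with the old kernel,
   so the path is rebuilt before loading *)
Get[FileNameJoin[{$HomeDirectory, "kernel-state.mx"}]];
{partialResults, nextIndex}
```

The part after Exit[] has to be evaluated separately, since nothing following Exit[] in the same evaluation will run.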

On the other hand, if you know what definitions are taking up too much memory, you can use Unset, Clear, ClearAll, or Remove to remove those definitions. You can also set $HistoryLength to something smaller than Infinity (the default) if that's where your memory is going.
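For illustration, a small sketch of the in-place cleanup, where bigTable is a hypothetical definition that is no longer needed:

```mathematica
(* Stop keeping In/Out history from now on (the default is Infinity) *)
$HistoryLength = 0;

(* Hypothetical large definition that is no longer needed *)
bigTable = Range[10^6];
before = MemoryInUse[];
Clear[bigTable];
after = MemoryInUse[];

(* Bytes released by clearing; the exact figure varies by system *)
before - after
```

Comparing MemoryInUse[] before and after is a quick way to check whether a particular definition is really where the memory is going.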

Lycia answered 23/10, 2011 at 6:29 Comment(2)

The point is to continue computation starting from some point without interacting with the user. – Overtrick 23/10, 2011 at 6:31

@AlexeyPopkov, I think Rob's solution of DumpSave and Exit could be implemented programmatically, but the difficult part will be the automatic selection of which definitions to save. You could extend Set to add specified definitions to a list that will be saved. Maybe something similar to the technique used in MakeCheckedReader. – Osteopathy 23/10, 2011 at 16:51

Sounds like a job for CleanSlate.

<< Utilities`CleanSlate`;
CleanSlate[]

From: http://library.wolfram.com/infocenter/TechNotes/4718/

"CleanSlate, tries to do everything possible to return the kernel to the state it was in when the CleanSlate.m package was initially loaded."

Delanos answered 23/10, 2011 at 9:45 Comment(1)

CleanSlate is a very simple and old utility. It relies on Remove and cannot even protect against memory leaks caused by internal caching. Obviously CleanSlate is not a way to deal with memory leaks. The only reliable approach is to restart the kernel. – Overtrick 23/10, 2011 at 10:38
