Common pitfalls
This is a glossary of ways not to get burned by SOLPS-ITER. It covers errors encountered by several users (or by one user frequently) which deserve a chapter of their own and don't logically belong in another tutorial.
The danger of timestamps
"I ran b2run b2uf on my converged simulation and it deleted the results and started a whole new simulation!"
"My initial state b2fstati is being ignored and overwritten with flat profiles!"
"b2run b2something is not doing what the manual says it should!"
The issue is that b2run tries to be smart. Using a Makefile under the hood, it analyses the dependency chains between the input/output files and tries to propagate changes to input files by triggering additional programs you did not intend to run. In general, the main cause is any change to file timestamps. Examples include the following (a quick check follows the list):
- Something was copied into the run/baserun directory.
- The run/baserun directories themselves were copied (for instance while branching out or transferring a case).
- Input files were edited after a simulation run.
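A quick way to spot such suspicious timestamps is to list the run (and baserun) directory contents by modification time and look for input files newer than b2fstate:
# List files by modification time, newest first
ls -lt
ls -lt ../baserun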
Perform dry runs as prevention
A good general rule for checking whether b2run is about to cause mayhem: perform a dry run with b2run -n <your_program> first. Its output lists the programs which are going to be executed, giving you a chance to see whether you're fine with that. To parse the dry run output, use:
# Print a list of programs ending with "b2*.exe"
b2run -n <your_program> | grep -oE 'b2[a-z]+.exe' | uniq
If you see only the program you intended to run (like b2mn.exe or b2uf.exe), all is fine. If you see anything else, you have a problem. For example:
> b2run -n b2uf | grep -oE 'b2[a-z]+.exe' | uniq
b2mn.exe
b2uf.exe
b2mn.exe
b2mn.exe in the list is a red flag, since that would start the whole simulation and overwrite the results in b2fstate. Another common example:
> b2run -n b2mn | grep -oE 'b2[a-z]+.exe' | uniq
b2ah.exe
b2ai.exe
b2ar.exe
b2mn.exe
Here b2ai.exe would overwrite your initial state b2fstati and start from the flat profiles solution (created by b2ai, the "i" standing for "initial").
Fix timestamps using correct_b2yt_timestamps
To update timestamps of existing files, run:
# Updates b2mn input files' timestamps
correct_b2yt_timestamps
# Switch from tcsh to bash and update b2mn output files' timestamps
bash
for f in b2mn.prt b2fparam b2fstate b2fmovie b2ftrace b2ftrack b2fplasma; do [ -s $f ] && touch $f && stat $f; done
exit
Note that running correct_b2yt_timestamps effectively cloaks any changes made to the more obscure (and rarely modified) input files listed in the table below. If you need such changes to take effect before starting a simulation, either touch the input file in question or delete the generated file according to this table:
I've modified... | Before b2run b2mn, I should delete... | Comment
---|---|---
b2ag.dat | b2fgmtry, fort.30 | (rarely modified)
b2ah.dat | b2fpardf | just use b2.*.parameters files to avoid this issue
b2ai.dat | b2fstati | (rarely modified)
b2ar.dat | b2frates | (rarely modified)
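For example, to make a fresh edit of b2ag.dat take effect (first row of the table), either of the following should work:
# Option 1: bump the input file's timestamp so the change is propagated
touch b2ag.dat
# Option 2: delete the generated files so they are rebuilt from b2ag.dat
rm b2fgmtry fort.30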
Fix timestamps by rerunning the simulation
Another way to "revive" a completed simulation before you post-process the results is to re-run it from its final state while specifying few/zero iterations in the b2mn.dat file.
# Either set the number of iterations to zero in b2mn.dat...
'b2mndr_ntim' '0'
# ...Or set the elapsed time to a small amount
'b2mndr_elapsed' '30' #s
Take care to change this back after you're done. Nothing bad will happen if you forget; you will just waste time on what you think is a long SOLPS simulation but which exits almost immediately.
Prepare the simulation for a restart:
rm b2mn.prt
cp b2fstate b2fstati
Perform a dry run to check what's going to be called, correct the timestamps if needed, and check again:
b2run -n b2mn | grep -oE 'b2[a-z]+.exe' | uniq
correct_b2yt_timestamps
b2run -n b2mn | grep -oE 'b2[a-z]+.exe' | uniq
Finally, submit the short simulation:
b2run b2mn >& run.log &
After this, you are good to perform b2run b2uf or any other post-processing you wish. These short simulation restarts are also good when printing out individual equation terms with the 'b2wdat_out' '4' switch in b2mn.dat. Again, don't forget to reset the switches after you're done.
Simulation divergence
Not all SOLPS-ITER simulations have the decency of a COMPASS deuterium run, which will converge even if the server maintainer has contracted coronavirus and the hard disks are on fire. (No, I am not referring to Marconi Gateway, what are you suggesting?) Here are some tips on how to recognise and remedy code divergence.
How to recognise a run is not converging
- Residuals (plotted with commands such as resco) should fall with time or remain constant. (If you're running a coupled B2.5+EIRENE simulation, ignore the regular peaks in the residuals; they are EIRENE calls. See also Questions and answers: What is the meaning of residual plots.) If the residuals are going up or fluctuating wildly, something is wrong. You can check the residuals while the simulation is running.
- Inspect the simulation state using the 2dt command (see section How to check the simulation using 2dt). This can be done while the simulation is running. For example, if the outer midplane separatrix electron temperature (2dt tesepm) is 500 eV and rising, your simulation is probably diverging. If it's fluctuating wildly, ditto.
- If you let the simulation crash, it will not produce output (either b2fstate will be missing entirely, or it will contain only a few lines, see below). As a result, you won't be able to inspect the plasma state using b2plot and many other post-processing routines. So if you suspect something is going wrong, soft-land the simulation using touch b2mn.exe.dir/.quit and inspect its state (the exact commands are listed after this list). There is a LOT you can look at, so use your brain and follow any clue you have, for instance from inspecting run.log.
- After a soft landing, you can visualise the overall results. This busy figure shows key SOLPS-ITER quantities (density \(n\), temperatures \(T\), parallel/poloidal velocities \(u_a\) and plasma pressure \(p\)) for a deuterium simulation. Red stands for ions, blue for electrons and purple for combined quantities. The quantities are plotted at the most important locations: the outer midplane (OMP), both sides of the X-point (AKA the divertor entrance) and both targets. Right off the bat, you can see that the inner target \(T_i\) and \(u_a\) look weird and non-monotonic. This may or may not be a problem, but at least you've got an idea where to look.
- Another approach is to hunt for particular details. Brain power is especially handy here; enlist your colleagues (not necessarily SOLPS experts, since you're looking for general plasma physics insight) and let them guess what may cause whichever strangeness you come across. Below I give two examples of hunting for the cause of SOLPS-ITER divergence. This process may take months and require multiple meetings, asking several experts, reading the manual back and forth, inspecting the source code, having dreams about the characters of Uncle Tom's Cabin solving SOLPS-ITER equations (true story) and other painful experiences. Don't be discouraged if you don't know why or how your simulation is diverging. Take a nap, make coffee, get help, and you'll figure it out.
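For reference, the soft-landing sequence mentioned above boils down to:
# Ask B2.5 to wind down gracefully at the next opportunity
touch b2mn.exe.dir/.quit
# Watch the log until the run actually stops (Ctrl+C to stop watching)
tail -f run.log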
How to recognise a simulation crash
First make sure that the simulation is not running anymore. This can be done in several ways:
- Check the last modification date of the b2mn.exe.dir directory. If it's very recent and is updated every few seconds, the simulation is still running. If you have mounted the remote folder, make sure to refresh often. (A minimal terminal check follows this list.)
- The same can be done with the run.log file.
- Check whether the b2fstate file is present. If it is, the simulation has definitely ended. If it's absent, the simulation might still be running or might have crashed. (Usually it's still running, though.)
- If you ran the simulation using a submission script (such as sorobansubmit), check whether the job has ended (possibly earlier than the run length you specified in b2mn.dat).
- Use the cpu command twice, ten seconds apart. This often doesn't work on Soroban, but it's pretty reliable on Marconi Gateway.
- When launching the simulation with the b2run command, sometimes this message is printed upon completion: [2] Done b2run b2mn >& run.log
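A crude but portable way to perform the first two checks from a terminal, assuming you are in the run directory:
# Note the modification times, wait ten seconds, then compare
ls -ld b2mn.exe.dir run.log
sleep 10
ls -ld b2mn.exe.dir run.log
# If the times have advanced, the simulation is still running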
If the simulation has ended, these are the symptoms of a crash:
- The b2fstate file is present, but it contains only a few lines of text.
- The b2fstate file is missing altogether.
- The directory b2mn.exe.dir is present in the run folder. An exception is if you used sorobansubmit, which leaves b2mn.exe.dir behind even if the run ends normally.
- b2plot commands, such as echo "phys te surf" | b2plot, fail, complaining about "No rule to make target 'b2fplasma'".
- The simulation did not run as long as you wanted it to.
- Plasma parameters plotted by 2dt tesepm and similar commands diverge to unphysical values.
I think something went wrong here.
- The run.log file ends in an error message. If your simulation was long, its run.log may be too large to open quickly. To save the last 100 lines from run.log into a short_run.log file, use the command:
tail -100 run.log > short_run.log
Alternatively, refer to run.log.last10 if present.
How to check the simulation using 2dt
A single SOLPS-ITER solution (represented mostly by the b2fstate file) is a snapshot in time, the plasma state just before the next EIRENE call. Apart from spotting unphysical values in the solution, it is not meant for watching how the simulation evolves over subsequent iterations. There is, however, a built-in way to keep track of the "time evolution" of many scalar, vector and 2D quantities in the plasma solution. It is activated by the b2mndr_b2time switch inside the b2mn.dat file, its results are stored in the b2time.nc file, and they can be viewed with the 2dt command.
To activate the time evolution tracking, set the b2mndr_b2time switch to a positive integer value, for instance:
'b2mndr_b2time' '1'
A value of 1 will save the data at every occasion.
The quantities which are stored as the simulation goes along are listed in a table in Appendix E of the manual. They include fluxes, temperatures, densities, potentials etc. at various locations in the simulated domain. Using the b2mwti_2dwrite switch, entire 2D arrays (\(n_e\), \(T_e\), \(T_i\)) can be saved in the b2time.nc file.
The basic usage of the 2dt command is:
# Plot time evolution of the separatrix electron temperature at the outer midplane
2dt tesepm
2dt tesepm produces a messy line. You can plot any quantity you want, since a violent crash will usually make the entire plasma solution diverge. My favourite is the OMP separatrix electron density (nesepm), but the target quantities (such as the inner strike point electron density nesepi) are more sensitive.
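Other traces mentioned in this tutorial can be inspected the same way, for instance:
# OMP separatrix electron density
2dt nesepm
# Inner strike point electron density (more sensitive to trouble at the targets)
2dt nesepi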
How to plot 2D maps of residuals
Given that your run has not crashed and burned yet, a 2D map of residuals can be plotted with:
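# 2D map of the continuity equation residuals (cf. resco)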
echo "phys rco surf" | b2plot
echo "phys rco 2 zsel surf" | b2plot
The b2yn command should do the same trick, but I haven't managed to make it work. (It might be something about the source data file b2ftrack being mostly empty.)
Divergence hunt 1: carbon sputtering
Take the resco plot above where the C0 residuals stop evolving with time. Since this strange occurrence concerns carbon neutrals and the continuity equation, you may look at the carbon species densities using the following b2plot script (refer to the b2plot tutorial for more information if needed):
# Set the figure orientation and configure common subfigure elements
phys a4l 4 2 page logf aspc vesl 1.5 lbsz
0.3 rmin 0.8 rmax -0.4 zmin 0.4 zmax
# Plot density of each carbon ionisation state in turn
na 2 zsel 9e3 fmin 1.1e4 fmax surf
na 3 zsel 1e3 fmin 1e17 fmax surf
na 4 zsel 1e3 fmin 1e17 fmax surf
na 5 zsel 1e3 fmin 1e17 fmax surf
na 6 zsel 1e3 fmin 1e17 fmax surf
na 7 zsel 1e3 fmin 1e17 fmax surf
na 8 zsel 1e3 fmin 1e17 fmax surf
Here you see that, for some weird reason, the C0 density is \(10^4\) m\(^{-3}\) across the entire computational domain. Well, maybe the SOL temperature is so high that any neutral carbon becomes quickly ionised? You check the \(T_e\), setting the colour bar extent to focus on low temperatures:
echo "phys te 0.0 fmin 40.0 fmax surf" | b2plot
The carbon ionisation energy is 11.16 eV (as stated in the b2plot output) and \(T_e > 10\) eV pretty much in the entire SOL. So the high-temperature SOL argument holds some water. But maybe SOLPS-ITER is wired so that the C0 density can't fall below \(10^4\) m\(^{-3}\), even though physics says it should continue being ionised?
To verify this, you restart the simulation from the same b2fstati and break it up into smaller parts. Taking individual snapshots of the carbon densities, the electron temperature and resco, you find that the carbon densities fall with every iteration, from approx. \(10^{15}\) m\(^{-3}\) (the flat profiles solution) to what you see above. The moment the C0 density falls to \(10^4\) m\(^{-3}\), the C0 continuity equation residuals stop evolving with further iterations. Thus you conjecture that this is, at root, a numerical problem, not a physical one. Perhaps there is some switch in SOLPS-ITER which sets the minimum species density? You refer to the B2 input doc viewer and find that, indeed, there is a switch b2mndr_na_min which gives the minimum ion species density. Its default value is \(10^4\) m\(^{-3}\). Voilà.
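For completeness, b2mndr_na_min is an ordinary b2mn.dat switch, so it could in principle be set explicitly; the value below simply restates the default quoted above:
# Minimum ion species density in m^-3 (1.0e4 is the default)
'b2mndr_na_min' '1.0e4'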
This scenario does not describe divergence per se, but I did encounter it while investigating divergence after adding carbon sputtering to a COMPASS deuterium simulation. It eventually turned out that a carbon sputtering switch was set wrong in input.dat, so there was little or no carbon coming into the simulation domain. (The walls and targets were set, by default, to absorb carbon without recycling.) After adding the carbon ion species, their density began at the flat profiles solution, \(10^{15}\) m\(^{-3}\), then decreased as the carbon was pumped out of the simulation domain. When the C0 density fell to \(10^4\) m\(^{-3}\), its residuals stopped evolving with subsequent iterations, but this did not cause the simulation to crash. The reason for the eventual crash is still unknown.
Divergence hunt 2: EIRENE strata
While performing a heat flux limiter scan in COMPASS H-mode plasmas, I repeatedly encountered a particular sort of crash. Its first sign was, as usual, that the simulation ended early, b2fstate contained only a few lines of text, and the 2dt nesepm plot showed a sudden jump at the end of the simulation.
The end of run.log said, among other things:
XTIM(ISTRA)= 29.9603174603175
ERROR IN VELOEI
NPANU 5
IREI 1 0.000000000000000E+000
EIRENE EXIT_OWN ENTERED FROM PROCESSOR 0
Run apparently failed: Optional files not moved
Apparently there was a problem with EIRENE. I referred to the EIRENE manual and found that VELOEI is a routine which calculates electron-impact collisions (and lacks detailed documentation). Not very helpful. I scrolled through run.log unhappily until my eyes fell upon the tally printout. Apparently, in every B2.5-EIRENE iteration, intermediate results are printed out just for desperadoes such as myself. The following line caught my attention:
tflux: 1.0387084E+56 3.6568542E+41 4.0123014E+43 6.0231569E+41 0.0000000E+00 2.9567243E+15 1.5507629E+19 3.9618810E+15 1.9964270E+54 9.9604863E+17 6.2415091E+18 6.2415091E+18 6.2415091E+18
tflux gives the flux strength of the strata defined in b2.neutrals.parameters. I had no idea what a stratum was, but it did seem suspicious that in the previous iterations, tflux had values only of about:
tflux: 3.6566185E+21 8.6639188E+19 3.1764206E+21 8.9073048E+19 2.0026835E+17 1.2713032E+16 1.9079832E+17 1.3954952E+16 5.2018662E+18 1.5476509E+18 6.2415091E+18 6.2415091E+18 6.2415091E+18
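To watch these tallies without opening the whole log, something along these lines does the job:
# Extract the tflux tally lines from a possibly huge run.log and keep the latest ones
grep 'tflux:' run.log | tail -20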
I transferred one of the crashed simulations from Soroban to Marconi Gateway, where I had installed SOLPS version 3.0.8-29-gf89a6a23, which had been deemed safe by Oleg. I ran it from the same b2fstati as previously, trying to reproduce the error - but it did not crash. It ran just fine. I transferred the result back to Soroban. It ran just fine. Relaunching the parameter scan again and again, I discovered that this error happened at random, in about 20 % of the simulations. When resubmitted from the same b2fstati, they usually ran through the danger point without a glance back.
2dt nesepm. The simulation starts in the upper left corner, runs for about a day and is restarted at \(t\) = 1.269 s. At \(t\) = 1.278 s, it crashes (vertical line) and is relaunched from the previous b2fstati (diagonal line). It continues running for another day without a problem, roughly reproducing the previous convergence path.
I scoured the SOLPS documentation, looking for what the numbers in tflux meant. They were fluxes of some strata, but what were strata? The research eventually bore fruit, yielding a dictionary entry and the Q&A What is a "stratum". The flux values which most commonly went haywire represented the neutral fluxes from the W and E boundaries. That is, there was a huge influx of neutrals from the divertor targets, especially the inner one. The rest of run.log was the code complaining that there are NaNs everywhere. Just prior to the tflux printout, the log said:
b2stbc: wrong_flow returned from b2stbc_phys
b2stbc.F, as I found by searching for b2stbc in the SOLPS-ITER source code, is a B2.5 routine responsible for calculating the boundary conditions. This, again, indicates that something weird happened on the divertor targets.
Since the crash seemed to happen at random and was not reproducible, I finally concluded that nothing was wrong with my simulation and that this divergence was accidental. Why did it happen, though? That remains a task for future me. Since the version of SOLPS used was two years old, there is a good chance that updating to the latest version (or to Oleg's "safe" version, for that matter) would have solved the problem.
Tailor convergence path
SOLPS-ITER works best in baby steps. Lead it by the hand from the simple to the complex and it will have an easier time converging.
- Do not start from the flat profiles solution. Even a converged b2fstate of a completely different simulation of a completely different tokamak (in the same topology) will work better than flat profiles. Ideally, start the simulation from a similar plasma state. This will prevent divergence and save you time waiting for the run to converge.
- Start the simulation with low input power and density. When the initial run converges, raise one or both a little and wait for convergence again. You can do in steps what you can't do in one fell swoop.
- When introducing a new element into your simulation, e.g. a gas puff, don't add it all at once. Introduce it step-wise, allowing the simulation to converge and relax after each increment.
- When adding impurities, start from a converged b2fstate with the same parameters but without the impurities. SOLPS-ITER will automatically add the new ion species.
- Keep copies of b2fstate along the path (see the one-liner after this list). These will enable you to resume the simulation from an earlier state.
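Keeping those copies can be as simple as a timestamped copy after each converged step (bash syntax, the naming scheme is up to you):
# Archive the current converged state before changing the inputs again
cp b2fstate b2fstate.$(date +%Y%m%d_%H%M)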
EIRENE averaging schemes
One might think: if EIRENE produces statistical noise, perhaps it has an effect on the convergence. Then wouldn't using the already implemented averaging schemes help? Sadly, as stated in the manual, section 5.4 Averaging scheme:
This averaging scheme does NOT affect the physical run behaviour as such and therefore does not have any stabilizing effect on the run.
Averaging schemes serve to speed up the code convergence and to assess the statistical error introduced by EIRENE; they do not stabilise a diverging run.
Reduce the time step
The manual section 5.3 Changing dt delves into the effect of changing the time step (the switch b2mndr_dtim in the b2mn.dat file), but it seems to conclude that the time step does not really affect convergence quality, as nearly all the solutions follow a similar time evolution of the plasma parameters plotted by the 2dt command.
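For reference, the time step is set like any other b2mn.dat switch; the value below is just an example:
# Time step in seconds
'b2mndr_dtim' '1.0e-5'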
My own limited experiments have shown that if a simulation is going to diverge, it is going to diverge with any time step from \(10^{-4}\) to \(10^{-6}\) s. Examples from a D+C run, all resco residual plots:
(Figures: resco residual plots for a deuterium + carbon COMPASS case)
- \(dt = 100\) \(\mu\)s, run for 10 minutes
- \(dt = 50\) \(\mu\)s, run for 10 minutes
- \(dt = 20\) \(\mu\)s, run for 10 minutes
- \(dt = 10\) \(\mu\)s, run for 10 minutes
- \(dt = 5\) \(\mu\)s, run for 10 minutes
- \(dt = 2\) \(\mu\)s, run for 20 minutes
- \(dt = 1\) \(\mu\)s, run for 40 minutes
Note that in all cases, the residuals of the neutral carbon C0 continuity equation eventually start rising. This was described in the paragraph Divergence hunt 1: carbon sputtering. It is caused by the neutral carbon density falling to the minimum ion species density, set by the b2mn.dat switch b2mndr_na_min. This happens because of the carbon sputtering settings in the simulation, regardless of the time step. This leads me to believe that a simulation may be doomed to diverge, no matter what the time step, because its convergence path takes it to some no-go zone. With a smaller time step, it just gets there slower.
Switch to standalone B2.5 (fluid neutrals)
While switching to the standalone mode might not make your desired plasma easier to model, it will typically speed up the convergence (divergence) and might bypass EIRENE errors. The relevant switches are described in the manual section 5.1.5 b2mn.dat as well as in Questions and answers: Where can I switch between EIRENE modelling and B2.5 fluid modelling.
Make SOLPS-ITER run faster
Here are some tips and tricks for accelerating SOLPS-ITER runs. Some I have tried; others are (sadly) still theoretical.
Guide the convergence path
Cases with fewer ion species and without drifts are more stable, can endure a higher time step and solve fewer equations. Cases with low density are better behaved than cases with high density. Cases which are already converged for some input parameters are better behaved than the flat profile solution. Introducing changes step-wise carries less chance of divergence than jumping straight from start to finish.
Take for instance a D+C case which was based on a pure-D case and now runs well with \(dt=10^{-5}\) s, but you need to change the plasma state substantially. Instead of labouring with the D+C case, you might revert to the pure-D case, introduce the changes there and then import the solution back into the D+C case (by copying b2fstate). This means the carbon species will start from flat profiles again, but at least the deuterium background will be realistic.
Another version of this trick can be used when branching out runs, for instance while making parameter scans. First, procure a converged case in the centre of the parameter space and branch it out to seed the other runs. If one run in the scan crashes, try reseeding it with the b2fstate of a nearby converged run.
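In practice, reseeding is just a copy of a neighbouring run's final state into the crashed run's initial state (the directory name here is purely illustrative):
# Reseed a crashed scan member from a converged neighbour
cp ../converged_neighbour_run/b2fstate ./b2fstati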
Increase the time step
The manual, section 5.3 Changing dt, shows that if the simulation is inherently stable (no inconsistent inputs, no metastable states), changing the time step essentially takes it along the same convergence path, just faster. That said, you have no way of knowing in advance whether you're exploring a part of parameter space with a single, well-defined optimum where the simulation will happily converge. Increasing the time step may make your simulation unstable, so always keep copies of b2fstate around.
Use MPI parallelisation
This applies only if you're running a coupled case, that is B2.5+EIRENE (or "kinetic neutrals"). On Soroban, this means using the sorobansubmit command (see Running SOLPS-ITER: sorobansubmit). EIRENE, as a Monte Carlo code which calculates the fates of many independent particles, naturally takes to parallelisation. In an unparallelised run, it will take up 85 % of the SOLPS-ITER runtime; this fraction can easily be reduced to 50 % by parallelising EIRENE on 8 CPUs.
Run multiple simulations at once
Provided you have enough computational power, launch several independent SOLPS-ITER runs. Looking after them feels like teaching at a kindergarten, with individual runs randomly crashing or ignoring b2fstati, but it's still faster than launching them in succession.
In the same vein, if you have access to several servers with the same SOLPS-ITER installation, run independent runs on all of them.
Run simulations over the weekend
Computational clusters like Soroban tend to be less busy on the weekend, so not only can you hog more processors, but the individual runs might also go faster. Plus, there's less of a chance of crashes due to a busy file system.
Don't start from the flat profiles solution
Even if it's a different plasma in a different machine, as long as the number of cells is the same, simply copying b2fstate/i into the new run will aid its convergence. If the number of cells doesn't match, use the b2yt command (refer to the manual for exact instructions). In the words of Fabio Subba, 2021:
Convergence of the run and even the result might depend on the starting profiles. Flat profile case is not the best choice. Fastest convergence is reached when restarting from the similar run. Important to have the same grids structure and can be even different machine. The main reasons of this dependence is the following: the transport equations are strongly nonlinear and unphysical, or far from real (like flat profiles) can drive to wrong metastable solutions.
Switch to fluid neutrals
As [Carli, 2021] argues, the latest version of fluid neutrals (or standalone B2.5) is accurate enough to replace EIRENE if you really gotta go fast.
SOLPS-ITER speed-up schemes
I have not tried any of these because I am intimidated, but once I need to really speed SOLPS up, I might bite the bullet and figure out how to use them.
EIRENE averaging schemes are described in $SOLPSTOP/doc/SOLPS-ITER_EIRENE_averaging.pdf. They are said to speed up convergence, but I don't see how. It just seems like a bunch of extra calculations to me.
Three speed-up schemes informed by physics/numerics are described at the ITER Sharepoint, Speed_UP_SOLPS-ITER_14-11-2018.pptx. They promise a lot of time saved.