Common pitfalls

This is a glossary of ways not to get burned by SOLPS-ITER. It covers errors encountered by several users (or by one user frequently) which deserve a chapter of their own and don't logically belong in another tutorial.

The danger of timestamps

"I ran b2run b2uf on my converged simulation and deleted the results and started a whole new simulation!"

"My initial state b2fstati is being ignored and overwritten with flat profiles!"

"b2run b2something is not doing what the manual says it should!"

The issue is that b2run tries to be smart. Using a Makefile under the hood, it analyses the dependency chains between input and output files and tries to propagate changes to input files by triggering additional programs you did not intend to run. In general, the root cause is any change to file timestamps (a quick way to spot suspicious timestamps is sketched after the list). Examples include:

  • Something was copied into the run/baserun directory.
  • The run/baserun directories themselves were copied (for instance while branching out or transferring a case).
  • Input files were edited after a simulation run.
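
To spot suspicious timestamps before running anything, a plain directory listing is often enough. A minimal check, assuming the b2fstate from the previous run sits in the current run directory:

# List files newest-first; inputs newer than b2fstate may trigger unwanted rebuilds
ls -lt | head -20

# Show exactly which files were modified after b2fstate was last written
find . -maxdepth 1 -type f -newer b2fstate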

Perform dry runs as prevention

A good general way to check whether b2run is about to cause mayhem is to perform a dry run first: b2run -n <your_program>. Its output lists the programs that are going to be executed, giving you a chance to see whether you're fine with that. To parse the dry run output, use:

# Print a list of programs ending with "b2*.exe"
b2run -n <your_program> | grep -oE 'b2[a-z]+.exe' | uniq

If you see only the program you intended to run (like b2mn.exe or b2uf.exe), all is fine. If you see anything else, you have a problem. For example:

> b2run -n b2uf | grep -oE 'b2[a-z]+.exe' | uniq
b2mn.exe
b2uf.exe

Seeing b2mn.exe in the list is a red flag, since that would start the whole simulation and overwrite the results in b2fstate. Another common example:
> b2run -n b2mn | grep -oE 'b2[a-z]+.exe' | uniq
b2ah.exe
b2ai.exe
b2ar.exe
b2mn.exe

This output means B2.5 is about to ignore your b2fstati and start from the flat profiles solution (created by b2ai, "i" for "initial").

Fix timestamps using correct_b2yt_timestamps

To update timestamps of existing files, run:

# Updates b2mn input files' timestamps
correct_b2yt_timestamps

# Switch from tcsh to bash and update b2mn output files' timestamps
bash
for f in b2mn.prt b2fparam b2fstate b2fmovie b2ftrace b2ftrack b2fplasma; do [ -s $f ] && touch $f && stat $f; done
exit
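
If you would rather not leave tcsh, a minimal equivalent of the bash loop above (same file list, same effect) is:

foreach f (b2mn.prt b2fparam b2fstate b2fmovie b2ftrace b2ftrack b2fplasma)
  if ( -s $f ) touch $f
end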

Note that running correct_b2yt_timestamps effectively hides any changes made to the more obscure (and rarely modified) input files listed in the table below. If you need such changes to take effect before starting a simulation, either touch the input file in question or delete the generated file according to this table (an example follows the table):

I've modified...   Before b2run b2mn, I should delete...   Comment
b2ag.dat           b2fgmtry, fort.30                       (rarely modified)
b2ah.dat           b2fpardf                                just use b2.*.parameters files to avoid this issue
b2ai.dat           b2fstati                                (rarely modified)
b2ar.dat           b2frates                                (rarely modified)
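
For instance, suppose you have edited b2ah.dat and want the change picked up on the next b2run b2mn:

# Option 1: make the edited input newer than its derived file
touch b2ah.dat

# Option 2: delete the derived file so it is regenerated from b2ah.dat
rm -f b2fpardf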

Fix timestamps by rerunning the simulation

Another way to "revive" a completed simulation before you post-process the results is to re-run it from its final state while specifying few or zero iterations in the b2mn.dat file.

# Either set the number of iterations to zero in b2mn.dat...
'b2mndr_ntim'    '0'

# ...Or set the elapsed time to a small amount
'b2mndr_elapsed'   '30' #s

Take care to change this back after you're done. Nothing bad will happen if you forget, but your next "long" run will stop almost immediately and you'll waste time wondering why.

Prepare the simulation for a restart:

rm b2mn.prt
cp b2fstate b2fstati

Perform a dry run to check what's going to be called, correct the time stamps if needed, and check again.

b2run -n b2mn | grep -oE 'b2[a-z]+.exe' | uniq
correct_b2yt_timestamps
b2run -n b2mn | grep -oE 'b2[a-z]+.exe' | uniq

Finally, submit the short simulation:

b2run b2mn >& run.log &

After this, you are good to perform b2run b2uf or any other post-processing you wish. These short simulation restarts are also good when printing out individual equation terms with the 'b2wdat_out' '4' switch in b2mn.dat. Again, don't forget to reset the switches after you're done.

Simulation divergence

Not all SOLPS-ITER simulations have the decency of a COMPASS deuterium run, which will converge even if the server maintainer has contracted coronavirus and the hard disks are on fire. (No, I am not referring to Marconi Gateway, what are you suggesting.) Here are some tips on how to recognise and remedy code divergence.

How to recognise a run is not converging

  • Residuals (plotted with commands such as resco) should fall with time or remain constant. (If you're running a coupled simulation, B2.5+EIRENE, ignore the regular peaks in residuals, they are EIRENE calls. See also Questions and answers: What is the meaning of residual plots.) If the residuals are going up or fluctuating wildly, something is wrong. You can check the residuals while the simulation is running.

Fluctuating residuals (`resall_D`). The simulation is unstable. Left to its own devices, it will probably crash eventually.

Residuals of C0 stop evolving (`resco`). This may or may not lead to a simulation crash.

  • Inspect the simulation state using the 2dt command (see section How to check the simulation using 2dt). This can be done while the simulation is running. For example, if the outer midplane separatrix electron temperature (2dt tesepm) is 500 eV and rising, your simulation is probably diverging. If it's fluctuating wildly, ditto.

  • If you let the simulation crash, it will not produce output (either b2fstate will be missing entirely, or it will contain only a few lines, see below). As a result, you won't be able to inspect the plasma state using b2plot and many other post-processing routines. So if you suspect something is going wrong, soft-land the simulation using

    touch b2mn.exe.dir/.quit
    

...and inspect its state. There is a LOT you can look at, so use your brain and follow any clue you have, for instance from inspecting run.log.

  • After a soft landing, you can visualise overall results:

    This busy figure shows key SOLPS-ITER quantities (density \(n\), temperatures \(T\), parallel/poloidal velocities \(u_a\) and plasma pressure \(p\)) for a deuterium simulation. Red stands for ions, blue for electrons and purple for combined. The quantities are plotted at the most important locations: the outer midplane (OMP), both sides of the X-point (AKA the divertor entrance) and both targets. Right off the bat, you can see that the inner target \(T_i\) and \(u_a\) look weird and non-monotonic. This may or may not be a problem, but at least you've got an idea where to look.

  • Another approach is to hunt for particular details. Brain power is especially handy here; enlist your colleagues (not necessarily SOLPS experts, since you're looking for general plasma physics insights) and let them guess what may cause whichever strangeness you come across. Below I give two examples of hunting for the cause of SOLPS-ITER divergence. This process may take months and require multiple meetings, asking several experts, reading the manual back and forth, inspecting the source code, having dreams about black slaves in uncle Tom's cabin solving SOLPS-ITER equations (true story) and other painful experiences. Don't be discouraged that you don't know why or how your simulation is diverging. Take a nap, make coffee, get help, and you'll figure it out.

How to recognise a simulation crash

First make sure that the simulation is not running anymore. This can be done in several ways:

  • Check the last modification date of the b2mn.exe.dir directory. If it's very recent and is updated every few seconds, the simulation is still running. If you have mounted the remote folder, make sure to refresh a lot.
  • The same can be done for the run.log file (a snippet automating this check follows the list).
  • Check if the b2fstate file is present. If it is, the simulation has definitely ended. If it's absent, the simulation might still be running or might have crashed. (Usually it's still running, though.)
  • If you ran the simulation using a submission script (such as sorobansubmit), check if the job has ended (possibly earlier than the time you specified in b2mn.dat).
  • Use the cpu command twice, ten seconds apart. This often doesn't work on Soroban, but it's pretty reliable on Marconi Gateway.
  • When launching the simulation with the b2run command, sometimes this message is printed upon completion:

    [2] Done b2run b2mn >& run.log
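
A quick way to automate the modification-time check from the list above is to sample the timestamps twice, a few seconds apart (GNU coreutils stat assumed; adjust the file names to your run):

# If the timestamps change between the two calls, the run is still writing output
stat -c '%y  %n' run.log b2mn.exe.dir
sleep 10
stat -c '%y  %n' run.log b2mn.exe.dir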

If the simulation has ended, these are the symptoms of a crash:

  • The b2fstate file is present, but it contains only a few lines of text.
  • The b2fstate file is missing altogether.
  • The directory b2mn.exe.dir is present in the run folder. An exception is if you used sorobansubmit, which leaves b2mn.exe.dir behind even if the run ends normally.
  • b2plot commands, such as echo "phys te surf" | b2plot, fail, complaining about "No rule to make target 'b2fplasma'".
  • The simulation did not run as long as you wanted it to.
  • Plasma parameters plotted by 2dt tesepm and similar commands diverge to unphysical values.

I think something went wrong here.

  • The run.log file ends in an error message. If your simulation was long, its run.log may be too large to open quickly. To save the last 100 lines from the run.log file into a short_run.log file, use the command:

    tail -n 100 run.log > short_run.log
    

    Alternatively, refer to run.log.last10 if present.
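
If the tail alone isn't informative, searching the whole log for typical failure strings beats scrolling. The patterns below are just illustrative ones, taken from the crashes described later in this section:

# Show the last lines of run.log mentioning errors, NaNs or boundary-condition trouble
grep -n -i -E 'error|nan|wrong_flow|exit' run.log | tail -20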

How to check the simulation using 2dt

A single SOLPS-ITER solution (represented mostly by the b2fstate file) is a snapshot in time, the plasma state just before the next EIRENE call. Apart from spotting unphysical values, it is not suited to watching how the simulation evolves over subsequent iterations. There is, however, a built-in way to keep track of the "time evolution" of many scalar, vector and 2D quantities in the plasma solution. It is activated by the b2mndr_b2time switch inside the b2mn.dat file; the results are stored in the b2time.nc file and can be viewed with the 2dt command.

To activate the time evolution tracking, set the b2mndr_b2time switch to a positive integer value, for instance:

'b2mndr_b2time'     '1'

The integer controls how often the quantities are saved; setting it to 1 saves them at every opportunity.

The quantities which are stored as the simulation goes along are listed in Appendix E of the manual in a table. They include fluxes, temperatures, densities, potentials etc. at various locations in the simulated domain. Using the b2mwti_2dwrite switch, entire 2D arrays (\(n_e\), \(T_e\), \(T_i\)) can be saved in the b2time.nc file.
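
To check which quantities your b2time.nc actually contains, the standard netCDF utilities (assuming they are installed on your cluster) will list them without launching 2dt:

# List the variables stored in the time-trace file
ncdump -h b2time.nc | less

# Or check directly whether a particular quantity was saved
ncdump -h b2time.nc | grep -i tesepm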

The basic usage of the 2dt command is:

# Plot time evolution of the separatrix electron temperature at the outer midplane
2dt tesepm

If this produces a messy line, refer to Questions and answers: 2dt tesepm produces a messy line. You can plot any quantity you want, since a violent crash usually makes the entire plasma solution diverge. My favourite is the OMP separatrix electron density (nesepm), but the target quantities (such as the inner strike point electron density nesepi) are more sensitive.
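
To glance at several traces in one go, loop over the quantity names (run this in bash; the names come from Appendix E of the manual):

# Plot a few time traces one after another
for q in nesepm tesepm nesepi; do 2dt $q; done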

How to plot 2D maps of residuals

Given that your run has not crashed and burned yet, a 2D map of residuals can be plotted with

echo "phys rco surf" | b2plot

...for all species at once, or

echo "phys rco 2 zsel surf" | b2plot

...for a particular species number. The b2yn command should do the same trick, but I haven't managed to make it work. (It might be something about the source data file b2ftrack being mostly empty.)
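
To step through the residual maps of all species instead of picking numbers by hand, you can loop over the zsel indices (in bash; the range 0..8 matches a nine-species D+C case like the one below and will differ for other setups):

# Continuity-equation residual map for each species index in turn
for i in $(seq 0 8); do echo "phys rco $i zsel surf" | b2plot; done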

Divergence hunt 1: carbon sputtering

Take the resco plot above where C0 residuals stop evolving with time. Since this strange occurrence concerns carbon neutrals and the continuity equation, you may look at the carbon species densities using the following b2plot script (refer to the b2plot tutorial for more information if needed):

# Set the figure orientation and configure common subfigure elements
phys a4l 4 2 page logf aspc vesl 1.5 lbsz
0.3 rmin 0.8 rmax -0.4 zmin 0.4 zmax

# Plot density of each carbon ionisation state in turn
na 2 zsel 9e3 fmin 1.1e4 fmax surf
na 3 zsel 1e3 fmin 1e17 fmax surf
na 4 zsel 1e3 fmin 1e17 fmax surf
na 5 zsel 1e3 fmin 1e17 fmax surf
na 6 zsel 1e3 fmin 1e17 fmax surf
na 7 zsel 1e3 fmin 1e17 fmax surf
na 8 zsel 1e3 fmin 1e17 fmax surf

Here you see that, for some weird reason, the C0 density is \(10^4\) m\(^{-3}\) across the entire computational domain. Well, maybe the SOL temperature is so high that any neutral carbon becomes quickly ionised? You check \(T_e\), setting the colour bar extent to focus on low temperatures:

echo "phys te 0.0 fmin 40.0 fmax surf" | b2plot

Carbon ionisation energy is 11.16 eV (as stated in the b2plot output) and \(T_e>10\) eV pretty much in the entire SOL. So the high-temperature SOL argument holds some water. Maybe SOLPS-ITER is so wired that the C0 density can't fall under \(10^4\) m\(^{-3}\), even though physics says it should continue being ionised?

To verify this, you restart the simulation from the same b2fstati and break it up into smaller parts. Taking individual snapshots of the carbon densities, electron temperature and resco, you find that the carbon densities fall with every iteration, from approx. \(10^{15}\) m\(^{-3}\) (the flat profiles solution) to what you see above. The moment the C0 density falls to \(10^4\) m\(^{-3}\), the C0 continuity equation residuals stop evolving with further iterations. Thus you conjecture that this is, at root, a numerical problem, not a physical one. Perhaps there is some switch in SOLPS-ITER which sets the minimum species density? You refer to the B2 input doc viewer and find that, indeed, there is a switch b2mndr_na_min which gives the minimum ion species density. Its default value is \(10^4\) m\(^{-3}\). Voilà.
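
For reference, the switch sits in b2mn.dat like the others; its entry, shown here with the default value purely for illustration, looks like this:

# Minimum ion species density floor in m^-3 (default shown)
'b2mndr_na_min'    '1.0e4'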

This scenario does not describe divergence per se, but I did encounter it while investigating divergence following adding carbon sputtering to a COMPASS deuterium simulation. It eventually turned out that a carbon sputtering switch was set wrong in input.dat, so there was little or no carbon coming into the simulation domain. (The walls and targets were set, by default, to absorb carbon without recycling.) After adding the carbon ion species, their density began at the flat profiles solution, \(10^{15}\) m\(^{-3}\), then decreased as the carbon was pumped out of the simulation domain. When the C0 density fell to \(10^4\) m\(^{-3}\), its residuals stopped evolving with subsequent iterations, but this did not cause the simulation to crash. The reason for the eventual crash is still unknown.

Divergence hunt 2: EIRENE strata

While performing a heat flux limiter scan in COMPASS H-mode plasmas, I repeatedly encountered a particular sort of crash. Its first sign was, as usual, that the simulation ended early, b2fstate contained only a few lines of text, and the 2dt nesepm plot showed a sudden jump at the end of the simulation.

Illustrative `2dt tesepm` plot - convergence at the beginning, and then a sudden jump.

The end of run.log said, among other things:

XTIM(ISTRA)=    29.9603174603175
ERROR IN VELOEI
NPANU            5
IREI            1  0.000000000000000E+000
EIRENE EXIT_OWN ENTERED FROM PROCESSOR            0
Run apparently failed: Optional files not moved

Apparently there was a problem with EIRENE. I referred to the EIRENE manual and found that VELOEI is a routine which calculates electron-impact collisions (and lacks detailed documentation). Not very helpful. I scrolled through run.log unhappily until my eyes fell upon the tally printout: in every B2.5-EIRENE iteration, intermediate results are printed out just for desperadoes such as me. The following line caught my attention:

tflux:   1.0387084E+56  3.6568542E+41  4.0123014E+43  6.0231569E+41  0.0000000E+00  2.9567243E+15  1.5507629E+19  3.9618810E+15  1.9964270E+54  9.9604863E+17  6.2415091E+18  6.2415091E+18  6.2415091E+18

According to the SOLPS-ITER manual, tflux gives the flux strength for the first '#1 strata' from b2.neutrals.parameters. I had no idea what a stratum was, but it did seem suspicious that in the previous iterations, tflux had values of only about:

tflux:   3.6566185E+21  8.6639188E+19  3.1764206E+21  8.9073048E+19  2.0026835E+17  1.2713032E+16  1.9079832E+17  1.3954952E+16  5.2018662E+18  1.5476509E+18  6.2415091E+18  6.2415091E+18  6.2415091E+18

The jump of over 30 orders of magnitude only happened in the last iteration. Everything had seemed fine until then. Was this a numerical error?

I transferred one of the crashed simulations from Soroban to Marconi Gateway, where I had installed the SOLPS version 3.0.8-29-gf89a6a23, which had been deemed safe by Oleg. I ran it from the same b2fstati as previously, trying to reproduce the error - but it did not crash. It ran just fine. I transferred the result back to Soroban. It ran just fine.

Relaunching the parameter scan again and again, I discovered that this error happened at random, in about 20 % of the simulations. When resubmitted from the same b2fstati, they usually ran through the danger point without a glance back.

2dt nesepm. The simulation starts in the upper left corner, runs for about a day and is restarted at \(t\) = 1.269 s. At \(t\) = 1.278 s, it crashes (vertical line) and is relaunched from the previous b2fstati (diagonal line). It continues running for another day without a problem, roughly reproducing the previous convergence path.

I scoured the SOLPS documentation, looking for what the numbers in tflux meant. They were fluxes for some strata, but what were strata? The research eventually bore fruit, yielding a dictionary entry and the Q&A What is a "stratum". The flux values which most commonly went haywire represented the neutral fluxes from the W and E boundaries. That is, there was a huge influx of neutrals from the divertor targets, especially the inner one. The rest of run.log was the code complaining that there were NaNs everywhere. Just prior to the tflux printout, the log said:

b2stbc: wrong_flow returned from b2stbc_phys

b2stbc.F, as I found by searching for b2stbc in the SOLPS-ITER source code, is a B2.5 routine responsible for calculating the boundary conditions. This, again, indicates that something weird happened at the divertor targets.

Since the crash seemed to happen at random and was not reproducible, I finally concluded that nothing was wrong with my simulation and that this divergence was accidental. Why did it happen, though? That remains a task for future me. Since the SOLPS version used was two years old, there is a good chance that updating to the latest version (or to Oleg's "safe" version, for that matter) would have solved the problem.

Tailor convergence path

SOLPS-ITER works best in baby steps. Lead it by the hand from the simple to the complex and it will have an easier time converging.

  • Do not start from the flat profiles solution. Even a converged b2fstate of a completely different simulation of a completely different tokamak (in the same topology) will work better than flat profiles. Ideally, start the simulation from a similar plasma state. This will prevent divergence and save you time waiting for the run to converge.
  • Start the simulation with low input power and density. When the initial run converges, raise one or both a little and wait for convergence again. You can do in steps what you can't do in one fell swoop.
  • When introducing a new element into your simulation, e.g. gas puff, don't add it all at once. Introduce it step-wise, allowing the simulation to converge and relax after each increment.
  • When adding impurities, start from a converged b2fstate with the same parameters without the impurities. SOLPS-ITER will automatically add the new ion species.
  • Keep copies of b2fstate along the path. These will enable you to resume the simulation from an earlier state.
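
One low-effort way to keep those copies is to timestamp them whenever you are about to change the inputs (the naming scheme is just a suggestion; backticks work in both tcsh and bash):

# Snapshot the current converged state before the next parameter change
cp b2fstate b2fstate.`date +%Y%m%d_%H%M`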

EIRENE averaging schemes

One might think: if EIRENE produces statistical noise, perhaps it has an effect on the convergence. Then wouldn't using the already implemented averaging schemes help? Sadly, as stated in the manual, section 5.4 Averaging scheme:

This averaging scheme does NOT affect the physical run behaviour as such and therefore does not have any stabilizing effect on the run.

Averaging schemes serve to speed up code convergence and to assess the statistical error introduced by EIRENE; they do not stabilise a diverging run.

Reduce the time step

The manual section 5.3 Changing dt delves into the effect of changing the time step (switch b2mndr_dtim in the b2mn.dat file), but it seems to conclude that the time step does not really affect convergence quality, as nearly all the solutions follow a similar time evolution of plasma parameters plotted by the 2dt command.

My own limited experiments have shown that if a simulation is going to diverge, it is going to diverge with any time step from \(10^{-4}\) s to \(10^{-6}\) s. Examples from a D+C run, all resco residual plots:

Simulation examples (deuterium + carbon, COMPASS):

  • \(dt = 100\) \(\mu\)s, run for 10 minutes
  • \(dt = 50\) \(\mu\)s, run for 10 minutes
  • \(dt = 20\) \(\mu\)s, run for 10 minutes
  • \(dt = 10\) \(\mu\)s, run for 10 minutes
  • \(dt = 5\) \(\mu\)s, run for 10 minutes
  • \(dt = 2\) \(\mu\)s, run for 20 minutes
  • \(dt = 1\) \(\mu\)s, run for 40 minutes

Note that in all cases, the residuals of the neutral carbon C0 continuity equation eventually start rising. This was described in paragraph Divergence hunt 1: carbon sputtering. It is caused by the neutral carbon density falling to the minimum ion species density, set by the b2mn.dat switch b2mndr_na_min. This happens because of carbon sputtering settings in the simulation, regardless of the time step. This leads me to believe that a simulation may be doomed to diverge, no matter what the time step, because its convergence path takes it to some no-go zone. With a smaller time step, it just gets there slower.

Switch to standalone B2.5 (fluid neutrals)

While switching to the standalone mode might not make your desired plasma easier to model, it will typically speed up the convergence (divergence) and might bypass EIRENE errors. The relevant switches are described in the manual section 5.1.5 b2mn.dat as well as in Questions and answers: Where can I switch between EIRENE modelling and B2.5 fluid modelling.

Make SOLPS-ITER run faster

Here are some tips and tricks for accelerating SOLPS-ITER runs. Some have been tried; others are (sadly) still theoretical.

Guide the convergence path

Cases with fewer ion species and without drifts are more stable, can endure a larger time step and solve fewer equations. Cases with low density are better behaved than cases with high density. Cases which are already converged for some input parameters are better behaved than the flat profile solution. Introducing changes step-wise carries less chance of divergence than jumping straight from start to finish.

Take for instance a D+C case which was based on a pure-D case and now runs well with \(dt=10^{-5}\) s, but you need to change the plasma state substantially. Instead of labouring with the D+C case, you might revert to the pure-D case, introduce the changes there and then import the solution back into the D+C case (by copying b2fstate). This will mean the carbon species will start from flat profiles again, but at least the deuterium background will be realistic.

Another version of this trick can be used while branching out runs, for instance while making parameter scans. First, procure a converged case in the centre of the parameter space, and branch it out to seed the other runs. If one run in the scan crashes, try reseeding it with b2fstate of a nearby converged run.

Increase the time step

The manual, section 5.3 Changing dt, shows that if the simulation is inherently stable (no inconsistent inputs, no metastable states), changing the time step essentially takes it along the same convergence path, just faster. That said, you have no way of knowing in advance whether you're exploring parameter space with a single, well-defined optimum where the simulation will happily converge. Increasing the time step may make your simulation unstable, so always keep around copies of b2fstate.
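
The time step itself is set in b2mn.dat; a typical experiment is simply doubling or halving it and watching the residuals (the value below is illustrative, within the range discussed above):

# Time step in seconds
'b2mndr_dtim'    '1.0e-4'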

Use MPI parallelisation

This applies only if you're running a coupled case, that is B2.5+EIRENE (or "kinetic neutrals"). On Soroban, this means using the sorobansubmit command (see Running SOLPS-ITER: sorobansubmit). EIRENE, as a Monte Carlo code which calculates the fates of many independent particles, naturally takes to parallelisation. In an unparallelised run, it will take up 85 % of the SOLPS-ITER runtime; this fraction can be easily reduced to 50 % by parallelising EIRENE on 8 CPUs.

Run multiple simulations at once

Provided you have enough computational power, launch several independent SOLPS-ITER runs. Looking after them feels like teaching at a kindergarten, with individual runs randomly crashing or ignoring b2fstati, but it's still faster than launching them in succession.

In the same vein, if you have access to several servers with the same SOLPS-ITER installation, run independent runs on all of them.

Run simulations over the weekend

Computational clusters like Soroban tend to be less busy on the weekend, so not only can you hog more processors, but the individual runs might also go faster. Plus, there's less of a chance of crashes due to a busy file system.

Don't start from the flat profiles solution

Even if it's a different plasma in a different machine, as long as the number of cells is the same, simply copying b2fstate/i into the new run will aid its convergence. If the number of cells doesn't match, use the b2yt command (refer to the manual for exact instructions). In the words of Fabio Subba, 2021:

Convergence of the run and even the result might depend on the starting profiles. Flat profile case is not the best choice. Fastest convergence is reached when restarting from the similar run. Important to have the same grids structure and can be even different machine. The main reasons of this dependence is the following: the transport equations are strongly nonlinear and unphysical, or far from real (like flat profiles) can drive to wrong metastable solutions.

Switch to fluid neutrals

As [Carli, 2021] argues, the latest version of fluid neutrals (or standalone B2.5) is accurate enough to replace EIRENE if you really gotta go fast.

SOLPS-ITER speed-up schemes

I have not tried any of these because I am intimidated, but once I need to really speed SOLPS up, I might bite the bullet and figure out how to use them.

EIRENE averaging schemes are described in $SOLPSTOP/doc/SOLPS-ITER_EIRENE_averaging.pdf. They are said to speed up convergence, but I don't see how. It just seems like a bunch of extra calculations to me.

Three speed-up schemes informed by physics/numerics are described at the ITER Sharepoint, Speed_UP_SOLPS-ITER_14-11-2018.pptx. They promise a lot of time saved.