miniMD application

Instrumentation

Runtime filtering

  1. The miniMD application is built as follows:
    make openmpi PREP="scorep"
  2. Apply scorep-autofilter as follows:
    scorep-autofilter -t 0.1 -f scorep scorep-*/profile.cubex

The file scorep.filt contains the names of the regions to be filtered, enclosed between SCOREP_REGION_NAMES_BEGIN and SCOREP_REGION_NAMES_END, as shown below:

SCOREP_REGION_NAMES_BEGIN
EXCLUDE
Atom::Atom()
Atom::~Atom()
...
SCOREP_REGION_NAMES_END
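
To check quickly which regions a generated filter will exclude, the names can be extracted from the file with a few lines of shell. This is a minimal sketch, assuming the layout shown above (one region name per line, with EXCLUDE on a line of its own). The function name list_filtered_regions and the sample file are illustrative and not part of Score-P.

```shell
#!/bin/sh
# Print the region names listed in a Score-P filter file.
# Assumption: one region name per line between the
# SCOREP_REGION_NAMES_BEGIN and SCOREP_REGION_NAMES_END markers.
list_filtered_regions() {
    awk '
        /^SCOREP_REGION_NAMES_BEGIN/ { inside = 1; next }
        /^SCOREP_REGION_NAMES_END/   { inside = 0 }
        # Skip the EXCLUDE keyword line and blank lines, print the rest.
        inside && NF > 0 && $1 != "EXCLUDE" { print }
    ' "$1"
}

# Demo on a small sample file (contents copied from the excerpt above).
sample=$(mktemp)
cat > "$sample" <<'EOF'
SCOREP_REGION_NAMES_BEGIN
EXCLUDE
Atom::Atom()
Atom::~Atom()
SCOREP_REGION_NAMES_END
EOF
list_filtered_regions "$sample"
rm -f "$sample"
```

Run against a real scorep.filt, the same function would be called as list_filtered_regions scorep.filt.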

A script that repeats the identification of overly fine-grained regions for the miniMD application is available in

/projects/p_readex/ichec/test_apps/miniMD_10_ref_alpha_prototype/run_saf.sh

The three arguments to this script are: (1) the value passed to scorep-autofilter with -t, (2) the name of the miniMD input file, and (3) the number of MPI processes to create for the application.
For different applications, run_saf.sh can be reused by updating the line that executes the application. This script requires do_scorep_autofilter_single.sh, which is present in the same directory.

Manual Instrumentation

For the miniMD application, manually annotate ForceLJ::compute_halfneigh() and its parents Integrate::run() and main() as significant regions, as shown in the following files respectively:

/projects/p_readex/ichec/test_apps/miniMD_10_ref_alpha_prototype/force_lj.cpp
/projects/p_readex/ichec/test_apps/miniMD_10_ref_alpha_prototype/integrate.cpp
/projects/p_readex/ichec/test_apps/miniMD_10_ref_alpha_prototype/ljs.cpp

Design Time Analysis

Apply readex-dyn-detect

  1. The miniMD application with the manually annotated phase region is built for readex-dyn-detect as follows:
    make openmpi PREP="scorep --online-access --user --thread=none"
  2. When miniMD is run with in2.data as its input file and readex-dyn-detect is applied to the resulting tupled profile.cubex as follows, the function ForceLJ::compute_halfneigh() is identified as the significant region:
    readex-dyn-detect -t 0.001 -p INTEGRATE_RUN_LOOP -c 10 -v 10 -w 10 scorep-<xyz>/profile.cubex
    Here, -t 0.001 sets the region granularity to 1 ms. The option -p INTEGRATE_RUN_LOOP tells the tool which phase region to look for in the profile.cubex call tree. The three options -c 10 -v 10 -w 10 define the thresholds for the compute intensity variation (absolute value), the time deviation in % of the mean region time, and the weight of the region in %, that is, its execution time relative to the phase time.

A script to perform steps 2 and 3 for the miniMD application is available in

/projects/p_readex/ichec/test_apps/miniMD_10_ref_alpha_prototype/run_rdd.sh

For different applications, run_rdd.sh can be reused by updating the line that executes the application. It must be run from the directory containing the application’s executable, and it expects the filter file to be named scorep.filt.

The following lines are printed as part of the output by readex-dyn-detect for miniMD:

1 ...
2 Significant regions are:
4 void Comm::borders(Atom&)
5 void ForceLJ::compute_halfneigh(Atom&, Neighbor&, int) [with int EVFLAG = 0; int GHOST_NEWTON = 1]
6 void ForceLJ::compute_halfneigh(Atom&, Neighbor&, int) [with int EVFLAG = 1; int GHOST_NEWTON = 1]
7 void Neighbor::build(Atom&)
8
9
10 Significant region information
11 ==============================
12 Region name Min(t) Max(t) Time Dev.(%Reg) Ops/L3miss Weight(%Phase)
13
14 void Comm::borders(Atom&) 0.001 0.001 2.6 109 0
15
16 void ForceLJ::compute_hal 0.013 0.014 2.9 97 68
17
18 void ForceLJ::compute_hal 0.016 0.016 0.0 91 1
19
20 void Neighbor::build(Atom 0.047 0.048 0.7 332 23
21
22
23 Phase information
24 =================
25 Min Max Mean Dev.(% Phase) Dyn.(% Phase)
26
27 0.0138626 0.0664566 0.020337 72.731 258.612
28
29 ...
30
31 SUMMARY:
32 ========
33
34 Inter-phase dynamism due to variation of the execution time of phases
35
36 No intra-phase dynamism due to time variation
37
38 Intra-phase dynamism due to variation in the compute intensity of the following important significant regions
39
40 void ForceLJ::compute_halfneigh(Atom&, Neighbor&, int) [with int EVFLAG = 0; int GHOST_NEWTON = 1]
41
42 void Neighbor::build(Atom&)

The printed output above for the miniMD application can be divided into three parts:

First, lines 1 to 7 list the names of the significant regions found by the detection algorithm. For details of the algorithm, see deliverable D2.1.

Second, lines 10 to 29 show the profile statistics for the detected significant regions and the phase region. The significant region information gives the minimum and maximum execution time of each significant region, as well as the aggregated execution time for the region.
The Time Dev.(%Reg) column gives the time deviation in % with respect to the region’s mean execution time. The Ops/L3miss column gives the absolute compute intensity. The last column, Weight(%Phase), gives the region’s execution time relative to the phase time.

After that, the tool prints the statistics for the phase region. It shows the minimum, maximum, and mean execution time of the phase region, as well as the aggregated execution time for the phase. The Dev.(% Phase) column gives the time deviation with respect to the mean phase execution time. The last column, Dyn.(% Phase), gives the variation between the minimum and maximum execution time with respect to the mean execution time of the phase.
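
The Dyn.(% Phase) value can be reproduced from the other phase columns. The sketch below assumes that the column is computed as (Max - Min) / Mean * 100, which is an interpretation of the description above and not a documented formula. The helper name dyn_percent is hypothetical. With the miniMD values from the listing it gives the printed 258.612.

```shell
#!/bin/sh
# dyn_percent MIN MAX MEAN
# Spread between the slowest and fastest phase as a percentage of the
# mean phase time. Assumed formula: (MAX - MIN) / MEAN * 100.
dyn_percent() {
    awk -v min="$1" -v max="$2" -v mean="$3" \
        'BEGIN { printf "%.3f\n", (max - min) / mean * 100 }'
}

# Phase values taken from the miniMD output above (Min, Max, Mean).
dyn_percent 0.0138626 0.0664566 0.020337   # prints 258.612
```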

Finally, the tool prints the summary of the dynamism analysis. First, if the standard deviation (%) of the phase time is larger than the variation threshold -v, the tool reports inter-phase dynamism due to variation of the execution time of phases. Otherwise, the application has no inter-phase dynamism. For miniMD, the deviation is 72.731%, which is larger than the -v threshold of 10, so the tool detects inter-phase dynamism.

Next, the tool compares Weight(%Phase) with the threshold -w given by the user. If a significant region has enough weight (> -w) and its time deviation with respect to the region mean exceeds the time deviation threshold -v, the tool detects intra-phase dynamism due to time variation for that region. For miniMD, two significant regions have weights larger than the given threshold (> 10%):

void ForceLJ::compute_halfneigh(Atom&, Neighbor&, int) [ with int EVFLAG = 0; int GHOST_NEWTON = 1 ]
void Neighbor::build(Atom&)

But neither of them has a time deviation greater than 10%, so the tool does not detect intra-phase dynamism due to time deviation for miniMD.
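
The weight and time deviation test described above can be replayed on the numbers from the listing. The following is a sketch of that decision rule only, not the tool’s implementation: the "name;time_dev;weight" record format, the shortened region names, and the function names heavy_regions and intra_phase_time_candidates are all assumptions made for illustration.

```shell
#!/bin/sh
# heavy_regions W
# Read "name;time_dev;weight" records on stdin and print the regions
# whose weight (% of phase time) exceeds W.
heavy_regions() {
    awk -F';' -v w="$1" '$3 + 0 > w + 0 { print $1 }'
}

# intra_phase_time_candidates V W
# Print the regions whose weight exceeds W and whose time deviation
# (% of the region mean) exceeds V.
intra_phase_time_candidates() {
    awk -F';' -v v="$1" -v w="$2" \
        '$3 + 0 > w + 0 && $2 + 0 > v + 0 { print $1 }'
}

# Time Dev.(%Reg) and Weight(%Phase) from the miniMD output above.
# Region names are shortened for readability.
regions='Comm::borders;2.6;0
ForceLJ::compute_halfneigh [EVFLAG = 0];2.9;68
ForceLJ::compute_halfneigh [EVFLAG = 1];0.0;1
Neighbor::build;0.7;23'

echo "Regions with weight > 10%:"
printf '%s\n' "$regions" | heavy_regions 10

echo "Regions that also have time deviation > 10%:"
printf '%s\n' "$regions" | intra_phase_time_candidates 10 10
```

With -v 10 and -w 10, the first list contains the EVFLAG = 0 variant of ForceLJ::compute_halfneigh and Neighbor::build, and the second list is empty, which matches the summary line "No intra-phase dynamism due to time variation".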

Finally, the tool computes the variation of the compute intensity across the detected significant regions that have a minimum weight of 10%. For miniMD, this variation is larger than the compute intensity threshold -c, so the tool detects intra-phase dynamism due to variation in compute intensity and prints the names of the regions concerned.

Apply PTF

A script to perform steps 3-5 for the miniMD application is available in

/projects/p_readex/ichec/test_apps/miniMD_10_ref_alpha_prototype/run_ptf.sh

For different applications, run_ptf.sh can be reused by updating the command that runs the application in --apprun. This script must be run from the directory containing the application’s executable.

Runtime Application Tuning

Runtime Application Tuning can be performed by the Readex Runtime Library (RRL) using the following steps.

  1. The script “compile_for_plain.sh” is used to generate a binary without Score-P instrumentation.
  2. Both the plain benchmark binary generated in the first step and the binary compiled for PTF, using the script “compile_for_ptf.sh” or, in the case of manual tuning, “compile_for_ptf_manual”, are used at runtime for the RRL run.
  3. The benchmark can be run using the script “run_rrl.sh”. The run script can be regenerated for custom configurations: edit “rrl_tests.txt” to define the new test configuration, then execute the command “generate_plain_rrl_hdeem.sh rrl_tests.txt <number_of_repeat_runs_per_test>”, where “number_of_repeat_runs_per_test” is an integer specifying how many times to repeat each test. This generates a new “run_rrl.sh” script with the updated test configurations.
  4. The script “run_rrl.sh” performs the tests for the plain run and the RRL run and uses HDEEM for energy measurements. It takes as input the “tuning_model.json” generated by applying PTF. The outputs of the runs are written to “miniMD_plain_hdeem.out” and “miniMD_rrl_hdeem.out”, which contain the runtime and energy consumption of the plain runs and the RRL runs respectively.
  5. To use “sacct” instead of HDEEM for energy measurements, use the script “run_sacct_rrl_plain.sh”. It also performs the experiments for both the plain and the RRL run, and prints the “Average Plain Time”, “Average Plain Energy”, “Average RRL Time” and “Average RRL Energy” to the console.
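
Once the four averages are available, the effect of the RRL run relative to the plain run can be expressed as a percentage. The helper below is a minimal sketch and not part of the provided scripts. The name saving_percent is hypothetical, and the numbers in the example are placeholders, not measurements.

```shell
#!/bin/sh
# saving_percent PLAIN RRL
# Relative reduction of the RRL value compared with the plain value,
# in percent. A negative result means the RRL run was more expensive.
saving_percent() {
    awk -v plain="$1" -v rrl="$2" \
        'BEGIN { printf "%.1f\n", (plain - rrl) / plain * 100 }'
}

# Placeholder values for illustration only, not measured results.
saving_percent 200 180   # prints 10.0
```

The same function can be applied to the time and the energy averages separately, for example with “Average Plain Energy” and “Average RRL Energy” as its two arguments.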