Userguide
Profiling
Execution profiler
Description: produces a profiler_nodeX.txt file for each node, which gives the execution time of each task on that node and the amount of data it passes to its child tasks. These results are required in the next step by the HEFT algorithm.
Input: dag.txt, nodes.txt, DAG task files (task1.py, task2.py, ...), DAG input file (input.txt)
Output: profiler_nodeNUM.txt
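For intuition, the following is a minimal sketch of the kind of measurement the execution profiler records for each task: wall-clock execution time and the size of the data handed to child tasks. The helper and file names here are hypothetical and purely illustrative; the actual profiler is driven by the DAG task files listed above.

import os
import time

def profile_task(run_task, input_path, output_path):
    # Hypothetical helper: time one task and measure the size of its output,
    # i.e. the amount of data that would be passed on to its child tasks.
    start = time.time()
    run_task(input_path, output_path)
    elapsed = time.time() - start
    out_bytes = os.path.getsize(output_path)
    return elapsed, out_bytes

def dummy_task(inp, outp):
    # Stand-in task: simply copies its input file to its output file.
    with open(inp, "rb") as f, open(outp, "wb") as g:
        g.write(f.read())

with open("demo_input.txt", "wb") as f:
    f.write(b"x" * 1024)
print(profile_task(dummy_task, "demo_input.txt", "demo_output.txt"))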
How to run
- Case 1: the file scheduler.py will copy the app folder to each of the nodes and execute the docker commands. Inside the circe/docker_execution_profiler folder run the following command:
python3 scheduler.py
- Case 2: copy the app folder to each of the nodes using scp and, inside the app folder, run the following commands, where hostname is the name of the node (node1, node2, etc.):
docker build -t profilerimage .
docker run -h hostname profilerimage
- In both cases make sure that the command inside the file app/start.sh gives the details (IP, username and password) of your scheduler machine.
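If you prefer not to use scheduler.py, Case 2 can also be scripted by hand. The sketch below only illustrates the scp and docker steps above; it assumes passwordless ssh to hosts named node1, node2 and node3 and is not part of the repository.

import subprocess

nodes = ["node1", "node2", "node3"]  # hypothetical hostnames of the nodes
for host in nodes:
    # copy the app folder to the node
    subprocess.run(["scp", "-r", "app", host + ":~/app"], check=True)
    # build the profiler image and run it with the node's hostname
    subprocess.run(["ssh", host,
                    "cd ~/app && docker build -t profilerimage . && "
                    "docker run -h " + host + " profilerimage"], check=True)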
Central network profiler
Description: automatically schedules and logs communication information for all links between nodes in the network, giving the quadratic regression parameters of each link that represent the corresponding communication cost. These results are required in the next step by the HEFT algorithm.
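As a rough illustration of what these parameters capture, the snippet below fits transfer time as a quadratic function of file size with numpy.polyfit. The sample data is made up; the real profiler obtains its measurements from the scheduled transfers described below.

import numpy as np

# hypothetical measurements for one link: file size (KB) vs. transfer time (s)
sizes = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
times = np.array([0.09, 0.17, 0.34, 0.70, 1.45])

# fit time ~ a*size^2 + b*size + c; (a, b, c) are the link's regression parameters
a, b, c = np.polyfit(sizes, times, deg=2)
print("link parameters:", a, b, c)

# estimated communication cost of a 1000 KB transfer on this link
print("predicted time:", a * 1000**2 + b * 1000 + c)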
Input:
- File central.txt stores credential information of the central node:
CENTRAL IP    USERNAME    PASSWORD
IP0           USERNAME    PASSWORD
- File nodes.txt stores credential information of the nodes:
TAG      NODE (username@IP)    REGION
node1    username@IP1          LOC1
node2    username@IP2          LOC2
node3    username@IP3          LOC3
- File link_list.txt stores the links between nodes required to log the communication:
SOURCE (TAG)    DESTINATION (TAG)
node1           node2
node1           node3
node2           node1
node2           node3
node3           node1
node3           node2
Output: all quadratic regression parameters are stored in the local MongoDB on the central node.
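To inspect the stored parameters you can query MongoDB on the central node directly, for example with pymongo. The database and collection names below are placeholders; check the central profiler's configuration for the actual names.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")  # run this on the central node
db = client["central_network_profiler"]             # placeholder database name
for doc in db["quadratic_parameters"].find():       # placeholder collection name
    print(doc)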
How to run:
At the central network profiler:
- Install the required libraries:
./central_init
- Inside the folder central, add input information about the nodes and the links.
- Generate the scheduling files for each node, prepare the central database and collection, copy the scheduling information and network scripts to each node in the node list, and schedule updating the central database every 10th minute:
python3 central_scheduler.py
At the droplets:
- The central network profiler copies all required scheduling files and network scripts to the online profiler folder in each droplet.
- Install the required libraries:
./droplet_init
- Generate files of different sizes to prepare for the logging measurements, generate the droplet database, and schedule a logging measurement every minute and a logging regression every 10th minute (these parameters can be changed as needed):
python3 automate_droplet.py
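The every-minute and every-10th-minute cadence can be expressed with a scheduler library such as APScheduler. The sketch below is only an illustration of that schedule; the function bodies are placeholders, not the scripts shipped with the profiler.

from apscheduler.schedulers.blocking import BlockingScheduler

def log_measurement():
    # placeholder: perform one round of transfer measurements
    print("logging measurement")

def log_regression():
    # placeholder: refit the quadratic regression parameters
    print("logging regression")

sched = BlockingScheduler()
sched.add_job(log_measurement, "cron", minute="*")    # every minute
sched.add_job(log_regression, "cron", minute="*/10")  # every 10th minute
sched.start()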
System resource profiler
Description: This resource profiler gets system utilization from all the nodes in the system. This information is then sent to the home node and stored in MongoDB.
Output: The information includes the IP address, CPU utilization, and memory utilization of each node, and the latest update time.
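For a sense of what such a report looks like, the sketch below collects the same fields with psutil and writes them to MongoDB on the home node. The host address, database and collection names are placeholders, not the profiler's actual configuration.

import datetime
import socket

import psutil
from pymongo import MongoClient

report = {
    "ip": socket.gethostbyname(socket.gethostname()),  # this node's IP address
    "cpu": psutil.cpu_percent(interval=1),              # CPU utilization in percent
    "memory": psutil.virtual_memory().percent,          # memory utilization in percent
    "last_update": datetime.datetime.utcnow(),          # latest update time
}

client = MongoClient("mongodb://HOME_NODE_IP:27017/")  # placeholder home node address
client["resource_profiler"]["utilization"].insert_one(report)
print(report)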
How to run:
For working nodes:
- Copy the Resource_Profiler_server folder to each working node using scp.
- In each node:
python2 Resource_Profiler_server/install_package.py
For the scheduler node:
- Copy the Resource_Profiler_control folder to the home node using scp.
- If a node's IP address changes, just update the Resource_Profiler_control/ip_path file.
- Optional: inside the Resource_Profiler_control folder:
python2 install_package.py
python2 jobs.py &
Note: the content of ip_path is several lines of working nodes' IP addresses. So if a node's IP address changes, make sure to update the ip_path file.
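For example, an ip_path file for two working nodes would simply look like this (the addresses are illustrative):

192.168.1.101
192.168.1.102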
HEFT
Description: This HEFT implementation has been adapted/modified from [2].
Input: The HEFT implementation takes a file in .tgff format, which describes the DAG and its various costs, as input. The first step is to construct this file (input.tgff) from the input files dag.txt and profiler_nodeNUM.txt. From the circe/heft/ folder execute:
python write_input_file.py
HEFT algorithm: This is the scheduling algorithm that decides where to run each task. It writes its output to a configuration file, needed in the next step by the run-time centralized scheduler. The algorithm takes input.tgff as input and outputs the scheduling file configuration.txt. From circe/heft/ run:
python main.py
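Independently of the code in circe/heft/, the idea behind HEFT can be summarized in a few lines: order tasks by their upward rank (the longest average-cost path to the exit task) and greedily place each task on the node that finishes it earliest. The sketch below is a simplified illustration with made-up costs and no insertion-based scheduling; it is not the implementation used here.

# Simplified HEFT sketch: upward ranks + earliest-finish-time placement.
def heft(succ, comp, comm, nodes):
    # succ: task -> list of child tasks; comp: task -> {node: compute cost}
    # comm: (task, child) -> transfer cost when parent and child run on different nodes
    rank = {}
    def upward_rank(t):
        if t not in rank:
            avg = sum(comp[t].values()) / len(nodes)
            rank[t] = avg + max((comm[(t, c)] + upward_rank(c) for c in succ[t]),
                                default=0.0)
        return rank[t]
    for t in comp:
        upward_rank(t)

    node_free = {n: 0.0 for n in nodes}              # when each node becomes free
    finish, placed = {}, {}
    for t in sorted(comp, key=lambda x: -rank[x]):   # decreasing upward rank
        best = None
        for n in nodes:
            # earliest time all of t's inputs can be available on node n
            ready = max((finish[p] + (comm[(p, t)] if placed[p] != n else 0.0)
                         for p in comp if t in succ[p]), default=0.0)
            eft = max(ready, node_free[n]) + comp[t][n]
            if best is None or eft < best[0]:
                best = (eft, n)
        finish[t], placed[t] = best
        node_free[best[1]] = best[0]
    return placed, finish

# Tiny made-up example: t0 feeds t1 and t2, which both feed t3.
succ = {"t0": ["t1", "t2"], "t1": ["t3"], "t2": ["t3"], "t3": []}
comp = {"t0": {"n1": 3, "n2": 4}, "t1": {"n1": 5, "n2": 3},
        "t2": {"n1": 4, "n2": 6}, "t3": {"n1": 2, "n2": 2}}
comm = {("t0", "t1"): 1, ("t0", "t2"): 2, ("t1", "t3"): 1, ("t2", "t3"): 1}
print(heft(succ, comp, comm, ["n1", "n2"]))

Running it prints a task-to-node placement and the finish times, which corresponds to the kind of task-to-node mapping that configuration.txt provides.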
Centralized scheduler with profiler
Centralized run-time scheduler. This is the run-time scheduler. It takes the configuration file configuration.txt given by HEFT and the node information nodes.txt, orchestrates the execution of tasks on the given nodes, and outputs the DAG output files in the circe/centralized_scheduler/output/ folder. Inside the circe/centralized_scheduler folder run:
python3 scheduler.py
Wait several seconds and move input1.txt to the circe/centralized_scheduler/input/ folder (repeat the same for the other input files).
Stopping the centralized run-time scheduler. Run:
python3 removeprocesses.py
This script will ssh into every node and kill the running processes, and also kill the process on the master node.
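For reference, the stop step can be reproduced manually along the following lines; this assumes passwordless ssh and a process pattern you would need to fill in, and may differ from what removeprocesses.py actually does.

import subprocess

nodes = ["node1", "node2", "node3"]  # hypothetical hostnames from nodes.txt
for host in nodes:
    # kill the task processes that the scheduler started on this node;
    # replace TASK_PATTERN with whatever identifies those processes
    subprocess.run(["ssh", host, "pkill -f TASK_PATTERN || true"])
# finally stop the scheduler process on the master node itself
subprocess.run(["pkill", "-f", "scheduler.py"])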
If network conditions change, one might want to restart the whole application. This can be done by running:
python3 remove_and_restart.py
The first part of the script stops the system as described above. It then runs HEFT and restarts the centralized run-time scheduler with the new task-node mapping.