Userguide

Profiling

Execution profiler

  • Description: produces a profiler_nodeNUM.txt file for each node, which gives the execution time of each task on that node and the amount of data it passes to its child tasks. These results are required in the next step by the HEFT algorithm.

  • Input: dag.txt, nodes.txt, DAG task files (task1.py, task2.py,… ), DAG input file (input.txt)

  • Output: profiler_nodeNUM.txt

  • How to run:

    • Case 1: the file scheduler.py copies the app folder to each of the nodes and executes the Docker commands (a sketch of this flow appears at the end of this subsection). Inside the circe/docker_execution_profiler folder, run the following command:
    python3 scheduler.py
    
    • Case 2: copy the app folder to each of the nodes using scp and, inside the app folder, run the following commands, where hostname is the name of the node (node1, node2, etc.):
    docker build -t profilerimage .
    docker run -h hostname profilerimage
    
    • In both cases, make sure that the command inside the file app/start.sh gives the details (IP, username and password) of your scheduler machine.
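  • Example (illustrative): a minimal sketch of the Case 1 flow, assuming passwordless SSH/scp access to each node and that nodes.txt follows the "TAG username@IP REGION" layout shown later in this guide; the parsing and helper logic here are assumptions, only the Docker commands come from this guide.
    import subprocess

    # Assumed nodes.txt layout: "TAG username@IP REGION" (one node per line).
    nodes = []
    with open("nodes.txt") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2 and parts[0] != "TAG":   # skip the header row
                nodes.append((parts[0], parts[1]))      # (tag, username@IP)

    for tag, addr in nodes:
        # Copy the app folder to the node, then build and run the profiler
        # image, mirroring the Case 2 commands above.
        subprocess.run(["scp", "-r", "app", addr + ":~/app"], check=True)
        subprocess.run(["ssh", addr,
                        "cd ~/app && docker build -t profilerimage . "
                        "&& docker run -h " + tag + " profilerimage"], check=True)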

Central network profiler

  • Description: automatically schedules and logs communication information for all links between nodes in the network, and computes the quadratic regression parameters of each link, which represent the corresponding communication cost (a sketch of the quadratic fit appears at the end of this subsection). These results are required in the next step by the HEFT algorithm.

  • Input:

    • File central.txt stores credential information of the central node
    CENTRAL IP USERNAME PASSWORD
    IP0 USERNAME PASSWORD
    • File nodes.txt stores credential information of the nodes
    TAG NODE (username@IP) REGION
    node1 username@IP1 LOC1
    node2 username@IP2 LOC2
    node3 username@IP3 LOC3
    • File link_list.txt stores the links between nodes on which communication is to be logged
    SOURCE(TAG) DESTINATION(TAG)
    node1 node2
    node1 node3
    node2 node1
    node2 node3
    node3 node1
    node3 node2
  • Output: all quadratic regression parameters are stored in the local MongoDB on the central node.

  • How to run:
    • At the central network profiler:

      • Install required libraries:
      ./central_init
      
      • Inside the folder central, add information about the nodes and the links.
      • Generate the scheduling files for each node, prepare the central database and collection, copy the scheduling information and network scripts to each node in the node list, and schedule an update of the central database every 10 minutes:
      python3 central_scheduler.py
      
    • At the droplets:

      • The central network profiler copies all required scheduling files and network scripts to the online profiler folder in each droplet.
      • Install required libraries:
      ./droplet_init
      
      • Generate files of different sizes to prepare for the logging measurements, generate the droplet database, and schedule a logging measurement every minute and a logging regression every 10 minutes. (These parameters can be changed as needed.)
      python3 automate_droplet.py
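  • Example (illustrative): a minimal sketch of the quadratic fit described above, assuming each link's log is a set of (file size, transfer time) measurements; the sample data, the library choices (numpy, pymongo) and the database/collection/field names are assumptions, only the idea of storing per-link quadratic regression parameters in the central MongoDB comes from this guide.
    import numpy as np
    from pymongo import MongoClient

    # Hypothetical measurements for one link: file size (KB) vs. transfer time (s).
    sizes = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
    times = np.array([0.09, 0.16, 0.31, 0.62, 1.30])

    # Quadratic regression: time ~ a*size**2 + b*size + c.
    a, b, c = np.polyfit(sizes, times, deg=2)

    # Store the parameters for this link in the central MongoDB.
    client = MongoClient("mongodb://localhost:27017/")
    client["network_profiler"]["quadratic_parameters"].insert_one(
        {"source": "node1", "destination": "node2",
         "a": float(a), "b": float(b), "c": float(c)})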
      

System resource profiler

  • Description: The Resource Profiler collects system utilization from all the nodes in the system. This information is then sent to the home node and stored in MongoDB (a sketch of the stored record appears at the end of this subsection).

  • Output: The information includes the IP address of each node, the CPU utilization of each node, the memory utilization of each node, and the latest update time.

  • How to run:

    • For working nodes:

      • copy the Resource_Profiler_server folder to each working node using scp.
      • In each node:
      python2 Resource_Profiler_server/install_package.py
      
    • For scheduler node:

      • copy the Resource_Profiler_control folder to the home node using scp.
      • if a node’s IP address changes, just update the Resource_Profiler_control/ip_path file
      • optional: inside the Resource_Profiler_control folder:
      python2 install_package.py
      python2 jobs.py &
      
    • Note: the content of ip_path is several lines, one working node's IP address per line. So if a node's IP address changes, make sure to update the ip_path file.
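  • Example (illustrative): a minimal sketch of the record described above, assuming psutil is used to sample utilization and pymongo to store it on the home node; the field names, the database/collection names and the HOME_NODE_IP placeholder are assumptions, only the listed fields (IP address, CPU utilization, memory utilization, latest update time) come from this guide.
    import datetime
    import socket

    import psutil
    from pymongo import MongoClient

    # Sample utilization on a working node (CPU averaged over one second).
    record = {
        "ip": socket.gethostbyname(socket.gethostname()),
        "cpu": psutil.cpu_percent(interval=1),        # percent
        "memory": psutil.virtual_memory().percent,    # percent
        "last_update": datetime.datetime.utcnow(),
    }

    # Upsert the record into MongoDB on the home node.
    client = MongoClient("mongodb://HOME_NODE_IP:27017/")
    client["resource_profiler"]["utilization"].replace_one(
        {"ip": record["ip"]}, record, upsert=True)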

HEFT

  • Description: This HEFT implementation has been adapted/modified from [2].

  • Input: The HEFT implementation takes a file in .tgff format, which describes the DAG and its various costs, as input. The first step is to construct this file (input.tgff) from the input files dag.txt and profiler_nodeNUM.txt. From the circe/heft/ folder execute:

    python write_input_file.py
    
  • HEFT algorithm: This is the scheduling algorithm that decides where to run each task (its task-ranking step is sketched at the end of this subsection). It writes its output to a configuration file, needed in the next step by the run-time centralized scheduler. The algorithm takes input.tgff as input and outputs the scheduling file configuration.txt. From circe/heft/ run:

    python main.py
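  • Example (illustrative): a minimal sketch of HEFT's upward-rank step, which determines the order in which tasks are scheduled; the DAG, the costs and the function name are made up for illustration and are not taken from this implementation or from input.tgff.
    # Hypothetical DAG: task -> list of (child task, average communication cost).
    dag = {
        "task1": [("task2", 4), ("task3", 6)],
        "task2": [("task4", 2)],
        "task3": [("task4", 3)],
        "task4": [],
    }
    # Hypothetical average execution cost of each task over all nodes.
    avg_cost = {"task1": 10, "task2": 7, "task3": 5, "task4": 8}

    def upward_rank(task):
        # rank_u(t) = avg_cost(t) + max over children c of (comm(t, c) + rank_u(c))
        children = dag[task]
        if not children:
            return avg_cost[task]
        return avg_cost[task] + max(comm + upward_rank(c) for c, comm in children)

    # HEFT schedules tasks in decreasing order of upward rank and then places
    # each task on the node that gives it the earliest finish time.
    print(sorted(dag, key=upward_rank, reverse=True))   # ['task1', 'task2', 'task3', 'task4']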
    

Centralized scheduler with profiler

  • Centralized run-time scheduler: This is the run-time scheduler. It takes the configuration file configuration.txt produced by HEFT and the node information nodes.txt, orchestrates the execution of tasks on the given nodes, and writes the DAG output files to the circe/centralized_scheduler/output/ folder. Inside the circe/centralized_scheduler folder run:

    python3 scheduler.py
    
  • Wait several seconds and move input1.txt to the circe/centralized_scheduler/input/ folder (repeat the same for other input files).

  • Stopping the centralized run-time scheduler. Run:

    python3 removeprocesses.py
    

    This script will ssh into every node and kill the running processes, and then kill the process on the master node.

  • If network conditions change, one might want to restart the whole application. This can be done by running:

    python3 remove_and_restart.py
    

    The first part of the script stops the system as described above. It then runs HEFT and restarts the centralized run-time scheduler with the new task-node mapping.
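  • Example (illustrative): a rough sketch of the restart flow described above, expressed with the commands already listed in this guide; the actual internals of remove_and_restart.py may differ, and the relative path ../heft assumes the script runs from circe/centralized_scheduler.
    import subprocess

    # Stop the running system: ssh into every node, kill the running processes,
    # then kill the process on the master node (what removeprocesses.py does).
    subprocess.run(["python3", "removeprocesses.py"], check=True)

    # Re-run HEFT with the current profiling data to obtain a new task-node mapping.
    subprocess.run(["python", "write_input_file.py"], cwd="../heft", check=True)
    subprocess.run(["python", "main.py"], cwd="../heft", check=True)

    # Restart the centralized run-time scheduler with the new configuration.txt;
    # launched in the background because it keeps running.
    subprocess.Popen(["python3", "scheduler.py"])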

Run-time task profiler