Introduction

See the RFE Automatic RunToolServer roster at SourceForge.

Roster Server Network Protocol

Procedures for the RunToolServer

  1. RunToolServer startup / heartbeat
    • arguments:
      • host information
      • port number
  2. RunToolServer shutdown
    • arguments:
      • port number
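
The two RunToolServer calls above could be modelled roughly as follows. This is only a sketch: the class names, the host_info fields, and the idea that the roster takes the host name from the calling connection (shutdown carries only a port number) are assumptions, not part of the design above.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Startup / heartbeat call: host information plus the listening port."""
    host_info: dict  # hypothetical contents, e.g. {"sysname": "Linux"}
    port: int


@dataclass
class Shutdown:
    """Shutdown call: only the port; the host is assumed known from the caller."""
    port: int


class RegistrationTable:
    """Hypothetical roster-side bookkeeping for the two calls."""

    def __init__(self):
        self.all_hosts = {}  # (host, port) -> host_info

    def heartbeat(self, caller_host, msg):
        # A startup and a heartbeat are treated identically: (re)register the server.
        self.all_hosts[(caller_host, msg.port)] = msg.host_info

    def shutdown(self, caller_host, msg):
        # Forget the server; ignore shutdowns from servers never registered.
        self.all_hosts.pop((caller_host, msg.port), None)
```

Treating startup and heartbeat as one call means a roster server that restarts and loses its state would relearn the pool within one heartbeat interval.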

Procedures for the Evaluator

  1. Get host list
    • arguments:
      • platform selection criteria
    • results:
      • list of host/port pairs
  2. Report bad host
    • arguments:
      • host/port pair
      • reason (text string, error message)

Procedures for Administrative Utilities

  1. Add to black-list
    • arguments:
      • host/port pair
      • comment
  2. Remove from black-list
    • arguments:
      • host/port pair
  3. Get black-list
    • results:
      • list of
        • host/port pair
        • comment
        • timestamp
  4. Get full host list
    • arguments:
      • bool: send host info
    • results:
      • list of
        • host/port pair
        • host information (if requested)
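
As an illustration of the black-list calls above, a minimal in-memory sketch might look like this. The class and method names are hypothetical, and the assumption that the timestamp is recorded by the roster server when the entry is added is not stated in the design.

```python
import time
from dataclasses import dataclass


@dataclass
class BlackListEntry:
    comment: str
    timestamp: float  # assumed: seconds since the epoch, set when the entry is added


class BlackList:
    def __init__(self, clock=time.time):
        self._entries = {}   # (host, port) -> BlackListEntry
        self._clock = clock  # injectable so the behaviour can be tested

    def add(self, host, port, comment):
        self._entries[(host, port)] = BlackListEntry(comment, self._clock())

    def remove(self, host, port):
        """Return True if the pair was black-listed, False otherwise."""
        return self._entries.pop((host, port), None) is not None

    def get(self):
        """Return a list of (host, port, comment, timestamp) tuples."""
        return [(h, p, e.comment, e.timestamp)
                for (h, p), e in self._entries.items()]
```

The Evaluator's "report bad host" call could plausibly reuse add(), with the reason string stored as the comment.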

RunToolServer implementation details

Sending a heartbeat requires a background thread and a global bool.

Randomize the heartbeat sleep duration by some fraction of the configured amount (for example 10% or 50%). This should keep many heartbeats from reaching the roster server at the same time.
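
One way to randomize the sleep duration is sketched below. The function name and the choice of a symmetric, uniformly distributed offset are assumptions; the note above only says that the duration should vary by some fraction of the configured amount.

```python
import random


def jittered_interval(base_seconds, fraction=0.1, rng=random):
    """Return base_seconds perturbed by up to +/- fraction of itself."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    return base_seconds * (1 + rng.uniform(-fraction, fraction))
```

With a 60 second heartbeat and fraction=0.1, each sleep falls between 54 and 66 seconds, so servers started at the same moment drift apart after a few cycles.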

A configuration file setting will specify the host/port of the roster server. If no roster server is configured, the RunToolServer won't send the startup, heartbeat, or shutdown RPCs, and it won't need to start the background heartbeat thread.
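
The heartbeat thread and the "no roster configured" case could be sketched like this. Everything here is hypothetical: the class name, the send callback standing in for the real RPCs, and the use of a threading.Event in place of the global bool mentioned earlier.

```python
import threading


class Heartbeater:
    def __init__(self, roster_addr, send, interval):
        self.roster_addr = roster_addr  # None means no roster server is configured
        self._send = send               # hypothetical callback that performs one RPC
        self._interval = interval       # seconds between heartbeats
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        """Send the startup call and start the thread; do nothing without a roster."""
        if self.roster_addr is None:
            return False
        self._send("startup")
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return True

    def _run(self):
        # Event.wait returns False on timeout and True once stop() has been called.
        while not self._stop.wait(self._interval):
            self._send("heartbeat")

    def stop(self):
        """Stop the thread, then send the shutdown call (only if start() ran)."""
        if self._thread is None:
            return
        self._stop.set()
        self._thread.join()
        self._send("shutdown")
```

Waiting on the event rather than sleeping lets stop() interrupt a long heartbeat interval immediately.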

To send the shutdown RPC properly, we need some way to tell a RunToolServer to exit cleanly (after waiting for any running tools to finish). While waiting for tools to finish, the RunToolServer would have to refuse any new incoming tool requests.
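
The "refuse new requests while waiting for running tools" behaviour could be sketched with a small gate object. The names below are hypothetical and the sketch ignores how the exit request itself reaches the RunToolServer.

```python
import threading


class ToolGate:
    """Tracks running tools and refuses new ones once draining has begun."""

    def __init__(self):
        self._cond = threading.Condition()
        self._running = 0
        self._draining = False

    def try_begin(self):
        """Call before running a tool; returns False if the server is exiting."""
        with self._cond:
            if self._draining:
                return False
            self._running += 1
            return True

    def end(self):
        """Call when a tool finishes."""
        with self._cond:
            self._running -= 1
            self._cond.notify_all()

    def drain(self, timeout=None):
        """Refuse new tools and wait for running ones; True once none remain."""
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._running == 0, timeout)
```

Once drain() returns True, the server can send the shutdown RPC and exit.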

Evaluator implementation details

If no roster server is configured, just use the host lists as we do currently.

If there is a roster server configured:

Roster server implementation

Table<host_port, host_info> all_hosts;

Table<host_port, black_list_info> black_list;

Table<platform_criteria, platform_hosts> requested_platforms;

Update requested_platforms on any RunToolServer startup/heartbeat call. This way the roster server can answer the "host list" call without having to match every host against the criteria each time. Since evaluators tend to ask for platforms that have been asked for before, this should be efficient.
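
A rough sketch of how the three tables might interact is given below. The matches predicate, the requirement that criteria be hashable, and the decision to filter black-listed hosts at query time are all assumptions made for illustration.

```python
class RosterTables:
    def __init__(self, matches):
        self.matches = matches         # hypothetical predicate: (criteria, host_info) -> bool
        self.all_hosts = {}            # host_port -> host_info
        self.black_list = {}           # host_port -> black_list_info
        self.requested_platforms = {}  # criteria -> set of matching host_port

    def heartbeat(self, host_port, host_info):
        """Record the host and refresh every cached criteria entry for it."""
        self.all_hosts[host_port] = host_info
        for criteria, hosts in self.requested_platforms.items():
            if self.matches(criteria, host_info):
                hosts.add(host_port)
            else:
                hosts.discard(host_port)

    def get_host_list(self, criteria):
        """Answer from the cache; scan all hosts only for criteria never seen before."""
        hosts = self.requested_platforms.get(criteria)
        if hosts is None:
            hosts = {hp for hp, info in self.all_hosts.items()
                     if self.matches(criteria, info)}
            self.requested_platforms[criteria] = hosts
        # Assumed: black-listed hosts are filtered out when answering.
        return sorted(hp for hp in hosts if hp not in self.black_list)
```

Each heartbeat costs one match per previously requested criteria, while a repeated "host list" call costs no matching at all.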

Need config file settings for where to store the host list and black list on disk.

Have a background thread for writing the black list and host list to disk.
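
The background writer could look something like the sketch below. The JSON format, the class name, and the write-to-temporary-file-then-rename step are assumptions; the note above only asks for a thread that writes the tables to configured locations.

```python
import json
import os
import tempfile
import threading


class TableSaver:
    def __init__(self, path, snapshot, interval):
        self._path = path          # from the (hypothetical) config file setting
        self._snapshot = snapshot  # callable returning a JSON-serializable copy
        self._interval = interval  # seconds between writes
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def save_now(self):
        """Write to a temporary file, then rename it over the old copy."""
        data = self._snapshot()
        directory = os.path.dirname(os.path.abspath(self._path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, self._path)

    def _run(self):
        while not self._stop.wait(self._interval):
            self.save_now()

    def start(self):
        self._thread.start()

    def stop(self):
        """Stop the thread and write one final copy."""
        self._stop.set()
        self._thread.join()
        self.save_now()
```

Renaming a complete temporary file into place means a crash in the middle of a write leaves the previous copy intact.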

Related Issues

More ways to partition the pool of RunTool machines

To partition a large pool of RunToolServers, we may want to add a "comment" field and the ability to match a platform on it.

ScottVenier suggests adding arbitrarily many name/value pairs to match on:

DaveAndrews suggests partitioning based on time of day (e.g. don't use workstations during the day, but OK to use them at night).
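
To make the partitioning ideas above concrete, here is one possible shape for them. This is a guess at how they might work, not a description of either suggestion: the function names, the exact-equality matching rule, and the single daily busy window that does not cross midnight are all assumptions.

```python
from datetime import time


def matches(criteria, attrs):
    """True if every name/value pair in criteria is present in the host's attrs."""
    return all(attrs.get(name) == value for name, value in criteria.items())


def available_at(now, busy_start, busy_end):
    """True if now falls outside the host's busy window [busy_start, busy_end)."""
    return not (busy_start <= now < busy_end)
```

For example, a workstation with a busy window of 08:00 to 18:00 would be offered to evaluators at 23:00 but not at noon, and a host tagged with a hypothetical pool attribute would match only requests naming that pool.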

SRPC Port Choice

The roster server could eliminate the need for having the RunToolServer on a known port. Let the OS assign a port, and have the RunToolServer tell the roster the port when it registers.

This could make things tricky for some administrative tasks.
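
Letting the OS assign the port can be done by binding to port 0 and then asking the socket which port it received. The sketch below uses plain TCP sockets to show the idea; the function name is hypothetical and the real server would do this through its SRPC layer.

```python
import socket


def listen_on_any_port(host="127.0.0.1"):
    """Bind to an OS-assigned port and return (socket, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS to pick a free port
    sock.listen()
    port = sock.getsockname()[1]
    return sock, port
```

The returned port is what the RunToolServer would report in its startup / heartbeat call.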