Introduction
See the RFE Automatic RunToolServer roster at SourceForge.
Roster Server Network Protocol
Procedures for the RunToolServer
RunToolServer startup / heartbeat
- arguments:
- host information
- port number
RunToolServer shutdown
- arguments:
- port number
Procedures for the Evaluator
- Get host list
- arguments:
- platform selection criteria
- results:
- list of host/port pairs
- Report bad host
- arguments:
- host/port pair
- reason (text string, error message)
Procedures for Administrative Utilities
- Administrator add to black-list
- arguments:
- host/port pair
- comment
- remove from black-list
- arguments:
- host/port pair
- get black-list
- results:
- list of
- host/port pair
- comment
- timestamp
- get full host list
- arguments:
- bool: send host info
- results:
- list of
- host/port pair
- host information (if requested)
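As a rough illustration, the procedures outlined above could be written down as a C++ interface. This is only a sketch: the type names (HostPort, BlackListEntry, HostEntry, RosterProtocol), the method names, and the choice of plain strings for host information and platform criteria are all assumptions made for the example. The outline names the fields but not their encoding or the SRPC procedure numbers.

```cpp
#include <ctime>
#include <string>
#include <vector>

// Hypothetical types; the outline only names the fields, not their encoding.
struct HostPort {
    std::string host;
    unsigned short port;
};

inline bool operator==(const HostPort& a, const HostPort& b) {
    return a.host == b.host && a.port == b.port;
}

struct BlackListEntry {
    HostPort where;
    std::string comment;
    std::time_t timestamp;
};

struct HostEntry {
    HostPort where;
    std::string host_info;  // left empty unless the caller asked for host info
};

// One pure virtual method per procedure in the outline.
class RosterProtocol {
public:
    virtual ~RosterProtocol() {}

    // Procedures for the RunToolServer
    virtual void startup_or_heartbeat(const std::string& host_info,
                                      unsigned short port) = 0;
    virtual void shutdown(unsigned short port) = 0;

    // Procedures for the Evaluator
    virtual std::vector<HostPort> get_host_list(
        const std::string& platform_criteria) = 0;
    virtual void report_bad_host(const HostPort& where,
                                 const std::string& reason) = 0;

    // Procedures for administrative utilities
    virtual void add_to_black_list(const HostPort& where,
                                   const std::string& comment) = 0;
    virtual void remove_from_black_list(const HostPort& where) = 0;
    virtual std::vector<BlackListEntry> get_black_list() = 0;
    virtual std::vector<HostEntry> get_full_host_list(bool send_host_info) = 0;
};
```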
RunToolServer implementation details
Sending a heartbeat requires a background thread and a global bool
- set global bool to true on any tool invocation
- heartbeat thread sets global bool to false before going to sleep
- if global bool is false when heartbeat thread wakes up, send heartbeat to roster server
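The flag handling above can be sketched with a std::atomic<bool>. The names tool_invoked_recently, note_tool_invocation, and heartbeat_needed are invented for this example. The sketch also folds "check on wake-up" and "clear before sleeping" into a single atomic exchange, which is a small simplification of the steps listed above.

```cpp
#include <atomic>

// The "global bool": true means a tool was invoked since the last wake-up.
std::atomic<bool> tool_invoked_recently(false);

// Call on every tool invocation.
void note_tool_invocation() {
    tool_invoked_recently.store(true);
}

// Call each time the heartbeat thread wakes up. Returns true when a
// heartbeat should be sent, and clears the flag for the next sleep period.
bool heartbeat_needed() {
    // exchange() returns the old value and writes false in one atomic step.
    bool was_busy = tool_invoked_recently.exchange(false);
    return !was_busy;
}
```

The heartbeat thread would loop over: sleep, call heartbeat_needed(), and send the heartbeat RPC if it returns true.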
Randomize the heartbeat sleep duration by some fraction of the configured amount (perhaps 10%, perhaps 50%). This should keep many heartbeats from reaching the roster server at the same time.
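One possible way to compute the randomized sleep is shown below. It assumes the jitter is applied symmetrically around the configured duration and drawn uniformly; neither detail is specified here. The function name jittered_sleep and the fraction parameter are hypothetical.

```cpp
#include <chrono>
#include <random>

// Returns a sleep duration drawn uniformly from roughly
// [base * (1 - fraction), base * (1 + fraction)].
// `fraction` is a hypothetical jitter setting, e.g. 0.1 for 10%.
std::chrono::milliseconds jittered_sleep(std::chrono::milliseconds base,
                                         double fraction,
                                         std::mt19937& rng) {
    std::uniform_real_distribution<double> scale(1.0 - fraction,
                                                 1.0 + fraction);
    double ms = static_cast<double>(base.count()) * scale(rng);
    return std::chrono::milliseconds(static_cast<long long>(ms));
}
```

Each RunToolServer would seed its own generator (for example from std::random_device) so that servers started at the same moment drift apart.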
There will be some configuration file setting specifying the host/port of the roster server. If no roster server is configured, the RunToolServer won't send the startup, heartbeat, or shutdown RPCs. It won't need to start the background heartbeat thread.
To send the shutdown RPC properly, we need some way to tell a RunToolServer to exit cleanly (after waiting for any running tools to finish). While waiting for tools to finish, the RunToolServer would have to refuse any new incoming tool requests.
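A minimal sketch of the "refuse new requests, wait for running tools" behaviour follows. The class name ToolGate and its methods are invented for illustration, and how the exit request reaches the RunToolServer (a signal, an administrative RPC, or something else) is left open.

```cpp
#include <condition_variable>
#include <mutex>

// Tracks running tools and refuses new ones once shutdown has begun.
class ToolGate {
public:
    ToolGate() : running_(0), draining_(false) {}

    // Returns false if the server is shutting down; the caller should
    // then reject the incoming tool request.
    bool try_begin_tool() {
        std::lock_guard<std::mutex> lock(mu_);
        if (draining_) return false;
        ++running_;
        return true;
    }

    // Call when a tool admitted by try_begin_tool() finishes.
    void end_tool() {
        std::lock_guard<std::mutex> lock(mu_);
        --running_;
        if (running_ == 0) idle_.notify_all();
    }

    // Stop admitting tools, then block until the running ones finish.
    // The caller would send the shutdown RPC after this returns.
    void drain() {
        std::unique_lock<std::mutex> lock(mu_);
        draining_ = true;
        idle_.wait(lock, [this] { return running_ == 0; });
    }

private:
    std::mutex mu_;
    std::condition_variable idle_;
    int running_;
    bool draining_;
};
```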
Evaluator implementation details
If no roster server is configured, just use the hosts lists like we do currently.
If there is a roster server configured:
- Call it to get the hosts list when we first need to run a tool on a particular platform
- Call it to report a bad RunToolServer if we get certain errors (connection refused, connection timed out, connection reset by peer)
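The error test could be as small as the function below. It assumes the failure reaches the evaluator as a POSIX errno value; if the RPC layer wraps these errors in its own failure codes, the mapping would have to go through those instead. The function name should_report_bad_host is hypothetical.

```cpp
#include <cerrno>

// Returns true for the connection errors listed above as grounds for
// reporting a RunToolServer to the roster server.
bool should_report_bad_host(int err) {
    switch (err) {
        case ECONNREFUSED:  // connection refused
        case ETIMEDOUT:     // connection timed out
        case ECONNRESET:    // connection reset by peer
            return true;
        default:
            return false;
    }
}
```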
Roster server implementation
Table<host_port, host_info> all_hosts;
Table<host_port, black_list_info> black_list;
- black_list_info
- comment
- timestamp
Table<platform_criteria, platform_hosts> requested_platforms;
- platform_criteria: matching criteria
- platform_hosts: host/port pairs that match the platform
Update requested_platforms on any RunToolServer startup/heartbeat call. This way the roster server can answer the "host list" call without having to match every host against the criteria each time. Since evaluators tend to ask for platforms that have been asked for before, this should be efficient.
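The caching idea can be sketched with standard containers standing in for Table. Everything here is illustrative: the Roster struct, the matches() helper, and the simplification that host_info is a single platform name compared for equality. Black-list handling is left out to keep the example short.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, unsigned short> HostPort;

// Hypothetical stand-in for real matching: host_info is treated as a plain
// platform name and the criteria must equal it exactly.
bool matches(const std::string& criteria, const std::string& host_info) {
    return criteria == host_info;
}

struct Roster {
    std::map<HostPort, std::string> all_hosts;  // host_port -> host_info
    std::map<std::string, std::set<HostPort> > requested_platforms;

    // Startup/heartbeat: record the host, then refresh every cached platform.
    void heartbeat(const HostPort& hp, const std::string& host_info) {
        all_hosts[hp] = host_info;
        for (auto& entry : requested_platforms) {
            if (matches(entry.first, host_info)) entry.second.insert(hp);
            else entry.second.erase(hp);
        }
    }

    // Host list: scan all hosts only the first time a criteria is seen;
    // later calls are answered from the cache.
    std::vector<HostPort> get_host_list(const std::string& criteria) {
        auto it = requested_platforms.find(criteria);
        if (it == requested_platforms.end()) {
            std::set<HostPort> found;
            for (const auto& host : all_hosts)
                if (matches(criteria, host.second)) found.insert(host.first);
            it = requested_platforms.insert(
                     std::make_pair(criteria, found)).first;
        }
        return std::vector<HostPort>(it->second.begin(), it->second.end());
    }
};
```

The cost of a heartbeat grows with the number of distinct criteria seen so far, which is the trade the text makes in exchange for cheap host-list calls.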
Need config file settings for where to store the host list and black list on disk.
Have a background thread for writing the black list and host list to disk.
- Go to sleep for some small amount of time (e.g. 1 minute).
- Has the host list/black list changed since the last time we wrote it to disk? If not, do nothing.
- Has the host list/black list changed since the last time we woke up (e.g. 1 minute ago)? If not, write it out to disk now (i.e. wait for it to stop changing for one sleep period).
- Has it been a "long" time (e.g. 10 minutes) since the first change after the last time we wrote it to disk? If so, write it to disk now (i.e. don't put off writing it to disk forever just because it keeps changing).
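The three checks above reduce to one decision made on each wake-up. The sketch below is one reading of them, with invented names (FlushState, should_write). It approximates "has not changed since we last woke up" by "the most recent change is at least one sleep period old".

```cpp
#include <chrono>

typedef std::chrono::steady_clock::time_point TimePoint;

struct FlushState {
    bool dirty;              // changed since the last write to disk?
    TimePoint first_change;  // first change after the last write
    TimePoint last_change;   // most recent change
};

// Decide, on each wake-up, whether to write the tables to disk now.
// `quiet` is the sleep period (e.g. 1 minute); `max_delay` is the "long"
// time (e.g. 10 minutes).
bool should_write(const FlushState& s, TimePoint now,
                  std::chrono::seconds quiet,
                  std::chrono::seconds max_delay) {
    if (!s.dirty) return false;                          // nothing new to save
    if (now - s.last_change >= quiet) return true;       // stopped changing
    if (now - s.first_change >= max_delay) return true;  // overdue
    return false;                                        // still changing; wait
}
```

After a successful write the thread would clear dirty, and the next change would set both first_change and last_change.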
Related Issues
More ways to partition the pool of RunTool machines
To partition a large pool of RunToolServers, we may want to add a "comment" field and the ability to match a platform on it
ScottVenier suggests adding arbitrarily many name/value pairs to match on:
- Project the machine belongs to
- Batch vs. interactive (i.e. don't send big jobs to machines used interactively)
DaveAndrews suggests partitioning based on time of day (i.e. don't use workstations during the day, but OK to use them at night)
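Matching on name/value pairs, as suggested above, might look like the following. This is a sketch under the assumption that a host matches when every requested pair is present with an equal value; the Attributes type and the attribute names used in the usage note are made up for the example. Time-of-day partitioning is not covered.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> Attributes;

// A host matches when it has every name in `criteria` with the same value.
// Extra attributes on the host are ignored.
bool matches(const Attributes& criteria, const Attributes& host) {
    for (const auto& want : criteria) {
        auto have = host.find(want.first);
        if (have == host.end() || have->second != want.second) return false;
    }
    return true;
}
```

For instance, a host advertising project=alpha and usage=batch would match a request for usage=batch, but not a request for project=beta.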
SRPC Port Choice
The roster server could eliminate the need for having the RunToolServer on a known port. Let the OS assign a port, and have the RunToolServer tell the roster the port when it registers.
This could make things tricky for some administrative tasks.
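For reference, letting the OS assign the port comes down to binding to port 0 and reading the result back with getsockname(). The sketch below uses plain POSIX sockets rather than the SRPC layer, and the function name listen_on_any_port and the backlog of 16 are assumptions for the example.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Binds a TCP socket to an OS-assigned port. On success returns the
// listening descriptor and stores the port in *port_out; returns -1 on error.
int listen_on_any_port(unsigned short* port_out) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);  // 0 asks the OS to pick a free port

    socklen_t len = sizeof(addr);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);  // the port to report to the roster
    return fd;
}
```

The port stored in *port_out is what the RunToolServer would pass in its startup/heartbeat call.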