seesaw Package
ArchiveTeam seesaw kit
config Module
Configuration value manipulation.
-
class seesaw.config.ConfigValue(name, title='', description='', default=None, editable=True, advanced=True)
Bases: object
Configuration value validator.
The collection methods are useful for providing user-configurable settings at run time. For example, when a pipeline file is executed by the warrior, the additional config values are presented in the warrior configuration panel.
-
collector = None
-
-
class seesaw.config.NumberConfigValue(*args, **kwargs)
Bases: seesaw.config.ConfigValue
-
class seesaw.config.StringConfigValue(*args, **kwargs)
Bases: seesaw.config.ConfigValue
-
seesaw.config.realize(v, item=None)
Makes objects contain concrete values from an item.
A silly example:

    from seesaw.pipeline import Pipeline

    class AddExpression(object):
        def realize(self, item):
            return item['x'] + item['y']

    # ComputeMath is an illustrative task name; it is not documented here.
    pipeline = Pipeline(ComputeMath(AddExpression()))

In the example, we want to compute an addition expression. The values are defined in the Item.
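The idea can be tried without seesaw installed. The sketch below assumes that realizing a value means calling its own `realize(item)` method when it has one and returning any other value unchanged. `realize_sketch` and the plain dict standing in for an Item are hypothetical names used for illustration, not the library's implementation.

```python
class AddExpression(object):
    """Deferred expression whose operands live on the item."""

    def realize(self, item):
        return item['x'] + item['y']


def realize_sketch(value, item=None):
    """Hypothetical stand-in for seesaw.config.realize (assumed behavior)."""
    if item is not None and hasattr(value, 'realize'):
        # Deferred values are resolved against the item.
        return value.realize(item)
    # Plain values pass through unchanged.
    return value


item = {'x': 2, 'y': 3}  # a plain dict standing in for an Item
total = realize_sketch(AddExpression(), item)
constant = realize_sketch(42, item)
print(total, constant)
```

Deferring the computation this way lets one task definition be shared by many items, since the concrete values are looked up only when an item is processed.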
event Module
Actor model.
externalprocess Module
Running subprocesses asynchronously.
-
class seesaw.externalprocess.AsyncPopen(*args, **kwargs)
Bases: object
Asynchronous version of subprocess.Popen. Deprecated.
-
class seesaw.externalprocess.AsyncPopen2(*args, **kwargs)
Bases: object
Adapter for the legacy AsyncPopen.
-
stdin
-
-
class seesaw.externalprocess.CurlUpload(target, filename, connect_timeout='60', speed_limit='1', speed_time='900', max_tries=None)
Bases: seesaw.externalprocess.ExternalProcess
Upload with Curl process runner.
-
class seesaw.externalprocess.ExternalProcess(name, args, max_tries=1, retry_delay=30, accept_on_exit_code=None, retry_on_exit_code=None, env=None)
Bases: seesaw.task.Task
External subprocess runner.
-
class seesaw.externalprocess.RsyncUpload(target, files, target_source_path='./', bwlimit='0', max_tries=None, extra_args=None)
Bases: seesaw.externalprocess.ExternalProcess
Upload with Rsync process runner.
-
class seesaw.externalprocess.WgetDownload(args, max_tries=1, accept_on_exit_code=None, retry_on_exit_code=None, env=None, stdin_data_function=None)
Bases: seesaw.externalprocess.ExternalProcess
Download with Wget process runner.
item Module
Managing work units.
-
class seesaw.item.Item(pipeline, item_id, item_number, keep_data=False, prepare_data_directory=True, **kwargs)
Bases: seesaw.item.ItemData
A thing, or work unit, that needs to be downloaded.
It has properties that are filled by the Task. An Item behaves like a mutable mapping.
Note
State belonging to an item should be stored on the actual item itself. That is, do not store variables on a Task unless you know what you are doing.
-
class ItemState
Bases: object
State of the item.
-
canceled = 'canceled'
-
completed = 'completed'
-
failed = 'failed'
-
running = 'running'
-
-
class Item.TaskStatus
Bases: object
Status of what happened on a task.
-
completed = 'completed'
-
failed = 'failed'
-
running = 'running'
-
-
Item.canceled
-
Item.completed
-
Item.end_time
-
Item.failed
-
Item.finished
-
Item.item_id
-
Item.item_number
-
Item.item_state
-
Item.pipeline
-
Item.start_time
-
Item.task_status
-
class seesaw.item.ItemData(properties=None)
Bases: _abcoll.MutableMapping
Base item data property container.
- Args:
properties (dict): Original dict.
on_property (Event): Fired whenever a property changes.
Callback accepts:
- self
- key
- new value
- old value
-
properties
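The callback contract listed above can be sketched with the standard library alone. The example uses `collections.abc.MutableMapping`, the Python 3 home of the base class shown here as `_abcoll.MutableMapping`. `ObservedData` and its `on_change` argument are hypothetical names, and a plain callable stands in for seesaw's Event class, so this is an illustration of the pattern rather than the library's code.

```python
from collections.abc import MutableMapping


class ObservedData(MutableMapping):
    """Hypothetical property container that reports changes to a callback."""

    def __init__(self, properties=None, on_change=None):
        self.properties = dict(properties or {})
        # on_change is an assumed plain callable, not seesaw's Event class.
        self.on_change = on_change

    def __getitem__(self, key):
        return self.properties[key]

    def __setitem__(self, key, value):
        old_value = self.properties.get(key)
        self.properties[key] = value
        # Assumption: notify only when the stored value actually changes.
        if self.on_change is not None and old_value != value:
            # Argument order as listed above: self, key, new value, old value.
            self.on_change(self, key, value, old_value)

    def __delitem__(self, key):
        old_value = self.properties.pop(key)
        if self.on_change is not None:
            self.on_change(self, key, None, old_value)

    def __iter__(self):
        return iter(self.properties)

    def __len__(self):
        return len(self.properties)


changes = []
data = ObservedData(
    {'url': 'http://example.com/'},
    on_change=lambda d, key, new, old: changes.append((key, new, old)),
)
data['status'] = 'done'    # new key, so the old value is None
data['status'] = 'failed'  # existing key, so the old value is 'done'
```

Implementing the five abstract methods is enough for the mapping to gain `get`, `update`, `items` and the other dict-style helpers from the base class.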
pipeline Module
-
class seesaw.pipeline.Pipeline(*tasks)
Bases: object
The sequence of steps that complete a Task.
Your pipeline will probably be something like this:
- Request an assignment from the tracker.
- Run Wget to download the file.
- Upload the downloaded file with rsync.
- Tell the tracker that the assignment is done.
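The four steps above can be modeled as a list of small objects that each act on a shared item. This is a minimal sketch under stated assumptions: `StepSketch` and `run_pipeline_sketch` are hypothetical stand-ins, the steps run one after another, and no tracker, Wget or rsync is actually contacted. The real work is done by the classes listed in the externalprocess and tracker modules.

```python
class StepSketch(object):
    """Hypothetical stand-in for a task: one named step that edits the item."""

    def __init__(self, name, action):
        self.name = name
        self.action = action

    def process(self, item):
        self.action(item)
        # Record the step so the order of execution can be inspected.
        item['log'].append(self.name)


def run_pipeline_sketch(steps, item):
    """Run each step in order on the same item (assumed sequential model)."""
    for step in steps:
        step.process(item)
    return item


steps = [
    StepSketch('get_item', lambda item: item.update(item_name='example-1')),
    StepSketch('download', lambda item: item.update(warc='example-1.warc.gz')),
    StepSketch('upload', lambda item: item.update(uploaded=True)),
    StepSketch('send_done', lambda item: item.update(done=True)),
]
result = run_pipeline_sketch(steps, {'log': []})
print(result['log'])
```

Keeping all state on the item, as the sketch does, matches the note in the item module about not storing variables on a task.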
project Module
Project information.
-
class seesaw.project.Project(title=None, project_html=None, utc_deadline=None)
Bases: object
Briefly describes a project's metadata.
This class defines the title of the project, a short description with an optional project logo and an optional deadline. The information will be shown in the web interface when the project is running.
runner Module
Pipeline execution.
task Module
Managing steps in a work unit.
-
class seesaw.task.ConditionalTask(condition_function, inner_task)
Bases: seesaw.task.Task
Runs a task optionally.
-
class seesaw.task.LimitConcurrent(concurrency, inner_task)
Bases: seesaw.task.Task
Restricts the number of tasks of the same type that can be run at once.
-
class seesaw.task.PrintItem
Bases: seesaw.task.SimpleTask
Output the name of the Item.
-
class seesaw.task.SetItemKey(key, value)
Bases: seesaw.task.SimpleTask
Set a value on an item.
-
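class seesaw.task.SimpleTask(name)
Bases: seesaw.task.Task
A subclassable Task that should do one small thing well.
Example:

    from seesaw.task import SimpleTask

    class MyTask(SimpleTask):
        def __init__(self):
            SimpleTask.__init__(self, 'MyTask')  # SimpleTask requires a name

        def process(self, item):
            item['my_message'] = 'hello world!'
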
tracker Module
Contacting the work unit server.
A Tracker refers to the Universal Tracker (https://github.com/ArchiveTeam/universal-tracker).
-
class seesaw.tracker.GetItemFromTracker(tracker_url, downloader, version=None)
Bases: seesaw.tracker.TrackerRequest
Get information for a single work unit from the Tracker.
-
class seesaw.tracker.PrepareStatsForTracker(defaults=None, file_groups=None, id_function=None)
Bases: seesaw.task.SimpleTask
Apply statistical values to the item.
-
class seesaw.tracker.SendDoneToTracker(tracker_url, stats)
Bases: seesaw.tracker.TrackerRequest
Inform the Tracker that the work unit has been completed.
-
class seesaw.tracker.TrackerRequest(name, tracker_url, tracker_command, may_be_canceled=False)
Bases: seesaw.task.Task
Represents a request to a Tracker.
-
DEFAULT_RETRY_DELAY = 60
-
-
class seesaw.tracker.UploadWithTracker(tracker_url, downloader, files, version=None, rsync_target_source_path='./', rsync_bwlimit='0', rsync_extra_args=[], curl_connect_timeout='60', curl_speed_limit='1', curl_speed_time='900')
Bases: seesaw.tracker.TrackerRequest
Upload work unit results.
One of the following inner tasks is used, depending on where the Tracker's response says to upload:
- RsyncUpload
- CurlUpload
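The selection between the two inner tasks might look roughly like the sketch below. It assumes that the Tracker answers with a target URL and that the URL scheme decides the uploader (`rsync://` for RsyncUpload, `http://` or `https://` for CurlUpload). That rule, the function name `choose_uploader` and the example host are assumptions for illustration and are not confirmed by this page.

```python
from urllib.parse import urlparse


def choose_uploader(upload_target):
    """Return the name of the inner task assumed to handle this target."""
    scheme = urlparse(upload_target).scheme
    if scheme == 'rsync':
        return 'RsyncUpload'
    if scheme in ('http', 'https'):
        return 'CurlUpload'
    # Anything else is rejected rather than guessed at.
    raise ValueError('unsupported upload target: %r' % (upload_target,))


rsync_choice = choose_uploader('rsync://upload.example.org/project/')
curl_choice = choose_uploader('https://upload.example.org/project/')
print(rsync_choice, curl_choice)
```

Letting the server pick the destination per work unit means the upload location can change without editing the pipeline file.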
util Module
Miscellaneous functions.
-
seesaw.util.find_executable(name, version, paths, version_arg='-V')
Returns the path of a matching executable.
See also
warrior Module
The warrior server.
The warrior phones home to Warrior HQ (https://github.com/ArchiveTeam/warrior-hq).
-
class seesaw.warrior.BandwidthMonitor(device)
Bases: object
Extracts the bandwidth usage from the system stats.
-
devre = <_sre.SRE_Pattern object>
-
-
class seesaw.warrior.Warrior(projects_dir, data_dir, warrior_hq_url, real_shutdown=False, keep_data=False)
Bases: object
The warrior god object.
-
class Status
Bases: object
-
INVALID_SETTINGS = 'INVALID_SETTINGS'
-
NO_PROJECT = 'NO_PROJECT'
-
REBOOTING = 'REBOOTING'
-
RESTARTING_PROJECT = 'RESTARTING_PROJECT'
-
RUNNING_PROJECT = 'RUNNING_PROJECT'
-
SHUTTING_DOWN = 'SHUTTING_DOWN'
-
STARTING_PROJECT = 'STARTING_PROJECT'
-
STOPPING_PROJECT = 'STOPPING_PROJECT'
-
SWITCHING_PROJECT = 'SWITCHING_PROJECT'
-
UNINITIALIZED = 'UNINITIALIZED'
-
-
web Module
The warrior web interface.
-
class seesaw.web.ApiHandler(application, request, **kwargs)
Bases: seesaw.web_util.BaseWebAdminHandler
Processes API requests.
-
class seesaw.web.IndexHandler(application, request, **kwargs)
Bases: seesaw.web_util.BaseWebAdminHandler
Shows the index.html.
-
class seesaw.web.ItemMonitor(item)
Bases: object
Pushes item states and information to the client.
-
class seesaw.web.SeesawConnection(session)
Bases: sockjs.tornado.conn.SockJSConnection
A WebSocket server that communicates the state of the warrior.
-
clients = set([])
-
instance_id = '31855-0.516627'
-
item_monitors = {}
-
project = None
-
runner = None
-
warrior = None
-
-
seesaw.web.start_runner_server(project, runner, bind_address='localhost', port_number=8001, http_username=None, http_password=None)
Starts a web interface for a manually run pipeline.
Unlike start_warrior_server(), this UI does not contain a configuration or project management panel.