Execution Model
Job Hierarchies
In GoCollaborate, Task serves as the minimum unit of the transaction to process, hence all operations will converge to a map of Tasks by the end of each Stage. However, the result set from an accomplished Task could be passed on to the next Stage as specified in Job. This enables both reading and writing data files to persistent memory for successive execution.
Job
A Job is a set of tasks to be processed when its handler is called
Attribute | Type | Description |
---|---|---|
JID | string | The automatically-generated hashtag for Job |
Stage | *Stage | The current stage where the Job proceeds to |
Front | *Stage | The next stage |
Back | *Stage | The previous stage |
Length | int | The length of Job in the count of Stage |
Stage
A Stage is a sub-set of Job where the tasks therein have no concurrent dependencies on one another.
Attribute | Type | Description |
---|---|---|
Previous | *Stage | The next stage |
Next | *Stage | The previous stage |
TaskSet | map[int]*Task | The task set to be executed |
Task
A Task is the minimal unit of computation on the given dataset.
Attribute | Type | Description |
---|---|---|
Type | TaskType | The timeout type of task |
Priority | TaskPriority | The priority of task |
Consumable | string | The task function signature |
Source | []Countable | The task source dataset |
Result | []Countable | The task result dataset |
Context | *TaskContext | The task context throughout execution time |
Stage | int | The Stage order of task |
TaskType
TaskType specifies the expected timeout type of each task to be executed, currently, ROUTINE and PERMANENT type are not supported.
Type | Default Timeout |
---|---|
SHORT | 500 ms |
LONG | 2000 ms |
ROUTINE | 0 ms |
PERMANENT | 0 ms |
TaskPriority
TaskPriority specifies the priority of each task, workers in the worker pool will pull tasks from Master based on this order.
Type | Priority (The higher will be processed in advance) |
---|---|
URGENT | 4 |
HIGH | 3 |
MEDIUM | 2 |
LOW | 1 |
BASE | 0 |
Function
A Function instance is a First-Class Function defined for task processing. It looks something like this:
func(source *task.Collection,
result *task.Collection,
context *task.TaskContext) bool {
// do whatever
}
Executors
The Executor is the base struct of task operation handlers such as Mapper, Reducer or Combiner, etc.
Mapper
A Mapper should always implement the Mapper Interface, where it maps the original task set to an incremental map of tasks.
Reducer
A Reducer should always implement the Reducer Interface, where it maps the original task set to a decremental map of tasks.
Shuffle
In most of the Java implementations of MapReduce algorithm, there is usually a separate Shuffle phase which is responsible for re-arranging the order of mapped dataset. In Golang, however, this is not mandatory. The range literal in Golang will automatically shuffle the existing map during runtime execution (reference).
Mechanisms
Non-Blocking Queue
The execution of tasks in GoCollaborate is non-blocking, meaning that you can still write an HTTP response while your previously-triggered tasks are still under processing. This will reduce the performance issue caused by a batch-processing task unit.
Project Structure Auto-mapping
To better organize your API exposure, the GoCollaborate framework automatically maps the routes to your project structure.
For example, the handlers defined in /core
directory are mapped to the route /core/handler_name
, by which you can focus on writing the handler of your task logic instead of considering unnecessary exposure structure.