Execution Model

Job Hierarchies

In GoCollaborate, a Task is the smallest unit of work to be processed, so all operations converge to a map of Tasks by the end of each Stage. The result set of a completed Task can be passed on to the next Stage as specified in the Job. This enables both reading data files from and writing them to persistent storage between successive executions.

Job

A Job is a set of tasks to be processed when its handler is called.

Attribute | Type | Description
JID | string | The automatically generated hash ID of the Job
Stage | *Stage | The current stage that the Job has reached
Front | *Stage | The next stage
Back | *Stage | The previous stage
Length | int | The length of the Job, counted in Stages

Stage

A Stage is a subset of a Job in which the tasks have no concurrent dependencies on one another.

Attribute | Type | Description
Previous | *Stage | The previous stage
Next | *Stage | The next stage
TaskSet | map[int]*Task | The task set to be executed
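The Job and Stage relationship can be pictured as a linked list of stages, each holding its own task set. The following is a minimal sketch rather than the framework's actual definitions: the field names follow the tables, the Front and Back fields are left out, and the `appendStage` and `taskCounts` helpers are hypothetical.

```go
package main

import "fmt"

// Task is a placeholder; only its presence in a TaskSet matters here.
type Task struct{ Name string }

// Stage is a node in a doubly linked list of stages.
type Stage struct {
	Previous *Stage
	Next     *Stage
	TaskSet  map[int]*Task
}

// Job keeps a pointer to its current stage and the number of stages.
type Job struct {
	JID    string
	Stage  *Stage
	Length int
}

// appendStage (hypothetical helper) links a new stage after the last one.
func (j *Job) appendStage(tasks map[int]*Task) {
	s := &Stage{TaskSet: tasks}
	if j.Stage == nil {
		j.Stage = s
	} else {
		last := j.Stage
		for last.Next != nil {
			last = last.Next
		}
		last.Next = s
		s.Previous = last
	}
	j.Length++
}

// taskCounts walks forward from the current stage and reports the
// number of tasks held by each stage.
func (j *Job) taskCounts() []int {
	var counts []int
	for s := j.Stage; s != nil; s = s.Next {
		counts = append(counts, len(s.TaskSet))
	}
	return counts
}

func main() {
	job := &Job{JID: "example"}
	job.appendStage(map[int]*Task{0: {Name: "a"}, 1: {Name: "b"}})
	job.appendStage(map[int]*Task{0: {Name: "c"}})
	fmt.Println(job.Length, job.taskCounts())
}
```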

Task

A Task is the minimal unit of computation on the given dataset.

Attribute | Type | Description
Type | TaskType | The timeout type of the task
Priority | TaskPriority | The priority of the task
Consumable | string | The task function signature
Source | []Countable | The task source dataset
Result | []Countable | The task result dataset
Context | *TaskContext | The task context throughout execution time
Stage | int | The Stage order of the task

TaskType

TaskType specifies the expected timeout type of each task to be executed. Currently, the ROUTINE and PERMANENT types are not supported.

Type | Default Timeout
SHORT | 500 ms
LONG | 2000 ms
ROUTINE | 0 ms
PERMANENT | 0 ms

TaskPriority

TaskPriority specifies the priority of each task. Workers in the worker pool pull tasks from the Master in this order.

Type | Priority (higher values are processed first)
URGENT | 4
HIGH | 3
MEDIUM | 2
LOW | 1
BASE | 0
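The ordering rule can be illustrated by sorting a queue so that higher priorities come first. This is a sketch under assumptions: the constant values follow the table, while the `Task` struct and the `byPriority` helper are hypothetical and say nothing about how the Master actually schedules work.

```go
package main

import (
	"fmt"
	"sort"
)

type TaskPriority int

// Values follow the priority table: BASE is 0 and URGENT is 4.
const (
	BASE TaskPriority = iota
	LOW
	MEDIUM
	HIGH
	URGENT
)

// Task is a reduced placeholder with only the fields this sketch needs.
type Task struct {
	Name     string
	Priority TaskPriority
}

// byPriority (hypothetical) returns a copy of tasks ordered from the
// highest priority to the lowest. The stable sort keeps the original
// order among tasks that share a priority.
func byPriority(tasks []Task) []Task {
	out := append([]Task(nil), tasks...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Priority > out[j].Priority
	})
	return out
}

func main() {
	queue := []Task{
		{Name: "cleanup", Priority: LOW},
		{Name: "alert", Priority: URGENT},
		{Name: "report", Priority: MEDIUM},
	}
	for _, t := range byPriority(queue) {
		fmt.Println(t.Name, int(t.Priority))
	}
}
```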

Function

A Function instance is a first-class function defined for task processing. It looks something like this:

func(source *task.Collection,
        result *task.Collection,
        context *task.TaskContext) bool {
        // do whatever; a function with a bool result must return a value
        return true
}
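As a fuller illustration of the shape shown above, the sketch below defines and calls one such function. `Collection` and `TaskContext` are local stand-ins for `task.Collection` and `task.TaskContext`, whose real definitions may differ, and treating the bool result as a success flag is an assumption.

```go
package main

import "fmt"

// Local stand-ins for the framework types; assumed for illustration.
type Collection []interface{}

type TaskContext struct{ Values map[string]interface{} }

// Function has the same parameter and result shape as the snippet above.
type Function func(source *Collection, result *Collection, context *TaskContext) bool

// double (hypothetical) writes twice each integer in source to result.
// It returns false as soon as it meets a value that is not an int.
var double Function = func(source *Collection, result *Collection, context *TaskContext) bool {
	for _, v := range *source {
		n, ok := v.(int)
		if !ok {
			return false
		}
		*result = append(*result, n*2)
	}
	return true
}

func main() {
	src := Collection{1, 2, 3}
	var res Collection
	ok := double(&src, &res, &TaskContext{})
	fmt.Println(ok, res)
}
```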

Executors

The Executor is the base struct of task operation handlers such as Mapper, Reducer, and Combiner.

Mapper

A Mapper should always implement the Mapper interface, which maps the original task set to an incremental (larger) map of tasks.
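A mapper that grows the task set might look like the sketch below. The `Mapper` interface shape, the reduced `Task` struct and the `chunkMapper` type are all assumptions made for illustration; check the framework source for the real interface.

```go
package main

import "fmt"

// Task is a reduced placeholder carrying only a source dataset.
type Task struct{ Source []int }

// Mapper is an assumed interface shape: a task map in, a task map out.
type Mapper interface {
	Map(in map[int]*Task) (map[int]*Task, error)
}

// chunkMapper (hypothetical) splits every incoming task into tasks
// whose Source holds at most Size items.
type chunkMapper struct{ Size int }

func (m chunkMapper) Map(in map[int]*Task) (map[int]*Task, error) {
	if m.Size <= 0 {
		return nil, fmt.Errorf("chunk size must be positive, got %d", m.Size)
	}
	out := make(map[int]*Task)
	next := 0
	for _, t := range in {
		for start := 0; start < len(t.Source); start += m.Size {
			end := start + m.Size
			if end > len(t.Source) {
				end = len(t.Source)
			}
			out[next] = &Task{Source: t.Source[start:end]}
			next++
		}
	}
	return out, nil
}

func main() {
	var m Mapper = chunkMapper{Size: 2}
	out, err := m.Map(map[int]*Task{0: {Source: []int{1, 2, 3, 4, 5}}})
	fmt.Println(len(out), err)
}
```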

Reducer

A Reducer should always implement the Reducer interface, which maps the original task set to a decremental (smaller) map of tasks.
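A reducer that shrinks the task set might look like the sketch below. The `Reducer` interface shape, the reduced `Task` struct and the `sumReducer` type are assumptions made for illustration and may not match the framework's definitions.

```go
package main

import "fmt"

// Task is a reduced placeholder carrying only a result dataset.
type Task struct{ Result []int }

// Reducer is an assumed interface shape: a task map in, a task map out.
type Reducer interface {
	Reduce(in map[int]*Task) (map[int]*Task, error)
}

// sumReducer (hypothetical) folds the results of all incoming tasks
// into a single task that holds their total.
type sumReducer struct{}

func (sumReducer) Reduce(in map[int]*Task) (map[int]*Task, error) {
	total := 0
	for _, t := range in {
		for _, v := range t.Result {
			total += v
		}
	}
	return map[int]*Task{0: {Result: []int{total}}}, nil
}

func main() {
	var r Reducer = sumReducer{}
	out, err := r.Reduce(map[int]*Task{
		0: {Result: []int{1, 2}},
		1: {Result: []int{3}},
		2: {Result: []int{4, 5}},
	})
	fmt.Println(len(out), out[0].Result, err)
}
```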

Shuffle

Most Java implementations of the MapReduce algorithm include a separate Shuffle phase, which is responsible for re-arranging the order of the mapped dataset. In Go, however, this is not mandatory: a range loop over a map visits its entries in an unspecified order that the runtime randomizes, so the existing map is effectively shuffled at runtime (reference).
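The behaviour relied on here is that Go leaves map iteration order unspecified, so two range loops over the same map may visit keys in different orders. The sketch below shows this, along with the usual workaround of sorting keys when a stable order is needed; both helper names are hypothetical. Note that the randomization is not guaranteed to be a uniform shuffle.

```go
package main

import (
	"fmt"
	"sort"
)

// keysInRangeOrder returns the keys in whatever order range visits them.
func keysInRangeOrder(m map[int]string) []int {
	keys := make([]int, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	return keys
}

// sortedKeys returns the keys in ascending order, for the cases where a
// deterministic order is required.
func sortedKeys(m map[int]string) []int {
	keys := keysInRangeOrder(m)
	sort.Ints(keys)
	return keys
}

func main() {
	tasks := map[int]string{0: "a", 1: "b", 2: "c", 3: "d"}
	fmt.Println(keysInRangeOrder(tasks)) // order may differ between runs
	fmt.Println(sortedKeys(tasks))       // always ascending
}
```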


Mechanisms

Non-Blocking Queue

Task execution in GoCollaborate is non-blocking, meaning that you can still write an HTTP response while previously triggered tasks are still being processed. This reduces the performance impact that batch-processing task units would otherwise have on request handling.

Project Structure Auto-mapping

To better organize your API exposure, the GoCollaborate framework automatically maps the routes to your project structure.

For example, the handlers defined in the /core directory are mapped to the route /core/handler_name, so you can focus on writing the handler for your task logic rather than on how it is exposed.


