Main concepts of the Sight workflow

The features described here are fully supported since version 3.0.0.

Sight agent. The reusable elementary unit (agent) executes a single remote or local algorithm, converting the submitted request into the received response.

A Sight request consists of multiple named items (fields), each storing a string value. They normally correspond to the fields, checkboxes and other controls in the web form that was used as the initial data to generate the agent. Unlike the request, the result of a bioinformatics web agent often needs to be an array of records. For example, a similarity search service returns multiple hits to the sequences in the database, a gene prediction program finds multiple genes in a DNA sequence, and a tool for predicting transmembrane segments detects multiple transmembrane helices. Hence the Sight agent response is an array of records. These records also consist of multiple named fields.

As the request and response formats differ for each agent, the agents also contain explanatory data structures defining these formats. For each request or response field they define its type, name and an arbitrary comment. The request fields can also have a default value and a list of the other possible values.
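The request, the response and the format descriptions above can be pictured with a small Java sketch. It is only an illustration: the class and method names (AgentFormatSketch, FieldDescriptor, defaultRequest) and the sample field values are assumptions made for this example and are not taken from the Sight code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all names here are hypothetical, not Sight classes.
class AgentFormatSketch {

    /** Describes one request or response field: type, name and free-text comment. */
    static class FieldDescriptor {
        final String type;
        final String name;
        final String comment;
        final String defaultValue;          // meaningful for request fields only
        final List<String> possibleValues;  // meaningful for request fields only

        FieldDescriptor(String type, String name, String comment,
                        String defaultValue, List<String> possibleValues) {
            this.type = type;
            this.name = name;
            this.comment = comment;
            this.defaultValue = defaultValue;
            this.possibleValues = possibleValues;
        }
    }

    /** Builds a request in which every field holds its default value. */
    static Map<String, String> defaultRequest(List<FieldDescriptor> format) {
        Map<String, String> request = new LinkedHashMap<>();
        for (FieldDescriptor field : format) {
            request.put(field.name, field.defaultValue);
        }
        return request;
    }

    /** Request format of an imaginary similarity search agent (made-up fields). */
    static List<FieldDescriptor> exampleRequestFormat() {
        return Arrays.asList(
            new FieldDescriptor("String", "sequence", "Query sequence",
                                "", Collections.<String>emptyList()),
            new FieldDescriptor("String", "matrix", "Scoring matrix",
                                "BLOSUM62", Arrays.asList("BLOSUM62", "PAM250")));
    }

    /** A response is an array of records; each record is a set of named fields. */
    static List<Map<String, String>> exampleResponse() {
        List<Map<String, String>> response = new ArrayList<>();
        Map<String, String> first = new LinkedHashMap<>();
        first.put("hit", "seqA");
        first.put("score", "120");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("hit", "seqB");
        second.put("score", "95");
        response.add(first);
        response.add(second);
        return response;
    }
}
```

Keeping every value as a string mirrors the statement that request fields store string values; storing response values as strings too is a simplification of this sketch.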

Sight workflow. In any workflow, a connection between two agents is only possible if the result of the master agent can be converted to the request for the slave agent. Some systems require that these two data structures be identical, or (like Decaf) leave the problem for a programming user to solve. In contrast, the Sight application generator produces Java code that creates the slave request from the master response. More exactly, the request is created using the master response, the master request, the master-of-master response, the request that was sent to the master of master and so on, up to the workflow input data (Fig. 1).

Fig. 1 Solution of the type conversion problem, illustrated for the case of a simple linear workflow consisting of three agents A, B and C. The initial request for agent A consists of 3 fields (only field 2 is shown). This agent returns a response of two records, each of which also has 3 fields. The request of agent B consists of one field, and the value for this field is taken from field 3 in the agent A result record. As agent A has returned two records, two independent requests (a and b) for agent B will be created (see dashed line). We suppose that for one of these two requests agent B returned two records, each having 2 fields. Finally, the request to agent C consists of 4 fields that must be filled with various values from the workflow. Field 1 is identical to the field from the agent B request, field 2 is identical to field 2 from the agent B result record, field 3 takes its value from field 2 in the initial workflow request for agent A, and field 4 takes its value from field 2 in the agent B response record. As agent B has returned two records in response to request b, two requests for agent C will be created for this branch. However, as agent B also has another request (a), the total number of requests to agent C depends on the number of records in the agent B response to its request a. If this response contains, for example, 3 records, the total number of requests to agent C will be 3+2=5.
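The request counting in this caption can be reproduced with a short Java sketch. It is not the code that the Sight generator emits. The names (LinearWorkflowSketch, Step, runStage) and all field values are assumptions chosen for illustration, and the agents are stubs that return fixed records. The field mapping for agent C follows the caption as written.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the linear workflow A -> B -> C; not generated Sight code.
class LinearWorkflowSketch {

    /** One step of history: the request sent to an agent and one record it returned. */
    static class Step {
        final Map<String, String> request;
        final Map<String, String> record;
        Step(Map<String, String> request, Map<String, String> record) {
            this.request = request;
            this.record = record;
        }
    }

    interface Agent {
        List<Map<String, String>> execute(Map<String, String> request);
    }

    /** Builds the next request from the whole history, back to the workflow input. */
    interface Converter {
        Map<String, String> convert(List<Step> history);
    }

    static Map<String, String> fields(String... namesAndValues) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            m.put(namesAndValues[i], namesAndValues[i + 1]);
        }
        return m;
    }

    /** Runs one agent per incoming history; every returned record starts a new branch. */
    static List<List<Step>> runStage(List<List<Step>> histories, Converter converter, Agent agent) {
        List<List<Step>> out = new ArrayList<>();
        for (List<Step> history : histories) {
            Map<String, String> request = converter.convert(history);
            for (Map<String, String> record : agent.execute(request)) {
                List<Step> extended = new ArrayList<>(history);
                extended.add(new Step(request, record));
                out.add(extended);
            }
        }
        return out;
    }

    /** Returns all requests that would be sent to agent C in the Fig. 1 scenario. */
    static List<Map<String, String>> requestsForC() {
        Map<String, String> input = fields("field1", "in1", "field2", "in2", "field3", "in3");

        // Stub for agent A: always two records; field 3 holds "a" and "b".
        Agent agentA = request -> {
            List<Map<String, String>> records = new ArrayList<>();
            records.add(fields("field1", "a1", "field2", "a2", "field3", "a"));
            records.add(fields("field1", "b1", "field2", "b2", "field3", "b"));
            return records;
        };
        // Stub for agent B: 3 records for request a, 2 records for request b.
        Agent agentB = request -> {
            String value = request.get("field1");
            int n = "a".equals(value) ? 3 : 2;
            List<Map<String, String>> records = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                records.add(fields("field1", "hit" + i, "field2", value + "-" + i));
            }
            return records;
        };

        List<List<Step>> start = new ArrayList<>();
        start.add(new ArrayList<Step>());
        List<List<Step>> afterA = runStage(start, history -> input, agentA);
        List<List<Step>> afterB = runStage(afterA,
            history -> fields("field1", history.get(0).record.get("field3")), agentB);

        List<Map<String, String>> requests = new ArrayList<>();
        for (List<Step> history : afterB) {
            Step a = history.get(0);
            Step b = history.get(1);
            requests.add(fields(
                "field1", b.request.get("field1"),   // from the agent B request
                "field2", b.record.get("field2"),    // from the agent B result record
                "field3", a.request.get("field2"),   // from the initial workflow request
                "field4", b.record.get("field2")));  // from the agent B record, as in the caption
        }
        return requests;
    }
}
```

With these stubs, requestsForC() yields five requests, three from branch a and two from branch b. Each request is built from its own history, so values from the workflow input stay available at every later stage.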

 

Fig. 2 Solution of the type conversion problem, illustrated for the case of a tree-like workflow. Agent B requires field 3 from the master response record. Agent C requires field 1 from the master response record, but it additionally needs field 2 from the master request. As the master A has returned two records in response, both slave agents (B and C) receive two requests each (a and b).

Loops. Circular workflows are used, for example, in building sequence similarity networks or in the reconstruction of metabolic pathways. The loops are realised with a pair of communicating specialised agents: the loop starter and the loop closer. The loop starter just passes all its requests through. When the loop closer receives a request, it communicates with the loop starter, initiating an additional “virtual request”. This “virtual request” is processed by the agents between the loop starter and loop closer and may initiate subsequent new virtual requests. The loop is terminated when one of the agents between the starter and closer returns an empty response (no records) or when the maximal number of iterations is exceeded.

Fig. 3 Sight loop concept. This figure illustrates a simple loop, where agent A is placed between the loop starter and loop closer. During the first iteration the loop starter sends one request to its slave agent A. As agent A has returned two records in its result, the loop closer (slave agent for A) receives two requests and during the next iteration produces two virtual requests for the loop starter. The words “initial”, “a” and “b” are sample values and illustrate how the structures are converted during iterations. Loop agents can handle up to 3 loop variables.
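The loop behaviour in this figure can be approximated with the following Java sketch. It makes several simplifying assumptions: it uses a single loop variable held as a plain string, it places a single agent between starter and closer, and it treats an empty response as ending only the branch that produced it. The names LoopSketch and runLoop are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the loop starter / loop closer pair; not Sight code.
class LoopSketch {

    /**
     * Returns every request that passed the loop starter, in processing order.
     * The agent maps one request value to the values of its response records.
     */
    static List<String> runLoop(String initial,
                                Function<String, List<String>> agent,
                                int maxIterations) {
        List<String> processed = new ArrayList<>();
        List<String> current = new ArrayList<>();
        current.add(initial);
        for (int iteration = 0; iteration < maxIterations && !current.isEmpty(); iteration++) {
            List<String> virtualRequests = new ArrayList<>();
            for (String request : current) {
                processed.add(request);                 // loop starter passes the request through
                for (String record : agent.apply(request)) {
                    virtualRequests.add(record);        // loop closer: one virtual request per record
                }
            }
            current = virtualRequests;                  // empty list means nothing is left to iterate
        }
        return processed;
    }

    /** The Fig. 3 scenario: "initial" yields records "a" and "b", which yield nothing. */
    static List<String> figure3Demo() {
        Function<String, List<String>> agentA = request ->
            "initial".equals(request)
                ? Arrays.asList("a", "b")
                : Collections.<String>emptyList();
        return runLoop("initial", agentA, 10);
    }
}
```

In this sketch the iteration limit is the only protection against an agent that never returns an empty response, which matches the second termination condition named in the text.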

Confluences. A confluence arises when two or more branches of a tree workflow must join together again, providing the necessary data for a shared agent. Our solution for type conversion in confluences is to process all possible combinations of the records in the two master agent responses. For example, if the workflow has branched and two slave agents have a shared slave-of-slave agent, and one of these two agents has returned 5 records and the other 2 records in response, 5×2=10 different requests can be composed for the shared slave-of-slave agent.

Fig. 4 A simple case of confluence. Agent I is a master for two agents, A and B. Agent C is a shared slave agent for A and B. If agent I returns a single-record result (not shown), agents A and B both receive a single request (not shown). Now, if the responses of both A and B contain two records, and the shared slave needs fields from both master agents, this workflow generates four requests for agent C. Confluences are only supported in the new Sight 3.0.0 alpha version.
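Processing all combinations of records amounts to a Cartesian product of the two master responses. The Java sketch below shows the idea under stated assumptions: the class name ConfluenceSketch, the helper dummyRecords and the "A." and "B." key prefixes are invented for this example and do not describe how Sight names fields internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of request creation at a confluence; not Sight code.
class ConfluenceSketch {

    /** Pairs every record from master A with every record from master B. */
    static List<Map<String, String>> combine(List<Map<String, String>> recordsA,
                                             List<Map<String, String>> recordsB) {
        List<Map<String, String>> requests = new ArrayList<>();
        for (Map<String, String> a : recordsA) {
            for (Map<String, String> b : recordsB) {
                Map<String, String> request = new LinkedHashMap<>();
                // Prefixes keep the fields of the two masters apart (a naming assumption).
                for (Map.Entry<String, String> e : a.entrySet()) {
                    request.put("A." + e.getKey(), e.getValue());
                }
                for (Map.Entry<String, String> e : b.entrySet()) {
                    request.put("B." + e.getKey(), e.getValue());
                }
                requests.add(request);
            }
        }
        return requests;
    }

    /** Creates n single-field dummy records: {id=<prefix>1}, {id=<prefix>2}, ... */
    static List<Map<String, String>> dummyRecords(String prefix, int n) {
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            Map<String, String> record = new LinkedHashMap<>();
            record.put("id", prefix + i);
            records.add(record);
        }
        return records;
    }
}
```

Calling combine(dummyRecords("A", 5), dummyRecords("B", 2)) gives the 10 requests of the example in the text, and two records from each master give the four requests of Fig. 4.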

Storing the results.

The results of running the workflow must be stored for subsequent viewing or analysis. For systems with a fixed workflow the results are usually stored in a database. However, each user-defined workflow usually needs a new database structure, and it is difficult to implement a user-friendly interface for accessing these multiple different databases. The older versions of Sight stored the results in HTML documents.

Taverna tried another approach, creating complicated folder and subfolder structures on the local file system. In the new Sight version we implemented the possibility to store the results in the form of a network. The agent responsible for storing the network takes the names of the two nodes that must be connected. As the agent receives more and more requests, the number of currently existing nodes and connections increases. The created network can be viewed with the free bioinformatical graph viewer CytoScape. Sight also has a specialised group of agents (loggers) that just append the requests to local files. In this way the interesting information can be logged separately in FASTA or some other format.
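The two storage ideas in this paragraph, a network that grows with every request and a logger that appends records in FASTA format, can be sketched in Java as follows. The classes NetworkStore and FastaLogger are hypothetical. Storing connections as ordered name pairs and writing to an Appendable instead of a file are assumptions of this sketch, and the file format Sight uses for the network is not shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of result storage; not the Sight agents themselves.
class ResultStorageSketch {

    /** Grows a network from requests that each name two nodes to connect. */
    static class NetworkStore {
        final Set<String> nodes = new LinkedHashSet<>();
        final Set<String> edges = new LinkedHashSet<>();

        void connect(String from, String to) {
            nodes.add(from);
            nodes.add(to);
            edges.add(from + "\t" + to);   // repeated connections are stored once
        }
    }

    /** Appends each logged item as one FASTA entry: a '>' header line, then the sequence. */
    static class FastaLogger {
        private final Appendable out;

        FastaLogger(Appendable out) {
            this.out = out;
        }

        void log(String name, String sequence) {
            try {
                out.append('>').append(name).append('\n')
                   .append(sequence).append('\n');
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Passing a java.io.FileWriter opened in append mode as the Appendable would give the file-appending behaviour described for the loggers. A StringBuilder is convenient for trying the sketch out.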