[Bioclusters] bio data mirroring

David Allouche David.Allouche at toulouse.inra.fr
Tue Jan 15 12:20:48 EST 2008

Hi,

Concerning the concept:

BioMAJ (BIOlogie Mises A Jour) is a workflow engine dedicated to data 
synchronization and processing. The software automates the update cycle 
and the supervision of a locally mirrored databank repository.

BioMAJ is designed to carry out update cycles for a data source. Each 
cycle has five stages.

*1. Initialisation:*


The engine loads the properties file containing the workflow description 
and determines the current status of the bank by reading the 
associated status file. Once it knows the bank's status, the 
application either starts a full cycle or, if necessary, tries to finish 
a previously incomplete cycle (for example, after an error has been 
corrected).




*2. Pre-processing:*


This is a sub-workflow run before the data update. It has the same 
properties as the post-processing part explained below. Its purpose is 
simple: to run tasks, checks and alerts before the rest of the 
update process.



*3. Synchronisation:*


During this stage, the engine connects to the source and checks for new 
data compared to the data already present locally. It determines the list 
of files to be downloaded and assigns a version name. It then carries out 
the download, followed by extraction. Finally, it consolidates the data 
by producing a full version of the bank, respecting the constraints 
defined in the properties file.
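
To illustrate the "checks for new data" step, here is a minimal sketch in 
Python. It is not BioMAJ code: the listing format (file name mapped to size 
and date) and the function name files_to_download are assumptions made for 
the example only.

```python
from typing import Dict, List, Tuple

# Hypothetical listing format: file name -> (size in bytes, modification date).
Listing = Dict[str, Tuple[int, str]]


def files_to_download(remote: Listing, local: Listing) -> List[str]:
    """Return the remote files that are missing locally or whose size/date differ."""
    return sorted(
        name for name, meta in remote.items()
        if local.get(name) != meta
    )


# Made-up example listings.
remote = {
    "bank.dat.gz": (1200, "2008-01-14"),
    "bank.idx.gz": (300, "2008-01-14"),
    "README": (10, "2007-11-02"),
}
local = {
    "bank.dat.gz": (1100, "2008-01-07"),
    "README": (10, "2007-11-02"),
}

print(files_to_download(remote, local))  # ['bank.dat.gz', 'bank.idx.gz']
```

Files that are identical on both sides (README here) are left alone, so 
only the changed and the new files are fetched.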



*4. Post-processing:*


During this stage, the engine runs the post-processing sub-workflow. Its 
format can describe relatively complex workflows. It is a succession of 
task blocks, each containing one or several meta-tasks, which can 
themselves be made up of several processes. Blocks are run 
in sequence. Within a given block, the meta-tasks are run in 
parallel. Within a meta-task, processes are run in sequence. If there is 
an error in a process, only the branch that it belongs to is stopped. 
This creates a Directed Acyclic Graph (DAG).
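
The execution order described above can be sketched as follows in Python. 
This is only an illustration, not the BioMAJ implementation: the names 
run_meta_task and run_blocks, the use of threads, and the choice to carry on 
with later blocks after a branch has failed are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Process = Callable[[], None]   # one processing step; raises an exception on failure
MetaTask = List[Process]       # processes of a meta-task run in sequence
Block = Dict[str, MetaTask]    # meta-tasks of a block (keyed by name) run in parallel


def run_meta_task(processes: MetaTask) -> bool:
    """Run processes in order; stop this branch at the first error."""
    for process in processes:
        try:
            process()
        except Exception:
            return False
    return True


def run_blocks(blocks: List[Block]) -> Dict[str, bool]:
    """Run blocks one after another; inside a block, meta-tasks run concurrently."""
    results: Dict[str, bool] = {}
    with ThreadPoolExecutor() as pool:
        for block in blocks:
            futures = {name: pool.submit(run_meta_task, procs)
                       for name, procs in block.items()}
            # Wait for every meta-task of this block before starting the next block.
            for name, future in futures.items():
                results[name] = future.result()
    return results


# Hypothetical processes, for demonstration only.
log: List[str] = []


def step(name: str, fail: bool = False) -> Process:
    def run() -> None:
        if fail:
            raise RuntimeError(name)
        log.append(name)
    return run


blocks = [
    {"index": [step("build_index")],
     "convert": [step("convert", fail=True), step("never_run")]},
    {"report": [step("stats")]},
]
results = run_blocks(blocks)
print(results)
print(log)
```

In this example the "convert" branch stops at its first process, while 
"index" and the later "report" meta-task still complete.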



*5. Deployment:*


If the previous stages have completed without error, the application 
puts the new version of the source online. It then deletes the obsolete 
versions and the temporary files produced while running the post-processes.


All stages in a session are recorded in the status file. If an error 
occurs, the next session will try to resume from the first failed stage 
of the previous session in order to complete the cycle. One cycle is 
associated with one data source. One or more sessions may be necessary 
to complete a cycle.
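
The resume behaviour can be sketched like this in Python. It is a 
simplified illustration under stated assumptions: the status is kept in a 
plain dictionary rather than in BioMAJ's status file, and the stage names 
and the function run_session are placeholders chosen for the example.

```python
from typing import Callable, Dict, List

# Placeholder stage names for the five stages of a cycle (assumed, not BioMAJ identifiers).
STAGES = ["initialisation", "pre-processing", "synchronisation",
          "post-processing", "deployment"]


def run_session(status: Dict[str, str],
                actions: Dict[str, Callable[[], None]]) -> bool:
    """Run the stages in order, skipping those already marked 'ok'.

    Returns True when the whole cycle is complete, False if a stage failed.
    """
    for stage in STAGES:
        if status.get(stage) == "ok":
            continue  # completed in an earlier session
        try:
            actions[stage]()
        except Exception:
            status[stage] = "error"
            return False
        status[stage] = "ok"
    return True


# Demonstration: the source is unreachable during the first session.
calls: List[str] = []
state = {"network_up": False}


def make_action(stage: str) -> Callable[[], None]:
    def action() -> None:
        if stage == "synchronisation" and not state["network_up"]:
            raise ConnectionError("source unreachable")
        calls.append(stage)
    return action


actions = {stage: make_action(stage) for stage in STAGES}
status: Dict[str, str] = {}

first = run_session(status, actions)    # stops at the synchronisation stage
state["network_up"] = True
second = run_session(status, actions)   # resumes from the failed stage
print(first, second)
print(calls)
```

The second session does not repeat the stages that already succeeded, so 
each stage runs to completion exactly once over the two sessions.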

If you want more details, use the following URL:


Be careful: we made a mistake in the application packaging.
The manual included in the download is in French!
The full English documentation is available on the web site, in the 
Support pull-down menu.


Let me know if you have questions.

Sincerely, David

PS: more properties files (i.e. bank update cycle descriptions) are 
available in the web presentation (Resources pull-down menu).

Examples of application results can be browsed at the following URL:
The application is in production at the 3 partner sites.
No major problems have been reported.

The history of data processing (daily updates) is available at the 
following URL:


These pages are generated daily with XSLT from the XML state files 
produced by the program.

Tony Travis wrote:

>David Allouche wrote:
>>We are looking for more beta-testers for feedback (a publication is 
>>going to be submitted very soon).
>Hello, David
>I'm interested in beta-testing your software. Please let me know where I 
>can download it?
>	Tony.
