With state-of-the-art equipment, scientists linked to the FAPESP Program for Research on Bioenergy (BIOEN) are beginning to decipher the sugarcane genome. Read the news story "Pontapé inicial" (by Agência FAPESP).
In the last few years, the use of many high-throughput methods has exponentially increased the amount of data available to molecular geneticists. As a result, bioinformatics and efficient data management are essential in order to generate useful information. Furthermore, most of the available biological data are complex and stored in dozens of smaller databases (Lacroix and Critchlow, 2003). These databases are frequently not easy to identify and integrate, making the use of information very difficult because of the variety of semantics, interfaces, and data formats used by the underlying data sources.
The SUCEST-FUN Database was created to manage sugarcane genome data and to provide tools of interest to sugarcane functional genomicists and molecular breeders. The database was developed following the mediator approach, which incorporates concepts from the Data Warehouse and Federation approaches. It offers flexible data integration for assembling heterogeneous distributed data sources, experimental data, resources, scientific algorithms, and computational analyses.
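As a rough illustration of the mediator idea, the sketch below answers one query over two differently structured sources by translating each source's records into a shared schema. It is not the SUCEST-FUN implementation: the source names, field names, and sample records are all hypothetical, and in-memory lists stand in for the real distributed databases.

```python
# Hypothetical sources: each stores similar facts under its own field names.
EST_SOURCE = [
    {"est_id": "EST0001", "tissue": "leaf"},
    {"est_id": "EST0002", "tissue": "root"},
]
EXPRESSION_SOURCE = [
    {"probe": "EST0001", "organ": "leaf", "log_ratio": 1.8},
    {"probe": "EST0003", "organ": "stem", "log_ratio": -0.4},
]
SOURCES = {"est": EST_SOURCE, "expression": EXPRESSION_SOURCE}

# Per-source mapping from local field names to the shared (mediated) schema.
FIELD_MAPS = {
    "est": {"est_id": "sequence_id", "tissue": "tissue"},
    "expression": {
        "probe": "sequence_id",
        "organ": "tissue",
        "log_ratio": "log_ratio",
    },
}


def to_common(source_name, record):
    """Translate one source record into the shared schema."""
    mapping = FIELD_MAPS[source_name]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def query(**criteria):
    """Run an equality query, phrased in the shared schema, over every source."""
    results = []
    for name, records in SOURCES.items():
        for record in records:
            common = to_common(name, record)
            if all(common.get(key) == value for key, value in criteria.items()):
                results.append({"source": name, **common})
    return results


leaf_records = query(tissue="leaf")
```

The caller only ever sees `sequence_id` and `tissue`, regardless of which source a record came from. A production mediator would more likely translate the query itself and push it down to each source rather than scan translated records.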
The SUCEST-FUN Database provides a uniform conceptual schema that minimizes heterogeneous data representation through data and query translation techniques (Sujansky, 2001). Integration proceeds in four steps. The first step is the definition of a common data dictionary, which is shared with all members of SUCEST-FUN and describes the meanings and domains of the database attributes. In the second step, an XML-based file format is used to exchange data packages between SUCEST-FUN members. The third step is quality control of the data packages, carried out in the data warehouse environment through a data validation process. Finally, in the fourth step, the data are loaded incrementally, which makes the data warehouse easier to manage. The databases were developed using MySQL Server (http://www.mysql.com), while the interface and search systems are based on the Apache web server (http://www.apache.org) and Tomcat 6.0 (http://tomcat.apache.org/).
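The four steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SUCEST-FUN pipeline: the data dictionary entries, the XML element names, and the sample package are hypothetical, and a plain dictionary stands in for the MySQL-backed warehouse.

```python
import xml.etree.ElementTree as ET

# Step 1 (hypothetical): data dictionary mapping each attribute to a domain check.
DATA_DICTIONARY = {
    "sequence_id": lambda v: v.startswith("EST") and v[3:].isdigit(),
    "tissue": lambda v: v in {"leaf", "root", "stem"},
    "length": lambda v: v.isdigit() and int(v) > 0,
}

# Step 2 (hypothetical): an XML data package as a member might send it.
PACKAGE_XML = """
<package source="member_lab">
  <record><sequence_id>EST0001</sequence_id><tissue>leaf</tissue><length>512</length></record>
  <record><sequence_id>EST0002</sequence_id><tissue>flower</tissue><length>300</length></record>
  <record><sequence_id>EST0003</sequence_id><tissue>root</tissue><length>640</length></record>
</package>
"""


def parse_package(xml_text):
    """Turn each <record> element into a dict of attribute name -> text."""
    root = ET.fromstring(xml_text)
    return [
        {field.tag: (field.text or "").strip() for field in record}
        for record in root.findall("record")
    ]


def validate(record):
    """Step 3: check a record against the data dictionary; return error messages."""
    errors = []
    for name, in_domain in DATA_DICTIONARY.items():
        if name not in record:
            errors.append(f"missing attribute: {name}")
        elif not in_domain(record[name]):
            errors.append(f"value outside domain for {name}: {record[name]!r}")
    for name in record:
        if name not in DATA_DICTIONARY:
            errors.append(f"unknown attribute: {name}")
    return errors


def load_incrementally(warehouse, records):
    """Step 4: insert or update only valid records that are new or changed."""
    loaded, rejected = 0, []
    for record in records:
        errors = validate(record)
        if errors:
            rejected.append((record, errors))
            continue
        key = record["sequence_id"]
        if warehouse.get(key) != record:
            warehouse[key] = record
            loaded += 1
    return loaded, rejected


warehouse = {}
records = parse_package(PACKAGE_XML)
first_loaded, first_rejected = load_incrementally(warehouse, records)
second_loaded, _ = load_incrementally(warehouse, records)
```

Reloading the same package changes nothing in the warehouse, which is the property that makes incremental loading cheap to repeat; the record with a tissue value outside the dictionary's domain is rejected rather than stored.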
The Joomla platform is developed in PHP (a server-side, HTML-embedded scripting language). Both Joomla and PHP are free software; Joomla is released under the GNU/GPL License. Joomla provides a more interactive website, which allows keyword searches and also manages access control for groups of users. Moreover, Joomla has a toolkit that provides flexibility in integrating scripts and programmes developed in other languages.
The tools and scripts implemented on the website were developed using CGI (Simple Common Gateway Interface Class), PHP (http://www.php.net), Perl (http://www.perl.org) and R (http://www.r-project.org/), a statistical environment. The data are made available to the SUCEST-FUN community through integration with other platforms, such as GBrowse (Generic Genome Browser, http://www.gmod.org) and the BioPerl modules (Stajich et al., 2002).
The SUCEST-FUN Database assembles different sugarcane databases such as the Sugarcane Expressed Sequence Tags (SUCEST) Genome Project (http://sucest.lad.ic.unicamp.