In the last few years, the use of many high-throughput methods has exponentially increased the amount of data available to molecular geneticists. As a result, bioinformatics and efficient data management are essential in order to generate useful information. Furthermore, most of the available biological data are complex and stored in dozens of smaller databases (Lacroix and Critchlow, 2003). These databases are frequently not easy to identify and integrate, making the use of information very difficult because of the variety of semantics, interfaces, and data formats used by the underlying data sources.
The SUCEST-FUN Database has been created to manage sugarcane genome data and provide tools of interest for sugarcane functional genomicists and molecular breeders.
This database follows the mediator approach, which combines concepts from the Data Warehouse and Federation approaches. Its flexible data integration brings together heterogeneous, distributed data sources, experimental data, resources, scientific algorithms and computational analyses.
The SUCEST-FUN Database provides a uniform conceptual schema that minimizes heterogeneity in data representation through data and query translation techniques (Sujansky, 2001). Integration proceeds in four steps. The first step is the definition of a common data dictionary, which is shared with all members of SUCEST-FUN and describes the meanings and domains of the database attributes. In the second step, an XML-based file format is used to exchange data packages between SUCEST-FUN members. The third step is quality control: data packages are checked in the data warehouse environment through a data validation process. Finally, in the fourth step, data are loaded incrementally, which simplifies management of the data warehouse. The databases were developed using MySQL Server (http://www.mysql.com), while the interface and search systems are based on the Apache web server (http://www.apache.org) and Tomcat 6.0 (http://tomcat.apache.org/). The main website uses the Joomla platform as a Content Management System for the applications.
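The four integration steps described above (data dictionary, XML exchange, validation, incremental loading) can be illustrated with a minimal sketch. It is not the SUCEST-FUN implementation: the attribute names, the XML layout, the `warehouse` table and all function names are hypothetical, and Python's built-in SQLite module stands in for the MySQL Server so that the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Step 1 (assumed layout): shared data dictionary, attribute -> (converter, required).
DATA_DICTIONARY = {
    "record_id": (str, True),
    "cultivar": (str, True),
    "expression_ratio": (float, False),
}

# Hypothetical XML data package sent by one member; r2 and r3 are deliberately invalid.
PACKAGE = """<package>
  <record><record_id>r1</record_id><cultivar>cultivar_A</cultivar><expression_ratio>2.5</expression_ratio></record>
  <record><record_id>r2</record_id><cultivar>cultivar_B</cultivar><expression_ratio>n/a</expression_ratio></record>
  <record><record_id>r3</record_id><expression_ratio>1.1</expression_ratio></record>
</package>"""


def parse_package(xml_text):
    """Step 2: read an XML data package into a list of raw dictionaries."""
    root = ET.fromstring(xml_text)
    return [{child.tag: (child.text or "").strip() for child in rec}
            for rec in root.findall("record")]


def validate(record):
    """Step 3: check one record against the data dictionary.

    Returns (clean_record, errors); clean_record is None when errors exist.
    """
    clean, errors = {}, []
    for name in record:
        if name not in DATA_DICTIONARY:
            errors.append(f"unknown attribute: {name}")
    for name, (convert, required) in DATA_DICTIONARY.items():
        value = record.get(name, "")
        if value == "":
            if required:
                errors.append(f"missing attribute: {name}")
            clean[name] = None
            continue
        try:
            clean[name] = convert(value)
        except ValueError:
            errors.append(f"bad value for {name}: {value!r}")
    return (None if errors else clean), errors


def load_incremental(conn, records):
    """Step 4: insert only records whose key is not yet in the warehouse."""
    conn.execute("CREATE TABLE IF NOT EXISTS warehouse ("
                 "record_id TEXT PRIMARY KEY, cultivar TEXT, expression_ratio REAL)")
    inserted = 0
    for rec in records:
        exists = conn.execute("SELECT 1 FROM warehouse WHERE record_id = ?",
                              (rec["record_id"],)).fetchone()
        if exists is None:
            conn.execute("INSERT INTO warehouse VALUES (?, ?, ?)",
                         (rec["record_id"], rec["cultivar"], rec["expression_ratio"]))
            inserted += 1
    conn.commit()
    return inserted


def run_pipeline(conn, xml_text):
    """Parse, validate and load one package; return (rows inserted, rejected records)."""
    valid, rejected = [], []
    for raw in parse_package(xml_text):
        clean, errors = validate(raw)
        if errors:
            rejected.append((raw.get("record_id"), errors))
        else:
            valid.append(clean)
    return load_incremental(conn, valid), rejected
```

Calling `run_pipeline(sqlite3.connect(":memory:"), PACKAGE)` loads the single valid record and reports the two rejected ones with their validation errors. Running it a second time on the same connection inserts nothing, which is the property that makes incremental loading convenient for managing the warehouse.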
Joomla is developed in PHP, a server-side, HTML-embedded scripting language. Both Joomla and PHP are free software: Joomla is released under the GNU General Public License (GPL) and PHP under the PHP License. Joomla makes the website more interactive, supports keyword searches and manages access control for groups of users. Moreover, Joomla has a toolkit that provides flexibility in the integration of scripts and programmes developed in other languages.
The tools and scripts implemented in the website were written using CGI (Simple Common Gateway Interface Class), PHP (http://www.php.net), Perl (http://www.perl.org) and R (http://www.r-project.org/), a statistical environment. The data are made available to the SUCEST-FUN community through integration with other platforms such as GBrowse (Generic Genome Browser; http://www.gmod.org) and the BioPerl modules (Stajich et al., 2002).
The SUCEST-FUN Database assembles different sugarcane databases such as the Sugarcane Expressed Sequence Tags (SUCEST) Genome Project (http://sucest.lad.ic.unicamp.) (Vettore et al., 2003), the Sugarcane Gene Index (SGI), the SUCAST and the SUCAMET Catalogues, which include expression data (http://sucest-fun.org), the GRASSIUS database (Yilmaz et al., 2009) and records of the agronomic, physiological and biochemical characteristics of sugarcane cultivars. This database is part of the SUCEST-FUN Regulatory Network Project (http://sucest-fun.org), which aims to study the regulation of gene expression using tools that will allow a Systems Biology approach to the study of sugarcane. A web portal (http://sugarcanegenome.org) is under construction; it will make the tools of the SUCEST-FUN Initiative, together with the associated publications, available to the SUGESI Consortium participants.