Ugo Brunel has more than 10 years’ experience working in the field of database management. He joined Bull in 2002 to develop a database center of competence within the outsourcing business. Ugo’s extensive knowledge of the major DBMSs (Oracle, PostgreSQL, MySQL, SQL Server and Progress) makes him an invaluable contributor to many customer projects. He is also responsible for ensuring the commercial readiness of the Group’s database migration product – one of the components of Bull’s Open Source services solution, Open Energy.
In an IT world that is increasingly open to RDBMSs, we are witnessing the emergence of a new battle between Open Source databases on the one hand, and proprietary databases distributed free of charge on the other. Of course, this battle is being governed by extremely stringent and restrictive rules in an international marketplace where the cost of DBMS licenses alone is in excess of $15 billion.
Free proprietary databases, also known as ‘Express’ version, are currently very promising but also quite limited in terms of the resources they offer.
Today even the biggest software publishers have entered the race towards the open world, and are offering free versions of their databases: the most interesting being Oracle – as the leading DBMS supplier in an increasingly competitive market – and SQL Server, that relies more than ever on Microsoft technology.
These two big players offer complete but lightweight and free ‘Standard Edition’ versions of the most advanced DBMS engines currently available, and as a result are expecting to attract a lot of customers.
The only weakness of these products – albeit a major one – is that although both the Oracle and the Microsoft ‘free’ offerings are based on state-of-the-art technology, they are really only designed for smaller organizations. It seems inconceivable they could have sufficient production capacity to meet the demands of larger organizations, given that they cannot support databases larger than 4Gbytes and their CPU and memory limitations have a negative impact on their performance. So the ‘free’ offerings from Microsoft, Oracle and even Sybase are, in effect, restricted products. There is a world of difference between a product that effectively has only a limited sphere of operation and a product that is both unlimited and free (MySQL and PostgreSQL).
This is why the most sensible solution for limiting costs while conserving maximum functionality and compatibility is to turn to Open Source databases like PostgreSQL and MySQL that exploit machine resources to the maximum of their capacity.
Meanwhile, debate and news in the DBMS arena is extremely lively, and publishers are very actively protecting their market shares, while taking the competitive challenge of the open software very seriously. Oracle’s recent acquisition of the InnoDB MySQL storage engine is recent proof of this.
Migrating an existing database?
While Open Source databases are naturally a very attractive choice for new projects, the question is a good deal more complex when your project involves migrating an existing database to an open database. Because, while there may be considerable benefits when it comes to lower support and maintenance costs, it is clearly vital to ensure that the migration will not result in any loss of functionality… and that the cost of the migration will not outweigh the potential savings to be made!
So how can you complete this kind of migration with total confidence? How can the process be automated as far as possible, both for qualification and development environments and production environments?
As part of its Open Energy range of services, Bull has decided to develop just these kinds of migration tools and methodologies for porting to PostgreSQL and MySQL. For Oracle migrations, Bull makes use of the ora2pg open application, among others, as part of its purpose designed OTP migration tool (having actively contributed to the development of ora2pg).
An example of migration: the WEBTEC project
By way of concrete example, we can look at the migration of a software application used by Bull called WEBTEC, which was ported from Oracle to PostgreSQL using OTP.
WEBTEC is a monitoring and planning capacity tool developed by Bull for its outsourcing business. It provides an overview of production activity, enabling improvements to service quality and productivity to be made. All data retrieved is archived and analyzed. The application relates to databases, operating systems, back-up applications, messaging systems, networks and knowledge bases. It gives us a daily summary of the information about the previous 24 hours of production, and as such it is clearly a tool that is central to our outsourcing activities.
PostgreSQL is closer to Oracle than MySQL, and it is totally Open Source. Hence the choice of PostgreSQL for this migration, even though MySQL may of course be an attractive alternative in many other instances.
In the context of this migration, the Oracle production database (11Gbytes of useful data) used by the Bull/WEBTEC software was initially ported successfully to PostgreSQL using OTP. The configuration required and the migration method are detailed in the appendix to this article.
The application software, initially written in PHP/Oracle, has also been re-written using PHP/Oracle syntax. This detailed but relatively straightforward task was necessary for WEBTEC since the pages were developed specifically for Oracle, with queries written directly in code without passing, for example, via a query pool file. Application software elements written in Java are often much quicker when it comes to this particular conversion.
Tests were carried out on Linux Red Hat and Windows XP platforms, showing that it was also possible to change the type of platform between the source and target server, since the exported files are simple text files.
The results of this migration enabled us on the one hand to validate the transfer and data migration mechanisms, and on the other, to limit the risks that the application software would malfunction as a direct or indirect result of the migration from Oracle to PostgreSQL.
From a functional point of view, the outcome proved to be undeniably positive.
Equivalents in PostgreSQL were found for all the types of data used during testing
Procedures stored could be manually adapted to the PL/PgSQL language without difficulty
Heavy SqlLoader load operations in the Oracle database were simply replaced by COPY FROM commands
LOB type objects (images, documents, …) were ported without any problem
Operating procedures (back-up, restore) were updated to iso functionalities.
Performance and reliability tests on PostgreSQL were carried out by Bull’s database center of competence at Echirolles in France, with summary tables listing key items (data volumes, number of users, ‘best practices’…)
The ROI also proved to be excellent, with the migration of the database and application software costing the equivalent of 20 person/days of effort: much less than the anticipated savings to be made on licensing and support.
Feedback from this experience shows that it really is possible to envisage moving away from proprietary components, but a feasibility audit must be carried out beforehand to verify compatibility between software applications, the necessary functionalities and the available Open Source components.
This migration will of course serve to enrich Bull’s experience and its Open Exchange solution, designed for porting towards Open Source . For more information
Migration tools used
The tool developed by Bull (OTP) for this migration is based on the ora2pg module and designed specifically to enable the module to be used in commercial contexts (with audit trails, flexibility, parallelism…).
Ora2pg, available under a GPL license, can be downloaded from http://www.samse.fr/GPL/ora2pg/. Bull is an active contributor to its development.
Ora2Pg functions as follows:
Connection to an Oracle database
Automated validation of database structure and objects in the database
Conversion of the database into PostgreSL syntax
Export to a PostgreSQL database.
List of exported SQL elements:
CONSTRAINTS: Unique, Primary, Foreign key, Check
FUNCTIONS, PROCEDURES, PACKAGES
RIGHTS (Grant, Role, User)
TABLESPACES (PostgreSQL v.8.x.x)
Data can be extracted from the Oracle base in two forms:
DATA : one line of INSERT per tuple
COPY: block of data.
They can be loaded in two ways:
As individual files using the psql command
Directly via a link to the PostgreSQL database.
Only the TABLE, INDEX, CONSTRAINTS and SEQUENCE elements are automatically converted to PostgreSQL.
Other types (VIEWS, TRIGGER, FUNCTIONS, PROCEDURES, PACKAGES and RIGHTS) need to be adapted manually for PostgreSQL since the PL/SQL language used by Oracle is significantly different from PL/PGSQL as used by PostgreSQL.
How OTP works:
OTP is the module that encapsulates and automates the use of ora2pg. Using OTP, Oracle tables can be migrated to PostgreSQL step by step, with each step producing its own trace files.
In addition, this means the data export and import steps can be carried out in parallel depending on the source and target server capacities (memory and number of CPUs), and integrity and index constraints only need to be established once data import is complete (data import also being faster and more flexible).
This module is also written in Perl. It has a unique configuration file for all the steps making up the migration.
Data are exported to a file in flat format, and not directly to the PostgreSQL database. In the event of an import problem, this means the export stage does not have to be re-initiated.
In addition to Perl 5, ora2pg and OTP require the following Perl modules:
DBD: Pg (optional)
Compress: Zlib (optional)
The module runs on all platforms supporting Perl, Oracle (client) and PostgreSQL (client).
Pre-requisites for the target platform
PostgreSQL is available on a great many operating systems, with a number of pre-requisites for its installation:
For UNIX and LINUX platforms:
GNU make is required; other make programs will not work
A C ISO/ANSI compiler is required. A recent version of GCC is recommended but PostgreSQL is known to be compatible with numerous compilers from diverse distributors
tar is required to unzip? the source distribution with either gzip or bzip2
The GNU Readline library will be used by default (under NetBSD, the libedit library is Readline compatible, and is used if the libreadline file is not found)
The zlib compression library will be used by default.
For Windows platforms:
PostgreSQL for the NT versions of Windows (such as Windows XP and 2003) can be constructed using MinGW
You can also construct PostgreSQL using Cygwin (old versions of Windows).
Optional packet configurations:
For PERL: shared libperl library and header files
For Python: Python installed with header files and the distutils module
For the procedure language PL/Tcl: Tcl must be installed
To activate native language support (NLS): an implementation of API Gettext is required. This is already included in certain operating systems (for example, Linux, NetBSD and Solaris)
For authentication or encryption: Kerberos, OpenSSL or PAM are required
When compiling from a CVS hierarchical tree: GNU Flex 2.5.4 and Bison 1.875 (or later versions) packeting solutions are required.
A list of which PostgreSQL functionalities are Oracle-compatible is available on request.