Re: [Gusdev-gusdev] GUS to PostgreSQL

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

i don't have time right now to digest this, but...

hey, wow, this is great.

steve

Laurie Kramer wrote:

> Steve wrote on Feb 13:
>
>> ok, to follow on this with a new subject heading...
>>
>> For those groups that are gearing up to port GUS to PostgreSQL, here 
>> are some quick thoughts on getting started.
>>
>> The two main tasks, to start with, I think, are:
>>   - porting the schema creation scripts to postgres
>>   - remove the oracle dependencies from the perl object layer
>>
>> Porting the Schema
>> As mentioned below, one way to go here is using a system-neutral 
>> syntax (xml) to create the schemas.   I think this should be the long 
>> term goal.  But, for a quick solution, there must be some way for us 
>> to manually transform the existing oracle-specific syntax to 
>> postgres.   I envision some kind of ad hoc scripts to do the 
>> transform.   This shouldn't be too big a deal.
>>
>> I would say the first step is to identify what the transforms will be.
>>
>>
>> Object Layer
>> I just did a quick look in the object layer code (Perl).  (we can 
>> defer for now on the java object layer since its not in wide use).   
>> Only two files seem to use oracle system tables, both in 
>> $PROJECT_HOME/GUS/ObjRelP/lib/perl:  DbiTable.pm and 
>> DbiDatabase.pm.   There are about 10 separate sql queries that use 
>> the oracle system tables (listed below).   A quick fix would be to 
>> supply an Oracle.pm and a PostgreSQL.pm which have methods that 
>> provide the correct sql for those queries.   IF postgres has parallel 
>> tables, then this should be trivial.  if not...?    here are the 
>> tables used:
>>
>> all_sequences
>> all_cons_columns
>> all_constraints
>> all_tab_columns
>> all_views
>>  
>>
>
> I am joining this discussion late, but would like to get involved as
> soon as possible in order to get GUS up and running here at Princeton.
> on PostgreSQL 7.3 (is that what others are using?) Steve and Angel,
> have either of you looked into the ora2pg perl module?  It is at
> http://www.samse.fr/GPL/ora2pg/ and requires an Oracle connection
> in order to run.
>
> In response to Steve's identification of the Oracle metadata queries
> which need to be converted, I have completed and tested all of them
> and only have trouble with this one from DbiTable.pm:
>
>        select column_name, data_type, nullable, column_id,
>                data_precision, data_length, data_scale
>        from    all_tab_columns
>        where   table_name = '$table_name' and owner = '$owner'
>        order by column_id
>
> Here is my best attempt at a conversion:
>
>        select  a.attname, t.typname,
>                case when a.attnotnull then 'N' else 'Y' end as nullable,
>                a.attnum,
>                case when t.typlen < 1 then null else t.typlen end as 
> data_precision,
>                case when a.atttypmod < 0 then null else a.atttypmod-4 
> end  as data_length,
>                null as data_scale
>        from    pg_attribute a, pg_class c, pg_type t
>        where    a.attrelid = c.oid and c.relname = 'assay' and 
> a.atttypid = t.oid and a.attnum > 0
>        order by a.attnum;
>
> Using RAD3.assay, the Oracle results yield the following:
>
>
> COLUMN_NAME                   DATA_TYPE  NULLABLE col_id  precision 
> length  scale
> ----------------------------  ---------  -------- ------  --------- 
> ------  ----------
> ASSAY_ID                      NUMBER     N        1       8         
> 22      0
> ARRAY_ID                      NUMBER     N        2       4         
> 22      0
> PROTOCOL_ID                   NUMBER     Y        3       10        
> 22      0
> ASSAY_DATE                    DATE       Y        4       null      
> 7       null
> ARRAY_IDENTIFIER              VARCHAR2   Y        5       null      
> 100     null
> ARRAY_BATCH_IDENTIFIER        VARCHAR2   Y        6       null      
> 100     null
> OPERATOR_ID                   NUMBER     N        7       10        
> 22      0
> EXTERNAL_DATABASE_RELEASE_ID  NUMBER     Y        8       5         
> 22      0
> SOURCE_ID                     VARCHAR2   Y        9       null      
> 50      null
> NAME                          VARCHAR2   Y        10      null      
> 100     null
> DESCRIPTION                   VARCHAR2   Y        11      null      
> 500     null
> MODIFICATION_DATE             DATE       N        12      null      
> 7       null
> USER_READ                     NUMBER     N        13      1         
> 22      0
> USER_WRITE                    NUMBER     N        14      1         
> 22      0
> GROUP_READ                    NUMBER     N        15      1         
> 22      0
> GROUP_WRITE                   NUMBER     N        16      1         
> 22      0
> OTHER_READ                    NUMBER     N        17      1         
> 22      0
> OTHER_WRITE                   NUMBER     N        18      1         
> 22      0
> ROW_USER_ID                   NUMBER     N        19      12        
> 22      0
> ROW_GROUP_ID                  NUMBER     N        20      3         
> 22      0
> ROW_PROJECT_ID                NUMBER     N        21      4         
> 22      0
> ROW_ALG_INVOCATION_ID         NUMBER     N        22      12        
> 22      0
>
> while the PostgreSQL results yield:
>
>           attname            | typname | nullable | attnum | 
> data_precision | data_length | data_scale
> ------------------------------+---------+----------+--------+----------------+-------------+------------ 
>
> assay_id                     | int8    | N        |      1 
> |              8 |             |
> array_id                     | int4    | N        |      2 
> |              4 |             |
> protocol_id                  | int8    | Y        |      3 
> |              8 |             |
> assay_date                   | date    | Y        |      4 
> |              4 |             |
> array_identifier             | varchar | Y        |      5 
> |                |         100 |
> array_batch_identifier       | varchar | Y        |      6 
> |                |         100 |
> operator_id                  | int8    | N        |      7 
> |              8 |             |
> external_database_release_id | int4    | Y        |      8 
> |              4 |             |
> source_id                    | varchar | Y        |      9 
> |                |          50 |
> name                         | varchar | Y        |     10 
> |                |         100 |
> description                  | varchar | Y        |     11 
> |                |         500 |
> modification_date            | date    | N        |     12 
> |              4 |             |
> user_read                    | int2    | N        |     13 
> |              2 |             |
> user_write                   | int2    | N        |     14 
> |              2 |             |
> group_read                   | int2    | N        |     15 
> |              2 |             |
> group_write                  | int2    | N        |     16 
> |              2 |             |
> other_read                   | int2    | N        |     17 
> |              2 |             |
> other_write                  | int2    | N        |     18 
> |              2 |             |
> row_user_id                  | int8    | N        |     19 
> |              8 |             |
> row_group_id                 | int4    | N        |     20 
> |              4 |             |
> row_project_id               | int4    | N        |     21 
> |              4 |             |
> row_alg_invocation_id        | int8    | N        |     22 
> |              8 |             |
>
> The PostgreSQL numeric data types are different from those in Oracle 
> and this
> raises the issue of how to convert them.  There is no equivalent of 
> the NUMBER(10)
> or NUMBER(12) in the assay table above, for example, in PostgreSQL.  
> The common
> integer types are int2, int4 and int8.  There is a numeric type which 
> supports
> precision and scale but it is much slower than the integer types.  
> Hence, we
> *must* come to agreement on this conversion issue.  Do I hear any 
> volunteers?
>
> Here are the other queries, successfully converted.
> ============
> DbiTable.pm
> ============
> Oracle:
>
>        select    text
>        from    all_views
>        where    owner = '$owner' and
>                view_name = '$table'
>
>
> PostgreSQL:
>
>        select    definition
>        from    pg_views
>        where    schemaname = '$owner' and
>                viewname = '$table'
>
> -------------------------------------------
>
> Oracle:
>
>        select    ac.owner,                 accs.table_name,
>                accs.column_name,
>                accr.owner,
>                accr.table_name,
>                accr.column_name
>        from    all_cons_columns accr,
>                all_cons_columns accs,
>                all_constraints ac
>        where    accs.owner = '$owner' and
>                ac.table_name = '$table' and
>                ac.owner = '$owner' and
>                ac.constraint_type = 'R' and
>                accr.constraint_name = ac.r_constraint_name and
>                ac.r_owner = accr.owner and
>                accs.constraint_name = ac.constraint_name
>       PostgreSQL:
>
>        select  n1.nspname,
>                r1.relname,
>                a1.attname,
>                n2.nspname,
>                r2.relname,
>                a2.attname
>        from    pg_namespace n1,
>                pg_namespace n2,
>                pg_class r1,
>                pg_class r2,
>                pg_attribute a1,
>                pg_attribute a2,
>                pg_constraint c
>        where   r1.relname = '$table' and
>                n1.nspname = '$owner' and
>                r1.relnamespace = n1.oid and
>                r1.oid = c.conrelid and
>                r1.oid = a1.attrelid and
>                c.contype = 'f' and
>                c.confrelid = r2.oid and
>                r2.relnamespace = n2.oid and
>                c.conkey[1] = a1.attnum and
>                c.confkey[1] = a2.attnum and
>                a2.attrelid = r2.oid
>
> -------------------------------------------
>
> Oracle:
>
>        select    distinct acc.column_name
>        from    all_cons_columns acc, all_constraints ac
>        where    acc.owner = '$owner' and
>                ac.table_name = '$queryTable' and
>                ac.constraint_type = 'P' and
>                acc.constraint_name = ac.constraint_name
>
> PostgreSQL:
>
>        select    distinct a.attname           from    pg_namespace n,
>                pg_class r,
>                pg_attribute a,
>                pg_constraint c
>        where    n.nspname = '$owner' and
>                r.relname = '$table' and
>                n.oid = r.relnamespace and
>                r.oid = a.attrelid and
>                r.oid = c.conrelid and
>                c.contype = 'p' and
>                c.conkey[1] = a.attnum;
>
> ============
> DbiDatabase.pm
> ============
>
> Oracle:
>
>        select    sequence_owner,sequence_name
>        from    all_sequences
>
> PostgreSQL:
>
>        select    schemaname, relname
>        from    pg_statio_user_sequences;
>
>
> -------------------------------------------
>
> Oracle:
>
>        select    
> acon.owner,acc1.owner,acc1.table_name,acc1.column_name, 
> acon.table_name ,acc2.column_name
>        from    all_cons_columns acc1, all_constraints 
> acon,all_cons_columns acc2
>        where    acon.r_constraint_name = acc1.constraint_name and
>                acc1.owner = '$owner' and
>                acon.r_owner = acc1.owner and
>                acc2.constraint_name = acon.constraint_name and
>                acc2.owner = acon.owner
>               PostgreSQL:
>
>        select  n1.nspname,
>                n2.nspname,
>                r1.relname,
>                a1.attname,
>                r2.relname,
>                a2.attname
>        from    pg_namespace n1,
>                pg_namespace n2,
>                pg_class r1,
>                pg_class r2,
>                pg_attribute a1,
>                pg_attribute a2,
>                pg_constraint c
>        where   r1.nspname = '$owner' and
>                r1.relnamespace = n1.oid and
>                r1.oid = c.conrelid and
>                r1.oid = a1.attrelid and
>                c.contype = 'f' and
>                c.confrelid = r2.oid and
>                r2.relnamespace = n2.oid and
>                c.conkey[1] = a1.attnum and
>                c.confkey[1] = a2.attnum and
>                a2.attrelid = r2.oid
>
> The next topic to tackle is CLOBs.
>
>
> Laurie
>
> -------------------------------
> Laurie Kramer
> Department of Molecular Biology
> Princeton University
> lk...@mo...
> 609.258.5657
>
>
>
>
>
>
> -------------------------------------------------------
> SF.Net is sponsored by: Speed Start Your Linux Apps Now.
> Build and deploy apps & Web services for Linux with
> a free DVD software kit from IBM. Click Now!
> http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click
> _______________________________________________
> Gusdev-gusdev mailing list
> Gus...@li...
> https://lists.sourceforge.net/lists/listinfo/gusdev-gusdev