Tuesday, May 22, 2007

Keeping up with new features

if there is one thing hard with learning oracle , its catching up with its pace of releasing new features & functionalities.

Recently I saw a plsql code some developer had written at a client place, to remove all unwanted characters from a varchar2 column and just to retain alphabets and numbers..

I was quick to suggest a simple "translate" function would do the trick and PLSQL call can be totally avoided..I learnt about translate function when it was Oracle version 7.3 and I should consider lucky enough that its still there in database kernel, working the same way.

here is a test table to demonstrate the functionality..

SQL> select * from ccs_test_trans ;

C1 C2
---------- ----------
1 12 34 560
2 abcdefgh
3 ABC#$%@(
4 *&^^!%%
5 12345678
6 a;p 27s
7 olka8762;;

7 rows selected.



Assume c2 is the column, which somehow has all extra characters (since varchar2 accepts everything) and needs little bit of cleansing work.

this is how I would have used translate function..
logic is..use nested translate function..
1) first translate - capture all unwanted characters
2) second translate - removes them..leaving only the wanted characters intact.

and then use a replace function to remove the extra spaces (since translate would have replaced all unwanted characters to space/any other character you specify)

SQL> ed
Wrote file afiedt.buf

1 With v1 as
2 (select
3 'abcdefghijklmnopqrstuvwxyz'||
4 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'||
5 '0123456789' retain_chrs
6 from dual)
7 select c1,c2,
8 replace(translate( c2,
9 translate( c2,
10 retain_chrs,
11 lpad( '*', length(retain_chrs), '*' )
12 ),
13 ' ' ), ' ', '') as new_c2
14 from ccs_test_trans ,v1

SQL> /

C1 C2 NEW_C2
---------- ---------- ----------
1 12 34 560 1234560
2 abcdefgh abcdefgh
3 ABC#$%@( ABC
4 *&^^!%%
5 12345678 12345678
6 a;p 27s ap27s
7 olka8762;; olka8762

7 rows selected.



Now coding that wasnt difficult..but understanding or explaining that to anyone new certainly is..because they need to understand how translate works..

but as shown above, it certainly works..and does the work needed..

The question is..do we really need such a complicated query ..isnt there a simple way of solving the same?

Lets see with 10g we have regular expressions..and Oracle allows all POSIX references in SQL. So all I want is to retain all characters which are either numbers or alphabets and remove the rest..

:alnum -> denotes exactly the same ; either alphabets or numbers
REGEXP_REPLACE-> does the replace functionality

SQL> ed
Wrote file afiedt.buf

1 With v1 as
2 (select
3 'abcdefghijklmnopqrstuvwxyz'||
4 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'||
5 '0123456789' retain_chrs
6 from dual)
7 select c1,c2,
8 replace(translate( c2,
9 translate( c2,
10 retain_chrs,
11 lpad( '*', length(retain_chrs), '*' )
12 ),
13 ' ' ), ' ', '') as new_c2,
14 REGEXP_REPLACE( c2, '[^[:alnum:]]' , '') as RE_c2
15 from ccs_test_trans ,v1

SQL> /
-- results

C1 C2 NEW_C2 RE_C2
---------- ---------- ---------- ----------
1 12 34 560 1234560 1234560
2 abcdefgh abcdefgh abcdefgh
3 ABC#$%@( ABC ABC
4 *&^^!%%
5 12345678 12345678 12345678
6 a;p 27s ap27s ap27s
7 olka8762;; olka8762 olka8762

7 rows selected.



How cool and simple is that?

Lets say I also want to allow comma and periods in the c2 column and they should be retained. (added two new rows with sample data to test)

SQL> ed
Wrote file afiedt.buf

1 With v1 as
2 (select
3 'abcdefghijklmnopqrstuvwxyz'||
4 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'||
5 '0123456789' ||
6 ',.' retain_chrs
7 from dual)
8 select c1,c2,
9 replace(translate( c2,
10 translate( c2,
11 retain_chrs,
12 lpad( '*', length(retain_chrs), '*' )
13 ),
14 ' ' ), ' ', '') as new_c2,
15 REGEXP_REPLACE( c2, '[^[:alnum:]{,.}]' , '') as RE_c2
16 from ccs_test_trans ,v1

SQL> /
-- results

C1 C2 NEW_C2 RE_C2
---------- ---------- ---------- ----------
1 12 34 560 1234560 1234560
2 abcdefgh abcdefgh abcdefgh
3 ABC#$%@( ABC ABC
4 *&^^!%%
5 12345678 12345678 12345678
6 a;p 27s ap27s ap27s
7 olka8762;; olka8762 olka8762
8 12,34,56.0 12,34,56.0 12,34,56.0
9 abc, def.g abc,def.g abc,def.g

9 rows selected.



I havent dealt a lot with REGULAR expressions..but it certainly looks an interesting area..if NOT performance, atleast its worth a try - for its readability and flexibility for future modifications.
I am sure you would agree the following SQL reads way too easy that the previous one (using translate) and running for 10+ lines.

SQL> select c1,c2,REGEXP_REPLACE( c2, '[^[:alnum:]{,.}]' , '') as RE_c2
2 from ccs_test_trans
3 /

C1 C2 RE_C2
---------- ---------- ----------
1 12 34 560 1234560
2 abcdefgh abcdefgh
3 ABC#$%@( ABC
4 *&^^!%%
5 12345678 12345678
6 a;p 27s ap27s
7 olka8762;; olka8762
8 12,34,56.0 12,34,56.0
9 abc, def.g abc,def.g

9 rows selected.





Happy SQL-ing until next..





No comments: