The Anti-Kyte
Installing a Gnome Desktop Extension in Ubuntu
To distract myself from England’s ongoing woes at the Cricket World Cup, I’ve been playing the simple-yet-addictive London Tube Map Memory Game.
Having not taken the tube for a few years now, it’s proving something of a challenge.
It doesn’t help that they’ve added quite a few stations recently.
Another test of my memory occurs on those rare occasions where I want to terminate a frozen window on my Gnome desktop.
There’s an excellent utility called xkill which does the job…and which I can never remember the name of !
Fortunately there is a Gnome Shell Extension which will add an icon to the gnome-panel at the top of the screen so
it’s only a mouse-click away…
Let’s start with a quick look at xkill in action :
Before we go any further, it’s probably a good idea to confirm which versions of Ubuntu and Gnome I currently have running.
For Ubuntu itself :
cat /etc/issue

Ubuntu 22.04.3 LTS \n \l
…and for Gnome…
gnome-shell --version

GNOME Shell 42.9

Installing Extensions Manager
One way to manage Gnome Shell Extensions is to install the appropriately named Extension Manager.
To do this, open a Terminal and run :
sudo apt install gnome-shell-extension-manager
Once installed, you should then find Extension Manager in Activities :

Open it and you’ll see

Click the Browse tab to search for extensions

I know that the Extension I want is called Force Quit and the author is megh. Extension Manager can’t find the Extension by its name, but it does find it if we search for the author

Clicking the Install Button and confirming when prompted causes the extension to be installed and activated.
We can now see it in the Activity Bar :

Let’s give it a try
Well, that’s one less thing to remember.
Back to ignoring the cricket !
Cut-and-Paste-Driven-Development – Using utPLSQL to build a test harness in Oracle.
If you’re working on a Decision Support System (DSS) then sooner or later you’re likely to need to create or change a package which runs as part of a lengthy batch process.
In such circumstances, it would be useful to have a good old-fashioned test harness to run your code, so you can test its functionality without having to kick off the entire batch.
Fortunately, utPLSQL is not just useful when it comes to TDD, CI/CD pipelines and the like. It can easily be used for the purpose of creating a simple stand-alone unit test for a single package.
Having used utPLSQL for this purpose quite regularly, I’ve found that my test harnesses tend to follow the same basic pattern.
What follows is a walkthrough of a typical utPLSQL package I might construct for this purpose.
Obviously, it won’t be to everyone’s taste but hopefully, you’ll find something of use in it.
If you’re interested, you can find the code in this Github Repo.
Testing Approach
The main purpose of these tests is to check that the program being tested produces a set result for a set input.
Each individual test is structured in four sequential phases, as sketched in the code stub below :
- Setup – setup the input data to use in the test
- Execute – run the test and retrieve the output
- Validate – check that the output matches what we expect
- Teardown – reset everything back to how it was before we ran the test.
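Expressed as a minimal sketch ( the procedure and helper names here are purely illustrative, not part of the real test package we’ll build shortly), a utPLSQL test following this pattern looks something like :

procedure my_test
is
begin
    -- Setup : create the input data the test needs
    setup_data;
    -- Execute : run the code we're testing
    package_under_test.procedure_under_test;
    -- Validate : compare the actual outcome with what we expect
    ut.expect( get_actual_value() ).to_( equal( 'EXPECTED VALUE' ) );
    -- Teardown : put everything back the way we found it
    teardown_data;
end my_test;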
Whilst tables in DSS applications can contain a large number of columns, our test records only need to contain values that are :
- mandatory in the table ( or possibly part of an RI constraint or Unique Key)
- relevant to the test we’re conducting
In terms of the values themselves, they are not usually constrained other than by the definition of the column into which they are being inserted.
Therefore, we may be able to use negative numbers for integer identifiers to avoid clashes with any pre-existing records.
Also there may be nothing to stop us using random strings for VARCHAR2 values. Alternatively, if you’re easily amused, you may craft your test values to follow a theme. Not that I’d ever do anything so frivolous…
The database I’m running on is Oracle 19c on Oracle Cloud (Free Tier).
The version of utPLSQL used here is 3.1.13, downloaded from the Project’s Github Repo
It was installed without the DDL trigger by connecting to my OCI instance as the Admin user and following the appropriate instructions here.
The Code we’ll be Testing
The package we want to test performs updates on this table :
drop table employees_core;
create table employees_core
(
-- Columns populated by initial load
load_id number not null,
employee_id number not null,
first_name varchar2(20),
last_name varchar2(25),
email varchar2(25),
phone_number varchar2(20),
hire_date date,
job_id varchar2(10),
salary number,
commission_pct number,
manager_id number,
department_id number,
-- Additional columns populated as part of enrichment process
-- Job details
job_title varchar2(50),
-- Department Details
department_name varchar2(50),
-- Enrichment status
record_status varchar2(25),
constraint employees_core_pk primary key (load_id, employee_id)
)
/
To do so, it queries these tables :
create table departments
(
department_id number not null,
department_name varchar2(30) not null,
manager_id number,
location_id number,
constraint departments_pk primary key ( department_id)
)
/
create table jobs
(
JOB_ID varchar2(10) not null,
JOB_TITLE varchar2(35) not null,
MIN_SALARY number,
MAX_SALARY number,
constraint jobs_pk primary key( job_id)
)
/
The package itself is :
create or replace package enrich_employees
is
procedure department_details( i_load_id in number);
procedure job_details( i_load_id in number);
end;
/
create or replace package body enrich_employees
is
procedure department_details( i_load_id in number)
is
-- Populate the Department Name.
-- Suspend the record if we don't get a match.
begin
merge into employees_core emp
using departments dept
on
(
emp.department_id = dept.department_id
and emp.load_id = i_load_id
and emp.record_status = 'PROCESS'
)
when matched then update
set emp.department_name = dept.department_name;
update employees_core
set record_status = 'DEPARTMENT_ID_NOT_FOUND'
where record_status = 'PROCESS'
and department_name is null;
commit;
end department_details;
procedure job_details( i_load_id in number)
is
-- Don't suspend if we don't get a match, just leave the job_title empty.
begin
merge into employees_core emp
using jobs j
on
(
emp.job_id = j.job_id
and emp.record_status = 'PROCESS'
and emp.load_id = i_load_id
)
when matched then update
set emp.job_title = j.job_title;
commit;
end job_details;
end;
/
Note that, as is common in such routines, commits are done immediately after potentially large DML statements in order to minimise the length of time that Undo space is in use.
This is significant as we’ll need to account for it explicitly in our tests.
For the DEPARTMENT_DETAILS procedure, we want to check what happens when :
- we find a match in the DEPARTMENTS table
- we don’t find a match
- the DEPARTMENT_ID on the core record is null
For JOB_DETAILS, the conditions to test are similar :
- RECORD_STATUS is not ‘PROCESS’
- we find a match in the JOBS table
- we don’t find a match
- the JOB_ID is null
From this list of scenarios, we can construct our test package specification :
create or replace package enrich_employees_ut
as
--%suite(enrich_employees_ut)
--%rollback(Manual)
--%test( Department Lookup Succeeds)
procedure department_is_found;
--%test( Department does not exist)
procedure department_not_found;
--%test( Department ID is null)
procedure department_is_null;
--%test( Status is not PROCESS)
procedure status_not_process;
--%test( Job Lookup Succeeds)
procedure job_is_found;
--%test( Job does not exist)
procedure job_not_found;
--%test( Job ID is null)
procedure job_is_null;
end enrich_employees_ut;
The annotations we’re using here are :
--%suite(enrich_employees_ut)
…which gives us the option to group multiple test packages into the same suite, should we want to.
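For example, if we did want to gather several test packages under a common parent suite, we could add a --%suitepath annotation alongside it ( the path name below is just an illustration) :

--%suite(enrich_employees_ut)
--%suitepath(batch_enrichment)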
--%rollback(Manual)
…which prevents the Framework attempting its default rollback behaviour ( rollback to a savepoint), which won’t work here due to the commits in the code we’re testing.
--%test
…which identifies and describes the tests themselves.
Creating a Stub for the test Package Body
We may well want to write tests and run them one-at-a-time so that we can adjust them ( or, indeed, the code we’re testing) as we go.
In such circumstances, a script such as this, which uses the already created test package specification to generate a file containing the ddl for the package body, might come in handy :
clear screen
set heading off
set lines 130
set pages 500
set feedback off
set verify off
column pkg new_value v_package_name noprint
select '&1' as pkg from dual;
spool '&v_package_name..pkb'
with package_skeleton as
(
select 1 as line, 'create or replace package body &v_package_name' as text from dual
union
select line,
case
when line < ( select max(line) from user_source where name = upper('&v_package_name') and type = 'PACKAGE')
then
replace (
replace(
replace(text, '%test('),
')')
,';', q'[ is begin ut.fail('Not yet written'); end;]')
else text
end as text
from user_source
where name = upper('&v_package_name')
and type = 'PACKAGE'
and line > 1
and
(
regexp_replace(text, '[[:space:]]') not like '--\%%' escape '\'
or
regexp_replace(text, '[[:space:]]') like '--\%test%' escape '\'
)
)
select text from package_skeleton order by line
/
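Assuming we’ve saved this script as, say, gen_ut_stub.sql ( the filename is my choice, pick whatever suits), generating and then compiling the stub from SQL*Plus looks like this :

-- spools the skeleton to enrich_employees_ut.pkb in the current directory...
@gen_ut_stub.sql enrich_employees_ut

-- ...which we can then review and compile
@enrich_employees_ut.pkb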
In this case, the generated package body looks like this, and compiles straight away :
create or replace package body enrich_employees_ut
as
-- Department Lookup Succeeds
procedure department_is_found is begin ut.fail('Not yet written'); end;
-- Department does not exist
procedure department_not_found is begin ut.fail('Not yet written'); end;
-- Department ID is null
procedure department_is_null is begin ut.fail('Not yet written'); end;
-- Status is not PROCESS
procedure status_not_process is begin ut.fail('Not yet written'); end;
-- Job Lookup Succeeds
procedure job_is_found is begin ut.fail('Not yet written'); end;
-- Job does not exist
procedure job_not_found is begin ut.fail('Not yet written'); end;
-- Job ID is null
procedure job_is_null is begin ut.fail('Not yet written'); end;
end enrich_employees_ut;
I prefer to create global variables to hold the test data.
This is because each test is likely to use a similar data set so, this way, the variables only need to be declared once.
As we’re dealing with table rows here, I’m just declaring a single record variable for each table.
This makes it very simple to add columns that we want to populate as they’re pre-declared as part of the record type.
For this test, my globals are :
g_emp employees_core%rowtype;
g_job jobs%rowtype;
g_dept departments%rowtype;
g_result employees_core%rowtype;
Next, we need a procedure to initialise these globals :
procedure set_globals
is
begin
-- start by setting the globals to the values required for the first test, which
-- I usually make the test for the most commonly expected behaviour
--
-- Values for Employees Core record
--
-- Making numeric values negative means that they are less likely to clash
-- with an existing sequence generated value
g_emp.load_id := -1;
g_emp.employee_id := -8;
-- However wide the table, we only have to populate mandatory columns, and
-- any columns we want for the tests...
g_emp.department_id := -64;
g_emp.record_status := 'PROCESS';
-- Job Id is a Varchar - no constraints on it other than length...
g_emp.job_id := 'WIZZARD';
--
-- Values for the Department Lookup ( starting with ones that we expect to find)
--
-- Values set independently of the EMPLOYEES_CORE values as we'll want to see what happens
-- if they DON't match, as well as if they do.
g_dept.department_id := -64;
g_dept.department_name := 'Cruel and Unusual Geography';
--
-- Values for the Job lookup
--
g_job.job_id := 'WIZZARD';
g_job.job_title := 'Professor';
end set_globals;
Then there is a procedure to create the test records using the global variable values.
This separation between initialising the globals and creating the records is needed so that we can “tweak” the values we use for each test as appropriate :
procedure setup_data
is
-- Populate the tables with our test data
begin
insert into employees_core values g_emp;
insert into departments values g_dept;
insert into jobs values g_job;
commit;
end setup_data;
This is followed by a procedure to retrieve the actual results of the program execution :
procedure fetch_results
is
cursor c_result is
select *
from employees_core
where load_id = g_emp.load_id
and employee_id = g_emp.employee_id;
begin
open c_result;
fetch c_result into g_result;
close c_result;
end fetch_results;
Finally, as we need to tidy up after we’ve finished, there’s a teardown procedure to remove any test records we’ve set up :
procedure teardown_data
is
-- Tidy up by removing the test data using unique values where possible.
begin
delete from employees_core
where employee_id = g_emp.employee_id
and load_id = g_emp.load_id;
delete from departments
where department_id = g_dept.department_id;
delete from jobs
where job_id = g_job.job_id;
commit;
end teardown_data;
You might think we’ve done quite a bit of typing without writing any tests. The payoff for this up-front effort becomes apparent when you start on the first test : you’re pretty much just calling everything you’ve already written, and only need to add a couple of expectations and an exception handler :
-- Department Lookup Succeeds
procedure department_is_found
is
begin
-- Setup
set_globals;
setup_data;
-- Execute
enrich_employees.department_details(g_emp.load_id);
-- Get the actual results
fetch_results;
-- Validate
ut.expect( g_result.department_name).to_(equal(g_dept.department_name));
ut.expect( g_result.record_status).to_(equal('PROCESS'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
The exception handler is there to ensure that the teardown runs even if we hit an unexpected error.
This is more likely than usual as you’re iteratively developing the test code and (potentially) the code you’re testing when running these tests.
We can execute our test standalone :
set serverout on
exec ut.run('enrich_employees_ut.department_is_found');

enrich_employees_ut
  Department Lookup Succeeds [.005 sec]

Finished in .007019 seconds
1 tests, 0 failed, 0 errored, 0 disabled, 0 warning(s)

PL/SQL procedure successfully completed.
The code for our second test is almost identical, apart from the change in one variable value and the expected results :
-- Department does not exist
procedure department_not_found
is
begin
-- Setup
set_globals;
-- Almost exactly the same as the first test except...
g_emp.department_id := -4096;
setup_data;
-- Execute
enrich_employees.department_details(g_emp.load_id);
fetch_results;
-- Validate
ut.expect( g_result.department_name).to_(be_null());
ut.expect( g_result.record_status).to_(equal('DEPARTMENT_ID_NOT_FOUND'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
The rest of the tests follow a broadly similar pattern.
The finished test package body looks like this :
create or replace package body enrich_employees_ut
as
-- Global variables for use in tests
g_emp employees_core%rowtype;
g_job jobs%rowtype;
g_dept departments%rowtype;
g_result employees_core%rowtype;
procedure set_globals
is
begin
g_emp.load_id := -1;
g_emp.employee_id := -8;
g_emp.department_id := -64;
g_emp.record_status := 'PROCESS';
g_emp.job_id := 'WIZZARD';
g_dept.department_id := -64;
g_dept.department_name := 'Cruel and Unusual Geography';
g_job.job_id := 'WIZZARD';
g_job.job_title := 'Professor';
end set_globals;
procedure setup_data
is
begin
insert into employees_core values g_emp;
insert into departments values g_dept;
insert into jobs values g_job;
commit;
end setup_data;
procedure fetch_results
is
cursor c_result is
select *
from employees_core
where load_id = g_emp.load_id
and employee_id = g_emp.employee_id;
begin
open c_result;
fetch c_result into g_result;
close c_result;
end fetch_results;
procedure teardown_data
is
-- Tidy up by removing the test data using unique values where possible.
begin
delete from employees_core
where employee_id = g_emp.employee_id
and load_id = g_emp.load_id;
delete from departments
where department_id = g_dept.department_id;
delete from jobs
where job_id = g_job.job_id;
commit;
end teardown_data;
-- Department Lookup Succeeds
procedure department_is_found
is
begin
-- Setup
set_globals;
setup_data;
-- Execute
enrich_employees.department_details(g_emp.load_id);
fetch_results;
-- Validate
ut.expect( g_result.department_name).to_(equal(g_dept.department_name));
ut.expect( g_result.record_status).to_(equal('PROCESS'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
-- Department does not exist
procedure department_not_found
is
begin
-- Setup
set_globals;
-- Almost exactly the same as the first test except...
g_emp.department_id := -4096;
setup_data;
-- Execute
enrich_employees.department_details(g_emp.load_id);
fetch_results;
-- Validate
ut.expect( g_result.department_name).to_(be_null());
ut.expect( g_result.record_status).to_(equal('DEPARTMENT_ID_NOT_FOUND'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
-- Department ID is null
procedure department_is_null
is
begin
-- Setup
set_globals;
-- Again, just a single change required :
g_emp.department_id := null;
setup_data;
-- Execute
enrich_employees.department_details(g_emp.load_id);
fetch_results;
-- Validate
ut.expect( g_result.department_name).to_(be_null());
ut.expect( g_result.record_status).to_(equal('DEPARTMENT_ID_NOT_FOUND'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
-- Status is not PROCESS
procedure status_not_process
is
begin
-- Setup
set_globals;
-- This time set the status to prevent processing
g_emp.record_status := 'SUSPENDED';
setup_data;
-- Execute
enrich_employees.department_details(g_emp.load_id);
fetch_results;
-- Validate
ut.expect( g_result.department_name).to_(be_null());
ut.expect( g_result.record_status).to_(equal('SUSPENDED'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
-- JOB Lookup Succeeds
procedure job_is_found
is
begin
-- Setup
-- We can use the default values here
set_globals;
setup_data;
-- Execute
enrich_employees.job_details(g_emp.load_id);
fetch_results;
-- Validate
ut.expect( g_result.job_title).to_(equal(g_job.job_title));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
-- Job does not exist
procedure job_not_found
is
begin
-- Setup
set_globals;
g_emp.job_id := -32768;
setup_data;
-- Execute
enrich_employees.job_details(g_emp.load_id);
-- Get the actual results
fetch_results;
-- Validate
ut.expect( g_result.job_title).to_(be_null());
ut.expect( g_result.record_status).to_(equal('PROCESS'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
-- Job ID is null
procedure job_is_null
is
begin
-- Setup
set_globals;
g_emp.job_id := null;
setup_data;
-- Execute
enrich_employees.job_details(g_emp.load_id);
fetch_results;
-- Validate
ut.expect( g_result.job_title).to_(be_null());
ut.expect( g_result.record_status).to_(equal('PROCESS'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
end enrich_employees_ut;
To run all of the tests in a suite, I usually use a script like this, which handles recompilation of either the test package or the code unit being tested without raising all those pesky ORA-4068 errors :
exec dbms_session.modify_package_state(dbms_session.reinitialize);
clear screen
set serveroutput on size unlimited
exec ut.run('enrich_employees_ut');

enrich_employees_ut
  Department Lookup Succeeds [.004 sec]
  Department does not exist [.003 sec]
  Department ID is null [.003 sec]
  Status is not PROCESS [.003 sec]
  Job Lookup Succeeds [.003 sec]
  Job does not exist [.003 sec]
  Job ID is null [.003 sec]

Finished in .026568 seconds
7 tests, 0 failed, 0 errored, 0 disabled, 0 warning(s)

PL/SQL procedure successfully completed.
Of course, I will probably still need to run a batch at some point if I want to perform a full System Test and (optionally) a load test. For now, however, I can be confident that my code is at least functionally correct.
A Performance Comparison between Oracle’s REGEXP_SUBSTR and SUBSTR functions
Oracle SQL offers support for POSIX regular expressions by means of a suite of native functions.

Whilst it provides the complex search capabilities that regular expressions offer, REGEXP_SUBSTR has acquired a reputation for being a fair bit slower than the standard SUBSTR function.
What I’m going to demonstrate here is :
- how SUBSTR seems generally to be faster than REGEXP_SUBSTR
- how the performance of REGEXP_SUBSTR can improve dramatically when used with INSTR
- how REGEXP_SUBSTR performs better when it’s matching the start of a string
To start with though, we’ll discover why you’ll never see a Sith Lord on Sesame Street ( hint : it’s the way they count in a Galaxy Far, Far Away)…
A Test Table
We have a two-column table that looks like this :
create table star_wars_numbers
(
id number generated always as identity,
episode varchar2(4000)
)
/
insert into star_wars_numbers( episode)
values('Four');
insert into star_wars_numbers( episode)
values('Four|Five');
insert into star_wars_numbers( episode)
values('Four|Five|Six');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven|Three-and-a-bit');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven|Three-and-a-bit|Three and a bit less of a bit');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven|Three-and-a-bit|Three and a bit less of a bit|Eight');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven|Three-and-a-bit|Three and a bit less of a bit|Eight|Nine');
commit;
Whilst the contents of the EPISODES column may be the result of arcane Sith ritual, Moof-Milker activity in the design process cannot be ruled out.
Either way, the EPISODES column contains up to 11 values, with multiple values being separated by a ‘|’.
Extracting each value in SQL
Using SUBSTR and INSTR to separate out the values in EPISODES is, as ever, quite a bit of typing…
select
id,
episode,
substr
(
episode,
1,
case
when instr(episode, '|') > 0 then instr(episode, '|') -1
else length(episode)
end
) as "A New Hope",
case when instr(episode, '|', 1, 1) > 0 then
substr
(
episode,
instr(episode, '|', 1, 1) + 1, -- start
case
when instr(episode, '|', 1, 2) > 0 then instr(episode, '|', 1,2) -1
else length(episode)
end
-
instr(episode, '|', 1, 1)
)
end as "The Empire Strikes Back",
case when instr(episode, '|', 1, 2) > 0 then
substr( episode, instr(episode, '|', 1, 2) + 1,
case when instr(episode, '|', 1,3) > 0 then instr(episode, '|',1,3) - 1 else length(episode) end
-
instr(episode, '|', 1, 2))
end as "Return of the Jedi",
case when instr(episode, '|', 1, 3) > 0 then
substr( episode, instr(episode, '|', 1, 3) + 1,
case when instr(episode, '|', 1,4) > 0 then instr(episode, '|',1,4) - 1 else length(episode) end
-
instr(episode, '|', 1, 3))
end as "Phantom Menace",
case when instr(episode, '|', 1, 4) > 0 then
substr( episode, instr(episode, '|', 1, 4) + 1,
case when instr(episode, '|', 1,5) > 0 then instr(episode, '|',1,5) - 1 else length(episode) end
-
instr(episode, '|', 1, 4))
end as "Attack of the Clones",
case when instr(episode, '|', 1, 5) > 0 then
substr( episode, instr(episode, '|', 1, 5) + 1,
case when instr(episode, '|', 1,6) > 0 then instr(episode, '|',1,6) - 1 else length(episode) end
-
instr(episode, '|', 1, 5))
end as "Revenge of the Sith",
case when instr(episode, '|', 1, 6) > 0 then
substr( episode, instr(episode, '|', 1, 6) + 1,
case when instr(episode, '|', 1,7) > 0 then instr(episode, '|',1,7) - 1 else length(episode) end
-
instr(episode, '|', 1, 6))
end as "The Force Awakens",
case when instr(episode, '|', 1, 7) > 0 then
substr( episode, instr(episode, '|', 1, 7) + 1,
case when instr(episode, '|', 1,8) > 0 then instr(episode, '|',1,8) - 1 else length(episode) end
-
instr(episode, '|', 1, 7))
end as "Rogue One",
case when instr(episode, '|', 1, 8) > 0 then
substr( episode, instr(episode, '|', 1, 8) + 1,
case when instr(episode, '|', 1,9) > 0 then instr(episode, '|',1,9) - 1 else length(episode) end
-
instr(episode, '|', 1, 8))
end as "Solo",
case when instr(episode, '|', 1, 9) > 0 then
substr( episode, instr(episode, '|', 1, 9) + 1,
case when instr(episode, '|', 1, 10) > 0 then instr(episode, '|',1, 10) - 1 else length(episode) end
-
instr(episode, '|', 1, 9))
end as "The Last Jedi",
case when instr(episode, '|', 1, 10) > 0 then
substr( episode, instr(episode, '|', 1, 10) + 1,
case when instr(episode, '|', 1, 11) > 0 then instr(episode, '|',1, 11) - 1 else length(episode) end
-
instr(episode, '|', 1, 10))
end as "The Rise of Skywalker"
from star_wars_numbers
/
…although these days ( version 12c and later), we can save ourselves a bit of effort by using an inline function…
with function extract_episode( i_string in varchar2, i_pos in pls_integer)
return varchar2
is
begin
if i_pos = 1 then
return substr(i_string, 1, case when instr(i_string, '|') > 0 then instr(i_string, '|') - 1 else length(i_string) end);
end if;
return
case when instr(i_string, '|', 1, i_pos -1 ) > 0 then
substr( i_string, instr(i_string, '|', 1, i_pos -1) + 1,
case when instr(i_string, '|', 1, i_pos) > 0 then instr(i_string, '|',1, i_pos) - 1 else length(i_string) end
-
instr( i_string, '|', 1, i_pos - 1))
end;
end;
select
id, episode,
extract_episode( episode, 1) as "A New Hope",
extract_episode( episode, 2) as "Empire Strikes Back",
extract_episode( episode, 3) as "Return of the Jedi",
extract_episode( episode, 4) as "The Phantom Menace",
extract_episode( episode, 5) as "Attack of the Clones",
extract_episode( episode, 6) as "Revenge of the Sith",
extract_episode( episode, 7) as "The Force Awakens",
extract_episode( episode, 8) as "Rogue One",
extract_episode( episode, 9) as "Solo",
extract_episode( episode, 10) as "The Last Jedi",
extract_episode( episode, 11) as "The Rise of Skywalker"
from star_wars_numbers
/
Whether you find the equivalent regexp query more elegant or just horrible is somewhat subjective :
select
regexp_substr(episode, '[^|]+') as "A New Hope",
regexp_substr(episode, '[^|]+',1, 2) as "The Empire Strikes Back",
regexp_substr(episode, '[^|]+',1, 3) as "Return of the Jedi",
regexp_substr(episode, '[^|]+',1, 4) as "The Phantom Menace",
regexp_substr(episode, '[^|]+',1, 5) as "Attack of the Clones",
regexp_substr(episode, '[^|]+',1, 6) as "Revenge of the Sith",
regexp_substr(episode, '[^|]+',1, 7) as "The Force Awakens",
regexp_substr(episode, '[^|]+',1, 8) as "Rogue One",
regexp_substr(episode, '[^|]+',1, 9) as "Solo",
regexp_substr(episode, '[^|]+',1, 10) as "The Last Jedi",
regexp_substr(episode, '[^|]+',1, 11) as "The Rise of Skywalker"
from star_wars_numbers
/
Irrespective of aesthetic considerations, I should explain the regexp in use here, if only so I can understand it when I read this a few months from now :
[^|] - match any character that isn't pipe ("|")
+    - match one or more times
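If you want to check what the pattern returns before pointing it at a table, a quick sanity test against DUAL does the job ( the string literal is just a sample value) :

select regexp_substr('Four|Five|Six', '[^|]+', 1, 2) as second_value
from dual;

-- returns 'Five'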
Next, we need to find out how the regexp function stacks up when it comes to performance, and we’re not going to find that with a mere 11 rows of data…
Performance Test Setup
Let’s make some use of all that space in the Galaxy Far, Far Away…
create table galaxy (
id number generated always as identity,
episode varchar2(4000))
/
begin
for i in 1..100000 loop
insert into galaxy(episode)
select episode from star_wars_numbers;
commit;
end loop;
end;
/
exec dbms_stats.gather_table_stats(user, 'galaxy');
All of the following tests were run on Oracle 19c Enterprise Edition running on Oracle Linux.
Everything was run in a SQL*Plus session from the command line on the database server.
The queries were run in SQL*Plus with the following settings :
set lines 130
clear screen
set autotrace trace
set timing on
Each query was executed at least twice consecutively to ensure that results were not skewed by the state of the buffer cache.
It’s also worth noting that I found no low-level explanation for the performance discrepancies between the two functions when trawling through trace files. Therefore, I’ve concentrated on elapsed time as the main performance metric in these tests.
Test 1 – All Fields extracted in the Select Clause
Let’s start with the SUBSTR function ( referred to as “Standard” henceforth) :
select
id,
substr
(
episode, -- input string
1, -- start position
case
when instr(episode, '|') > 0 then instr(episode, '|') -1
else length(episode)
end -- number of characters to extract
) as "A New Hope",
case when instr(episode, '|', 1, 1) > 0 then
substr
(
episode,
instr(episode, '|', 1, 1) + 1, -- start
case
when instr(episode, '|', 1, 2) > 0 then instr(episode, '|', 1,2) -1
else length(episode)
end
-
instr(episode, '|', 1, 1)
)
end as "The Empire Strikes Back",
case when instr(episode, '|', 1, 2) > 0 then
substr( episode, instr(episode, '|', 1, 2) + 1,
case when instr(episode, '|', 1,3) > 0 then instr(episode, '|',1,3) - 1 else length(episode) end
-
instr(episode, '|', 1, 2))
end as "Return of the Jedi",
case when instr(episode, '|', 1, 3) > 0 then
substr( episode, instr(episode, '|', 1, 3) + 1,
case when instr(episode, '|', 1,4) > 0 then instr(episode, '|',1,4) - 1 else length(episode) end
-
instr(episode, '|', 1, 3))
end as "Phantom Menace",
case when instr(episode, '|', 1, 4) > 0 then
substr( episode, instr(episode, '|', 1, 4) + 1,
case when instr(episode, '|', 1,5) > 0 then instr(episode, '|',1,5) - 1 else length(episode) end
-
instr(episode, '|', 1, 4))
end as "Attack of the Clones",
case when instr(episode, '|', 1, 5) > 0 then
substr( episode, instr(episode, '|', 1, 5) + 1,
case when instr(episode, '|', 1,6) > 0 then instr(episode, '|',1,6) - 1 else length(episode) end
-
instr(episode, '|', 1, 5))
end as "Revenge of the Sith",
case when instr(episode, '|', 1, 6) > 0 then
substr( episode, instr(episode, '|', 1, 6) + 1,
case when instr(episode, '|', 1,7) > 0 then instr(episode, '|',1,7) - 1 else length(episode) end
-
instr(episode, '|', 1, 6))
end as "The Force Awakens",
case when instr(episode, '|', 1, 7) > 0 then
substr( episode, instr(episode, '|', 1, 7) + 1,
case when instr(episode, '|', 1,8) > 0 then instr(episode, '|',1,8) - 1 else length(episode) end
-
instr(episode, '|', 1, 7))
end as "Rogue One",
case when instr(episode, '|', 1, 8) > 0 then
substr( episode, instr(episode, '|', 1, 8) + 1,
case when instr(episode, '|', 1,9) > 0 then instr(episode, '|',1,9) - 1 else length(episode) end
-
instr(episode, '|', 1, 8))
end as "Solo",
case when instr(episode, '|', 1, 9) > 0 then
substr( episode, instr(episode, '|', 1, 9) + 1,
case when instr(episode, '|', 1, 10) > 0 then instr(episode, '|',1, 10) - 1 else length(episode) end
-
instr(episode, '|', 1, 9))
end as "The Last Jedi",
case when instr(episode, '|', 1, 10) > 0 then
substr( episode, instr(episode, '|', 1, 10) + 1,
case when instr(episode, '|', 1, 11) > 0 then instr(episode, '|',1, 11) - 1 else length(episode) end
-
instr(episode, '|', 1, 10))
end as "The Rise of Skywalker"
from galaxy
/
Running this query, we get :
1100000 rows selected.

Elapsed: 00:00:20.32

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |  1100K|    47M|  2089   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| GALAXY |  1100K|    47M|  2089   (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      80459  consistent gets
       7616  physical reads
          0  redo size
   35857676  bytes sent via SQL*Net to client
     811886  bytes received via SQL*Net from client
      73335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1100000  rows processed
Before we look at the REGEXP_SUBSTR equivalent, it’s probably worth considering the more streamlined in-line function version of this query :
with function extract_episode( i_string in varchar2, i_pos in pls_integer)
return varchar2
is
begin
return
case when instr(i_string, '|', 1, i_pos -1 ) > 0 then
substr( i_string, instr(i_string, '|', 1, i_pos -1) + 1,
case when instr(i_string, '|', 1, i_pos) > 0 then instr(i_string, '|',1, i_pos) - 1 else length(i_string) end
-
instr( i_string, '|', 1, i_pos - 1))
end;
end;
select
id,
substr
(
episode,
1,
case
when instr(episode, '|') > 0 then instr(episode, '|') -1
else length(episode)
end
) as "A New Hope",
extract_episode( episode, 2) as "Empire Strikes Back",
extract_episode( episode, 3) as "Return of the Jedi",
extract_episode( episode, 4) as "The Phantom Menace",
extract_episode( episode, 5) as "Attack of the Clones",
extract_episode( episode, 6) as "Revenge of the Sith",
extract_episode( episode, 7) as "The Force Awakens",
extract_episode( episode, 8) as "Rogue One",
extract_episode( episode, 9) as "Solo",
extract_episode( episode, 10) as "The Last Jedi",
extract_episode( episode, 11) as "The Rise of Skywalker"
from galaxy
/
Whilst it’s a bit more readable, it’s also a lot slower :
1100000 rows selected.

Elapsed: 00:00:41.76

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |  1100K|    47M|  2089   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| GALAXY |  1100K|    47M|  2089   (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      80459  consistent gets
       7616  physical reads
          0  redo size
   35857676  bytes sent via SQL*Net to client
     810042  bytes received via SQL*Net from client
      73335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1100000  rows processed
REGEXP_SUBSTR, however, takes slow to a new level…
select id,
regexp_substr(episode, '[^|]+') as "A New Hope",
regexp_substr(episode, '[^|]+',1, 2) as "The Empire Strikes Back",
regexp_substr(episode, '[^|]+',1, 3) as "Return of the Jedi",
regexp_substr(episode, '[^|]+',1, 4) as "The Phantom Menace",
regexp_substr(episode, '[^|]+',1, 5) as "Attack of the Clones",
regexp_substr(episode, '[^|]+',1, 6) as "Revenge of the Sith",
regexp_substr(episode, '[^|]+',1, 7) as "The Force Awakens",
regexp_substr(episode, '[^|]+',1, 8) as "Rogue One",
regexp_substr(episode, '[^|]+',1, 9) as "Solo",
regexp_substr(episode, '[^|]+',1, 10) as "The Last Jedi",
regexp_substr(episode, '[^|]+',1, 11) as "The Rise of Skywalker"
from galaxy
/
1100000 rows selected.

Elapsed: 00:01:27.25

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |  1100K|    47M|  2089   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| GALAXY |  1100K|    47M|  2089   (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      80459  consistent gets
       7616  physical reads
          0  redo size
   35857680  bytes sent via SQL*Net to client
     809519  bytes received via SQL*Net from client
      73335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1100000  rows processed
Now see what happens when we give REGEXP_SUBSTR a little help from the humble INSTR (the “hybrid” query) :
select id,
regexp_substr(episode, '[^|]+') as "A New Hope",
decode( instr(episode, '|'), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,1) + 1)) as "The Empire Strikes Back",
decode( instr(episode, '|',1, 2), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,2) + 1)) as "Return of the Jedi",
decode( instr(episode, '|',1, 3), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,3) + 1)) as "The Phantom Menace",
decode( instr(episode, '|',1, 4), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,4) + 1)) as "Attack of the Clones",
decode( instr(episode, '|',1, 5), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,5) + 1)) as "Revenge of the Sith",
decode( instr(episode, '|',1, 6), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,6) + 1)) as "The Force Awakens",
decode( instr(episode, '|',1, 7), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,7) + 1)) as "Rogue One",
decode( instr(episode, '|',1, 8), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,8) + 1)) as "Solo",
decode( instr(episode, '|',1, 9), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,9) + 1)) as "The Last Jedi",
decode( instr(episode, '|',1, 10), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,10) + 1)) as "The Rise of Skywalker"
from galaxy
/
Yes, I have cheated a bit on the aesthetics here by using the more compact DECODE instead of CASE.
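For the record, each DECODE is just a more compact way of writing the equivalent CASE expression; for example, the “The Empire Strikes Back” column could equally be written as :

case
    when instr(episode, '|', 1, 1) = 0 then null
    else regexp_substr(episode, '[^|]+', instr(episode, '|', 1, 1) + 1)
end as "The Empire Strikes Back"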
However, this does not affect the runtime of the query, which is a bit faster than the pure REGEXP_SUBSTR equivalent :
1100000 rows selected.

Elapsed: 00:00:30.83

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |  1100K|    47M|  2089   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| GALAXY |  1100K|    47M|  2089   (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      80459  consistent gets
       7616  physical reads
          0  redo size
   35857680  bytes sent via SQL*Net to client
     810158  bytes received via SQL*Net from client
      73335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1100000  rows processed

Test 1 Results Summary

Query Method                         Time (secs)
-----------------------------------  -----------
SUBSTR ( Standard)                         20.32
REGEXP_SUBSTR with INSTR (Hybrid)          30.83
SUBSTR with in-line function               41.76
REGEXP_SUBSTR ( Regexp)                    87.25
The performance of the Hybrid approach does raise the question of how REGEXP_SUBSTR compares when we’re just looking to extract a single field from a string, rather than all of them…
Test 2 – Extract the first field only
In this instance we’re just looking for the first field in EPISODES.
In this context, the hybrid approach doesn’t apply because we’re always starting our search at the start of the input string.
Starting, once again with the Standard approach :
select id,
substr
(
episode, 1,
case
when instr(episode, '|', 1,1) > 0 then instr(episode, '|', 1,1) -1
else length(episode)
end
) as "A New Hope"
from galaxy
/
1100000 rows selected.

Elapsed: 00:00:05.33

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |  1100K|    47M|  2089   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| GALAXY |  1100K|    47M|  2089   (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      80459  consistent gets
       7616  physical reads
          0  redo size
   25086067  bytes sent via SQL*Net to client
     808790  bytes received via SQL*Net from client
      73335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1100000  rows processed
Now for REGEXP_SUBSTR :
select id,
regexp_substr( episode, '[^|]+') as "A New Hope"
from galaxy
/
1100000 rows selected.

Elapsed: 00:00:06.38

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |  1100K|    47M|  2089   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| GALAXY |  1100K|    47M|  2089   (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      80459  consistent gets
       7616  physical reads
          0  redo size
   25086067  bytes sent via SQL*Net to client
     808868  bytes received via SQL*Net from client
      73335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1100000  rows processed
Interestingly, whilst it’s still not as fast, the Regexp runtime is only 20% longer than the Standard.
This contrasts markedly with our first test, where the Regexp query took more than four times as long as the Standard.
Test 2 Results Summary

Query Method    Time (secs)
--------------  -----------
Standard               5.33
Regexp                 6.38

Does this relative performance hold true for any single field in the input string ?
Test 3 – Extract the last field only
Starting with the Standard query :
select
id,
case when instr(episode, '|', 1, 10) > 0 then
substr( episode, instr(episode, '|', 1, 10) + 1,
case when instr(episode, '|', 1, 11) > 0 then instr(episode, '|',1, 11) - 1 else length(episode) end
-
instr(episode, '|', 1, 10))
end as "The Rise of Skywalker"
from galaxy
/
1100000 rows selected.

Elapsed: 00:00:05.44

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |  1100K|    47M|  2089   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| GALAXY |  1100K|    47M|  2089   (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      80459  consistent gets
       7616  physical reads
          0  redo size
   25686033  bytes sent via SQL*Net to client
     808915  bytes received via SQL*Net from client
      73335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1100000  rows processed
Once again, the Regexp Query is much slower…
select
id,
regexp_substr(episode, '[^|]+',1, 11) as "The Rise of Skywalker"
from galaxy
/
1100000 rows selected.

Elapsed: 00:00:16.16

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |  1100K|    47M|  2089   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| GALAXY |  1100K|    47M|  2089   (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      80459  consistent gets
       7616  physical reads
          0  redo size
   25686033  bytes sent via SQL*Net to client
     808888  bytes received via SQL*Net from client
      73335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1100000  rows processed
…unless we use the hybrid approach…
select
id,
decode( instr(episode, '|',1, 10), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,10) + 1)) as "The Rise of Skywalker"
from galaxy
/
1100000 rows selected.

Elapsed: 00:00:05.60

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |  1100K|    47M|  2089   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| GALAXY |  1100K|    47M|  2089   (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      80459  consistent gets
       7616  physical reads
          0  redo size
   25686033  bytes sent via SQL*Net to client
     808736  bytes received via SQL*Net from client
      73335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1100000  rows processed

Test 3 Results Summary

Query Method    Time (secs)
--------------  -----------
Standard               5.44
Hybrid                 5.60
Regexp                16.16

Conclusions
These tests seem to bear out the fact that, in general, SUBSTR offers better performance than REGEXP_SUBSTR.
However, the performance of REGEXP_SUBSTR can be greatly improved if used in conjunction with INSTR.
As always, the decision as to which function to use will depend on the specific circumstances you find yourself in.
Things that make you go VAAARRRGGGHHHH ! – ORA-38104 in a Stored Program Unit
I do have some empathy with Lauren James getting quite cross at work.
OK, it’s not as pressurised as striving to reach a World Cup Quarter Final, but I am known to become somewhat annoyed when Oracle presents me with :
ORA-38104: Columns referenced in the ON Clause cannot be updated
The error itself is reasonable enough. The thing is, though, that the offending merge statement will not trigger a compile-time error if it’s in a Stored Program Unit. Oh no.
Instead, Oracle will let you carry on in blissful ignorance and only give you the bad news at runtime…
Let’s say we have a table called TEAM_AVAILABILITY :
create table team_availability
(
squad_number number,
first_name varchar2(250),
last_name varchar2(250),
available_flag varchar2(1)
)
/
insert into team_availability( squad_number, first_name, last_name)
values( 1, 'Mary', 'Earps');
insert into team_availability( squad_number, first_name, last_name)
values( 2, 'Lucy', 'Bronze');
insert into team_availability( squad_number, first_name, last_name)
values( 3, 'Niamh', 'Charles');
insert into team_availability( squad_number, first_name, last_name)
values( 4, 'Keira', 'Walsh');
insert into team_availability( squad_number, first_name, last_name)
values( 5, 'Alex', 'Greenwood');
insert into team_availability( squad_number, first_name, last_name)
values( 6, 'Millie', 'Bright');
insert into team_availability( squad_number, first_name, last_name)
values( 7, 'Lauren', 'James');
insert into team_availability( squad_number, first_name, last_name)
values( 8, 'Georgia', 'Stanway');
insert into team_availability( squad_number, first_name, last_name)
values( 9, 'Rachel', 'Daly');
insert into team_availability( squad_number, first_name, last_name)
values( 10, 'Ella', 'Toone');
insert into team_availability( squad_number, first_name, last_name)
values( 11, 'Lauren', 'Hemp');
insert into team_availability( squad_number, first_name, last_name)
values( 12, 'Jordan', 'Nobbs');
insert into team_availability( squad_number, first_name, last_name)
values( 13, 'Hannah', 'Hampton');
insert into team_availability( squad_number, first_name, last_name)
values( 14, 'Lotte', 'Wubben-Moy');
insert into team_availability( squad_number, first_name, last_name)
values( 15, 'Esme', 'Morgan');
insert into team_availability( squad_number, first_name, last_name)
values( 16, 'Jess', 'Carter');
insert into team_availability( squad_number, first_name, last_name)
values( 17, 'Laura', 'Coombs');
insert into team_availability( squad_number, first_name, last_name)
values( 18, 'Chloe', 'Kelly');
insert into team_availability( squad_number, first_name, last_name)
values( 19, 'Bethany', 'England');
insert into team_availability( squad_number, first_name, last_name)
values( 20, 'Katie', 'Zelem');
insert into team_availability( squad_number, first_name, last_name)
values( 21, 'Ellie', 'Roebuck');
insert into team_availability( squad_number, first_name, last_name)
values( 22, 'Katie', 'Robinson');
insert into team_availability( squad_number, first_name, last_name)
values( 23, 'Alessia', 'Russo');
commit;
We have a procedure that we can use to mark a player as being unavailable for selection :
create or replace procedure mark_player_unavailable( i_squad_number team_availability.squad_number%type)
is
begin
merge into team_availability
using dual
on
(
squad_number = i_squad_number
and available_flag is null
)
when matched then update
set available_flag = 'N';
end mark_player_unavailable;
/
When we create the procedure, everything seems fine…
Procedure MARK_PLAYER_UNAVAILABLE compiled
However, running it causes Oracle to do the equivalent of a VAR check, which results in :

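For example, calling the procedure for Lauren James ( squad number 7 in our table) fails at runtime with something like :

exec mark_player_unavailable(7)

-- ORA-38104: Columns referenced in the ON Clause cannot be updated: "TEAM_AVAILABILITY"."AVAILABLE_FLAG"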
As to why Oracle does this, other than to annoy me, I have no idea.
Then again, I’m not sure why I keep falling into this trap either.
To avoid this altogether, there are a couple of options…
Workarounds
We could simply use an update instead of a merge :
create or replace procedure mark_player_unavailable( i_squad_number team_availability.squad_number%type)
is
begin
update team_availability
set available_flag = 'N'
where squad_number = i_squad_number
and available_flag is null;
end mark_player_unavailable;
/

However, if we still want to use a merge, then we need to use an inline view in the USING clause :
create or replace procedure mark_player_unavailable( i_squad_number team_availability.squad_number%type)
is
begin
merge into team_availability ta
using
(
select tau.squad_number
from team_availability tau
where tau.squad_number = i_squad_number
and tau.available_flag is null
) ilv
on
(
ta.squad_number = ilv.squad_number
)
when matched then update
set ta.available_flag = 'N';
end mark_player_unavailable;
/
…which has the same result.
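To prove the point, we can mark a player as unavailable and check the result ( again using squad number 7) :

exec mark_player_unavailable(7)

select squad_number, available_flag
from team_availability
where squad_number = 7;

-- AVAILABLE_FLAG should now be 'N'. Re-running the procedure changes nothing,
-- because the merge ( and the update) only touch rows where the flag is null.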
With my equanimity restored, I can now reflect that yes, Women’s football is different from the Men’s. In the Women’s game, England DO win penalty shoot-outs !
Happy Hatters and Oracle Interval Partitioning
I wouldn’t say that Simon is obsessed with Luton Town FC, but he has been known to express his affection through the medium of creosote :

Needless to say, he’s quite looking forward to next season as the Hatters have completed their Lazarus-like resurrection from the depths of the Conference to the pinnacle of English football – the Premier League.
Over the years, I’ve occasionally accompanied Simon to Kenilworth Road to watch Luton battle their way through the divisions, so it’s only fitting that I should pay tribute to their achievement in the examples that follow.
The technical subject at hand here is the advantages available when using Interval Partitioning syntax to Range Partition a table by a date.
As we’re talking about partitioning, here’s the standard warning about licensing…
Partitioning is usually an additional cost option on the Enterprise Edition license. It’s a really good idea to make sure that whatever platform you’re on is appropriately licensed before you use this feature.
For this post, I’ll be using 23c Free, which does include Partitioning.
We’ll start by creating a conventional Range Partitioned table and see what happens when we try to create data in a non-existent partition.
We’ll then try the same thing, but this time using an Interval Partitioned Table.
Finally, we’ll take a look at the SQL functions available to define intervals when creating Interval Partitions.
Partitioning by Date using conventional Range Partitioning
For the purposes of this example, I’m going to create a table which holds details of Luton’s final record for a League Season. I’m going to arbitrarily “end” each season on 30th June each year.
I’m going to separate each year into its own partition.
Whilst this may not be a realistic use of partitioning – you’re unlikely to create a table to hold a single row per partition – it does simplify the examples that follow :
create table luton_seasons
(
season_end_date date,
competition varchar2(25),
games_played number,
games_won number,
games_drawn number,
games_lost number,
goals_for number,
goals_against number,
points number,
finishing_position number,
notes varchar2(4000)
)
partition by range( season_end_date)
(
partition season_2013_14 values less than (to_date ('01-JUL-2014', 'DD-MON-YYYY')),
partition season_2014_15 values less than (to_date ('01-JUL-2015', 'DD-MON-YYYY')),
partition season_2015_16 values less than (to_date ('01-JUL-2016', 'DD-MON-YYYY')),
partition season_2016_17 values less than (to_date ('01-JUL-2017', 'DD-MON-YYYY')),
partition season_2017_18 values less than (to_date ('01-JUL-2018', 'DD-MON-YYYY'))
)
/
Populating the table works as you’d expect…
-- 2013-14
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2014', 'DD-MON-YYYY'),
'Conference',
46, 30, 11, 5, 102, 35, 101,
1, 'Promoted back to the Football League !'
);
-- 2014-15
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2015', 'DD-MON-YYYY'),
'League 2',
46, 19, 11, 16, 54, 44, 68,
8, null
);
-- 2015-16
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2016', 'DD-MON-YYYY'),
'League 2',
46, 19, 9, 18, 63, 61, 66,
11, null
);
-- 2016-17
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2017', 'DD-MON-YYYY'),
'League 2',
46, 20, 17, 9, 70, 43, 77,
4, 'Lost in Promotion Play-Off Semi-Final'
);
-- 2017-18
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2018', 'DD-MON-YYYY'),
'League 2',
46, 25, 13, 8, 94, 46, 88,
2, 'Promoted'
);
commit;
select season_end_date, competition, finishing_position, notes
from luton_seasons
order by season_end_date
/
SEASON_END_DATE COMPETITION FINISHING_POSITION NOTES
--------------- ----------- -------------------- ----------------------------------------
01-JUN-2014 Conference 1 Promoted back to the Football League !
01-JUN-2015 League 2 8
01-JUN-2016 League 2 11
01-JUN-2017 League 2 4 Lost in Promotion Play-Off Semi-Final
01-JUN-2018 League 2 2 Promoted
5 rows selected.
…until we try to insert a record for which a partition does not exist…
-- 2018-19
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2019', 'DD-MON-YYYY'),
'League 1',
46, 27, 13, 6, 90, 42, 94,
1, 'Champions ! Promoted to The Championship'
);
…when we are met with :
ORA-14400: no partition found in table MIKE.LUTON_SEASONS for inserted partition key "2019-06-01T00:00:00"
Wouldn’t it be good if we could just get Oracle to create partitions automatically, as and when they were needed ? Well, we can…
Creating an Interval Partitioned Table
We’re going to re-create the table, but this time, we’re going to use Interval Partitioning :
drop table luton_seasons;
create table luton_seasons
(
season_end_date date,
competition varchar2(25),
games_played number,
games_won number,
games_drawn number,
games_lost number,
goals_for number,
goals_against number,
points number,
finishing_position number,
notes varchar2(4000)
)
partition by range( season_end_date)
interval (numtoyminterval(1, 'year'))
(
partition season_2013_14 values less than (to_date('01-JUL-2014', 'DD-MON-YYYY'))
)
/
We’re still partitioning by range. However, we’re now specifying an interval, in this case 1 year.
For now though, our chances of avoiding ORA-14400 don’t look good as we’ve only created one partition :
select partition_name, high_value
from user_tab_partitions
where table_name = 'LUTON_SEASONS'
order by 1;
PARTITION_NAME HIGH_VALUE
-------------------- ------------------------------------------------------------------------------------------
SEASON_2013_14 TO_DATE(' 2014-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
Side Note – In 23c Oracle have added HIGH_VALUE_CLOB and HIGH_VALUE_JSON to USER_TAB_PARTITIONS so you can access the partition HIGH_VALUE via a datatype that is more malleable than the LONG HIGH_VALUE column that’s been in this view since David Pleat was last Luton manager.
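For example, on a 23c database you should be able to do something along these lines instead (I'm assuming 23c here, as these columns aren't available in 19c) :
select partition_name, high_value_clob
from user_tab_partitions
where table_name = 'LUTON_SEASONS'
order by partition_position;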
Anyhow, let’s give it a go :
-- 2013-14
insert into luton_seasons (season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes)
values (to_date('01-JUN-2014', 'DD-MON-YYYY'), 'Conference', 46, 30, 11, 5, 102, 35, 101, 1, 'Promoted back to the Football League !');

-- 2014-15
insert into luton_seasons (season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes)
values (to_date('01-JUN-2015', 'DD-MON-YYYY'), 'League 2', 46, 19, 11, 16, 54, 44, 68, 8, null);

-- 2015-16
insert into luton_seasons (season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes)
values (to_date('01-JUN-2016', 'DD-MON-YYYY'), 'League 2', 46, 19, 9, 18, 63, 61, 66, 11, null);

-- 2016-17
insert into luton_seasons (season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes)
values (to_date('01-JUN-2017', 'DD-MON-YYYY'), 'League 2', 46, 20, 17, 9, 70, 43, 77, 4, 'Lost in Promotion Play-Off Semi-Final');

-- 2017-18
insert into luton_seasons (season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes)
values (to_date('01-JUN-2018', 'DD-MON-YYYY'), 'League 2', 46, 25, 13, 8, 94, 46, 88, 2, 'Promoted');

-- 2018-19
insert into luton_seasons (season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes)
values (to_date('01-JUN-2019', 'DD-MON-YYYY'), 'League 1', 46, 27, 13, 6, 90, 42, 94, 1, 'Champions ! Promoted to The Championship');

-- 2019-20
insert into luton_seasons (season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes)
values (to_date('01-JUN-2020', 'DD-MON-YYYY'), 'Championship', 46, 14, 9, 23, 54, 82, 51, 19, null);

-- 2020-21
insert into luton_seasons (season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes)
values (to_date('01-JUN-2021', 'DD-MON-YYYY'), 'Championship', 46, 17, 11, 18, 41, 52, 62, 12, null);

-- 2021-22
insert into luton_seasons (season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes)
values (to_date('01-JUN-2022', 'DD-MON-YYYY'), 'Championship', 46, 21, 12, 13, 63, 55, 75, 6, 'Lost in Promotion Play-Off Semi-Final');

-- 2022-23
insert into luton_seasons (season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes)
values (to_date('01-JUN-2023', 'DD-MON-YYYY'), 'Championship', 46, 21, 17, 8, 57, 39, 80, 3, 'Won Play-Off Final and promoted to Premiership !');

commit;
Oh look, that all worked :

…and the new partitions have been created automatically…
exec dbms_stats.gather_table_stats(user, 'LUTON_SEASONS');
select partition_name, high_value, num_rows
from user_tab_partitions t
where table_name = 'LUTON_SEASONS'
order by partition_position
/

We can further confirm that a particular partition holds just one record. For example :
select season_end_date, competition, finishing_position
from luton_seasons partition (SYS_P996);
…or, as we’re partitioning by date…
select season_end_date, competition, finishing_position
from luton_seasons partition for (to_date ('31-MAY-2019', 'DD-MON-YYYY'));
…confirms a single row in the partition…
SEASON_END_DATE                COMPETITION      FINISHING_POSITION
------------------------------ ---------------- ------------------
01-JUN-2019                    League 1                          1

1 row selected.

Interval Partitioning Clairvoyance
Now, you’re probably thinking that Interval partitioning will ensure that your table is equi-partitioned in terms of time periods, provided the value of the partition key is inserted sequentially.
But what happens if you enter dates out-of-sequence ?
For example :
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2025', 'DD-MON-YYYY'),
'Premier League',
38, 38, 0, 0, 112, 7, 114,
1, 'Clean sweep of trophies and a perfect season in one go !'
);
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2024', 'DD-MON-YYYY'),
'Premier League',
38, 20, 10, 8, 64, 40, 70,
4, 'Champions League here we come !'
);
commit;
Now, you might think that Oracle would create the partition for values less than 01-JUL-2025 on the creation of the first record and then assign the second record to that partition as it already exists.
However, it’s actually smart enough to create the “missing” partition for the second insert :
select season_end_date, competition, finishing_position
from luton_seasons partition for (to_date ('01-JUN-2024', 'DD-MON-YYYY'))
order by 1;
SEASON_END_DATE COMPETITION FINISHING_POSITION
------------------------------ ------------------- ------------------
01-JUN-2024 Premier League 4
1 row selected.
We can see that the table’s last partition by position has a lower system-generated number in it’s name than it’s predecessor, indicating that it was created first :
select partition_name, partition_position, high_value
from user_tab_partitions
where table_name = 'LUTON_SEASONS'
order by partition_position;

There are two functions that you can use when specifying an interval for your partitions :
- NumToYMInterval
- NumToDSInterval
Despite the camel-case formatting of these function names, they are in fact case insensitive when used in SQL or PL/SQL.
NumToYMInterval
NumToYMInterval lets you specify an interval in years or months :
select to_date('29-FEB-2020', 'DD-MON-YYYY') + numtoyminterval(4, 'year') as next_leap_year_day
from dual;
NEXT_LEAP_YEAR_DAY
------------------------------
29-FEB-2024
select to_date('01-JUL-2023', 'DD-MON-YYYY') + numtoyminterval(3, 'month') as start_q4_2023
from dual;
START_Q4_2023
------------------------------
01-OCT-2023
NumToDSInterval
NumToDSInterval allows you to specify more granular intervals – i.e :
- DAY
- HOUR
- MINUTE
- SECOND
How long is a football match ?
select numToDSInterval(5400, 'second') as match_length
from dual;
select numtodsinterval(90, 'minute') as match_length
from dual;
select numToDSInterval(1.5, 'hour') as match_length
from dual;
select numtodsinterval(1.5/24, 'day') as match_length
from dual;
Each of these queries returns :
MATCH_LENGTH
------------------------------
+00 01:30:00.000000
Note that I’ve included the FROM clause in the above queries because, until the full 23c release sees the light of day, we’ll still need to use it in the day job.
Whatever you’re planning to do to pass the time until …
alter session set nls_date_format='DD-MON-YYYY HH24:MI';
select
to_date('12-AUG-2023', 'DD-MON-YYYY') + numtodsinterval(15, 'hour') as kick_off
from dual;
…using interval partitioning means that it should involve less manual partition creation.
The Oracle Database Features Site – reducing Oracle Documentation Drudgery
If there’s one thing Teddy approves of it’s eating you’re own Dog Food…although, he does appreciate the value of a varied diet…

Some nameless heroes in the Oracle Documentation Team seem to agree and have produced a publicly available APEX application which allows you to search:
- the features available in each successive Oracle release
- the various licensing options and packs available with each Edition
Time to head on over to the Oracle Database Features site and take a closer look…
FeaturesI’ve heard a fair bit about schema only accounts. I wonder when they came along ?

It looks like they have been around since 18c with further enhancements in later releases.
If I want details on a particular listing, I can just click on the title :

As you can see, there’s also a link to the relevant documentation.
It’s also possible to do a less specific search. For example, if I’m looking to upgrade from 11.2 to 19c and I want to see all of the new features in all of the intermediate releases :

The application also allows you to focus on areas of interest.
For example, If I want to see what’s new in 21c as far as Analytic Functions are concerned,
I can choose Data Warehousing/Big Data as my focus area then further refine my search by specifying Analytic Functions as the Sub Area of Interest.
Together with checking 21c as the Version and that I’m interested in New features only, the Application returns this :

It’s worth noting that the information in this application is based on the New Features documentation for the respective Oracle versions. Therefore, whilst you’ll find an entry for the VALIDATE_CONVERSION SQL function, introduced in 12.1…

Selecting the Licensing tab reveals functionality for navigating the often complex area of Oracle Licensing.
For example, if we want to find out which Editions Partitioning is available on :

Once again, further details are available by clicking on the title of a result row :

The Oracle Database Features site is a welcome augmentation to the existing documentation.
It’s categorised and searchable content offer the potential for significant saving in time and effort, especially when you’re preparing to move between Oracle versions.
The Ultimate Question of Life, the Universe, and… how big is my Oracle Table ?
At a time when the debate rages about how much you should trust what AI tells you, it’s probably worth recalling Deep Thought’s wildly incorrect assertion that the answer to the Ultimate Question of Life, the Universe, and Everything is forty-two.
As any Database specialist will know, the answer is the same as it is to the question “How big is my Oracle Table ?” which is, of course, “It depends”.
What it depends on is whether you want to know the volume of data held in the table, or the amount of space the database is using to store the table and any associated segments (e.g. indexes).
Connecting to my trusty Free Tier OCI Oracle Instance ( running 19c Enterprise Edition), I’ve set out on my journey through (disk) space to see if I can find some more definitive answers…
How big is my table in terms of the raw data stored in it ?
Before going any further, I should be clear on the database language settings and character set that's being used in the examples that follow. Note particularly that I'm not using a multi-byte character set :
select parameter, value
from gv$nls_parameters
order by parameter
/

PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_CALENDAR                   GREGORIAN
NLS_CHARACTERSET               AL32UTF8
NLS_COMP                       BINARY
NLS_CURRENCY                   £
NLS_DATE_FORMAT                DD-MON-YYYY
NLS_DATE_LANGUAGE              ENGLISH
NLS_DUAL_CURRENCY              €
NLS_ISO_CURRENCY               UNITED KINGDOM
NLS_LANGUAGE                   ENGLISH
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_NCHAR_CONV_EXCP            FALSE
NLS_NUMERIC_CHARACTERS         .,
NLS_SORT                       BINARY
NLS_TERRITORY                  UNITED KINGDOM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH24.MI.SSXFF
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH24.MI.SSXFF TZR
NLS_TIME_FORMAT                HH24.MI.SSXFF
NLS_TIME_TZ_FORMAT             HH24.MI.SSXFF TZR

19 rows selected.
Now, let's see if we can work out how much raw data is held in a table.
We’ll start with a very simple example :
create table marvin as
select 1000 + rownum as id
from dual
connect by rownum <= 1024;
Marvin may have a brain the size of a planet but his tabular namesake has more modest space requirements.
It’s 1024 records are all 4 digits long.
Therefore, the size of the table data should be 4096 bytes, right ?
“Hang on”, you're thinking, “why not just look up the size in USER_SEGMENTS and make this a really short post ?”
Well :
select bytes
from user_segments
where segment_name = 'MARVIN'
and segment_type = 'TABLE'
/

     BYTES
----------
     65536
USER_SEGMENTS will give you a size in bytes, but it’s not the same as the amount of raw data.
We’ll come back to this in a bit.
For now though, we can cross-check the size from elsewhere in the data dictionary, provided the stats on the table are up-to-date.
To ensure that this is so, I can run :
exec dbms_stats.gather_table_stats('MIKE', 'MARVIN');
This will ensure that statistics data is populated in the USER_TABLES view. This means that we can estimate the data volume by running the following query :
select num_rows, avg_row_len,
num_rows * avg_row_len as data_in_bytes
from user_tables
where table_name = 'MARVIN'
/
…which returns…
  NUM_ROWS AVG_ROW_LEN DATA_IN_BYTES
---------- ----------- -------------
      1024           4          4096
That looks promising.
We can further verify this by running :
select sum(length(id)) as data_in_bytes
from marvin
/

DATA_IN_BYTES
-------------
         4096
OK, now let’s see what happens with a slightly more complex data set, and an index as well …
create table hitchikers
(
id number generated always as identity,
character_name varchar2(250),
quote varchar2(4000),
constraint hitchikers_pk primary key (id)
)
/
declare
procedure ins( i_character in varchar2, i_quote in varchar2)
is
begin
insert into hitchikers( character_name, quote)
values(i_character, i_quote);
end;
begin
for i in 1..1024 loop
ins('Deep Thought', 'Forty-Two');
ins('Trillian', q'[we have normality... anything you can't cope with is, therefore, your own problem]');
ins('Ford Prefect', 'Time is an illusion. Lunchtime doubly so.');
ins('Zaphod Beeblebrox', q'[If there's anything more important than my ego around, I want it caught and shot right now!]');
ins(null, 'Anyone who is capable of getting themselves made President should on no account be allowed to do the job');
ins('Marvin', q'[Life! Loathe it or hate it, you can't ignore it.]');
ins('Arthur Dent', 'This must be Thursday. I never could get the hang of Thursdays');
ins('Slartibartfast', q'[I'd rather be happy than right any day]');
end loop;
commit;
end;
/
commit;
Once stats are present on the table, we can check the expected data size as before :
select num_rows, avg_row_len,
    num_rows * avg_row_len as data_in_bytes
from user_tables
where table_name = 'HITCHIKERS'
/

  NUM_ROWS AVG_ROW_LEN DATA_IN_BYTES
---------- ----------- -------------
      8192          75        614400
This time, the size in bytes figure we get back is not exact, as we can confirm with :
select sum( length(id) + nvl(length(character_name),0) + nvl(length(quote), 0)) as data_in_bytes
from hitchikers
/

DATA_IN_BYTES
-------------
       598957
To verify the actual size in bytes, we can dump the contents of a table into a csv file. In this case, I’m using SQLDeveloper :

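If you'd rather stay on the command line, a rough equivalent using SQLcl would be something like this. The filename is my choice and the output may not be byte-for-byte identical to the SQLDeveloper export :
set sqlformat csv
set feedback off
spool hitchikers.csv
select * from hitchikers;
spool off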
The resulting file is a different size again :
ls -l hitchikers.csv
-rw-rw-r-- 1 mike mike 656331 May 13 11:50 hitchikers.csv
This can be accounted for by the characters added as part of the csv formatting.
First, the csv file includes a header row :
head -1 hitchikers.csv
"ID","CHARACTER_NAME","QUOTE"
Including the line terminator this is 30 bytes :
head -1 hitchikers.csv |wc -c
30
The format in each of the 8192 data rows includes :
- a comma after all but the last attribute on a row
- a line terminator after the last attribute
- double quotes enclosing each of the two VARCHAR attributes.
For example :
grep ^42, hitchikers.csv
42,"Trillian","we have normality... anything you can't cope with is, therefore, your own problem"
That’s a total of 7 extra bytes per data row.
Add all that up and it comes to 57374 bytes which are a consequence of csv formatting.
Subtract that from the file size and we get back to the calculated data size we started with :
656331 - 57374 = 598957
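If you'd rather let the database do the arithmetic, the same figures can be derived with a quick sanity check using the numbers above :
select (8192 * 7) + 30 as csv_overhead,
    656331 - ((8192 * 7) + 30) as data_in_bytes
from dual;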
This confirms that the figures in USER_TABLES are approximate and you’ll need to bear this in mind if you’re relying on them to calculate the size of the data in a table.
Whilst we're here, let's see what effect compression might have on our ability to determine the raw data size.
We can do this by creating a table that has the same structure as HITCHIKERS and contains the same data, but which is compressed :
create table magrathea
(
id number,
character_name varchar2(250),
quote varchar2(4000),
constraint magrathea_pk primary key (id)
)
row store compress advanced
/
insert into magrathea( id, character_name, quote)
select id, character_name, quote
from hitchikers
/
commit;
exec dbms_stats.gather_table_stats(user, 'MAGRATHEA');
It turns out that, for the purposes of our raw data calculation, the effect of table compression is…none at all :
select num_rows, avg_row_len, num_rows * avg_row_len
from user_tables
where table_name = 'MAGRATHEA'
/

  NUM_ROWS AVG_ROW_LEN NUM_ROWS*AVG_ROW_LEN
---------- ----------- --------------------
      8192          75               614400
However, if you look at the number of blocks used to store the table, the effects of compression are more evident :
select table_name, blocks
from user_tables
where table_name in ('HITCHIKERS', 'MAGRATHEA')
order by 2
/

TABLE_NAME                         BLOCKS
------------------------------ ----------
MAGRATHEA                              20
HITCHIKERS                             95
Incidentally, it’s worth noting that, as with the data size, the number of blocks reported in USER_TABLES are somewhat approximate.
USER_SEGMENTS reports the number of blocks for each table as :
select segment_name, blocks
from user_segments
where segment_name in ('HITCHIKERS', 'MAGRATHEA')
order by 2
/

SEGMENT_NAME                       BLOCKS
------------------------------ ----------
MAGRATHEA                              24
HITCHIKERS                            104
So it looks like compression will affect the amount of database space required to store an object but not the size of the actual data. This brings us nicely on to…
How big is my table in terms of the amount of space it's taking up in the database ?
Let's go back to MARVIN. Remember, this table contains 4K of raw data, but USER_SEGMENTS claims that it's quite a bit larger :
select bytes
from user_segments
where segment_name = 'MARVIN'
and segment_type = 'TABLE'
/

     BYTES
----------
     65536
To understand how Oracle has come up with this figure, you need to consider that :
- the smallest unit of space that Oracle addresses is measured in blocks
- the size of these blocks is defined at tablespace level.
- any object that uses space is allocated that space in units of an extent – which is a number of contiguous blocks.
If we take a look at MARVIN, we can see that the table resides in the DATA tablespace and has been allocated a single extent of 8 blocks :
select tablespace_name, bytes, blocks, extents
from user_segments
where segment_name = 'MARVIN';

TABLESPACE_NAME                     BYTES     BLOCKS    EXTENTS
------------------------------ ---------- ---------- ----------
DATA                                65536          8          1
The block size is defined at the tablespace level and is held in USER_TABLESPACES in bytes :
select block_size
from user_tablespaces
where tablespace_name = 'DATA';

BLOCK_SIZE
----------
      8192
If we now multiply the number of blocks in the table by the size of those blocks, we get back to the size that USER_SEGMENTS is reporting :
select seg.blocks * tsp.block_size
from user_segments seg
inner join user_tablespaces tsp
    on seg.tablespace_name = tsp.tablespace_name
where seg.segment_name = 'MARVIN';

SEG.BLOCKS*TSP.BLOCK_SIZE
-------------------------
                    65536
MARVIN is a table with no ancillary segments, such as indexes.
To find the total space being used for the HITCHIKERS table, we'll also need to consider the space being taken up by its index, HITCHIKERS_PK :
select seg.segment_name, seg.segment_type, seg.blocks, ts.block_size, seg.bytes
from user_segments seg
inner join user_tablespaces ts
    on ts.tablespace_name = seg.tablespace_name
where seg.segment_name in ('HITCHIKERS', 'HITCHIKERS_PK')
/

SEGMENT_NAME         SEGMENT_TYPE        BLOCKS BLOCK_SIZE      BYTES
-------------------- --------------- ---------- ---------- ----------
HITCHIKERS           TABLE                  104       8192     851968
HITCHIKERS_PK        INDEX                   24       8192     196608
…in other words…
select sum(seg.bytes)
from user_segments seg
where seg.segment_name in ('HITCHIKERS', 'HITCHIKERS_PK')
/

SUM(SEG.BYTES)
--------------
       1048576
On the subject of ancillary segments, what about LOBS ?
create table the_guide(
id number generated always as identity,
message clob);
declare
v_msg clob;
begin
for i in 1..1000 loop
v_msg := v_msg||q'[Don't Panic!]';
end loop;
insert into the_guide(message) values( v_msg);
end;
/
commit;
Unlike other segment types, LOBSEGMENT and LOBINDEX segments do not have their parent tables listed as the SEGMENT_NAME in USER_SEGMENTS.
Therefore, we need to look in USER_LOBS to identify the parent table of a LOBSEGMENT, and in USER_INDEXES for a LOBINDEX :
select segment_name, segment_type, bytes, blocks
from user_segments
where(
    segment_name = 'THE_GUIDE'
    or segment_name in ( select segment_name from user_lobs where table_name = 'THE_GUIDE')
    or segment_name in ( select index_name from user_indexes where table_name = 'THE_GUIDE')
)
/

SEGMENT_NAME                   SEGMENT_TYPE         BYTES     BLOCKS
------------------------------ --------------- ---------- ----------
THE_GUIDE                      TABLE                65536          8
SYS_IL0000145509C00002$$       LOBINDEX             65536          8
SYS_LOB0000145509C00002$$      LOBSEGMENT         1245184        152
In this instance, although the table segment itself is only taking up 65536 bytes, when you add in all of the supporting objects, the total space requirement increases to 1376256 bytes.
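To get that total as a single figure, we can simply sum the segment sizes using the same predicate as before :
select sum(bytes) as total_bytes
from user_segments
where segment_name = 'THE_GUIDE'
or segment_name in ( select segment_name from user_lobs where table_name = 'THE_GUIDE')
or segment_name in ( select index_name from user_indexes where table_name = 'THE_GUIDE')
/
…which, given the figures above, should come back as 1376256.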
If you’ve managed to make this far then meet me at the Big Bang Burger Bar for a Pan Galactic Garble Blaster. I need a drink after that.
(You Gotta) Fight for Your Right (To Query !) – the easy way to use External Tables with CSV files
Since the days when dinosaurs roamed the Earth – and I realised that programming was indoor work with little physical effort – the CSV file has been an ETL workhorse in the world of Data Warehousing.
In the case of Oracle, working with CSVs is somewhat simplified by the use of External Tables.
That’s not to say there aren’t challenges, especially when things don’t quite go as expected and you have to wade through various operating system files looking for exactly why some or all of the records in a file have not been loaded.
As enthusiastic as I am about the joys of the linux command-line, the prospect of grappling with sed, awk and regexp to diagnose data issues does not appeal.
I’d much rather use SQL. After all, that’s what it’s designed for.
Fortunately, for me, external tables make it relatively simple to upload the contents of a text file directly into a database table, from where SQL can be used to :
- validate the structure and contents of the file
- transform the data into it’s target data types
- identify any problems with the data we’ve received
As you may have guessed from the title (and the age of the author), the data we’re using has a bit of a Beastie Boys theme.
We have a file containing details of Beastie Boys singles that have charted in the UK :
row_type,track_name,peak_chart_position,album,release_date
"","She's On It",10,,19860912,
"","(You Gotta) Fight for Your Right (To Party!)",11,"Licensed to Ill",,
"","No Sleep till Brooklyn",14,"Licensed to Ill",19870301,
"","Girls",34,"Licensed to Ill",19870506,
"","Hey Ladies",76,"Paul's Boutique",19890725,
"","Pass the Mic",47,"Check Your Head",19920407,
"","Jimmy James",55,"Check Your Head",19920828,
"","Sabotage",19,"Ill Communication",19940128,
"","Sure Shot",27,"Ill Communication",19940602,
"","Intergalactic",5,"Hello Nasty",19980602,
"","Remote Control/Three MC's and One DJ",21,"Hello Nasty",19990123
"","Alive",28,,,
"","Ch-Check It Out",8,"To the 5 Boroughs",20040503,
"","Triple Trouble",37,"To the 5 Boroughs",,
"","An Open Letter to NYC",38,"To the 5 Boroughs",,
"TRAILER",15
I want to load this data into my application tables on an Oracle 19c database.
One thing that often gets overlooked when considering a CSV file is that, whatever datatype the fields started out as (or indeed, are intended to end up as), in the file they are all merely text strings.
This can make the design of the external table we’re using to read this file quite simple in terms of datatypes…
create table beastie_xt (
row_type varchar2(4000),
track_name varchar2(4000),
peak_chart_position varchar2(4000),
album varchar2(4000),
release_date varchar2(4000)
)
organization external (
type oracle_loader
default directory dbshare
access parameters (
fields csv without embedded
missing field values are null
reject rows with all null fields
)
location('beastie.csv')
)
reject limit unlimited
/
Note that using the csv keyword in the access parameters clause causes the external table to assume that the delimiter character is a “,” and that fields may be enclosed by double-quotes (“).
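If you prefer to spell things out, my understanding is that you could declare much the same table with explicit record and field clauses, along these lines (beastie_explicit_xt is just an illustrative name, not something used later in this post) :
create table beastie_explicit_xt (
    row_type varchar2(4000),
    track_name varchar2(4000),
    peak_chart_position varchar2(4000),
    album varchar2(4000),
    release_date varchar2(4000)
)
organization external (
    type oracle_loader
    default directory dbshare
    access parameters (
        records delimited by newline
        fields terminated by ',' optionally enclosed by '"'
        missing field values are null
        reject rows with all null fields
    )
    location('beastie.csv')
)
reject limit unlimited
/
For the rest of this post though, I'll stick with beastie_xt.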
If we test the table now, we can see that all of the rows in the file are loaded, including the header and trailer records :

Now, we could do all of our data interrogation and manipulation using the external table. However, each time the table is queried, the file is read in its entirety and the fact recorded in the external table's associated logfile.
As well as being a bit slow for large files, this does cause the logfile to grow rather rapidly.
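If you did want to keep querying the external table directly and can live without the log, one option (a sketch based on my reading of the ORACLE_LOADER access parameters, so treat it as unverified) would be to switch the logfile off altogether. Note that this replaces the entire access parameters clause :
alter table beastie_xt
access parameters
(
    nologfile
    fields csv without embedded
    missing field values are null
    reject rows with all null fields
);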
As an alternative to this then, I'm going to use the external table only to read the file as its contents are loaded into a permanent database segment :
create table beastie_stg(
file_name varchar2(500),
file_row_number number,
row_type varchar2(4000),
track_name varchar2(4000),
peak_chart_position varchar2(4000),
album varchar2(4000),
release_date varchar2(4000))
/
insert into beastie_stg
select
'beastie.csv' as file_name,
rownum as file_row_number,
row_type,
track_name,
peak_chart_position,
album,
release_date
from beastie_xt
/
commit;
Now we have our data in a permanent segment, we can validate and load it into the application proper.
Remember, because both our External Table and Staging table definitions are quite permissive, we can load the entire file, including its header and trailer records.
To start with, let’s check that the file contains the all of the fields we expect, in the order in which we expect them :
-- Expected ordered field list
select listagg( lower(column_name), ',') within group( order by column_id)
from user_tab_columns
where table_name = 'BEASTIE_XT'
minus
-- Actual ordered field list
select
row_type
||','||track_name
||','||peak_chart_position
||','||album
||','||release_date
from beastie_stg
where row_type = 'row_type';
If this query returns any rows then there is a mismatch between the header record and what we’re expecting so we’ll want to investigate.
In this instance though, we’re OK.
Next we need to make sure that we’ve received all of the data records. We can see that the trailer record is saying we should have 15 data records in the file. Note that, because we’re still treating everything as a string, the count in the trailer record is actually in the TRACK_NAME column of our staging table :
select track_name as data_record_count
from beastie_stg
where row_type = 'TRAILER'
and file_name = 'beastie.csv';

DATA_RECORD_COUNT
------------------
15
We can verify this by simply counting the records that we’ve staged, excluding the header and trailer…
select count(*)
from beastie_stg
where row_type is null
and file_name = 'beastie.csv'
/

  COUNT(*)
----------
        15
Finally, we can load our data records into our core table, converting them to their target datatypes in the process.
The core table looks like this :
create table beastie_core (
file_name varchar2(500),
file_row_number number,
track_name varchar2(500),
peak_chart_position number,
album varchar2(500),
release_date date)
/
If we want to load all of the valid records whilst handling any that we cannot transform correctly, there are several options available. In this case we’re going to use an error logging table.
To create the error logging table itself…
exec dbms_errlog.create_error_log('beastie_core');
Now we can populate the core application table using the LOG ERRORS clause to direct any failing records to the error table :
insert into beastie_core
select
file_name,
file_row_number,
track_name,
to_number(peak_chart_position) as peak_chart_position,
album,
to_date(release_date, 'YYYYMMDD')
from beastie_stg
where row_type is null -- just load the data records
log errors('beastie.csv')
reject limit unlimited
/
commit;
Once the core load is completed, we can check that we’ve loaded all of the records…
select count(*)
from beastie_core
where file_name = 'beastie.csv';

  COUNT(*)
----------
        15
…and confirm that there are no errors…
select count(*)
from err$_beastie_core
where ora_err_tag$ = 'beastie.csv';

  COUNT(*)
----------
         0
Our file has been successfully validated and loaded, all without the need for any fiendishly clever regular expressions.
Loading data into Oracle directly from compressed or encrypted files
Whilst it’s not uncommon to transfer data between systems by means of text files, the files themselves often turn-up in a binary format.
They may have been compressed or even encrypted before transit.
Turning them back into text so that they can be processed may be a bit of an overhead.
Not only can you end up with two copies of the data (the binary original and the re-constituted text version), but the process of conversion may also be both time-consuming and resource-intensive.
In the case of encrypted files, persisting the unencrypted data in a file may have additional security implications.
Fortunately, it’s possible to load data from such binary formatted files into Oracle without first having to write it to a text file.
Irrespective of whether your incoming feed file is encrypted or merely compressed, loading it into Oracle should be effortless for you after reading this.
You need not worry about any potential pitfalls because I’ve already fallen into each pit in turn, as I shall now recount.
To start with, I’ll be looking at how to use an External Table Preprocessor to load data from a compressed file.
I’ll then go through loading data that’s GPG encrypted.
Following that, we’ll take a look at why PDB_OS_CREDENTIAL might not be the help you hoped it might be when dealing with GPG decryption and how SQL*Loader can help.
Whilst I was writing this, Oracle considerately released 23c Free and made it available in a VirtualBox appliance running Oracle Linux Server 8.7, so it’d be rude not to use it for the examples that follow…
SetupTo start with I’ve created a new OS user called app_user and a group called app_os.
I’ve made this the primary group for app_user as well as adding oracle to the group.
sudo useradd -m app_user
sudo groupadd app_os
sudo usermod -g app_os app_user
sudo usermod -a -G app_os oracle
NOTE – for this change to take effect for oracle, I had to restart the VM.
Next, I’ve created some directories under the app_user home and granted appropriate permissions.
The app_user home is visible to the group :
cd $HOME
ls -ld
drwxr-x---. 7 app_user app_os 180 Apr 8 11:55 .
The bin directory will hold any shell scripts we need to execute.
The upload directory will hold feed files to be loaded into the database.
Note that oracle will have access to execute these through the group, but will not be able to write to them :
drwxr-x---. 2 app_user app_os 27 Apr 8 12:16 bin
drwxr-x---. 2 app_user app_os 50 Apr 9 10:36 upload
Finally, we have a directory to hold any log files generated. Obviously, oracle does need write access to this directory :
drwxrwx---. 2 app_user app_os 54 Apr 8 14:09 logs
In the Database, I’ve created directory objects on each of these directories.
Note that home on Oracle Linux Server is a symbolic link to /opt/oracle/userhome, so we need to use the physical path in the Directory Object definition :
create or replace directory app_bin as '/opt/oracle/userhome/app_user/bin'
/
create or replace directory app_log as '/opt/oracle/userhome/app_user/logs'
/
create or replace directory app_upload as '/opt/oracle/userhome/app_user/upload'
/
…and granted the appropriate permissions to the HR database user :
grant read, execute on directory app_bin to hr;
grant read, write on directory app_log to hr;
grant read on directory app_upload to hr;
I’ve created a csv file containing all of the records in the employees table.
In SQLCL this is accomplished by connecting to the database as HR and running :
set sqlformat csv
spool employees.csv
select 'DATA' as row_type,
emp.*
from employees
/
spool off
I’ve also tweaked the row count at the end of the file to make it look like a trailer record you might expect to see on a feed file…
TRA,107
Coming gundun with gunzip
I’ve got a compressed version of the file in the uploads directory :
-rw-r--r--. 1 app_user app_os 3471 Apr 9 10:36 employees.csv.gz
Now I need to create a simple shell script to call from an external table preprocessor to unzip the file on the fly and send the output to STDOUT. This will then be read and uploaded into the external table automatically.
The script is called unzip_file.sh and, initially, it looks like this :
#!/usr/bin/sh
/usr/bin/gunzip -c $1
When invoked by an external table preprocessor call, the script will be passed the fully qualified path of the file specified as the location of the external table.
The permissions on the script are :
-rwxr-x---. 1 app_user app_os 198 Apr 8 13:04 unzip_file.sh
Now for the external table itself. This is called employees_zip_xt and is created in the HR schema :
create table employees_zip_xt
(
row_type varchar2(10),
employee_id number,
first_name varchar2(20),
last_name varchar2(25),
email varchar2(25),
phone_number varchar2(20),
hire_date date,
job_id varchar2(10),
salary number,
commission_pct number,
manager_id number,
department_id number
)
organization external
(
type oracle_loader
default directory app_upload
access parameters
(
records delimited by newline
logfile app_log : 'employees_zip_xt.log'
badfile app_log : 'employees_zip_xt.bad'
nodiscardfile
preprocessor app_bin : 'unzip_file.sh'
skip 1
load when row_type = 'DATA'
fields terminated by ',' optionally enclosed by '"'
missing field values are null
(
row_type char(10),
employee_id integer external(6),
first_name char(20),
last_name char(25),
email char(25),
phone_number char(20),
hire_date date "DD-MON-RR",
job_id char(10),
salary float external,
commission_pct float external,
manager_id integer external(6),
department_id integer external(4)
)
)
location('employees.csv.gz')
)
reject limit unlimited
/
The table will process the data written to STDOUT by the preprocessor script.
Hmmm, I wonder what that suspicious clump of leaves and twigs is covering…
select * from employees_zip_xt;

ORA-29913: error while processing ODCIEXTTABLEFETCH routine
ORA-29400: data cartridge error
KUP-04095: preprocessor command /opt/oracle/userhome/app_user/bin/unzip_file.sh encountered error
"/usr/bin/gunzip: line 57: /opt/oracle/product/23c/dbhomeFree/dbs/gzip: No such file or directory
Give me a hand up, will you ?
As well as the ever inscrutable ODCIEXTTABLEFETCH, we have KUP-04095 complaining that it can’t find an executable that we haven’t called.
Whilst we’ve specified the full path to the gunzip executable in our script, GNU gunzip calls gzip during execution. Because that doesn’t use a fully qualified path, it looks in the wrong place.
The solution then, is to set the PATH environment variable in our script…
#!/usr/bin/sh
# Need the PATH specified because gunzip executable is calling gzip under the covers
export PATH=/usr/bin
gunzip -c $1
Now, when we query the table, it works as expected :

…and the unencrypted data is not persisted in a file…
ls -l emp*
-rw-r--r--. 1 app_user app_os 3471 Apr 9 10:36 employees.csv.gz
Now we’ve established the concept, let’s try something a little more ambitious…
Where did I leave my keys ?
Just for a change, I'm going to start with the example that actually works as expected.
I’ve generated GPG keys for oracle and used the public key to encrypt the original csv. The encrypted file has been moved to the upload directory :
ls -l *.gpg
-rw-r--r--. 1 app_user app_os 3795 Apr 9 12:31 employees.csv.gpg
Note that the passphrase has been saved in passphrase.txt in oracle‘s home directory with the following permissions :
ls -l passphrase.txt
-rw-------. 1 oracle oracle 19 Apr 9 15:41 passphrase.txt
Now we need a script in our bin directory to decrypt the file (decrypt_file.sh) :
#!/usr/bin/sh
/usr/bin/gpg \
--decrypt \
--pinentry-mode loopback \
--passphrase-file /opt/oracle/userhome/oracle/passphrase.txt \
--batch $1 2>/dev/null
We have a new external table – employees_enc_xt, which may look somewhat familiar :
create table employees_enc_xt
(
row_type varchar2(10),
employee_id number,
first_name varchar2(20),
last_name varchar2(25),
email varchar2(25),
phone_number varchar2(20),
hire_date date,
job_id varchar2(10),
salary number,
commission_pct number,
manager_id number,
department_id number
)
organization external
(
type oracle_loader
default directory app_upload
access parameters
(
records delimited by newline
logfile app_log : 'employees_enc_xt.log'
badfile app_log : 'employees_enc_xt.bad'
nodiscardfile
preprocessor app_bin : 'decrypt_file.sh'
skip 1
load when row_type = 'DATA'
fields terminated by ',' optionally enclosed by '"'
missing field values are null
(
row_type char(10),
employee_id integer external(6),
first_name char(20),
last_name char(25),
email char(25),
phone_number char(20),
hire_date date "DD-MON-RR",
job_id char(10),
salary float external,
commission_pct float external,
manager_id integer external(6),
department_id integer external(4)
)
)
location('employees.csv.gpg')
)
reject limit unlimited
/
…as will the results when we query it :

The point of a private GPG key is that it’s only accessible to the user that owns it. If that’s not oracle then we need to find a means of executing as the account which does own the key.
No biggie, the PDB_OS_CREDENTIAL parameter has that covered…oh, what have I stepped in.
PDB_OS_CREDENTIAL has been around since 12c and should allow you to specify the os user which runs when you invoke an external table preprocessor.
Coincidentally, the bug that causes this setting to be ignored has been around for a similar length of time.
I have personally confirmed this on versions 19c (splash !), 21c (squelch !) and now 23c (splat !).
I do hope that’s mud.
In these articles, Szymon Skorupinski does a marvellous job of explaining system behaviour both before and after the application of the patch to fix this issue :
If applying the patch is going to be problematic, not least because of the regression testing effort required to make sure none of your existing code breaks, then you do have an alternative…
I should be able to scrape it off once it dries
Before external tables there was SQL*Loader. It's still supported, which is just as well in this case, as we can use it to run a script as the owner of the GPG key and load the data directly into a staging table in the database.
The advantage of this approach is that you can execute it as the owner of the GPG key (although I’m still using the oracle user in this example).
To start with, we need a table to stage the data into :
create table employees_stg
(
row_type varchar2(10),
employee_id number,
first_name varchar2(20),
last_name varchar2(25 ),
email varchar2(25),
phone_number varchar2(20),
hire_date date,
job_id varchar2(10),
salary number,
commission_pct number,
manager_id number,
department_id number
)
/
Now we create a SQL*Loader control file in the bin directory on the server (load_employees.ctl) :
options(skip=1)
load data
badfile '/opt/oracle/userhome/app_user/logs/load_employees.bad'
discardfile '/opt/oracle/userhome/app_user/logs/load_employees.dis'
append
into table employees_stg
when row_type='DATA'
fields terminated by "," optionally enclosed by '"'
trailing nullcols
(
row_type char(10),
employee_id integer external(6),
first_name char(20),
last_name char(25),
email char(25),
phone_number char(20),
hire_date date "DD-MON-RR",
job_id char(10),
salary float external,
commission_pct float external,
manager_id integer external(6),
department_id integer external(4)
)
Finally, we create script to perform the load (load_employees.sh) :
#!/usr/bin/sh
/usr/bin/gpg \
--decrypt \
--pinentry-mode loopback \
--passphrase-file /opt/oracle/userhome/oracle/passphrase.txt \
--batch \
/home/app_user/upload/employees.csv.gpg 2>/dev/null|sqlldr \
control=load_employees.ctl \
userid=connect_string \
log=/home/app_user/logs/load_employees.log \
data=\'-\'
…replacing connect_string with the connect string for the schema you’re loading into.
Incidentally, in order to avoid having a database password hard-coded in the script, an Oracle Wallet would come in handy here.
Note that we tell SQL*Loader to read its STDIN for the data by specifying :
data =\'-\'
When we run this, we can see that it works as expected :


You could easily automate this load process by creating a DBMS_SCHEDULER external job.
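A minimal sketch of such a job might look like this. The job name and schedule are my own invention, and the script path points at the bin directory we created earlier :
begin
    dbms_scheduler.create_job(
        job_name => 'LOAD_EMPLOYEES_JOB',
        job_type => 'EXECUTABLE',
        job_action => '/opt/oracle/userhome/app_user/bin/load_employees.sh',
        start_date => systimestamp,
        repeat_interval => 'FREQ=DAILY; BYHOUR=6',
        enabled => true);
end;
/
If the job needs to run as a particular OS user rather than the default, a credential can be attached to it as well.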
ConclusionAs we’ve seen, it’s perfectly possible to pipe data from binary format files into the database without having to persist it in plain text. However, if you do decide to go down this path, you may want to wear your wellies.
Loading selected fields from a delimited file into Oracle using an External Table
If you’ve dealt with ETL processes for any length of time, sooner or later, you’ll be faced with the need to load data from a delimited file into your database.
In the case of Oracle, External Tables are tailor-made for this purpose.
However, whilst they might make loading an entire file very simple, exactly how do you persuade them to load just certain fields ?
What we’ll be looking at here is :
- an external table that loads an entire file
- an external table that just loads the first few fields of each record in the file
- an external table that loads a selection of fields that are not contiguous in the file
All of these examples have been tested on Oracle 21cXE, although I’ve not seen any behaviour here that’s not consistent with Oracle versions back to 11.2.
The example file is simply a csv (called employees.csv) containing all the records from the EMPLOYEES table in the HR sample schema.

The external table which reads the entire file is called ALL_EMPS_XT :
create table all_emps_xt(
employee_id number(6,0),
first_name varchar2(20),
last_name varchar2(25),
email varchar2(25),
phone_number varchar2(20),
hire_date date,
job_id varchar2(10),
salary number(8,2),
commission_pct number(2,2),
manager_id number(6,0),
department_id number(4,0))
organization external (
type oracle_loader
default directory my_files
access parameters (
records delimited by newline
logfile 'all_emps.log'
badfile 'all_emps.bad'
discardfile 'all_emps.dis'
skip 1
load when (1:3) != 'TRL'
fields terminated by ',' optionally enclosed by '"'
missing field values are null (
employee_id integer external(6),
first_name char(20),
last_name char(25),
email char(25),
phone_number char(20),
hire_date date "DD-MON-YYYY",
job_id char(10),
salary float external,
commission_pct float external,
manager_id integer external(6),
department_id integer external(4))
)
location('employees.csv')
)
reject limit unlimited
/
Sure enough, if we select from the external table, we can see that all of the fields have been populated as expected …
select *
from all_emps_xt
where job_id in ('AD_PRES', 'SA_MAN')
order by employee_id
/

… and that all 107 data records have been loaded :
select count(*) from all_emps_xt;

  COUNT(*)
----------
       107

Loading first few fields only
Let’s say that we weren’t interested in all of the fields in the file and we just wanted the first five – i.e. :
- EMPLOYEE_ID
- FIRST_NAME
- LAST_NAME
- EMAIL
- PHONE_NUMBER
This is simple enough. We just specify definitions for those fields in the access parameters clause and our external table will ignore all of the fields after phone_number, provided that
missing field values are null
…is still specified in the access parameters clause. This one is called START_EMPS_XT :
create table start_emps_xt(
employee_id number(6,0),
first_name varchar2(20),
last_name varchar2(25),
email varchar2(25),
phone_number varchar2(20))
organization external (
type oracle_loader
default directory my_files
access parameters (
records delimited by newline
logfile 'start_emps.log'
badfile 'start_emps.bad'
discardfile 'start_emps.dis'
skip 1
load when (1:3) != 'TRL'
fields terminated by ',' optionally enclosed by '"'
missing field values are null (
employee_id integer external(6),
first_name char(20),
last_name char(25),
email char(25),
phone_number char(20))
)
location('employees.csv')
)
reject limit unlimited
/
select *
from start_emps_xt
where employee_id between 100 and 110
order by employee_id
/

This is where it gets a bit more tricky…but not much, as it turns out. Say we want to load :
Field No    Field Name
1           EMPLOYEE_ID
2           FIRST_NAME
3           LAST_NAME
6           HIRE_DATE
10          MANAGER_ID
11          DEPARTMENT_ID

We simply need to include placeholder values in the access parameters clause for the columns we don't want to load. In this case, I've used the actual field names but not specified a data type for them. That way, if I ever need to add one of these columns into the external table definition, I know where it appears in the file and the change is comparatively simple. Anyhow, the table is called SELECTED_EMPS_XT :
create table selected_emps_xt(
employee_id number(6,0),
first_name varchar2(20),
last_name varchar2(25),
hire_date date,
manager_id number(6,0),
department_id number(4,0))
organization external (
type oracle_loader
default directory my_files
access parameters (
records delimited by newline
logfile 'selected_emps.log'
badfile 'selected_emps.bad'
discardfile 'selected_emps.dis'
skip 1
load when (1:3) != 'TRL'
fields terminated by ',' optionally enclosed by '"'
missing field values are null (
employee_id integer external(6),
first_name char(20),
last_name char(25),
email,
phone_number,
hire_date date "DD-MON-YYYY",
job_id,
salary,
commission_pct,
manager_id integer external(6),
department_id integer external(4))
)
location('employees.csv')
)
reject limit unlimited
/
Sure enough, if we query the table :
select *
from selected_emps_xt
where department_id = 60
order by hire_date
/

FORALL DML – why context isn’t everything
This post is the latest in an occasional series on the theme of stuff that doesn’t work quite how I thought it did.
It is the result of finding out the fun way that, rather than being “much faster” than using a humble Cursor For Loop , Bulk Collect/Forall for DML in PL/SQL can merely be a bit less slow.
Just in case my boss is reading this, I’d better get my excuses in right at the beginning.
This is what the Oracle PL/SQL Language Reference has to say about Forall :
A FORALL statement is usually much faster than an equivalent FOR LOOP statement. However, a FOR LOOP statement can contain multiple DML statements, while a FORALL statement can contain only one. The batch of DML statements that a FORALL statement sends to SQL differ only in their VALUES and WHERE clauses. The values in those clauses must come from existing, populated collections.
…and later in the same document ….
The FORALL statement runs one DML statement multiple times, with different values in the VALUES and WHERE clauses. The different values come from existing, populated collections or host arrays. The FORALL statement is usually much faster than an equivalent FOR LOOP statement.
I’ll be testing that assertion in this post.
Using a simple test case, which involves performing different DML operations on different tables using a single record set, I’ll begin by comparing the relative performance of:
- simple SQL statements inside a PL/SQL block
- Cursor For Loops
- Bulk Collect/Forall operations
I’ll then explore the potential performance gains available using database objects such as VARRAYs and Global Temporary Tables to hold the array being used in the DML.
NOTE – If you want to run these tests yourself on your own environment/database version to validate these findings, you can find the scripts I’ve used here on my Github Repo.
Test Scenario
I have a single set of records that I want to use in an insert, an update and a delete.
Each of these operations has a different target table.
I’ve created a simple test case, using this script ( called setup.sql) :
clear screen
set serverout on size unlimited
drop table input_records;
drop table record_statuses;
drop table target_table;
create table input_records
as
select rownum as id
from dual
connect by rownum <=100000;
create table record_statuses
as
select id, 'WAITING' as status
from input_records;
create table target_table ( id number);
The tests have all been run on Oracle Enterprise Edition 19c, although the behaviour described here appears to be consistent across Oracle Database versions from 11g upwards.
I’ve run each script twice in succession and then taken the fastest runtime. This is to ensure I’ve not been caught out by the effects of any caching. Unlikely as I drop and re-create the tables each time, but as I’ve already fallen foul of one assumption recently, I didn’t want to take any chances.
I’ve used tkprof to format tracefile information, so it’s probably worth bearing in mind the column heading definitions for tkprof output :
******************************************************************************
count    = number of times OCI procedure was executed
cpu      = cpu time in seconds executing
elapsed  = elapsed time in seconds executing
disk     = number of physical reads of buffers from disk
query    = number of buffers gotten for consistent read
current  = number of buffers gotten in current mode (usually for update)
rows     = number of rows processed by the fetch or execute call
******************************************************************************
To generate the trace files, I’ve wrapped the anonymous blocks with the test code with the following statements.
Before the PL/SQL block containing the test code :
alter session set timed_statistics = true;
alter session set sql_trace = true;
alter session set tracefile_identifier = '<a string>';
…where <a string> is a string to include in the tracefile name to make it easier to find in the trace directory on the database server.
After the block :
alter session set sql_trace = false;
select value
from v$diag_info
where name = 'Default Trace File'
/
A Simple SQL Baseline
Let’s start by benchmarking just using SQL.
This is as fast as it gets in Oracle terms…generally speaking.
Therefore it should provide a useful indicator of what the RDBMS is capable of in terms of performing these DML operations ( straight_iud.sql ):
@setup.sql
set timing on
begin
insert into target_table(id)
select id from input_records;
update record_statuses
set status = 'LOADED'
where id in ( select id from input_records);
delete from input_records
where id in ( select id from record_statuses where status = 'LOADED');
commit;
end;
/
set timing off
Once the setup is done, the block completes in 1.878 seconds.
Whilst it may be faster than a fast thing on fast pills, a SQL-only solution is not always practical.
As Steven Feuerstein points out in his excellent article on this subject, there are times when you need PL/SQL to get the job done.
This is where the Bulk Collect/Forall and Cursor For Loop approaches come to the fore.
Let’s start by looking at how they compare when called upon to perform the first of our DML operations…
Let’s start with the straight SQL option (straight_insert.sql ):
@setup.sql
set timing on
begin
insert into target_table(id)
select id from input_records;
commit;
end;
/
set timing off
This runs in a predictably rapid 0.128 seconds.
Now for the Cursor For Loop ( forloop_insert.sql ):
@setup.sql
set timing on
declare
v_count pls_integer := 0;
begin
for r_rec in(
select id
from input_records)
loop
insert into target_table(id)
values( r_rec.id);
end loop;
commit;
end;
/
set timing off
…which completes in an, equally predictable, laggardly 1.445 seconds.
Incidentally, tracing this operation confirms that the select statement in a Cursor For Loop behaves in the same way as a Bulk Collect with the LIMIT set to 100. The Fetch count shows the 100K records have been fetched in batches of 100.
SELECT ID FROM INPUT_RECORDS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch     1002      0.43       0.38        154     328059          0      100000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1006      0.43       0.39        154     328059          0      100000
This brings us to the much vaunted Forall.
If we start by running with a limit of 100 on the Bulk Collect, we can get an idea of how much difference it makes when we minimize the context switches between the SQL and PL/SQL engine as compared to the humble Cursor For Loop ( forall_insert_100.sql ):
@../setup.sql
set timing on
declare
v_count pls_integer := 0;
cursor c_input is
select id
from input_records;
type typ_input is table of input_records%rowtype index by pls_integer;
v_arr_id typ_input;
begin
open c_input;
loop
fetch c_input
bulk collect into v_arr_id
limit 100;
forall i in 1..v_arr_id.count
insert into target_table(id)
values( v_arr_id(i).id);
exit when v_arr_id.count = 0;
end loop;
close c_input;
commit;
end;
/
set timing off
Yep, that’s much faster at 0.165 seconds.
We can conclude from this that most of the Cursor For Loop execution time is due to context switching between the SQL and PL/SQL engines.
However, after some trial and error, I’ve found that setting a limit of 1000 is about optimal for the test case at hand – not using a limit clause actually proves to be a bit slower – so I’ll be using that in the rest of the Bulk Collect tests in this post. Starting with this ( forall_insert.sql ):
@../setup.sql
set timing on
declare
v_count pls_integer := 0;
cursor c_input is
select id
from input_records;
type typ_input is table of input_records%rowtype index by pls_integer;
v_arr_id typ_input;
begin
open c_input;
loop
fetch c_input
bulk collect into v_arr_id
limit 1000;
forall i in 1..v_arr_id.count
insert into target_table(id)
values( v_arr_id(i).id);
exit when v_arr_id.count = 0;
end loop;
close c_input;
commit;
end;
/
set timing off
This runs in 0.145 seconds.
Let’s tabulate those execution times for ease of comparison :
Rank   Method                Runtime (seconds)
1      Simple SQL            0.128
2      Forall (Limit 1000)   0.145
3      Cursor For Loop       1.445

So far, it all seems to be pretty much as you might expect.
OK, let’s try some more DML statements…
We’re going to insert records into a table as before.
Then, using the same record set, we’ll update a second table then delete from a third table.
We’ve already seen the unadorned SQL script for this test, so let’s move on to the Cursor For Loop (forloop_iud.sql). Once I kick this off, you may want to take the opportunity to go and make a coffee…
@setup.sql
set timing on
declare
v_count pls_integer := 0;
begin
for r_rec in(
select id
from input_records)
loop
insert into target_table(id)
values( r_rec.id);
update record_statuses
set status = 'LOADED'
where id = r_rec.id;
delete from input_records
where id = r_rec.id;
end loop;
commit;
end;
/
set timing off
That took almost four minutes ( 238.486 seconds).
Let’s see what “much faster” looks like with Forall (forall_iud.sql ):
@setup.sql
set timing on
declare
v_count pls_integer := 0;
cursor c_input is
select id
from input_records;
type typ_input is table of input_records%rowtype index by pls_integer;
v_arr_id typ_input;
begin
open c_input;
loop
fetch c_input
bulk collect into v_arr_id
limit 1000;
forall i in 1..v_arr_id.count
insert into target_table(id)
values( v_arr_id(i).id);
forall j in 1..v_arr_id.count
update record_statuses
set status = 'LOADED'
where id = v_arr_id(j).id;
forall k in 1..v_arr_id.count
delete from input_records
where id = v_arr_id(k).id;
exit when v_arr_id.count = 0;
end loop;
close c_input;
commit;
end;
/
set timing off
My coffee’s gone cold. That’s 224.162 seconds – around 6% faster if you’re interested.
I’m not sure that qualifies as “much faster”.
Remember, the simple SQL runtime for the same DML operations was 1.878 seconds.
Time for another table :
Rank  Method               Runtime (seconds)
----  -------------------  -----------------
1     Simple SQL           1.878
2     Forall (Limit 1000)  224.162
3     Cursor For Loop      238.486

In order to find out just what’s going on, it may be worth isolating a single DML operation, as we did with the INSERT statement earlier. This time, we’ll try an UPDATE…
Tracing an Update

The simple SQL benchmark for the update is ( straight_update.sql ):
@../setup.sql
insert into target_table
select id from input_records;
commit;
set timing on
begin
update record_statuses
set status = 'LOADED'
where id in ( select id from input_records);
commit;
end;
/
set timing off
This runs in 0.798 seconds.
Now for the Cursor For Loop ( forloop_update.sql):
@setup.sql
insert into target_table
select id from input_records;
commit;
set timing on
declare
v_count pls_integer := 0;
begin
for r_rec in(
select id
from input_records)
loop
update record_statuses
set status = 'LOADED'
where id = r_rec.id;
end loop;
commit;
end;
/
set timing off
144.856 seconds.
Now for the Forall (forall_update.sql ):
@setup.sql
insert into target_table(id)
select id from input_records;
commit;
set timing on
declare
v_count pls_integer := 0;
cursor c_input is
select id
from input_records;
type typ_input is table of input_records%rowtype index by pls_integer;
v_arr_id typ_input;
begin
open c_input;
loop
fetch c_input
bulk collect into v_arr_id
limit 1000;
forall j in 1..v_arr_id.count
update record_statuses
set status = 'LOADED'
where id = v_arr_id(j).id;
exit when v_arr_id.count = 0;
end loop;
close c_input;
commit;
end;
/
set timing off
That’s a not-much-faster-at-all-really 141.449 seconds.
Rank  Method               Runtime (seconds)
----  -------------------  -----------------
1     Simple SQL           0.798
2     Forall (Limit 1000)  141.449
3     Cursor For Loop      144.856

Once I’ve turned tracing on and re-executed these scripts, the reason for the performance discrepancy becomes clear.
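The post doesn’t show the exact commands used to capture the trace, but one common approach from a SQL*Plus session is something like this ( a sketch, with tkprof then run against the resulting trace file from the OS ) :

alter session set tracefile_identifier = 'forall_update';
exec dbms_session.session_trace_enable( waits => true, binds => false);
@forall_update.sql
exec dbms_session.session_trace_disable;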
Here’s the tkprof output for the Simple SQL Update. Remember, query is the number of buffers gotten for a consistent read :
UPDATE RECORD_STATUSES SET STATUS = 'LOADED'
WHERE ID IN ( SELECT ID FROM INPUT_RECORDS)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          2           0
Execute      1      0.51       0.92        249        416     102416      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.51       0.92        249        416     102418      100000
As you’d expect, the Cursor For Loop performs a huge number of gets by comparison as it’s executing 100K update statements, rather than just one :
UPDATE RECORD_STATUSES SET STATUS = 'LOADED'
WHERE ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 100000    215.15     221.86        248   25104059     103919      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   100001    215.15     221.86        248   25104059     103919      100000
Now, although the Forall is only executing the update 100 times, the number of gets is on a par with the Cursor For Loop :
UPDATE RECORD_STATUSES SET STATUS = 'LOADED'
WHERE ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    100    217.41     220.58        248   25105754     103319      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      101    217.41     220.58        248   25105754     103319      100000
Further digging shows that it’s a similar story with the DELETEs.
At this point, maybe we should take another look at that documentation I quoted at the start. Specifically, the bit which says :
The FORALL statement runs one DML statement multiple times, with different values in the VALUES and WHERE clauses.
So, despite the tkprof execution count for the statement suggesting otherwise, it looks like Forall is actually running a separate UPDATE statement for each value in the collection.
If we’re going to get this particular DML code to perform, we’ll need to look at storing our array in a structure other than a PL/SQL collection.
VARRAY

In their guise as a database object (as opposed to a PL/SQL collection), VARRAYs are directly visible to the SQL engine. This means that they can be directly referenced in SQL queries without the necessity for any context switching (varray_iud.sql) …
@setup.sql
-- If we try to declare and use the varray entirely within the PL/SQL block
-- we run into PLS-00642 : local collection types not allowed in SQL statements
-- so...
create or replace type prince_varray as varray(100000) of number
/
set timing on
declare
v_id_arr prince_varray := prince_varray();
begin
select id
bulk collect into v_id_arr
from input_records;
insert into target_table(id)
select * from table(v_id_arr);
update record_statuses
set status = 'LOADED'
where id in (select * from table(v_id_arr));
delete from input_records
where id in (select * from table(v_id_arr));
commit;
end;
/
set timing off
Side Note : some may consider it bad practice to name objects simply for the purposes of making a weak pun.
The big news is that the runtime is a rather respectable 1.846 seconds.
If you’re sure that you’re never going to need to process more than 32767 records in your VARRAY you can even save yourself the effort of coming up with a sensible name for one.
The following types are already created in Oracle for each of the base datatypes:
- sys.ODCINumberList
- sys.ODCIDateList
- sys.ODCIVarchar2List
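For example, sticking with the single-column test case, something like this should work ( a sketch only; these ODCI types are capped at 32767 elements, so the rownum predicate is there purely to stay under that limit on our 100K-row table ) :

declare
    v_id_arr sys.ODCINumberList;
begin
    -- stay under the 32767 element cap of sys.ODCINumberList
    select id
    bulk collect into v_id_arr
    from input_records
    where rownum <= 32767;

    insert into target_table(id)
    select column_value from table(v_id_arr);

    commit;
end;
/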
Incidentally, in case you’re wondering about the maximum size of a VARRAY…
create or replace type varray_kane as varray(2147483647) of number
/

Type VARRAY_KANE compiled
Whilst a VARRAY is an option in this instance, in most cases, you’ll be dealing with a multi-dimensional array.
If only we could use a table…
Using a Global Temporary Table (GTT) offers all of the advantages of using a VARRAY with the added bonus that it’ll work for multi-dimensional arrays (gtt_iud.sql).
@setup.sql
drop table gtt_array;
create global temporary table gtt_array (
id number)
on commit delete rows;
set timing on
begin
insert into gtt_array
select id from input_records;
insert into target_table(id)
select id from gtt_array;
update record_statuses
set status = 'LOADED'
where id in (select id from gtt_array);
delete from input_records
where id in (select id from gtt_array);
commit;
end;
/
set timing off
Runtime is 1.950 seconds.
If we look at our final results table for the runs that involved all three DML statements, we can see that the performance of the VARRAY and GTT methods is broadly similar to that of using simple SQL statements, mainly because they are also, effectively, simple SQL statements.
I say “broadly similar” because I’ve not found one method to be consistently faster than the other two. All of them are, however, much faster than Forall.
Rank  Method               Runtime (seconds)
----  -------------------  -----------------
1     VARRAY               1.846
2     Simple SQL           1.878
3     GTT                  1.950
4     Forall               224.162
5     Cursor For Loop      238.486

Conclusions

Using Forall for an INSERT is likely to be significantly faster than using a Cursor For Loop due to the reduction in context switching. However, this does not hold for other types of DML.
If you find yourself in circumstances where performance is an issue then you may want to consider using a database object, such as a VARRAY or GTT, in your array processing.
Using a full outer join to “diff” two tables
I was inspired to write this post by Joshua Ottwell’s thoughts on finding rows present in one table but not another.
What follows is an exploration of how we can use a Full Outer Join to perform a “diff” on the data in two tables. We’ll also look at doing something similar for two distinct result sets from the same table.
In this instance, we want to identify :
- records that exist in the first table but not the second
- records that exist in the second table but not the first
- records that exist in both tables but where some values differ
Before going any further, I should say that the example that follows will make more sense if you consider Terry Pratchett’s observation that :
“In ancient times cats were worshipped as gods; they have not forgotten this.”

Let’s say I have a report that I want to change.
I’ve simulated the output of the report and captured it in a table like this :
create table employee_report_baseline(
id number,
emp_name varchar2(100),
job_title varchar2(100) )
/
insert into employee_report_baseline( id, emp_name, job_title)
values(1,'DEBBIE', 'CEO');
insert into employee_report_baseline( id, emp_name, job_title)
values(2, 'MIKE', 'MINION');
insert into employee_report_baseline( id, emp_name, job_title)
values(3, 'TEDDY', 'DOG');
commit;
The output of the new version of the report is simulated like this :
create table employee_report_new
as
select *
from employee_report_baseline
where emp_name != 'MIKE';
update employee_report_new
set job_title = 'LADY-IN-WAITING'
where emp_name = 'DEBBIE';
insert into employee_report_new( id, emp_name, job_title)
values(4, 'CLEO', 'CAT');
commit;
Identifying missing or new records
First of all, let’s see if the records returned are the same for both reports. Happily ID is a unique key for the dataset, which makes the comparison fairly simple :
select bsl.id as baseline_id, nr.id as new_id,
case
when bsl.id is null then 'Added in New Report'
when nr.id is null then 'Missing from New Report'
end as status
from employee_report_baseline bsl
full outer join employee_report_new nr
on bsl.id = nr.id
where bsl.id is null
or nr.id is null
order by 1 nulls last;
The absence of the key value from either table indicates that the record is in one table but not the other.
Sure enough, when we run this query we get :
BASELINE_ID NEW_ID STATUS
----------- ------ ------------------------------
          2        Missing from New Report
                 4 Added in New Report

All Differences
If we want to identify all of the differences, including changed records, we can do so with just a little extra typing :
select
nvl(bsl.id, nr.id) as id,
bsl.emp_name as old_emp_name,
nr.emp_name as new_emp_name,
bsl.job_title as old_job_title,
nr.job_title as new_job_title,
case
when bsl.id is null then 'Added in New Report'
when nr.id is null then 'Missing from New Report'
else 'Different between Report Versions'
end as status
from employee_report_baseline bsl
full outer join employee_report_new nr
on bsl.id = nr.id
where bsl.id is null
or nr.id is null
or nvl(bsl.emp_name, 'X') != nvl(nr.emp_name, 'X')
or nvl(bsl.job_title, 'X') != nvl(nr.job_title, 'X')
order by 1 nulls last
/
As well as the two records that are missing from either result set, we can see that Debbie has been demoted ( or possibly usurped). Cleo is now the top dog(!) :

One thing to bear in mind when using a full outer join is that it will consider every row in both of the tables being joined.
This gets a bit annoying when you want to do something with a subset of data such as comparing the column definitions of two tables in USER_TAB_COLUMNS in Oracle.
To demonstrate :
create table old_firm(
id number,
first_name varchar2(500),
last_name varchar2(500),
job_title varchar2(100),
start_date date,
end_date date)
/
create table new_firm(
id number,
first_name varchar2(250),
last_name varchar2(500),
salary number,
start_date date,
end_date varchar2(20))
/
If we want to compare only the records relating to these two tables then we’ll need a couple of in-line-views to restrict the result sets that our full outer join will look at.
Incidentally, as we’re running this on Oracle 19c, we can throw in an in-line function as well to save a bit of typing…
with function col_diff(
i_old_name in varchar2 default null,
i_new_name in varchar2 default null,
i_old_type in varchar2 default null,
i_new_type in varchar2 default null,
i_old_len in number default null,
i_new_len in number default null)
return varchar2
is
begin
if i_old_name is null then
return 'NEW';
elsif i_new_name is null then
return 'MISSING';
end if;
-- compare the attributes for the columns
if i_old_type != i_new_type then
return 'TYPE';
elsif i_old_len != i_new_len then
return 'LENGTH';
end if;
end;
ofm as (
select column_name, data_type, data_length
from user_tab_columns
where table_name = 'OLD_FIRM'),
nfm as (
select column_name, data_type, data_length
from user_tab_columns
where table_name = 'NEW_FIRM')
select
nvl(ofm.column_name, nfm.column_name) as column_name,
case col_diff( ofm.column_name, nfm.column_name, ofm.data_type, nfm.data_type, ofm.data_length, nfm.data_length)
when 'NEW' then 'New Column'
when 'MISSING' then 'Missing Column in NEW_FIRM'
when 'TYPE' then 'Type Mismatch. OLD_FIRM type is '||ofm.data_type||' NEW_FIRM is '||nfm.data_type
when 'LENGTH' then 'Length Mismatch. OLD_FIRM length is '||ofm.data_length||' NEW_FIRM is '||nfm.data_length
end as status
from ofm
full outer join nfm
on ofm.column_name = nfm.column_name
where (ofm.column_name is null or nfm.column_name is null)
or( ofm.data_type != nfm.data_type or ofm.data_length != nfm.data_length)
/

I’ll have to leave it there. The “boss” is demanding Dreamies to appease her wrath for being referred to as “top dog” earlier.
The Oracle SQL Limit Clause and teaching a New Dog Old Tricks
It’s been a few weeks now and I’ve finally gotten over England’s latest World Cup penalty drama.
I’m not so sure about Teddy though…

Anyhow, he has decided that I need to know about the SQL Limit Clause that Oracle introduced in 12c and has decided to use data from the Tournament in the examples that follow…
EnvironmentThese examples have all been run on my OCI Free Tier database ( Oracle 19c at the time of writing).
The table we’re going to use contains details of players who scored or gave an assist during the recent Men’s World Cup Finals and looks like this :
create table goal_scorers (
player varchar2(4000),
team varchar2(500),
goals number,
assists number)
/
The table contains 176 rows.
select player, team, goals, assists
from goal_scorers
order by goals desc, assists desc
/

PLAYER                  TEAM                    GOALS ASSISTS
----------------------- ----------------------- ----- -------
MBAPPE                  FRANCE                      8       2
MESSI                   ARGENTINA                   7       3
ALVAREZ                 ARGENTINA                   4       0
GIROUD                  FRANCE                      4       0
MORATA                  SPAIN                       3       1
GONCALO RAMOS           PORTUGAL                    3       1
SAKA                    ENGLAND                     3       0
RASHFORD                ENGLAND                     3       0
GAKPO                   NETHERLANDS                 3       0
RICHARLISON             BRAZIL                      3       0
E VALENCIA              ECUADOR                     3       0
... snip ...
PRECIADO                ECUADOR                     0       1
MCGREE                  AUSTRALIA                   0       1
RODRYGO                 BRAZIL                      0       1
Before we get into the new(ish) syntax, let’s have a quick history lesson…
Before 12c

Back in the mists of time, if you wanted to retrieve the first five rows in a table based on an order you specified, you’d need to do something like this :
select player, team, goals, assists
from (
select player, team, goals, assists
from goal_scorers
order by goals desc, assists desc, player, team)
where rownum <= 5;
PLAYER               TEAM                 GOALS ASSISTS
-------------------- -------------------- ----- -------
MBAPPE               FRANCE                   8       2
MESSI                ARGENTINA                7       3
ALVAREZ              ARGENTINA                4       0
GIROUD               FRANCE                   4       0
GONCALO RAMOS        PORTUGAL                 3       1

5 rows selected.
Teddy reckons that you’ll only come across code like this if it’s really old…or written by a really old programmer.
Ignoring the cheeky young pup, we move on to the point where Analytic Functions become available (8i if you’re counting), which allows us to dispense with the – in this context – artificial ordering criteria of player and team and ensure that our query returns any tied records :
with top_scorers as (
select player, team, goals, assists,
rank() over ( order by goals desc, assists desc) as recnum
from goal_scorers )
select *
from top_scorers
where recnum <= 5
order by recnum;
PLAYER               TEAM                 GOALS ASSISTS     RECNUM
-------------------- -------------------- ----- ------- ----------
MBAPPE               FRANCE                   8       2          1
MESSI                ARGENTINA                7       3          2
ALVAREZ              ARGENTINA                4       0          3
GIROUD               FRANCE                   4       0          3
MORATA               SPAIN                    3       1          5
GONCALO RAMOS        PORTUGAL                 3       1          5

6 rows selected.
In the latest Oracle versions, Teddy reckons that you can get that same analytical function goodness but with a bit less typing…
Fetch First

Let’s re-write our top 5 query using the limit clause…
select player, team, goals, assists
from goal_scorers
order by goals desc, assists desc
fetch first 5 rows only
/
PLAYER               TEAM                 GOALS ASSISTS
-------------------- -------------------- ----- -------
MBAPPE               FRANCE                   8       2
MESSI                ARGENTINA                7       3
ALVAREZ              ARGENTINA                4       0
GIROUD               FRANCE                   4       0
MORATA               SPAIN                    3       1

5 rows selected.
As we’ve already seen, that’s ever so slightly different from the top 5 goal scorers. Like Morata, Goncalo Ramos also has 3 goals and 1 assist but he has been arbitrarily excluded from the result set.
Remember, Oracle does not guarantee the order of a result set of a SELECT statement other than that specified in the ORDER BY clause. Therefore, there’s no guarantee a future run of this query won’t include Goncalo Ramos and exclude Morata.
A more reliable query would include all of the tied records for 5th place, as with the RANK() query above.
Happily, we can achieve the same effect with a limit clause :
select player, team, goals, assists
from goal_scorers
order by goals desc, assists desc
fetch first 5 rows with ties
/
PLAYER               TEAM                 GOALS ASSISTS
-------------------- -------------------- ----- -------
MBAPPE               FRANCE                   8       2
MESSI                ARGENTINA                7       3
ALVAREZ              ARGENTINA                4       0
GIROUD               FRANCE                   4       0
MORATA               SPAIN                    3       1
GONCALO RAMOS        PORTUGAL                 3       1

6 rows selected.
As well as specifying a set number of rows, you can also specify a percentage using either the ONLY clause…
select player, team, goals, assists
from goal_scorers
order by goals desc, assists desc
fetch first 5 percent rows only
/
PLAYER               TEAM                 GOALS ASSISTS
-------------------- -------------------- ----- -------
MBAPPE               FRANCE                   8       2
MESSI                ARGENTINA                7       3
ALVAREZ              ARGENTINA                4       0
GIROUD               FRANCE                   4       0
MORATA               SPAIN                    3       1
GONCALO RAMOS        PORTUGAL                 3       1
SAKA                 ENGLAND                  3       0
RASHFORD             ENGLAND                  3       0
GAKPO                NETHERLANDS              3       0

9 rows selected.
… or the WITH TIES clause…
select player, team, goals, assists
from goal_scorers
order by goals desc, assists desc
fetch first 5 percent rows with ties
/
PLAYER               TEAM                 GOALS ASSISTS
-------------------- -------------------- ----- -------
MBAPPE               FRANCE                   8       2
MESSI                ARGENTINA                7       3
ALVAREZ              ARGENTINA                4       0
GIROUD               FRANCE                   4       0
MORATA               SPAIN                    3       1
GONCALO RAMOS        PORTUGAL                 3       1
SAKA                 ENGLAND                  3       0
RASHFORD             ENGLAND                  3       0
GAKPO                NETHERLANDS              3       0
RICHARLISON          BRAZIL                   3       0
E VALENCIA           ECUADOR                  3       0

11 rows selected.

Offset
If we want to skip the first n rows we can use the OFFSET clause :
select player, team, goals, assists
from goal_scorers
order by goals desc, assists desc
offset 10 rows
/
PLAYER               TEAM                 GOALS ASSISTS
-------------------- -------------------- ----- -------
E VALENCIA           ECUADOR                  3       0
KANE                 ENGLAND                  2       3
BRUNO FERNANDES      PORTUGAL                 2       1
LEWANDOWSKI          POLAND                   2       1
NEYMAR               BRAZIL                   2       1
...snip...
RODRYGO              BRAZIL                   0       1

166 rows selected.
We can also specify the number of rows to fetch in the offset clause :
select player, team, goals, assists
from goal_scorers
order by goals desc, assists desc
offset 10 rows fetch next 10 rows only
/
or…
select player, team, goals, assists
from goal_scorers
order by goals desc, assists desc
offset 10 rows fetch next 10 rows with ties
/
If you’re thinking of “paging” your results using this method but can’t be sure that your order by clause will rule out ties, you may get some unexpected results.
For example, when I ran :
select player, team, goals, assists
from goal_scorers
order by goals desc, assists desc
fetch first 10 rows only
/
I got :
PLAYER               TEAM                 GOALS ASSISTS
-------------------- -------------------- ----- -------
MBAPPE               FRANCE                   8       2
MESSI                ARGENTINA                7       3
ALVAREZ              ARGENTINA                4       0
GIROUD               FRANCE                   4       0
MORATA               SPAIN                    3       1
GONCALO RAMOS        PORTUGAL                 3       1
E VALENCIA           ECUADOR                  3       0
SAKA                 ENGLAND                  3       0
RICHARLISON          BRAZIL                   3       0
GAKPO                NETHERLANDS              3       0

10 rows selected.
However, when I then ran…
select player, team, goals, assists
from goal_scorers
order by goals desc, assists desc
offset 10 rows fetch next 10 rows only
/
Gakpo appeared again…
PLAYER               TEAM                 GOALS ASSISTS
-------------------- -------------------- ----- -------
GAKPO                NETHERLANDS              3       0
KANE                 ENGLAND                  2       3
NEYMAR               BRAZIL                   2       1
LEWANDOWSKI          POLAND                   2       1
BRUNO FERNANDES      PORTUGAL                 2       1
TAREMI               IRAN                     2       1
ABOUBAKAR            CAMEROON                 2       1
CHO                  SOUTH KOREA              2       0
KRAMARIC             CROATIA                  2       0
AL DAWSARI           SAUDI ARABIA             2       0

10 rows selected.
In this example, if I want to eliminate duplicates then I need to make sure that my order by clause does not allow any ties :
select player, team, goals, assists
from goal_scorers
order by goals desc, assists desc, player, team
fetch first 10 rows only
/
PLAYER               TEAM                 GOALS ASSISTS
-------------------- -------------------- ----- -------
MBAPPE               FRANCE                   8       2
MESSI                ARGENTINA                7       3
ALVAREZ              ARGENTINA                4       0
GIROUD               FRANCE                   4       0
GONCALO RAMOS        PORTUGAL                 3       1
MORATA               SPAIN                    3       1
E VALENCIA           ECUADOR                  3       0
GAKPO                NETHERLANDS              3       0
RASHFORD             ENGLAND                  3       0
RICHARLISON          BRAZIL                   3       0

10 rows selected.
select player, team, goals, assists
from goal_scorers
order by goals desc, assists desc, player, team
offset 10 rows fetch next 10 rows only
/

PLAYER               TEAM                 GOALS ASSISTS
-------------------- -------------------- ----- -------
SAKA                 ENGLAND                  3       0
KANE                 ENGLAND                  2       3
ABOUBAKAR            CAMEROON                 2       1
BRUNO FERNANDES      PORTUGAL                 2       1
LEWANDOWSKI          POLAND                   2       1
NEYMAR               BRAZIL                   2       1
TAREMI               IRAN                     2       1
AL DAWSARI           SAUDI ARABIA             2       0
CHO                  SOUTH KOREA              2       0
DE ARRASCAETA        URUGUAY                  2       0

10 rows selected.
So that’s it then, give the dog a biscuit and be on our way ?
Well, it’s just possible that the rownum method from olden times is not ready to be consigned to history quite yet…
Let’s say we have a table with lots of rows…
create table bigtab as
select rownum as id from dual connect by rownum <= 10000000;
If we run the following queries against it, the relative performance may be a bit of a surprise…
set timing on

with ordered_recs as (
    select id
    from bigtab
    order by 1)
select id
from bigtab
where rownum <=5;

        ID
----------
         1
         2
         3
         4
         5

Elapsed: 00:00:00.042

select id
from bigtab
order by id
fetch first 5 rows only;

        ID
----------
         1
         2
         3
         4
         5

Elapsed: 00:00:00.608

select id
from (
    select id, rank() over (order by id) as recnum
    from bigtab)
where recnum <= 5;

        ID
----------
         1
         2
         3
         4
         5

Elapsed: 00:00:00.636
Note that this is the second run of each query to account for the effects of caching.
The similarity in runtime between the limit clause and the analytic function is not that surprising. If you look at an explain plan for a limit clause query, it appears to be using analytical functions under the covers.
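If you want to see this for yourself, a quick explain plan does the job ( a sketch; the exact plan shape will vary by version and statistics, but the row limiting typically shows up as a WINDOW SORT step with a ROW_NUMBER() predicate ) :

explain plan for
    select id
    from bigtab
    order by id
    fetch first 5 rows only;

select * from table(dbms_xplan.display());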
The big news here is that the shonky old rownum query is about 14 times faster than the shiny new limit clause equivalent.
Remember, I’m running this on an OCI 19c instance managed by Oracle so there’s nothing odd in the database configuration.
It appears that there is a patch to address an optimizer issue with this type of query, details of which can be found in this article by Nigel Bayliss.
However, if you do come across a performance issue with a limit clause query, it may be worth seeing if good old-fashioned rownum will dig you out of a hole.
Conditionally calling a script in a SQL*Plus control script
For those of us who haven’t quite gotten around to incorporating Liquibase in our database code release pipeline, we sometimes find ourselves in a situation where it would be extremely useful to be able to apply some branching logic in a SQL*Plus control script.
In order to save myself the pain of trying to figure out exactly how you can do this without writing lots of dynamic code, I’ve decided to write it down.
NOTE – the scripting techniques outlined here work the same way in both SQLcl and SQL*Plus.
What follows are examples of SQL*Plus scripts which implement branching to :
- choose which of two scripts to run
- choose whether or not to run a script
The difference between these two use cases is fairly subtle, but worth exploring…
Choosing between two scripts

Here are two very simple scripts. The first is called weekday.sql :
prompt Another working day :-(
…and the second is called weekend.sql
prompt Yay, it's the weekend :-)
I want to write a control script which calls one – and only one – of these scripts based on what day it is today.
We can establish the day easily enough with :
select to_char(sysdate, 'DY') from dual;
In order to use this information to influence the behaviour of the control script at runtime, we’ll need to assign it to a variable that we can reference. In SQL*Plus, we can do this with NEW_VALUE :
column weekday new_value v_script noprint
select
case when to_char(sysdate, 'DY') in ('SAT', 'SUN')
then 'weekend.sql'
else 'weekday.sql'
end as weekday
from dual;
@&v_script
The first line of the script…
column weekday new_value v_script noprint
…takes the value of the column “weekday” from the result of the subsequent query and assigns it to the variable v_script.
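Incidentally, if you want to check what has ended up in the substitution variable once the query has run, DEFINE will show you :

define v_script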
I’ve saved this script as choose_message.sql.
When I run it I get :

As you can see, only the message from weekend.sql is shown. The weekday.sql script is not executed.
Choosing whether or not to run a script

This is a little different as, with this technique, the script will always call a file at the end.
Therefore, I’ve got a placeholder script which will get executed if we decide we don’t want to run the actual target script. The new script is placeholder.sql and is predictably simple :
prompt You have chosen to skip weekday.sql
This time, the control script is called run_weekday_yn.sql and accepts a parameter :
set verify off
accept run_yn prompt 'Do you want to run the script ? (Y/N) : '
column run_it new_value v_script noprint
select
case when upper('&run_yn') = 'Y'
then 'weekday.sql'
else 'placeholder.sql'
end run_it
from dual;
@&v_script
This will run weekday.sql…

…but only if we tell it to…

Of course, you can use a positional notation for the parameter to allow it to be passed unprompted at runtime. This is run_it_yn.sql :
set verify off
column run_it new_value v_script noprint
select
case
when upper('&1') = 'Y' then 'weekday.sql'
else 'placeholder.sql'
end run_it
from dual;
@&v_script
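For example, calling it from a SQL*Plus session and passing the parameter on the command line ( a hypothetical run; the value just needs to be Y or N ) :

@run_it_yn.sql N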

Converting a non-partitioned table to partitioned in Oracle
Teddy tells me that his New Year’s Resolution is to teach an old dog new tricks.
I’ve no idea which particular “Old Dog” he has in mind.
Anyway, it looks like 2023 is going to be The Year of The “Ted(dy) Talk” on this blog, starting with the “modern” method ( i.e. in Oracle 12c and later) of converting a non-partitioned table to a partitioned one.

Such an undertaking used to require a lot of messing about, probably involving DBMS_REDEFINITION.
If you’re still stuck on 11g, you can find a good example of this method on Oracle Base
Since 12c however, things have become much easier.
Before we go on however…
License Warning

Use of partitioning may require an additional license depending on the Oracle version and Edition so it’s a good idea to make sure you’re covered before you start using it.
With that in mind…
We have a non-partitioned table called TEDDY, which we’ve created like this :
create table teddy as
select
rownum as id,
case
when mod(trunc(dbms_random.value(1,5)),4) = 0 then 'Woof'
when mod(trunc(dbms_random.value(1,5)),4) = 1 then 'Bark'
when mod(trunc(dbms_random.value(1,5)),4) = 2 then 'Whine'
else 'Growl'
end as message,
add_months(sysdate, - trunc(dbms_random.value(1,4))) as start_date
from dual
connect by rownum <= 20
/
alter table teddy add constraint teddy_pk primary key (id);
create index teddy_local_start_date_ix on teddy(start_date);
This DDL results in the creation of the following segments :
select segment_name, segment_type
from user_segments
where segment_name like 'TEDDY%'
/

SEGMENT_NAME                             SEGMENT_TYPE
---------------------------------------- --------------------
TEDDY                                    TABLE
TEDDY_LOCAL_START_DATE_IX                INDEX
TEDDY_PK                                 INDEX
Some time after creating the table, we’ve realised that TEDDY should be Interval Partitioned on START_DATE.
Happily for us, on Oracle versions from 12c, we can accomplish this very simply :
alter table teddy modify
partition by range (start_date) interval( numtoyminterval(1, 'MONTH'))
(partition p_default values less than (to_date('2022-01-01', 'YYYY-MM-DD')))
online
/
If we look at the segments now we can see that Oracle has created multiple table and index partitions :
select segment_name, partition_name, segment_type
from user_segments
where segment_name like 'TEDDY%';

SEGMENT_NAME                 PARTITION_NAME  SEGMENT_TYPE
---------------------------- --------------- --------------------
TEDDY                        P_DEFAULT       TABLE PARTITION
TEDDY                        SYS_P621        TABLE PARTITION
TEDDY                        SYS_P622        TABLE PARTITION
TEDDY                        SYS_P623        TABLE PARTITION
TEDDY_LOCAL_START_DATE_IX    P_DEFAULT       INDEX PARTITION
TEDDY_LOCAL_START_DATE_IX    SYS_P621        INDEX PARTITION
TEDDY_LOCAL_START_DATE_IX    SYS_P622        INDEX PARTITION
TEDDY_LOCAL_START_DATE_IX    SYS_P623        INDEX PARTITION
TEDDY_PK                                     INDEX

9 rows selected.
As the TEDDY_LOCAL_START_DATE_IX index is on the partition key we’ve chosen (START_DATE), Oracle has converted this index to be locally partitioned.
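If you’d rather confirm that from the data dictionary than infer it from the segments, a quick check against USER_PART_INDEXES ( a sketch ) should do it :

select index_name, partitioning_type, locality
from user_part_indexes
where table_name = 'TEDDY';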
Teddy has one final tip to impart.
If you want to query a specific partition of a table, as well as specifying the partition name…
select count(*)
from teddy partition(SYS_P623)
where message = 'Woof'
/
…you can also specify a value that falls within the partition using PARTITION FOR…
select count(*)
from teddy partition for(date '2022-10-30')
where message = 'Growl'
/
This syntax has been around since at least 11g, but he knows I seldom use it, so it’s good to have these examples to hand.
Building a Master-Detail Drill-Down with APEX
APEX is one of those technologies that I get to use intensively for a short period of time and then don’t touch for years.
As I’ve recently acquired an OCI Free Tier Database (version 19c) and, consequently, Oracle updates the APEX version for me (currently 22.3), now seems like a good time to see how it’s evolved since I last had a play.
It just so happens that I’m in need of a UI for my hastily constructed World Cup Wallchart application.
What I’m going to look at here is :
- how to use the APEX Drill-Down creation wizard to create related application pages
- how I can customize the generated pages without writing much code
Incidentally, I’ve put the database source-code for the app on Github should you wish to take a look.
Health Warning : when I say “hastily constructed”, I’m not kidding.
On the plus side, I have spent some time on the physical data model, with special attention made to the table relations.
“OK Boomer”, I hear you say, “but how’s that going to help with the UI ?” Let’s find out…
Requirements

I want to see a list of matches in the tournament.
For each match, I want to be able to enter the score and record anything that happened in the match (e.g. goal scorers).
The Physical Data Model underpinning this functionality looks like this :

After reviewing the options available in APEX, it looks like a Drill-Down Master Detail will best meet this requirement…
APEX Drill-Down Wizard

I’ve already created a new APEX application, based on the WC22 database schema. Now I want to create a page :

…specifically a Master Detail….

…Drill Down…

I’ve called the first page Tournament Fixtures.
The Master Data Source (parent table) is FIXTURES :

The Primary Key Column1 is already populated as FIXTURES has a Primary Key defined on the ID column :

I’ve set MATCH_DATE as the Form Navigation Order column.
The Create Master Detail step has Show Only Related Tables toggled On, which makes searching the drop-down list of tables to use as the detail rather simple :

…and in the next screen, APEX has pre-populated the required details using the Primary and Foreign Key definitions on the MATCH_DETAILS table :

Now let’s see what APEX has made of all that….

The good news is that APEX seems to have realised that the STAGE_CID, T1_CID and T2_CID are actually Foreign Keys (FKs) to lookup tables in the application and pulled through the appropriate NAME value from the lookup in question.
The numeric columns are all displayed as floats, which is fair enough as the underlying columns are defined as NUMBER rather than INTEGER. It’s almost as if it was hastily constructed :).
Less encouraging is that the columns containing “Draw” codes seem to have incorrect values.
If we click on the link icon and take a look at the drill-down we can see a similar story :

All in all, it’s a reasonable start considering I haven’t had to write a line of code to generate either of these pages. However, I will need to do some tweaking to get the UI into a useable state.
Let’s start with the Report.
The Tournament Fixtures Report in the Page Designer

First of all, there are a number of columns where the data appears to be incorrect :
- GROUP_CID
- T1_DRAW_CID
- T2_DRAW_CID
- WINNER_DRAW_CID
- LOSER_DRAW_CID
Most of these won’t be needed in the final report but it’s probably wise to correct them, just in case I stumble across a bug.
We can see that APEX has defined GROUP_CID to have a Type of Plain Text (based on List of Values) and created a List of Values (LOV) on GROUPS.WINNER_DRAW_CID to use as its lookup.

As a consequence of this, the value it displays on the report is taken from the LOV rather than being the actual value in the underlying database row.
To fix this, change the Type to Plain Text, which automatically removes the association with the LOV :

The “Draw” columns (T1_DRAW_CID, T2_DRAW_CID, WINNER_DRAW_CID and LOSER_DRAW_CID) are similarly afflicted (LOV on DRAW.STAGE_CID) and the solution is the same.
To ensure that the Score and Points columns ( T1_SCORE, T2_SCORE, T1_PENS, T2_PENS, T1_PTS, T2_PTS) are displayed as integers, change the Appearance/Format Mask to : 999

One other formatting change I’d like to make is for the date of the fixture to include the day of the week as I’ll be referring to the application to see who is playing and when so I can plan my evenings accordingly.
Therefore, I’ve set the format mask for MATCH_DATE to : fmDay, fmDD fmMonth, YYYY
This application should not have functionality to create Fixtures as I’ve already added them in separately.
Therefore, we need to remove the CREATE button, which links to the Match Events page.
To accomplish this, I’ve simply deleted this button in the Page Designer (right-click on the button and select Delete).
Next, I’m going to remove the columns that have no relevance to the user on this report.
The “Draw” columns are part of the application logic to work out which teams go into each knockout fixture. This functionality is coded in the back end and requires no direct user interaction.
The T1_PTS and T2_PTS columns are virtual columns used to calculate Points totals for the Group Tables.
These are also irrelevant to this report.
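For illustration only, a points virtual column might be declared something along these lines. This is a guess at the shape of it rather than the actual DDL, which lives in the Github repo mentioned earlier :

-- hypothetical sketch : assumes T1_SCORE and T2_SCORE columns on FIXTURES
alter table fixtures add (
    t1_pts number generated always as (
        case
            when t1_score > t2_score then 3
            when t1_score = t2_score then 1
            else 0
        end) virtual);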
There are several ways to prevent these columns being displayed in the report, but the one I’ve chosen is to set the Server-Side Condition to Never

To change the column labels so that they are a bit more user-friendly is a simple matter of changing the Heading property :

The changes are :
Column      Label
----------  -----------
STAGE_CID   Stage
GROUP_CID   Group
T1_CID      Team 1
T2_CID      Team 2
T1_SCORE    T1 Goals
T2_SCORE    T2 Goals
T1_PENS     T1 Shootout
T2_PENS     T2 Shootout

As a result, the Report is now looking a bit more useable :

I can make further changes in the Report Runtime using the Actions Menu
Reports Runtime

Using various Actions Menu Options, I’ve ordered the Primary Report by Match Date :

I’ve created a Named Report for Group Stage Fixtures.
Note that the Stage and Shootout Columns have been removed from the Report as they aren’t relevant for Group Stage matches.

There’s also a Named Report for Knockout Fixtures.
In this case, the Shootout Columns are relevant but Group isn’t :

The final Named Report is Upcoming Fixtures – i.e. fixtures that have no result entered for them as yet :

As we’ve seen, the drill-down page contains two Regions :
- a Form on the FIXTURES table
- an Interactive Grid on the MATCH_EVENTS table
Before we go any further, we need to make the Form useable when updating the base table records.
Remember, T1_PTS and T2_PTS in the FIXTURES table are virtual columns. At the moment, the Form will try to set these values on update, resulting in :

Also, the only values we want to change are in the *_SCORE and *_PENS columns – i.e.
- P3_T1_SCORE
- P3_T2_SCORE
- P3_T1_PENS
- P3_T2_PENS
So let’s make everything else Query Only :

We only want to update records from this Region. We do not want to Create or Delete any.
Therefore, we need to remove the DELETE and CREATE buttons from the page.
This is done by simply right-clicking the buttons in the Page Designer and selecting Delete from the pop-up menu.
The items we don’t want displayed are :
- P3_T1_DRAW_CID
- P3_T2_DRAW_CID
- P3_WINNER_DRAW_CID
- P3_LOSER_DRAW_CID
- P3_T1_PTS
- P3_T2_PTS
To hide them we simply go to Identification and set the Type to Hidden.
The items we do want to display are :
- P3_MATCH_DATE
- P3_STAGE_CID (Tournament Stage )
- P3_GROUP_CID (Group Stage Group – only for Group Stage fixtures)
- P3_T1_CID (Team 1 )
- P3_T2_CID (Team 2 )
- P3_T1_SCORE – Team 1 Goals Scored
- P3_T2_SCORE – Team 2 Goals Scored
For Knockout fixtures only :
- P3_T1_PENS – Team 1 Shootout Penalties Scored
- P3_T2_PENS – Team 2 Shootout Penalties Scored
For the fields we want to display but not update, we need to change the properties as follows…
Identification/Type : Display Only
For P3_STAGE_CID, P3_T1_CID and P3_T2_CID, the value shown is derived from an LOV.
To retain this functionality :
In Settings/Based On : Display Value of List of Values
In List of Values choose :
Type : Shared Component
For P3_STAGE_CID :
List of Values : STAGES.STAGE_NAME
For P3_T1_CID and P3_T2_CID :
List of Values : TEAMS.TEAM_NAME
For all of them :
Display Extra Values : Toggle Off
For the layout changes, All displayed fields have been changed to :
Appearance/Template : Optional
Next, I want to display some fields only under certain conditions.
I only want to show P3_GROUP_CID for a Group Stage fixture.
I only want to show the *PENS fields for a Knockout Stage fixture.
I can do this with a Server-Side Condition property.
For P3_GROUP_CID :
Type : Item is NOT NULL
Item : P3_GROUP_CID
For The *PENS fields :
Type : Item Is NULL
Item : P3_GROUP_CID
Our form now looks like this :

Rather than having each field on its own line, I want to display the Match Date, Tournament Stage and (if applicable) the Group on the first line.
To do this, I can simply toggle Off the Layout/Start New Row property for P3_STAGE_CID and P3_GROUP_CID.
Whilst I’m at it, I’ll remove the labels for P3_MATCH_DATE and P3_STAGE_CID as they are now a bit superfluous
The next line will be the teams
For P3_T2_CID set Layout/Label Column Span : 0
This stops the field justifying to the right.
Then Goals
Set P3_T1_SCORE Label to Goals
Set P3_T2_SCORE Label to be blank
Then Penalties
Set P3_T1_PENS Label to Shootout Penalties
Set P3_T2_PENS Label to blank
For the *SCORE and *PENS columns set :
Appearance/Width : 3
Finally, I’ve updated the label on the Save button to say Save rather than Apply Changes.
Match Details Interactive Grid

Being an Interactive Grid (IG), the layout requires a bit less tweaking than for a Form.
For EVENT_TYPE_CID, we need to use a List of Values based on the MATCH_EVENT_TYPES table.
First, set Identification/Type : Select List
In the List of Values properties :
Type : SQL Query
SQL Query is :
select event_name as display,
cid as actual
from match_event_types
order by event_name
Display Extra Values : Toggle Off
Display Null Value : Toggle Off
NOTE – I know this is adding to our pile of technical debt and that it would be better to create the LOV as a shared component in case it is needed elsewhere in the application in future. I’ve created the LOV “in-line” here because, to be honest, I can never remember which way round the display and actual value columns should go and it’s quicker to muck about with it here than as a shared component.
Look, I’ll do it later. Promise.
Anyhow, the EVENT_TYPE_CID column can now only be populated with a value from the LOV :

PLAYER_NAME is currently a Textarea so we need to change it to a Type of Text Field.
Also, I want to ensure that the player name is entered in Uppercase so that I don’t have to worry about mixed case records when composing the SQL Query to show the Tournament top scorers. To achieve this :
Settings/Text Case : Upper
PLAYER_TEAM_CID should also be an LOV but should only be Team 1 or Team 2 for the Fixture currently in context.
Once again, we set the Type to Select List.
In the List of Value properties :
Type : SQL Query
This time, the query references values in the items on the current page :
select team_name as display,
cid as actual
from teams
where cid in (:P3_T1_CID, :P3_T2_CID)
Display Extra Values : Toggle Off
Display Null Value : Toggle Off

On this occasion, I’d argue that specifying the LOV in the column properties is reasonable, as it is specific to the FIXTURES record currently in context.
That brings us to MATCH_MINUTE, which is a little more complex than you might think at first glance.
Timekeeping in Association Football is done a bit differently from other sports. Rather than stopping the clock when play is held up for any reason, time is added to the end of a half. By convention this Added Time is recorded as a
“+” figure appended to the last minute of the half e.g. 45+4 = 4th minute of added time at the end of the first half.
The second half begins counting from 46 so, in game terms, the 46th minute is chronologically after 45+4.
To make sure that our Match Events are displayed in chronological order, I need to set the Match Details Region Order By Clause to :
case instr(match_minute, '+', 1, 1)
    when 0 then to_number(match_minute)
    else to_number(substr(match_minute, 1, instr(match_minute, '+', 1, 1) - 1))
        + (to_number(substr(match_minute, instr(match_minute, '+', 1, 1) + 1)) * .01)
end
Note that the above is formatted for readability.
The Order By Clause field appears to be limited to 256 characters so I had to input this code on a single line.
The final step is to add some highlights for Yellow Cards, Red Cards and Goals.
This can all be done from the Actions Menu in the runtime.
The result looks like this :

So, with a reasonable Physical Data Model and very little additional code, we’ve managed to produce a serviceable two-screen User Interface.
Who are you calling “Boomer” ?
Flipping Plans ! Retrieving past execution plans from AWR
SQL is a special programming language.
It’s quite possible that I’ll be interrupted at this point by some Tech Bro arguing that SQL is not a programming language actually. The phrase “Turing Complete” may already be heading my way.
To save time, I’ll simply observe that I get paid folding money to write SQL, so I don’t really care whether it lacks some of the properties of Powerpoint (e.g. being Turing Complete).
As I was saying, SQL is almost unique among modern programming languages in that it is a 4GL.
In other words it is declarative – you use it to describe the information you want to retrieve. It is then up to the database engine to figure out how best to fulfill your requirements.
Oracle likes to remind me of this every so often when, after months of efficient and trouble-free running, it suddenly gets bored and chooses a new and exciting execution plan which causes a query to take hours rather than minutes.
What follows is a short exploration of how we can tell this is happening and how to retrieve current and previous execution plans for a SQL statement from Oracle’s Automatic Workload Repository (AWR).
Before we go on, however…
WARNING – Licensing

Before you begin exploring AWR, please make sure that you have a license for the Oracle Diagnostics Pack ( or the Diagnostics and Tuning Pack in earlier Oracle versions).
The documentation for every Oracle version includes details of the licensing approach – here’s the one for 19c, which is the version I’ll be using for this post.
Environment and Setup

In an appropriately licensed production environment, it’s usual for AWR to already be configured. However, I’m using an Oracle Developer Day Virtual Box Image running 19c Enterprise Edition, so I’ll need to do some tweaking.
First of all (connected as SYS as SYSDBA on the PDB), I need to check the AWR parameters :
show parameter awr

NAME                        TYPE    VALUE
--------------------------- ------- -----
awr_pdb_autoflush_enabled   boolean FALSE
awr_pdb_max_parallel_slaves integer 10
awr_snapshot_time_offset    integer 0
I can see that I need to enable the autoflush :
alter system set awr_pdb_autoflush_enabled = true;

System SET altered
At present, the interval between AWR snapshots is a bit lengthy :
select * from cdb_hist_wr_control;

We’re going to shorten this interval to 10 minutes. Also, we’re going to record details of as many statements as possible in the snapshot :
begin
dbms_workload_repository.modify_snapshot_settings(
interval => 10,
topnsql => 'MAXIMUM');
end;
/

Even with AWR configured like this, it’s not guaranteed to capture every single statement run in a snapshot so I need to create a sizeable table to make sure my queries make the cut.
DBA_OBJECTS contains around 79K rows so I’ll use that as a basis for my table :
create table chunky as
    select * from dba_objects;

begin
    for i in 1..100 loop
        insert into chunky
        select * from dba_objects;
        -- commit after each iteration as we're a bit tight
        -- on resources
        commit;
    end loop;
end;
/

create index chunky_owner on chunky(owner);

exec dbms_stats.gather_table_stats(user, 'CHUNKY');
CHUNKY contains 7.9 million rows.
Now that’s done, we should be able to start the “slideshow” …
Simulating a Plan Flip

I’m going to start with a new snapshot…
exec dbms_workload_repository.create_snapshot;
Now to run a query against the table. In order to make it easier to find, I’ll pretend that I’m working in a Turing Complete medium…
select /* slide 1 */ *
from chunky
where owner = 'HR';
Now that I’ve executed the query, I should be able to find its SQL_ID :
select sql_id, sql_text
from v$sql
where sql_text like 'select%/*%slide 1%*/%'
and sql_text not like '%v$sql%'
/
SQL_ID        SQL_TEXT
------------- ------------------------------------------------------
9nwrantgwhcta select /* slide 1 */ * from chunky where owner = 'HR'
Before we run the query again, let’s ensure that the optimizer uses a different plan for the next execution by removing the index :
alter index chunky_owner invisible;
To simulate a separate batch run, the next run will be in a new snapshot :
exec dbms_workload_repository.create_snapshot;
select /* slide 1 */ *
from chunky
where owner = 'HR';
exec dbms_workload_repository.create_snapshot;
The two runs should now be in separate AWR snapshots.
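If you want to sanity-check that the snapshots have actually been created before going any further, a quick look at DBA_HIST_SNAPSHOT ( a sketch; again, this view falls under the Diagnostics Pack ) will confirm :

select snap_id, begin_interval_time, end_interval_time
from dba_hist_snapshot
order by snap_id desc
fetch first 3 rows only;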
Identifying an Execution Plan Change

It’s the morning after the overnight batch run and I’m trying to figure out why my job has taken much longer than normal.
Thanks to this extremely useful query from Kerry Osborne, I can see that the execution plan has changed from when the query was first captured in Snapshot 29 to the latest execution in Snapshot 32 :
select
snap.snap_id,
snap.instance_number,
begin_interval_time,
sql_id,
plan_hash_value,
nvl(executions_delta,0) execs,
(elapsed_time_delta/decode(nvl(executions_delta,0),0,1,executions_delta))/1000000 avg_etime,
(buffer_gets_delta/decode(nvl(buffer_gets_delta,0),0,1,executions_delta)) avg_lio
from dba_hist_sqlstat stat, dba_hist_snapshot snap
where sql_id = '9nwrantgwhcta'
and snap.snap_id = stat.snap_id
and snap.instance_number = stat.instance_number
and executions_delta > 0
order by 3
/

The component lines of these execution plans can be found in DBA_HIST_SQL_PLAN :
select *
from dba_hist_sql_plan
where sql_id = '9nwrantgwhcta'
/
Alternatively, if you’d like to see the plans nicely formatted, you can use DBMS_XPLAN.DISPLAY_WORKLOAD_REPOSITORY.
A couple of points to note here :
- the DISPLAY_AWR function is deprecated from 12c onwards in favour of DISPLAY_WORKLOAD_REPOSITORY
- if, as in this case, we’re viewing the contents of a Pluggable Database Repository then we need to specify this in the AWR_LOCATION parameter.
Other than that, we simply need to provide the SQL_ID and PLAN_HASH_VALUE.
For the current plan (in SNAP_ID 32 above) :
select *
from table(
dbms_xplan.display_workload_repository(
sql_id => '9nwrantgwhcta',
plan_hash_value => 1105471336,
awr_location => 'AWR_PDB'));

We can see the old plan ( from SNAP_ID 29) by running :
select *
from table(
dbms_xplan.display_workload_repository(
sql_id => '9nwrantgwhcta',
plan_hash_value => 317341109,
awr_location => 'AWR_PDB'));

As well as the aforementioned article by Kerry Osborne, I also found the following quite useful :
Connor McDonald’s Superhero alter ego provides an overview of how AWR works here
Generating Journal Triggers with help from DBMS_UTILITY
It’s good to see modern working practices being implemented in Government, especially in Downing Street, where they seem to have embraced the idea of hot-desking.
Not to say there isn’t room for improvement. In the recent leadership election, candidates claimed a total of 380 nominations…from an electorate of 357. Given the state of the Economy, you do rather hope they chose the candidate that can count.
In order to distract myself from the ongoing shenanigans in Whitehall, I’m going to address myself to one of the regular chores facing anyone working on a new Data Warehousing application – implementing journal triggers on your mapping tables.
Any variation on the theme of Star Schema is likely to result in numerous mapping tables, the contents of which will affect how the application behaves.
Recording DML changes to data in these tables is therefore important to maintain an audit trail, or even just to aid debugging.
Rather than writing nearly identical triggers for each of these tables, wouldn’t it be better if we could automate the process to some degree ?
As well as the usual data-dictionary lookups, I’m going to keep things interesting by :
- creating a journal table for the trigger to insert into
- ensuring consistent formatting (canonicalization) of “key” character values – by making them all uppercase
- keeping my dynamic SQL statements readable by using a template
- converting comma-separated lists to PL/SQL tables and back again with DBMS_UTILITY
The trigger will need to do the following :
- capture an updated record before the update is applied.
- capture a deleted record
- ensure that “key” values on an inserted or updated record are canonicalized
Note that we won’t be recording newly inserted records as they’ll be available in the base table anyway.
Also, in this context “key” columns are those that are either used in the predicate of a lookup or returned as the result of a lookup rather than necessarily being part of a physical key in the lookup table.
We’ll need a journal table to write to. This table will contain all of the non-virtual columns of the base table plus a column for the DML operation being performed, the user (or os_user) and a timestamp.
Using “<>” to signify placeholders, a trigger implementing this functionality would look something like this :
create or replace trigger <trigger_name>
before insert or update or delete on <base_table>
for each row
declare
v_user varchar2(4000);
v_operation varchar2(25);
begin
v_user := sys_context('userenv', 'os_user');
v_operation := case when inserting then 'INSERT' when updating then 'UPDATE' else 'DELETE' end;
if inserting or updating then
<canonicalize_cols>
end if;
if updating or deleting then
insert into <jnl_table> (
operation, os_user, change_ts,
<col_list>)
values( v_operation, v_user, systimestamp,
<old_list>);
end if;
end;
I know what you’re thinking, that’s going to look messy once we start building the actual statements dynamically to create each trigger. After all, we’re building a DDL statement, which means that EXECUTE IMMEDIATE will not be accepting bind variables. Therefore we’ll need to do lots of concatenation.
Well, maybe not.
Replacing the placeholders with actual values rather than concatenating them in, we should be able to keep our code fairly readable.
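In miniature, the idea is simply a chain of REPLACE calls over a template string. For example ( a simplified sketch of what the package below does ) :

set serverout on
declare
    c_template constant varchar2(200) :=
        'create table <jnl_table> as select * from <base_table> where 1=0';
    v_stmnt varchar2(200);
begin
    -- MY_TABLE / MY_TABLE_JNL are just made-up names for illustration
    v_stmnt := replace(c_template, '<jnl_table>', 'MY_TABLE_JNL');
    v_stmnt := replace(v_stmnt, '<base_table>', 'MY_TABLE');
    dbms_output.put_line(v_stmnt);
end;
/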
Before we get to that though, we really should look at …
Generating the Journal Table

The Journal tables consist of the permanent columns from the base table plus
- OPERATION – to identify the DML operation for which the record was created
- OS_USER – the Operating System user performing the operation
- CHANGE_TS – the timestamp of the record
Now, we could use the data dictionary to painstakingly reconstruct the required columns.
Alternatively, we can simply do this :
create table <jnl_table> as
select
rpad(' ', 6) as operation,
rpad(' ', 4000) as os_user,
systimestamp as change_ts,
t.*
from <base_table> t
where 1=0
/
The rpad for OPERATION and OS_USER will cause those columns to be defined as varchar2 and large enough to store the padded string.
At this point it’s probably a good idea to look at the base table we’ll be using in the examples that follow :
create table cabinet_members(
office varchar2(500),
first_name varchar2(255),
last_name varchar2(255) not null,
email varchar2(500),
start_date date,
end_date date,
constraint cabinet_members_pk primary key (office, start_date))
/
Using this as an example, the journal table we create using this technique would look like this :
create table cabinet_members_jnl as
select rpad(' ', 25) as operation,
rpad(' ', 4000) as os_user,
systimestamp as change_ts,
t.*
from cabinet_members t
where 1=0
/

The list of columns we need to canonicalize will be passed into our journal trigger generation procedure as a comma-delimited list at runtime.
The columns to insert into can be retrieved from the data dictionary.
For CABINET_MEMBERS, the columns to canonicalize are :
office, first_name, last_name
For each element of this list, we’ll want to :
- make sure that it is the name of a column in the base table
- generate code to uppercase the new value that’s being inserted or updated
If you’re looking for a simple way to loop through this comma-delimited list that doesn’t involve too much typing, DBMS_UTILITY.COMMA_TO_TABLE can help. For example :
clear screen
set serverout on
declare
v_col_list varchar2(4000) := 'office, first_name, last_name';
v_col_arr dbms_utility.lname_array;
v_tab_count binary_integer;
begin
dbms_utility.comma_to_table(
list => v_col_list,
tablen => v_tab_count,
tab => v_col_arr);
dbms_output.put_line('Table populated with '||v_tab_count||' elements');
for i in 1..v_tab_count loop
dbms_output.put_line(trim(v_col_arr(i)));
end loop;
end;
/
Run this and you’re rewarded with…
Table populated with 3 elements
office
first_name
last_name

PL/SQL procedure successfully completed.
However, things are not quite as they seem.
Let’s try running that script again, but this time add a second loop using the COUNT attribute of the PL/SQL table variable as well as the tablen parameter output value from the procedure call…
clear screen
set serverout on
declare
v_col_list varchar2(4000) := 'office, first_name, last_name';
v_col_arr dbms_utility.lname_array;
v_tab_count binary_integer;
begin
dbms_utility.comma_to_table(
list => v_col_list,
tablen => v_tab_count,
tab => v_col_arr);
dbms_output.put_line('Table populated with '||v_tab_count||' elements');
for i in 1..v_tab_count loop
dbms_output.put_line(i||' - '||trim(v_col_arr(i)));
end loop;
-- NOTE - we can use v_col_arr.count but this includes an empty element...
dbms_output.put_line(q'[...and here's all ]'||v_col_arr.count||' of them');
for j in 1..v_col_arr.count loop
dbms_output.put_line(j||' - '||v_col_arr(j));
end loop;
end;
/
Table populated with 3 elements
1 - office
2 - first_name
3 - last_name
...and here's all 4 of them
1 - office
2 - first_name
3 - last_name
4 -

PL/SQL procedure successfully completed.
In our case, we’ll want to use the element count provided by the procedure call rather than the table variable attribute.
For the columns in the Journal table that we want to insert into, we could just use a listagg in the query against the dictionary, but that would miss a perfect opportunity to demonstrate DBMS_UTILITY.TABLE_TO_COMMA…
declare
v_tab_cols_arr dbms_utility.lname_array;
v_tablen binary_integer;
v_col_list varchar2(32767);
begin
select column_name
bulk collect into v_tab_cols_arr
from user_tab_columns
where table_name = 'CABINET_MEMBERS'
order by column_id;
dbms_output.put_line(v_tab_cols_arr.count||' columns in table');
dbms_utility.table_to_comma(
tab => v_tab_cols_arr,
tablen => v_tablen,
list => v_col_list);
dbms_output.put_line(v_col_list);
end;
/
6 columns in table
OFFICE,FIRST_NAME,LAST_NAME,EMAIL,START_DATE,END_DATE

PL/SQL procedure successfully completed.

The Journal Trigger Generator
Now we have all of the elements we require to populate our placeholder strings, let’s see what the final package might look like :
create or replace package generate_jnl_trigger
authid current_user
as
procedure build_trigger(
i_base_table in varchar2,
i_jnl_table in varchar2 default null,
i_trigger_name in varchar2 default null,
i_canonicalize_cols in varchar2 default null);
end;
/
create or replace package body generate_jnl_trigger
as
function check_object_name( i_name in varchar2)
return boolean
is
begin
-- Ensure that object name does not contain any suspicious characters
-- nvl guards against LENGTH returning null when every character is stripped out
return nvl(length(regexp_replace(i_name, '[[:alnum:]]|[$_#]')), 0) = 0;
end check_object_name;
procedure build_trigger(
i_base_table in varchar2,
i_jnl_table in varchar2 default null,
i_trigger_name in varchar2 default null,
i_canonicalize_cols in varchar2 default null)
is
v_base_table user_tables.table_name%type;
v_tab_name_max_len pls_integer;
v_jnl_table varchar2(4000);
v_trigger varchar2(4000);
v_jnl_tab_ddl_stmnt clob;
v_stmnt clob;
v_canon_count binary_integer;
v_canon_arr dbms_utility.lname_array;
v_canon_stmnt clob;
v_target_col_arr dbms_utility.lname_array;
v_ins_cols varchar2(32767);
v_old_cols varchar2(32767);
C_TRIGGER_TEMPLATE constant clob :=
q'[
create or replace trigger <trigger_name>
before insert or update or delete on <base_table>
for each row
declare
v_user varchar2(4000);
v_operation varchar2(25);
begin
v_user := sys_context('userenv', 'os_user');
v_operation := case when inserting then 'INSERT' when updating then 'UPDATE' else 'DELETE' end;
if inserting then
<canonicalize_cols>
end if;
if updating or deleting then
insert into <jnl_table> (
operation, os_user, change_ts,
<col_list>)
values( v_operation, v_user, systimestamp,
<old_list>);
end if;
end; ]';
C_JNL_TAB_DDL_TEMPLATE constant clob :=
q'[
create table <jnl_table> as
select
rpad(' ', 6) as operation,
rpad(' ', 4000) as os_user,
systimestamp as change_ts,
t.*
from <base_table> t
where 1=0]';
C_CANON_TEMPLATE constant varchar2(4000) := ':new.<col_name> := upper(:new.<col_name>);';
begin
if i_base_table is null then
raise_application_error(-20000, 'Base Table must be specified.');
end if;
-- Make sure that we've been passed the name of an existing table
v_base_table := dbms_assert.sql_object_name(i_base_table);
--
-- Determine the maximum length of a table_name in this Oracle version
--
select data_length
into v_tab_name_max_len
from all_tab_columns
where table_name = 'USER_TABLES'
and column_name = 'TABLE_NAME';
v_trigger := nvl(i_trigger_name, substr(v_base_table,1,(v_tab_name_max_len -4))||'_TRG');
if not check_object_name(v_trigger) then
raise_application_error(-20010, 'Trigger Name '||v_trigger||' not allowed');
end if;
v_jnl_table := nvl(i_jnl_table, substr(v_base_table,1, (v_tab_name_max_len -4))||'_JNL');
if not check_object_name(v_jnl_table) then
raise_application_error(-20020, 'Journal Table Name '||v_jnl_table||' not allowed');
end if;
-- Create the Journal table
-- Replace the placeholder text in the template with the actual values.
-- Doing this one placeholder at-a-time for readability
v_jnl_tab_ddl_stmnt := replace(C_JNL_TAB_DDL_TEMPLATE, '<jnl_table>', v_jnl_table);
v_jnl_tab_ddl_stmnt := replace(v_jnl_tab_ddl_stmnt, '<base_table>', v_base_table);
execute immediate v_jnl_tab_ddl_stmnt;
v_stmnt := replace(C_TRIGGER_TEMPLATE, '<trigger_name>', v_trigger);
v_stmnt := replace(v_stmnt, '<base_table>', i_base_table);
v_stmnt := replace(v_stmnt, '<jnl_table>', v_jnl_table);
if i_canonicalize_cols is not null then
dbms_utility.comma_to_table( i_canonicalize_cols, v_canon_count, v_canon_arr);
-- Remember to use the count returned from the procedure...
for i in 1..v_canon_count loop
v_canon_stmnt := v_canon_stmnt ||
replace(C_CANON_TEMPLATE, '<col_name>', trim(v_canon_arr(i)))||chr(10);
end loop;
v_stmnt := replace(v_stmnt, '<canonicalize_cols>', v_canon_stmnt);
else
-- Just do nothing
v_stmnt := replace(v_stmnt, '<canonicalize_cols>', 'null;');
end if;
select column_name
bulk collect into v_target_col_arr
from user_tab_columns
where table_name = upper(i_base_table)
order by column_id;
-- We know that these columns belong to the base table because we've just got them from the data dictionary.
-- However, we want to double check there's nothing fishy about them.
-- Check that they contain only valid characters ( which may not be the case if the column names are quoted).
-- I know I don't have any quoted column names in my data model...
for i in 1..v_target_col_arr.count loop
if not check_object_name(v_target_col_arr(i)) then
raise_application_error(-20030, 'Column name contains invalid characters : '||v_target_col_arr(i));
end if;
v_ins_cols := v_ins_cols||v_target_col_arr(i)||', ';
v_old_cols := v_old_cols||':old.'||v_target_col_arr(i)||', ';
end loop;
v_ins_cols := rtrim(v_ins_cols, ', ');
v_old_cols := rtrim(v_old_cols, ', ');
v_stmnt := replace(v_stmnt, '<col_list>', v_ins_cols);
v_stmnt := replace(v_stmnt, '<old_list>', v_old_cols);
execute immediate(v_stmnt);
end build_trigger;
end;
/
If we execute this for CABINET_MEMBERS ( having first dropped the CABINET_MEMBERS_JNL table we created earlier)…
begin
generate_jnl_trigger.build_trigger(
i_base_table => 'cabinet_members',
i_canonicalize_cols => 'office, first_name, last_name');
end;
/
…the Journal table is created, together with a DML trigger on the table :
create or replace trigger cabinet_members_TRG
before insert or update or delete on cabinet_members
for each row
declare
v_user varchar2(4000);
v_operation varchar2(25);
begin
v_user := sys_context('userenv', 'os_user');
v_operation := case when inserting then 'INSERT' when updating then 'UPDATE' else 'DELETE' end;
if inserting then
:new.office := upper(:new.office);
:new.first_name := upper(:new.first_name);
:new.last_name := upper(:new.last_name);
end if;
if updating or deleting then
insert into cabinet_members_JNL (
operation, os_user, change_ts,
OFFICE, FIRST_NAME, LAST_NAME, EMAIL, START_DATE, END_DATE)
values( v_operation, v_user, systimestamp,
:old.OFFICE, :old.FIRST_NAME, :old.LAST_NAME, :old.EMAIL, :old.START_DATE, :old.END_DATE);
end if;
end;
Let’s run some DML on the table to check the effect :
insert into cabinet_members (
office,
first_name,
last_name,
email,
start_date,
end_date)
values(
'Prime Minister',
'Boris',
'Johnson',
'bigdog@partyhq.co.uk',
date '2019-07-24',
date '2022-09-05');
commit;
insert into cabinet_members (
office,
first_name,
last_name,
email,
start_date,
end_date)
values(
'pRImE mINisteR',
'Liz',
'Truss',
'liz@lettuce.org',
date '2022-09-05',
date '2022-10-25');
commit;
insert into cabinet_members (
office,
first_name,
last_name,
email,
start_date )
values(
'prime minister',
'rishi',
'sunak',
'rishi@tolduso.com',
date '2022-10-25');
commit;
We can see that the office, first_name and last_name values have been canonicalized :
select office, first_name, last_name
from cabinet_members
where office = 'PRIME MINISTER'
order by end_date nulls last
/
OFFICE FIRST_NAME LAST_NAME
-------------------- ------------------------------ ------------------------------
PRIME MINISTER BORIS JOHNSON
PRIME MINISTER LIZ TRUSS
PRIME MINISTER RISHI SUNAK
If we now do some tidying up…
delete from cabinet_members
where end_date is not null;
commit;
…we can see that these deletes have been recorded in the Journal table :
select * from cabinet_members_jnl;

Saving the World from Fat-finger moments – with regexp_like
It’s not uncommon for a database application to have its behaviour defined, to an extent at least, by records in reference data tables.
By its nature, this data is static and the tables in which it resides tend to contain comparatively few rows. However, such an approach can be susceptible to erroneous data entry, especially where key values are concerned.
Having spent many an “entertaining” afternoon/evening/dead-of-night, trying to hunt down some mystery bug, only to find that one of these values includes an extraneous space or invisible control character, I’ve come to appreciate the ability of regexp_like to point these things out.
The code examples that follow should be based on some sensible data set, probably from the HR demo schema. However, Dr Fatfinger does sound rather like a Bond villain…
Over the years, MI6 has built up a list of individuals whose activities bear close scrutiny.
They want to know if any of these people start bulk buying piranha food, or looking at hollowed out volcanoes on Zoopla :
create table megalomaniacs (
evil_genius varchar2(100),
start_date date,
constraint megalomaniacs_pk primary key (evil_genius))
/
insert into megalomaniacs( evil_genius, start_date)
values('DOCTOR NO', to_date('01-OCT-1962', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('ERNST STAVRO'||chr(256)||' BLOFELD', to_date('10-OCT-1963', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values(' AURIC GOLDFINGER', to_date('17-SEP-1964', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('EMILIO LARGO', to_date('09-DEC-1965', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('DOCTOR KANANGA', to_date('27-JUN-1973', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('FRANCISCO SCARAMANGA', to_date('19-DEC-1974', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('KARL STROMBERG', to_date('7-JUL-1977', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('HUGO DRAX ', to_date('26-JUN-1979', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('ARISTOTLE KRISTATOS', to_date('24-JUN-1981', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('KAMAL KHAN', to_date('06-JUN-1983', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('MAX ZORIN', to_date('22-MAY-1985', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('GENERAL KOSKOV', to_date('29-JUN-1987', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('FRANZ SANCHEZ', to_date('13-JUN-1989', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('ALEC TREVELYAN', to_date('13-NOV-1995', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('ELLIOT CARVER', to_date('09-DEC-1997', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('ELEKTRA KING', to_date('08-NOV-1999', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('COLONEL TAN-SUN MOON', to_date('20-NOV-2002', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('MR WHITE', to_date('14-NOV-2006', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('DOMINIC GREEN', to_date('20-OCT-2008', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('RAOUL SILVA', to_date('23-OCT-2012', 'DD-MON-YYYY'));
insert into megalomaniacs( evil_genius, start_date)
values('LYUTSIFER SAFIN ', to_date('28-SEP-2021', 'DD-MON-YYYY'));
commit;
However, some of these people are slipping through the net…
select evil_genius, to_char(start_date, 'DD-MON-YYYY') as start_date
from megalomaniacs
where evil_genius in ( 'MR WHITE', 'LYUTSIFER SAFIN', 'AURIC GOLDFINGER', 'ERNST STAVRO BLOFELD');
EVIL_GENIUS START_DATE
------------------------------ --------------------
MR WHITE 14-NOV-2006
1 row selected.
We suspect the handiwork of Dr Fatfinger, possibly through the activities of those notorious henchpeople, Copy and Paste.
Fortunately, we can use a regexp to identify any records that contain :
- a leading or trailing whitespace character
- a control character anywhere in the string
select evil_genius, length( evil_genius),
to_char(start_date, 'DD-MON-YYYY') as start_date
from megalomaniacs
where regexp_like(evil_genius, '^[[:space:]]|[[:cntrl:]]|[[:space:]]$', 'i');

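Incidentally, once Dr Fatfinger’s handiwork has been identified, a clean-up along these lines should sort things out. This is just a sketch ( EVIL_GENIUS is the Primary Key here, so it’s worth checking for potential duplicates before running it in anger) :
-- strip control characters, then trim leading/trailing spaces, for the offending rows only
update megalomaniacs
set evil_genius = trim(regexp_replace(evil_genius, '[[:cntrl:]]'))
where regexp_like(evil_genius, '^[[:space:]]|[[:cntrl:]]|[[:space:]]$');

commit;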
Linux – Checking that two files contain the same data but in a different order – Sorted!
Trying to regression test a change to a feed-file generation program can be tricky. Whether the format is CSV or some fashionable markup language, the ordering of the result set tends to be unimportant in such circumstances.
When testing, we need to verify that the files produced by the new version of the program contain the same data as those produced by the old version, irrespective of the order in which the records are written.
Recently, I was rescued from my struggle with just such a problem by my colleague, Don Thomson, who imparted some (Linux) Jedi wisdom resulting in a simple yet effective solution, involving an inventive combination of Linux utilities.
What we’re going to look at here is :
- comparing files with diff
- using sort to give diff a hand
- comparing sets of files in different directories using sum
- ignoring trailer records in delimited files using grep or head
The first file we’ll use in our example is called episodes.txt and contains the following :
one
two
three
four
five
six
seven
eight
nine
The file we’re comparing it to is called releases.txt :
four
five
six
one
two
three
seven
eight
nine
As you can see, the files contain the same data but the first is in numerical order and the second is the result of what may be considered a “Skywalker” sort.
Predictably, diff decides that they are not identical :
diff episodes.txt releases.txt
1,3d0
< one
< two
< three
6a4,6
> one
> two
> three
Before we go any further, let’s use some switches to minimize the diff output as, for the purposes of this exercise, we just want to tell whether or not the files are the same.
diff -qs episodes.txt releases.txt
Files episodes.txt and releases.txt differ
Fortunately, Don knows just the thing to help us perform a data – rather than line-by-line – comparison…
Sorting it out
Let’s see what happens when we use the sort command on episodes.txt :
sort episodes.txt
eight
five
four
nine
one
seven
six
three
two
Interesting. It seems to have sorted the file contents ( “data”) into alphabetical order. Let’s see what it does with releases.txt :
sort releases.txt
eight
five
four
nine
one
seven
six
three
two
That’s useful. The output is identical. Now we just need to pass the sort output for each file to diff.
We can do this using sub-shells. As we’re running in Bash, the syntax for this is :
diff -qs <(sort episodes.txt) <(sort releases.txt)
Files /dev/fd/63 and /dev/fd/62 are identical
Note that the filenames in the output refer to the file descriptors ( under /dev/fd) through which Bash feeds each sub-shell’s output (stdout) to diff.
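If you’re curious about where those names come from, you can get Bash to show you exactly what it substitutes ( the file descriptor numbers may vary ) :
echo <(sort episodes.txt)
/dev/fd/63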
Just to prove that this solution does detect when the rows of data are different in the files, let’s introduce a “rogue” one…
echo 'three-and-a-bit' >>episodes.txt
diff -qs <(sort episodes.txt) <(sort releases.txt)
Files /dev/fd/63 and /dev/fd/62 differ
Comparing two sets of files in different directories with a checksum
Whilst this approach works really well for comparing two files, you may find that it’s not that quick when you’re comparing a large number of large files. For example, we have a directory containing files that we generated before making any changes to our (fictitious) program :
mkdir baseline_text_files
ls -l baseline_text_files/*.txt
-rw-rw-r-- 1 mike mike 14 Sep 10 13:36 baseline_text_files/originals.txt
-rw-rw-r-- 1 mike mike 14 Sep 10 13:36 baseline_text_files/prequels.txt
-rw-rw-r-- 1 mike mike 17 Sep 10 13:36 baseline_text_files/sequels.txt
The file contents are :
cat originals.txt
four
five
six
cat prequels.txt
one
two
three
cat sequels.txt
seven
eight
nine
Files generated after modifying the program are in the new_text_files directory :
cat originals.txt
four
six
five
cat prequels.txt
three
two
one
cat sequels.txt
eight
nine
seven
Don’s rather neat alternative to diffing each pair of files is to create a checksum for each file and write it to a temporary file, one per directory. We then just diff the two temporary files.
There are a number of utilities you can use to do this and the complexity of the checksum algorithm used may impact the runtime for a large number of files.
In light of this, we’ll be using sum, which seems to be the simplest and therefore (in theory) the fastest.
A quick test first :
sum baseline_text_files/originals.txt
45749 1
The first number is the checksum. The second is the file block count.
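If you’d prefer a more robust checksum ( at the cost of a little more work per file), md5sum is a drop-in alternative :
md5sum baseline_text_files/originals.txt
The output is a 32-character hash followed by the filename, so the script that follows would work just the same if you swapped sum for md5sum.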
Now we’ve identified the required utilities, this script should do the job. I’ve called it data_diff.sh and you can download it from Github should you feel inclined to do so.
#!/bin/sh
# Difference between files in two directories
orig_dir=$1
new_dir=$2
TMPFILE1=$(mktemp)
TMPFILE2=$(mktemp)
# delete the temporary files on exit, even if we hit an error along the way
trap 'rm -f $TMPFILE1 $TMPFILE2' exit
for file in $orig_dir/*
do
sort $file |sum >> $TMPFILE1
done
for file in $new_dir/*
do
sort $file|sum >>$TMPFILE2
done
diff -qs $TMPFILE1 $TMPFILE2
is_same=$?
if [ $is_same -eq 1 ]
then
echo 'Files do not match'
else
echo 'Files are identical'
fi
Run this and we get :
sh data_diff.sh baseline_text_files new_text_files
Files /tmp/tmp.OshLmwGL0J and /tmp/tmp.Uz2mUa0SSY are identical
Files are identical
If we introduce a difference in one of the existing files…
echo 'SOLO' >>new_text_files/sequels.txt
sh data_diff.sh baseline_text_files new_text_files
Files /tmp/tmp.4OGnjQls0S and /tmp/tmp.L7OUZyGUzl differ
Files do not match
Unsurprisingly, the script will also detect a difference if we’re missing a file…
touch baseline_text_files/tv_series.txt
sh data_diff.sh baseline_text_files new_text_files
Files /tmp/tmp.LsCHbhxK1D and /tmp/tmp.358UInXSJX differ
Files do not match
Ignoring Trailer Records in delimited files
With text delimited files, it’s common practice to include a trailer record at the end of the file to confirm it is complete.
This record will typically include the date (or timestamp) of when the file was created.
For example, such a file might look like this in the baseline_files directory :
cat baseline_files/episodes.csv
HEADER|episode|title|release_year
I|The Phantom Menace|1999
II|Attack of the Clones|2002
III|Revenge of the Sith|2005
IV|A New Hope|1977
V|The Empire Strikes Back|1980
VI|Return of the Jedi|1983
VII|The Force Awakens|2015
VIII|The Last Jedi|2017
IX|The Rise of Skywalker|2019
TRAILER|9|20220903
The trailer in the corresponding file in the new directory includes a different date :
cat new_files/episodes.csv
HEADER|episode|title|release_year
I|The Phantom Menace|1999
II|Attack of the Clones|2002
III|Revenge of the Sith|2005
IV|A New Hope|1977
V|The Empire Strikes Back|1980
VI|Return of the Jedi|1983
VII|The Force Awakens|2015
VIII|The Last Jedi|2017
IX|The Rise of Skywalker|2019
TRAILER|9|20220904
To accurately compare the data in these files, we’ll need to ignore the trailer record.
Once again, there are numerous ways to do this. We could use :
grep -iv trailer baseline_files/episodes.csv
HEADER|episode|title|release_year
I|The Phantom Menace|1999
II|Attack of the Clones|2002
III|Revenge of the Sith|2005
IV|A New Hope|1977
V|The Empire Strikes Back|1980
VI|Return of the Jedi|1983
VII|The Force Awakens|2015
VIII|The Last Jedi|2017
IX|The Rise of Skywalker|2019
…which would result in our diff looking like this :
diff -qs <(sort baseline_files/episodes.csv|grep -iv trailer) <(sort new_files/episodes.csv| grep -iv trailer)
Alternatively, if we know that the trailer record is always the last line of the file we can use head to output everything apart from the last line :
head -n -1 baseline_files/episodes.csv
HEADER|episode|title|release_year
I|The Phantom Menace|1999
II|Attack of the Clones|2002
III|Revenge of the Sith|2005
IV|A New Hope|1977
V|The Empire Strikes Back|1980
VI|Return of the Jedi|1983
VII|The Force Awakens|2015
VIII|The Last Jedi|2017
IX|The Rise of Skywalker|2019
…which would entail a diff command like this :
diff -qs <(head -n -1 baseline_files/episodes.csv| sort) <(head -n -1 new_files/episodes.csv| sort)
This being Linux, there are probably several more options, but these should cover at least some of the more common circumstances where files need to be compared by data rather than line-by-line.