SQLite PK implementation¶
SQLite table's primary key explained
SQLite documentation maze¶
Being a long term fan of SQLite database I have always been a little bit disoriented by its documentation structure. For me it seems enough cluttered and not easy to follow.
Apart of documentation structure there is another thing which is out of line to what we expect from Primary Key constraint definition in other RDBMS.
If you ever were wondering what is the difference between INT PRIMARY KEY
and INTEGER PRIMARY KEY
and why AUTOINCREMENT
is not needed for default primary key generation in SQLite then this post might be interested for you.
It makes an attempt to describe these things in hopefully more easier format.
Note: I don't say official documentation is bad. It describes all these things. I just want to filter out Primary Key related info and provide it in more consistent manner accompained by some examples.
Current ROWID implementation¶
SQLite stores tables in B-tree (balanced tree) structure usually used by other RDBMS for indexes storage. Physically it stores data in ordered by some key format. This gives very fast data retrieval but slower data writes because new value has to be written into specific place of this ordered structure.
As result, every row has an implicit key value which is available under the special system column names: rowid
, oid
, _rowid_
.
sqlite> .headers on
sqlite> .mode column
sqlite>
sqlite> create table t as select 10 id;
sqlite> select rowid, oid, _rowid_, id from t;
rowid rowid rowid id
---------- ---------- ---------- ----------
1 1 1 10
Current ROWID
datatype is a signed 64-bit integer. Its value corresponds to the row key in the underlying B-tree storage engine.
Such mechanism wasn't always like that. In old era for PK
values there were used plain INTEGER
values.
To preserve backward compatibility with old databases there was WITHOUT ROWID
option added to the CREATE TABLE
statement.
WITHOUT ROWID¶
From the official docs:
when SQLite was first designed, it used only integer rowids for row keys to simplify the implementation. This approach worked well for many years. But as the demands on SQLite grew, the need for tables in which the PRIMARY KEY really did correspond to the underlying row key grew more acute. The WITHOUT ROWID concept was added in order to meet that need without breaking backwards compatibility with the billions of SQLite databases already in use at the time (circa 2013).
You generally shouldn't use WITHOUT ROWID
table for any new project.
This option is introduced to support backward compatibility with old databases.
But just in case. If you do so, you must explicitly define PK
constraint.
sqlite> create table t(id int) without rowid;
Error: PRIMARY KEY missing on table t
sqlite> create table t(id int primary key) without rowid;
sqlite> select count(*) from t;
count(*)
----------
0
Also as expected from the name, WITHOUT ROWID
table doesn't have rowid
column defined.
This means that there is no unique row identifier available by default.
But SQLite still needs some way to unique identify the row.
Because, as already explained above, behind the scene data are stored in the ordered B-tree structure.
And that's the reason why we must explicitly declare PK
constraint for the WITHOUT ROWID
table.
Primary Key¶
Finally we are ready to discuss what is the primary key in SQLite.
Non-suprisingly, it's that implicitly defined rowid
column value which is auto-assigned to any row in a table.
Unless table is explicitly defined as WITHOUT ROWID.
This is an actual primary key of a row in a table.
But in traditional RDBMS we can explicitly define some of the table column(s) as a primary key.
And we can create foreign keys in other tables which refer to that primary key.
How does it work in SQLite? Should we refer to that rowid
, oid
or _rowid
columns instead?
Here where INTEGER PRIMARY KEY
comes in.
INTEGER PRIMARY KEY¶
SQLite provides possibility to link some column name in the table to the system rowid
column via INTEGER PRIMARY KEY
clause.
This command makes correspondng column to be a user-defined alias for rowid
column.
sqlite> create table t(id integer primary key);
sqlite> insert into t values (1);
sqlite> select rowid, id from t;
id id
---------- ----------
1 1
As you can see, first selected rowid
column in SELECT
is named in output as id
and not as rowid
column.
That's a sign that it's paired with an id
column of a table.
Having link established, now you define which value will be stored in rowid
column.
Next example shows that rowid
will have user-defined value provided for the paired id
column.
sqlite> insert into t values (10);
sqlite> select rowid, id from t;
id id
---------- ----------
1 1
10 10
But what is even more interesting is that if you will provide explicit NULL
value for the id
column then SQLite will take control back and assign next available value by itself.
sqlite> .schema t
CREATE TABLE t(id integer primary key);
sqlite> insert into t values (null);
sqlite> select rowid, id from t;
id id
---------- ----------
1 1
10 10
11 11
This last example demonstrates something that is not possible in other RDBMS.
We explicitly insert NULL
into the primary key column but command is not errored.
Moreover, it assigns next available value 11
for us!
Same works even if you will explicitly create table with NOT NULL
constraint for the primary key column.
sqlite> create table t(id integer primary key not null);
sqlite> .schema t
CREATE TABLE t(id integer primary key not null);
sqlite> insert into t values (null);
sqlite> select rowid, id from t;
id id
---------- ----------
1 1
Important: One requirement still exists to make this magic happen.
You must explicitly declare corresponding column as INTEGER PRIMARY KEY
.
Letter case doesn't matter but words must be specified as a whole.
If you specify INT PRIMARY KEY
that doesn't work.
sqlite> create table t(id int primary key not null);
sqlite> insert into t values (10);
sqlite> select rowid, id from t;
rowid id
---------- ----------
1 10
As you can see, rowid
is already aliased as rowid
and not an id
column.
They also have different values. This is a clear sign that id
column is no longer an alias for the rowid
.
And you cannot longer insert NULL
into non nullable primary key column.
Yet another subtle feature of SQLite is that user-defined primary key could contain NULL
values.
This is an officially confirmed bug which was again kept from fix to provide backward compatibility.
sqlite> create table t(id int primary key);
sqlite> insert into t values (10);
sqlite> select rowid, id from t;
rowid id
---------- ----------
1 10
sqlite> insert into t values (null);
sqlite> select rowid, id from t;
rowid id
---------- ----------
1 10
2
AUTOINCREMENT¶
In many RDBMS auto increment feature is a common approach for primary key values generation.
MySQL and Postgres have such AUTO_INCREMENT
and SERIAL
properties for a very long time.
Oracle had introduced IDENTITY
columns starting from 12th version.
SQLite also has such AUTOINCREMENT
feature but in official documentation we have...
The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and disk I/O overhead and should be avoided if not strictly needed. It is usually not needed.
As simply as that.
The best way to generate primary key values in SQLite is just define some column as INTEGER PRIMARY KEY
.
This will make it an alias for a rowid
column and it will be always auto-populated with some value.
We have already explained behaviour of this column in details in a previous chapter.
Nevertheless, there is still some use case for an AUTOINCREMENT
in SQLite.
It changes default rowid
assignment algorithm to prevent the reuse of rowid
values that might be already used for a given table.
From the official docs:
If the AUTOINCREMENT keyword appears after INTEGER PRIMARY KEY, that changes the automatic ROWID assignment algorithm to prevent the reuse of ROWIDs over the lifetime of the database. In other words, the purpose of AUTOINCREMENT is to prevent the reuse of ROWIDs from previously deleted rows.
Let's see that in example. Largest possible rowid
value in SQLite is 9223372036854775807.
If we will insert such value in a table then next incremented value for a rowid
would overflow possible values range.
In such case SQLite will search for a first non used value and pick it up.
sqlite> create table t(id integer primary key, id2 int);
sqlite> insert into t(id, id2) values (9223372036854775807, 1);
sqlite> select rowid, id2 from t;
id id2
------------------- ----------
9223372036854775807 1
sqlite> insert into t(id2) values (2);
sqlite> select rowid, id2 from t order by id2;
id id2
------------------- ----------
9223372036854775807 1
3223839605541625959 2
rowid
column aliased by id
column during inserting row with id2 = 1
.
Next, we insert row with id2 = 2
and don't provide rowid
value explicitly relying on automatic mechanism.
SQLite searched for the first non-used value within available range and picked it up.
Now let's see what happen if we will add AUTOINCREMENT
to the column definition.
sqlite> create table t(id integer primary key autoincrement, id2 int);
sqlite> insert into t(id, id2) values (9223372036854775807, 1);
sqlite> select rowid, id2 from t;
id id2
------------------- ----------
9223372036854775807 1
sqlite> insert into t(id2) values (2);
Error: database or disk is full
rowid
values for a primary key column SQLite always increases value for a rowid
.
And if last used value was the maximum possible we get database or disk is full error.
Summary¶
The best way to define primary key on a table in SQLite is to define column as having INTEGER PRIMARY KEY
datatype.
In such way correspodning column become an alias for a system rowid
column which is assigned value by SQLite itself.
Because it's handled internally by SQLite it also guarantees the most performant way to assign the value.
Adding AUTOINCREMENT
for a primary key column is not needed.
Moreover, it is even not desirable because it imposes extra CPU cycles as explicitly stated in documentation.
Use it only when you have a requirement to guarantee unique primary key values during the whole lifetime of a particular table.