SQLite PK implementation
SQLite table's primary key explained
- SQLite documentation maze
- Current ROWID implementation
- Primary Key
SQLite documentation maze
As a long-time fan of the SQLite database, I have always been a little disoriented by the structure of its documentation. To me it seems rather cluttered and not easy to follow.
Apart from the documentation structure, there is another thing that is out of line with what we expect from a primary key constraint in other RDBMS.
If you have ever wondered what the difference is between INT PRIMARY KEY and INTEGER PRIMARY KEY, and why AUTOINCREMENT is not needed for default primary key generation in SQLite, then this post might be interesting for you.
It attempts to describe these things in a hopefully easier format.
Note: I am not saying the official documentation is bad. It describes all of these things. I just want to filter out the primary key related information and present it in a more consistent manner, accompanied by some examples.
Current ROWID implementation
SQLite stores tables in a B-tree (balanced tree) structure, the kind that other RDBMS usually use for index storage. Physically, the rows are kept ordered by a key. This gives very fast data retrieval but slower writes, because each new value has to be written into a specific place in this ordered structure.
As a result, every row has an implicit key value, which is available under the special system column names rowid, oid and _rowid_:
    sqlite> .headers on
    sqlite> .mode column
    sqlite>
    sqlite> create table t as select 10 id;
    sqlite> select rowid, oid, _rowid_, id from t;
    rowid       rowid       rowid       id
    ----------  ----------  ----------  ----------
    1           1           1           10
ROWID datatype is a signed 64-bit integer. Its value corresponds to the row key in the underlying B-tree storage engine.
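The same check can be run from a program. The following is a minimal sketch using Python's standard sqlite3 module and an in-memory database; the table name t mirrors the shell session above, and the variable names are only illustrative.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("create table t as select 10 id")

# rowid, oid and _rowid_ are three names for the same implicit row key.
row = conn.execute("select rowid, oid, _rowid_, id from t").fetchone()
print(row)  # (1, 1, 1, 10)

# The implicit key can be used to look a row up directly.
by_key = conn.execute("select id from t where rowid = ?", (row[0],)).fetchone()
print(by_key)  # (10,)
```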
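It was not always possible to choose a different row key. Originally, SQLite used only integer rowids as row keys. The WITHOUT ROWID option was later added to the CREATE TABLE statement, so that a table's declared PRIMARY KEY can serve as the underlying row key, while ordinary rowid tables stay the default to preserve backward compatibility with old databases.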
From the official docs:
when SQLite was first designed, it used only integer rowids for row keys to simplify the implementation. This approach worked well for many years. But as the demands on SQLite grew, the need for tables in which the PRIMARY KEY really did correspond to the underlying row key grew more acute. The WITHOUT ROWID concept was added in order to meet that need without breaking backwards compatibility with the billions of SQLite databases already in use at the time (circa 2013).
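You generally don't need a WITHOUT ROWID table for a typical new project.
Ordinary rowid tables remain the default, and the official docs describe WITHOUT ROWID as an optimization that is mainly helpful for tables with non-integer or composite primary keys.
But just in case: if you do use it, you must explicitly define a PRIMARY KEY.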
    sqlite> create table t(id int) without rowid;
    Error: PRIMARY KEY missing on table t
    sqlite> create table t(id int primary key) without rowid;
    sqlite> select count(*) from t;
    count(*)
    ----------
    0
Also, as expected from the name, a WITHOUT ROWID table doesn't have a rowid column defined.
    sqlite> select rowid, id from t;
    Error: no such column: rowid
This means that there is no implicit unique row identifier available by default.
But SQLite still needs some way to uniquely identify a row because, as explained above, the data is stored behind the scenes in an ordered B-tree structure.
And that's the reason why we must explicitly declare a PK constraint for a WITHOUT ROWID table.
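Both errors from the sessions above can be reproduced programmatically. This is a small sketch with Python's standard sqlite3 module; it assumes the bundled SQLite library is recent enough to support WITHOUT ROWID, and the variable names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A WITHOUT ROWID table with no PRIMARY KEY is rejected outright.
try:
    conn.execute("create table t(id int) without rowid")
    missing_pk_error = None
except sqlite3.OperationalError as exc:
    missing_pk_error = str(exc)
print(missing_pk_error)

# With a PRIMARY KEY the table is created, but it exposes no rowid column.
conn.execute("create table t(id int primary key) without rowid")
try:
    conn.execute("select rowid, id from t")
    no_rowid_error = None
except sqlite3.OperationalError as exc:
    no_rowid_error = str(exc)
print(no_rowid_error)
```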
Primary Key
Finally, we are ready to discuss what the primary key is in SQLite.
Unsurprisingly, it's that implicitly defined rowid column value, which is auto-assigned to every row in a table, unless the table is explicitly defined as WITHOUT ROWID.
This is the actual primary key of a row in a table.
But in a traditional RDBMS we can explicitly define one or more table columns as the primary key, and we can create foreign keys in other tables that refer to that primary key.
How does it work in SQLite? Should we refer to the rowid column instead?
This is where INTEGER PRIMARY KEY comes in.
INTEGER PRIMARY KEY
SQLite makes it possible to link a column in the table to the system rowid column via the INTEGER PRIMARY KEY clause.
This clause makes the corresponding column a user-defined alias for rowid.
    sqlite> create table t(id integer primary key);
    sqlite> insert into t values (1);
    sqlite> select rowid, id from t;
    id          id
    ----------  ----------
    1           1
As you can see, the first selected column, rowid, is named id in the output and not rowid.
That's a sign that it's paired with the id column of the table.
With the link established, you now define which value is stored in rowid.
The next example shows that rowid takes the user-defined value provided for the paired id column.
    sqlite> insert into t values (10);
    sqlite> select rowid, id from t;
    id          id
    ----------  ----------
    1           1
    10          10
But what is even more interesting is that if you provide an explicit NULL value for the id column, SQLite takes control back and assigns the next value by itself.
    sqlite> .schema t
    CREATE TABLE t(id integer primary key);
    sqlite> insert into t values (null);
    sqlite> select rowid, id from t;
    id          id
    ----------  ----------
    1           1
    10          10
    11          11
This last example demonstrates something that is unusual compared with most other RDBMS.
We explicitly insert NULL into the primary key column, but the command does not fail.
Moreover, SQLite assigns the next value, 11, for us: one more than the largest rowid currently in the table.
The same works even if you explicitly create the table with a NOT NULL constraint on the primary key column.
    sqlite> create table t(id integer primary key not null);
    sqlite> .schema t
    CREATE TABLE t(id integer primary key not null);
    sqlite> insert into t values (null);
    sqlite> select rowid, id from t;
    id          id
    ----------  ----------
    1           1
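The whole sequence (explicit key, explicit key with a gap, then NULL) can be condensed into one short script. This is a sketch using Python's standard sqlite3 module; it deliberately keeps the NOT NULL constraint to show that it does not get in the way, and the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t(id integer primary key not null)")

conn.execute("insert into t values (1)")    # explicit key
conn.execute("insert into t values (10)")   # explicit key, leaves a gap
cur = conn.execute("insert into t values (null)")  # SQLite picks the key

# lastrowid reports the key SQLite assigned to the NULL insert.
assigned = cur.lastrowid
print(assigned)  # 11

# rowid and id always carry the same value because id is an alias for rowid.
rows = conn.execute("select rowid, id from t order by rowid").fetchall()
print(rows)  # [(1, 1), (10, 10), (11, 11)]
```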
Important: one requirement must still be met to make this magic happen.
You must declare the corresponding column exactly as INTEGER PRIMARY KEY.
Letter case doesn't matter, but the type name must be spelled out in full.
If you specify INT PRIMARY KEY, it doesn't work.
    sqlite> create table t(id int primary key not null);
    sqlite> insert into t values (10);
    sqlite> select rowid, id from t;
    rowid       id
    ----------  ----------
    1           10
As you can see, the rowid column is now shown as rowid and not as id.
The two columns also have different values. This is a clear sign that the id column is no longer an alias for rowid.
And you can no longer insert NULL into the non-nullable primary key column.
    sqlite> insert into t values (null);
    Error: NOT NULL constraint failed: t.id
Yet another subtle feature of SQLite is that a user-defined primary key without NOT NULL can contain NULL.
This is an officially acknowledged bug, which has again been left unfixed to preserve backward compatibility.
    sqlite> create table t(id int primary key);
    sqlite> insert into t values (10);
    sqlite> select rowid, id from t;
    rowid       id
    ----------  ----------
    1           10
    sqlite> insert into t values (null);
    sqlite> select rowid, id from t;
    rowid       id
    ----------  ----------
    1           10
    2
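The two INT PRIMARY KEY cases can be compared side by side. The sketch below uses Python's standard sqlite3 module; the table names strict_pk and loose_pk are hypothetical and exist only to keep both variants in one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# INT PRIMARY KEY is an ordinary column, not a rowid alias:
# rowid and id hold different values.
conn.execute("create table strict_pk(id int primary key not null)")
conn.execute("insert into strict_pk values (10)")
strict_rows = conn.execute("select rowid, id from strict_pk").fetchall()
print(strict_rows)  # [(1, 10)]

# With NOT NULL in place, a NULL key is rejected.
try:
    conn.execute("insert into strict_pk values (null)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
print(null_rejected)  # True

# Without NOT NULL, the same insert is accepted and stores a NULL key.
conn.execute("create table loose_pk(id int primary key)")
conn.execute("insert into loose_pk values (10)")
conn.execute("insert into loose_pk values (null)")
loose_rows = conn.execute(
    "select rowid, id from loose_pk order by rowid"
).fetchall()
print(loose_rows)  # [(1, 10), (2, None)]
```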
In many RDBMS, an auto-increment feature is the common approach to generating primary key values.
MySQL and Postgres have had such SERIAL properties for a very long time.
Oracle introduced IDENTITY columns in version 12.
SQLite also has an AUTOINCREMENT feature, but the official documentation says:
The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and disk I/O overhead and should be avoided if not strictly needed. It is usually not needed.
As simple as that.
The best way to generate primary key values in SQLite is simply to define a column as INTEGER PRIMARY KEY.
This makes it an alias for the rowid column, and it is always auto-populated with a value when you don't supply one.
We have already explained the behaviour of this column in detail in the previous section.
Nevertheless, there is still a use case for AUTOINCREMENT in SQLite.
It changes the default rowid assignment algorithm to prevent the reuse of rowid values that have already been used in a given table.
From the official docs:
If the AUTOINCREMENT keyword appears after INTEGER PRIMARY KEY, that changes the automatic ROWID assignment algorithm to prevent the reuse of ROWIDs over the lifetime of the database. In other words, the purpose of AUTOINCREMENT is to prevent the reuse of ROWIDs from previously deleted rows.
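The reuse of a deleted row's key is easy to observe on a tiny table. Here is a minimal sketch with Python's standard sqlite3 module; the helper next_id_after_deleting_max is a hypothetical name introduced for this illustration. It inserts three rows, deletes the one with the largest key, and reports which key the next insert receives.

```python
import sqlite3


def next_id_after_deleting_max(column_def):
    """Return the key given to a new row after the row with the largest key was deleted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"create table t(id {column_def}, id2 int)")
    # Three rows get the keys 1, 2 and 3.
    conn.executemany("insert into t(id2) values (?)", [(1,), (2,), (3,)])
    conn.execute("delete from t where id = 3")
    cur = conn.execute("insert into t(id2) values (4)")
    return cur.lastrowid


plain = next_id_after_deleting_max("integer primary key")
auto = next_id_after_deleting_max("integer primary key autoincrement")
print(plain, auto)  # 3 4
```

Without AUTOINCREMENT the deleted key 3 is handed out again; with it, SQLite moves on to 4.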
Let's see that in an example. The largest possible rowid value in SQLite is 9223372036854775807.
If we insert such a value into a table, the next incremented rowid would overflow the range of possible values.
In that case SQLite tries randomly chosen rowid values until it finds one that is not in use.
    sqlite> create table t(id integer primary key, id2 int);
    sqlite> insert into t(id, id2) values (9223372036854775807, 1);
    sqlite> select rowid, id2 from t;
    id                   id2
    -------------------  ----------
    9223372036854775807  1
    sqlite> insert into t(id2) values (2);
    sqlite> select rowid, id2 from t order by id2;
    id                   id2
    -------------------  ----------
    9223372036854775807  1
    3223839605541625959  2
We inserted the biggest possible value, 9223372036854775807, into the rowid column (aliased by the id column) when inserting the row with id2 = 1.
Next, we inserted the row with id2 = 2 without providing a rowid value explicitly, relying on the automatic mechanism.
SQLite picked an unused value at random from the available range.
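The same experiment as a script, sketched with Python's standard sqlite3 module. The constant MAX_ROWID is a name introduced here for readability. The chosen key differs from run to run, so the script only checks that it falls inside the valid range.

```python
import sqlite3

MAX_ROWID = 2**63 - 1  # 9223372036854775807, the largest signed 64-bit integer

conn = sqlite3.connect(":memory:")
conn.execute("create table t(id integer primary key, id2 int)")

# Occupy the largest possible key explicitly.
conn.execute("insert into t(id, id2) values (?, 1)", (MAX_ROWID,))

# Let SQLite choose the key: max + 1 is impossible, so it falls back
# to some other unused value.
cur = conn.execute("insert into t(id2) values (2)")
picked = cur.lastrowid
print(picked)

in_range = 1 <= picked < MAX_ROWID
print(in_range)  # True
```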
Now let's see what happens if we add AUTOINCREMENT to the column definition.
    sqlite> create table t(id integer primary key autoincrement, id2 int);
    sqlite> insert into t(id, id2) values (9223372036854775807, 1);
    sqlite> select rowid, id2 from t;
    id                   id2
    -------------------  ----------
    9223372036854775807  1
    sqlite> insert into t(id2) values (2);
    Error: database or disk is full
We got an error, and this behaviour is completely in line with the quoted excerpt from the documentation.
To guarantee that rowid values are never reused for the primary key column, SQLite always assigns a value larger than any rowid used before.
And if the last used value was the maximum possible, we get the database or disk is full error.
The best way to define a primary key on a table in SQLite is to declare the column as INTEGER PRIMARY KEY.
That way the corresponding column becomes an alias for the system rowid column, whose value is assigned by SQLite itself.
Because this is handled internally by SQLite, it is also the most performant way to assign the value.
AUTOINCREMENT for a primary key column is not needed.
Moreover, it is not even desirable, because it imposes extra CPU, memory, disk space, and disk I/O overhead, as explicitly stated in the documentation.
Use it only when you must guarantee that primary key values are never reused during the whole lifetime of a particular table.