SQLite PK implementation¶

SQLite table's primary key explained

SQLite documentation maze¶

Being a long term fan of SQLite database I have always been a little bit disoriented by its documentation structure. For me it seems enough cluttered and not easy to follow.

Apart of documentation structure there is another thing which is out of line to what we expect from Primary Key constraint definition in other RDBMS. If you ever were wondering what is the difference between INT PRIMARY KEY and INTEGER PRIMARY KEY and why AUTOINCREMENT is not needed for default primary key generation in SQLite then this post might be interested for you. It makes an attempt to describe these things in hopefully more easier format.

Note: I don't say official documentation is bad. It describes all these things. I just want to filter out Primary Key related info and provide it in more consistent manner accompained by some examples.

Current ROWID implementation¶

SQLite stores tables in B-tree (balanced tree) structure usually used by other RDBMS for indexes storage. Physically it stores data in ordered by some key format. This gives very fast data retrieval but slower data writes because new value has to be written into specific place of this ordered structure.

As result, every row has an implicit key value which is available under the special system column names: rowid, oid, _rowid_.

sqlite> .headers on
sqlite> .mode column
sqlite>
sqlite> create table t as select 10 id;
sqlite> select rowid, oid, _rowid_, id from t;

rowid       rowid       rowid       id
----------  ----------  ----------  ----------
1           1           1           10

Current ROWID datatype is a signed 64-bit integer. Its value corresponds to the row key in the underlying B-tree storage engine. Such mechanism wasn't always like that. In old era for PK values there were used plain INTEGER values. To preserve backward compatibility with old databases there was WITHOUT ROWID option added to the CREATE TABLE statement.

WITHOUT ROWID¶

From the official docs:

when SQLite was first designed, it used only integer rowids for row keys to simplify the implementation. This approach worked well for many years. But as the demands on SQLite grew, the need for tables in which the PRIMARY KEY really did correspond to the underlying row key grew more acute. The WITHOUT ROWID concept was added in order to meet that need without breaking backwards compatibility with the billions of SQLite databases already in use at the time (circa 2013).

You generally shouldn't use WITHOUT ROWID table for any new project. This option is introduced to support backward compatibility with old databases. But just in case. If you do so, you must explicitly define PK constraint.

sqlite> create table t(id int) without rowid;
Error: PRIMARY KEY missing on table t

sqlite> create table t(id int primary key) without rowid;
sqlite> select count(*) from t;
count(*)
----------
0

Also as expected from the name, WITHOUT ROWID table doesn't have rowid column defined.

sqlite> select rowid, id from t;
Error: no such column: rowid

This means that there is no unique row identifier available by default. But SQLite still needs some way to unique identify the row. Because, as already explained above, behind the scene data are stored in the ordered B-tree structure. And that's the reason why we must explicitly declare PK constraint for the WITHOUT ROWID table.

Primary Key¶

Finally we are ready to discuss what is the primary key in SQLite. Non-suprisingly, it's that implicitly defined rowid column value which is auto-assigned to any row in a table. Unless table is explicitly defined as WITHOUT ROWID.

This is an actual primary key of a row in a table.

But in traditional RDBMS we can explicitly define some of the table column(s) as a primary key. And we can create foreign keys in other tables which refer to that primary key. How does it work in SQLite? Should we refer to that rowid, oid or _rowid columns instead? Here where INTEGER PRIMARY KEY comes in.

INTEGER PRIMARY KEY¶

SQLite provides possibility to link some column name in the table to the system rowid column via INTEGER PRIMARY KEY clause. This command makes correspondng column to be a user-defined alias for rowid column.

sqlite> create table t(id integer primary key);
sqlite> insert into t values (1);
sqlite> select rowid, id from t;
id          id
----------  ----------
1           1

As you can see, first selected rowid column in SELECT is named in output as id and not as rowid column. That's a sign that it's paired with an id column of a table.

Having link established, now you define which value will be stored in rowid column. Next example shows that rowid will have user-defined value provided for the paired id column.

sqlite> insert into t values (10);
sqlite> select rowid, id from t;
id          id
----------  ----------
1           1
10          10

But what is even more interesting is that if you will provide explicit NULL value for the id column then SQLite will take control back and assign next available value by itself.

sqlite> .schema t
CREATE TABLE t(id integer primary key);

sqlite> insert into t values (null);
sqlite> select rowid, id from t;
id          id
----------  ----------
1           1
10          10
11          11

This last example demonstrates something that is not possible in other RDBMS. We explicitly insert NULL into the primary key column but command is not errored. Moreover, it assigns next available value 11 for us!

Same works even if you will explicitly create table with NOT NULL constraint for the primary key column.

sqlite> create table t(id integer primary key not null);
sqlite> .schema t
CREATE TABLE t(id integer primary key not null);

sqlite> insert into t values (null);
sqlite> select rowid, id from t;
id          id
----------  ----------
1           1

Important: One requirement still exists to make this magic happen. You must explicitly declare corresponding column as INTEGER PRIMARY KEY. Letter case doesn't matter but words must be specified as a whole. If you specify INT PRIMARY KEY that doesn't work.

sqlite> create table t(id int primary key not null);
sqlite> insert into t values (10);
sqlite> select rowid, id from t;
rowid       id
----------  ----------
1           10

As you can see, rowid is already aliased as rowid and not an id column. They also have different values. This is a clear sign that id column is no longer an alias for the rowid. And you cannot longer insert NULL into non nullable primary key column.

sqlite> insert into t values (null);
Error: NOT NULL constraint failed: t.id

Yet another subtle feature of SQLite is that user-defined primary key could contain NULL values. This is an officially confirmed bug which was again kept from fix to provide backward compatibility.

sqlite> create table t(id int primary key);
sqlite> insert into t values (10);
sqlite> select rowid, id from t;
rowid       id
----------  ----------
1           10

sqlite> insert into t values (null);
sqlite> select rowid, id from t;
rowid       id
----------  ----------
1           10
2

AUTOINCREMENT¶

In many RDBMS auto increment feature is a common approach for primary key values generation. MySQL and Postgres have such AUTO_INCREMENT and SERIAL properties for a very long time. Oracle had introduced IDENTITY columns starting from 12^th version.

SQLite also has such AUTOINCREMENT feature but in official documentation we have...

The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and disk I/O overhead and should be avoided if not strictly needed. It is usually not needed.

As simply as that.

The best way to generate primary key values in SQLite is just define some column as INTEGER PRIMARY KEY. This will make it an alias for a rowid column and it will be always auto-populated with some value. We have already explained behaviour of this column in details in a previous chapter.

Nevertheless, there is still some use case for an AUTOINCREMENT in SQLite. It changes default rowid assignment algorithm to prevent the reuse of rowid values that might be already used for a given table. From the official docs:

If the AUTOINCREMENT keyword appears after INTEGER PRIMARY KEY, that changes the automatic ROWID assignment algorithm to prevent the reuse of ROWIDs over the lifetime of the database. In other words, the purpose of AUTOINCREMENT is to prevent the reuse of ROWIDs from previously deleted rows.

Let's see that in example. Largest possible rowid value in SQLite is 9223372036854775807. If we will insert such value in a table then next incremented value for a rowid would overflow possible values range. In such case SQLite will search for a first non used value and pick it up.

sqlite> create table t(id integer primary key, id2 int);
sqlite> insert into t(id, id2) values (9223372036854775807, 1);
sqlite> select rowid, id2 from t;
id                   id2
-------------------  ----------
9223372036854775807  1

sqlite> insert into t(id2) values (2);
sqlite> select rowid, id2 from t order by id2;
id                   id2
-------------------  ----------
9223372036854775807  1
3223839605541625959  2

We inserted biggest possible value 9223372036854775807 into the rowid column aliased by id column during inserting row with id2 = 1. Next, we insert row with id2 = 2 and don't provide rowid value explicitly relying on automatic mechanism. SQLite searched for the first non-used value within available range and picked it up.

Now let's see what happen if we will add AUTOINCREMENT to the column definition.

sqlite> create table t(id integer primary key autoincrement, id2 int);
sqlite> insert into t(id, id2) values (9223372036854775807, 1);
sqlite> select rowid, id2 from t;
id                   id2
-------------------  ----------
9223372036854775807  1

sqlite> insert into t(id2) values (2);
Error: database or disk is full

We got an error. And this behaviour is completely in line with quoted excerpt from the documentation. In order to guarantee that we don't reuse rowid values for a primary key column SQLite always increases value for a rowid. And if last used value was the maximum possible we get database or disk is full error.

Summary¶

The best way to define primary key on a table in SQLite is to define column as having INTEGER PRIMARY KEY datatype. In such way correspodning column become an alias for a system rowid column which is assigned value by SQLite itself. Because it's handled internally by SQLite it also guarantees the most performant way to assign the value.

Adding AUTOINCREMENT for a primary key column is not needed. Moreover, it is even not desirable because it imposes extra CPU cycles as explicitly stated in documentation. Use it only when you have a requirement to guarantee unique primary key values during the whole lifetime of a particular table.