Edgewall Software

Trac SQL Database API

Trac uses a very thin layer on top of the standard Python Database API 2.0 for interfacing with supported relational database systems such as SQLite, PostgreSQL or the MySqlDb systems.

You can find the specifics of the database API in the trac.db package. This package provides:

  • Simple pooling of database connections
  • Iterable cursors
  • Selection of DB modules based of connection URIs.

Accessing the Database

The DB API has evolved quite a lot in the recent versions of Trac, and the usage is substantially different whether you're working with Trac 0.11, 0.12 or 1.0 and up.

If you don't need to be encumbered with the legacy styles, skip directly to the #Trac1.0API style described below.

API before Trac 0.12

This is how things used to be for Trac 0.5 - 0.11, and is still supported in Trac 0.12, but in new code this style should definitely not be used, as it introduces all the problems that the new styles are solving (see next sections for details).

Code accessing the database in Trac go through this layer simply by using the Environment method get_db_cnx() in order to get a pooled database connection. A database cursor can be obtained from the pooled database connection, and the code uses the cursor in conjunction with SQL statements to implement the behavior.

from trac.env import Environment
def myFunc():
    env = Environment('/path/to/projenv')
    db = env.get_db_cnx()
    cursor = db.cursor()
    # Execute some SQL statements

    db.commit()

Note that you should always make sure that db won't get garbage collected while cursor is still used, as the collection will do a rollback and close the cursors (avoid in particular doing cursor = env.get_db_cnx().cursor(), see r8878).

Trac 0.12 API

The 0.12 style can still be used in 1.0, but for new code we encourage you to focus on the Trac version 1.0 and to adopt the even simpler 1.0 style (see next section).

As we dropped support for Python 2.3, we could make use of decorators to get rid of some of the boilerplate code.

from trac.env import Environment
from trac.db import with_transaction
def myFunc():
    env = Environment('/path/to/projenv')
    @with_transaction(env)
    def do_something(db):
        cursor = db.cursor()
        # Execute some SQL statements

This is slightly simpler and more robust than the previous API, as not only the commit() is performed in any successful case (i.e. the Execute some SQL statements code can return at any place), but a rollback will be performed should an exception be triggered in that block.

See more details in the ApiChanges/0.12 paragraph.

The use of env.get_db_cnx() is still possible, but deprecated. An alternative env.get_read_db() method can be used to get a "read" only access to the database (for performing SELECT queries).

Nested Transactions

With the API described above (since 0.12) it's possible to have nested transactions. A nested transaction is best explained by an example. Consider the following code:

from trac.env import Environment
from trac.db import with_transaction
def myFunc1():
    env = Environment('/path/to/projenv')
    @with_transaction(env)
    def do_outer_transaction(db):
        cursor = db.cursor()
        # Execute some SQL statements
        myFunc2(env) # do nested transaction
        # <=== commit or rollback

def myFunc2(env):
    @with_transaction(env)
    def do_nested_transaction(db):
        cursor = db.cursor()
        # Execute some SQL statements          

In this example, when myFunc1() is called, it first executes the outer transaction (do_outer_transaction()) and then executes a nested transaction (myFunc2()) from within the outer transaction. The important thing to notice is:

Nested transactions are atomic!

This means that either the whole (outer) transaction is committed or aborted. So even if the nested transaction succeeded but the outer transaction fails (an exception is raised at the line with the "commit or rollback" comment), the whole transaction will be rolled back (i.e. including the changes made by the nested transaction).

Notes:

  • While still possible in the API (for backward compatibility reasons), you should not call commit() yourself in transaction; not only does this happen automatically at the proper time (as explained above), if you'd call it yourself you will risk to commit from within a nested transaction, possibly leading to inconsistent state of the DB
  • Using env.get_read_db() within a transaction reuses the same connection as used for the connection. So uncommitted changes made by the transaction will already be visible to the caller of get_read_db() (but not outside of the transaction - that is in another thread).
  • Uncommited changes of a transaction are only visible to nested transactions in the same thread. Different threads use different database connections and therefore different transactions (well, to be more precise, the exact detail of what is visible to other threads is backend specific and depends to the isolation level used)

Trac 1.0 API

This style is supported for version 1.0.x, 1.1.x and will be supported beyond as well.

As we dropped support for Python 2.4, we could simplify the code a bit more by using the with keyword and context managers:

from trac.env import Environment
def myFunc():
    env = Environment('/path/to/projenv')
    with env.db_transaction as db:
        cursor = db.cursor()
        # Execute some SQL statements

It does essentially the same thing as the 0.12 version above in a terser way. Nested transactions work the way you'd expect (only the outermost one will actually do the commit(), upon normal exit) so, again, don't call commit() by yourself.

The use of env.get_db_cnx() is now strongly discouraged.

Symmetrically, a second context manager is provided for performing read-only accesses:

from trac.env import Environment
def myFunc():
    env = Environment('/path/to/projenv')
    with env.db_query as db:
        cursor = db.cursor()
        cursor.execute("""
            SELECT a, b, c FROM ...
            """, (param1, param2))
        # Execute some SQL "SELECT" statements
    # (continue)

This one enforces the notion of read-access, because the db connection bound to the context doesn't support the commit() or rollback() methods. As there's no commit() upon exit, one could question the usefulness of this syntactic construct. The main interest is in better locality of the connection: in order to improve concurrency, the lifetime of a connection wrapper has to be as short as possible (see #3446). Therefore, we close the connection on exit, making it available to other threads. So even if Python gives you access to the db variable after the with block (at the # (continue) line in the above example), you should not (and can't) use it at that point.

Actually, if you don't need to do anything fancy with the cursor like calling db.get_last_id(cursor, ...), then you can use a shorter form:

from trac.env import Environment
def myFunc():
    env = Environment('/path/to/projenv')
    with env.db_query as db:
        for a, b, c in db("""
                SELECT a, b, c 
                FROM ...
                """, (param1, param2)):
            # do something with a, b, c

All the results are returned at once in a list, by calling fetchall() on the underlying cursor. This is fine most of the time, if you need to limit the number of returned results, you can still use "LIMIT" or "OFFSET" in the SQL query.

In the same spirit, if you don't even need to use db itself for things like db.cast() or db.like(), you can simply do:

from trac.env import Environment
def myFunc():
    env = Environment('/path/to/projenv')
    for a, b, c in env.db_query("""
            SELECT a, b, c
            FROM ...
            """, (param1, param2)):
        # do something with a, b, c

These short forms also present an additional safety measure as they're only allowing a "SELECT" query to be executed by a read-only connection (indeed, the same short forms are possible on both env.db_transaction and env.db_query).

Configuration

For determining which database has to be used, Trac looks at the value of the database configuration option in trac.ini, which should contain a database connection URI. The default value for this option tells Trac to use an SQLite database inside the environment directory:

[trac]
database = sqlite:db/trac.db

The connection URI syntax has been designed to be compatible with that provided by SQLObject (see also the section on SQLObject connections). The only supported URI schemes at this point are sqlite, postgres and mysql.

Pooled Connections

Prior to version 1.0, Trac used to operate the following way:

The Environment method get_db_cnx() returns a connection from the pool of connections. This connection needs to be returned, and Trac is written so that the return will happen automatically by the garbage collector if the code is written to follow a simple rule. When the garbage collector determines the pooled database connection is no longer being used, its __del__ method will return the pooled connection to the pool for reuse. If you have set a lexical variable in the function's body to the pooled connection, this typically occurs when the function is returning. In the example above of myFunc it occurs at the return statement since db is a variable local to myFunc

With the context managers introduced in Trac 1.0, we're able to return this Connection to the pool in a much more robust and direct way:

When the control flow exits a context manager (either Environment.db_query or Environment.db_transaction), and if that context manager is the "outermost" one in case multiple contexts where nested, then the Connection is immediately returned to the pool, regardless of the behavior of the garbage collector.

This means that even if a variable still contains a reference to the Connection, it won't be possible to use it outside of the context:

>>> from trac.env import open_environment
>>> env = open_environment('...-trac')
>>> with env.db_query as db:
...     print db("SELECT count(*) FROM wiki")
...
[(563,)]
>>> db
<trac.db.util.ConnectionWrapper object at 0x026146E8>
>>> print db("SELECT count(*) FROM wiki")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "trac\db\util.py", line 123, in __call__
    cursor = self.cnx.cursor()
  File "trac\db\util.py", line 108, in __getattr__
    return getattr(self.cnx, name)
AttributeError: 'NoneType' object has no attribute 'cursor'

… which is a good thing!

Rules for DB API Usage

Different DB API modules have different ways to pass parameters to the cursors' execute method, and different ways to access query results. To keep the database API as thin as possible, the Trac team has decided to use a relatively common subset in all database code.

Parameter passing

Always use the "format" parameter style, and always with %s (because that's the only type that pyPgSQL supports). Statement parameters always need to be passed into execute as an actual sequence (list or tuple).

So the following statements are okay:

cursor.execute("SELECT author,ipnr,comment FROM wiki WHERE name=%s", [thename])
cursor.execute("SELECT id FROM ticket WHERE time>=%s AND time<=%s", (start, stop))

The following uses are not okay:

cursor.execute("SELECT author,ipnr,comment FROM wiki WHERE name=?", thename)
cursor.execute("SELECT id FROM ticket WHERE time>=%i AND time<=%i", start, stop)

At any cost, try avoiding the use of string formatting to get values into the SQL statement. The database automatically escapes values you pass using execute() arguments, the same is not true if you use string formatting, opening your code up to SQL injection attacks.

On the other hand, you must use string formatting to dynamically specify names of tables or columns, i.e. anything that is not a value as such:

cursor.execute("SELECT time FROM %s WHERE name=%%s" % db.quote(table), (thename,))

Retrieving results

For convenience, cursors returned by the database connection object are iterable after having executed an SQL query. Individual fields in result rows may only be accessed using integer indices:

cursor.execute("SELECT author,ipnr,comment FROM wiki WHERE name=%s", (thename,))
for row in cursor:
    print 'Author: %s (%s)' % (row[0], row[1])
    print row[2]

Accessing fields using the column names is not supported by all database modules, so it should not be used. Automatically unpacking rows into tuples of named variables often provides better readability:

cursor.execute("SELECT author,ipnr,comment FROM wiki WHERE name=%s", (thename,))
for author, ipnr, comment in cursor:
    print 'Author: %s (%s)' % (author, ipnr)
    print comment

Guidelines for SQL Statements

As you may know, support for SQL varies among different database systems. The requirements of Trac are relatively trivial, so we try to stick to a common subset that is supported by the majority of databases.

  • no native date or time database types; store date and time information in seconds as int fields (before 0.12) or better, in microseconds and int64 fields (since 0.12, mapped by each trac.db.IDatabaseConnector to a suitable database specific type)
  • no triggers
  • you may use views if you feel you need them (but so far we didn't feel the need for Trac itself)

For anything not portable (and you really fall quickly in there), you need to use some methods from the connection when building your SQL query (e.g. db.cast(column, type), db.concat(*params), db.like(), db.like_escape(), db.quote(param), db.get_last_id(cursor, table, col)).

For example:

            cursor.execute("""
                SELECT DISTINCT rev FROM node_change
                WHERE repos=%%s AND rev>=%%s AND rev<=%%s
                  AND (path=%%s OR path %s)""" % db.like(),
                (self.id, sfirst, slast, path,
                 db.like_escape(path + '/') + '%'))

(cache.py)

As you can see, the readability of long SQL statements can improved by using Python's multiline string syntax.


See also: TracDev, TracDev/DatabaseSchema, TracDev/DatabaseUpgrades, TracDev/CodingStyle

Last modified 4 months ago Last modified on Apr 5, 2014 1:09:45 PM