Automatic database index creation with the Java Persistence API 2


Data Volume in Enterprise Applications and JPA

Most Enterprise applications are dealing with significant amounts of data, typically managed in a transactional manner where SQL is still the persistence management of choice, sometimes combined with NoSQL databases depending on requirements.

Java Persistence API (JPA) greatly improved developer efficiency, removing some of the low level work such as database-specific SQL dialects and manual creation of schemas. However, there is a risk of lacking focus on performance. Developers tend to develop with smaller data sets and primarily have an object oriented and business logic perspective, and regard database optimization as a job a database administrator should do.
In best case, you will have someone monitoring execution plans and query performance and acting on those findings. Worst case you have a badly performing application with inferior user experience.

Overall Application Performance is a Developer task!

Developers know the queries, they should see themselves as the first and most important role in application performance. Now, how can we make it easy for developers to set up and maintain (in case of schema changes) database indexes? How about

  • Make indexes just as database systems independent as the entity classes using them
  • Define indexes where you define entities and queries
  • Define a simple rule of thumb for what columns to index

Since JPA 2.1, there is a feature for automatic creation of indexes, and it gets way to little attention! Both Hibernate and EclipseLink implement the feature.

Automatic creation of database indexes with JPA

I am going to do a brief test project that sets up a MySQL database, creates an EJB module with a JPA entity, and enhances it to automatically create database indexes. I will use Netbeans, but the steps can be performed with any IDE or editor.

Create a database

mysql -u root -p
create database jpaindex;
quit;

Create EJB Module

In Netbeans, open the „New project“ dialog and use „Maven“ – „EJB Module“. Give it the name „jpaindex“. In the Source Packages folder, create a new package. I am using „com.jdimension.jpaindex.persistence“.

Right click the package and use „New“ – „Entity Class“. It will bring up a wizard to define the entity name (I am using „JpaEntity“) and the datasource in your Java Enterprise Container (I am using Wildfly). Netbeans will create the Entity Class for you, with an id column as a primary key. Now, let’s create one more column that is not part of the primary key. Add a member like this:

private String someValue;

Then right click the variable and use „Refactor“ – „Encapsulate Fields“. Confirm the dialog and you will have a Getter and Setter for the new member variable.

Deploy the module

At this point, you can right click the project and use „Run“ to deploy it to the container. Upon successful deployment, it will automatically have created a new table „JpaIndex“ in your database. Let’s check:

 

mysql -u root -p
use jpaindexdb;
desc JpaIndex;
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| id | bigint(20) | NO | PRI | NULL | |
| someValue | varchar(255) | YES | | NULL | |
+-----------+--------------+------+-----+---------+-------+
2 rows in set (0,00 sec)

So, we have a table „JpaIndex“ with id as the primary key and someValue not being indexed.

Enable automatic JPA index creation

Now, let’s tell our JPA provider to create indexes. In our Entity class, right under the @Entity annotation, add the following:

@Table(name = "JpaIndex",
indexes = {@Index(name = "idx_id", columnList="id", unique = true),
@Index(name = "idx_somevalue", columnList="someValue", unique = false)})

This is supposed to create two indexes, one for id and another one for someValue. The column list can contain multiple columns, separated by comma. Re-run (deploy) the module and check the table again:

desc JpaIndex;
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| id | bigint(20) | NO | PRI | NULL | |
| someValue | varchar(255) | YES | MUL | NULL | |
+-----------+--------------+------+-----+---------+-------+
2 rows in set (0,00 sec)
show index from JpaIndex;
+----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| JpaIndex | 0 | PRIMARY | 1 | id | A | 0 | NULL | NULL | | BTREE | | |
| JpaIndex | 1 | idx_somevalue | 1 | someValue | A | 0 | NULL | NULL | YES | BTREE | | |
+----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0,00 sec)

There we go – we have indexes on both columns. MySQL has an index on primary keys by default, so „idx_id“ was not created explicitly. „idx_somevalue“ is in place. All good.

Queries and indexes in one place

Now that we know how to automatically create indexes, what should be our „reminder“ for defining them? I suggest using named queries on the entity classes instead of using query builder in your application code. This way, using a different / new query is a consciuous decision. Adding query, you will see the index definitions directly in the same code snippet. It will require quite some ignorance to not directly enhance the indexes as well 🙂

Here are some named queries for our sample entity:

@NamedQueries({
@NamedQuery(name = "JpaIndex.findAll", query = "SELECT a FROM JpaIndex a"),
@NamedQuery(name = "JpaIndex.findBySomeValue", query = "SELECT a FROM JpaIndex a WHERE a.someValue = :someValue")})

Simple rule for index creation

But what columns should be indexed? That is probably the most difficult part and would require a couple of blog posts. There are many factors and database specifics. But in many cases, developers can just use a simple rule:

„Index all columns that are used as criteria in WHERE clauses“

Anything beyond that needs more thinking – how large is the table? Does it see more reads than writes? …

I posted the entity class as a Gist. Post a comment if you have feedback or questions.

 


Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert.

2 Gedanken zu “Automatic database index creation with the Java Persistence API