Connect Pentaho/Mondrian Schema Workbench to Amazon Redshift

Posted by & filed under Development, Java, Work.

The more you play with big data technologies, the more you end up circling back to the basic question “How does my end user get access/value?”.  Currently we’re using Amazon Redshift as a down and dirty query window on top of our raw data.  Recently, I’ve started to try and run Mondrian OLAP against redshift.

After getting through the initial server setup against MySQL, I tried connecting to my redshift cluster.  Since Mondrian supports PostgreSQL out of the box, it works – but I ran into some key roadblocks where the subtle differences in Redshift fall short.

When trying to use the Mondrian Schema Workbench, it connects successfully to the database using the connection manager, but in the JDBC explorer and when I add a new Table to a cube I only get a list of schemas but not the databases underneath them.  Looking at the log output the error appeared to be:

org.postgresql.util.PSQLException: Unable to determine a value for MaxIndexKeys due to missing system catalog data.

You’ll notice in the extended logs it sucessfully grabs all of the schemas, but appears to die when getting metadata information from the “public” schema:

2013-08-12 13:30:55,473 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: initConnection
2013-08-12 13:30:55,780 DEBUG [mondrian.gui.JdbcMetaData] JDBC connection OPEN
2013-08-12 13:30:55,780 DEBUG [mondrian.gui.JdbcMetaData] Catalog name = hillsraw
2013-08-12 13:30:55,780 DEBUG [mondrian.gui.JdbcMetaData] Database Product Name: PostgreSQL
2013-08-12 13:30:55,780 DEBUG [mondrian.gui.JdbcMetaData] Database Product Version: 8.0.2
2013-08-12 13:30:55,781 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: initConnection - no error
2013-08-12 13:30:55,781 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: setAllSchemas
2013-08-12 13:30:55,887 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: setAllTables - information_schema
2013-08-12 13:30:55,887 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: Loading schema: 'information_schema'
2013-08-12 13:30:56,050 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: setAllTables - pg_catalog
2013-08-12 13:30:56,050 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: Loading schema: 'pg_catalog'
2013-08-12 13:30:56,211 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: setAllTables - pg_internal
2013-08-12 13:30:56,211 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: Loading schema: 'pg_internal'
2013-08-12 13:30:56,378 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: setAllTables - public
2013-08-12 13:30:56,378 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: Loading schema: 'public'
2013-08-12 13:30:56,668 ERROR [mondrian.gui.JdbcMetaData] setAllTables
org.postgresql.util.PSQLException: Unable to determine a value for MaxIndexKeys due to missing system catalog data.
	at org.postgresql.jdbc2.AbstractJdbc2DatabaseMetaData.getMaxIndexKeys(AbstractJdbc2DatabaseMetaData.java:64)
	at org.postgresql.jdbc2.AbstractJdbc2DatabaseMetaData.getImportedExportedKeys(AbstractJdbc2DatabaseMetaData.java:3196)

Conveniently the error message points me directly to the PostgresSQL JDBC driver on github.  Tracing back through the code, the error appears to be centered around this section of code.  From what I can guess, the first check of “haveMinimumServerVersion(“8.0″)” passes for Redshift, however the query to return the key information does not work in Redshift (returns emtpy result set).  The query contained within the block for “haveMinimumServerVersion(“7.3″)” however, does work.  So although Redshift is intended to be a version 8 of PostgreSQL, we find there are some oddities within it that don’t match your typical PostgreSQL (should be no surprise).

Solution

To keep moving, I created a brute-force hack so that it forces all code paths into the “else” block, see my fork on github.  I’ll create a bug, and if someone knows how to tell the difference between Redshift PostgreSQL and normal version 8 PostgreSQL, let me know.

Here is the built .jar file from my source code, tested and it works against Redshift in schema workbench: https://github.com/garrettpatterson/pgjdbc/raw/master/postgresql.redshift.jar

Update – 9/23/2015

I realized I never updated this post with relevant info from an issue I filed on GitHub:

https://github.com/pgjdbc/pgjdbc/issues/79

Per the comments Redshift doesn’t exactly play by the same standards as the Postgres spec/codebase.  I agree that its technically an Amazon problem – and perhaps their latest set of jdbc drivers for Redshift account for this.

Tags:

One Response to “Connect Pentaho/Mondrian Schema Workbench to Amazon Redshift”

  1. rajiv

    awesome man , after 8 hours of contiouns research , i finally landed at your blog

    thank you very much

    Reply

Leave a Reply