The more you play with big data technologies, the more you end up circling back to the basic question “How does my end user get access/value?”. Currently we’re using Amazon Redshift as a down and dirty query window on top of our raw data. Recently, I’ve started to try and run Mondrian OLAP against redshift.
After getting through the initial server setup against MySQL, I tried connecting to my redshift cluster. Since Mondrian supports PostgreSQL out of the box, it works – but I ran into some key roadblocks where the subtle differences in Redshift fall short.
When trying to use the Mondrian Schema Workbench, it connects successfully to the database using the connection manager, but in the JDBC explorer and when I add a new Table to a cube I only get a list of schemas but not the databases underneath them. Looking at the log output the error appeared to be:
org.postgresql.util.PSQLException: Unable to determine a value for MaxIndexKeys due to missing system catalog data.
You’ll notice in the extended logs it sucessfully grabs all of the schemas, but appears to die when getting metadata information from the “public” schema:
2013-08-12 13:30:55,473 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: initConnection 2013-08-12 13:30:55,780 DEBUG [mondrian.gui.JdbcMetaData] JDBC connection OPEN 2013-08-12 13:30:55,780 DEBUG [mondrian.gui.JdbcMetaData] Catalog name = hillsraw 2013-08-12 13:30:55,780 DEBUG [mondrian.gui.JdbcMetaData] Database Product Name: PostgreSQL 2013-08-12 13:30:55,780 DEBUG [mondrian.gui.JdbcMetaData] Database Product Version: 8.0.2 2013-08-12 13:30:55,781 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: initConnection - no error 2013-08-12 13:30:55,781 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: setAllSchemas 2013-08-12 13:30:55,887 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: setAllTables - information_schema 2013-08-12 13:30:55,887 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: Loading schema: 'information_schema' 2013-08-12 13:30:56,050 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: setAllTables - pg_catalog 2013-08-12 13:30:56,050 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: Loading schema: 'pg_catalog' 2013-08-12 13:30:56,211 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: setAllTables - pg_internal 2013-08-12 13:30:56,211 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: Loading schema: 'pg_internal' 2013-08-12 13:30:56,378 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: setAllTables - public 2013-08-12 13:30:56,378 DEBUG [mondrian.gui.JdbcMetaData] JdbcMetaData: Loading schema: 'public' 2013-08-12 13:30:56,668 ERROR [mondrian.gui.JdbcMetaData] setAllTables org.postgresql.util.PSQLException: Unable to determine a value for MaxIndexKeys due to missing system catalog data. at org.postgresql.jdbc2.AbstractJdbc2DatabaseMetaData.getMaxIndexKeys(AbstractJdbc2DatabaseMetaData.java:64) at org.postgresql.jdbc2.AbstractJdbc2DatabaseMetaData.getImportedExportedKeys(AbstractJdbc2DatabaseMetaData.java:3196)
Conveniently the error message points me directly to the PostgresSQL JDBC driver on github. Tracing back through the code, the error appears to be centered around this section of code. From what I can guess, the first check of “haveMinimumServerVersion(“8.0″)” passes for Redshift, however the query to return the key information does not work in Redshift (returns emtpy result set). The query contained within the block for “haveMinimumServerVersion(“7.3″)” however, does work. So although Redshift is intended to be a version 8 of PostgreSQL, we find there are some oddities within it that don’t match your typical PostgreSQL (should be no surprise).
Solution
To keep moving, I created a brute-force hack so that it forces all code paths into the “else” block, see my fork on github. I’ll create a bug, and if someone knows how to tell the difference between Redshift PostgreSQL and normal version 8 PostgreSQL, let me know.
Here is the built .jar file from my source code, tested and it works against Redshift in schema workbench: https://github.com/garrettpatterson/pgjdbc/raw/master/postgresql.redshift.jar
Update – 9/23/2015
I realized I never updated this post with relevant info from an issue I filed on GitHub:
https://github.com/pgjdbc/pgjdbc/issues/79
Per the comments Redshift doesn’t exactly play by the same standards as the Postgres spec/codebase. I agree that its technically an Amazon problem – and perhaps their latest set of jdbc drivers for Redshift account for this.
awesome man , after 8 hours of contiouns research , i finally landed at your blog
thank you very much