I get the following exception: ERROR {org.apache.hadoop.hive.ql.exec.ExecDriver} - Job Submission failed with exception 'java.io.IOException (Cannot run program "chmod": CreateProcess error=2, The system cannot find the file specified)' java.io.IOException: Cannot run program "chmod": CreateProcess error=2, The system cannot find the file specified. What is going wrong?

This happens when you try to run BAM on Windows without installing Cygwin. The BAM analyzer engine depends on Apache Hadoop, and Hadoop requires Cygwin in order to run on Windows. Before using BAM on Windows, make sure that you have installed the basic, net (OpenSSH, tcp_wrapper packages) and security-related Cygwin packages. After installing Cygwin, update your PATH variable by appending ";C:\cygwin\bin". This is required because the default installation of Cygwin might not do this.

Can BAM do real-time analytics?

BAM is built to do batch-based analytics on large data volumes. However, it can do real-time analytics if you install the WSO2 CEP feature on top of the BAM server. By design, BAM and CEP use the same components to send and receive data, which makes them compatible when processing data. The WSO2 CEP server is a powerful real-time analytics engine that supports queries based on temporal windows, pattern matching and much more.

Why does BAM use Cassandra?

BAM is intended to store large amounts of data, and Cassandra is a proven NoSQL store that can hold terabytes and even petabytes of data with ease. It has no single point of failure and is very easy to scale. These reasons led us to choose Cassandra as the primary data store. However, this does not mean that another data store cannot be supported. By extending the necessary interfaces, a different data store can be plugged in as well.

I see that the BAM samples write the results to an RDBMS. Why do they do this?

BAM does this for two reasons. The first is to promote a polyglot data architecture: BAM initially stores data in Cassandra, but that does not mean everything has to be analyzed and stored back to Cassandra. Results can be stored in an RDBMS, Cassandra or any other data store. The second is that many third-party reporting tools, such as Jasper and Pentaho, already support RDBMSs. With this support for a polyglot data architecture, any reporting engine or dashboard can be plugged into BAM without extra customization effort.

Why do I get a read timeout in the analytics UI after executing a query?

This happens when there is a large amount of data to analyze. The UI times out after 10 minutes if processing the data takes longer than that.

How can I save summarized data into a different database?

Put the JDBC driver in BAM_HOME/repository/components/lib/ and change the values of the following properties accordingly:
'mapred.jdbc.driver.class' // JDBC driver class
'mapred.jdbc.url' // Connection URL
'mapred.jdbc.username' // Username
'mapred.jdbc.password' // Password
'hive.jdbc.table.create.query' // DB-specific query for creating the table
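
As a rough illustration, the properties above might appear in the TBLPROPERTIES clause of a Hive table definition, here pointed at MySQL. This is a sketch under assumptions, not a definitive configuration: the table name, columns, database name and credentials are hypothetical placeholders, and the STORED BY handler class is assumed to match the one used in the BAM sample scripts. Only the five property keys come from the list above.

```sql
-- Hypothetical summary table: table, column, database and credential values are placeholders.
-- The STORED BY class is an assumption based on the BAM sample scripts.
CREATE EXTERNAL TABLE IF NOT EXISTS AppVersionSummary (version STRING, avg_usage_time DOUBLE)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
    'mapred.jdbc.driver.class' = 'com.mysql.jdbc.Driver',
    'mapred.jdbc.url' = 'jdbc:mysql://localhost:3306/BAM_STATS_DB',
    'mapred.jdbc.username' = 'bam_user',
    'mapred.jdbc.password' = 'bam_password',
    'hive.jdbc.table.create.query' = 'CREATE TABLE APP_VERSION_SUMMARY (version VARCHAR(100) NOT NULL PRIMARY KEY, avg_usage_time DOUBLE)'
);
```

The driver class and URL format shown are those of the MySQL JDBC driver. For another database, substitute that database's driver class, URL and a matching CREATE TABLE statement.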

After I run a Hive query that persists data to the H2 database, I cannot reconfigure it to use MySQL. What can I do?

As a workaround, add "DROP TABLE {hive table name}" before every "CREATE TABLE ***" statement. You have to do this whenever you change the meta tables in a Hive script.
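
The workaround can be sketched as follows, assuming a hypothetical Hive table named AppVersionSummary that previously pointed at H2 and is now being redefined for MySQL. The table name, columns, connection values and the STORED BY handler class are illustrative assumptions and should be replaced with those from your own script.

```sql
-- Drop the existing Hive table definition first, so the changed definition below is used.
DROP TABLE AppVersionSummary;

-- Hypothetical redefinition pointing at MySQL; all names and values are placeholders.
CREATE EXTERNAL TABLE IF NOT EXISTS AppVersionSummary (version STRING, avg_usage_time DOUBLE)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
    'mapred.jdbc.driver.class' = 'com.mysql.jdbc.Driver',
    'mapred.jdbc.url' = 'jdbc:mysql://localhost:3306/BAM_STATS_DB',
    'mapred.jdbc.username' = 'bam_user',
    'mapred.jdbc.password' = 'bam_password',
    'hive.jdbc.table.create.query' = 'CREATE TABLE APP_VERSION_SUMMARY (version VARCHAR(100) NOT NULL PRIMARY KEY, avg_usage_time DOUBLE)'
);
```

Without the DROP TABLE line, the IF NOT EXISTS clause would presumably keep the earlier definition in place, which would be consistent with the symptom described in the question.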