Tuesday, October 23, 2012

Configuring Hive metastore to remote database - WSO2 BAM2

Hive Metastore

The Hive metastore is the central repository used to store Hive metadata. WSO2 BAM uses an embedded H2 database as the default Hive metastore, so only one Hive session can access the metastore at a time.

Using a remote MySQL database as the Hive metastore

You can point the Hive metastore to a MySQL database as follows.

Edit hive-site.xml, located in the WSO2_BAM2_HOME/repository/conf/advanced/ directory, and set the following properties.

  <!-- The connection URL, user name and password below are example values; replace them with your own. -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
    <description>password to use against metastore database</description>
  </property>

Put the MySQL JDBC driver JAR into WSO2_BAM2_HOME/repository/components/lib

Now restart the BAM server. You have successfully configured the Hive metastore to use a MySQL database.
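The metastore database itself must exist before BAM starts. Here is a minimal sketch of creating it in MySQL; the database name (hive_metastore) and credentials (hiveuser/hivepassword) are illustrative assumptions and must match whatever you configure in hive-site.xml.

```sql
-- Run in the MySQL console as an administrative user.
-- The database name, user and password are assumptions; use your own values.
CREATE DATABASE hive_metastore;
CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'hivepassword';
GRANT ALL PRIVILEGES ON hive_metastore.* TO 'hiveuser'@'%';
FLUSH PRIVILEGES;
```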

Saturday, October 6, 2012

A Fix for Huawei E220 connection issue with ubuntu 12.04

After installing Ubuntu 12.04, I faced an issue when connecting to the internet with my Huawei E220 dongle. So I did a Google search and found a bug report related to this [1]. After going through the issue, I found a workaround which fixes it.

This is the workaround.

Execute the following command as root.

echo -e "AT+CNMI=2,1,0,2,0\r\nAT\r\n" > /dev/ttyUSB1 

Now try to connect your dongle again. It worked for me until the dongle was removed from the USB port. Thanks Nikos for your workaround :)

Saturday, September 8, 2012

WSO2 Business Activity Monitor 2.0.0 released ....!!!!

We spent almost a year on releasing WSO2 BAM 2.0.0, after completely re-writing it twice from BAM 1.x.x to BAM 2.0.0 according to the new architecture, suggestions and improvements. Finally we released it today; below you can see the release note for BAM 2.0.0 :)

WSO2 Business Activity Monitor 2.0.0 released!

The WSO2 Business Activity Monitor (WSO2 BAM) is an enterprise-ready, fully open source, complete solution for aggregating, analyzing and presenting information about business activities. Aggregation refers to the collection of data, analysis refers to the manipulation of data in order to extract information, and presentation refers to representing this data visually or in other ways such as alerts. The WSO2 BAM architecture reflects this natural flow in its design.
Since all WSO2 products are based on the component-based WSO2 Carbon platform, WSO2 BAM is lean, lightweight and consists of only the required components for efficient functioning. It does not contain unnecessary bulk, unlike many over-bloated, proprietary solutions. WSO2 BAM comprises only the modules required to give the best performance, scalability and customizability, allowing businesses to achieve time-effective results for their solutions without sacrificing performance or the ability to scale.
The product is available for download at: http://wso2.com/products/business-activity-monitor

  • Key Features

    Collect & Store any Type of Business Events

    • Events are named, versioned and typed by event source
    • Event structure consists of (name, value) tuples of business data, metadata and correlation data
  • High Performance Data Capture Framework

    • High performance, low latency API for receiving large volumes of business events over various transports including Apache Thrift, REST, HTTP and Web services
    • Scalable event storage into Apache Cassandra using column families per event type
    • Non-blocking, multi-threaded, low impact Java Agent SDK for publishing events from any Java based system
    • Use of Thrift, HTTP and Web services allows event publishing from any language or platform
    • Horizontally scalable, with load balancing and highly available deployment
  • Pre-Built Data Agents for all WSO2 Products

  • Scalable Data Analysis Powered by Apache Hadoop

    • SQL-like flexibility for writing analysis algorithms via Apache Hive
    • Extensibility via analysis algorithms implemented in Java
    • Schedulable analysis tasks
    • Results from analysis can be stored flexibly, including in Apache Cassandra, a relational database or a file system
  • Powerful Dashboards and Reports

    • Tools for creating customized dashboards with zero code
    • Ability to write arbitrary dashboards powered by Google Gadgets and JaggeryJS
  • Installable Toolboxes

    • Installable artifacts to cover complete use cases
    • One click install to deploy all artifacts for a use case

Issues Fixed in This Release

All fixed issues have been recorded at - http://bit.ly/Tzb1VP

Known Issues in This Release

All known issues have been recorded at - http://bit.ly/TzberZ

Engaging with Community

Mailing Lists

Join our mailing list and correspond with the developers directly.

Reporting Issues

WSO2 encourages you to report issues, enhancements and feature requests for WSO2 BAM. Use the issue tracker for reporting issues.

Discussion Forums

We encourage you to use stackoverflow (with the wso2 tag) to engage with developers as well as other users.


WSO2 Inc. offers a variety of professional Training Programs, including training on general Web services as well as WSO2 Business Activity Monitor and a number of other products. For additional support information please refer to http://wso2.com/training/


We are committed to ensuring that your enterprise middleware deployment is completely supported from evaluation to production. Our unique approach ensures that all support leverages our open development methodology and is provided by the very same engineers who build the technology.
For additional support information please refer to http://wso2.com/support/
For more information on WSO2 BAM, and other products from WSO2, visit the WSO2 website.

We welcome your feedback and would love to hear your thoughts on this release of WSO2 BAM.
The WSO2 BAM Development Team

Sunday, June 10, 2012

JDBC Storage Handler for Hive

I was able to complete the implementation of the Hive JDBC storage handler with basic functionality, so I thought I would write a blog post describing its usage with some sample queries. Currently it supports writing into any database and reading from the major databases (MySQL, MS SQL Server, Oracle, H2, PostgreSQL). This feature comes with the WSO2 BAM 2.0.0 release.

Setting up BAM to use the Hive JDBC handler

Add your JDBC driver to the $BAM_HOME/repository/components/lib directory before starting the server.

Web UI for executing Hive queries

BAM2 comes with a web UI for executing Hive queries. There is also an option to schedule a script.

User interface for writing Hive Queries

User interface for scheduling hive script

Sample on writing analyzed data into JDBC 

Here I am going to demonstrate writing analyzed data into JDBC storage. In this simple example, we'll fetch records from a file, analyze them using Hive, and finally store the analyzed data in a MySQL database.

Records - These are the records that we are going to analyze.

bread   12      12/01/2012
sugar   20      12/01/2012
milk    5       12/01/2012
tea     33      12/01/2012
soap    10      12/01/2012
tea     9       13/01/2012
bread   21      13/01/2012
sugar   9       13/01/2012
milk    14      13/01/2012
soap    8       13/01/2012
biscuit 10      14/01/2012

Hive Queries

//drop tables if they already exist
drop table productTable;
drop table summarizedTable;
//create the source table (reconstructed: the data file is tab separated,
//and the column names must match the summarizing query below)
CREATE TABLE productTable(product STRING, noOfItems INT, saleDate STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
//load the file with the above records
load data local inpath '/opt/sample/data/productInfo.txt' into table productTable;
//create the summarized table backed by the JDBC storage handler
CREATE EXTERNAL TABLE IF NOT EXISTS summarizedTable(product STRING, itemsSold INT)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
                'mapred.jdbc.driver.class' = 'com.mysql.jdbc.Driver',
                'mapred.jdbc.url' = 'jdbc:mysql://localhost/test',
                'mapred.jdbc.username' = 'username',
                'mapred.jdbc.password' = 'password',
                'hive.jdbc.primary.key.fields' = 'product',
                'hive.jdbc.update.on.duplicate' = 'true',
                'hive.jdbc.table.create.query' = 'CREATE TABLE productSummary (product VARCHAR(50) NOT NULL PRIMARY KEY, itemsSold INT NOT NULL)');
insert overwrite table summarizedTable SELECT product, sum(noOfItems) FROM productTable GROUP BY product;

View the result in MySQL.

mysql> select * from productSummary;
+---------+-----------+
| product | itemsSold |
+---------+-----------+
| biscuit |        10 |
| bread   |        33 |
| milk    |        19 |
| soap    |        18 |
| sugar   |        29 |
| tea     |        42 |
+---------+-----------+
6 rows in set (0.00 sec)

Detailed description of the TBLPROPERTIES used by the storage handler.

  • mapred.jdbc.driver.class (required) - The class name of the JDBC driver to use. This should be available on Hive's classpath.
  • mapred.jdbc.url (required) - The connection URL for the database.
  • mapred.jdbc.username (optional) - The database username, if it's required.
  • mapred.jdbc.password (optional) - The database password, if it's required.
  • hive.jdbc.table.create.query (optional) - If the table already exists in the database, you don't need this. Otherwise you should provide the SQL query for creating the table in the database.
  • mapred.jdbc.output.table.name (optional) - The name of the table in the database. It does not have to be the same as the name of the table in Hive. If you have specified the SQL query for creating the table, the handler will pick the table name from that query. Otherwise you need to specify this if your meta table name is different from the table name in the database.
  • hive.jdbc.primary.key.fields (required if the database table has primary keys) - The primary key fields of the database table.
  • hive.jdbc.update.on.duplicate (optional) - Expected values are either "true" or "false". If "true", the storage handler will update the records with duplicate keys; otherwise it will insert all data. This can be used to optimize the update operation: the default implementation issues an insert or update statement after a select statement, so there are two database round trips, but this can be reduced to one by using a DB-specific upsert statement. An example query for a MySQL database is 'INSERT INTO productSummary (product, itemsSold) VALUES (?,?) ON DUPLICATE KEY UPDATE itemsSold=?'.
  • hive.jdbc.upsert.query.values.order (mandatory if you are using an upsert query) - The values order for each question mark in the upsert query. Sample values for the above query would be 'product,itemsSold,itemsSold'.
  • hive.jdbc.input.columns.mapping (optional) - This is mandatory if the field names in your meta table and database table are different. Provide the field names in the database table, separated by ',', in the same order as the field names in the meta table. Example: productNames,noOfItemsSold will map to the meta table field names product,itemsSold.
  • mapred.jdbc.input.table.name (optional) - Used when reading from a database table. This is needed if the meta table name and database table name are different.
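Putting the upsert-related properties together, a table definition using the optimization might look like the sketch below. Note that the post names only the values-order property; the property name 'hive.jdbc.upsert.query' for supplying the upsert statement itself is my assumption here, so verify it against the handler source if it does not take effect.

```sql
-- Sketch only: 'hive.jdbc.upsert.query' is an assumed property name.
-- Table and column names reuse the productSummary sample above.
CREATE EXTERNAL TABLE IF NOT EXISTS summarizedTable(product STRING, itemsSold INT)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
    'mapred.jdbc.driver.class' = 'com.mysql.jdbc.Driver',
    'mapred.jdbc.url' = 'jdbc:mysql://localhost/test',
    'mapred.jdbc.username' = 'username',
    'mapred.jdbc.password' = 'password',
    'hive.jdbc.primary.key.fields' = 'product',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.upsert.query' = 'INSERT INTO productSummary (product, itemsSold) VALUES (?,?) ON DUPLICATE KEY UPDATE itemsSold=?',
    'hive.jdbc.upsert.query.values.order' = 'product,itemsSold,itemsSold');
```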

Sample on reading from JDBC.

Now I am going to read the previously saved records from MySQL using the Hive JDBC handler.

Hive queries

//drop the table if it already exists
drop table savedRecords;
//create the meta table backed by the JDBC storage handler
//(the column definition is reconstructed to match the productSummary table)
CREATE EXTERNAL TABLE IF NOT EXISTS savedRecords(product STRING, itemsSold INT)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
             TBLPROPERTIES (
                    'mapred.jdbc.driver.class' = 'com.mysql.jdbc.Driver',
                    'mapred.jdbc.url' = 'jdbc:mysql://localhost/test',
                    'mapred.jdbc.username' = 'username',
                    'mapred.jdbc.password' = 'password',
                    'mapred.jdbc.input.table.name' = 'productSummary');
SELECT product,itemsSold FROM savedRecords ORDER BY itemsSold;

This will give all the records in the productSummary table.
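Once defined, savedRecords behaves like any other Hive table, so you can run further queries over it. A small sketch, assuming the table definition above:

```sql
-- The rows are fetched from MySQL via the handler; the filter is applied in Hive.
SELECT product, itemsSold FROM savedRecords WHERE itemsSold > 20;
```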

Sunday, April 29, 2012

How to remote debug Apache Cassandra standalone server

In order to debug the Cassandra server from your favorite IDE, you need to add the following to cassandra-env.sh, located in the apache-cassandra-1.1.0/conf directory.

JVM_OPTS="$JVM_OPTS -Xnoagent"
JVM_OPTS="$JVM_OPTS -Djava.compiler=NONE"
JVM_OPTS="$JVM_OPTS -Xrunjdwp:transport=dt_socket,server=y,address=5005,suspend=n"

After adding this, once you start the server you will see the following line printed on the Cassandra console:

"Listening for transport dt_socket at address: 5005" 

This is the port that you specified in JVM_OPTS. You can change it to another value if you want.

Now configure your IDE to run on debug mode.

Now you can debug the apache cassandra server from your favorite IDE :)

Sunday, March 18, 2012

WSO2 BAM 2.0.0-Alpha 2 released..!!!!

After working hard on BAM 2.0.0 Alpha 2, we were able to release it on 13th March 2012.

This is the release note :

WSO2 team is pleased to announce the release of version 2.0.0 - ALPHA 2 of WSO2 Business Activity Monitor.
WSO2 Business Activity Monitor (WSO2 BAM) is a comprehensive framework designed to solve problems in the wide area of business activity monitoring. WSO2 BAM comprises many modules to give the best performance, scalability and customizability. These allow the requirements of business users, DevOps and CxOs to be met without spending countless months customizing the solution, and without sacrificing performance or the ability to scale.
WSO2 BAM is powered by WSO2 Carbon, the SOA middleware component platform.

The binary distribution can be downloaded at http://dist.wso2.org/products/bam/2.0.0-alpha2/wso2bam-2.0.0-ALPHA2.zip

This release comes with the following samples:

  1. Service Data Agent - Sample to install the Service data agent, publish statistics and intercepted message activity from service-hosting WSO2 servers such as WSO2 AS, DSS, BPS, CEP, BRS and any other WSO2 Carbon server with the service hosting feature
  2. Mediation Data Agent - Sample to install Mediation data agent, publish mediation statistics and intercepted message activity using Message Activity Mediators from the WSO2 ESB
  3. Data center wide cluster monitoring - Sample to simulate two data centers each having two clusters sending statistics events, perform summarizations and visualize them in a dashboard
  4. End - End Message Tracing - Sample to simulate messages fired from a set of servers to WSO2 BAM and set up message tracing analytics and visualizations of respective messages
  5. KPI Definition - Sample to simulate receiving events from a server (ex: WSO2 AS), perform summarizations and visualize product and consumer data in a retail store
  6. Fault Detection & Alerting - Sample to simulate receiving events from a server (ex: WSO2 ESB), detect faults and fire email alerts


  • Data Agents
    1. Pre built data agents - Service Data Agent for the WSO2 AS, DSS, BPS, CEP, BRS and any other WSO2 Carbon server with the service hosting feature and Mediation Data Agent for the WSO2 ESB
    2. A re-usable Agent API to publish events to the BAM server from any application (samples included)
    3. Apache Thrift based Agents to publish data at extremely high throughput rates
    4. Option to use Binary or HTTP protocols
  • Event Storage
    1. Apache Cassandra based scalable data architecture for high throughput of writes and reads
    2. Carbon based security mechanism on top of Cassandra
  • Analytics
    1. An Analyzer Framework with the capability of writing and plugging in any custom analysis tasks
    2. Built-in analyzers for common operations such as get, put, aggregate, alert, fault detection, etc.
    3. Scheduling capability of analysis tasks
  • Visualization
    1. Drag and drop gadget IDE to visualize analyzed data with zero code
    2. Capability to plug in additional UI elements and Data sources to Gadget IDE
    3. Google gadgets based dashboard

Reporting Issues

WSO2 encourages you to report issues, enhancements and feature requests for WSO2 BAM. Use the issue tracker for reporting any of these.

Sunday, February 26, 2012

Setting up a Cassandra cluster using wso2 carbon

If you want to use the WSO2 security model with a Cassandra cluster, here I'll show you how you can set up a Cassandra cluster using WSO2 Carbon.

First you need to download wso2 carbon (I am using version 3.2.2)

Then install the Cassandra feature to the WSO2 Carbon server using the p2 repository at http://dist.wso2.org/p2/carbon/releases/3.2.2/.

This will install Cassandra version 0.7 to your Carbon server.

Adding p2 repository (http://dist.wso2.org/p2/carbon/releases/3.2.2/)

Installing Cassandra 3.2.2 feature

After finishing the installation, restart the Carbon server. Now the Carbon server will work as your Cassandra server.

Set up a few more Cassandra nodes using WSO2 Carbon as above, according to your requirements.
You can follow the instructions given on this site for setting up the Cassandra cluster.
The cassandra.yaml configuration file is located in the $wso2carbon_home/repository/conf/advanced/ directory.

Add the following configuration file (cassandra-auth.xml) to $wso2carbon_home/repository/conf/advanced/ in order to view keyspaces using the Cassandra Keyspaces UI (change the username and password accordingly).



Cassandra Keyspaces ui

Once you finish the configuration, you can check the status of the cluster using the Cassandra cluster UI, or you can use the nodetool that comes with Apache Cassandra to monitor the cluster.

Cassandra cluster monitor ui 


$./nodetool -h -p 9999 ring -u admin -pw admin

Address         Status State   Load            Owns    Token
                                                       113427455640312821154458202477256070485
                Up     Normal  20.36 MB        33.33%  0
                Up     Normal  251.64 MB       33.33%  56713727820156410577229101238628035242
                Up     Normal  20.95 MB        33.33%  113427455640312821154458202477256070485

Note: the remote JMX agent port number in a Carbon server is 9999 + offset (the default offset in carbon.xml is 0).

Thursday, February 16, 2012

Fixing ADB databinding issue when web service method returning OMElement

When I tried to call a web service which returns an OMElement, I faced the above issue (my Axis2 version is 1.6.1). Below are the steps I took to fix it.

This is the part of the stack trace.

org.apache.axis2.AxisFault: org.apache.axis2.databinding.ADBException: Any type  element type has not been given
    at org.apache.axis2.AxisFault.makeFault(AxisFault.java:430)
    at org.wso2.carbon.bam.presentation.stub.QueryServiceStub.fromOM(QueryServiceStub.java:8908)
    at org.wso2.carbon.bam.presentation.stub.QueryServiceStub.queryColumnFamily(QueryServiceStub.java:800)
    at org.wso2.carbon.bam.clustermonitor.ui.ClusterAdminClient.getClusterStatistics(ClusterAdminClient.java:148)

If you check the schema of the response element in your WSDL (generated by Axis2), it should be similar to this.

<xs:element name="queryColumnFamilyResponse">
    <xs:complexType>
        <xs:sequence>
            <xs:element minOccurs="0" name="return" nillable="true" type="xs:anyType" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

In order to fix the ADB databinding issue you need to change the above schema as follows and regenerate the stub code.

<xs:element name="queryColumnFamilyResponse">
    <xs:complexType>
        <xs:sequence>
            <xs:any processContents="skip"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>

Then ADB will generate code that represents the content of the original message as an OMElement, and this will fix your problem.