Spark 2.0 Under the Hood

Spark 2.0 is a major release of Spark. This release packages structured API improvements (unification of DataFrame and Dataset, and the new SparkSession), MLlib model export, and various updates to the platform libraries. Spark 2.0 is built on Scala 2.11 and supports SQL:2003. It eases development with streamlined APIs, faster and better-optimized code compilation, and smarter runtime intelligence.
Whole-stage code generation improves performance by up to 10x over the existing engine. I/O operations are also optimized through the built-in cache and Parquet storage.

Spark 2.0 continues to support standard SQL, and its new ANSI SQL parser extends its querying capabilities drastically over other libraries. On the API side, Spark 2.0 merges the DataFrame and Dataset APIs.

RDDs, DataFrames & Datasets

RDD was the standard abstraction of Spark. From Spark 2.0, Dataset becomes the new abstraction layer. The RDD API will remain available, but as a low-level API used mostly for runtime and library development.

Dataset is a superset of the DataFrame API, which was released in Spark 1.3. Together, the Dataset and DataFrame APIs bring better performance and flexibility to the platform compared to the RDD API. Dataset will also replace RDD as the abstraction for streaming in future releases. Spark 2.0 supports infinite DataFrames, which lets you run operations such as aggregate and groupBy over the whole (unbounded) data.
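As a small sketch of how the unified APIs feel in practice (run in spark-shell, where the SparkSession is predefined as spark; the Car case class and its fields are made-up examples):

// Hypothetical example for spark-shell
case class Car(name: String, rating: Int)

import spark.implicits._

// A typed Dataset: fields checked at compile time, stored in Tungsten's binary format
val cars = Seq(Car("Polo", 5), Car("Vento", 4)).toDS()

// A DataFrame is just Dataset[Row], so both share one API
val df = cars.toDF()
cars.filter(_.rating > 4).show()   // typed lambda
df.filter("rating > 4").show()     // untyped SQL-style expression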

Spark Session 

Spark 2.0 brings a single point of entry: SparkSession, essentially a combination of SQLContext, HiveContext and, in the future, StreamingContext. All the APIs available on those contexts are available on SparkSession as well. Internally, a SparkSession holds a SparkContext for the actual computation.
When you start the interactive shell (spark-shell), a SparkSession is created automatically and exposed as the variable spark.
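Outside the shell, a SparkSession can be built explicitly. A minimal sketch (the application name and master URL here are arbitrary choices for a local run):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("spark2-demo")
  .master("local[*]")        // run locally using all cores
  .enableHiveSupport()       // optional: brings in HiveContext functionality
  .getOrCreate()

// The underlying SparkContext is still reachable when needed
val sc = spark.sparkContext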

Here is a sample of using Spark SQL. It can read data from various types of input formats such as CSV, JSON, ORC, Avro, Parquet, text, JDBC, tables, etc. Below is an example of reading data from Amazon S3.
val hadoopConf = sc.hadoopConfiguration

hadoopConf.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3n.awsAccessKeyId", "yourKeyid")
hadoopConf.set("fs.s3n.awsSecretAccessKey", "your key")

Set these properties with your Amazon credentials, then read the data via the bucket URL.

val dF = spark.read.json("s3n://bucket-name/*/*/*/*/*/*/*/*.json.gz")
This reads the files in the S3 bucket and allows you to query them on the fly.



The data is read as a DataFrame, and we can print its schema with dF.printSchema().
It runs up to 10x faster than the previous version, Spark 1.6. The registerTempTable method is now deprecated, so we use the new one:

dF.createOrReplaceTempView("logStash")
Once the temporary view is created, queries can be triggered through the SparkSession:


spark.table("logStash")


spark.sql("select Records.eventName,Records.eventTime from logStash").show() 
The process works in the way shown above; the data is returned as a wrapped array (WrappedArray) of rows.

Spark 2.0 Compiler update 

Spark's performance plays a vital role in why many big data developers have started preferring it over existing M/R engines. Spark 2.0 updates its execution layer to reduce the number of CPU cycles spent on I/O operations during process execution. It ships with the second-generation Tungsten engine, which emits optimized bytecode at runtime that collapses an entire query into a single function, eliminating virtual function calls and leveraging CPU registers for intermediate data. This promises around a 10x improvement in performance.
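You can see whole-stage code generation in a query plan from spark-shell: operators that were collapsed into a single generated function are marked with a leading asterisk in the physical plan (a sketch; the billion-row range is just a convenient synthetic input):

// Sum one billion integers; explain() prints the physical plan,
// with whole-stage-codegen'd operators prefixed by "*"
spark.range(1000L * 1000 * 1000)
  .selectExpr("sum(id)")
  .explain()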

Spark Structured Streaming 

Currently the Spark Streaming API provides real-time streaming and data processing. It evolved as the first attempt in the big data space at unifying batch and streaming computation. The first streaming API, called DStream, was introduced in Spark 0.7. It offered highly scalable, fault-tolerant systems with high throughput. Now Spark 2.0 enables structured streaming.

It enables applications to take decisions in real time. The existing system supported just streaming of data; to add more intelligence to real-time processing, Structured Streaming makes it possible to combine business logic with data streaming on the fly. The streaming API now works as a full stack, where developers don't need to depend on external applications to apply logic on the streams.

Structured streaming is an extension of the Dataset/DataFrame API. The newly introduced SparkSession will soon support a streaming context as well. This unification should make adoption easy for existing Spark users, allowing them to leverage their knowledge of the Spark batch API to answer new questions in real time. Key features will include support for event-time based processing, out-of-order/delayed data, sessionization, and tight integration with non-streaming data sources and sinks.
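A minimal sketch of the structured streaming style in spark-shell (the S3 path, field names and schema here are illustrative assumptions, not from a real job):

import org.apache.spark.sql.types._

// Declare the schema of the incoming JSON events
val schema = new StructType()
  .add("eventName", StringType)
  .add("eventTime", TimestampType)

// Treat a directory of JSON files as an infinite DataFrame
val events = spark.readStream.schema(schema).json("s3n://bucket-name/events/")

// The same DataFrame operations apply to the unbounded table
val counts = events.groupBy("eventName").count()

// Continuously print updated counts to the console
counts.writeStream
  .outputMode("complete")
  .format("console")
  .start()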



Graph Database A Gist - NEO4J-Part 2

For the basics of graph databases, please refer to the initial post: Graph Database A Gist - NEO4J.

NEO4J uses the property graph model. To associate various nodes, the graph DB uses relationships; relationships connect nodes.
(Note: relationships are directional in NEO4J; if a direction isn't defined, Neo4j throws an error.)

Relationships are typed as
  1. Uni-directional
  2. Bi-directional
Volkswagen brands Polo
Volkswagen brands Vento
Volkswagen brands Jetta
Relationships must have a direction and a type. Like nodes, they can carry properties as name/value pairs. In the above example, "BRANDS" relates Volkswagen to the various types of cars it produces. There can be many relationships connecting two nodes.

Volkswagen BRANDS Polo RATING  5
Volkswagen BRANDS Vento RATING 4
Skoda BRANDS Rapid RATING 5
The pointing direction shows the traversal path. In general, directions can be incoming or outgoing.
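The examples above could be written in Cypher roughly like this (a sketch; the node labels and property names are illustrative choices, not fixed by Neo4j):

// Create two nodes and a directed, typed relationship carrying a property
CREATE (vw:Manufacturer {name: 'Volkswagen'})
CREATE (polo:Car {name: 'Polo', type: 'Hatchback'})
CREATE (vw)-[:BRANDS {rating: 5}]->(polo)

// Traverse the outgoing BRANDS relationships from Volkswagen
MATCH (m:Manufacturer {name: 'Volkswagen'})-[b:BRANDS]->(car)
RETURN car.name, b.rating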

Monitor application usage statistics with Javamelody


Javamelody is an open source package. It measures various real-time statistics about the operation of a web application, depending on how users exercise it. It monitors the application server, the JVM, data usage and memory management, and analyzes various issues in the developed application. It helps find delayed responses and optimize them. It generates charts for errors in HTTP requests, SQL requests, JVM memory and user sessions. It also monitors CPU execution time per request, with which the developer can tune performance. The package additionally offers a scheduled email of the various statistics to the application admin.
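Getting Javamelody into a web application is mostly a matter of registering its servlet filter. A sketch of the standard setup (jar placement and exact versions vary; this follows the usual web.xml convention): drop javamelody.jar (plus the jrobin jar, used for the charts) into WEB-INF/lib and add the following to web.xml:

<filter>
    <filter-name>javamelody</filter-name>
    <filter-class>net.bull.javamelody.MonitoringFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>javamelody</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
<!-- tracks the HTTP session counts shown in the session reports -->
<listener>
    <listener-class>net.bull.javamelody.SessionListener</listener-class>
</listener>

The monitoring reports are then served at /monitoring under the application's context path.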

HTTP Sessions:

Javamelody tracks the various HTTP sessions that the server handles for the web application in a given time period. It calculates the average number of requests per minute and generates a report of the number of sessions along with their creation times. Javamelody also reports the various attributes a session holds and their properties, and it keeps a list of the various HTTP errors in the application.


Sample chart of HTTP request graph
Javamelody provides many more details about HTTP sessions. Below is a sample that shows various requests at various times.


 

SQL statistics:

Javamelody tracks the various SQL sessions that the database server handles for the web application in a given time period. It calculates the average number of requests per minute for each query and generates a report with the number of sessions and their creation times. Javamelody also reports the various queries a session handles and their properties, and it keeps a list of the various SQL errors in the application.

Sample graph for the list of SQL errors

Sample SQL Statistics graph


Javamelody counts the various SQL errors that occur in a session and generates a graph for them. This data helps us tune the performance of the application.

JSP statistics:

Javamelody generates a list of the various JSP pages displayed in a given session. It calculates the cumulative percentage of time taken to load each page and builds a report of JSP page hits per minute. Javamelody also reports the various attributes a session holds and their properties, and it keeps a list of the errors encountered while loading JSP pages.


Sample JSP pages list.


Sample graph for % of JSP page errors in a day

Sample graph for JSP page hits/minute
In a similar way, Javamelody generates reports for CPU time, memory usage, currently active threads and open JDBC connections.
Sample graph for JVM memory usage



CPU Time for each execution

Graph Database A Gist - NEO4J

 Graph Database - The Comprehensive Landscape  


A graph database, as the name states, is a database that uses graph structures for storing and representing data. It's a NoSQL database which uses graph theory to store, map and query relationships. Since it is NoSQL, all values are stored as key-value (KV) pairs. A graph DB can handle more complex data and relationships, which are baked in at the data-record level.

When we say graph, we know it's not going to be about tables, rows and columns. A graph DB uses nodes, which have properties, labels, etc. What a traditional DB calls a record is represented as a node. Every node has values, which are called properties.

For example 
 Any real world entity can be represented as a node. 
Let's take an example of a CAR

CAR is a node which contains the following properties

Name : Skoda 
Type  : Sedan 

A property can be a String, boolean, array, etc. We cannot have NULL as a property value; instead, we remove the property.

Nodes can be grouped by labels. A label can only be a String; it can't take a numeric or alphanumeric value. Labels conventionally start with an uppercase (camel-case) letter, and they can't have properties tagged to them.

For example 

we have a Label Cars which will have following nodes
Name : Skoda              Name: Chev
Type : Sedan              Type : SUV

All the nodes are tagged to the single label Cars. A node can be connected to any number of labels or can also be left alone :). Similar to an RDBMS, a graph DB can have billions of records stored and grouped by labels.
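The Cars example above could be created in Cypher roughly as follows (a sketch; the label and property names mirror the example, nothing more):

// Two nodes grouped under the single label Cars
CREATE (:Cars {name: 'Skoda', type: 'Sedan'})
CREATE (:Cars {name: 'Chev', type: 'SUV'})

// Retrieve every node tagged with the Cars label
MATCH (c:Cars) RETURN c.name, c.type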

NFC potentiates payments in the near future

In the e-commerce world, NFC sits in the pool alongside many other payment technologies; a stark comparison of NFC payments against all the other types offers a fresh standing point from which to measure and plan your future strategy. Originally, mobile payments and other mobile services, like mobile banking, relied on text messaging to complete transactions. NFC offers the best of both worlds. Smartphones let a customer store multiple credit cards and other payment methods in one device that the customer is likely to carry everywhere already. It cuts out the unnecessary hassle of texting or swiping through menus to make payments, and yet still offers the security of a credit card. By offering a high level of compatibility with different companies and technologies, NFC can evolve into a one-step payment method that works anywhere the customer wants to make a purchase.

Jasper Server installation from the WAR distribution

JasperReports Server can be installed with the ready installer package, which is a very easy task. But if you want to run it on your own application server, you can install it from the WAR distribution.

Pre-installation alert: since Jasper Server uses many AJAX tools for the UI, it requires more runtime memory. The bundled server in the installer package is fine-tuned for best performance, so it's advisable to increase the PermGen space of your server to get the best performance.

Download the following .

1) The Jasper reports server 5.0 war distribution can be downloaded from the following link


2) Database driver. I used MySQL: download the JDBC driver, mysql-connector-java-5.1.23-bin.jar.

3) Application server. Here I used the Tomcat 7.x.x application server.

Set up JAVA_HOME in the environment variable

Setting up Environment variable in windows 


Adding it to the class path 
We can install the server in two ways:
1) Manual installation
2) Automatic installation
To install Jasper from the WAR distribution using the manual buildomatic steps:
1) Extract the WAR distribution to the desired location.
2) Copy the MySQL driver to the following location in the extracted distribution:

                <extracted location>/buildomatic/conf_source/db/mysql/jdbc
3) Copy the mysql_master.properties file from <extracted location>/buildomatic/sample_conf, paste it into
     <extracted location>/buildomatic and rename it to default_master.properties.
4) Edit default_master.properties and change the following database server and application server settings according to your system.

    appServerDir=[path to tomcat application server]
    dbUsername=root
    dbPassword=password
    dbHost=localhost
5) Open the command prompt as Administrator (on Windows) or su (on Linux), change to <extracted location>/buildomatic and run the following commands.
     
js-ant create-js-db
    Creates the JasperReports Server repository database.

js-ant create-sugarcrm-db
js-ant create-foodmart-db
    Create the sample databases.

js-ant load-sugarcrm-db
js-ant load-foodmart-db
js-ant update-foodmart-db
    Load sample data into the sample databases.

Community edition:

js-ant init-js-db-ce
js-ant import-minimal-ce
    Initialize the database and load core application data.

js-ant import-sample-data-ce
    Loads the demos that use the sample data.

js-ant deploy-webapp-ce
    Configures and deploys the WAR file to Tomcat.

Professional edition:

js-ant init-js-db-pro
js-ant import-minimal-pro
    Initialize the database and load core application data.

js-ant import-sample-data-pro
    Loads the demos that use the sample data.

js-ant deploy-webapp-pro
    Configures and deploys the WAR file to Tomcat.

6) Start the DB server and your application server.

Check the Jasper server at the following link


To install Jasper from the WAR distribution using the automatic buildomatic steps, follow the same setup steps as in the manual installation, then use the following commands to install:

cd <extracted location>/buildomatic

js-install.bat (Windows)
./js-install.sh (Linux and Mac OS X)
    Installs JasperReports Server, sample data, and the sample databases (foodmart and sugarcrm).

js-install.bat minimal (Windows)
./js-install.sh minimal (Linux and Mac OS X)
    Installs JasperReports Server, but does not install sample data and sample databases.





Comparing cross platform development tools

Apache Cordova (Phonegap)  VS Appcelerator Titanium 


The prominent leap in the world of development is cross-platform development: code written once can be built and run anywhere. This is a great advantage for both the developer and the business customer. From a business perspective, it saves development cost and the time to release a product on all platforms. In desktop application development, Java played a very important role in platform independence. For mobile development, there are two solutions that provide their own unique approaches to cross-platform development: Phonegap and Titanium Studio.

Phonegap 

Phonegap is an open-source framework which helps you develop cross-platform apps using HTML, JavaScript and CSS. It supports nearly 7 platforms to build apps for, and it supports many tools and frameworks for developing the user interface. One of the best is Sencha, because it gives you the look and feel of a mobile application. Phonegap also allows you to use phone features through its built-in APIs to enhance your application.

Appcelerator  Titanium 

On the other hand, Appcelerator Titanium develops native apps using JavaScript. It doesn't support CSS3 or DOM-based JavaScript, yet even without these it gives good performance. It compiles code against the native iPhone SDK, or the JVM for Android. It also supports more than 15 cloud services to connect your app to.


Comparison between the two


Feature                  Titanium    Phonegap
JavaScript               Yes         Yes
HTML5                    Yes         Yes
CSS3                     No          Yes
DOM-based JS             No          Yes
Native code              Yes         Yes
Native UI performance    Yes         Yes






 

Contributors

Sadagopan K V