Code I/O

uClassify: A JAVA SDK for uClassify’s on-demand text classification web service

Obtain the full source code for uClassify and the test module from: https://github.com/udy/UClassify

Text analysis and mining have been a topic of interest in academia for a long time. With technological improvements, concepts that once existed only in theory can now be offered to the masses on demand and for free. Web services have changed the programming model completely; add to that the ability to churn through large amounts of text in a few seconds and classify it accurately, and the result is a blessing for processing unstructured documents on the fly.

Having investigated various text classification technologies, I must agree that text analysis services have evolved enough to be used in building business-class software. I believe that it is the most powerful area in information technology, and the applications of text analysis and classification are widely known (Ref: http://en.wikipedia.org/wiki/Text_mining). My personal interest is to add some Business Intelligence sense to such classifications; hence there is a need to find quantifiable measures to evaluate the discoveries and to map business use cases onto such powerful technologies. One simple way to fit text analysis into social networking is to classify streaming feeds and eliminate noise according to user preferences.

A few of the freely consumable services dazzled me with their ease of use and constantly improving accuracy, and because they eliminate the need to purchase and support a complex solution.

Web services have moved the text analysis platform from on-premises installations to the cloud. OpenCalais and uClassify [http://lifencode.com/lifencode/technology/textelligence-of-ontology-taxonomy-and-text-mining/] in particular can assist in building metrics based on text analysis and classification without much effort, which in my opinion is a value-add for application providers because it allows slicing and dicing of unstructured information.

A few of the interesting measures that can be used for building metrics are as follows:

  • age: age group the content is relevant for
  • language: language the content is written in
  • mood: happy/sad
  • tone: Business/Personal article
  • topics: This is very broad
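The measures above can be sketched as a small Java enum. This is only an illustration: the `Measure` enum and its `describe()` method are hypothetical names and are not the SDK's own `ServiceType`. The publisher and classifier names are the ones that appear in the sample output later in this post.

```java
import java.util.Locale;

/**
 * Hypothetical enum pairing each measure with a classifier publisher and name.
 * Illustrative only; this is NOT the SDK's ServiceType.
 */
enum Measure {
    AGE("uClassify", "Ageanalyzer"),
    LANGUAGE("uClassify", "Text Language"),
    MOOD("prfekt", "Mood"),
    TONE("prfekt", "Tonality"),
    TOPICS("uClassify", "Topics");

    private final String publisher;
    private final String classifier;

    Measure(String publisher, String classifier) {
        this.publisher = publisher;
        this.classifier = classifier;
    }

    /** Lower-case label matching the bullet names in the list above. */
    String label() {
        return name().toLowerCase(Locale.ROOT);
    }

    /** Human-readable "publisher/classifier" pair for this measure. */
    String describe() {
        return publisher + "/" + classifier;
    }

    public static void main(String[] args) {
        for (Measure m : Measure.values()) {
            System.out.println(m.label() + " -> " + m.describe());
        }
    }
}
```

Keeping the measure-to-classifier mapping in one place means the rest of the code can loop over all measures, which is the same idea the test module below uses when it iterates over `ServiceType.values()`.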

Solution based on observation of patterns while using the uClassify service (RESTful API invocation)

While experimenting with the classification services from uClassify (RESTful; you can also evaluate the XML API if needed), I observed that the APIs return name:value pairs, where the name is the classification term and the value is the numeric percent relevance of the discovery. I’m sure you will agree that such numeric values are essential for analysis and for comparison over time.
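Because every classifier yields the same name:value shape, one helper can serve all of them. Here is a minimal sketch, assuming the pairs have already been parsed into a `Map<String, Double>` of percentages. The `RelevanceFilter` class and its `atLeast` method are hypothetical and not part of the SDK, and the sample values are the Mood figures from the output shown later in this post. It needs Java 8 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of the SDK) for working with name:value results. */
final class RelevanceFilter {

    /** Keeps entries at or above the threshold, ordered from highest to lowest relevance. */
    static Map<String, Double> atLeast(Map<String, Double> results, double thresholdPercent) {
        return results.entrySet().stream()
                .filter(e -> e.getValue() >= thresholdPercent)
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (first, second) -> first,   // keys are unique, so this merge is never used
                        LinkedHashMap::new));       // preserves the sorted order
    }

    public static void main(String[] args) {
        // Sample values copied from the Mood classifier output in this post.
        Map<String, Double> mood = new LinkedHashMap<>();
        mood.put("upset", 25.72);
        mood.put("happy", 74.28);

        atLeast(mood, 25.0).forEach(
                (name, value) -> System.out.printf("%-50s %10.2f%n", name, value));
    }
}
```

Sorting by relevance is a design choice of this sketch; the test module below simply prints matches in whatever order the map returns them.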

What is needed is a Java-based consumption library for uClassify that encapsulates most of the repeated tasks involved in using the APIs. I’m presenting the entire source code under the Apache 2.0 license so that it can be used and extended by the community.

NOTE: uClassify must be used as a complementary service along with other text analysis frameworks to bridge gaps in technologies and business needs.


Here is a simple snippet of the service consumption:


/*
 **
	Copyright 2010 Udaya Kumar (Udy)

	Licensed under the Apache License, Version 2.0 (the "License");
	you may not use this file except in compliance with the License.
	You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
 **
 */
package org.onesun.textmining.uclassify.test;

import java.util.Map;

import org.onesun.textmining.uclassify.ResultHandler;
import org.onesun.textmining.uclassify.ServiceType;
import org.onesun.textmining.uclassify.UClassifyService;

public class UClassifyServiceTest {
	public void doTest(){
		for(ServiceType service : ServiceType.values()){
			String text =
				"A new survey has been launched in the United Kingdom to unearth the true nature of cyber stalking in the country."
				+ "\n"
				+ "The Network for Surviving Stalking has issued an \"Electronic Communication Harassment Observation\" or ECHO questionnaire in collaboration with the scientists at the University of Bedfordshire."
				+ "\n"
				+ "The survey has been commissioned to classify those who have been stalked on web and how according to a number of criteria."
				+ "\n"
				+ "The questionnaire will ask respondents if they were harassed or threatened on a social networking site such as Facebook, Twitter and LinkedIn, email service or Instant Messaging."
				+ "\n"
				+ "\"At the moment there are very few widely agreed guidelines or rules about how to behave online - we hope Echo will define behaviours that are generally experienced as anti-social or likely to cause distress in online communication.\" said Dr. Emma Short, head of the project ECHO."
				+ "\n"
				+ "The survey has been launched after Crown Prosecution Service (CPS) of the UK revealed a set of new guidelines for law enforcers tough on stalkers on web."
				+ "\n"
				+ "Read more: http://www.itproportal.com/security/news/article/2010/9/25/study-reveal-nature-cyberstalking-uk/#ixzz10YckSmCr";

			// *******************************************************************
			// DO NOT FORGET TO SET YOUR OWN KEY HERE BEFORE RUNNING APP
			// You can get a key from: http://www.uclassify.com/Register.aspx
			// *******************************************************************
			UClassifyService.setUClassifyReadAccessKey(null);
			// *******************************************************************

			UClassifyService uClassifyService = new UClassifyService(text, service, new ResultHandler() {

				@Override
				public void process(ServiceType serviceType, Map<String, Double> results) {
					System.out.println(
							"---------------------------------------------------------------------\n"
							+
							serviceType.getUrl() + " <<<>>> " + serviceType.getClassifier() + "\n" +
							"---------------------------------------------------------------------\n"
						);

					// iterate over entries to avoid a second lookup per key
					for(Map.Entry<String, Double> entry : results.entrySet()){
						Double result = entry.getValue();

						// interested in match >= 25%
						if(result >= 25) System.out.format("%1$-50s %2$10.2f\n", entry.getKey(), result);
					}
				}
			});

			try{
				uClassifyService.process();
			}catch(Exception e){
				e.printStackTrace();
			}

		}
	}

	public static void main(String[] args) {
		UClassifyServiceTest miningTest = new UClassifyServiceTest();

		miningTest.doTest();
	}
}

The result produced is as follows: [The snippet was taken from http://www.itproportal.com/security/news/article/2010/9/25/study-reveal-nature-cyberstalking-uk/]

---------------------------------------------------------------------
http://uclassify.com/browse/uClassify <<<>>> Ageanalyzer
---------------------------------------------------------------------

51-65                                                   36.23
---------------------------------------------------------------------
http://uclassify.com/browse/uClassify <<<>>> Text%20Language
---------------------------------------------------------------------

English                                                100.00
---------------------------------------------------------------------
http://www.uclassify.com/browse/prfekt <<<>>> Mood
---------------------------------------------------------------------

upset                                                   25.72
happy                                                   74.28
---------------------------------------------------------------------
http://uclassify.com/browse/prfekt <<<>>> Tonality
---------------------------------------------------------------------

Corporate                                               99.63
---------------------------------------------------------------------
http://uclassify.com/browse/uClassify <<<>>> Topics
---------------------------------------------------------------------

Society                                                 88.75

The percentage matches found will help a lot in pushing the right content to the right audience, thereby improving the quality of the service and bringing the audience back to the service often.
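As a rough illustration of pushing content to the right audience, the sketch below checks a piece of content's topic scores against a reader's preferred topics. It assumes the scores arrive as the same `Map<String, Double>` handed to the result handler. The `FeedFilter` class, its `matches` method, the preference set, and the 50% cut-off are all hypothetical choices made for this example and are not part of the SDK.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical feed filter (not part of the SDK) driven by classification percentages. */
final class FeedFilter {

    /** Returns true when any preferred topic scores at or above minPercent. */
    static boolean matches(Map<String, Double> topicScores,
                           Set<String> preferredTopics,
                           double minPercent) {
        for (Map.Entry<String, Double> entry : topicScores.entrySet()) {
            if (preferredTopics.contains(entry.getKey()) && entry.getValue() >= minPercent) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Topic score copied from the sample output above.
        Map<String, Double> scores = Collections.singletonMap("Society", 88.75);

        // Assumed reader preferences, for illustration only.
        Set<String> preferences = new HashSet<>(Arrays.asList("Society", "Science"));

        System.out.println(matches(scores, preferences, 50.0));
    }
}
```

Content that fails the check would be treated as noise for that reader and left out of the feed.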

Obtain the full source code for UClassifyService and the test module from: https://github.com/udy/UClassify

3 thoughts on “uClassify: A JAVA SDK for uClassify’s on-demand text classification web service”

  1. Hi Udy,

    I just saw your uClassify lib and wanted to ask you if you are ok with me linking your uClassify java repository from http://uclassify.com/ApiDocumentation.aspx (under third party)?

    Good work,

    Jon / uClassify

  2. Thanks, will do in the near future. Also thanks for the good words in your post.
