
Final Report


It's been quite a while since I last blogged about the project status, and now a good summer of coding has come to an end (according to the program timeline). I should thank Mr. Stephen Shaw "decriptor" for being a "kewl" mentor, Mr. Pascal Bleser "yaloki" for mavenizing the code, among other things, and Mr. Bryen Yunashko "suseROCKS" for getting my project selected and finding a mentor for me. (It was fun to have three openSUSE board members involved in the project.)

I would like to report the work done up to the "firm" pen (or pencil) down date - 17/08/2009. The code and other things can be accessed at http://code.google.com/p/vaani

As mentioned in the proposal, the software consists primarily of two parts-

*Part 1: The text NLP part - which analyzes text inputs and tries to find common desktop activities that the user might be trying to convey through it.

*Part 2: The speech analyzer part - which converts an audio input to text, and lets the first part complete the rest of the process.

Part 1 (mostly present in the vaani.shabd package) is fairly complete. Currently it has the following plugins -

1. Instant message plugin - analyzes purple buddy list information, and uses dbus to open new chat windows in Pidgin (an Empathy plugin can be extended easily).

2. Application plugin - right now this collects information from the .desktop files, and tries to find the required application based on the text.

3. Search plugin - this performs searches using the beagle-query command (to be upgraded to use beagle-dbus soon).

The framework is fairly clean, and new plugins can be added easily.
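
To give a rough idea of what "adding a plugin" could look like, here is a minimal sketch. The names CommandPlugin, ChatStubPlugin, SearchStubPlugin and PluginRegistry are made up for illustration and are not the actual classes in vaani.shabd; the scoring rule (a simple prefix match) is also just an assumption to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical plugin contract, NOT the real vaani.shabd API.
interface CommandPlugin {
    String name();

    // Returns a confidence in [0, 1] that this plugin can handle the text.
    double score(String text);
}

// Toy plugin that claims any input starting with "chat with ".
class ChatStubPlugin implements CommandPlugin {
    public String name() { return "instant-message"; }

    public double score(String text) {
        return text.toLowerCase(Locale.ROOT).startsWith("chat with ") ? 0.9 : 0.0;
    }
}

// Toy plugin that claims any input starting with "find " or "search ".
class SearchStubPlugin implements CommandPlugin {
    public String name() { return "search"; }

    public double score(String text) {
        String t = text.toLowerCase(Locale.ROOT);
        return (t.startsWith("find ") || t.startsWith("search ")) ? 0.8 : 0.0;
    }
}

class PluginRegistry {
    private final List<CommandPlugin> plugins = new ArrayList<>();

    void register(CommandPlugin p) { plugins.add(p); }

    // Picks the plugin with the highest score, or null if none scores above zero.
    CommandPlugin best(String text) {
        CommandPlugin winner = null;
        double top = 0.0;
        for (CommandPlugin p : plugins) {
            double s = p.score(text);
            if (s > top) { top = s; winner = p; }
        }
        return winner;
    }

    public static void main(String[] args) {
        PluginRegistry registry = new PluginRegistry();
        registry.register(new ChatStubPlugin());
        registry.register(new SearchStubPlugin());
        CommandPlugin p = registry.best("find TODO.txt in Home");
        System.out.println(p == null ? "no plugin" : p.name());
    }
}
```

With a contract like this, a new plugin is just one more class passed to register(); the rest of the pipeline does not need to change.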

About the 2nd part (vaani.swar package), the approach was to have a grammar for each plugin, and then the Recognizer would use all of these grammars to convert speech commands to text. Right now, grammars for the instant message and application plugins are ready. However, the 2nd part isn't functional yet, owing to some problems with grammar compilation by the Sphinx system. Effort is currently being put into making it work asap.

The 0.1 release can be downloaded from here, although checking out from svn would be a better option. Also, we need to package the code soon; currently the best way to hack on it is by opening the project in an IDE (I wrote it in NetBeans). Please try it, suggestions/contributions/criticism are always welcome.
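
As an illustration of the "one grammar per plugin" idea, here is a small sketch that generates a grammar for the instant message plugin from a list of buddy names. It assumes the grammars are written in JSGF, a format Sphinx-4 can read; this post does not say which format the project actually uses. The class GrammarSketch, the rule name <command> and the buddy names are all made up for the example.

```java
import java.util.Arrays;
import java.util.List;

class GrammarSketch {

    // Builds a JSGF grammar with a single public rule of the form
    // "<prefix> ( alt1 | alt2 | ... )".
    static String build(String grammarName, String prefix, List<String> alternatives) {
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n");
        sb.append("grammar ").append(grammarName).append(";\n");
        sb.append("public <command> = ").append(prefix).append(" ( ");
        sb.append(String.join(" | ", alternatives));
        sb.append(" );\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical buddy names; a real plugin would read them from the buddy list.
        List<String> buddies = Arrays.asList("stephen", "pascal", "bryen");
        System.out.print(build("im", "chat with", buddies));
    }
}
```

The point of generating the grammar is that the recognizer only has to choose between phrases the plugin can actually act on, which is a much easier problem than free dictation.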



WikiHome has been set up

Hi, after procrastinating a lot on this, I've finally managed to set up the WikiHome for the project (here's the link: http://code.google.com/p/vaani/wiki/WikiHome).

Again, thanks to my mentor, Stephen Shaw, for the final push yesterday. The wiki is minimal now, but I hope it'll grow like all wikis do :)

The changelog is on its way too.

Btw, two pieces of good news-

1. NetBeans 6.7 has been released, which promises some good collaborative features.

2. I've set up Google Analytics for http://code.google.com/p/vaani



Using D-Bus in Java

I think one of the most fun things I've learned during Summer of Code is D-Bus. I had to learn it to open instant message conversations in the Pidgin module. And I guess I can present a noobish tutorial to do something simple in Java using D-Bus. Before you start, you can refer to this excellent manual here.

Pre-requisites-

1. a JVM and a JDK (the version I used was OpenJDK 1.6 taken from the standard repository)

2. dbus-java (which requires you to install libmatthew as well).

The task at hand is to make an empty note in Tomboy using a Java program.

Step-1 Declare a Java interface that extends DBusInterface and lists the functions you're looking to call (you can use a tool called d-feet, which can help you explore the various buses, object paths and interface names)


package org.gnome.Tomboy;

import org.freedesktop.dbus.DBusInterface;
import org.freedesktop.dbus.DBusInterfaceName;

/**
 * Java view of Tomboy's remote control interface.
 *
 * @author sourcemorph
 */
@DBusInterfaceName("org.gnome.Tomboy.RemoteControl")
public interface RemoteControl extends DBusInterface {

    public String CreateNote();
}


[The annotation is important: it must give the D-Bus interface name of the interface you're describing (the object path is supplied later, when you fetch the remote object). Here I have declared just the one function that we need, but you can choose from the available functions listed by d-feet.]

Step-2 Write a main class to get an object of this interface type and execute the function.


import org.freedesktop.dbus.DBusConnection;
import org.freedesktop.dbus.exceptions.DBusException;
import org.gnome.Tomboy.RemoteControl;

/**
 * Asks a running Tomboy instance to create an empty note.
 *
 * @author sourcemorph
 */
public class NewClass {

    private static final String ObjectPath = "/org/gnome/Tomboy/RemoteControl";
    private static final String ServiceBusName = "org.gnome.Tomboy";

    public static void main(String[] args) {
        DBusConnection conn = null;
        try {
            // Connect to the per-user session bus, where Tomboy registers itself.
            conn = DBusConnection.getConnection(DBusConnection.SESSION);
            // Get a local proxy for the remote object and cast it to our interface.
            RemoteControl c = (RemoteControl) conn.getRemoteObject(ServiceBusName, ObjectPath);
            // The call travels over D-Bus and is executed inside Tomboy.
            c.CreateNote();
        } catch (DBusException ex) {
            ex.printStackTrace();
        } finally {
            // Release the connection so that the program can exit cleanly.
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}


Fairly simple. By the way, this is something really admirable about D-Bus: my Java code can now interact with processes that could have been written in C#, Python etc. I am going to call methods on the Pidgin interface through Java code, and learning D-Bus was totally worth the effort.

PS: thanks to my mentor Stephen Shaw for patiently referring me to the README files when dbus-java wasn't compiling.. :) and also for d-feet, it's awesome!



Blah blah...

I was reading this book today ("Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition" by Daniel Jurafsky & James H. Martin) and would like to share a few interesting tidbits I gathered from the opening chapter..

1. "... regardless of what people believe or know about the inner workings of computers, they talk about them and interact with them as social entities. People act towards computers as if they were people, they are polite to them, treat them as team members, and expect among other things that computers should be able to understand their needs, and be capable of interacting with them naturally" [doesn't that make my job simpler, this software should only understand polite commands and not the rude, mean ones :P ]

2. ELIZA (probably the first cool NLP application, written back in 1966) actually managed to fool people into believing that it was a Rogerian psychotherapist by simply rephrasing the sentences they typed in.

"ELIZA's deep relevance to Turing's ideas is that many people who interacted with ELIZA came to believe that it really understood them and their problems. Indeed, Weizenbaum (1976) notes that many of these people continued to believe in ELIZA's abilities even after the program's operation was explained to them."

[check this for a sample conversation with ELIZA : http://www.stanford.edu/group/SHR/4-2/text/dialogues.html]
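
To show how little is needed for the "rephrasing" trick, here is a toy sketch. It is not Weizenbaum's actual script; the class ElizaSketch, the four word swaps and the fixed "Why do you say ...?" template are assumptions made just for this illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ElizaSketch {

    // First-person words and their second-person replacements.
    private static final Map<String, String> SWAPS = new HashMap<>();
    static {
        SWAPS.put("i", "you");
        SWAPS.put("me", "you");
        SWAPS.put("my", "your");
        SWAPS.put("am", "are");
    }

    // Turns the user's statement back into a question by swapping pronouns.
    static String rephrase(String sentence) {
        // Lowercase the input and drop everything except letters, apostrophes and spaces.
        String cleaned = sentence.toLowerCase(Locale.ROOT).replaceAll("[^a-z' ]", "").trim();
        StringBuilder out = new StringBuilder();
        for (String word : cleaned.split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            String swapped = SWAPS.get(word);
            out.append(swapped != null ? swapped : word);
        }
        return "Why do you say " + out + "?";
    }

    public static void main(String[] args) {
        System.out.println(rephrase("I am unhappy with my job."));
    }
}
```

There is no understanding anywhere in this code, which is exactly what makes the reactions described above so striking.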

The future looks all hunky dory, doesn't it...

Btw, my Firefox too is showing some signs of being talkative. The last I heard was

"
(firefox:19328): Gdk-WARNING **: XID collision, trouble ahead

(firefox:19328): Gdk-WARNING **: XID collision, trouble ahead

(firefox:19328): Gdk-WARNING **: XID collision, trouble ahead
"



"Vaani", new project started

Hi, after a week of contemplation (:P), when my laptop was away with the nice service center guys (:D), I finally figured out that "vaani" (means sound in Hindi) seems to be a cool enough title. The project has been hosted at code.google.com/p/vaani, and part of the initial code has been uploaded. You can check it out using svn (though it's in considerably bad shape right now).

The package structure is--

1. sourcemorph.nlp.vaani -- for this project
2. sourcemorph.nlp.shabd -- for nl text to bash command ("shabd" means word in Hindi)
3. sourcemorph.nlp.swar -- for speech to nl text ("swar" roughly means voice in Hindi).



Tagging (not parsing)..

Part of Speech (POS) tagging is an NLP problem in which each word of a natural language sentence is tagged with an identification mark related to its function in the grammatical structure of the sentence. For example, NN for a singular noun, VB for the base form of a verb, etc. The first part of the project, which involves the text to bash command conversion, will make use of POS tagging (I have decided to scrap the use of a parser for now because tagging will provide enough data).

Here are the results of a POS tagger (CLAWS). However, since it is proprietary, I will not be using it (alternatives are OpenNLP Tools, the Stanford University POS Tagger and LanguageTool).

1. Play all songs by coldplay from album viva la vida and all songs by death cab for cutie (by Prateek Maheshwari)

Play_VV0 all_DB songs_NN2 by_II coldplay_NN1 from_II album_NN1 viva_NN1 la_FU
vida_NN1 and_CC all_DB songs_NN2 by_II death_NN1 cab_NN1 for_IF cutie_NN1

2. Find an application that edits photos. (by Prateek Maheshwari)

Find_VV0 an_AT1 application_NN1 that_CST edits_VVZ photos_NN2 ._.

3. Open bits mail. (by Nunna Jaikish)

Open_JJ bits_NN2 mail_NN1 ._.

4. Find TODO.txt in Home (by Brad Taylor)

Find_VV0 TODO.txt_NP1 in_II Home_NN1

5. Open this website related to the Indian history from the browsing history. (by me).

Open_VV0 this_DD1 website_NN1 related_VVN to_II the_AT Indian_JJ history_NN1
from_II the_AT browsing_NN1 history_NN1 ._.

The tagging is not completely accurate: the "Open" in example 3 is incorrectly tagged as JJ ("adjective") instead of VV0 (the tag the other examples use for a verb in its base form).

However, a few observations can be made from the above examples-

1. The application that the user is trying to mention can be reasonably ascertained from the verbs.

2. The arguments for that command can be mined from the nouns or noun phrases (again reasonably).
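
These two observations can be turned into a rough sketch. The code below takes the tagger output in the word_TAG form shown above and collects the words whose tags start with a given prefix. The class TagSketch and its method are made up for illustration, and treating "VV" as verbs and "NN"/"NP" as nouns is an assumption based only on the tags that appear in these five examples.

```java
import java.util.ArrayList;
import java.util.List;

class TagSketch {

    // Returns the words whose tag starts with any of the given prefixes,
    // in the order they appear in the tagged sentence.
    static List<String> wordsWithTagPrefix(String tagged, String... prefixes) {
        List<String> result = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            // Split on the LAST underscore so that words containing '_' survive.
            int sep = token.lastIndexOf('_');
            if (sep <= 0) {
                continue; // no word part, skip
            }
            String word = token.substring(0, sep);
            String tag = token.substring(sep + 1);
            for (String prefix : prefixes) {
                if (tag.startsWith(prefix)) {
                    result.add(word);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String tagged = "Find_VV0 an_AT1 application_NN1 that_CST edits_VVZ photos_NN2 ._.";
        System.out.println("verbs: " + wordsWithTagPrefix(tagged, "VV"));
        System.out.println("nouns: " + wordsWithTagPrefix(tagged, "NN", "NP"));
    }
}
```

For example 2 this prints the verbs [Find, edits] and the nouns [application, photos]. Note that "edits" shows up as a verb too, so picking the command from the verbs still needs some rule, such as preferring the first one.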



Project Name?

Well, it has started, but it's still not titled.. and there has been limited progress in that regard--

A few of the names I had thought of were--

1. Vaani (Hindi, meaning "Voice")
2. Shrimp (pointless albeit cool)
3. Psittacula (scientific name for parrot :P )
4. Voice-do (suggested by Stephen, my mentor)

Can anyone suggest a better one?