Final Report


It's been quite a while since I last blogged about the project status, and now a good summer of coding has come to an end (according to the program timeline). I should thank Mr. Stephen Shaw "decriptor" for being a "kewl" mentor, Mr. Pascal Bleser "yaloki" for mavenizing the code (among other things), and Mr. Bryen Yunashko "suseROCKS" for getting my project selected and finding a mentor for me. (It was fun to have three openSUSE board members involved in the project.)

I would like to report the work done up to the "firm" pencils-down date, 17/08/2009. The code and other things can be accessed at http://code.google.com/p/vaani

As mentioned in the proposal, the software consists primarily of two parts:

*Part 1: The text NLP part - which analyzes the text input and tries to find the common desktop activity that the user might be trying to convey through it.

*Part 2: The speech analyzer part - which converts an audio input to text, and lets the first part complete the rest of the process.

Part 1 (mostly present in the vaani.shabd package) is fairly complete; it currently has the following plugins:

1. Instant message plugin - analyzes the libpurple buddy list information and uses D-Bus to open new chat windows in Pidgin (it can easily be extended to an Empathy plugin).

2. Application plugin - which right now collects information from the .desktop files and tries to find the required application based on the text.

3. Search plugin - this performs searches using the beagle-query command (to be upgraded to use beagle-dbus soon).

The framework is fairly clean, and new plugins can be added easily.
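
For anyone curious about what a plugin looks like, here's a minimal sketch of the general shape such a plugin interface could take. The names (CommandPlugin, canHandle, execute) are made up for illustration and are not the actual interfaces in the vaani code.

// Hypothetical sketch only -- not the real vaani plugin API.
// Each plugin inspects the analyzed command text and, if it recognizes
// the activity, carries out the corresponding desktop action.
public interface CommandPlugin {

    /** Returns true if this plugin thinks the command is meant for it. */
    boolean canHandle(String command);

    /** Performs the desktop action (open a chat, launch an app, run a search...). */
    void execute(String command);
}

A simple dispatcher can then walk over the registered plugins and hand the text to the first one whose canHandle() returns true.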

About the second part (the vaani.swar package): the approach is to have a grammar for each plugin, and the Recognizer then uses all of these grammars to convert speech commands to text. Right now, grammars for the instant message and application plugins are ready; however, the second part isn't functional yet, owing to some problems with grammar compilation by the Sphinx system. Effort is currently being put into making it work ASAP.
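
To give an idea of what such a grammar might contain, here's a tiny JSGF-style grammar sketch for the instant message plugin. It's only an illustration of the general shape - the rule names and buddy names are invented, and the real grammars generated by the project may look quite different.

#JSGF V1.0;
grammar im;

// Hypothetical example -- in practice the buddy rule would be
// generated from the libpurple buddy list.
public <command> = (message | chat with | talk to) <buddy>;
<buddy> = alice | bob | charlie;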

The 0.1 release can be downloaded from here, although checking out from svn would be a better option. Also, we need to package the code soon; currently the best way to hack on it is by opening the project in an IDE (I wrote it in NetBeans). Please try it out - suggestions/contributions/criticism are always welcome.



WikiHome has been setup


Hi, after procrastinating a lot on this, I've finally managed to set up the WikiHome for the project (here's the link: http://code.google.com/p/vaani/wiki/WikiHome).

Again, thanks to my mentor, Stephen Shaw, for the final push yesterday. The wiki is minimal right now, but I hope it'll grow like all wikis do :)

The changelog is on its way too.

Btw, two pieces of good news:

1. NetBeans 6.7 has been released, which promises some good collaborative features.

2. I've set up Google Analytics for http://code.google.com/p/vaani



Using D-Bus in Java


I think one of the most fun things I've got to learn during Summer of Code is D-Bus. I had to learn it to open instant message conversations in the Pidgin module. And I guess I can present a noobish tutorial on doing something simple in Java using D-Bus. Before you start, you can refer to this excellent manual here.

Prerequisites:

1. You need a JVM and JDK (the version I used was OpenJDK 1.6, taken from the standard repository).

2. dbus-java (which requires you to install libmatthew as well).

The task at hand is to make an empty note in Tomboy using a Java program.

Step 1: Declare an interface (extending DBusInterface) containing the function you're looking to call (you can use a tool called d-feet, which helps you browse the various buses, object paths and interface names).


package org.gnome.Tomboy;

import org.freedesktop.dbus.DBusInterface;
import org.freedesktop.dbus.DBusInterfaceName;

/**
 * @author sourcemorph
 */
@DBusInterfaceName("org.gnome.Tomboy.RemoteControl")
public interface RemoteControl extends DBusInterface {

    /** Maps to the CreateNote method on the remote RemoteControl interface. */
    public String CreateNote();
}


[The annotation is important - it specifies the D-Bus interface name you are binding to (the object path is given later, when fetching the remote object). Here I have declared just the one function we need, but you can pick any of the functions d-feet lists for that interface.]

Step 2: Write a main class to get a remote object of this interface type and execute the function.


import org.freedesktop.dbus.DBusConnection;
import org.freedesktop.dbus.exceptions.DBusException;
import org.gnome.Tomboy.RemoteControl;

/**
 * @author sourcemorph
 */
public class NewClass {

    private static final String ObjectPath = "/org/gnome/Tomboy/RemoteControl";
    private static final String ServiceBusName = "org.gnome.Tomboy";
    private static DBusConnection conn;

    public NewClass() {
        try {
            // Connect to the session bus and fetch a proxy for Tomboy's RemoteControl object.
            conn = DBusConnection.getConnection(DBusConnection.SESSION);
            RemoteControl c = (RemoteControl) conn.getRemoteObject(ServiceBusName, ObjectPath);
            c.CreateNote();
        } catch (DBusException ex) {
            ex.printStackTrace();
        } finally {
            if (conn != null) {
                // Without this the connection's worker threads keep the JVM alive.
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        NewClass n = new NewClass();
    }
}


Fairly simple. By the way, that's something really admirable about D-Bus: now my Java code can interact with processes that could have been written in C#, Python, etc. I am going to call methods on the Pidgin interface from Java code, and learning D-Bus was totally worth the effort.

PS: thanks to my mentor Stephen Shaw for patiently referring me to the README files when dbus-java wasn't compiling.. :) and also for d-feet, it's awesome!



Blah blah...


I was reading this book today ("Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition" by Daniel Jurafsky & James H. Martin) and would like to share a few interesting tidbits I gathered from the opening chapter:

1. "... regardless of what people believe or know about the inner workings of computers, they talk about them and interact with them as social entities. People act towards computers as if they were people, they are polite to them, treat them as team members, and expect among other things that computers should be able to understand their needs, and be capable of interacting with them naturally" [doesn't that make my job simpler, this software should only understand polite commands and not the rude, mean ones :P ]

2. ELIZA (probably the first cool NLP application, written back in 1966) actually managed to fool people into believing that it was a Rogerian psychotherapist, simply by rephrasing the sentences they typed in.

"ELIZA's deep relevance to Turing's ideas is that many people who interacted with ELIZA came to believe that it really understood them and their problems. Indeed, Weizenbaum (1976) notes that many of these people continued to believe in ELIZA's abilities even after the program's operation was explained to them."

[check this for a sample conversation with ELIZA : http://www.stanford.edu/group/SHR/4-2/text/dialogues.html]

The future looks all hunky dory, doesn't it...

Btw, my Firefox too is showing some signs of being talkative; the last I heard was

"
(firefox:19328): Gdk-WARNING **: XID collision, trouble ahead

(firefox:19328): Gdk-WARNING **: XID collision, trouble ahead

(firefox:19328): Gdk-WARNING **: XID collision, trouble ahead
"



"Vaani", new project started


Hi, after a week of contemplation (:P), while my laptop was away with the nice service center guys (:D), I finally figured out that "vaani" (means sound in Hindi) seems to be a cool enough title. The project has been hosted at code.google.com/p/vaani, and part of the initial code has been uploaded. You can check it out using svn (though it's in considerably bad shape right now).

The package structure is:

1. sourcemorph.nlp.vaani -- for this project
2. sourcemorph.nlp.shabd -- for natural-language text to bash commands ("shabd" means word in Hindi)
3. sourcemorph.nlp.swar -- for speech to natural-language text ("swar" roughly means voice in Hindi).



Tagging (not parsing)..


Part of Speech (POS) tagging refers to a problem in NLP that requires tagging each word of a sentence in a natural language with an identification mark related to its function in the grammatical structure of the sentence - e.g. NN for noun (singular), VB for verb (base form), etc. The first part of the project, which involves the text-to-bash-command conversion, will make use of POS tagging (I decided to scrap the use of a parser for now because tagging provides enough data).

Here are the results of a POS tagger (CLAWS); however, since it is proprietary I will not be using it (alternatives are the OpenNLP Tools, the Stanford POS Tagger, and LanguageTool).

1. Play all songs by coldplay from album viva la vida and all songs by death cab for cutie (by Prateek Maheshwari)

Play_VV0 all_DB songs_NN2 by_II coldplay_NN1 from_II album_NN1 viva_NN1 la_FU
vida_NN1 and_CC all_DB songs_NN2 by_II death_NN1 cab_NN1 for_IF cutie_NN1

2. Find an application that edits photos. (by Prateek Maheshwari)

Find_VV0 an_AT1 application_NN1 that_CST edits_VVZ photos_NN2 ._.

3. Open bits mail. (by Nunna Jaikish)

Open_JJ bits_NN2 mail_NN1 ._.

4. Find TODO.txt in Home (by Brad Taylor)

Find_VV0 TODO.txt_NP1 in_II Home_NN1

5. Open this website related to the Indian history from the browsing history. (by me).

Open_VV0 this_DD1 website_NN1 related_VVN to_II the_AT Indian_JJ history_NN1
from_II the_AT browsing_NN1 history_NN1 ._.

The tagging is not completely accurate, as the "Open" in example 3 is incorrectly tagged as JJ ("adjective") instead of VV.

However, a few observations from the above examples:

1. The application that the user is trying to mention can be reasonably ascertained from the verbs.

2. The arguments for that command can be mined from the nouns or noun phrases (again reasonably).
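
To make these observations concrete, here's a rough sketch of how the tagged output above could be mined for verbs and nouns. The tag prefixes (VV for verbs, NN for nouns) follow the CLAWS examples shown above, but the class and method names are hypothetical and not taken from the project code.

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a CLAWS-style tagged sentence such as
// "Find_VV0 an_AT1 application_NN1 that_CST edits_VVZ photos_NN2"
// into verbs (candidate actions) and nouns (candidate arguments).
public class TagMiner {

    public static List<String> wordsWithTagPrefix(String tagged, String prefix) {
        List<String> words = new ArrayList<String>();
        for (String token : tagged.split("\\s+")) {
            int sep = token.lastIndexOf('_');
            if (sep < 0) {
                continue;                      // token without a tag
            }
            String word = token.substring(0, sep);
            String tag = token.substring(sep + 1);
            if (tag.startsWith(prefix)) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String tagged = "Find_VV0 an_AT1 application_NN1 that_CST edits_VVZ photos_NN2";
        System.out.println("verbs: " + wordsWithTagPrefix(tagged, "VV")); // [Find, edits]
        System.out.println("nouns: " + wordsWithTagPrefix(tagged, "NN")); // [application, photos]
    }
}

The verbs can then be matched against the actions a plugin knows about, and the nouns become candidate arguments.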



Project Name?


Well, it has started, but it's still not titled... and there has been limited progress in that regard.

Some of the names I had thought over were:

1. Vaani (Hindi, meaning "voice")
2. Shrimp (pointless albeit cool)
3. Psittacula (scientific name for parrot :P )
4. Voice-do (suggested by Stephen, my mentor)

Can anyone suggest a better one?



Volunteers needed!


Hi, as I mentioned in my first post, I need some samples of how people would give commands to their desktops in a natural language - for now, just English. I guess it'll be easier if I give some sort of a questionnaire. In it are a few activities that one usually performs on a desktop; if you were to instruct your computer to do these activities, how would you formulate the sentence? You can take specific instances to create samples - e.g. for the first question you could prepare at least three sample commands: one for playing music by an artist ("Coldplay", for instance), one for an album and one for a genre.

It'll be a most valued contribution, as it will help improve the strategy used in analyzing natural language commands. Please email your results to mohit.verma.in@gmail.com with subject "gsoc help" or post your samples as comments here.

Questionnaire:

1. You want to play music (you may mention the media player or not), based on artist, genre, album etc.

2. You want to install/upgrade a software.

3. You want to initiate an IM conversation with a friend (you may mention the protocol or not, for eg. yahoo chat).

4. You want to locate a file/folder in your home directory.

5. You want to find an application which does a particular task (probably mentioned in its description).

6. You want to browse a website (possibly present in the history or in bookmarks).

(If you feel there are other common desktop activities, then please mention them as well in the sample submitted).



System Design # 1




Here's the basic layout of the system.

The layer will accept input in two forms:

1. Regular text
2. Speech

The speech input will first have to be converted to text
using a speech recognition system called Sphinx. Since this
conversion is usually error prone, the text will be enhanced
using knowledge of the system.

After this, it can be handled in a similar way to regular text input.

In the first phase, a parser will generate a tree and tags for a given user command. For this, the statistical parser written in Java by the Stanford Natural Language Processing Group will be used; it can be checked here.

After this, the analyzer will try to determine the kind of action the user wants to perform, and then the application-specific interpreter will try to find the arguments in the natural language text. For example, if a user wants to play some music, the title, artist, genre, etc. will probably be mentioned in the text and will have to be mined.

At times the system will not be completely sure of the result generated, so the user will be asked to confirm, which will also help improve the accuracy.
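
As a way of summarizing the design, here's how the layers described above could be wired together in code. The interface and class names are invented for illustration only and don't correspond to the actual project classes.

// Hypothetical wiring of the layers described above -- illustration only.
public class PipelineSketch {

    /** Converts an audio command to (possibly noisy) text, e.g. via Sphinx. */
    interface SpeechRecognizer {
        String recognize(byte[] audio);
    }

    /** Decides which kind of action the text asks for and picks an interpreter. */
    interface Analyzer {
        Interpreter chooseInterpreter(String taggedText);
    }

    /** Application-specific part: mines the arguments and performs the action. */
    interface Interpreter {
        void interpret(String taggedText);
    }

    static void handle(String taggedText, Analyzer analyzer) {
        Interpreter interpreter = analyzer.chooseInterpreter(taggedText);
        if (interpreter != null) {
            interpreter.interpret(taggedText);
        } else {
            // The design asks the user when the system is not sure of the result.
            System.out.println("Not sure what you meant -- please confirm or rephrase.");
        }
    }
}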



The summer begins (officially)


Hi all. Hmm, well, through this blog I hope to communicate, to everyone interested, the progress of my still-untitled GSoC project (you could help with the title). The idea is to make a functional "Natural Language + Voice User Interface for openSUSE Desktop" (the abstract can be viewed here).

So here I am with a replaced motherboard and upgraded RAM in my laptop, all gung-ho to start my first dream project.. And at this very moment, I need some help :P

The project is about making a software layer which lets a computer understand a user's commands in a natural language (through text or speech), which is why I'd like some people to send in samples of their natural language commands. To give a clue, if I could give commands to my computer in English, I'd say things like this:

1. "Privately message Nihar Joshi on Google Talk"
2. "Enqueue all the Coldplay songs in the player"

I do have a basic strategy in place to understand such commands: the first step is a parser, after which an application-specific analyzer will convert the command to a bash command (which can be executed directly on a Linux platform). However, since commands in a natural language like English can be of a very wide variety, and my strategy might be suffering from a lack of perspective, I'd like to take inputs from several sources to test it.
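
As a toy illustration of the very last step - running the bash command once it has been generated - something along these lines would do. The command string here is just a made-up stand-in for whatever the analyzer would actually produce; the player and path are not part of the project.

import java.io.IOException;

public class RunGeneratedCommand {
    public static void main(String[] args) throws IOException {
        // Stand-in for whatever command the analyzer generates from the
        // natural-language request; the player and path are made up.
        String generated = "mplayer ~/Music/Coldplay/*";

        // Hand the string to bash so that ~ and globs are expanded.
        new ProcessBuilder("bash", "-c", generated).start();
    }
}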

Imagine that you do have such a system working on your computer, what kind of commands would you like to give to your computer in English? Please don't restrict yourself to my examples and try to think of all routine desktop activities you perform on your computer. Should you prefer anonymity you can mail me your list at mohit.verma.in@gmail.com with subject "GSoC help".

Cheers, and looking forward to your support (support open source, that's what good guys do :P )