WHAT I DO
Analyze Data
I love poking around data, finding patterns in it and visualizing it.
Develop New Statistical Methods
I like applying the latest methods when analyzing data, and more than once this has led me to invent new statistical methods.
Build Software
I have programmed in a variety of languages and toolkits, including C, C++, Qt, Java, R, SQL and awk/sed.
Where I've Worked
These are the places I've worked!
Graduate Student in Statistics
August 2013 - May 2016
My principal objective in coming to Purdue was to attain a greater breadth of knowledge than I had before.
To this end, I spent my time at Purdue taking some of the most challenging and cutting-edge courses offered, both in the Department of Statistics and in related areas such as Computer Science and Industrial Engineering. In the process, I've developed an unusual breadth of knowledge and a unique set of skills.
Further, as a graduate student, I was required to teach undergraduates, and I'm proud to say that I consistently got some of the best teaching evaluations.
Researcher, Applied Statistics and Computing Lab
May 2010 - June 2013
The Indian School of Business is regularly ranked among the top five B-schools in India and is currently ranked 33rd in the world.
My initial assignment at ISB was to help establish the Applied Statistics and Computing Lab. The Lab functioned partly as a statistical consulting service and partly as a center of research, and I contributed equally to both areas.
I've consulted on a variety of projects, from purely academic ones involving Faculty research to working with Admissions on both the design and the analysis of surveys.
I submitted the research I undertook to one of the top journals in Statistics, and it also got me invited to one of the world's top conferences in Computational Statistics.
EDUCATION
And these are the places I got my degrees from!
Masters in Statistics
August 2008 - May 2010
The University of Hyderabad is one of the highest ranked universities in India.
My GPA on graduation (10-point scale) was 9.1, the highest across several batches before and after mine.
Bachelors in Mathematics, Statistics and Computer Science
April 2005 - March 2008
Bhavan's is one of the best colleges affiliated to Osmania University. I obtained distinction in all my subjects of study.
RESEARCH
This is some of my original work in Statistics and Machine Learning!
Context Driven Exploratory Projection Pursuit
March 2011 - June 2012
The first problem that one encounters with multivariate data is that one can no longer "see" what's going on. So one must somehow reduce the data dimension. For instance, one common way to do this is principal components.
The problem with the usual multivariate visualizations is that they are "cut and dried" methods: given a dataset, they are deterministic and will always result in the same visualization.
But in real data analysis, the analyst might have different goals at different times. For instance, the analyst might want to explore the data with a view to ultimately running a classifier, or the ultimate goal might be comparison to an older dataset. But there is no way to make our visualizations aware of our changing goals.
cepp is a novel idea, where we allow the analyst to include the context of the data analysis in the dimension reduction process. Using cepp, one can include background information about the data analysis in the process of creating visualizations, so that different visualizations, each better suited to the task at hand, will result.
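To make the "cut and dried" point concrete, here is a minimal sketch of a principal components projection in plain Python. It is not the cepp algorithm. The function names, the power iteration used to find the leading direction, and the toy data are all assumptions made for illustration. The point is only that nothing in the computation depends on the analyst's goal, so the same data always yield the same picture.

```python
import math


def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the sample covariance matrix, by power iteration."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centred = [[r[j] - means[j] for j in range(dim)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    vec = [1.0] * dim  # fixed start, so repeated runs give identical results
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            # Happens when the start vector has no component along the data.
            raise ValueError("power iteration failed from this starting vector")
        vec = [x / norm for x in nxt]
    return means, vec


def project_onto_first_component(rows):
    """One-dimensional scores of every row along the first component."""
    means, vec = first_principal_component(rows)
    return [sum((r[j] - means[j]) * vec[j] for j in range(len(vec))) for r in rows]


# Toy data lying exactly on the line y = 2x (an assumption for the example).
toy = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(project_onto_first_component(toy))
```

Running the projection twice on `toy` gives the same scores both times, whatever the analyst intends to do with them next.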
https://www.academia.edu/1900062/Context_Driven_Exploratory_Projection_Pursuit
Drawing Interactively for Visual Exploration
February - September 2012
DIVE is a spin-off project from cepp. cepp allows us to change the visualizations according to our data analysis needs. Where do we go from there?
One thing that suggests itself is to use the same algorithm as part of an interactive software tool.
In DIVE, the analyst can actually "draw" point configurations that are of interest. Thus, for instance, if there are two classes in the data which are presently mixed, the analyst may draw them so that the two classes separate.
The software then tries to find data projections that conform as best as possible to the user's request. This backend is really nothing more than a modification of the basic cepp algorithm.
This software was the subject of a talk I gave at the COMPSTAT Conference in 2012.
However, to actually build high performance software that can accomplish this task in real time is not easy, and work is still underway.
https://www.academia.edu/2519457/Exploring_Multivariate_Data_via_the_DIVE_system
Applying Dempster-Shafer to Machine Learning
August - December, 2014
This is some of my most fundamental work, although it is entirely theoretical. Dempster-Shafer theory is an extension of probability theory, and while studying it, I had the idea of using it as a basis for Machine Learning.
In particular, I proposed one way to represent any machine learning computation as a Dempster-Shafer set (basic probability assignment).
From this theoretical basis, one can understand better why machine learning works, and use this understanding to extend existing algorithms and build new ones.
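For readers unfamiliar with the term, the sketch below shows what a basic probability assignment looks like and how two of them are merged with Dempster's rule of combination. This is the textbook rule only. It does not reproduce the representation of machine learning computations proposed in this project, and the function names and the two-hypothesis example are assumptions made for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule for two basic probability assignments.

    Each assignment maps a frozenset of hypotheses to its mass,
    and the masses of one assignment sum to 1.
    """
    joint = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:  # products that land on the empty set are the conflict
            joint[common] = joint.get(common, 0.0) + wa * wb
    total = sum(joint.values())  # equals 1 minus the conflicting mass
    if total == 0.0:
        raise ValueError("the two sources are in total conflict")
    return {s: w / total for s, w in joint.items()}


def belief(m, hypothesis):
    """Total mass committed to subsets of `hypothesis`."""
    return sum(w for s, w in m.items() if s <= hypothesis)


# Two sources of evidence about a two-element frame {a, b} (made-up numbers).
source_1 = {frozenset("a"): 0.6, frozenset("ab"): 0.4}
source_2 = {frozenset("b"): 0.5, frozenset("ab"): 0.5}
combined = dempster_combine(source_1, source_2)
print(combined)
```

Mass placed on the whole frame `{a, b}` expresses ignorance, which is what distinguishes a basic probability assignment from an ordinary probability distribution.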
Talk 1:
https://www.academia.edu/21936241/k-Nearest_Neighbour_Classification_using_Dempster-Shafer_Theory
Talk 2:
https://www.academia.edu/21936788/Applying_Dempster-Shafer_Theory_to_Machine_Learning
Multiclass Visualizable Classification Using Projections
March - August 2015
The MuViCP classifier grew out of the DSML project. The fundamental question I started from was whether new machine learning algorithms could be built using Dempster-Shafer theory. Being able to do so indirectly validates our theoretical insights.
The MuViCP classifier works in three steps. In the first step, it projects high-dimensional data into two-dimensional space using random matrices. In the second step, a very simple classifier is run on each projection. In the final step, we combine these classifications using Dempster-Shafer theory.
The classifier works extremely well when tried on several real data sets.
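The three steps can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the MuViCP implementation: the Gaussian random matrices, the nearest-centroid rule, the inverse-distance masses and every function name are stand-ins chosen for brevity. Because each projection here puts mass only on single classes, Dempster's rule reduces to a normalised product of the masses.

```python
import math
import random


def random_projection(dim, rng):
    """Step 1: a 2 x dim matrix with independent standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2)]


def project(matrix, point):
    """Map a dim-dimensional point to the plane."""
    return tuple(sum(w * x for w, x in zip(row, point)) for row in matrix)


def centroids(points, labels):
    """Mean of the projected training points of each class."""
    sums, counts = {}, {}
    for (u, v), lab in zip(points, labels):
        su, sv = sums.get(lab, (0.0, 0.0))
        sums[lab] = (su + u, sv + v)
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: (su / counts[lab], sv / counts[lab])
            for lab, (su, sv) in sums.items()}


def singleton_masses(point, cents):
    """Step 2: masses on single classes, larger for nearer centroids."""
    weights = {lab: 1.0 / (1.0 + math.dist(point, c)) for lab, c in cents.items()}
    total = sum(weights.values())
    return {lab: w / total for lab, w in weights.items()}


def combine_singleton_masses(m1, m2):
    """Step 3: Dempster's rule when all mass sits on single classes."""
    prod = {lab: m1[lab] * m2[lab] for lab in m1}
    total = sum(prod.values())
    return {lab: p / total for lab, p in prod.items()}


def classify(train_x, train_y, query, n_projections=25, seed=0):
    """Return the most believed class and the combined masses."""
    rng = random.Random(seed)
    combined = None
    for _ in range(n_projections):
        mat = random_projection(len(train_x[0]), rng)
        cents = centroids([project(mat, x) for x in train_x], train_y)
        m = singleton_masses(project(mat, query), cents)
        combined = m if combined is None else combine_singleton_masses(combined, m)
    return max(combined, key=combined.get), combined


# Made-up training data: two well separated classes in four dimensions.
train_x = [(0, 0, 0, 0), (1, 0, 0, 0), (-1, 0, 0, 0),
           (10, 10, 10, 10), (11, 10, 10, 10), (9, 10, 10, 10)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(classify(train_x, train_y, (0, 0, 0, 0))[0])
```

Each individual projection is a weak, easily plotted classifier; the combination step is where the evidence from many such views accumulates.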
Software
Some of my software!
cepp
Ver 1.7 / Release / Jan 30, 2016
This package implements the cepp algorithm in the R language. It can also be used to compute the spatial quantiles of Chaudhuri (1996). The package also implements the projection pursuit algorithms of Perisic and Posse (2005).
MuViCP
Ver 1.3.2 / Beta / Feb 22, 2016
This package implements the two main phases of the MuViCP classifier - the belief builder functions and the ensemble function.
It also implements the Dempster-Shafer Calculus for atomic sets in the R language.
(Some belief builder functions are yet to be implemented; hence the Beta status.)
Qt Graphics System
Ver 0.6 / Alpha
The Java language, and more broadly the JVM Platform, is slowly emerging as the data analysis platform of the future. However, it is quite surprising that high-performance graphics libraries for data visualization are still lacking on the Java Platform. This is a serious impediment for the data analyst.
In this project, I proposed and built such a high-performance graphics system that renders data using the well-known Qt graphics library. Proposed features of the graphics system include plotting in layers, selectable points and other graphics elements, and programmable widget controls.
The software is still under active development.
Slides from the Talk:
Other Projects
This is some other interesting stuff that I've worked on.
A Spam Filter
December 2013
Most modern spam filters work by first reading all the emails, from which a machine representation of the contents is created.
The problem with this is that to build the machine representations, the machine must actually read all of the emails at some point. This is obviously a violation of privacy.
In my project, I argued in favor of a spam filter that preserves the user's privacy in every aspect. The beauty of my spam filter is that it never needs to see the emails themselves at any point, while still showing promising accuracy.
https://www.academia.edu/21594485/Email_Classification
RADAR Imaging
April 2014
One is used to thinking about data as a list of numbers, but in the modern age, data can take on a variety of forms. For instance, some geo-spatial satellites use radar to build images of the data. The waveform received back at the satellite is one example of data that occurs naturally as a function.
Now radar waveforms from each terrain type (lake, river, forest, etc.) would presumably be different. Is it possible to cluster these waveforms so that we can put lakes together with lakes (say) without human intervention?
In this talk, I reviewed a novel technique for such clustering that requires almost no assumptions.
https://www.academia.edu/21595360/Nonparametric_Clustering_of_Functional_Data
Reinforcement Learning
April 2014
Reinforcement Learning is a very broad class of optimization methods. These methods are often used on problems that are very high-dimensional or infinite-dimensional.
Partalas, Tsoumakas and Vlahavas (2009) propose using reinforcement learning to "prune" an ensemble of classifiers.
In this talk and paper, I review what classifier ensembles are, why pruning them is useful, and how reinforcement learning fits into the picture.
There is a final speculative part in the paper about using the same technology to build neighborhood point maps in high dimensional space.
Talk:
https://www.academia.edu/21785861/Ensemble_Pruning_Using_Reinforcement_Learning
Paper:
Things
Some other things I've done
French
I can speak and understand French. I am certified up to the A2 level (Basic Proficiency).
Software Foundations
I took CS 565: Programming Languages, a course on Software Foundations taught by Tiark Rompf, one of the designers of the Scala programming language.
Undergraduate Teaching
Every semester that I've done classroom teaching, I've received some of the best reviews.
Graduate Teaching
Except for my first semester, I've consistently been appointed TA for the department's core computational statistics course.
Copyright 2016