• WHAT I DO

    Analyze Data

    I love poking around in data, finding patterns in it, and visualizing it.

    Develop New Statistical Methods

    I like applying the latest cutting-edge methods when analyzing data, and more than once this has led me to invent new statistical methods.

    Build Software

    I can program (and have programmed) in a variety of languages and toolkits, including C, C++, Qt, Java, R, SQL and awk/sed.

  • Where I've Worked

    These are the places I've worked!

    Graduate Student in Statistics

    August 2013 - May 2016

    My principal objective in coming to Purdue was to attain a greater breadth of knowledge than I had before.

     

    To this end, I spent my time at Purdue taking some of the most challenging and cutting-edge courses offered in the Department of Statistics, as well as in related areas such as Computer Science and Industrial Engineering. In the process, I developed an unusually broad base of knowledge and a distinctive set of skills.

     

    Further, as a graduate student, I was required to teach undergraduates, and I'm proud to say that I consistently got some of the best teaching evaluations.

    Researcher, Applied Statistics and Computing Lab

    May 2010 - June 2013

    The Indian School of Business is regularly ranked among the top five B-schools in India and is currently ranked 33rd in the world.

     

    My initial assignment at ISB was to help establish the Applied Statistics and Computing Lab. The Lab functioned partly as a statistical consulting service and partly as a center of research, and I contributed equally to both areas.

     

    I consulted on a variety of projects, from purely academic ones involving faculty research to the design and analysis of surveys for Admissions.

     

    I submitted the research I undertook to one of the top journals in Statistics, and it also earned me an invitation to one of the world's top conferences in Computational Statistics.

  • EDUCATION

    And these are the places I got my degrees from!

    Masters in Statistics

    August 2008 - May 2010

    The University of Hyderabad is one of the highest ranked universities in India.

     

    My GPA on graduation (10-point scale) was 9.1, the highest across several batches before and after mine.

    Bachelors in Mathematics, Statistics and Computer Science

    April 2005 - March 2008

    Bhavan's is one of the best colleges affiliated to Osmania University. I obtained distinction in all my subjects of study.

  • RESEARCH

    This is some of my original work in Statistics and Machine Learning!

    Context Driven Exploratory Projection Pursuit

    March 2011 - June 2012

    The first problem one encounters with multivariate data is that one can no longer "see" what's going on, so the dimension of the data must somehow be reduced. One common way to do this is principal component analysis.

     

    The problem with the usual multivariate visualizations is that they are "cut and dried" methods: given a dataset, they are deterministic and will always produce the same visualization.

     

    But in real data analysis, the analyst might have different goals at different times. For instance, the analyst might want to explore the data with a view to ultimately running a classifier, or the ultimate goal might be comparison with an older dataset. The usual methods offer no way to make our visualizations aware of these changing goals.

     

    cepp is a novel idea: it allows the analyst to include the context of the data analysis in the dimension reduction process. Using cepp, one can feed background information about the analysis into the process of creating visualizations, so that the resulting visualizations are better suited to the task at hand.

     

    https://www.academia.edu/1900062/Context_Driven_Exploratory_Projection_Pursuit

    Drawing Interactively for Visual Exploration

    February - September 2012

    DIVE is a spin-off project from cepp. cepp allows us to change the visualizations according to our data analysis needs. Where do we go from there?

     

    One thing that suggests itself is to use the same algorithm as part of an interactive software tool.

     

    In DIVE, the analyst can actually "draw" the point configurations that are of interest. Thus, for instance, if the data contain two classes that are presently mixed, the analyst may draw them so that the two classes separate.

     

    The software then tries to find data projections that conform as closely as possible to the user's request. This backend is really nothing more than a modification of the basic cepp algorithm.

     

    This software was the subject of a talk I gave at the COMPSTAT Conference in 2012.

     

    However, building high-performance software that can accomplish this task in real time is not easy, and work is still underway.

     

    https://www.academia.edu/2519457/Exploring_Multivariate_Data_via_the_DIVE_system

    Applying Dempster-Shafer to Machine Learning

    August - December 2014

    This is some of my most fundamental work, although it is entirely theoretical. Dempster-Shafer theory is an extension of probability theory, and while studying it I had the idea of using it as a basis for Machine Learning.

     

    In particular, I proposed one way to represent any machine learning computation as a Dempster-Shafer set (basic probability assignment).
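
    For readers unfamiliar with basic probability assignments, the following is a minimal sketch of how two of them are merged with Dempster's rule of combination. It is a generic illustration in Python, not the representation proposed in the talks; the function names and the spam/ham example are assumptions made purely for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two basic probability assignments.

    Each assignment maps a frozenset (a subset of the frame of discernment)
    to its mass; the masses of one assignment sum to 1.
    """
    joint = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            joint[common] = joint.get(common, 0.0) + mass_a * mass_b
        else:
            # Mass that would land on the empty set is treated as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalize so the combined masses sum to 1 again.
    return {s: v / (1.0 - conflict) for s, v in joint.items()}


def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(v for s, v in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Total mass on sets that do not contradict the hypothesis."""
    return sum(v for s, v in m.items() if s & hypothesis)


# Hypothetical example: two sources of evidence about one email.
FRAME = frozenset({"spam", "ham"})
m1 = {frozenset({"spam"}): 0.6, FRAME: 0.4}   # leans towards spam
m2 = {frozenset({"ham"}): 0.3, FRAME: 0.7}    # leans weakly towards ham
m12 = combine(m1, m2)
```

    Mass placed on the whole frame expresses ignorance rather than evidence for any one class, which is what distinguishes a basic probability assignment from an ordinary probability distribution.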

     

    From this theoretical basis, one can understand better why machine learning works, and use this understanding to extend existing algorithms and build new ones.

     

    Talk 1:

    https://www.academia.edu/21936241/k-Nearest_Neighbour_Classification_using_Dempster-Shafer_Theory

    Talk 2:

    https://www.academia.edu/21936788/Applying_Dempster-Shafer_Theory_to_Machine_Learning

    Multiclass Visualizable Classification Using Projections

    March - August 2015

    The MuViCP classifier grew out of the Dempster-Shafer machine learning (DSML) project described above. The fundamental question I started from was whether new machine learning algorithms could be built using Dempster-Shafer theory. If they can, that indirectly validates the theoretical insights.

     

    The MuViCP classifier works in three steps. In the first step, it projects high-dimensional data into two-dimensional space using random matrices. In the second step, a very simple classifier is run on each projection. In the final step, these classifications are combined using Dempster-Shafer theory.
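
    The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of the MuViCP package: the Gaussian random matrices, the nearest-centroid classifier with inverse-distance masses, and all function names are hypothetical choices made for the example. Masses are placed on single classes only, in which case Dempster's rule reduces to multiplying and renormalizing.

```python
import math
import random


def random_projection(dim, rng):
    """Step 1: a random dim x 2 matrix (Gaussian entries are an assumption)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(dim)]


def project(x, matrix):
    """Map one point into the plane defined by the random matrix."""
    return tuple(sum(xi * row[j] for xi, row in zip(x, matrix)) for j in range(2))


def centroid_masses(train, labels, query, matrix):
    """Step 2: a very simple classifier in the projected plane.

    Each class receives mass proportional to the inverse distance between
    the projected query and the projected class centroid (an assumed rule).
    """
    groups = {}
    for x, y in zip(train, labels):
        groups.setdefault(y, []).append(project(x, matrix))
    q = project(query, matrix)
    weights = {}
    for y, pts in groups.items():
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        weights[y] = 1.0 / (1e-9 + math.hypot(q[0] - cx, q[1] - cy))
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


def combine_singletons(m1, m2):
    """Step 3: Dempster's rule when all mass sits on single classes."""
    joint = {y: m1[y] * m2[y] for y in m1}
    total = sum(joint.values())
    return {y: v / total for y, v in joint.items()}


def classify(train, labels, query, n_projections=10, seed=0):
    """Run all three steps and return the winning class and combined masses."""
    rng = random.Random(seed)
    combined = None
    for _ in range(n_projections):
        matrix = random_projection(len(query), rng)
        masses = centroid_masses(train, labels, query, matrix)
        combined = masses if combined is None else combine_singletons(combined, masses)
    return max(combined, key=combined.get), combined


# Hypothetical toy data: two well-separated classes in four dimensions.
train = [[0, 0, 0, 0], [1, 1, 1, 1], [10, 10, 10, 10], [11, 11, 11, 11]]
labels = ["a", "a", "b", "b"]
label, masses = classify(train, labels, [0.5, 0.5, 0.5, 0.5])
```

    Because every projection is two-dimensional, each intermediate classification can be plotted directly.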

     

    In trials on several real data sets, the classifier has performed very well.

  • Software

    Some of my software!

    cepp

    Ver 1.7 / Release / Jan 30, 2016

    This package implements the cepp algorithm in the R language. It can also be used to compute the spatial quantiles of Chaudhuri (1996), and it implements the projection pursuit algorithms of Perisic and Posse (2005).

     

    https://cran.r-project.org/web/packages/cepp/index.html

    MuViCP

    Ver 1.3.2 / Beta / Feb 22, 2016

    This package implements the two main phases of the MuViCP classifier - the belief builder functions and the ensemble function.

     

    It also implements the Dempster-Shafer Calculus for atomic sets in the R language.

     

    (Some belief builder functions are yet to be implemented; hence the Beta status.)

     

    https://cran.r-project.org/web/packages/MuViCP/index.html

    Qt Graphics System

    The Java language and, more broadly, the JVM platform are slowly emerging as the data analysis platform of the future. It is therefore surprising that high-performance graphics libraries for data visualization are still lacking on the Java platform. This is a serious impediment for the data analyst.

    In this project, I proposed and built such a high-performance graphics system, which renders data using the well-known Qt graphics library.

     

    Proposed features of the graphics system include plotting in layers, selectable points and other graphics elements, and programmable widget controls.

     

    The software is still under active development.

     

    Slides from the Talk:

    https://www.academia.edu/22064612/Graphics_for_Big_Data

  • Other Projects

    This is some other interesting stuff that I've worked on.

    A Spam Filter

    December 2013

    Most modern spam filters work by first reading all the emails, from which a machine representation of the contents is created.

     

    The problem with this is that to build the machine representations, the machine must actually read all of the emails at some point. This is obviously a violation of privacy.

     

    In my project, I argued in favor of a spam filter that preserves the user's privacy in every respect. The beauty of my spam filter is that it never needs to see the emails themselves at any point, while still showing promising accuracy.

     

    https://www.academia.edu/21594485/Email_Classification

     

    RADAR Imaging

    April 2014

    One is used to thinking about data as a list of numbers, but in the modern age, data can take a variety of forms. For instance, some geo-spatial satellites use radar to build images of the terrain. The waveform received back at the satellite is one example of data that occurs naturally as a function.

     

    Now, radar waveforms from each terrain type (lake, river, forest, etc.) would presumably be different. Is it possible to cluster these waveforms so that we can put lakes together with lakes (say) without human intervention?

     

    In this talk, I reviewed a very novel technique for such clustering that requires almost no assumptions.

     

    https://www.academia.edu/21595360/Nonparametric_Clustering_of_Functional_Data

     

    Reinforcement Learning

    April 2014

    Reinforcement Learning is a very broad class of optimization methods. These methods are often used on problems that are very high-dimensional or infinite-dimensional.

     

    Partalas, Tsoumakas and Vlahavas (2009) propose using reinforcement learning to "prune" an ensemble of classifiers.

     

    In this talk and paper, I review what classifier ensembles are, why pruning them is useful, and how reinforcement learning fits into the picture.

     

    There is a final speculative part in the paper about using the same technology to build neighborhood point maps in high dimensional space.

     

    Talk:

    https://www.academia.edu/21785861/Ensemble_Pruning_Using_Reinforcement_Learning

     

    Paper:

    https://www.academia.edu/21786551/Pruning_Classifier_Ensembles_via_Reinforcement_Learning_and_its_possible_Application_to_combining_Data_Projections

  • Things

    Some other things I've done

    French

    I can speak and understand French. I am certified up to the A2 level (Basic Proficiency).

    Software Foundations

    I took CS 565: Programming Languages, a course on Software Foundations taught by Tiark Rompf, one of the designers of the Scala programming language.

    Undergraduate Teaching

    Every semester that I've done classroom teaching, I've received some of the best reviews.

    Graduate Teaching

    Except in my first semester, I was consistently appointed TA for the department's core computational statistics course.

  • GET IN TOUCH

    My Academia Page

    Send Me Mail!

    Connect with Me!

    Look at My Code!
