Ujjwal Peshin
I am a Master's student in Data Science at the Data Science Institute at Columbia University, graduating in December 2019. I am looking for exciting full-time opportunities where I can apply my Machine Learning and Data Science skills for the greater good. My background sits at the intersection of Computer Science and Data Science.
I worked at ZS Associates as a Data Science Associate for a year, where I mainly worked on Constrained Optimization, State Space Search and Imbalanced Classification problems in the healthcare domain using Real World Evidence data. I graduated in Computer Engineering in 2017 from Netaji Subhas Institute of Technology under the University of Delhi, which is now Netaji Subhas University of Technology. I worked with Prof. Sushama Nagpal on my Bachelor's project, Touchless Typing for Character Recognition.
Email  / 
Resume  / 
LinkedIn  / 
GitHub
Data Science Intern, Small Business Card Data Team ( Capital One ) June '19 - August '19
I worked with the Business Entity Resolution team with the following objectives for my internship:
Scale the current ER pipeline to multiple databases: The existing entity resolution (ER) pipeline supported only 3 datasets. I made it flexible enough to handle additional datasets by introducing an intermediate blocking table, which gave 10% higher matching coverage.
Test production model resilience for new datasets: The ER pipeline had a production-ready machine learning model that identified matches among the entities in the datasets. I tested whether the model would perform just as well with some of its features removed, a common occurrence when working with multiple datasets, since not every dataset has every feature needed to generate matching pairs.
Design a graph database of businesses and find use cases for the marketing team: Built a graph database whose nodes are businesses, the executives of those businesses, and the addresses where those businesses are located. Because a graph database stores relationships explicitly, graph algorithms can be used to find the important nodes for the marketing team to target.
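As a rough illustration of the "important nodes" idea, here is a minimal sketch that ranks nodes by degree centrality. It is not the code from the internship: the edge list, the node names and the choice of degree centrality (rather than any other graph algorithm) are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical edges linking businesses to executives and addresses.
# Every name here is made up for illustration.
EDGES = [
    ("biz:Acme", "exec:Alice"),
    ("biz:Acme", "addr:1 Main St"),
    ("biz:Bolt", "exec:Alice"),
    ("biz:Bolt", "addr:1 Main St"),
    ("biz:Core", "exec:Bob"),
    ("biz:Core", "addr:1 Main St"),
]


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def top_nodes(edges, k=3):
    """Return the k highest-scoring (node, score) pairs, ties broken by name."""
    scores = degree_centrality(edges)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    for node, score in top_nodes(EDGES):
        print(f"{node}: {score:.2f}")
```

In this toy graph the shared address is connected to all three businesses, so it ranks first. In a real graph database the same kind of ranking would typically be done with the database's built-in graph algorithms.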
Data Scientist, ADS and RWE Team ( ZS Associates ) July '17 - July '18
My role was to develop and use advanced data science techniques to solve difficult client problems in an efficient manner. Some of the projects are as follows:
Detecting patients at high risk of developing AFib in the next 3-6 months: Built multiple classification models to find the prevalence of AFib using RNNs and advanced ensembling techniques, providing a potential upside of $30 million to the client.
Clinical Trial Patient Pool Optimization: Built a state space search model to optimize the Inclusion/Exclusion patient pool for clinical trials, which was included in a pipeline to develop a complete clinical trial solution.
Physician Record Visualizer: Built a Django deliverable that combines various data sources to give the client an enhanced view of each physician record.
Detecting ADRs using CNNs: Developed a CNN model to detect adverse drug reactions (ADRs) from tweets.
Research Assistant ( Columbia University Irving Medical Center )
January '19 - May '19
The aim of the project is to replace the student-actor interaction, on which students are graded, with a machine learning model that evaluates whether the student has asked the requisite number of questions on topics such as mental health and medical history. I used TF-IDF and sequence models to predict whether a sentence asks a question related to a particular criterion.
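A minimal sketch of the TF-IDF side of this idea, using only the Python standard library. It is not the project's model: the training sentences, the criterion labels and the nearest-neighbour rule over cosine similarity are all assumptions chosen to keep the example small.

```python
import math
import re
from collections import Counter

# Hypothetical labeled sentences; the labels stand in for grading criteria.
TRAIN = [
    ("have you been feeling anxious or depressed lately", "mental_health"),
    ("how has your mood been over the past month", "mental_health"),
    ("have you had any surgeries in the past", "medical_history"),
    ("do you take any medications for a chronic illness", "medical_history"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency over tokenized documents."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(tokens, idf):
    """L2-normalized TF-IDF vector as a dict; unseen tokens are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(u, v):
    """Dot product of two already-normalized sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def predict(sentence, train=TRAIN):
    """Return the label of the most similar training sentence, or None."""
    docs = [tokenize(s) for s, _ in train]
    idf = fit_idf(docs)
    query = tfidf(tokenize(sentence), idf)
    scored = [
        (cosine(query, tfidf(doc, idf)), label)
        for doc, (_, label) in zip(docs, train)
    ]
    best_score, best_label = max(scored)
    return best_label if best_score > 0 else None


if __name__ == "__main__":
    print(predict("Have you had surgeries before?"))
    print(predict("How is your mood?"))
```

A sentence that shares no vocabulary with the training set gets no label, which is one reason a sequence model is a useful complement to plain TF-IDF.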
Research Assistant ( Earth Institute, Columbia University )
Mentored by Prof. Peter T. Coleman and Prof. Larry Liebovitch
Sep '18 - May '19
The aim of the project is to operationalize Negative Intergroup Reciprocity, a measure of the response to negative interactions between groups. I scraped subreddits to measure the interactions between supporters of different political parties, using Google NLP and VADER for sentiment analysis.
International Conference on Machine Learning for Networking (MLN'2018) (Link) (Link)
Jan '17 - Jun '17
Developed a novel, cost-efficient touchless input system using a webcam and an IR transmitter. The captured image is downsampled to the format of MNIST, a dataset that has been researched extensively, and recognized using CNNs. The recognized input can be sent as characters to a notepad or as numbers to a calculator, or adapted to other uses. Presented at the above conference and to be published in the post-proceedings in Springer's LNCS.