Ujjwal Peshin

I am a Master's in Data Science student at the Data Science Institute at Columbia University, graduating in December 2019. I am looking for exciting full-time opportunities where I can apply my machine learning and data science skills for the greater good. I have a strong background at the intersection of computer science and data science.

I worked at ZS Associates as a Data Science Associate for a year, where I mainly worked on constrained optimization, state space search, and imbalanced classification problems in the healthcare domain using Real World Evidence data. I graduated in Computer Engineering in 2017 from Netaji Subhas Institute of Technology under the University of Delhi, which is now Netaji Subhas University of Technology. I worked with Prof. Sushama Nagpal on my Bachelor's project, Touchless Typing for Character Recognition.

Email  /  Resume  /  LinkedIn  /  GitHub

Work Experience

Data Science Intern, Small Business Card Data Team ( Capital One )
June '19 - August '19

I worked with the Business Entity Resolution team on the following objectives for my internship:

  • Scale current ER pipeline to multiple databases: The existing entity resolution (ER) pipeline was adaptable to only 3 datasets; it was made flexible enough to handle multiple datasets by introducing an intermediate blocking table. This increased matching coverage by 10%.
  • Test production model resilience for new datasets: The ER pipeline had a production-ready machine learning model that identifies matches among the entities in the datasets. I tested whether the model would perform just as well with some of its features removed, a common occurrence when working with multiple datasets, since not all of them have every feature needed to generate matching pairs.
  • Design a graph database of businesses and find use cases for the marketing team: Generated a graph dataset whose nodes are businesses, the executives of those businesses, and the addresses where those businesses are located. Because a graph database models relationships directly, graph algorithms can be used to find the important nodes for marketing to target.

Data Scientist, ADS and RWE Team ( ZS Associates )
July '17 - July '18

My role was to develop and use advanced data science techniques to solve difficult client problems efficiently. Some of the projects are as follows:

  • Detecting patients at high risk of developing AFib in the next 3-6 months: Produced multiple classification models to find the prevalence of atrial fibrillation (AFib) using RNNs and advanced ensembling techniques, providing a potential upside of $30 million to the client.
  • Clinical Trial Patient Pool Optimization: Formed a state space search model to optimize the inclusion/exclusion patient pool for clinical trials, which was included in a pipeline to develop a complete clinical trial solution.
  • Physician Record Visualizer: Prepared a deliverable using Django that displays physician records drawn from various data sources, giving the client an enhanced view of each physician record.
  • Detecting ADRs using CNNs: Developed a CNN model to detect adverse drug reactions (ADRs) from tweets.

Research Experience

Research Assistant ( Columbia University Irving Medical Center )
January '19 - May '19

The aim of the project is to replace the student-actor interaction, on which students are graded, with a machine learning model that can evaluate whether the student has asked the requisite number of questions related to mental health, medical history, etc. I used TF-IDF and sequence models to predict whether a sentence asks a question related to a particular criterion.

Research Assistant ( Earth Institute, Columbia University )
Mentored by Prof. Peter T. Coleman and Prof. Larry Liebovitch
Sep '18 - May '19

The aim of the project is to operationalize negative intergroup reciprocity, which is a measure of the response to negative interactions between groups. I scraped Reddit subreddits to measure the interactions between supporters of different political parties, using Google NLP and VADER for sentiment analysis.

International Conference on Machine Learning for Networking (MLN'2018) (Link) (Link)
Jan '17 - Jun '17

Developed a novel, cost-efficient input system using a webcam and an IR transmitter. The captured image is downsampled to the format of MNIST, a dataset that has been researched extensively, and recognized using CNNs. The input can be sent as characters to a notepad or as numbers to a calculator, or adapted to other uses. Presented at the above conference; to be published in the post-proceedings in Springer's LNCS.

Teaching Assistantships

Teaching Assistant: Applied Machine Learning, COMSW4995 - Spring 2019 [Columbia University] with Prof. Andreas Mueller

Teaching Assistant: Elements of Data Science, COMSW4995 - Fall 2018 [Columbia University] with Prof. Bryan Gibson

Homepage Credits: Thanks a lot, Jon!