About

I am a researcher in the Department of Biomedical and Clinical Sciences at Linköping University and at Clinical Genomics Linköping, SciLifeLab.

Workflow pipeline development:

I am passionate about developing bioinformatics pipelines for analyzing next-generation sequencing data. Recent work includes a Snakemake pipeline for single-cell RNA-seq (10X Genomics) data analysis: https://doi.org/10.5281/zenodo.15090341. I also routinely use Nextflow pipelines for bulk transcriptomics data analysis.

I also refactored and configured the nanoDx pipeline to run on local infrastructure, producing a customized version. nanoDx applies a pre-trained neural network model to DNA methylation-based tumour classification using nanopore whole-genome sequencing data.

I love creating Shiny apps for analysis and visualization. Two Shiny applications that I recently developed are:

https://completeolink.serve.scilifelab.se
an end-to-end solution for OLINK (proteomics) data analysis
https://enrichedmassspec.serve.scilifelab.se
for downstream analysis of differentially expressed proteins in mass spectrometry data (from Spectronaut software), including visualization and enrichment analysis

Epidemiology & Biostatistics:

I have extensive experience working with registry-based cohort data. I have worked on a large birth cohort (All Babies in Southeast Sweden, ABIS), using biostatistical methods to elucidate environmental, genetic, and immunological factors that increase the risk of developing autoimmune diseases.

Genome Assembly:

I worked on the de novo assembly and annotation of a non-model species, the common snapping turtle. This project introduced me to the world of computational biology. It gave me considerable experience in dealing with high-throughput sequence data and high-performance computing (HPC) environments.

Over the years, I have gained considerable hands-on experience in handling large-scale datasets, including transcriptomics data, bar-seq data, and single-cell sequencing data. I thrive on terminal-based work in Linux/UNIX environments and code in a number of languages, most often Python, R, and Bash.

To summarise, my research currently focuses on:

Analysis of high-throughput datasets that provide insights into disease pathogenesis and biology in general
Biostatistics of epidemiological data

Debojyoti Das

News