Latin American R/BioConductor Developers Workshop 2018

General information

Conference webpage: http://www.comunidadbioinfo.org/r-bioconductor-developers-workshop-2018/
Level: intermediate – advanced
Language: English
When: July 30 – August 3, 2018
Where: Classroom #4 of the Undergraduate Program on Genomic Sciences at the Center for Genomic Sciences, Cuernavaca, Mexico
Twitter: @CDSBMexico
GitHub: https://github.com/ComunidadBioInfo/R-BioConductor-Developers-Workshop-2018

Prerequisites

Required background knowledge

  • Participants should have basic knowledge of the R programming language: variable assignment; reading files with read.csv, read.delim, and read.table; data structures (matrix, data.frame, list); data types (character, numeric, factor, logical, etc.); and installing and using packages.
  • Know how to install R packages.
  • Know how to use RStudio.

Technical requirements

  • Personal computer
    At least 8 GB of RAM, a mouse, and enough disk space for text and image files. Administrator privileges to install and run RStudio utilities.

Introduction

In recent years, biology has seen a rise in the use of technologies that enable high-throughput, quantitative, data-rich profiling of cellular states. As a result, the field now faces computational challenges to analyse such data. The R/Bioconductor project is an open source, open development software platform that provides tools to translate complex data sets into biological knowledge.

This workshop is aimed at students and researchers interested in the analysis of biological data. We encourage applications from experts in diverse disciplines, including but not limited to biologists, bioinformaticians, data scientists, software engineers, programmers, and R users at large. The main goals of the workshop are:

  1. Teach participants the principles of reproducible data science through the development of R/Bioconductor packages.
  2. Turn bioinformatic software users into bioinformatic software developers.
  3. Foster the exchange of expertise and establish multidisciplinary collaborations.
  4. Create a community of Latin American scientists committed to the development of software and computational pipelines for biological data analysis.
  5. Help train bioinformatics instructors who can continue to grow in their local communities.

This workshop is part of a long-term project to create a community of developers from Latin America. We hope to hold regular meetings in the future (similar to BioC, EuroBioC and BioC Asia) where attendees present their own software contributions. To provide a welcoming environment, please follow our code of conduct.

Program

Materials

Day 1: July 30, 2018
09:00 – 10:00 Inauguration in the main auditorium
10:00 – 10:30 Keynote Lecture I: From learning to using to teaching to developing R (Leonardo Collado-Torres)
10:30 – 11:00 Talk I: Example of Bioinformatics in Mexico (Daniel Piñero)
11:00 – 11:20 Coffee break
11:20 – 12:20 Creating a package (Alejandro Reyes)
12:20 – 12:40 Break
12:40 – 14:00 Version control with git and GitHub (Selene Fernandez-Valverde)
14:00 – 15:30 Lunch break
15:30 – 16:15 Open source software projects and collaborative development (Selene Fernandez-Valverde)
16:45 – 17:30 Package documentation (Alejandro Reyes)
17:30 – Welcome cocktail

Day 2: July 31, 2018
09:00 – 10:00 Keynote Lecture II (Martin Morgan)
10:00 – 10:30 Talk II: Example of Bioinformatics in Mexico: Using R-Shiny in Agrobiodiversity (Alejandro Ponce-Mendoza)
10:30 – 11:00 Coffee break
11:00 – 12:00 Best practices for writing efficient functions (Martin Morgan)
12:00 – 12:30 Break
12:30 – 14:00 Bioconductor: core package, common objects and extending classes (Benilton de Sá Carvalho)
14:00 – 15:30 Lunch break
15:30 – 17:30 S4: system for object-oriented programming (Martin Morgan)
17:30 – 18:30 Poster session

Day 3: August 1, 2018
09:00 – 10:00 Collaborative project organization and introduction (Daniela Ledezma-Tejeida)
10:00 – 10:30 Vignette writing with markdown/BiocStyle (Benilton de Sá Carvalho)
10:30 – 10:50 Coffee break / event photo (to be confirmed)
10:50 – 11:30 Unit testing and R CMD check (Martin Morgan)
11:30 – 12:10 Rcpp: adding C/C++ code to R packages (Benilton de Sá Carvalho)
12:10 – 12:30 Break
12:30 – 14:00 Debugging and parallelization (Martin Morgan)
14:00 – 15:30 Lunch break
15:30 – 17:30 Working on a collaborative project

Day 4: August 2, 2018
09:00 – 10:00 Keynote Lecture IV (Benilton Carvalho)
10:00 – 10:30 Talk III: Example of Bioinformatics in Mexico: RLadies Community Experience (Teresa Ortiz)
10:30 – 11:00 Coffee break
11:00 – 14:00 Working on a collaborative project
14:00 – 15:30 Lunch break
15:30 – 17:30 Working on a collaborative project

Day 5: August 3, 2018
09:00 – 10:00 Working on a collaborative project
10:00 – 10:30 Teams: concluding remarks about the experience
10:30 – 11:00 Coffee break
11:00 – 12:00 Presentation of the packages developed and lessons learned during collaborative project development
12:00 – 13:00 Project evaluation and award ceremony
12:00 – 13:00 Closing remarks and community building

Instructors

Martin Morgan, PhD

Dr. Morgan spent 10 years as an Assistant and then Associate Professor at Washington State University, before joining the Fred Hutchinson Cancer Research Center in 2005. At the Hutch, Dr. Morgan worked on the Bioconductor project for the analysis and comprehension of high-throughput genomic data; he has led Bioconductor since 2008. Dr. Morgan recently moved to Roswell Park Comprehensive Cancer Center in Buffalo, NY, where the Bioconductor project is now based.

Benilton S Carvalho, PhD

Statistician (B.S., M.Sc.), Biostatistician (Ph.D.)
Instructor: Statistics – Regression Models;
Database designer and developer: MySQL, PHP;
Teaching Assistant: Biocomputing, Statistical Computing and Statistical Methods in Public Health;
Developer: Bioconductor – oligo, makePlatformDesign, crlmm, pdInfoBuilder.
Specialties: high-throughput genotyping, microarray preprocessing, statistical modelling, gene-expression analyses, genetic epidemiology, statistical computing, programming (R, C/C++, Matlab).

Selene Fernández V., PhD

Selene Fernandez is a bioinformatician and genomic data scientist studying the evolution of the gene regulatory mechanisms underlying phenotypic diversity and cell differentiation in multicellular eukaryotes. She is particularly interested in the evolution of the regulatory roles of non-coding RNAs.
In addition to her academic roles, she participates in initiatives that link scientists and the general public (such as Mas Ciencia por Mexico and Clubes de Ciencia Mexico), programs that introduce researchers to computing (Software/Data Carpentry), and mentoring programs (Ekpapalek).
Expertise: analysis of high-throughput (next-generation) sequencing data, transcriptomics, genome annotation, non-coding RNAs.

Alicia Mastretta Yanes, PhD

Alicia's research broadly addresses how Mexican biodiversity has evolved, viewed from a genetic perspective. This includes changes in species distributions due to historical climate fluctuations (e.g., the Pleistocene glacial ages) as well as the effects of human management and domestication.

María Teresa Ortiz, PhD

María is a data analyst with experience in ecology and marketing. Her interests include statistical modeling (hierarchical models, spatial statistics, Bayesian networks), survey design and survey data analysis, machine learning, and data visualization.

Alejandro Ponce-Mendoza

Hi! I studied Food Technology at UIA and earned my master's and PhD in Biotechnology and Bioengineering at CINVESTAV. I spent several years in postdoctoral and other positions at various institutions (ECOSUR, UNSIJ, INIFAP, CONABIO, UAM-X). I'm interested in numerical ecology, data visualization and agrobiodiversity. Finally, I enjoy bicycle touring and admire Thomas Bernhard, W.G. Sebald and Glenn Gould.

Alejandra Medina Rivera, PhD

Alejandra obtained her Ph.D. in 2012 from the Biomedical Sciences Program at the National University of Mexico (UNAM). Since then, she has focused on developing bioinformatic tools and strategies to study gene regulatory mechanisms; most of these tools are now part of the Regulatory Sequence Analysis Tools suite (RSAT, http://rsat.eu/). Her current research uses computational approaches to incorporate functional genomics data into genome-wide association studies (GWAS), aiming to identify variants that lead to misregulation of gene expression.

Alejandro Reyes, PhD

Alejandro Reyes is a postdoctoral research fellow in Rafael Irizarry's laboratory at Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health. He is interested in (1) understanding how transcript isoforms contribute to cellular phenotypes and disease conditions and (2) integrating multi-omic data to unravel molecular cancer phenotypes. To ensure that results are reproducible, he implements analyses as documented workflows, software packages and graphical interfaces. He contributes to the Bioconductor project.

Leonardo Collado-Torres, PhD

Leonardo is a data scientist working with Andrew Jaffe at the Lieber Institute for Brain Development. He uses R packages daily and contributes to the Bioconductor project. Leonardo is interested in high-throughput genomics assays such as RNA-seq, developments in R, and helping others get started on their R journey. He has been learning and teaching R since 2008 and is a co-founder of the LIBD rstats club.