
About me
Hey! Daianna here. I completed my Bachelor’s degree in Genomic Sciences at the National Autonomous University of Mexico (LCG-UNAM) in 2024. I am currently a researcher in the group of Dr. Stephen Burgess at the MRC Biostatistics Unit, University of Cambridge. My research applies causal inference methods to identify disease risk factors and potential drug targets that are relevant to specific population subgroups, with the goal of advancing prevention in at-risk groups and developing more effective, targeted treatments.
Along my academic journey, I’ve acquired solid foundations in statistics and bioinformatics that have allowed me to explore a broad range of scientific questions. My experiences as a researcher, student, and instructor have shaped the way I approach science. However, my scientific aspirations go beyond the implementation of analytical methods. A goal of mine is to contribute to rigorous research by disseminating knowledge and training students and life scientists in data science, experiences I want to build on to teach others in a more impactful way. I want to share what I learn with the scientific community, particularly with other students like me who may not have the same opportunities or academic background, but who also want to analyze datasets to answer biologically relevant questions.
Motivation for this blog

Through conducting scientific research, collaborating with peers, and assisting in bioinformatics courses, I have observed two common problems when analyzing data:
First, in these interconnected times, with praiseworthy collaborative efforts such as the Bioconductor project, we can easily develop, share, and use other people’s code, data, methods, and even complete packages for our own analyses. That represents an incredible opportunity for all of us to leverage, contribute to, and improve both popular and newly emerging computational tools for the reproducible analysis of biological data, no doubt! However, for students, novice researchers, and people coming from areas other than biostatistics, computational biology, or bioinformatics, some analyses may represent obscure, if not completely unknown, territories. People developing these algorithms often assume a specialized audience and tend to gloss over the underlying statistical concepts and methods when describing their computational functions and packages, not to mention the poor or even missing documentation and support some authors offer (with notable exceptions such as limma and variancePartition, among others).
Second, nowadays it is incredibly simple to run a complete pipeline with a single function. That is efficient and boosts productivity, but it has also diluted the understanding needed to use these tools well. I have found many people, including myself, deludedly thinking we have mastered an analysis only because we ran the software without errors and received outputs. We may master the practice, but that neither implies nor guarantees we understand the theory.
It seems to me we are a generation of trained students who know how to run an analysis and obtain results, but who don’t understand the analyses themselves; on some occasions, not even the reasons why we execute them. And this is not limited to undergraduate or master’s students: you would be surprised by the number of PhD students, postdocs, and researchers who relate to this!
❗️❗️❗️ More alarming still, I’d say, is not being aware of why it is important to truly understand the aims and foundations of the methods we implement. Only once we do can we make accurate, informed method selections based on the features of our data, detect unexpected or error-announcing results and interpret them correctly, map the potential limitations of our analyses, and draw rigorous, meaningful conclusions from them.
Objectives
The purposes of these blog posts are the following:
To open up the black boxes (◼️📦) that many of these single-function methods represent, clearly showing how they operate mathematically and statistically.
To exemplify how to run these R/Bioconductor/Bash programs on real data, explaining their inputs, outputs, arguments, and parameters.
To present the types of analyses you can implement with your own datasets, to inspire you to explore your data and outputs further.
To show how to interpret the results.
In summary, this is a little of what I would have liked to read to feel more confident when applying a tool and explaining the results derived from it.
Take-home messages
Finally, I want to share some very important lessons I have learned so far:
There’s no better source for understanding a method than its original publication 📑 (yes, some are from the last century!)
Documentation sites won’t answer many of your theoretical questions. Tutorials, when available, are more detailed materials with usage examples and practical explanations, but again, for theory and methods, check the original articles.
Stay humble. If I have become aware of anything these years, it is how ignorant we are, something we only learn, paradoxically, as we acquire more knowledge by studying and investigating 📚. An arrogant attitude will only stop you from nourishing yourself with new learning and ideas, and it will close doors for you. We never stop learning!
Feedback
💬 I hope you find these materials useful. Feel free to contact me with questions, inquiries, or for further discussion in the comment boxes or through any of the channels shown below 👇🏼. I’d also appreciate your feedback and contributions to keep improving this content. Have fun with your analyses!
The website was created using Quarto.