Personal Statement: Notes from My Youth

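First steps into the molecular maze, where microarrays, like mirrors, reveal hidden workings. Nights at Yan Yuan charting chromatin profiles; mornings on Hong Kong Island untangling sequencing riddles.

Cross-disciplinary thinking breaks through the clouds; clinical algorithms serve the many. Ten years spent honing a dragon-slaying craft, and I smile as data turns into rainbows.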


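This is the personal statement (PS) I wrote when applying to the bioinformatics PhD program at Johns Hopkins University (JHU). Looking back at words I wrote more than ten years ago, I see that writing the statement gave me a chance to sort through my research experience properly, and it helped me work out many things. I think what mattered most was not the specific research results but the way of thinking that gradually took shape along the way.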

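When I first started doing research, I always dove straight into technical details, believing that clean experiments and precise data analysis were enough. Later, through conversations with my advisors and especially through reviewers' feedback, I came to realize that what really matters is seeing the biological meaning behind the data. This shift from "how to do it" to "why do it" gave me an entirely new understanding of research.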

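In bioinformatics, what has struck me most is the importance of cross-disciplinary thinking. Biology and computer science look like two completely different worlds, but when they collide they often produce unexpected sparks. This way of thinking has helped me spot questions in my research that others might overlook.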

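Another thing that left a deep impression on me is that research should not be mere self-amusement inside the lab. Only when I began thinking about how to apply research results to clinical diagnosis did I truly appreciate the value of research. Seeing an algorithm I developed help improve the accuracy of prenatal diagnosis brought a sense of accomplishment that nothing else could match.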

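Finally, I want to say that in this fast-moving era, the ability to keep learning matters more than mastering any particular technique. Technology in bioinformatics changes extremely quickly; only by continually learning and adapting can one go far in this field.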

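I hope these experiences and reflections offer some inspiration to those of you who are finding your way in research. Everyone has their own research story; what matters is finding your own direction along the way.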

Below is the original text of my personal statement, with a Chinese translation.

My interest in bioinformatics began with my first research project in my sophomore year. We used gene expression microarrays to study the molecular mechanism of drug resistance in cancer. I can still remember the first time I used bioinformatics software to analyze microarray data and marveled at the power of bioinformatics techniques. From then on, my curiosity about microarrays and my insatiable desire to explore unknown areas moved my interest from molecular biology to bioinformatics.

我对生物信息学的兴趣始于大二时的第一个研究项目。我们使用基因表达芯片研究癌症中的药物耐药分子机制。我至今仍然记得第一次使用生物信息学软件分析芯片数据时,对生物信息学技术强大功能的惊叹。从那时起,我对芯片技术的好奇心和对未知领域的探索欲望促使我将兴趣从分子生物学转向了生物信息学。

I joined Prof. Jing-Dong Jackie Han's lab at the Chinese Academy of Sciences, the leading and highest academic research institution in China, as one of only two interns accepted from many applicants. My first project was to study the role of Smad1/4 in embryonic stem cell fate determination using chromatin immunoprecipitation with microarray (ChIP-chip). Learning from the literature and by trial and error, I independently completed my work, including the analysis of ChIP-chip data, Smad1/4 motifs, and gene expression microarrays, in just a few weeks. These results were shown as one of the main figures in the paper, which was published in Genome Research. During this project, I gained first-hand bioinformatics knowledge and skills, such as programming in Python and analyzing high-throughput data. I was fascinated by the interdisciplinary nature of the field, which applies statistics and computer science to understand biological processes as a comprehensive picture.

我在众多申请者中脱颖而出,成为仅有的两名被中国科学院(中国最高学术研究机构)韩敬东教授实验室录取的实习生之一。我的第一个项目是通过染色质免疫沉淀芯片(ChIP-chip)技术研究Smad1/4在胚胎干细胞命运决定中的作用。通过查阅文献和反复试验,我在短短几周内独立完成了ChIP-chip、Smad1/4基序和基因表达芯片的分析工作。这些结果作为主要图表之一发表在《Genome Research》上。在这个项目中,我获得了生物信息学的第一手知识和技能,如Python编程和高通量数据分析。我被其跨学科性质所吸引,即运用统计学和计算机知识来全面理解生物学过程。

From the reviewers' comments on our work during the review process, I realized that addressing biological questions is more important than focusing on the technique itself. Starting with my next project, which used ChIP-Seq to study two transcription factors (TFs) in hepatic cell differentiation, I put more emphasis on exploring the biological meaning of the data. I identified a novel motif sequence that did not belong to either of these two TFs. My biology background motivated me to think more deeply, and I realized that it might come from an unknown co-factor of these two TFs. I then performed several bioinformatics analyses to identify this TF and study the spatial relationship among the three, which was further confirmed by follow-up experiments. These results indicate a novel, previously unreported mechanism by which these three TFs regulate hepatic cell differentiation. This work has been submitted to Cell Stem Cell, and I am a co-first author.

从论文审稿过程中审稿人的评论中,我意识到解决生物学问题比关注技术本身更为重要。因此在下一个项目中,我更加注重从数据中探索生物学意义,这个项目是通过ChIP-Seq研究肝细胞分化中的两个转录因子(TFs)。我发现了一个不属于这两个转录因子的新基序序列。我的生物学背景促使我深入思考,我意识到这可能是这两个转录因子的另一个未知辅因子。随后,我进行了多项生物信息学分析来识别这个转录因子并研究它们的空间关系,这些发现随后得到了实验的证实。这些结果表明了这三个转录因子在调控肝细胞分化中的一个前所未有的新机制。这项工作已经提交给《Cell Stem Cell》,我是共同第一作者。

Besides basic biological research, I am also very interested in using bioinformatics to solve problems in medical research and clinical applications. Therefore, after finishing my undergraduate studies, I joined the lab of Prof. Dennis Lo, who was the first to apply next-generation sequencing (NGS) technology to the non-invasive prenatal diagnosis of fetal trisomy 21 in clinical research. However, this approach did not work well for trisomy 13 and trisomy 18. After extensive reading and careful analysis of the data, I quickly found that the problem was due to GC content bias in NGS data. Building on what I had learned from ChIP-chip analysis, I developed a bioinformatics algorithm that normalizes the GC bias and applied it to the diagnosis of trisomy 13 and trisomy 18. It significantly improved the detection rates of trisomy 13 and 18, from 36% and 73% to 100% and 92%, respectively. After writing up the methods and results as a manuscript by myself, I submitted it to PLoS One as first author.

除了基础生物学研究,我也对利用生物信息学解决医学研究和临床应用问题很感兴趣。因此在本科毕业后,我来到了盧煜明教授的实验室,他是第一个将下一代测序(NGS)技术应用于临床研究中胎儿21三体综合征无创产前诊断的人。然而,这种方法在13三体和18三体的诊断中效果并不理想。经过广泛阅读和仔细分析数据后,我很快发现这个问题是由NGS数据中的GC含量偏差造成的。基于我从ChIP-chip分析中学到的知识,我开发了一种校正GC偏差的生物信息学算法,并将其应用于13三体和18三体的诊断。这显著提高了13三体和18三体的检出率,分别从36%和73%提高到100%和92%。我独立撰写了方法和结果的手稿,并作为第一作者提交给了《PLoS One》。
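The statement does not spell out how the GC normalization worked, so the following is only a minimal sketch of one common way such a correction can be done, not the algorithm described above. It assumes reads have already been counted in fixed-size genomic bins with a known GC fraction per bin. Each bin count is rescaled by the median count of bins with similar GC content, and a test sample's chromosome fraction is then compared with euploid reference samples through a z-score. The function names `gc_correct`, `chrom_fraction`, and `z_score`, and the one-percent GC strata, are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def gc_correct(bins):
    """Rescale per-bin read counts so that GC content no longer drives coverage.

    `bins` is a list of (chrom, gc_fraction, read_count) tuples (hypothetical format).
    Returns a list of (chrom, corrected_count) tuples.
    """
    # Group read counts into strata of GC content rounded to the nearest percent.
    by_gc = defaultdict(list)
    for _, gc, count in bins:
        by_gc[round(gc * 100)].append(count)

    global_median = median(count for _, _, count in bins)

    corrected = []
    for chrom, gc, count in bins:
        stratum_median = median(by_gc[round(gc * 100)])
        if stratum_median == 0:
            continue  # no usable coverage at this GC level; drop the bin
        # Scale the count so every GC stratum ends up with the same median.
        corrected.append((chrom, count * global_median / stratum_median))
    return corrected


def chrom_fraction(corrected, chrom):
    """Fraction of all corrected reads that fall on `chrom`."""
    total = sum(count for _, count in corrected)
    return sum(count for ch, count in corrected if ch == chrom) / total


def z_score(test_fraction, reference_fractions):
    """Standard score of a test sample against euploid reference samples."""
    return (test_fraction - mean(reference_fractions)) / stdev(reference_fractions)
```

In use, one would compute `chrom_fraction(gc_correct(bins), "chr13")` for the test sample and for each reference sample, then flag the test sample when its `z_score` exceeds a chosen cutoff. The cutoff and the bin size are left open here because the statement does not give them.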

I have enjoyed my three years of learning and research in bioinformatics. I believe bioinformatics will become an indispensable tool in biological studies, medical research, and clinical applications. For example, next-generation sequencing approaches, as discussed in the recent Nature review paper "Next-generation genomics: an integrative approach", are increasingly used to study many kinds of biological questions. Personalized medicine, recently widely discussed in The New England Journal of Medicine, is attracting more and more scientists' attention. However, how to exploit the huge amounts of data from ChIP-Seq, RNA-Seq, and whole-genome sequencing is still a bottleneck. This is my major interest. I am also interested in regulatory mechanisms and the function of molecular networks. How do non-coding RNAs, especially long non-coding RNAs, function in the cell, and how are these regulatory processes related to cell development, differentiation, or human disease? Studying gene regulation at the epigenetic and genomic levels also interests me. How do DNA methylation and chromatin modification regulate gene expression, especially in stem cells? How do genomic changes such as copy number changes result in cancer? What is the function of non-coding regions such as transposable elements in the human genome? How are they regulated, and how do they impact the human genome? What is the association between genetic variants in the human genome and human disease? How does genetic variation perturb the regulatory network? My career objective is to establish myself as an independent, progressive, and highly collaborative investigator and educator.

我很享受这三年在生物信息学领域的学习和研究经历。我相信生物信息学将成为生物学研究、医学研究和临床应用中不可或缺的工具。例如,正如最近《Nature》综述文章"下一代基因组学:一种整合方法"中提到的,下一代测序方法已经越来越多地被用于研究各种生物学问题。个性化医疗,正如最近在《新英格兰医学杂志》中广泛讨论的那样,吸引了越来越多科学家的关注。然而,如何处理来自ChIP-Seq、RNA-Seq和全基因组测序的海量数据仍然是一个瓶颈。这些也是我主要感兴趣的领域。我还对调控机制和分子网络的功能感兴趣。非编码RNA,特别是长非编码RNA,如何在细胞中发挥功能,这些调控过程如何与细胞发育、分化或人类疾病相关。研究表观遗传和基因组水平的基因调控也是我的兴趣所在。DNA甲基化和染色质修饰如何调控基因表达,特别是在干细胞中?基因组变化如拷贝数变异如何导致癌症?人类基因组中非编码区域如转座元件的功能是什么?它是如何被调控的,又如何影响人类基因组?人类基因组中的遗传变异与人类疾病有什么关联?遗传变异如何扰乱调控网络?我的职业目标是成为一名独立的、进步的、高度协作的研究者和教育者。

JHU is an ideal place for me to continue my research training. The broad and modern curriculum of this program will give me a fundamental background in interdisciplinary fields. More importantly, it has excellent faculty and facilities and places an emphasis on student research. I would be interested in working with Dr. Joel Bader, Dr. Michael Beer, Dr. Rachel Karchin, and Dr. Aleksander S. Popel. I would be very fortunate to have the opportunity to work with so many great scientists in this program during my graduate studies.

约翰霍普金斯大学是我继续研究训练的理想场所。该项目广泛而现代的课程设置将为我提供跨学科领域的基础背景。更重要的是,它拥有优秀的教师队伍和设施,并且特别重视学生研究。我很希望能与Joel Bader博士、Michael Beer博士、Rachel Karchin博士和Aleksander S. Popel博士合作。如果有机会在这个项目中与这么多优秀的科学家一起工作,我将感到非常幸运。
