APRIL (XINZHU) WEI

One book a week

7/13/2019

“I have always imagined that Paradise will be a kind of library.”
― Jorge Luis Borges

Written on the last day of 2019:
At the beginning of 2019, I planned to read one book per week on average. I made speedy progress in the first half of 2019. In summer and autumn, I faced a lot of troubles and spent all my time maintaining my dying self-esteem. Towards the end of the year, I finally regained my peace and excitement in research, but did not get the chance to rekindle the momentum for reading. Thanks to the first half of 2019, I still finished 54 books this year.

Progress (last update: Dec 31, 2019):
54 Finished (I like most of the books, but I select books to read based on personal taste; the ones I would recommend to anybody are in bold):
Biography/autobiography:

I will find you (by Joe Kenda)
Spaceman (by Mike Massimino)
Elon Musk (by Ashlee Vance)
Becoming (by Michelle Obama)
Promise me, dad (by Joe Biden)
American Prometheus (by Kai Bird, Martin J. Sherwin)
Stride toward Freedom (by Martin Luther King Jr.)
A River in Darkness, One Man's Escape from North Korea (by Masaji Ishikawa)
I Will Be Gone in the Dark (by Michelle McNamara)
The Matriarch (by Susan Page)
The Professor and the Madman (by Simon Winchester)
Educated (by Tara Westover)

Fiction:

Frankenstein (by Mary Shelley)
Go Set a Watchman (by Harper Lee)
The Hitchhiker's Guide to the Galaxy, Book 1 (by Douglas Adams)
Song of Achilles (by Madeline Miller)
The First Fifteen Lives of Harry August (by Claire North)
The Hitchhiker's Guide to the Galaxy, Book 2 (by Douglas Adams)
Circe (by Madeline Miller)
Harry Potter and the Cursed Child Part One and Two Playscript (by J.K. Rowling)

Nonfiction:

WTF? (by Tim O'Reilly)
She has her mother's laugh (by Carl Zimmer)
Never Split the Difference (by Chris Voss)
Get Well Soon (by Jennifer Wright)
I Contain Multitudes (by Ed Yong)
The Elephant in the Brain (by Kevin Simler, Robin Hanson)
The Creature from Jekyll Island (by G. Edward Griffin)
Comet (by Carl Sagan and Ann Druyan)
Gene Machine (by Venki Ramakrishnan)
Blood in the Water (by Heather Thompson)
Astrophysics for people in a hurry (by Neil deGrasse Tyson)
Hidden Figures (by Margot Shetterly)
Arrival of the Fittest (by Andreas Wagner)
Who We Are and How We Got Here (by David Reich)
The Rise and Fall of the Dinosaurs (by Steve Brusatte)
Freakonomics (by Steven Levitt)
A Most Improbable Journey: A Big History of Our Planet and Ourselves (by Walter Alvarez)
The Pleasure of Finding Things Out (by Richard Feynman)
The Body Keeps the Score (by Bessel van der Kolk)
When (by Daniel Pink)
We Should All Be Feminists (by Chimamanda Adichie)
The Gene (by Siddhartha Mukherjee)
What If? (by Randall Munroe)
The Person You Mean to Be: How Good People Fight Bias (by Dolly Chugh)
The Moment of Lift (by Melinda Gates)
Brief Answers to the Big Questions (by Stephen Hawking)
So You Want to Talk about Race (by Ijeoma Oluo)
Neanderthal Man (by Svante Paabo)
Pandora's Lab (by Paul Offit)
The Character of Physical Law (by Richard Feynman)
Accessory to War (by Neil deGrasse Tyson)
Thinking in Bets (by Annie Duke)
Man's Search for Meaning (by Viktor E. Frankl)
Hamilton the Revolution (by Lin-Manuel Miranda and Jeremy McCarter)

Currently reading:

In Search of Memory (by Eric Kandel)
The Breakthrough (by Charles Graeber)
Sorry I'm Late, I Didn't Want to Come (by Jessica Pan)
The Name of the Rose (by Umberto Eco)
A Mathematician's Apology (by G.H. Hardy)
On Writing Well (by William Zinsser)
Do I Make Myself Clear? (by Harold Evans)
Being Mortal (by Atul Gawande)
NeuroTribes (by Steve Silberman)
Evicted (by Matthew Desmond)
A Man on the Moon (by Andrew Chaikin)
The Souls of Black Folk (by W. E. B. Du Bois)
Fire & Blood (by George R. R. Martin)

On the e-shelf:

The Science of Interstellar (by Kip Thorne)
An Elegant Defense (by Matt Richtel)
An Astronaut's Guide to Life on Earth (by Chris Hadfield)
Raise Your Game (by Alan Stein, Jon Sternfeld et al)
What Hath God Wrought (by Daniel Howe)
Devil in the Grove (by Gilbert King)
A Random Walk Down Wall Street (by Burton Malkiel)
Hacking Darwin (by Jamie Metzl)
Can't Hurt Me (by David Goggins)
Where the Crawdads Sing (by Delia Owens)
Frederick Douglass (by David Blight)
Water for Elephants (by Sara Gruen)
Born Survivors (by Wendy Holden)
Inferior (by Angela Saini)
Gathering Moss (by Robin Wall Kimmerer)
Why We Sleep (by Matthew Walker)
Broad Band (by Claire Evans)
The Big Picture (by Sean Carroll)
Little Fires Everywhere (by Celeste Ng)

The part below was written at the beginning and middle of 2019
Outlook:
Reading is a luxury, and if there is anything I feel privileged to have, it is my appetite, time, and money for books. At the beginning of this year, I decided to set a reading challenge for 2019: one book per week. So hopefully, I will finish 52 by the end of December. At the beginning, I did not think I would have nearly enough time and dedication to finish 52 books, but halfway through the year, I am convinced I don't have to limit myself to that number.
Most of the books on my reading list are recent (published 2016-2019), but I also picked up a couple of classics that I hadn't read when I was younger.
Resources:
Goodreads is a great website for finding book reviews and new titles to add to your reading list; I also rely on book reviews from newspapers and magazines, best-seller charts, and Bill Gates's blog. In the morning, I spend time "reading" audiobooks over a slow breakfast while I take my time getting ready for the day (I highly recommend getting an Audible membership and taking advantage of the Amazon deals). In the evening before sleep, on days when I don't have better things to do, I read Kindle books or iBooks on an iPad (I actually have two spare Kindles, but I don't like the reading experience with them; the lookup function on the iPad is much better than on the Kindle). Whispersync is a great invention that I occasionally use when I would like to binge-read and listen to a book, but it can be expensive to buy both the Kindle and the Audible versions of the same book. In the middle of the year, I also discovered that library cards allow me to deliver e-books to my Kindle and listen to audiobooks through Libby. Although the libraries often own only best-sellers, and there can be long waiting times and limited flexibility, it is very economical.
Subjects:
After my teenage years, I found it harder and harder to finish fiction, which used to be what I enjoyed most when I was young. But occasionally, I can still be entertained by some authors' imagination and by the more profound philosophical and moral problems in their stories.
Now I love reading biographies and autobiographies, especially the ones about people who have lived very different lives from mine. They allow me to see different upbringings and societies, and help me understand other people's struggles, pains, desires, and purposes in life. I find these books add new dimensions and perspectives to my simple mortal life.
I also love reading nonfiction that covers topics I am interested in, especially science, technology, and economics. It fits my nature as an infovore, and these books simply make me happy and entertained. I also love nonfiction on human rights, legal and social (in)justice, morality, race, and society. These books make me think and reflect on deeper and more crucial issues in human history and current society, and on how to make the world a better place, rather than focusing on my own interests and desires.

39 Comments

Dr. Tomoko Ohta, my scientific idol, on the Nearly Neutral Theory

7/8/2019

37 Comments

There is a social media debate about whether Dr. Tomoko Ohta should get the credit as the initiator and primary contributor of the Nearly Neutral Theory. In my mind, her name has always been synonymous with the Nearly Neutral Theory, so I was puzzled and wanted to figure out whether I was wrong.

Since this is an important chapter in the history of population genetics, and because young scientists are often curious about what it is like to be a giant and to have worked with other giants, I decided to ask Dr. Ohta if she could clarify her contribution and comment on some other related issues. Luckily, my e-mail found her, and she replied.

Her account helps clarify the history for scientists who care about who and when, in addition to how and why, and sheds light on how ideas and research evolve.

Here are a few things I have learned from my correspondence with Dr. Ohta. (With Dr. Ohta's permission, I am sharing this exchange.)

Regarding the history before the Nearly Neutral Theory:
“The idea of importance of slightly deleterious mutations goes back to H J Muller, Our load of mutations, 1950. The point is that it was based on phenotypes. So were the arguments of Kimura, Crow, Maruyama and others in 1960s. Also someone says that Darwin recognized existence of neutral variations. All these discussions were based on phenotypes.”

Regarding Kimura’s Neutral Theory and her own role in the Nearly Neutral Theory.
“Kimura’s neutral theory is concerned with the evolutionary changes at the molecular level. So is the nearly neutral theory. Now, my contribution is mainly on the behavior of nearly neutral mutant genes in populations, and on how it differs from the strictly neutral case. The prediction of the neutral theory on evolutionary rate and polymorphisms is simple and nice as everyone knows. Kimura liked the simple and elegant theory and did not like complicated problems. Once slightly deleterious (weakly selected) mutations are incorporated into account, the problem becomes very complicated. I thought that natural selection is not so simple as Kimura says, and presented the nearly neutral theory. Here interaction of selection and drift becomes very significant.”

Regarding Kimura’s work on Nearly Neutral Theory:
“Kimura recognized the significance of the nearly neutral theory in 1970’s, and published a couple of papers on this. But his discussion had been mostly on the strictly neutral mutations afterward.”

Regarding Neutral vs Nearly Neutral and their hot discussions:
“I and Kimura, sometimes including Crow, have had many hot discussions on the problem. I had often been criticized.”

Regarding how to move the field forward:
“Rather than going back to the credit problem, people should study about the recent progress on molecular processes of gene expression, that is very interesting. Numerous molecular interaction systems are working together at the chromatin level, for controlling gene expression patterns in various tissues and organs. It is remarkable that such complex systems evolved. I would like to say that the nearly neutral processes have been quite important for their evolution. Molecular machineries are connected directly or indirectly forming large network systems. We may need to investigate evolution at the systems level in the future.”

Regarding the work environment as a female scientist in Japan:
In addition to these, Dr. Ohta mentioned that she did not have many gender-related problems in her research career, that she has been able to discuss freely with people in Japan, and that she thinks the research environment is good. She also mentioned that Kimura had been nice in this respect.

Epilogue:
After seeing what I had written, Dr. Ohta corrected me: she is not a giant, she said, but she has probably been lucky to do her work at the best time, at the beginning of molecular evolution. I have to respectfully disagree with her about the first part: she has been a giant, she has worked side by side with another giant, she has stood on the shoulders of previous giants, and on her shoulders future giants will stand.

She also said: “I am very glad to know that the comments are useful for young people like you. From the time of Darwin, evolutionary biology is related to many different fields of biology, and nowadays many areas are developing so rapidly. So it is a difficult but interesting time for us.”

Indeed.

I very much appreciate Dr. Ohta taking the time to clarify the history for a random unknown trainee. She shared a story that perhaps explains why: "When I was young, I sent my letter and paper on slightly deleterious mutations to S Wright. He sent me back the letter saying how his shifting balance theory was different from mine, but his theory also might explain the data. I was happy to receive the letter, as some younger seniors did not respond to me. So I responded to you."

April Wei
July 6th, 2019


The origin of mistakes in research

1/31/2019

29 Comments

Mistakes can be costly. I have made several types of mistakes in my research: some are not easy to identify; some, once identified, take a serious amount of time to fix; some, when fixed, evolve into literal nightmares that haunt me at night; and perhaps some are still hiding somewhere, yet to be exposed. Here I summarize the mistakes from my past six years of research into six types, to remind myself not to repeat them. Hopefully, it will also be useful to people who read it.
Type A: logic flaws

In proposing hypotheses/models. This type of mistake occurs when I try to propose a mechanistic model to explain my observations/data/results, or when I try to form a hypothesis for a question. Most of the time, it is easy to find an approximate explanation for something. Quite often, however, something is missing. For example, if one of the steps has three possible scenarios, and only the first two are obvious and compatible with the observation, then it would be wrong to assume that all possible scenarios have been considered. This kind of oversight could lead to a wrong prediction, an overstated conclusion, or worse, a wrongly preferred hypothesis or an inaccurate model. Sometimes several competing models can be tested; failing to identify the second-most probable model could create trouble, because fighting against a paper tiger is neither informative nor convincing. I try to keep this in mind and keep asking "what else" and "how else" to avoid making this type of mistake.
In learning and understanding. Sometimes I believe I have already understood something, when there is actually a key part that I have subconsciously filled in with my own assumptions and not yet fact-checked. It is always necessary to study the counter-argument thoroughly, if there is one, before choosing to believe in a different argument. I should also keep in mind not to treat X as correct unless I can explain why X is correct, why everything else is wrong, and why every claim that X is wrong is itself wrong. Over the years, I have learned that it is okay to say X works well for xxx data but might not work when xxx, even if I provide evidence to support X; likewise, it is okay to say X does not work for xxx data but could work in other cases, even if I want to provide evidence against it. It is important to word things precisely and to recognize the difference.

Type B: applying suboptimal/biased statistics without recognizing it.

Suboptimal: To be fair, this type reflects a lack of experience rather than a mistake, and a suboptimal unbiased test is still better than a biased test. However, the result is quite similar to that of making a mistake. As I have gained more research experience, my judgment in choosing statistical tests has improved, but the change is gradual and slow. In the last two years, I have found it helpful to think about the tests other people use in published papers, and about how to improve them. It is surprisingly common to find room for improvement even in published papers.
Biased: An example of using biased statistics is provided in Type E. It can be challenging to figure out whether something is unbiased when you invent a new way of testing things, for instance when the underlying distribution is unknown or when a closed form cannot be obtained. I find it helpful to test with a simulation. Once, a friend of mine did a test (something like first pair the data, then estimate A and B, then A/B, then average), and I thought it would be less biased to do it in a different order. Fortunately, I overcame Type E and decided to do a simulation first, and found that my presumption was wrong.
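As a hedged illustration of checking bias by simulation (this is not the actual test from the story above, whose details are not given), here is a minimal Python sketch. It assumes a toy setting in which paired values a and b are drawn from normal distributions with known means, so the true ratio is known, and it compares two orders of operations: averaging the per-pair ratios versus dividing the two averages. The function name, parameters, and distributions are all hypothetical choices made for the example.

```python
import random
import statistics


def simulate_bias(n_pairs=20, n_reps=5000, mu_a=10.0, mu_b=5.0, sd=0.5, seed=0):
    """Compare the bias of two ratio estimators in a toy setting.

    Assumption for illustration only: a ~ Normal(mu_a, sd) and
    b ~ Normal(mu_b, sd), independent, so the true ratio is mu_a / mu_b.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    true_ratio = mu_a / mu_b
    mean_of_ratios = []  # order 1: ratio within each pair, then average
    ratio_of_means = []  # order 2: average each variable, then take the ratio
    for _ in range(n_reps):
        a = [rng.gauss(mu_a, sd) for _ in range(n_pairs)]
        b = [rng.gauss(mu_b, sd) for _ in range(n_pairs)]
        mean_of_ratios.append(statistics.fmean(x / y for x, y in zip(a, b)))
        ratio_of_means.append(statistics.fmean(a) / statistics.fmean(b))
    # Bias = average estimate over replicates minus the known true value.
    return {
        "true_ratio": true_ratio,
        "bias_mean_of_ratios": statistics.fmean(mean_of_ratios) - true_ratio,
        "bias_ratio_of_means": statistics.fmean(ratio_of_means) - true_ratio,
    }


result = simulate_bias()
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

In this particular toy setting, averaging the per-pair ratios tends to overestimate the true ratio more than dividing the averages does, but which order is less biased depends on how the data are generated. That is exactly why running the simulation before trusting intuition is worthwhile.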

Type C: ascertainment bias in coding and in observations.
Here I assume that observations are analytical results/outputs from coded scripts, so I discuss coding and observation together. The most dangerous ascertainment bias in coding/scripting arises when the observations agree with expectations only because of a bug in the code. More often, when the observations look weird, the code is revisited until the results look plausible. This is super dangerous. It could be more dangerous for small tasks that only need to be done once or twice than for bigger tasks that are applied to data repeatedly. I find it helpful to do some testing even if everything runs smoothly the first time. Another thing I try to do is to avoid having any prediction or expectation, at least when I first analyze something. I also try to slow down and stay focused when I code very simple things, and to code only when I feel like coding, and I find this helpful.
Observations could be more general. They could be experimental results, with which I have little familiarity. Ascertainment bias in observations could also be about acquiring knowledge from the literature. For example, if there is a debate in the literature and a person reads more about one argument than the counter-argument, the person may well miss some valuable evidence on the other side. This is similar to the bias in learning and understanding in Type A, but it is due not to logic but to biased exposure to, or observation of, the literature. This process could be entirely unintentional, or it could be subconscious. Another scenario is similar to Type F: not noticing existing literature on the topic of study could turn out to be quite a disaster.
Type D: missed important information from data.
This is, right now, my most painful experience with mistakes, and fortunately it has occurred only once so far. This is a scary mistake, not only because all the analyses need to be redone, but also because the previous observations may no longer hold, nor may the interpretations built on top of them. Fortunately, in my case, the main result stayed the same. There are at least two types of important information that can be missed, especially when using public data.

Part of the data is missed: for example, a potentially informative variable is not taken into account in the data processing, or a part of the data is presumed not to exist in the database. It is thus important to be more careful with all the details of the database.
Part of the data pre-processing step is missed. This could occur when the people who published the data did some pre-processing to remove potential bias, or for some other purpose. It is easily missed when the original publication is not read in great detail, so learning every single detail from the paper that published the data is key to avoiding this type of mistake.

Type E: pride and prejudice.
The worst thing about pride is that it makes me overlook my own mistakes. I struggle to eliminate my inner ego, because otherwise I could assess my questions and methods ignorantly. I learned this the hard way. Once, I ignored the first person's question about whether my test was unbiased and did not try to check it with a thorough simulation; I only found that it was indeed biased when another person raised the same question. This experience made me realize that I have to constantly remind myself to question my own judgment before other people question it, and I should definitely question my judgment if someone else questions it. If something could be settled by a simulation or a mathematical proof, it might well be worth the time.
Type F: reinventing the wheel.
Reinventing the wheel could waste a lot of time, and could also undermine the novelty of a project. It first happened to me in the first project of my Ph.D., a method for detecting selection in overlapping genes. I didn't discover from the literature that the wheel had already been invented until I was revising the first draft of the manuscript. Although the two methods are quite different, and although I eventually managed to publish mine as well, I had to spend a lot of extra time justifying the new method, including scanning for overlapping genes in the human genome, finding examples, and comparing the speed and accuracy of the two methods. This type of mistake could be avoided fairly easily by a more careful investigation of the literature. Nowadays, there are so many journals and so many papers that it is getting harder and harder to keep up with the literature. I adopt the following tactics to partially avoid this mistake: figure out all possible alternative terminologies and relevant concepts, go through the reference lists of all key relevant papers, go through all papers that cite those papers, go through the work of relevant researchers, and if possible, ask someone who is more senior and knowledgeable to assess the topic.
Epilogue
Most people would not make as many mistakes as I do, or may not have made as many types of mistakes. Very often mistakes are inevitable, but they can be fixed at an early stage to avoid future damage. Learning from the past, I have found triple-checking helpful, along with being patient and keeping calm when witnessing an exciting result.
Mistakes are not terrible; what is terrible is leaving a mistake uncorrected. It occurs to me that perhaps the most important thing is to keep in mind that mistakes are made unintentionally; therefore, intentionally and constantly triple-checking every possible step, and correcting mistakes at an early stage, could be more efficient than rushing for a quick result.


A forward-looking plan to release e-notes of my project

11/15/2018

46 Comments

Moving to California from Michigan and shipping seven five-subject notebooks (and leaving them on the shelf) made me realize the importance of keeping good records electronically. Although I had already started to keep handwritten notes with GoodNotes on iPad, which sync instantly (and can search handwritten words), I started to think about how to improve my research notes.
More importantly, I hope to improve the transparency and reproducibility of research by all means. It may also help track how a project twists and turns, and how new ideas and new tests come up.
I am trying to make it simple and informative.
The current format I am using is to have a section for the notes of each week. It reminds me how little I have done. It also shows the pace of the project, in retrospect.
If I work on some technical parts of the project, I will include the technical problem, the solution, and the failed trials.
If I plot some useful figures, I also include them.
If I read an interesting paper, I will comment on it and save its URL.
If I have some new ideas, I will also write them down, whether or not they are directly related to the project.
I hope that once the project is done and the paper is published, I can prepare a download page to share them.


My new painting of the giants in the evolutionary and population genetics field

4/20/2018

62 Comments

Afternote: Alexey Kondrashov, the knowledgeable and brilliant scientist who often enlightens me, mentioned that John Maynard Smith should also be here. So maybe I will draw a new one some time later.


Coin a new term -- “Elf idea” as the counterpart of “Zombie idea”

11/8/2017

44 Comments

Economist John Quiggin coined the term "zombie idea", originally for ideas in economics, which he explained as “Ideas are long-lived. They often outlive their originators, and, even when they have proved themselves wrong and dangerous, they are very hard to kill”.

Ecologists borrowed the term, which has been popularized by Dr. Jeremy Fox on his personal blog and in his papers. The top "zombie ideas" he ranked are the intermediate disturbance hypothesis, r/K selection, species interactions are stronger and more specialized in the tropics, humped diversity-productivity relationships, “neutral” = “dispersal-limited”, and “neutral” = “drift” (see dynamicecology.wordpress.com/2016/02/02/lets-identify-all-the-zombie-ideas-in-ecology/). I personally found the term eye-catching, but prefer not to label these ideas "zombie ideas". Although I am not an ecologist, I read all those arguments and found there was not enough evidence to conclude that even some of the top ones (i.e. intermediate disturbance hypothesis, r/K selection, humped diversity-productivity relationships) are wrong.

It is hard to determine, especially for hard problems, whether mixed evidence implies that a hypothesis is wrong. As said by T. Huxley, “the slaying of a beautiful hypothesis by an ugly fact is the great tragedy of science”. It probably happens more often that beautiful hypotheses are abandoned when the problems are not even close to fully solved and the conclusions are unclear, which I found to be true for those labeled as "zombie ideas". Isn't that a greater tragedy?

I'd rather call ideas that are simple, long-lived, inspiring, and sometimes theoretically possible but short of empirical support or with mixed empirical evidence "elf ideas", because elves live long and are elegant. As you see, labeling ideas with different names in an eye-catching way changes people's feelings dramatically. I don't mean to use this term as a paper tiger to fight against the other term; rather, I hope that when a beautiful hypothesis is to be slain, convincing evidence is shown as well. Moreover, the original hypothesis should be appreciated, even if it is definitely wrong, for its contribution in stimulating studies, new methodologies, and new ideas, and for all the progress made in the field because of it. After all, accidentally killing an elf is a big sin.

Nov 8, 2017
April