Saturday, January 9, 2016

Mind the Fuzzy Gray Areas - Drawing Conclusions in Genetic Genealogy

I was inspired by an email from a cousin this morning to ponder the logic of genetic genealogy a bit.  Genetic genealogy is deceptively easy on some sites.  It's packaged and marketed by the big sites as an auto-magical way to find relatives you didn't know you had.  In my experience, folks get very excited when these relatives show up and then make assumptions about their own heritage that I often find to be guesswork, at best.  The thing is, your relation to a person and the conclusions you can draw from that connection are not nearly as clear-cut as they can sometimes look on whatever website you're using.

I should also preface this by saying that, to the average person, I'm probably the most annoying genetic cousin ever.  When a cousin tells me a conclusion they have drawn, I will, without fail, ask them for enough information that I can re-discover it on my own, and very often I tell them that it's not a sure conclusion - or maybe even wrong.  Sorry, not sorry, cousins!  Here's why.


Logic is defined as "reasoning conducted or assessed according to strict principles of validity."  It's the principle underlying math and science.  Were I to say 1 + 1 = 3, you could very quickly prove otherwise using step-by-step reasoning.  This is logic.  We can use logic to help us prove things we don't already know - like who our third great-grandparent is, based upon genetic genealogy.

There are two places where attempted logical cases often fall short in genetic genealogy:
  • Bad Data or Incorrect Conclusions - this is when one or more parts of your case are incorrect, either because logic was not used, because the conclusions were drawn incorrectly or because the foundational assumptions are incorrect.
  • Logical Fallacy - this is when the structure of your logical case allows for a false conclusion.  An example of this which is common to genealogy is an appeal to authority, which goes something like "Jane Doe says John and Elizabeth are Gary's parents and Jane probably knows what she's talking about, therefore John and Elizabeth are Gary's parents."  This is a fallacy because Jane's say-so, on its own, proves nothing - and it's actually quite likely that Jane Doe is incorrect.
Genealogy websites are rife with really. Bad. Data.  People post family trees that are very, very wrong.  They make assertions that are not backed up by actual research or logic.  I, on a regular basis, encounter people who have assumed something that is just absolutely incorrect.  In the Norwood family, there is a family myth in circulation about King Harold, which is absolutely untrue, has been proved to be untrue and which I can prove to be untrue... and yet it persists.  In "my" John and Patience Turner family, there are all kinds of stories about Patience that are completely false or unproven that I encounter all the time.  Some examples are that her maiden name was Barfield, that she was born in Guatemala or that she was mulatto.

Some people are more naturally logical thinkers than others, but logic is a practice that can be learned.  Once you understand logic and your mind is comfortable working logically, you can apply logic to anything, including genealogy.

Fuzzy Gray Areas 

Aside from the bad data of the humans involved in genealogy websites, there are several really fuzzy gray areas in genetic genealogy that make it tough to use only genetic data to point to a specific most recent common ancestor (MRCA).
  • Recombination is entirely random.  We sometimes consider it "safe to assume" that in every generation, half of each parent's genetic material gets passed on, and therefore half of what each of them got from their own parents comes to us (meaning we have a quarter of each grandparent in us), and so on up the line.  That is only an average.  The share you actually inherit from any one grandparent, or from a more distant ancestor, can be noticeably more or less than the tidy fraction suggests, so it's not actually safe to assume at all.
  • There is a really fuzzy gray area in the precise start and end points of a segment match on genetic genealogy websites, in my experience.  I know this because I tend to use the actual numbers rather than the graphical interfaces and have data from GEDmatch, FTDNA and 23andMe collated together.  A match and I have a segment start and end point on 23andMe.  On FTDNA and GEDmatch, that same match and I wind up with slightly different start and end points, so each of the three sites reports a different matching segment length.  This happens whether we took completely separate tests or tested once and just imported the results to all three sites.  A single centimorgan (cM) of difference can change the generations-to-MRCA calculation or the degree-of-relationship calculation - or even the call on whether we are related at all - which makes working with smaller segments from consumer-friendly genetic genealogy websites entirely untenable.  The difference is entirely because of the algorithms used - it's the same two people and the same genetic material, often the very same DNA test imported to multiple sites, just displayed on different websites.  Below is a visual example of what I mean.  The three non-blacked-out lines are the matching segments between me and a matching cousin on three different websites.  Notice there is more than a cM of difference between FTDNA and the other two sites.
  • People have secrets.  Those secrets were likely not written down and might not have been common knowledge if they weren't hidden entirely.  Adoptions, non-paternity events, double cousins, incest and a bunch of other scenarios involved in who had babies with who are often lost to researchable knowledge - but show up in our genes.  We cannot know, without the corroboration of solid research, that we are not encountering one of these scenarios with one of our genetic matches.  
The fuzzy gray areas, in general, will not hinder using genetic genealogy as a tool if you are mindful of them.  
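
To make the segment boundary problem concrete, here is a minimal Python sketch of the kind of side-by-side comparison I mean.  The positions, lengths and the 7 cM cutoff are all made-up assumptions for illustration, not my real match data and not any site's official threshold.

```python
# Hypothetical numbers for illustration only; this is not real match data.
# Each site maps to (start position, end position, segment length in cM).
segments = {
    "23andMe": (50_100_000, 62_300_000, 8.1),
    "FTDNA": (50_400_000, 61_900_000, 6.9),
    "GEDmatch": (50_150_000, 62_250_000, 8.0),
}

# Assumed cutoff for this sketch; pick whatever threshold you actually use.
MIN_CM = 7.0


def summarize(segments, min_cm):
    """Return the cM spread across sites and the sites that clear the cutoff."""
    lengths = [cm for _start, _end, cm in segments.values()]
    spread = round(max(lengths) - min(lengths), 2)
    passing = sorted(
        site for site, (_s, _e, cm) in segments.items() if cm >= min_cm
    )
    return spread, passing


spread_cm, sites_passing = summarize(segments, MIN_CM)
print(f"Spread across sites: {spread_cm} cM")
print(f"Sites where the segment clears {MIN_CM} cM: {sites_passing}")
```

With these invented numbers, the very same segment counts as a match on two sites and falls under the cutoff on the third, which is exactly why I don't lean on small segments.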

How to Come to a Valid Genetic Genealogy Conclusion

  1. Study up on:
    • Genetics. Know what is and isn't possible to a reasonable degree.  Genetic genealogy appears deceptively simple.  It's really not.  If you don't know about the fuzzy gray areas above, for instance, you will draw more incorrect conclusions than if you are aware of them.  Read, research, learn and understand how genetic material is passed down from generation to generation.
    • Logic. 
    • The difference between information, evidence and proof.  Know how to use information to formulate evidence in a way that proves or disproves a theory logically - and use this ability to prove or disprove information you encounter and to create your own ancestral narrative.
  2. A genetic genealogy conclusion must have both genetic evidence and genealogy research evidence.  Contrary to how it's marketed by some websites, genetic genealogy is not a magical ancestor finder.  It's one piece of a puzzle.  There is no way to accurately draw a conclusion from genetic genealogy without having some form of genealogy research in the mix.  It is possible to layer in one or the other at various parts of your genealogical case and maintain strong logical integrity, but it is not possible to build an argument from genetics alone.
  3. Question the source.  
    • If you're using a tool on a genetic genealogy website, understand how it works, what it does and what it doesn't do.  For example, ICW (in common with) tools, in general, tell you who is related to both you and another person (or two other people).  They don't tell you HOW each person is related and they don't guarantee that you all share one common ancestor.  Example: you find Person 1 in your matches and run the ICW tool to find out who you and Person 1 have in common.  It gives you a list that includes Person 2.  It is possible for you and Person 2 to be related via MRCA2 and you and Person 1 to be related via MRCA1... two different MRCAs.  At face value, you don't know if all three of you share the same ancestor or if you have two different ancestors in common.  In order to get value from the ICW tool, you have to know more about Person 1 and Person 2 than just that you are all related to one another.  The unwitting user of this tool might assume that just because you are all related, you must share the same ancestor.
    • If you get information from another person or a posted family tree or website, ask for their sources or how they came to know the information.
  4. Keep really good notes.  Keep a step-by-step record of how you came to a conclusion.  If you later find out that someone's family tree was wrong or that you misunderstood the results of a particular tool, you can then go back and correct anything that depended upon that data.
  5. Be OK with being wrong.  In agile project management, there is the concept of the Inspect and Adapt cycle that I find really useful for genealogy.  Be transparent and willing to openly discuss your process in the interest of inspecting it, so that you can potentially suss out shortcomings in the process or research and get more accurate results.
  6. While aiming for black and white, get comfortable with the fuzzy gray areas.  Because we are researching connections between people who have usually long since died, there are very few 100% sure conclusions that can be drawn in genealogy.  The bottom line is that without being there and experiencing their stories first hand, we don't know what we're missing or have miscalculated.  At any time, some formerly unknown piece of information might float to the surface and disprove everything.  Sometimes, depending upon the circumstances, the closest we can get is a most likely scenario, given the evidence we have.  So we do the best we can to do exhaustive research, come to logical, evidence-based conclusions and stay open to changing them should we find new evidence that suggests otherwise.
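
The ICW trap from step 3 can be sketched in a few lines of Python.  This is only a toy model under my own assumptions: the match lists, the `in_common_with` helper and the `mrca_with_you` lookup are all hypothetical, and no website exposes its data or its ICW tool this way.

```python
# Hypothetical match lists: each person maps to the set of people they match.
matches = {
    "You": {"Person 1", "Person 2", "Person 3"},
    "Person 1": {"You", "Person 2"},
    "Person 2": {"You", "Person 1"},
    "Person 3": {"You"},
}

# Hypothetical findings from paper research, which no ICW tool can supply.
mrca_with_you = {"Person 1": "MRCA1", "Person 2": "MRCA2"}


def in_common_with(a, b, matches):
    """People who appear on both a's and b's match lists."""
    return (matches[a] & matches[b]) - {a, b}


icw = in_common_with("You", "Person 1", matches)

# Collect the ancestor through which you connect to Person 1 and to each
# person on the ICW list.
ancestors = {mrca_with_you[person] for person in icw | {"Person 1"}}
same_ancestor = len(ancestors) == 1

print(f"ICW list for You and Person 1: {sorted(icw)}")
print(f"All connected through one ancestor? {same_ancestor}")
```

The ICW step only needs the match lists, so it happily reports Person 2.  The question of whether you all share one ancestor can only be answered from the second lookup, and that one has to come from research.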
