Blog post

Friday, July 28, 2017

Visualising Ancestry DNA matches - Part 7 - Adding shared admin lines

I’ve loved seeing the comments on this blog, and posts on Facebook, describing success with these methods. Thank you for the positive feedback, and congratulations on your finds! We’re not finished yet…

If you're new to this series, the index will steer you through the previous posts.

In this post we’re going to squeeze more information from the match list. I’m going to show you how to quickly and easily see groups of kits that share the same administrator. I’m aware that Ancestry has recently made changes and in future each new adult’s kit will be registered in a separate account. I don’t know what this means for ‘admin’ data – but for now we have the information so let’s make the most of it.

The potential benefits of linking people with the same administrator are:

  • Identify clusters of closely related people within a busy graph.
  • Add relationship lines between distant (to you) matches who are closely related to each other. These connections may improve clustering calculations on a busy graph that uses distant cousins.
  • Add additional distant matches (who are not related to a fourth or closer cousin) to the graph.

Below is one of my groups. The newly created/identified edge lines are highlighted in red. I’ve had some success in asking kit administrators about the common ancestor of matches whose kits they manage.

[Image: one of my match groups, with the new shared admin edges highlighted in red]

Assumptions

The assumptions that we make matter. We need to be aware of the assumptions we’re making, because a wrong assumption can lead to a wrong interpretation. In this post, we’re assuming:

  • Each instance of the same administrator name is the same person.
  • All of our matches who are managed by the same administrator are related to each other.

These seem to be reasonable assumptions for my relatively sparse matches. As I investigate the groupings revealed, I can ‘skip’ lines if I think they’re not appropriate. So far I haven’t had to. This may not be the case for your kit – take due care.

Moving on – how to do this!

Add/identify shared matches with the same administrator

Once again, a few clicks on the right menus and the job is done. There aren’t too many steps.

  • Click the Graph Metrics button on the NodeXL Basic ribbon.
  • Clear the Overall graph metrics check box (it doesn’t matter if you don’t, but we’re not using them)
  • Tick the Edge creation by shared content similarity box
  • Select the Options… button
  • An options box should appear. Select admin from the Analyze the contents of this column dropdown box
  • Set the Strength threshold for edge creation to 100% (we only want exact admin name matches)
  • Click OK to accept the Edge Creation Metrics settings you have entered.
  • Click Calculate Metrics on the Graph Metrics dialog to start processing.

The new edges will take some time to process.
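If it helps to see the idea outside NodeXL, here is a minimal Python sketch of what exact (100%) matching on the admin column amounts to: every pair of matches with the same administrator gets an edge. This is not NodeXL's own code. The function name `shared_admin_edges` and the sample match and admin names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def shared_admin_edges(admin_by_match):
    """Return (match_a, match_b, admin) for every pair of matches
    that have exactly the same administrator name."""
    groups = defaultdict(list)
    for match, admin in admin_by_match.items():
        if admin:  # ignore matches with no administrator recorded
            groups[admin].append(match)

    edges = []
    for admin, members in groups.items():
        # Every pair within a group becomes one new edge.
        for a, b in combinations(sorted(members), 2):
            edges.append((a, b, admin))
    return edges


# Hypothetical sample data: match name -> administrator name.
matches = {
    "Match A": "Jane Doe",
    "Match B": "Jane Doe",
    "Match C": "Jane Doe",
    "Match D": "John Roe",
    "Match E": "",
}

edges = shared_admin_edges(matches)
for edge in edges:
    print(edge)
```

A group of n kits under one administrator produces n × (n − 1) / 2 edges, so a few prolific administrators can add a lot of lines. That may be part of the reason the processing takes a while on a large match list.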

View the shared admin links

When processing finishes, Refresh the graph to apply the changes.

To see the new lines, move to the Edges worksheet. You will see a new column titled Shared Content. The newly created edges will be at the bottom of the sheet, with the relevant administrator’s name in the Shared Content column. Select all the new lines and you’ll see them highlighted in red on the graph.

If you have a graph with a lot of linkages between groups, make sure that the between-group links are set to show. If there are highlighted lines running between groups (and you think the assumptions we have made about administrators hold), this suggests that the clustering of matches could be improved. You may get a better result if you rerun your preferred grouping algorithm now.

Colour the new lines

The colouring instructions below are a quick fix. There are different ways to apply colour and we’ll do more with colour in a later post.

For now, highlight the rows with entries in the Shared Content column, then:

  • Right-click any of the highlighted lines on the graph to open the right-click menu.
    This can be a bit tricky. If you click a dot, all the lines connected to that match will also be selected. Whoops! We don’t want that. If it happens, go back a step. Highlight the rows on the Edges worksheet, and try again.
  • Click Edit Selected Edge Properties…
  • Select the colour you prefer and click OK.

You may not be able to see the colour on the graph at first, because duplicate lines in the standard grey will be sitting on top of the coloured ones. This is easily fixed – just sort the Shared Content column from Z to A so that the new entries move to the top of the sheet, then refresh the graph.

Remove unwanted lines

Skipping

If you administer kits for cousins from different branches of your family, then new, incorrect lines will have been added. These can be dealt with by finding your name in the Shared Content column and ‘Skipping’ the offending lines (enter ‘Skip’ in the Visibility column on the Edges worksheet). Deleting the edge line entirely will also work. You will need to skip or delete the lines again each time you recreate the shared admin links.

Alternative:
You can use a formula to specify the lines that should be skipped. The template uses Excel tables, which have special properties. If the Visibility column is completely empty and you enter a formula, it will automatically be filled into every row, including new rows that are added later. No updating required.

You might have already noticed that some cells have a red triangle in the corner. When you hover over these cells a comment box will appear. The comment boxes contain useful information about the use of each column and what the possible values mean.

Taking a simple case where “YOURNAME” is the only value in the Shared Content column that you want to skip, a formula that will do the job is:

=IF([@[Shared Content]]="YOURNAME",0,1)

This formula tells Excel that if the value in the Shared Content column is ‘YOURNAME’ the value should be ‘0’ (which we can see from the comment box means ‘Skip’). Otherwise, the value is ‘1’ (which means ‘Show’).
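For readers who think more easily in code, the same rule can be sketched in Python and extended to more than one name. This is only an illustration of the logic, not something NodeXL runs. The function name `visibility` and the `names_to_skip` parameter are hypothetical. Excel's `=` comparison ignores upper and lower case, so the sketch does too.

```python
SKIP, SHOW = 0, 1  # the values used by the formula above


def visibility(shared_content, names_to_skip=("YOURNAME",)):
    """Mirror of the Excel formula: Skip edges created by the listed
    administrators, Show everything else (including blank cells)."""
    skip = {name.casefold() for name in names_to_skip}
    return SKIP if shared_content.casefold() in skip else SHOW


print(visibility("YOURNAME"))   # edge created by your own admin name
print(visibility(""))           # ordinary shared-match edge, no admin text
print(visibility("Jane Doe"))   # edge created by someone else's admin name
```

Edges with an empty Shared Content cell, which is every ordinary shared-match line, fall through to ‘Show’, exactly as they do with the formula.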

Deleting

If you find that the shared admin lines are not suitable for your situation at all, simply delete the lines entirely. You won’t need the now empty Shared Content column – it can also be deleted.

Excel tip:
To remove the lines, select any cell(s) in the row(s) you want to remove. On the Home ribbon click Delete, then Delete Sheet Rows. This won’t work if you have filtered the table to find the rows.


Retain wanted information

When you ‘count and merge duplicate edges’, the first instance of an edge (starting from the top) is kept. Duplicates further down the sheet will be deleted – even if they add information such as Shared Content or ‘Skip’ instructions.

To make sure you retain the new admin lines when removing duplicates, send them to the top of the worksheet.

  • Sort the Shared Content column from largest to smallest, then
  • Sort the Visibility column so that skip instructions are at the top
    • If using words in the Visibility column, sort from largest to smallest
    • If using a formula that results in a number, sort from smallest to largest

Then remove duplicates as usual.
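To see why the sort order matters, here is a small Python sketch of a ‘keep the first instance’ merge. It is a simplified stand-in, not NodeXL's implementation. It assumes an undirected graph, so A–B and B–A count as the same edge, and the column names `v1`, `v2` and `shared_content` and the sample rows are made up. Only the Shared Content sort is shown; the Visibility sort works the same way.

```python
def merge_duplicate_edges(rows):
    """Keep only the first row seen for each pair of vertices."""
    seen = set()
    kept = []
    for row in rows:
        key = frozenset((row["v1"], row["v2"]))  # order of the two ends is ignored
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical Edges sheet: the new admin edge sits below the original edge.
rows = [
    {"v1": "Match A", "v2": "Match B", "shared_content": ""},
    {"v1": "Match B", "v2": "Match A", "shared_content": "Jane Doe"},
    {"v1": "Match A", "v2": "Match C", "shared_content": ""},
]

# Without sorting, the admin name is lost with the deleted duplicate.
unsorted_result = merge_duplicate_edges(rows)

# Sorting Z to A first moves rows with an admin name to the top.
sorted_rows = sorted(rows, key=lambda r: r["shared_content"], reverse=True)
sorted_result = merge_duplicate_edges(sorted_rows)

print([r["shared_content"] for r in unsorted_result])
print([r["shared_content"] for r in sorted_result])
```

In the unsorted case the surviving A–B row has an empty Shared Content cell. After sorting, the surviving row is the one that carries the administrator's name.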

Note:
You can also use the Shared Content column (or any other column) in addition to the vertices to determine if two edges match. This is useful to tell the difference between relationships from the ICW data, and relationships that were created only through having a shared administrator.
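As a rough illustration of the difference this makes, the Python sketch below compares matching on the vertices alone with matching on the vertices plus Shared Content. It again assumes an undirected graph and uses made-up column names and sample rows. The `extra_columns` parameter is hypothetical and simply stands in for the extra column chosen in the dialog.

```python
def merge_duplicates(rows, extra_columns=()):
    """Keep the first row for each distinct (vertex pair, extra column values)."""
    seen = set()
    kept = []
    for row in rows:
        key = (frozenset((row["v1"], row["v2"])),) + tuple(
            row[column] for column in extra_columns
        )
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical rows: the same pair appears once from the shared-match data
# and once from the shared admin calculation.
rows = [
    {"v1": "Match A", "v2": "Match B", "shared_content": ""},
    {"v1": "Match A", "v2": "Match B", "shared_content": "Jane Doe"},
]

by_vertices = merge_duplicates(rows)
by_vertices_and_admin = merge_duplicates(rows, extra_columns=("shared_content",))

print(len(by_vertices))
print(len(by_vertices_and_admin))
```

Matching on the vertices alone collapses the two rows into one, while adding Shared Content to the comparison keeps both, so each kind of relationship stays visible as its own line.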

Coming up…

In the next post, we’ll supplement the graph with known ancestry information.