Dual index adapters with UMIs resolve index hopping and increase sensitivity of variant detection

Hello and welcome to this
Integrated DNA Technologies webinar, “Dual Index Adapters
With UMIs Resolve Index Hopping and Increase Sensitivity
of Variant Detection.” My name is Sean
McCall, and I will be serving as moderator
for today’s presentation. Today’s presentation will
be given by Dr. Nick Downey. Dr. Downey is an application
scientist at IDT, where he helps customers
design and troubleshoot experiments ranging from qPCR
to next generation sequencing and conducts internal
training on applications of oligonucleotides in genomics research. Nick's presentation should
last about 30 minutes. And following the
presentation he will answer as many questions
as possible from attendees. The question-and-answer
session will be conducted by Dave Kupec,
NGS product manager at IDT. As attendees, you
have been muted, but we encourage you to ask
questions or make comments at any time during or
after the presentation by typing into the
questions box located on the right-hand side of
your screen in the GoToWebinar control panel. Also, in case you need
to leave early or want to revisit this webinar, we
are recording the presentation and will make the
link to the recording available a few days after the
presentation on our website. We will also post the
recorded presentation on our YouTube
and Vimeo channels and post the slides on
our SlideShare site. You will receive links to
these in a follow-up email. So now let me hand over to
Nick for his presentation. Thanks, Sean. Good morning, everybody, well,
afternoon if you’re overseas. It’s my pleasure to
give this talk today. And hopefully you’ll
find it useful. So what we’re
going to talk about is this new kind of adapter
that we’ve just launched, called the xGen Dual Index UMI Adapter. And we’ll look at that in the
context of the regular NGS workflow that includes
sample multiplexing. And we’ll discuss some
issues with cross-talk between barcodes,
where that can occur, and how these new adapters
can help with that. And then we’ll
move on and discuss how we can accurately measure
the rare variants with UMIs. And then at the end, I’ll
show you how to order the oligos and some information for using them. So the xGen Dual
Index UMI Adapter, which we sometimes call a three-in-one design for reasons I'll get to in a moment. It's designed for
Illumina sequencers and is compatible with
regular ligation-based library preparation, which includes the
standard shearing, end repair, and A-tailing. And it's compatible with
PCR-free library methods, as well as PCR-included methods. It’s going to reduce
sample cross-talk with dual, unique
sample indexes. And we’ll get to
that in a moment. It’s also ideal for
error correction and/or for counting applications
through the use of a degenerate, nine-base UMI. So what did I mean by
a three-in-one design? Well, we can utilize
these same adapters in three different
kind of contexts. In the first
context, you might be just interested in genotyping. So here, a little cross-talk
or lack of ultimate sensitivity is not so much of an issue. You’re just trying
to read your insert, and you’re looking for
germline mutations, which are going to be
either homozygous or heterozygous in your sample. And so all you need to do really
is read the single i7 index to de-multiplex your
samples and move on. So this is the simplest use. If you are trying to multiplex
or go a little deeper with your sampling, you
can set up in the same way, using the same adapters, but
now add on the i5 index read, and this will allow you to
become a little bit more sensitive and to screen out
some of these sample cross-talk events. And ultimately, if you’re going
after rare variants, which I’m assuming a lot of
you are interested in, you can then couple in
the additional read. During the i7 index read, you read the additional
nine nucleotides of the UMI, and this allows you, then,
to identify unique molecules and to build consensus
reads for your sample. So here’s an overview
of the NGS workflow. Of course, the adapters come
in at the beginning when we’re preparing our library. And then we’ll move on. For the most part,
people will be using some sort of target
enrichments in conjunction with the UMIs. So we’ll have some
enrichment hopefully with our xGen Lockdown Probes. Then we'll move on to the
sequencing, of course, and then sample analysis. So for those of you
who are not quite so familiar with the Library Prep
Workflow, what would happen is you would take
your sample of DNA. If it was standard DNA,
you would then shear it, produce fragments, which would
perhaps have uneven ends. So we end repair
and then A-tail so that we have a
consistent structure at the end of every fragment. This allows us, then, to
ligate our adapters on, using the 12 nucleotides
of the duplex, plus this TA overhang
to make a 13 base pair complementary sequence. You can do a bead cleanup to
remove the unligated adapters and then amplify with primers
at the ends of the adapters, cleanup once again, and then we
can move on to the next step. So I’ve mentioned sample
cross-talk already. And so what does
that really mean? So in this case, we
have two samples, and we have barcodes,
which are indicated by the gray, blue, and
green little boxes here. So you can see that we can
tell the two samples apart because we have a
blue box on one side and a green box on the other. So we read the samples. And then we can de-multiplex, grouping all those reads that have
the blue barcode in one group and all those with the green
barcode in the other group. And now we can do the
analysis of the reads. However, you can have situations
where in your sample one you get some fragments that
end up with the green barcode. So now this has gray on one
side, green on the other. So although this actually
comes from sample one, during your
de-multiplexing step, it’s going to be grouped with
the other sample two molecules. And so as you then
analyze that data, you’re reading
something that actually came from sample
one but thinking it came from sample two. And in this example, we
do sort of a converse step as well, where we
identify this incorrectly as being sample one. It doesn’t have to be
reciprocal like this. It could be unidirectional. But it could go in both
directions, which is what we’re trying to convey here. So why is this a problem? Well, of course,
if you’re looking for a very low frequency
somatic variations, having one or two reads from
the wrong sample could look like a positive for a
rare allele in the sample you’re looking at. And that’ll be a false positive,
and so that would mislead you as to what was happening. Similarly with ancient DNA
research or viral detection, getting a single sequence
may cause misinterpretation. If you’re looking
at gene expression, you might see it as bleed over
from one sample to another. And that would, of course,
change the apparent expression levels in your data. And then, again, with
microbial profiling, you might expect to see
different populations in the samples, even though
that is not the truth. So this has been well
described as being an issue. And several people
have been working towards solutions for this. So where can this happen? It’s very often
easy to sort of put the blame in one
place or another. But in actual fact, almost
every step of the workflow is sensitive to this. So if you get your
adapters contaminated, which can happen quite
easily in the lab if you have any aspiration,
any droplets moving around. Or during multiplex capture, you will actually get what's called index hopping,
which is a result of strand hopping in PCR. And then on the
sequencer itself you can also get index hopping in
some machines to some extent, as well as potentially
misreading or simple carryover. And then during
your de-multiplexing it is possible to get index
misassignment based on the sequences. Though, as we'll focus on, the
barcodes we’re looking at today are not really sensitive
to this particular step. So traditionally
the way adapters have been sort of
built up in order
different adapters would be to take 12 i7
indices and spread those across a plate in columns
and the i5 indices spread across
a plate in the rows. And so each individual well
has a unique combination, and so you can identify these. And so if we would
go down this column, you can see that we
have on the i5 side an orange barcode, a blue
barcode, pink, and black. But you’ll also
notice on i7 we have the same barcode every time. If we go across
a row, now the i5 is the same, although
the i7s are different. So these are all unique, and you
can identify different samples. But as we discussed
already, it could be possible to get a
misassignment from this. So here’s an example
of that misassignment. So in this example,
we’re imagining that a little bit of adapter
two got into adapter one, so a unidirectional
cross-contamination event. And for the simplicity of
math, we’re imagining it at 1%. So now when we look at
our sample one data, we would imagine that we would
get 1% misassignment to sample two in our experiment. So there is a solution for this. And that is these
unique dual indexes. So in this case, now we
have a P5 or an i5 barcode and an i7 barcode that go
together and are not repeated anywhere in our other samples. So now we have, if you
will, blue barcodes on both sides for sample
one and green barcodes on both sides for sample two. If we imagine the same
adapter contamination events, whereby 1% contamination from
adapter two into adapter one, we can now get four
different events. Most of our sample, of
course, will be as expected. But 1% will read with an i7
that is the green but with an i5 that is the blue. So this is an
unexpected combination. We know it shouldn’t
exist because we should be getting either
blue-blue or green-green. And so we can bioinformatically
filter this away. Similarly, if the i5 is
green and the i7 is blue, we can filter this out. The only risk we really
have for misidentification is getting the green
adapter on both ends. And this is the product of the two events, 1% of the time on each side, so we'd only see 0.01% misassignments. So as you can see, we've protected ourselves dramatically from any kind of misidentification event. So it's quite a powerful tool.
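The arithmetic here can be sketched in a few lines. This is a hypothetical illustration, assuming a 1% unidirectional contamination of adapter two into adapter one, with each end of a fragment picking up an adapter independently:

```python
# Hypothetical illustration: 1% of adapter two contaminates adapter one.
contamination = 0.01

# Combinatorial indexing: a single wrong index is enough to demultiplex
# the read into the wrong sample.
combinatorial_misassigned = contamination

# Unique dual indexing: a read is only misassigned if BOTH ends carry the
# wrong sample's adapter; a read with only one wrong index shows an
# unexpected i5/i7 pairing and can be filtered out bioinformatically.
udi_misassigned = contamination * contamination
udi_filtered = 2 * contamination * (1 - contamination)

print(f"combinatorial misassignment: {combinatorial_misassigned:.2%}")  # 1.00%
print(f"UDI misassignment:           {udi_misassigned:.4%}")            # 0.0100%
print(f"UDI reads filtered out:      {udi_filtered:.2%}")               # 1.98%
```

So the same contamination that misassigns 1% of reads under combinatorial indexing misassigns only 0.01% under unique dual indexing, with the single-hop reads discarded rather than misassigned.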
It's also important to remember that this kind of event can occur on any of the sequencing platforms. So here's an example of
how these barcodes can mitigate index hopping
during multiplex capture. So what we have here are 16
different index combinations, so 16 different adapter sets. And here are the
expected numbers here. So in this experiment,
we took each library, and then we captured
it individually, mixed the libraries together,
and ran them on the sequencer. And so this is the data set. You can see any discoloration,
based on this scale, represents an unexpected
read in that sample. And so you can see a little bit
of lab contamination on here, but most of our
sample is as expected. If we take the same
libraries and now pool them in groups of four,
do the capture, and then run the 16
samples on the sequencer, you can see a very
distinct pattern, which is a four-by-four
grid, where we can see higher levels
of unexpected combinations. And they map directly to the
other adapters in the pool. This trend is continued if we
do the same libraries again, this time in an
eight-plex capture. And you can see we get
this eight-by-eight grid. And finally, of course, if
you do a 16-plex capture, the whole grid
lights up, as we’re seeing index hopping
between every combination. This also shows you that it’s
not really a selective thing. This is a random event. And so on average any
event can occur equally. You can see the
background from the lab throughout this experiment. So these are potential misreads in a combinatorial experiment. But we were able to
filter all of this out and focus our
investigation on these particular combinations
that were expected. And thus we
protected our sample. So another place
that this can occur is on the flow cell
itself, on the sequencer. Now, this is data that
was provided by Illumina in one of the tech notes. And as you can see, they’ve
been measuring this index hopping event on non-patterned
flow cells and patterned flow cells. And if you look at
the numbers, they’re actually quite small and so on. The non-patterned flow cell
we get very low levels. On the patterned flow cell
we do get higher levels. And notice that the PCR-free
Library Prep in both cases seems to be a little
more susceptible to this kind of thing. The general consensus
is that this increased level in the
patterned flow cell is most likely due to the ExAmp
step, which is an exclusion amplification that is carried
out on the patterned flow cell, and that seems to promote this kind of event. So we tested PCR-free libraries
with our barcode sets, with our adapters. And as you can see, once
again, we get a little lab background here. But we're able to
protect our samples and filter out unexpected reads,
thus ensuring that we don’t have sample misassignments. So when we consider these
unique dual index adapters, you can see that
we’re able to mitigate and filter out and
protect our samples if we have adapter contamination. During the multiplex
enrichment where we probably have some sort of mispriming
events going across between partially extended products, we can filter those out. On the sequencer, if there's an
event during cluster formation, we get chimeric barcode sets, and we can filter those out. As I mentioned before,
during the de-multiplexing we can filter those out too. So clearly this unique
dual index approach is not unique to IDT. In fact, Illumina has recently
released an initial set of 24 UDI barcodes. And in these
announcements you can see that we are actually
the official manufacturer of these Illumina
barcoded adapters. And so this is
another system which utilizes the same approach
in order to protect samples during these experiments. We had already started the
work on our UMI adapter previous to this, and so
we’re continuing with that. So we’ve talked about how the
genotyping is straightforward. We’ve shown how the unique dual
indexes are able to protect us in more sensitive applications. So now let’s take a look
at these rare variants. And we’re going to engage
the use of the UMI. So why is this? Well, probably one of the more
common and rapidly growing areas of interest is in
circulating cell-free DNA as you can use that as the
so-called liquid biopsy to sample blood and then look
for signatures of cancer that could be anywhere in the body. But we have fragmented
of DNA representing that inside the blood. Of course, there’s going to
be regular cell DNA in here as well. And so we’re going
to have rare alleles that we’re trying to pull out. This kind of approach
is less invasive. And so already physicians
are very interested, because often going after
some of these tumors is difficult and
uncomfortable for patients. Whereas taking a
quick blood sample is much easier to get through. It also perhaps allows us to look at tumor heterogeneity so that we can understand
the tumor and maybe changes within the tumor
at an earlier stage. So in order to do
this, you have to be able to increase the
sensitivity of your assay and also avoid error,
such as false positives. You don’t want errors
in the sequencing. So probably the least
sensitive, by most accounts, is using an amplicon
with no UMI. This is somewhat difficult
to get very sensitive on, because all of your
start-stop sites for fragments are identical. And so identifying what is
a false positive, shown here in the red, versus a true
positive in the green is a little bit
more challenging. If we use standard shearing
libraries with no UMIs, we are able to engage sort of
the randomness of the shearing to make unique start
and stop sites. And this allows us to
identify more unique molecules and gives us a little
bit more sensitivity as to what is a
duplicate and what is actually a different event. We can go a little deeper
still using UMIs now. And so the UMIs allow you to
count individual molecules that would look from the
shearing pattern as if they were the
same but actually are different ligation events. So we can dig a
little deeper here. And then ultimately the
most sensitive technique is called duplex sequencing. And in this system
we have a UMI set, but it’s matched on
our adapters, such that we can bring the
top strand and bottom strand of our original
fragment back together and look for consensus
across both strands. And this allows us to
get even more sensitive to even a single strand damage. So this is technically
challenging. And really what we’re going
to be focusing on today is this level, which is a great
improvement on the previous and allows you to go quite
deep into rare alleles. So how does consensus help us? Well, if you imagine that we
have this original fragment and it has a true positive
mutation right here in the green, if we
look at our reads, you can see we have these
groups of fragments that start and stop in the same locations. And we have a bunch of
the green and red dots. So we’re identifying
changes in the DNA. But you can see that we’re
getting some false positives. So it looks like we have
some different alleles here. When we use the UMIs, we can
now tag these as different molecules. So we see them as
being different. These look like the same
molecule and some duplicates here. But these look like
unique molecules. Whereas before they
looked the same. And so now we’re able
to look a little deeper. We see more unique stuff. And we can maybe get a
little more sensitive. But if we use a
consensus sequence, now we’re grouping
these together based on their start, stop, and UMI. But we’re also looking
for the consensus between these sequences. So we have to have
a minimum grouping of three separate events. So you can see we
have three here. We only have one event
here, so we’re not going to be able to use that. So we’re going to
filter that one away. Here we have three again. And now we’re going
to look for what’s consistent across those samples. So we remove those
sequences that do not have at least three copies. And then we build a
consensus where we do. And you can see that we
remove false positives. Here we still have a false positive, but the rate of false positives is significantly lower than in the previous cases. So this allows us to add in some error correction.
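As a rough sketch of that grouping-and-consensus logic (this is not IDT's actual pipeline; the read tuples and sequences below are invented for illustration):

```python
# Minimal sketch of UMI consensus calling as described: group reads by
# (start, stop, UMI), drop families with fewer than 3 reads, then take the
# majority base at each position to build a consensus read.
from collections import Counter, defaultdict

MIN_FAMILY_SIZE = 3  # minimum reads required to call a consensus

def consensus_reads(reads):
    """reads: list of (start, stop, umi, sequence); same length within a family."""
    families = defaultdict(list)
    for start, stop, umi, seq in reads:
        families[(start, stop, umi)].append(seq)
    consensus = {}
    for key, seqs in families.items():
        if len(seqs) < MIN_FAMILY_SIZE:
            continue  # too few copies: filter this family away
        # majority vote at each base position across the family
        consensus[key] = "".join(
            Counter(col).most_common(1)[0][0] for col in zip(*seqs)
        )
    return consensus

reads = [
    (100, 150, "AACGTACGT", "ACGT"),
    (100, 150, "AACGTACGT", "ACGT"),
    (100, 150, "AACGTACGT", "ACTT"),  # one read carries an error at position 3
    (100, 150, "TTTTTTTTT", "ACGT"),  # singleton family: filtered out
]
print(consensus_reads(reads))  # {(100, 150, 'AACGTACGT'): 'ACGT'}
```

The lone error in the three-read family is outvoted, and the singleton family is removed entirely, which is exactly how the false positives get filtered.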
So it's all very well to say that that should happen, but can we demonstrate it? So here we have an example
of an experiment where we’ve set up known truths. So we have some
[INAUDIBLE] samples. So these are very
well-characterized samples for which we know the alleles. And then we imagine that
this sample, the NA24385, was our tumor and the NA12878
is our sort of normal tissue. And so we identified some
SNPs which were homozygously different in
our tumor and some that were heterozygously
different in our tumor. So even if we mixed
them 50/50, we would immediately
get some alleles with allele frequency
of 50% and some at 25%. So now you can actually
titrate your mixes together to generate any kind of
allele frequency you wish. And so what we decided to do
was try to do some low input. So here we’re using only
25 nanograms of input. And we used 1% of our tumor
in 99% of our normal tissue. And we were looking for
something with a minimum allele frequency of 0.5%. So now we know
what the truth is, and we can start to measure
how sensitive we are and what our positive
predictive value is. This is kind of a measure of
how many true positives do you see versus how many
false positives do you see, and also how many
false negatives. So we used our xGen
Lockdown Probes to capture 288 common SNPs. And then we used our variant
caller called VarDict. So this is a publicly
available variant caller. So here we’re going
to look at the data. And in each case, we’re
going to look at the data without considering the UMI. And then we’re going to look at
the same data applying the UMI and looking for consensus reads. So we should be able
to see if there are any advantages to this system. We set the variant
calling threshold at 0.2%. Remember, although we're
mixing those samples together and we should get very
predictable amounts, there will be some
randomness in the reads. And so it is possible that
you will get not exactly 0.5%, but maybe some of these alleles
might appear at 0.3% or 0.7%. And so really what
we're trying to do here is determine whether we can find them. So when we look for our
expected positive mutations, we can see that
without the UMI we get 98.3%, which is
pretty sensitive. The consensus is a little lower. And I imagine this
is where we just sampled not quite deep
enough to collect all of those different mutations. And so we were right below the
threshold of calling. So if we look at
the real numbers, then, out of the 291
mutations expected, we got 286 without the
UMI, and we lost 2, to get 284 when we
did the consensus. So there were five
false negatives in this and seven
false negatives here. So at the moment you’re
probably thinking, well, this isn’t much
of an improvement. Why is this helpful to me? Well, the real aid comes in
your positive predictive value. And what this represents now
is how reliable was that call? How often is a positive that
I see really a positive? And now you can see quite
a significant difference. Without the use of UMIs,
the PPV is only 69.6%. Whereas with the
consensus calling, it goes all the way up to 98.6%.
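As a quick check of that arithmetic, sensitivity is TP / (TP + FN) and PPV is TP / (TP + FP); plugging in the consensus numbers quoted here (284 true positives, 7 false negatives, 4 false positives):

```python
# Sensitivity and positive predictive value (PPV) from the consensus-calling
# numbers in the talk: 284 true positives, 7 false negatives (291 expected
# mutations), and 4 false positives.
tp, fn, fp = 284, 7, 4

sensitivity = tp / (tp + fn)  # fraction of expected mutations we detected
ppv = tp / (tp + fp)          # how often a reported positive is real

print(f"sensitivity: {sensitivity:.1%}")  # 97.6%
print(f"PPV:         {ppv:.1%}")          # 98.6%
```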
And when we look at why that is, you can see a very
obvious feature here. In this case, without
the UMIs, we’re detecting 136 false positives. And so whilst we’re
very sensitive to the true positives,
we’re getting a lot of noise in our system. By examining these with the UMI
and getting consensus calls, we reduce that false positive
down to four instances. And this is probably
because one strand was damaged very
early on, and so it was in all of our replicate reads. So you can see here that this
UMI with consensus calling allows you to have great
sensitivity and a very high PPV in this situation, which increases your ability
which increases your ability to trust your data. So we took this data. We wanted to look
at it another way to sort of get a sense of
how these two systems were different. So here we have the percentage
of the positives that you’re detecting– sorry, that
sensitivity percentage, or the PPV percentage. And this is where we
set the variant calling. So if we set it– as we
increase the threshold of variant calling, of course,
a number of those variants are going to disappear because
they were at low frequency. And so our sensitivity
drops off, as expected. What’s interesting or
important to note, of course, that because we’re doing that,
when we have a low threshold, we pick up more of
those false positives. And so our predictive
values are very low. As we increase the
threshold, that rules out some of that noise. And so our predictive power
increases, as you might expect. When we look at our
consensus calling, you can see the sensitivity
follows a roughly similar curve. And I should say that this
is all the same experiment. We're just resampling it
with the variant caller. So these are real
data points, which is why they’re a bit wobbly. They’re actually real
for this experiment. What you’ll notice,
although the sensitivity looks to be about
the same, you can see there's quite a strong
difference in the PPV. We’re able to maintain that
sensitivity to much rarer thresholds, allowing us to make rarer calls
with much higher confidence. Obviously, at some
point, when you get your variant
calling low enough, you’re going to start to
pick up a lot more noise. And so understanding where this break point is, is very important and probably depends on your sample. Here we did have limited input. And so these kinds
of allele frequencies are probably errors anyway. So one of the ways this
is actually working is that the consensus
calling is able to remove some oxidative error events. C-to-A errors are often due to the oxidation of G's. And so this is
well-characterized. And so you can see
here that we see that’s the most
common event that causes these false
positives in our sample. And this is oxidative damage
either to the DNA before we start the experiments,
or along the way maybe one of the G’s
is being oxidized and causes this
kind of signature. After amplification
we see it mutate. By using this consensus calling,
because those events probably happen randomly
throughout the process, we have other
molecules which are showing that they
weren’t damaged, because it’s a random event. And so now the
consensus means we can remove those false events. So without the
molecular barcoding, it’s difficult to distinguish
between true and false positives at frequencies
below roughly 1%, right? You can drive it down
if you try hard enough, but it does get technically
very challenging. But using the UMIs
to build consensus, we could variant call down
to 0.5% with good sensitivity and PPV using only a
small amount of input. So this increased the
sensitivity quite a lot. So the number of false
positives dropped about 30-fold when we used these. And it allowed us to
increase our sensitivity, with the PPV increasing to 92%. And this was probably
due to removing a lot of these oxidative errors. So hopefully that’s kind of
intriguing and interesting to you. And so maybe you want
to get your hands on some of these adapters. And so you can order
them through our website. So we start, if you to go to
the Products and Services menu, this pops up. And then down here in our Next
Generation Sequencing list, we have our Dual Index
UMIs Tech Access. So if you click on there,
you go to the ordering page. So at this point,
I should probably indicate these are what
we call tech access. So this is not yet a fully released product in the sense of having stock available. Right now, we're
making these to order. So everybody gets
a custom synthesis, which is hence the tech access. We will stock these
adapters in the future. So to order, on
the Ordering tab, if you know that you just
want to get these adapters, you can contact
our custom quotes group, who have the
sequences ready to go. And they can process your order. All you really need to tell them
is how many adapters you want. We standardly
provide two nanomoles per duplex, which
for standard DNA prep is about 25
libraries per duplex. Because it's a custom synthesis, we can make them at higher scales if you wish, and you just probably want
to indicate in the email how much you need. If you have questions
about this or you need some help with
designs, you can contact application support. That’s the group I’m in. And we’d be happy to help
you with any questions about this system. If you look on the Support
tab, we have some things ready for you to utilize. So we have some
product sheets here to show you the
barcode sequences. We have some user guides. We’re not able to provide
full white-glove bioinformatic support. But all of the tools that
we’ve used and looked at are open source. And so these files will show
you those open source methods that we utilized, so you can build your own analysis pipeline. We find that a
lot of people have their own pipelines or their
own way of doing things. So this just gives you some
starting point if you need it. There’s also a webinar
by my colleague Kristina Giorda, which
talks a little bit more about those dual indexes to
mitigate the index hopping. And we have some
posters down here on the use of
these UMI adapters. We just presented at AMP
a couple of weeks ago, and we should probably
have a new poster here in the near future. So if you want to take
a look at these barcodes, we want to provide those up
front so you can take a look and sort of be happy about them. So we have an Excel file
that you can download. And it shows you the adapter
number and the barcodes. You can see here on the i7,
here is the sample barcode. And then the nine-base random UMI; you just enter that into
your sample sheet or analysis pipeline. We have the i5 index
in both formats, depending on which
machine you’re using. The index in the adapter is
the same, but what happens is that these machines read
the sequence on one strand. These machines read the
sequence on the opposite strand. And so just for
convenience, we have these all listed out
to make it easier to get it into your sample sheet
or your bioinformatic pathway. So in summary, the
IDT UMI adapter is a three-in-one
design that can be utilized for several
different inquiries and experiments of different sensitivities. The dual indexing resolves
this issue of index hopping, and it enables the
low-frequency variant detection. We have 384 pre-designed
unique index sets. Each barcode has an edit distance of three or greater to all the others on the same side of the adapter. They're all color balanced in sets of four, and we've designed them to be compatible with two- and four-color sequencers. GC content is 50%.
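A minimal sketch of that kind of screen, with made-up 8-base barcodes (only the edit-distance check itself, the standard Levenshtein distance, is real):

```python
# Verify that every pair of barcodes on the same side of the adapter has
# edit (Levenshtein) distance >= 3. Barcode sequences here are invented
# for illustration, not IDT's actual barcodes.
from itertools import combinations

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

i7_barcodes = ["ACCGTTAC", "TGGAACCT", "CATTCGGA"]  # hypothetical 8-mers
ok = all(edit_distance(a, b) >= 3 for a, b in combinations(i7_barcodes, 2))
print("all pairs >= 3 edits apart:", ok)
```

An edit distance of three means a single sequencing error in an index read can never turn one valid barcode into another, which is why de-multiplexing is robust to that step.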
So bioinformatically, we've screened these as best we can. They've also been deployed for quite a while now in other situations and utilized successfully on multiple sequencing platforms. So these are
well-described barcodes. The adapters are made to order. And you can submit your
order request directly to [email protected] Finally, if you
have any questions about these, or actually
any custom adapters, we’d be happy to help you. And you can just contact us at
[email protected] And so with that, I’d like to
thank you for listening in. Hopefully this was useful. I think there’s a few questions. And so I’m going to ask
Dave to step in here. And we’ll try and answer
as many of these as we can. Thank you, Nick. This is Dave Kupec, the
product manager for IDT's NGS products. Great information, Nick. And for people on the call,
if you have a question and have not done so
already, please type it into the questions box
located on the right-hand side of your screen. There are a few
questions for you, Nick. Here’s the first one. How did you determine
a minimum of three reads as the cutoff for
acceptance of reads? Oh, that’s a good one. So I think it was somewhat
arbitrary, in the sense that two wasn’t quite enough. But what we also did
was we did consider this on different numbers. So we said, what happens with 1,
which it obviously isn’t really a consensus, 2, 3, 4, 5, 6, 7. And this work has been done
on other UMI approaches too. And so what you do
is as you increase that requirement for
what’s called the family size, the number of sequences
for consensus, you, of course, decrease the number of times
you actually get enough reads. So that reduces your
sensitivity in that respect. And so then you plot
that sensitivity against the number of reads. And what you find is
that around three it’s enough to remain very
sensitive but not so many that you actually lose a
bunch of your sensitivity. So it's kind of a balance between those two. And we just did it bioinformatically, resampling and looking at what was best.
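One way to see that balance is a simplified model (my own assumption, not the actual resampling Nick describes): if reads per unique molecule are roughly Poisson-distributed, raising the minimum family size discards more molecules and so costs sensitivity.

```python
# Simplified model: reads per unique molecule ~ Poisson(mean_reads).
from math import exp, factorial

def frac_families_kept(mean_reads, min_family):
    """P(family size >= min_family) under a Poisson(mean_reads) model."""
    return 1.0 - sum(
        exp(-mean_reads) * mean_reads**k / factorial(k) for k in range(min_family)
    )

# With ~8 reads per molecule, a cutoff of 3 still keeps nearly every family,
# while much larger cutoffs start to throw families (and sensitivity) away.
for threshold in (1, 2, 3, 5, 7):
    print(threshold, round(frac_families_kept(8.0, threshold), 3))
```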
OK. Thank you for that. A follow-on question would be, how did you determine false positives? Have you looked at positions that were meant to be non-mutated to see what the rate of mutation is during the library prep process? Ah-ha. Yeah, so that's
almost two parts. So the first part is that we– as I mentioned, the DNA that
we used is the [INAUDIBLE] standardized DNA. And so it is taken as
being very consistent, and the sequence
is well described. And so I suppose it is possible
that a few variants had crept into that cell line, but
that would be very unusual, and I think it
would be picked up. So we sort of know
the true positives because we’re using those
standardized DNA samples. And so we know what truth is. That’s why we chose this method. As far as going back
and looking through, I was actually talking to one
of our research scientists yesterday about that very
thing to just rule out the possibility that we do
have a little heterogeneity in the sample. And so for a few other
experiments we’re doing, we’re going to go back. But inherently we
believe that the sequence is well-known, consistent. And so those false
positives truly are false. I think that answers
the question. Yes, thank you. I think it does as well. And to those who
do pose a question, please feel free to
follow up in the chat box with any
follow-on questions, should you want more clarity. The next question would be,
how many UMIs are available? Is it possible to multiplex up
to 96 samples using IDT’s UMIs? So we have 384 barcoded sets. The UMI itself is always nine
nucleotides of a randomer. So it is possible to
multiplex to high levels. However, usually the limiting
factor there is your input. We don’t recommend multiplexing
96 samples in a single capture, mostly because you lose
sensitivity, which, of course, is key to these experiments. So I wouldn't recommend that. I would break it up. If you had a large
capacity sequencer and you wanted to
run 96 samples, I think that’s
totally reasonable. As I said, we have
up to 384 barcodes. But what I would suggest is
that you keep those multiplex captures at a much lower level. Generally, for
general capture, we recommend up to about 12 plex. For these, obviously
the more of the samples you have in that capture, as
you saw in the earlier slides, the more of these
hopping events occur, and you might start to use
up some of your sequencing with those filtered reads. So I wouldn’t go too high
with the multiplexing. We haven't really tested the
impact of going to a 16 plex with this kind of thing
and seeing how it hits the [INAUDIBLE]–
the 12 plex, sorry. I was thinking of the
experimental 16 indices. We haven't really
tested how that affects sensitivity. So we wouldn't recommend
going very high with that. Sorry, long answer. Thanks, Nick. Another question
coming through the chat is related to how we
manufacture our adapters. The question is, do you
perform QC on the UDI UMI to be sure adapters are
not cross-contaminated during production? At this time, we
are not doing that. These are custom
synthesis events. And so to keep the
cost reasonable, we’re not performing
functional QC on those. We don’t see consistent
contamination events. Obviously, we’ve been making
oligos for a long time. And we treat this
very seriously, and we’ve mitigated risk along
the way as much as we can. So the risk is relatively low. It’s not zero, but we
do not do functional QC on these at this time. OK, great. Moving on to the
next question, I believe it’s about
optional design strategies. The question is, do you
make dual UMI adapters? Well, yes. So there’s two
questions to that. So the first answer
is yes, if you have a custom sequence
you’d like us to make or you’d like to
discuss designing, we can, of course, do that. We’re developing multiple
products along the way there. I’m not quite sure
what dual UMI means, so I don’t know
where it fits in. Some people think that you could
put a UMI on the i5 side as well. But that actually doesn't
increase the sensitivity very much because the UMIs
are still strand specific. The way to increase
your sensitivity is go to that duplex
sequencing which I mentioned, and I think that is an area
of great interest to us. Fantastic. Thanks, Nick. And thank you to all for posing
these wonderful questions. The next one was about the
library construction method you used in the tumor
model mixture experiment. So which kit did we use? Yes. I think in that particular– oh, actually what did
we use in that one? You’ve caught me out there. I’m not sure. I can say that we’ve done that
kind of experiment multiple times. That was just one example. And we have used
different library prep kits. These adapters are
compatible with everything we’ve tried so far. And that includes
the enzyme kits that come from NEB
and Kapa and Illumina. They all work well for us. So we can make nice libraries. So I don’t remember
exactly which one this one was, this particular
experiment was [INAUDIBLE]. And if anyone would
like more information, you can always contact
us after the webinars for follow-up questions. Next question would be, I
think it’s related to duplex sequencing, it looks like. What’s the sequencing coverage
for duplex sequencing? The coverage depth, so there
isn’t really a set amount. With the duplex
sequencing, people do tend to be able to go deeper. So they do tend to try to get
much deeper coverage depths. For these experiments
with the single UMIs, we’re looking at sort of
2,000 to 3,000x coverage, for the most part, I think,
the experiments I’ve seen with duplex sequencing. They can use that level. And because you’re
doing the consensus, you can reduce your error. But I also see some people
go even deeper than that. But that’s kind of
the range maybe. I’d hate to say a given number,
but maybe 2,000 to 5,000x coverage, though I
have seen up to 10. Thanks, Nick. The next question comes
in, how do you actually pipette the indexes? Do they already come in
pairs in a 96-well format? And I will follow
that up with, Nick, if you have any best
practices or tips and tricks you could share to help
prevent cross-talk, I’m sure that would
be helpful as well. For sure. OK, yeah, that’s great. I didn’t actually mention
the delivery format. So the way we’re making these
and delivering them right now in the customer phase is
that– well, I mentioned the yield was 2 nanomoles. And we're resuspending that
at 15 micromolar, which is kind of a common adapter
concentration in those kits that I mentioned. You may be using a
particular application where you need to change
that concentration. But usually that’s downward
so it shouldn’t be a problem. We deliver them in
matrix screw-cap plates. So each well, if you will, is
actually an individual tube with its own screw cap. And so you can isolate them. Once you've received that plate,
you can keep things isolated. To Dave’s point, upon receipt,
I would spin them down to make sure that they
remain at the bottom and you don’t get any
drips or drops on the lids. And then, if at all
possible, I would actually open each tube
individually, pipette out the volume required
for a library prep, and then close that lid. And whilst that is very time
consuming, especially if you’re trying to get across
a 96-well plate, that really will isolate
things and mitigate the risk of any kind
of aspirate hopping between those wells, which is
really what we want to avoid. I’d also strongly
recommend using filter tips on your pipettes to mitigate
any contamination across on the pipette. Let’s see, what else? So the adapters have already
been duplexed for you. So we make the two
individual oligos. We bring them together
in a duplexing reaction so that they form that
sort of so-called Y shape. And then we dilute them to
the 15 micromolar for you in the tube. We ship it on dry ice
so they’re frozen. So they should be protected. I would recommend
thawing gently. So don’t use a warm
bath to quickly thaw– preferably in a
fridge or on ice. You want to avoid repeated
freeze-thaw events. Just generally
good lab practice. But they look to be very
stable in those conditions. That's very helpful. Thank you, Nick. The next question is
regarding the use of UMIs and how they relate to
PCR duplicates. The question is, wouldn't having
sequences with the same UMI
mean that they're PCR or optical duplicates
for the same barcode combinations? Yes. I think what you're saying is,
aren’t they just duplicate? And that’s kind of what we’re–
sorry, I’m just going back. We’re actually
utilizing the fact that they are duplicate
in this consensus. So we’re actively trying
to find duplicate here. In this case– oops, I’ve
already gone past it. But in this case,
the duplicates, we would just remove by
normal bioinformatic. You just remove all
your duplicates, and you take the one
with the best read score. And so here you might
have a false positive, and you might count that because
it just randomly happened to have the best read score. When we look at
those duplicates, we can now see that this
event, this false positive, must have occurred
during the amplification. And thus most of the
molecules don’t have it. And so now we can
filter that out by looking at the consensus. I think that was the question. So I think that’s the answer. OK. Thanks again, Nick. A following question is related
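To make that consensus filtering concrete, here is a minimal majority-vote sketch. It is illustrative only; the 60% threshold and the sequences are hypothetical, and real consensus callers also weigh base qualities.

```python
from collections import Counter

def consensus(seqs, min_fraction=0.6):
    """Majority-vote consensus across reads that share one UMI.
    A base introduced during amplification appears in only a minority
    of the family, so it is voted out; positions with no clear
    majority become N."""
    called = []
    for bases in zip(*seqs):
        base, count = Counter(bases).most_common(1)[0]
        called.append(base if count / len(bases) >= min_fraction else "N")
    return "".join(called)

family = ["ACGT", "ACGT", "ACTT"]  # third read has an amplification error
print(consensus(family))  # ACGT -- the error is filtered out
```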
to alternative data analysis methods. The question is, we have been
using a cut-off model that is based on the highest
number of reads per UMI to determine the minimum
number of UMIs required to create a consensus sequence. Have you looked into
similar strategies? OK, that’s a little
too deep, I’m afraid, in the bioinformatics. Dave, can you note that one? We should probably talk
to our bioinformaticians and see if they’ve done that. I just don’t know the answer. Absolutely, will do. Next question
coming in is, can we mix dual TruSeq eight-base pair
indexes with the TruSeq UDI indexes in the same MiSeq run? Yes. So that gets to this
idea of edit distance. And so one should
always do that check. And we can do that for you
for specific sequences. If memory serves– oh, and so
I should also say that when you’re building barcodes– this
is kind of a sidebar on barcode making– you usually start with
an initial sequence and then build off that to
look at sequences which have the appropriate edit distance. And so if you start with
a different sequence at the beginning
of your building, you’ll end up with
a different family. And whilst within
those families they may have strong edit distances,
if you compare across families, you almost always get
low edit distance events. And so when we compare our
barcodes, generally speaking, to other companies’
barcodes, we do find there are some,
what are called, clashes. And by a clash I mean
that the barcodes have less than an
edit distance three. It’s rare that they are
zero so they’re identical. And most of the time they
still maintain an edit distance of two. So arguably, by some
people’s metrics, that would still be OK. It still would take two
errors in reading that barcode to make the mistake
of converting it. If you imagine that
you’d have to have two hours at both
ends that would match, you’re still pretty
well protected. So that’s all to
caveat the response that it is likely
that you would be able to mix these different
sets on the same MiSeq run. But it will end up
being very specific to the particular barcodes
of interest, or barcode sets. So it’s kind of a
half answer, sorry. Very helpful, Nick. Thank you. A follow up to that type
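As a sketch of the check Nick describes, one can compute the minimum pairwise Hamming distance across a pooled set of barcodes. The 8-base indexes below are hypothetical; a minimum distance of three or more is the rule of thumb mentioned above.

```python
from itertools import combinations

def hamming(a, b):
    """Mismatched positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(barcodes):
    """Smallest Hamming distance over all pairs in the pool; a pool
    is generally considered safe to mix when this is >= 3."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))

# Hypothetical pool mixing indexes from two different barcode families
pool = ["ACGTACGT", "TGCATGCA", "ACGTACGA"]
print(min_pairwise_distance(pool))  # 1 -- a clash: one read error converts one index into another
```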
of question would be, how do you actually
choose to use the UMI? You mentioned it can be used
as a three-in-one adapter. Yes. So really it’s just
looking at the sensitivity. Let’s see if I can go back. It’s probably way back
in the presentation. So it probably just
depends on exactly where you feel you’re sitting on– whoa. Oh, I must have gone past it– on that sensitivity chart. So if you’re looking
for allele frequencies in around the 1%
region, you probably don’t need to worry
too much, probably the standard techniques. Using the dual index,
dual unique index is probably very helpful. Gosh, I can’t find my
slide all of a sudden. Ah, here we go. So where you feel
you are on this chart is probably important. A lot of people are
sitting kind of actually right around here
for standard use. But more and more, as with
these liquid biopsies, people are moving
into this zone. So I would say if
you’re going below 1%, you probably want
to consider UMIs. You can also use that adapter,
right, and only deploy the UMI in samples where you need it. Great. Next question is, how do you
use the UMIs to help eliminate contamination on the run? So ruling out any
contamination post the addition of the
adapters, we know that the dual index with no
overlapping indexes help. But does this increase the
detection of contamination? So I think the question is from
a diagnostic point of view. I know we can’t eliminate
all contamination. But can we eliminate anything
after the addition of adapters? Well, if they–
hmm, I want to make sure I understand the question. Because if the DNA, if
some random DNA came in after the adapter
had been added, then it wouldn’t
have adapters in it. It would just get lost. If you meant it was– but then if it was
another library, it would have
different barcodes. So I think the answer, if I’m
understanding the question correctly, I think the answer
is because it has the adapters, any adapters with
dual unique indices would protect you from
that event, I think. OK, good. Next question, how many
combinations of UMI are possible for
a single adapter? Yeah. So what we’ve set up is we’ve
used nine random bases, so nine Ns. So that's four to
the power of nine different theoretical
combinations. Now, some of those might not
occur at regular frequency, because the pattern of bases,
like a polyG run might fold, and you wouldn’t
see it in your run. But theoretically that
gives you over a quarter million different sequences that
would exist on that adapter. That’s a lot. Sure is. Thanks. Next question,
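The arithmetic behind that figure is simply four bases raised to the power of the nine random positions:

```python
# Nine random bases (Ns) on the adapter give 4**9 theoretical UMIs.
n_umis = 4 ** 9
print(n_umis)  # 262144 -- "over a quarter million"
```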
let’s see, oh, it’s regarding amplicon sequencing. What about amplicon sequencing
with IDT UMI oligos, instead of the
ligation-based method? And regarding slide
20, where would amplicons with UMIs fit on that scale? Yeah, that’s a good question. It’s a tough one. And the reason it’s
a little tough, you got to be careful
how you define that. That’s not to dodge the issue. But there are different
ways of doing this. So what you would want to do
with your amplicon sequencing, you’d want that UMI on your
target-specific primer, and then perhaps a
stub that you could do a secondary PCR on with
the full-length adapter. But what that means
is for every cycle you’re potentially
putting a new UMI on a copy of an
original molecule. So if you had one
molecule, if you did one cycle of
amplification, of course, you’d run across
it once, and you’d have one UMI to
that one molecule. If you do two cycles,
you’ll run that and its second UMI-containing
primer across the same target. So now I have two
UMIs that are linked to the same original
molecule, right? So they look like
two different events, but they’re actually
from the same molecule. And so what you
end up there is you want to limit those cycles in
order to minimize that effect, and/or bioinformatically
you have to build that out at the other side. So what would happen,
I would imagine, if you kept it limited, you
would probably jump down below standard shearing. But I don’t think that you
would achieve the ligation UMI level because you would run
into that same problem of having multiple molecules with
the same start-stop. It’s kind of the opposite way
around, with different UMIs but actually the same
original molecule. So I think it sits in here if
you limit the initial rounds of amplification. Great. Thanks, Nick. The next question
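The cycle-dependent UMI multiplicity Nick describes can be illustrated with a toy simulation. This is purely illustrative: the function name and the one-fresh-UMI-per-cycle model are simplifying assumptions, not a model of any real kit.

```python
import random

random.seed(0)  # make the illustration deterministic

def umis_seen(n_cycles, umi_len=9):
    """Toy model: with the UMI on the target-specific primer, each
    PCR cycle can stamp a fresh random UMI onto a copy of the same
    original molecule, so one molecule masquerades as several."""
    return {"".join(random.choice("ACGT") for _ in range(umi_len))
            for _ in range(n_cycles)}

print(len(umis_seen(1)))  # 1 -- one cycle, one UMI: the count is correct
print(len(umis_seen(4)))  # the same single molecule can now look like up to four events
```

This is why the answer above recommends limiting the initial amplification cycles, or correcting for the inflation bioinformatically.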
is, where can people go to find more open source
bioinformatics tools? Yeah, so this is kind of an
area that a lot of people are talking about. And so I’ve seen a lot
of information coming out of various blogs from different
bioinformatics groups. There are papers out
there now that describe these different techniques. And so you might look at those. And then everything that we
source comes out of GitHub. And so you could
probably actually just peruse GitHub and look
at some of the descriptions and probably find your
way through there. But there’s multiple
ways you can get to those different resources. And those are kind
of an outline of how to maybe think about that. Great. Another good place to
start is we actually have analysis guidelines
on the Product page. At least that can point
you in the right direction. Yes, sorry. I’d already said that. So I didn’t repeat
myself, but you’re right. No problem. The next question coming
in is, how can we use– or, excuse me. Can we use these adapters
not just for ligation but as primers in PCR if we have
compatible Illumina overhang fragment to be amplified? Yeah, that kind of gets to the
previous question about the use of UMIs in a PCR situation. So these primers wouldn’t–
they’re duplexed. So as they would be shipped,
they would be not useful to you in that context, I don’t think. I think that you could do that. But as I said in
the previous answer, if you do it after you’ve
already done your initial PCR, you’ve already made
a bunch of duplicates of your original molecules. And that’s going to
obscure your analysis. So putting these
UMIs on as you’re putting on the
full-length adapter after you’ve already
amplified your target space is probably not going to help
you with regard to sensitivity, or at least not very much. You really want to
put it, I think, on the target-specific
primers and then do very low cycle amplification. Wonderful. Next
question, moving on, is, I’d like to confirm one
thing for the dual index adapters. Are the indexes built
into the adapters, or are they inserted
by unique multiplex oligos similar to NEB’s kits? Absolutely not in that way. They are built in. We make the full-length
oligo from the beginning. So it is complete. The amplification you do is
just with the so-called P5, P7 primers, which are
about 20 nucleotides long. The barcodes are
already built in. And as we’ve been discussing
with this sort of amplicon questions, it’s
essential to have those part of that
initial ligation event in order to get the sensitivity
that we’re trying to drive for. Great And I believe this is
the final technical question of the day, right on time. This UMI can minimize
barcode hopping. But do you think
there is a chance that contamination
between such UMIs can still happen
during manufacturing? So, yeah, so we
should also be clear that when we talk
about this, we’re talking about mitigation
of the barcode hopping. The barcode hopping
will always occur. That’s just a fundamental
thermodynamic process during amplification. What these do is they allow
you to filter that data out. And so, as we indicated at the
beginning of the talk, I think, the advantage is with those
examples where I talked about contamination, right? So we imagine here,
this is actually– we’re modeling a full
adapter contamination event. So we’re actually putting
some of this adapter in here. And it still allows
you to filter that out. If it was just a
single oligo, imagine that the P5 oligo
with the green barcode got in here during
manufacturing. It actually would only
result in this event. And so it would never be
a danger of misassignment. So single oligo
cross-contamination during manufacture, should it
occur, would be of no real risk to misassignment
but would obviously require some filtering. So if I understand the
question correctly, I think the answer is
that even contamination during manufacture
is protected against, using this barcoding system. Great. And actually there’s
one last question that I think we have time for. And it’s regarding the
sample sheet you showed. It’s– Oh, yeah. I’ll go back the other way. It was a question
of, why are there two barcode lists for the i5? Yeah, so this does cause
confusion on the sheets. And I wish I knew
a sort of a better way of describing it on here. I kind of couldn’t
come up with something. So effectively these
machines read the barcode as an extension of the
copying over of the fragments. So on the machine,
you have strands of DNA anchored to the flow
cell via oligos that were arrayed on there by Illumina. And so you end up with this
sort of field of molecules. And of course they’re
single stranded. So in order to sequence, you
can only extend your primer five prime to three prime. So after you’ve finished
reading in one direction, you have to sort of copy
that strand over so you can read the other direction. The way these machines
work is that when you’re doing that
copying event, you have a P5 primer on the flow cell. And as you’re copying,
you do some dark cycles. And then, of course, you know
how long your adapter is. So when you get to
the barcode, you then read the barcode sequence
fluorometrically. And after those nucleotides
are read, you go back and you copy the
rest of the strand to make that
complementary strand. In this case, when you’re
making a complementary strand, you complete the synthesis. You don’t read anything
while you’re making this complementary strand. And then you flow in a primer
which reads off that sequence. So now you’re reading back
on the opposite strand again. And so you’re reading
the complementary strand in these machines
versus these machines. And that’s why. If you look at these, they’re
just complements of each other. That's why there are two lists. It's a
machine-specific event. Thanks, Nick. One final question, and then
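For the instruments that read the i5 off the complementary strand, the second sample-sheet list is just the reverse complement of each adapter's i5 sequence. A minimal sketch (the index shown is hypothetical):

```python
def reverse_complement(seq):
    """Reverse-complement an index sequence (A<->T, C<->G)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))

# Hypothetical 8-base i5 index: the two sample-sheet columns would
# hold this sequence and its reverse complement.
i5 = "ATCACGTT"
print(reverse_complement(i5))  # AACGTGAT
```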
we’ll close the questions. And it’s a tough one. So we’re ending strong. The question’s around can
you elaborate a bit more on the consensus
analysis in the case of an RNA-Seq library,
where one strand is eliminated during library prep? Meaning there should be
no need for the consensus, since it’s already a
double-stranded DNA to do the analysis. Yeah, so in that case,
the use of the UMI there is more for
molecule counting. And so as we just
discussed, there’s a lot of combinations
of the UMI available. And so for RNA-Seq you
probably wouldn’t be so worried about consensus,
though you could. You’d probably just be using the
UMI as a molecule counter since often people are using, say,
an oligo(dT) primer that would have a– you’d always be reading
the same part of the RNA. You can count them that way. However, it is also fair to say
that what you’re able to do, you wouldn’t be able
to detect an error during the RT or perhaps
second strand since this event. Though, actually, the
second strand does make that second copy,
and those are ligated. Each of those strands will
have its own UMI attached. And so you might be able
to, if you try hard enough, you might be able
to look at those. But really, the idea is that
once you’ve made that DNA, any damage that occurred during
amplification– and damage just means an error. Anything that happened during
amplification or a capture event or a read event, so
you’re actually eliminating read error on the machine
itself, all of those could be removed by
the consensus sequence, if you’re really concerned
about looking for SNP events in the RNA itself. So it can do it, but
most people are just using it as a molecule
counter with no consensus. Great. Thank you, Nick. And thank you to all
who asked questions. This will conclude
our Q&A session. With that, I’ll turn the call
back over to the moderator. Thank you. That is all the time
we have for questions. Thank you, everybody, for
posing so many great questions. I want to thank all of you for
attending today’s presentation. This is one of a
series of webinars we will be presenting on next
generation sequencing as well as other topics. We will email you about
these future webinars as they are scheduled. Also, as a reminder, a recording
of this webinar will be posted shortly on our website and
at youtube.com/idtdnabio. There you will find several
other educational webinars. And thank you again
for attending, and we wish you the best of
success in your research.
