I'm a complete beginner with this software, so apologies if I'm asking the wrong question.
I'm trying to search a large number of samples for a primer sequence. Unfortunately, there are also a large number of N bases in most of these. How do I exclude N bases from searches?
Alternatively, could anyone suggest how I can search for primer sequences in samples using another command/software method?
Thanks.
Excluding N bases from searches?
- Site Admin
- Joined: Thu Aug 21, 2008 7:23 pm
Re: Excluding N bases from searches?
There is no way to exclude Ns when using "Search Sequence" in CodonCode Aligner, but there are a few other things you can try, depending on what you are working with.
1. If you have chromatograms (.ab1 or .scf files) for your sequences, you can use the "Remove ambiguities" function to convert Ns to bases. You'll find this in the "Edit" menu under "Change Bases". This will work for Ns in sequence traces where you have signal.
2. If you have traces or at least qualities (e.g. from FASTQ files), you can use the "Clip Ends" function in the "Sample" menu to remove the Ns and low-quality regions. Sequence traces from PCR products sometimes have tens or hundreds of Ns after the end of the PCR product, which should be clipped before further analysis.
3. You can also import your primer sequences from text files, and then use the "Assemble" command in the "Contig" menu. When assembling, matches to real bases get higher scores than matches to Ns, so primers will be placed at the correct position. You will first need to change the "Assembly" preferences by reducing the minimum match length and score to less than the primer length. You may need to clip ends first to avoid sequences being assembled at N runs. If you are working with many primers, this may be the fastest way to map them.
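The ideas behind steps 2 and 3 can be illustrated outside the program with a minimal Python sketch. This is not CodonCode Aligner's clipping or assembly algorithm: the quality cutoff `min_qual=20` and the scores `MATCH`, `N_SCORE` and `MISMATCH` are hypothetical values chosen for illustration, sequences are assumed to be uppercase, and primer placement is ungapped.

```python
# Illustrative sketch only, NOT CodonCode Aligner's actual algorithm.
# 1) clip_ends: trim N / low-quality bases from both ends (cf. "Clip Ends").
# 2) best_placement: place a primer so that a real base match scores more
#    than a match against N (cf. the scoring idea behind "Assemble").


def clip_ends(seq, quals, min_qual=20):
    """Trim bases from both ends while they are N or below min_qual.

    `quals` is assumed to hold one Phred score per base (e.g. decoded
    from FASTQ). Returns the clipped sequence and its qualities.
    """
    start, end = 0, len(seq)
    while start < end and (seq[start] == "N" or quals[start] < min_qual):
        start += 1
    while end > start and (seq[end - 1] == "N" or quals[end - 1] < min_qual):
        end -= 1
    return seq[start:end], quals[start:end]


# Hypothetical scores: a real match beats an N, which beats a mismatch.
MATCH, N_SCORE, MISMATCH = 2, 0, -3


def score_at(primer, seq, pos):
    """Score an ungapped placement of `primer` starting at `pos` in `seq`."""
    total = 0
    for p, s in zip(primer, seq[pos:pos + len(primer)]):
        if s == "N" or p == "N":
            total += N_SCORE
        elif p == s:
            total += MATCH
        else:
            total += MISMATCH
    return total


def best_placement(primer, seq):
    """Return (position, score) of the best placement, or None if seq is too short."""
    best = None
    for pos in range(len(seq) - len(primer) + 1):
        sc = score_at(primer, seq, pos)
        if best is None or sc > best[1]:
            best = (pos, sc)
    return best
```

For example, in `"NNNNNNGGTTGCAGAA"` the primer `"TTGCAG"` is placed on the real bases at position 8 rather than on the leading N run, because the N run can only earn `N_SCORE` per base.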
If you are working with text sequences that have neither traces nor qualities, you will probably need to look for other tools to use. For many functions (like end clipping), CodonCode Aligner either requires qualities, or works better if sequences have qualities.
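For plain text sequences, one alternative is a short script. The following is a minimal Python sketch, assuming a FASTA file of DNA samples and a primer made only of A, C, G and T; the function names and the default of one allowed mismatch are hypothetical choices, not taken from any particular tool. Any N (or other ambiguity code) in a sample is always counted as a mismatch, so runs of Ns can never produce a hit. Both strands are searched.

```python
# Minimal sketch: primer search in which N never counts as a match.
# Assumes a FASTA file and a primer of plain A/C/G/T (hypothetical setup).

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file, uppercased."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks).upper()
                name, chunks = line[1:].strip(), []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def find_primer(primer, seq, max_mismatches=0):
    """Return (strand, position, mismatches) for every hit of `primer` in `seq`.

    Positions are 0-based on the forward strand of `seq`. Any sample base
    that differs from the primer, including N, costs one mismatch.
    """
    hits = []
    for strand, query in (("+", primer), ("-", reverse_complement(primer))):
        k = len(query)
        for pos in range(len(seq) - k + 1):
            window = seq[pos:pos + k]
            mismatches = sum(1 for q, s in zip(query, window) if s == "N" or q != s)
            if mismatches <= max_mismatches:
                hits.append((strand, pos, mismatches))
    return hits


def main(fasta_path, primer, max_mismatches=1):
    """Print sample name, strand, 1-based position and mismatches per hit."""
    primer = primer.upper()
    for name, seq in read_fasta(fasta_path):
        for strand, pos, mm in find_primer(primer, seq, max_mismatches):
            print(f"{name}\t{strand}\t{pos + 1}\t{mm}")
```

Calling `main("samples.fasta", "AACG")` (file name hypothetical) prints one tab-separated line per hit. Because N is treated as a mismatch rather than a wildcard, even a generous `max_mismatches` will not report hits inside long N runs.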
Re: Excluding N bases from searches?
Hello Peter, thank you very much for the comprehensive reply. I was working on .ab1 files, so your first suggestion worked wonderfully.