selectseq

A command-line utility for selecting biological sequences from a FASTA or FASTQ file. Given a list of identifiers, it returns only the matching subset of sequences, or the complement (i.e., all sequences NOT in the list). It can also return only sequence number N. Compressed sequence files are supported if zcat can read them.
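
The selection behaviour described above can be illustrated with a short Python sketch. This is not selectseq's own code (the tool is written in Perl); it is a minimal illustration that handles FASTA input only, and the function names read_fasta, select_records and nth_record are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(records: Iterable[Record], wanted: Set[str],
                   complement: bool = False) -> Iterator[Record]:
    """Keep records whose ID (first word of the header) is in `wanted`.

    With complement=True, keep the records whose ID is NOT in `wanted`.
    """
    for header, seq in records:
        parts = header.split()
        seq_id = parts[0] if parts else ""
        if (seq_id in wanted) != complement:
            yield header, seq


def nth_record(records: Iterable[Record], n: int) -> Optional[Record]:
    """Return record number n (1-based) regardless of its ID, or None."""
    for i, rec in enumerate(records, start=1):
        if i == n:
            return rec
    return None


if __name__ == "__main__":
    demo = [">a first", "ACGT", "AC", ">b", "GG", ">c", "TT"]
    for header, seq in select_records(read_fasta(demo), {"a", "c"}):
        print(">" + header)
        print(seq)
```

Reading records lazily with generators keeps memory use low on large files, which is the main reason the sketch avoids loading the whole file into a dictionary.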

Features

collect only some sequences out of a large FASTA or FASTQ file
get sequence number N only, regardless of ID
complement mode: return all sequences that are NOT in the list of IDs
"matching" mode: choose which part (between | characters) of the ID should match
sequence names provided one per line in a text file (first word in line used, or whatever is given to the -k option)
the > and @ symbols are ignored if present at the beginning of IDs in the list (useful when copying FASTA or FASTQ identifiers)
if only one sequence is needed, its ID can be given directly to the -l option (no file needed)
add a suffix to IDs before searching (useful when IDs come from proteins that have _1 in the ID, but genes do not)
compressed sequence database files (-s) are supported
quiet mode: output only important warnings and errors
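
The ID handling in the list above (first word of each line, leading > or @ ignored, optional suffix, matching on one |-separated part) can be sketched in Python as follows. This is an illustration under stated assumptions, not selectseq's implementation: the function names normalize_id and id_field are hypothetical, and the 1-based field number is an assumed way of choosing the part between | characters, since the listing does not say how that part is specified.

```python
from typing import Optional


def normalize_id(line: str, suffix: str = "") -> str:
    """Turn one line of an ID list into a search key.

    Uses the first word of the line, drops a single leading '>' or '@',
    and appends an optional suffix (for example '_1').
    """
    parts = line.split()
    if not parts:
        return ""
    word = parts[0]
    if word[:1] in (">", "@"):
        word = word[1:]
    return word + suffix


def id_field(seq_id: str, field: Optional[int] = None) -> str:
    """Return the part of an ID used for matching.

    field=None matches on the whole ID; otherwise `field` is an assumed
    1-based index into the '|'-separated parts. Out-of-range gives ''.
    """
    if field is None:
        return seq_id
    fields = seq_id.split("|")
    return fields[field - 1] if 1 <= field <= len(fields) else ""


if __name__ == "__main__":
    print(normalize_id(">gene42 hypothetical protein", suffix="_1"))
    print(id_field("gi|12345|ref|NP_000001.1|", 4))
```

Normalizing the list entries once, before scanning the sequence file, means each sequence header needs only a single set lookup.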

License

GNU General Public License version 3.0 (GPLv3)

Additional Project Details

Intended Audience

Science/Research

User Interface

Command-line

Programming Language

Perl

Related Categories

Perl Bio-Informatics Software

Registered

2011-05-20
