selectseq

A command-line utility to manipulate biological sequences from a FASTA or FASTQ file. Given a list of identifiers, it can extract only a subset of the sequences, or their complement (i.e., the sequences NOT in the list). It can also extract sequence number N only. Compressed sequence files are supported if they are readable by zcat.
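Reading a FASTA or FASTQ file that may be compressed, and taking sequence number N regardless of its ID, can be sketched in Python. This is a hypothetical illustration, not selectseq's Perl code. selectseq relies on zcat, whereas this sketch swaps in Python's gzip module and only detects gzip (by its magic bytes). It also assumes that N is 1-based and that FASTQ records are unwrapped four-line records. The names `open_maybe_gzip`, `read_records` and `nth_record` are invented for this example.

```python
import gzip
import io
from typing import Iterable, Iterator, Optional, TextIO, Tuple

# Hypothetical sketch, not selectseq's implementation.
# Record = (header without the leading '>' or '@', sequence, quality or None for FASTA)
Record = Tuple[str, str, Optional[str]]


def open_maybe_gzip(path: str) -> TextIO:
    """Open a sequence file as text, decompressing it if it starts with the gzip magic bytes."""
    with open(path, "rb") as raw:
        magic = raw.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")  # assumption: only gzip is handled, unlike zcat
    return open(path, "r")


def read_records(handle: TextIO) -> Iterator[Record]:
    """Yield FASTA or FASTQ records; the format is guessed from the first non-blank line."""
    lines = (line.rstrip("\r\n") for line in handle)
    first = next((line for line in lines if line), None)
    if first is None:
        return
    if first.startswith(">"):
        # FASTA: sequence lines may be wrapped, so join them until the next header.
        header, chunks = first[1:], []
        for line in lines:
            if line.startswith(">"):
                yield header, "".join(chunks), None
                header, chunks = line[1:], []
            elif line:
                chunks.append(line.strip())
        yield header, "".join(chunks), None
    elif first.startswith("@"):
        # FASTQ: assume unwrapped 4-line records (header, sequence, '+', quality).
        header = first
        while header is not None:
            if not header.startswith("@"):
                raise ValueError(f"expected a FASTQ header, got: {header}")
            seq, plus, qual = next(lines, None), next(lines, None), next(lines, None)
            if qual is None or not plus.startswith("+"):
                raise ValueError(f"truncated or malformed FASTQ record: {header}")
            yield header[1:], seq, qual
            header = next((line for line in lines if line), None)
    else:
        raise ValueError("input does not look like FASTA or FASTQ")


def nth_record(records: Iterable[Record], n: int) -> Optional[Record]:
    """Return the n-th record (1-based), ignoring IDs, or None if there are fewer than n."""
    if n < 1:
        raise ValueError("n is 1-based and must be >= 1")
    for index, record in enumerate(records, start=1):
        if index == n:
            return record
    return None


if __name__ == "__main__":
    demo = io.StringIO(">seq1 first\nACGT\nAC\n>seq2 second\nGGGG\n")
    print(nth_record(read_records(demo), 2))  # ('seq2 second', 'GGGG', None)
```

Because records are streamed one at a time, only the current record is held in memory, which matters for large FASTQ files.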

Features

collect only some sequences out of a large FASTA or FASTQ file
get sequence number N only, regardless of ID
complement mode: return all sequences that are NOT in the list of IDs
"matching" mode: choose which part (between | characters) of the ID should match
sequence names provided one per line in a text file (the first word on each line is used, or whatever is given to the -k option)
the > and @ symbols are ignored if present at the beginning of IDs in the list (useful when the list contains FASTA or FASTQ identifiers)
if only one sequence is needed, its ID can be given directly to the -l option (no file needed)
add a suffix to IDs before searching (useful, e.g., when protein IDs carry a _1 suffix that the corresponding gene IDs lack)
compressed sequence database files (-s) are supported
quiet mode: output only important warnings and errors
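The ID-based selection described in the feature list can be sketched the same way. Again, this is a hypothetical Python illustration, not selectseq's implementation. The names `load_ids`, `record_key`, `select` and `parse_fasta` are invented, only FASTA input is handled, and the matching field is assumed to be a 0-based index into the `|`-separated parts of the first header word. The suffix is simply appended to every listed ID. A single ID (the -l case) is just a one-element list.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

# Hypothetical sketch, not selectseq's implementation.
Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Minimal FASTA reader: joins wrapped sequence lines under each header."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def load_ids(lines: Iterable[str], suffix: str = "") -> Set[str]:
    """Wanted IDs: first word of each non-blank line, one leading '>' or '@' dropped, suffix appended."""
    wanted = set()
    for line in lines:
        words = line.split()
        if not words:
            continue
        seq_id = words[0][1:] if words[0][:1] in (">", "@") else words[0]
        wanted.add(seq_id + suffix)
    return wanted


def record_key(header: str, field: Optional[int] = None) -> str:
    """Key to match: the first header word, or (assumed 0-based) one '|'-separated part of it."""
    words = header.split()
    seq_id = words[0] if words else ""
    if field is None:
        return seq_id
    parts = seq_id.split("|")
    return parts[field] if field < len(parts) else ""


def select(records: Iterable[Record], wanted: Set[str],
           field: Optional[int] = None, complement: bool = False) -> Iterator[Record]:
    """Yield records whose key is in `wanted`, or NOT in it when complement=True."""
    for header, seq in records:
        hit = record_key(header, field) in wanted
        if hit != complement:
            yield header, seq


if __name__ == "__main__":
    proteins = [">contig1_1 len=3", "MKV", ">contig2_1", "MLL", ">contig3_1", "MAA"]
    wanted = load_ids(["contig1 some note", ">contig3"], suffix="_1")
    for header, seq in select(parse_fasta(proteins), wanted):
        print(f">{header}\n{seq}")  # prints contig1_1 and contig3_1
```

Keeping the wanted IDs in a set makes each lookup constant time on average, so the sequence file is streamed once no matter how long the ID list is.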

License

GNU General Public License version 3.0 (GPLv3)

Additional Project Details

Intended Audience

Science/Research

User Interface

Command-line

Programming Language

Perl

Related Categories

Perl Bio-Informatics Software

Registered

2011-05-20

Similar Business Software

Galaxy

Galaxy is an open source, web-based platform for data-intensive biomedical research. If you are new to Galaxy start here or consult our help resources. You can install your own Galaxy by following the tutorial and choosing from thousands of tools from the tool shed. This instance of Galaxy is...

BioTuring Browser

Explore hundreds of curated single-cell transcriptome datasets, along with your own data, through interactive visualizations and analytics. The software also supports multimodal omics, CITE-seq, TCR-seq, and spatial transcriptomics. Interactively explore the world's largest single-cell expression...

Geneious

Geneious Prime makes bioinformatics accessible by transforming raw data into visualizations that make sequence analysis intuitive and user-friendly. Simple sequence assembly and easy editing of contigs. Automatic annotation for gene prediction, motifs, translation, and variant calling. Genotype...

Genome Analysis Toolkit (GATK)

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. The GATK...

MEGA

MEGA (Molecular Evolutionary Genetics Analysis) is a powerful and user-friendly software suite designed for analyzing DNA and protein sequence data from species and populations. It facilitates both automatic and manual sequence alignment, phylogenetic tree inference, and evolutionary hypothesis...

Illumina DRAGEN Secondary Analysis

The Illumina DRAGEN Secondary Analysis provides accurate, comprehensive, and efficient analysis of next-generation sequencing data. Graph reference genome and machine learning driving unprecedented accuracy. Provides ultra-efficient workflow; can fully process a 34x whole human genome in ~30...
