bioinformatics

4

Solved

Using Bio.SeqIO to write single-line FASTA

QIIME requires this of the FASTA files it receives as input: The file is a FASTA file, with sequences in the single line format. That is, sequences are not broken up into multiple li...

python python-2.7 bioinformatics biopython fasta

Cheryllches asked 11/6, 2014 at 7:1

12

Solved

Reverse complement of DNA strand using Python

I have a DNA sequence and would like to get its reverse complement using Python. It is in one of the columns of a CSV file and I'd like to write the reverse complement to another column in the sa...

python list bioinformatics biopython dna-sequence

Gat asked 7/8, 2014 at 17:50

4

Solved

extract sequences from multifasta file by ID in file using awk

I would like to extract sequences from a multifasta file that match the IDs given in a separate list of IDs. FASTA file seq.fasta: >7P58X:01332:11636 TTCAGCAAGCCGAGTCCTGCGTCGTTACTTCGCTT CAAGTC...

search awk bioinformatics multiline fasta

Kalin asked 9/4, 2018 at 11:1

5

Solved

Read FASTA into a dataframe and extract subsequences of FASTA file

I have a small fasta file of DNA sequences which looks like this: >NM_000016 700 200 234 ACATATTGGAGGCCGAAACAATGAGGCGTGATCAACTCAGTATATCAC >NM_000775 700 124 236 CTAACCTCTCCCAGTGTGGAACCTCTAT...

r subset bioinformatics fasta

Glossectomy asked 21/1, 2014 at 16:23

4

Solved

How to merge multiple files with two common columns, and name the added col as file name?

I'm trying to merge multiple .bed files by matching the first two columns, chr and start, following this question: "Merging multiple files with two common columns, and replace the blank to 0". However, I'm wo...

awk bioinformatics bed

Dental asked 29/1 at 4:58

5

Solved

Reverse string in specific fields with condition

I have this file: m64071_220512_054244/12584899/ccs rev pet047-10055 ACGTGCGACCTTGTGA TTGAGGGTTCAAACGTGCGACCTTGTGA m64071_220512_054244/128321000/ccs rev pet047-10055 ACGTGCGACCTTGTGA TTGAGGGTTCAAA...

bash perl awk bioinformatics dna-sequence

Nerta asked 21/12, 2023 at 19:24

5

Solved

Split dataframe into multiple dataframes by grouping columns

I have a dataframe of expression data where genes are rows and samples are columns. I also have a dataframe containing metadata for each sample in the expression dataframe. In reality my expr datafr...

r dataframe dplyr bioinformatics

Nepotism asked 21/12, 2023 at 18:6

3

Solved

Error: ! Failed to collect lazy table. Caused by error in `db_collect()` - using biomaRt package in R

I'm currently working on a bioinformatics project using R, and I'm encountering an error when trying to use the biomaRt package. After installing the package and loading it into R, I tried to selec...

r bioinformatics dbplyr biomart

Coburg asked 26/10, 2023 at 22:45

5

"WinError 2 The system cannot find the file specified" when trying to run Fortran

I have a Fortran program and want to execute it in python for multiple files. I have 2000 input files but in my Fortran code I am able to run only one file at a time. How should I call the Fortran ...

python python-3.x bioinformatics f2py

Gael asked 3/3, 2017 at 6:58

2

Draw a colored sphere from cartesian coordinates in pymol

I was looking in the wiki for how to convert the following information about beads (cartesian coordinates + energy): 23.4 54.6 12.3 -123.5 54.5 23.1 9.45 -56.7 ....... to a drawing in pymol that contain...

python visualization bioinformatics

Lally asked 13/1, 2010 at 21:57

2

Solved

Snakemake processing large workflow slow due to lengthy sequential checking of job completion? >100x speed reduction

I am working on a rather complex snakemake workflow that spawns several hundreds of thousands of jobs. Everything works... The workflow executes, DAG gets created (thanks to the new checkpoint impl...

workflow bioinformatics snakemake

Garnishment asked 28/2, 2019 at 9:49

14

Solved

Converting FASTQ to FASTA with SED/AWK

I have data that always comes in blocks of four in the following format (called FASTQ): @SRR018006.2016 GA2:6:1:20:650 length=36 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGN +SRR018006.2016 GA2:6:1:20...

awk sed bioinformatics fasta fastq

Raddle asked 9/10, 2009 at 7:22

3

Solved

How can I convert Ensembl ID to gene symbol in R?

I have a data.frame containing Ensembl IDs in one column; I would like to find corresponding gene symbols for the values of that column and add them to a new column in my data frame. I used bioMaRt...

r dataframe bioinformatics bioconductor

Pelops asked 16/2, 2015 at 14:22

3

Collapse intersecting regions

I am trying to find a way to collapse rows with intersecting ranges, denoted by "start" and "stop" columns, and record the collapsed values into new columns. For example I have this data frame: my...

r bioinformatics overlap

Consignee asked 6/6, 2013 at 8:33

3

Solved

How to read vcf file in R

I have this VCF format file, I want to read this file in R. However, this file contains some redundant lines which I want to skip. I want to get something like in the result where the row starts wi...

r bioinformatics genetics vcf-variant-call-format

Changeful asked 11/9, 2015 at 0:45

11

Solved

How to call module written with argparse in iPython notebook

I am trying to pass BioPython sequences to Ilya Stepanov's implementation of Ukkonen's suffix tree algorithm in iPython's notebook environment. I am stumbling on the argparse component. I have ne...

python ipython bioinformatics biopython suffix-tree

Uroscopy asked 5/6, 2015 at 1:12

4

Solved

snakemake: optional input for rules

I was wondering if there is a way to have optional inputs in rules. An example case is excluding unpaired reads for alignment (or having only unpaired reads). A pseudo rule example: rule hisat2_a...

python bioinformatics snakemake

Turnbow asked 16/7, 2018 at 12:47

7

Rosalind "Mendel's First Law" IPRB

As preparation for an upcoming bioinformatics course, I am doing some assignments from rosalind.info. I am currently stuck in the assignment "Mendel's First Law". I think I could brute force mysel...

python bioinformatics rosalind

Canton asked 4/8, 2014 at 12:50

4

Rosalind: Mendel's first law

I'm trying to solve the problem at http://rosalind.info/problems/iprb/ Given: Three positive integers k, m, and n, representing a population containing k+m+n organisms: k individuals are homozy...

python probability bioinformatics rosalind

Smoothshaven asked 15/10, 2014 at 15:10

4

Solved

pandas merge intervals by range

I have a pandas dataframe that looks as the following one: chrom start end probability read 0 chr1 1 10 0.99 read1 1 chr1 5 25 0.99 read2 2 chr1 15 25 0.99 read2 3 chr1 30 40 0.75 read4 What I ...

python pandas bioinformatics

Clicker asked 12/2, 2018 at 18:13

3

Solved

Converting nucleobase representation from ASCII to UCSC .2bit

Unambiguous DNA sequences consist only of the nucleobases adenine (A), cytosine (C), guanine (G), thymine (T). For human consumption, the bases may be represented by the corresponding char in eithe...

c algorithm bit-manipulation bioinformatics micro-optimization

Hereditament asked 9/1, 2022 at 8:28

14

biopython no module named Bio

FYI: this is NOT a duplicate! Before running my python code I installed biopython in the cmd prompt: pip install biopython I then get an error saying 'No module named Bio' when I try to import i...

python python-3.x pip bioinformatics biopython

Viscometer asked 16/4, 2018 at 1:33

3

How to cache reads?

I am using python/pysam to analyze sequencing data. In its tutorial (pysam - An interface for reading and writing SAM files) for the command mate it says: 'This method is too slow for high-thro...

python caching bioinformatics samtools pysam

Chokefull asked 16/12, 2015 at 1:37

4

Solved

multiFASTA file processing

I was curious to know if there is any bioinformatics tool out there able to process a multiFASTA file, giving me information such as number of sequences, length, nucleotide/amino acid content, etc. and maybe ...

bioinformatics biopython fasta bioconductor bioperl

Mitchiner asked 24/11, 2009 at 10:55

1

Solved

How to add to a cnetplot using ggplot functions?

I have a dataset in R that is a class of 'Formal class enrichResult'. I plot the genes in this dataset using cnetplot() from the package DOSE - which is meant to be based on ggplot graphics. This p...

r ggplot2 bioinformatics

About asked 9/9, 2021 at 16:45
