#37 Request for Circos-format Input File

Accepted
nobody
None
High
Enhancement
2014-01-10
2013-12-24
Anonymous
No

Originally created by: cwarde...@gmail.com

Hi,

I have used SVDetect to predict structural variants, but I have had a hard time creating the corresponding circos plot.  Your software seems easy to use and potentially very helpful to users with problems similar to mine.

Here are a couple lines for the circos input file:

1    hschr10    100004512    100006262    color=purple
1    hschr10    100005280    100008037    color=purple
2    hschr10    100004512    100006386    color=orange
2    hschr10    100006534    100008906    color=orange
3    hschr10    100005224    100007893    color=purple
3    hschr10    100007187    100009793    color=purple

The first column maps connected regions: rows that share the same value are the two ends of one connection (see the first two rows, the 3rd and 4th rows, etc.).

The second column is the chromosome (I can change the format to be more standard), the 3rd column is the start coordinate, and the 4th column is the stop coordinate.

The fifth column describes the desired color for the connections (in this case, the colors describe the category of structural variant).

Do you think you can add this as an accepted input format and/or describe how I can convert this format to something accepted by pomo?

Thanks,
Charles

Discussion

  • Anonymous

    Anonymous - 2014-01-10

    Originally posted by: jlin...@gmail.com

    Hi, I just noticed this, and sorry for getting back to you so late. You can convert the input format this way:
    1    hschr10    100004512    100006262    color=purple
    1    hschr10    100005280    100008037    color=purple
    becomes
    10:100004512:100006262:TYPE\t10:100005280:100008037:TYPE\tpurple
    10:100004512:100006386:TYPE\t10:100006534:100008906:TYPE\torange
    ...
    TYPE can be GEXP, or PROT, or just leave it blank...
    One issue will be that the nodes are very close to each other, so the arc will be minimal... We have an option to make nodes veer towards the center, but I need to check the current deployment.
    Thanks for reporting this; we will look into adding support for circos input files. Is a tab (\t) the standard delimiter, and what is the file extension?

    Labels: -Type-Defect -Priority-Medium Type-Enhancement Priority-High
    Status: Accepted

     
  • Anonymous

    Anonymous - 2014-01-10

    Originally posted by: jlin...@gmail.com

    By the way, the \t is a tab character; you can also use a comma or a space as the delimiter. Use tsv, csv, or txt as the corresponding file extension.
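For anyone who wants to script the conversion described above, here is a minimal sketch in Python. It is not part of POMO: the function name `circos_links_to_pomo` and its `node_type` parameter are made up for illustration. It assumes that each link id appears on exactly two rows, that chromosome names carry a prefix such as `hschr`, `hs`, or `chr`, and that a blank TYPE is written as an empty field after the last colon (the thread only says it can be left blank).

```python
from collections import OrderedDict


def circos_links_to_pomo(lines, node_type=""):
    """Pair Circos link rows by link id; return one tab-delimited line per pair."""
    links = OrderedDict()  # link id -> list of (chrom, start, end, color)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        link_id, chrom, start, end = fields[:4]
        # "hschr10" -> "10"; the prefix list is an assumption about naming
        for prefix in ("hschr", "hs", "chr"):
            if chrom.startswith(prefix):
                chrom = chrom[len(prefix):]
                break
        color = ""
        for option in fields[4:]:
            if option.startswith("color="):
                color = option[len("color="):]
        links.setdefault(link_id, []).append((chrom, start, end, color))

    out = []
    for ends in links.values():
        if len(ends) != 2:
            continue  # a connection needs exactly two endpoints
        (c1, s1, e1, color), (c2, s2, e2, _) = ends
        node1 = ":".join((c1, s1, e1, node_type))
        node2 = ":".join((c2, s2, e2, node_type))
        # The color is taken from the first row of each pair
        out.append("\t".join((node1, node2, color)))
    return out
```

Calling `circos_links_to_pomo(open("links.txt"), "GEXP")` (a hypothetical file name) and writing the returned lines to a `.tsv` file should give input in the shape shown earlier in the thread. Swapping the `"\t"` in the final join for a comma would produce the csv variant.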

     
  • Anonymous

    Anonymous - 2014-01-10

    Originally posted by: cwarde...@gmail.com

    Hi,

    Thanks for your reply.

    Actually, I've found that the SVDetect SV table was easier to convert to POMO format than the file that was intended for circos input (in fact, I found converting that file to POMO format was easier than using the intended circos input in circos or Rcircos, which I didn't ever get to work).

    When parsing the SV .txt file, I can also look at the SV type and create two separate POMO files (for example, deletions will be described as annotations rather than interactions).  I think this provides the best solution that is easiest for me.  If you would like, I can give you an example of what the .sv format looks like, but I don't think it should really be considered a good general format (there are a lot of extra columns with necessary scores, etc.).

    On the other hand, if you are still looking for feedback, one thing I think would be helpful is to update the genome builds.  My alignment was for hg19, so I'm sure the cytoband was slightly wrong for my translocations.  I guess gene symbol interactions / annotations will be OK (because they will be mapped to hg17), but I think it is likely I will need to provide a combination of genomic intervals for at least some cases.

    Thanks,
    Charles

     
  • Anonymous

    Anonymous - 2014-01-10

    Originally posted by: jlin...@gmail.com

    Hi Charles,

    Thanks for your feedback. We are using hg19 for the human reference gene
    name/id translations.
    It is a good catch on your part about the different builds: pretty sure
    that the cytoband is based on hg18, and possibly chromosome sizes too. I
    will add a ticket to fix this.

    FYI, POMO is a descendant of Regulome Explorer, which was started on TCGA
    data using hg18.

     
