Re: [Scaffoldhunter-devel] GSoC: Scaffold Hunter for medical image retrieval


Falk,

The ability to dynamically select features in the visualisation is a core requirement for any image retrieval system. Although features can be selected one at a time, this should not be the rule: the ability to select N features at once will be important.
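To make "select N features at once" concrete, here is a minimal sketch of a distance computed only over a user-selected subset of features. The class and method names are hypothetical (not part of Scaffold Hunter or VAMIR), and it assumes dense, already normalised feature vectors of equal length; Euclidean distance is just one possible choice.

```java
/**
 * Hypothetical helper (not part of Scaffold Hunter or VAMIR): distance between
 * a query and one database entry, computed only over the features the user
 * has currently selected.
 */
final class SelectedFeatureDistance {

    private SelectedFeatureDistance() {}

    /**
     * Euclidean distance restricted to the selected feature indices.
     * Assumes both vectors have the same length and are already normalised.
     */
    static double distance(double[] query, double[] entry, int[] selected) {
        double sum = 0.0;
        for (int f : selected) {
            double d = query[f] - entry[f];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] query = {0.2, 0.8, 0.5, 0.1};
        double[] entry = {0.5, 0.4, 0.5, 0.9};
        int[] selected = {0, 1}; // the user picked features 0 and 1 together
        System.out.println(distance(query, entry, selected)); // approximately 0.5
    }
}
```

Selecting one feature is simply the case of a one-element index array, so single and multi-feature selection share one code path.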

A 100K dataset is a realistic sample size for our intended medical image retrieval application, and indeed the LIDC is ~1K. However, your numbers are highly dependent on the number of features. The image data of a single LIDC patient consists of hundreds of image slices, so if we want raw features, we could be looking at a massive feature space. Your array-based approach would work well for reasonable data sizes but may not scale well and could restrict our feature extraction options. Nevertheless, your performance results sound promising for our application.
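A rough calculation shows why the feature count matters more than the number of entries. The sketch below assumes a dense double[entries][features] store and counts only the 8 bytes per value, ignoring object headers. The "~20 features per image" figure comes from this thread; the 300 slices × 1,000 raw values per slice scenario is a purely hypothetical stand-in for "hundreds of slices".

```java
/**
 * Back-of-the-envelope estimate (hypothetical helper): raw memory needed to
 * hold a dense double[entries][features] store, counting 8 bytes per value
 * and ignoring per-array object headers.
 */
final class FeatureStoreEstimate {

    private FeatureStoreEstimate() {}

    static long rawBytes(long entries, long featuresPerEntry) {
        return entries * featuresPerEntry * Double.BYTES;
    }

    public static void main(String[] args) {
        // ~20 summary features per image, 100,000 images
        System.out.println(rawBytes(100_000, 20)); // 16000000 bytes, about 16 MB
        // Hypothetical raw features: 300 slices x 1,000 values per slice
        System.out.println(rawBytes(100_000, 300L * 1_000)); // 240000000000 bytes, about 240 GB
    }
}
```

Summary features fit comfortably in memory, whereas raw per-slice features would not, which is the scaling concern raised above.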

The range of features that can be extracted from CT is vast; see, for example, L. Dettori and L. Semler, 'A comparison of wavelet, ridgelet, and curvelet-based texture classification algorithms in computed tomography', Computers in Biology and Medicine, 2007.

There are many ways to compare feature vectors of varying size; for example, we can derive rules and a penalty function. The current VAMIR uses a graph-matching algorithm that represents the features as nodes; differing numbers of nodes were handled via a penalty function.
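As a simple illustration of a penalty-based comparison, the sketch below compares two cases with different numbers of tumours: each query tumour is greedily matched to its nearest unused tumour in the database case, and every unmatched tumour on either side adds a fixed penalty. This is an assumed stand-in, not VAMIR's graph-matching algorithm; the class name, the per-tumour feature layout and the penalty value are all hypothetical, and greedy matching is not guaranteed to be optimal (an optimal assignment would need, e.g., the Hungarian algorithm).

```java
import java.util.List;

/**
 * Hypothetical sketch: compare two cases with different numbers of tumours.
 * Greedy nearest-neighbour matching plus a fixed penalty per unmatched tumour.
 * This is a simple stand-in, not VAMIR's graph-matching algorithm.
 */
final class PenaltyMatching {

    private PenaltyMatching() {}

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Sum of matched tumour distances plus penalty times the number of unmatched tumours. */
    static double distance(List<double[]> query, List<double[]> entry, double penalty) {
        boolean[] used = new boolean[entry.size()];
        double total = 0.0;
        int matched = 0;
        for (double[] q : query) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int j = 0; j < entry.size(); j++) {
                if (used[j]) {
                    continue; // each database tumour may be matched only once
                }
                double d = euclidean(q, entry.get(j));
                if (d < bestDist) {
                    bestDist = d;
                    best = j;
                }
            }
            if (best >= 0) {
                used[best] = true;
                total += bestDist;
                matched++;
            }
        }
        int unmatched = (query.size() - matched) + (entry.size() - matched);
        return total + penalty * unmatched;
    }
}
```

The penalty controls how strongly a mismatch in tumour count is punished relative to feature differences, so it would need to be tuned to the feature normalisation.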

Now, a novel approach would be an automated feature extraction process, such that the most relevant features for a given query could be represented in the information visualisation output. This can be further improved with dynamic, user-driven feature selection, so that feature selection itself becomes a visual analytics problem!

Hope this helps,

Jinman

-----Original Message-----
From: ne...@in... [mailto:ne...@in...] 
Sent: Friday, 26 April 2013 7:35 PM
To: sca...@li...
Subject: Re: [Scaffoldhunter-devel] GSoC: Scaffold Hunter for medical image retrieval

Hi,

from what I understand, the motivation for the project is that a user should be able to dynamically choose what features should be visualized, as he might be interested in only a few aspects while deeming others unimportant with regard to his question. I thought about some key points that need to be considered.
The visualization relies on absolute distances between the features of the query and those of the returned images, so these distances have to be calculated whenever a new feature is chosen to be included in the visualization (as opposed to only sorting the results). However, in the majority of cases this will only result in a single subtraction per database entry (e.g. query_feature_5 - result_feature_5 for all result images).
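The per-feature update described above amounts to one subtraction and one absolute value per entry. A minimal sketch, assuming a hypothetical dense double[entries][features] layout rather than the prototype's actual classes:

```java
/**
 * Sketch: when the user adds feature f to the visualisation, recompute the
 * absolute distance |query[f] - entry[f]| for every database entry.
 * Assumes a dense double[entries][features] layout (hypothetical).
 */
final class SingleFeatureDistances {

    private SingleFeatureDistances() {}

    static double[] absoluteDistances(double[] query, double[][] database, int feature) {
        double[] out = new double[database.length];
        double q = query[feature];
        for (int i = 0; i < database.length; i++) {
            out[i] = Math.abs(q - database[i][feature]); // one subtraction per entry
        }
        return out;
    }
}
```

The work is linear in the number of entries and independent of how many other features are stored, which is why adding a single feature is cheap.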
In order to do some rough performance testing, I simulated a database of 100,000 entries that mimics the structure of the current prototype, which essentially stores the feature values as doubles in nested ArrayLists. Comparing a random query to all other entries in terms of one random feature never took more than 70 milliseconds on a below-average machine. Replacing the inner ArrayList with a plain array reduced this time more than tenfold, so there is already a simple way to optimize the underlying data structures. Also, whereas I used 100,000 entries, the LIDC collection you proposed only comprises about 1,000 images, and a simulated query for this number took less than 0.1 milliseconds. I am aware that this kind of benchmarking is unlikely to be 100% precise when taken out of context, but this issue still does not seem to be a bottleneck at the moment.
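A sketch of the kind of comparison described above: the same single-feature query run against nested ArrayList<ArrayList<Double>> (boxed Double objects that must be unboxed on every access) and against a double[][] of primitives. The class name, the 20-feature layout and the seed are assumptions; timings are machine-dependent, and a single System.nanoTime() measurement without JIT warm-up is only indicative, not a proper benchmark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Rough layout comparison: boxed nested lists vs. a primitive 2D array (hypothetical). */
final class LayoutComparison {

    private LayoutComparison() {}

    static double[] queryBoxed(List<? extends List<Double>> db, double q, int f) {
        double[] out = new double[db.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.abs(q - db.get(i).get(f)); // unboxes a Double on each access
        }
        return out;
    }

    static double[] queryPrimitive(double[][] db, double q, int f) {
        double[] out = new double[db.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.abs(q - db[i][f]);
        }
        return out;
    }

    /** Converts the nested-list layout into a dense primitive array. */
    static double[][] toArray(List<? extends List<Double>> db) {
        double[][] out = new double[db.size()][];
        for (int i = 0; i < out.length; i++) {
            List<Double> row = db.get(i);
            out[i] = new double[row.size()];
            for (int j = 0; j < row.size(); j++) {
                out[i][j] = row.get(j);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int entries = 100_000;
        int features = 20;
        Random rnd = new Random(42);
        List<ArrayList<Double>> boxed = new ArrayList<>(entries);
        for (int i = 0; i < entries; i++) {
            ArrayList<Double> row = new ArrayList<>(features);
            for (int j = 0; j < features; j++) {
                row.add(rnd.nextDouble());
            }
            boxed.add(row);
        }
        double[][] primitive = toArray(boxed);
        int f = rnd.nextInt(features);
        double q = rnd.nextDouble();

        long t0 = System.nanoTime();
        double[] a = queryBoxed(boxed, q, f);
        long t1 = System.nanoTime();
        double[] b = queryPrimitive(primitive, q, f);
        long t2 = System.nanoTime();
        System.out.printf("boxed: %.3f ms, primitive: %.3f ms, same result: %b%n",
                (t1 - t0) / 1e6, (t2 - t1) / 1e6, Arrays.equals(a, b));
    }
}
```

Both methods compute identical distances; only the memory layout differs, which is what makes a primitive-based store (or a primitive collections library such as Trove) attractive here.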
However, I am not sure how to handle a "classic" retrieval task that uses all features. In the past, these values were not computed at runtime. I think vector-based methods could only be applied if we disregard the fact that cases in the database contain different numbers of tumours (unlike a graph-based approach); is that right? A case with 5 tumours could then be compared to a query with only 3 tumours by averaging the respective tumour features (e.g. their homogeneity). Still, this would result in a high number of operations if we take into account around 20 different features per image. Again, I would first look at how to improve the current ad-hoc data structure and test optimised implementations such as the Trove library (since the underlying data is homogeneous and stored as primitive types).
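The averaging idea can be sketched as follows: collapse each case's variable number of tumour feature vectors into one mean vector, then compare the mean vectors with an ordinary vector distance. The class name, the assumption that all tumour vectors have the same length, and the choice of an L1 (Manhattan) distance are hypothetical.

```java
import java.util.List;

/**
 * Sketch of the averaging idea: collapse a variable number of tumour feature
 * vectors into one mean vector per case, so that cases with 3 and 5 tumours
 * can be compared with an ordinary vector distance.
 * Assumes every tumour vector has the same length (hypothetical layout).
 */
final class TumourAveraging {

    private TumourAveraging() {}

    static double[] mean(List<double[]> tumours) {
        if (tumours.isEmpty()) {
            throw new IllegalArgumentException("case has no tumours");
        }
        int n = tumours.get(0).length;
        double[] m = new double[n];
        for (double[] t : tumours) {
            for (int i = 0; i < n; i++) {
                m[i] += t[i];
            }
        }
        for (int i = 0; i < n; i++) {
            m[i] /= tumours.size();
        }
        return m;
    }

    /** Manhattan (L1) distance between the two cases' mean tumour vectors. */
    static double distance(List<double[]> query, List<double[]> entry) {
        double[] a = mean(query);
        double[] b = mean(entry);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}
```

The trade-off is that averaging discards which tumour carries which value, which is exactly the information a graph-based approach keeps.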
Furthermore, the ability to select features would have to be integrated into the user interface in a smart way (this may sound trivial, but looking at the code, that's hard to say), and one has to think about what to display initially or which feature combinations to propose to a user. The latter, though, is technically not about coding (?)

I had a look at the LIDC database. The annotations are stored in XML files, for which I have started to write a simple reader. It should be easy to integrate them into VAMIR if features can be extracted. I am not sure about the meaning of the features used in the annotations or how to normalize them; I guess a corresponding publication will shed light on that. But in general, I think this project will be justified by the use of more extensive data than the 50 images that are used now.
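A minimal sketch of such a reader, using only the JDK's DOM parser. It assumes that each nodule's ratings sit in a characteristics element whose child elements hold integer values (e.g. sphericity, malignancy); the element names and nesting should be checked against the actual LIDC XML files, and the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * Minimal sketch of an annotation reader (hypothetical). Assumes each nodule's
 * ratings sit in a <characteristics> element whose children hold integers.
 */
final class LidcCharacteristicsReader {

    private LidcCharacteristicsReader() {}

    /** Returns one map (characteristic name to rating) per characteristics block. */
    static List<Map<String, Integer>> read(String xml) throws Exception {
        // The default factory is not namespace-aware, so plain tag names match.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList blocks = doc.getElementsByTagName("characteristics");
        List<Map<String, Integer>> result = new ArrayList<>();
        for (int i = 0; i < blocks.getLength(); i++) {
            Map<String, Integer> ratings = new LinkedHashMap<>();
            NodeList children = blocks.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text nodes
                }
                Element e = (Element) child;
                String text = e.getTextContent().trim();
                if (!text.isEmpty()) {
                    ratings.put(e.getTagName(), Integer.parseInt(text));
                }
            }
            result.add(ratings);
        }
        return result;
    }
}
```

Normalising the ratings (e.g. to [0, 1] by their scale range) would be a separate step once their meaning is clear from the corresponding publication.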

Best,
Falk

On 23.04.2013 14:46, Jinman Kim wrote:
> Hi Falk, very happy to hear. Can you tell us more about how you want to
> include 'visual features' in your project?
>
> For this project, we want to use the public lung cancer database (LIDC),
> which already has many visual features such as ROIs, types of lymph nodes,
> sphericity, volume, etc. Will you use these or expand them?
>
> http://imaging.cancer.gov/programsandresources/informationsystems/lidc
>
>
> Jinman
>
>
> -----Original Message-----
> From: Karsten Klein [mailto:kar...@ud...]
> Sent: Tuesday, April 23, 2013 2:33 PM
> To: sca...@li...
> Subject: Re: [Scaffoldhunter-devel] GSoC: Scaffold Hunter for medical
> image retrieval
>
> On 21.04.2013 22:19, ne...@in... wrote:
>> Hi,
>>
>> I am currently a graduate student of medical engineering at the 
>> University of Lübeck, Germany. From October 2012 to March 2013, I 
>> conducted an internship at the University of Sydney with the task of 
>> introducing methods of visual analytics for medical image retrieval 
>> into Scaffold Hunter. Under the supervision of the now GSoC mentors 
>> Dr. Jinman Kim and Dr. Karsten Klein, I built a prototype of a plugin 
>> that will now serve as a starting point for the GSoC project ideas 
>> related to image retrieval. As the first results were very promising 
>> and the approach is novel, I am highly motivated to continue my 
>> participation in this project in the scope of this year's Summer of 
>> Code, and I plan to apply to develop efficient techniques for visual
>> feature selection. Hope to work together with you soon!
>>
>> Best regards,
>> Falk Nette
>>
>> _______________________________________________
>> Scaffoldhunter-devel mailing list
>> Sca...@li...
>> https://lists.sourceforge.net/lists/listinfo/scaffoldhunter-devel
>>
>
> Hi Falk,
>
> thanks for your interest. Good to see you were not too bored with the
> work on Scaffold Hunter and are still interested in improving the tool ;-).
> If you have any further questions please post them here.
>
> Best,
>     Karsten
>
