crf-users Mailing List for CRF (Page 2)

Hi,

I would like to know whether it is possible to train the CRF model in a
stochastic fashion (i.e., training the model on instances one by one). Since I
plan to train on a huge training set, I was wondering whether the CRF package
supports online (or stochastic) training.

Any input is appreciated.
Thanks!

-- 
Regards,
Kartik Perisetla
Carnegie Mellon University, Pittsburgh

Hello,

I am using this CRF package for my project. I have a question: I
understand that FeatureTypesEachLabel is used when we want a specific
feature to fire for each label type in the dataset. How can I configure it
to fire only when a specific label is seen in the dataset? For example,
if the label is 'category1', then feature 'f1' should fire; otherwise it
should not.

I would appreciate it if someone could provide input.

Thanks!

-- 
Regards,

Kartik Perisetla

I would like to thank Prof. Sunita Sarawagi for the magical code. 

I am referring to the 2001 paper "Automatic segmentation of text into
structured records" by Prof. Sunita Sarawagi and others. A nested (k parallel)
CRF is used in it, and the inner CRFs are pruned by merging two parallel
chains into one. In the process, a self-loop is made on any of the middle
nodes, as opposed to the self-loop on the end node that is available in
the package.

Any help is greatly appreciated.

Regards,

Rohit Nandwani

nan...@gm...

Hi,

My training set contains a few Boolean features but many numeric features.
However, the FeatureType class doesn’t seem to support this.

What should I do?

Thanks

Tao

CALL FOR PARTICIPATION: CHEMDNER task: Chemical compound and drug name
recognition task
( http://www.biocreative.org/tasks/biocreative-iv/chemdner )

The CHEMDNER task (part of The BioCreative IV competition) is a community
challenge on named entity recognition of chemical compounds.

CRFs were used successfully as a method for named entity recognition (NER)
by teams that participated in previous BioCreative challenges. We expect that
the CRF package will also be a useful resource for the chemical compound
name recognition task. We thus encourage CRF package users to participate
in the chemical compound named entity recognition task of BioCreative IV.

(1) TASK GOAL AND MOTIVATION
The goal of this task is to promote the implementation of systems that are
able to detect mentions in text of chemical compounds and drugs. The
recognition of chemical entities is also crucial for other subsequent text
processing strategies, such as detection of drug-protein interactions,
adverse effects of chemical compounds or the extraction of pathway and
metabolic reaction relations. A range of different methods have been
explored for the recognition of chemical compound mentions including
machine learning based approaches, rule-based systems and different types
of dictionary-lookup strategies. The Weka framework has been successfully
explored by several participating teams for previous biomedical text mining
tasks posed in the context of the BioCreative challenge.

We foresee a considerable interest in the result of this task by the
NLP/text mining community on one side, as well as by the bioinformatics,
drug discovery/biomedicine and chemoinformatics communities on the other
side. As has been the case in previous BioCreative efforts (resulting in
high impact papers in the field), we expect that successful participants
will have the opportunity to publish their system descriptions in a journal
article.

(2) CHEMDNER TRACK DESCRIPTION
CHEMDNER is one of the tracks posed at the BioCreative IV community
challenge (http://www.biocreative.org).

We invite participants to submit results for the CHEMDNER task providing
predictions for one or both of the following subtasks:

a) Given a set of documents, return for each of them a ranked list of
chemical entities described within each of these documents [Chemical
document indexing sub-task]

b) Provide for a given document the start and end indices corresponding to
all the chemical entities mentioned in this document [Chemical entity
mention recognition sub-task].

For these two tasks the organizers will release training and test data
collections. The task organizers will provide details on the annotation
guidelines used, define a list of criteria for relevant chemical compound
entity types, and select the documents for annotation.

(3) REGISTRATION
Teams can participate in the CHEMDNER task by registering for track 2 of
BioCreative IV. You can register additionally for other tracks too. To
register your team, go to the following page that provides more detailed
instructions: http://www.biocreative.org/news/biocreative-iv/team/

Mailing list and contact information:
You can post questions related to the CHEMDNER task to the BioCreative
mailing list. To register for the BioCreative mailing list, please visit
the following page: http://biocreative.sourceforge.net/mailing.html

(4) WORKSHOP
CHEMDNER is part of the BioCreative evaluation effort. The BioCreative
Organizing Committee will host the BioCreative IV Challenge evaluation
workshop (http://www.biocreative.org/events/biocreative-iv/CFP/) at NCBI,
National Institutes of Health, Bethesda, Maryland, on October 7-9, 2013.

(5) CHEMDNER TASK ORGANIZERS
Martin Krallinger, Spanish National Cancer Research Center (CNIO)
Obdulia Rabal, University of Navarra, Spain
Julen Oyarzabal, University of Navarra, Spain
Alfonso Valencia, Spanish National Cancer Research Center (CNIO)

Hi,
I am a student from IIT Kharagpur.
I am working on a project related to web-query segmentation and I am
using the CRF code provided at http://sourceforge.net/projects/crf/ .
I was wondering whether a sample file (a working example) for the semi-Markov
CRFAppl (main) is available.
It would be really helpful if one could be provided.
Thanks and Regards!
Nikita Mishra

Hello,

OK, it seems that in SegmentTrainer and NestedTrainer all metrics are
calculated in log space in order to avoid overflows, which means
that I can't define features that return negative values. I can shift the
features, but then they won't represent the underlying characteristics
correctly. Is there any workaround?

Thanks

Hello,
I would like to know why you take the log during the computation of alpha
and beta. Is that because of the overflow/underflow problem? If so, then
when you calculate Mi*alpha and Mi*beta you first exponentiate the
matrix/vector elements and then multiply the exponentials (rather than adding
the logs), which introduces precision errors.

Regards

And what is the purpose of the variable "base"?

On Tue, Jul 3, 2012 at 1:58 PM, Hamid Reza Hassanzadeh <
ha....@gm...> wrote:

> Thanks a lot.
>
>
> On Tue, Jul 3, 2012 at 1:24 PM, Sunita Sarawagi <su...@ii...> wrote:
>
>>  Please follow the definition of Mi_YY.zMult.  This function has been
>> over-ridden so that it actually does the right thing for matrix entries
>> containing log of the actual values.

Thanks a lot.

On Tue, Jul 3, 2012 at 1:24 PM, Sunita Sarawagi <su...@ii...> wrote:

>  Please follow the definition of Mi_YY.zMult.  This function has been
> over-ridden so that it actually does the right thing for matrix entries
> containing log of the actual values.

Hello,
I am having a hard time understanding the following lines of code, found in
both SegmentTrainer and NestedTrainer, which compute the betas in log space.
I would appreciate it if you could help me with them:

initMDone = computeLogMi(dataSeq,i,i+ell,featureGenNested,lambda,Mi_YY,Ri_Y,reuseM,initMDone);
tmp_Y.assign(Ri_Y);
tmp_Y.assign(beta_Y[i+ell], sumFunc);
Mi_YY.zMult(tmp_Y, beta_Y[i],1,1,false);

OK, in general, to compute the betas we should do this:
computeMi(featureGenerator,lambda,dataSeq,i,Mi_YY,Ri_Y);
tmp_Y.assign(beta_Y[i]);
tmp_Y.assign(Ri_Y,multFunc);
Mi_YY.zMult(tmp_Y, beta_Y[i-1]);

That makes sense to me, but in log space how can you multiply Mi_YY by
tmp_Y and add the result to beta_Y? They are all in log space.

Regards
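
For readers following this thread, here is a minimal sketch of how a matrix-vector product can be carried out when both the matrix and the vector hold logs of the actual values. It is not the package's zMult implementation; the class and method names (LogSpaceSketch, logSumExp, logMatVec) are hypothetical and chosen only for illustration. The idea is that a product of values becomes a sum of logs, and a sum of values becomes a log-sum-exp.

```java
/** Sketch only: a matrix-vector product on log-valued entries. */
final class LogSpaceSketch {

    /** Numerically stable log(sum_j exp(v[j])); returns -infinity if every entry is -infinity. */
    static double logSumExp(double[] v) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : v) {
            max = Math.max(max, x);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max;
        }
        double sum = 0.0;
        for (double x : v) {
            sum += Math.exp(x - max); // shifted by max, so no term can overflow
        }
        return max + Math.log(sum);
    }

    /** Log-space analogue of out = M * v: out[y] = log sum_{y'} exp(logM[y][y'] + logV[y']). */
    static double[] logMatVec(double[][] logM, double[] logV) {
        double[] out = new double[logM.length];
        for (int y = 0; y < logM.length; y++) {
            double[] terms = new double[logV.length];
            for (int yp = 0; yp < logV.length; yp++) {
                terms[yp] = logM[y][yp] + logV[yp]; // product becomes a sum of logs
            }
            out[y] = logSumExp(terms); // sum becomes log-sum-exp
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1.0, 2.0}, {3.0, 4.0}};
        double[] v = {5.0, 6.0};
        double[][] logM = new double[2][2];
        double[] logV = new double[2];
        for (int i = 0; i < 2; i++) {
            logV[i] = Math.log(v[i]);
            for (int j = 0; j < 2; j++) {
                logM[i][j] = Math.log(m[i][j]);
            }
        }
        double[] logOut = logMatVec(logM, logV);
        // Exponentiating recovers the ordinary product M * v = [17, 39], up to rounding.
        System.out.println(Math.exp(logOut[0]) + " " + Math.exp(logOut[1]));
    }
}
```

Subtracting the maximum before exponentiating keeps every exponent at or below zero, which is what avoids the overflow that a naive exp-then-multiply would risk.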

Hello,

I would like to know whether any feature caching has been implemented in the CRF package.

Regards,

-- 

Hamid Reza Hassanzadeh, 
PhD Student 
Bioinformatics Lab 
Center for Bioinformatics and Computational Genomics 
Joint Georgia Tech and Emory Wallace H Coulter Department of Biomedical Engineering, 
School of Computational Science and Engineering at Georgia Institute of Technology 

email: has...@ga... 
phone: +1 404 719 0810 ; office: KACB 1343
group: http://opal.gatech.edu/genemark/ 

Hi,
Here are some updates on my code, which is still very slow.
I have only 3 labels and 130 features (68 feature types, which increase to 130 due to different combinations of y's). For training I'm using 5 training samples, each having around 1500 residues. I'm using a semi-Markov model, which uses NestedTrainer for training. The maxMemory is 3500. Under these conditions each single iteration of training takes around 2 hours. Also, my features are all simple (just additions and subtractions), since I precalculate the possible outputs and store them in a large array.

Any ideas?

Regards

________________________________
 From: Sunita Sarawagi <su...@cs...>
To: Hamid Reza Hassanzadeh <has...@ga...> 
Cc: crf...@li... 
Sent: Saturday, March 24, 2012 1:53 AM
Subject: Re: [Crf-users] Running Time

Are you using straight CRF with only level-1 Markov dependency, or a higher-order Markov dependency, or a semi-crf?

Is there no dependence on y in the feature function, or do you have a wrapper around this feature that takes a cross-product of this feature with all possible y-s?

Hamid Reza Hassanzadeh wrote:
> Dear CRF users,
> I'm testing the crf package with a sequence of some 30000 residues length with a simple base feature over each residue. The feature is simply like this:
> 
> public boolean startScanFeaturesAt(...)
> {
> ...
> if (x(pos-5:pos)=="xxxxxx" && x(pos:pos+2)=="yyy")
> return true;
> ...
> }
> 
> And the execution time is as long as 1hour while this normally takes some 5 minutes with Mallet. The situation becomes even much worse when I use a longer training sequence.
> 
> Am I making a mistake? When I feed the training file as test file I get 100% accuracy for both cases (mallet and crf). Thus there should not be any mistake. In that case what can be the cause of this huge difference?
> 
> _______________________________________________
> Crf-users mailing list
> Crf...@li...
> https://lists.sourceforge.net/lists/listinfo/crf-users
>  
Hi,

I have a query similar to one that I saw unanswered in the archives (
http://sourceforge.net/mailarchive/message.php?msg_id=18752494<http://sourceforge.net/mailarchive/forum.php?forum_name=crf-users&max_rows=25&style=nested&viewmonth=200803>).
I am trying to use your CRF implementation for POS tagging. I am able to
get the initial code up & running on the Penn Treebank ATIS corpus. I am
aiming at using orthographic features from the data for training as well.
For example, apart from supplying the word I would also include features
indicating capitalization (caps) as well as common English suffixes (e.g.
-ing and -s) as well as features for words that start with a number.

Currently I am able to get the program running on data in the below format,
where each line consists of a word token separated from its part of speech
by a | delimiter with sentences separated by blank lines.
<Word1>|<number_indicating_POS_tag>

I would like to modify the above data & use it in the below format where I
have 2 features for each word:
<Word1> <feature1> <feature2>|<number_indicating_POS_tag>

It would be really helpful if you could tell me how these features should
be passed to the CRF module while training & testing. I have gone through
the code & previous posts on the mailing list but such orthographic
features which are a part of the training data itself do not seem to have
been considered ever.

Thanks,
Girish
http://xkcd.com/979
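
As an illustration of the orthographic features described in the message above, here is a minimal sketch that derives them from a word and writes a line in the "<Word1> <feature1> <feature2>|<number_indicating_POS_tag>" layout. It assumes nothing about how the CRF package reads such a line; the class name, method names, and feature names (CAPS, STARTS_WITH_DIGIT, SUFFIX_ing, SUFFIX_s) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch only: orthographic features for one word token. */
final class OrthographicFeaturesSketch {

    /** Returns the (hypothetical) feature names that fire for the given word. */
    static List<String> extract(String word) {
        List<String> feats = new ArrayList<>();
        if (word.isEmpty()) {
            return feats;
        }
        if (Character.isUpperCase(word.charAt(0))) {
            feats.add("CAPS"); // word starts with a capital letter
        }
        if (Character.isDigit(word.charAt(0))) {
            feats.add("STARTS_WITH_DIGIT"); // word starts with a number
        }
        String lower = word.toLowerCase(Locale.ROOT);
        if (lower.length() > 3 && lower.endsWith("ing")) {
            feats.add("SUFFIX_ing");
        } else if (lower.length() > 1 && lower.endsWith("s")) {
            feats.add("SUFFIX_s");
        }
        return feats;
    }

    /** Formats a training line: word, space-separated features, then '|' and the tag number. */
    static String toLine(String word, int posTag) {
        StringBuilder sb = new StringBuilder(word);
        for (String f : extract(word)) {
            sb.append(' ').append(f);
        }
        return sb.append('|').append(posTag).toString();
    }

    public static void main(String[] args) {
        System.out.println(toLine("Flights", 7));
        System.out.println(toLine("leaving", 2));
        System.out.println(toLine("747", 3));
    }
}
```

Whether such precomputed tokens are stored in the data file or recomputed inside a feature class at training time is a design choice; this sketch only shows the first option.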

Dear all, 

I previously added some features to the CRF package, and I'm running some tests now, but the program fails some of them. Here is the strangest one:

input: a sequence of consecutive tokens with their associated labels 
labels: only 2 labels (lab1, lab2) 
features: only 2 simple features: 
feature 1: 
public boolean startScanFeaturesAt(DataSequence data, int prevPos, int pos) {
    // In other words, I take no account of the input observation; I know this is strange.
    edgeNum = 0;
    if (edgeIter == null) {
        setEdgeIter();
    }
    if (edgeIter != null)
        edgeIter.start();
    return hasNext();
}

public void next(FeatureImpl f) {
    edgeIsOuter = edgeIter.nextIsOuter();
    Edge e = edgeIter.next();
    Object name = "";
    if (featureCollectMode()) {
        if (labelNames == null) {
            name = "E." + model.label(e.start);
        } else {
            name = labelNames[model.label(e.start)];
        }
    }
    if (edgeIsOuter) {
        setFeatureIdentifier(model.label(e.start) * model.numberOfLabels() + model.label(e.end) + model.numEdges(), model.label(e.end), name, f);
    } else {
        setFeatureIdentifier(edgeNum, e.end, name, f);
    }
    f.ystart = e.start;
    f.yend = e.end;

    if (e.end == 1) // Label 1
        f.val = 1;
    else
        f.val = 0;

    edgeNum++;
}
if (y_end==lab1) f.val=1; else f.val=0; 

feature 2: 
exactly the same as feature 1 except these lines 

if (e.end==0) //Label 2 
f.val = 1; 
else 
f.val=0; 

These are some lines of code from my two feature classes. What I intend is that if the label at position i is 0, feature 1 should fire; otherwise feature 2 must fire. According to the CRF training algorithm, the expected value of each feature in the training set should equal its expected value under the model. That means that if I observe label 1 50% of the time and label 2 50% of the time in my training file, then the test set should be labeled 50% of the time with label 1 and 50% with label 2. Unfortunately, after I run the program I get 100% label 1.

What is wrong then? Am I making a mistake in the implementation ? 

Regards 
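
For reference, the expectation-matching property mentioned in the message above can be written out as the standard gradient of the conditional log-likelihood of a linear-chain CRF. This is a textbook form, not taken from the package; the Gaussian prior term with variance \sigma^2 is an assumption about the regularizer and vanishes if no regularization is used.

```latex
\frac{\partial \mathcal{L}}{\partial \lambda_k}
  = \underbrace{\sum_{n} \sum_{i} f_k\bigl(y^{(n)}_{i-1}, y^{(n)}_{i}, x^{(n)}, i\bigr)}_{\text{empirical count of } f_k}
  - \underbrace{\sum_{n} \sum_{i} \sum_{y', y} P_\lambda\bigl(y_{i-1} = y',\, y_i = y \mid x^{(n)}\bigr)\, f_k\bigl(y', y, x^{(n)}, i\bigr)}_{\text{expected count of } f_k \text{ under the model}}
  - \frac{\lambda_k}{\sigma^2}
```

Setting this gradient to zero equates the empirical and expected counts (up to the prior term) on the training data. Note that the expectation is taken over the model's marginal probabilities, so the property by itself does not guarantee that the single best labeling returned by decoding shows the same 50/50 split.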

Are you using straight CRF with only level-1 Markov dependency, or a 
higher-order Markov dependency, or a semi-crf?

Is there no dependence on y in the feature function, or do you have a 
wrapper around this feature that takes a cross-product of this feature 
with all possible y-s?

Hamid Reza Hassanzadeh wrote:
> Dear CRF users,
> I'm testing the crf package with a sequence of some 30000 residues length with a simple base feature over each residue. The feature is simply like this:
>
> public boolean startScanFeaturesAt(...)
> {
> ...
> if (x(pos-5:pos)=="xxxxxx" && x(pos:pos+2)=="yyy")
> return true;
> ...
> }
>
> And the execution time is as long as 1hour while this normally takes some 5 minutes with Mallet. The situation becomes even much worse when I use a longer training sequence.
>
> Am I making a mistake? When I feed the training file as test file I get 100% accuracy for both cases (mallet and crf). Thus there should not be any mistake. In that case what can be the cause of this huge difference?

Dear CRF users,
I'm testing the CRF package on a sequence about 30000 residues long, with a simple base feature over each residue. The feature is simply like this:

public boolean startScanFeaturesAt(...)
{
...
if (x(pos-5:pos)=="xxxxxx" && x(pos:pos+2)=="yyy")
return true;
...
}

The execution time is as long as 1 hour, whereas this normally takes about 5 minutes with Mallet. The situation becomes much worse when I use a longer training sequence.

Am I making a mistake? When I feed the training file as the test file I get 100% accuracy in both cases (Mallet and CRF), so there should not be any mistake. In that case, what could cause this huge difference?

The model class only ensures that the ZZZ-->YYY feature will not be
present and therefore the score of this transition is zero, not
-infinity. If you want a hard constraint, you will have to do
something more involved:
1. Use SegmentCRF instead of CRF.
2. Each instance should implement CandSegDataSequence instead of DataSequence.
3. The method constraints() in the above class should return any non-null
value for Iterator. For example, it can be an empty iterator.
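
The difference between a transition score of zero and a score of -infinity can be seen in a small stand-alone Viterbi sketch. This is not the package's SegmentCRF or its constraint mechanism; the class name, the score arrays, and the label numbering (0 = ZZZ, 1 = YYY) are hypothetical and serve only to show the effect.

```java
/** Sketch only: Viterbi decoding where -infinity marks a forbidden transition. */
final class ConstrainedViterbiSketch {

    /**
     * nodeScore[i][y] is the log-score of label y at position i;
     * edgeScore[yPrev][y] is the log-score of the transition yPrev -> y.
     * Returns the highest-scoring label sequence.
     */
    static int[] decode(double[][] nodeScore, double[][] edgeScore) {
        int n = nodeScore.length;
        int k = nodeScore[0].length;
        double[][] best = new double[n][k];
        int[][] back = new int[n][k];
        best[0] = nodeScore[0].clone();
        for (int i = 1; i < n; i++) {
            for (int y = 0; y < k; y++) {
                best[i][y] = Double.NEGATIVE_INFINITY;
                for (int yp = 0; yp < k; yp++) {
                    double s = best[i - 1][yp] + edgeScore[yp][y];
                    if (s > best[i][y]) {
                        best[i][y] = s;
                        back[i][y] = yp;
                    }
                }
                best[i][y] += nodeScore[i][y];
            }
        }
        int last = 0;
        for (int y = 1; y < k; y++) {
            if (best[n - 1][y] > best[n - 1][last]) {
                last = y;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int i = n - 1; i > 0; i--) {
            path[i - 1] = back[i][path[i]];
        }
        return path;
    }

    public static void main(String[] args) {
        // Position 0 prefers ZZZ (label 0); position 1 prefers YYY (label 1).
        double[][] node = {{3.0, 0.0}, {0.0, 2.0}};
        double[][] soft = {{0.0, 0.0}, {0.0, 0.0}}; // missing feature: score 0
        double[][] hard = {{0.0, Double.NEGATIVE_INFINITY}, {0.0, 0.0}}; // ZZZ -> YYY forbidden
        int[] p1 = decode(node, soft);
        int[] p2 = decode(node, hard);
        System.out.println(p1[0] + " " + p1[1]); // ZZZ -> YYY still chosen
        System.out.println(p2[0] + " " + p2[1]); // the forbidden transition is avoided
    }
}
```

With a score of zero the ZZZ-->YYY transition is merely unrewarded, so strong node scores can still select it; only a -infinity score rules it out.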

Hamid Reza Hassanzadeh wrote:
> Guys,
> I am having difficulty defining my own Model. I derived my own class from Model.java and overrode the hasnext() function and some others in order to block some transitions. That is, instead of using a complete graph, I want the tagging to obey some rules. For example, if token x_i is labeled ZZZ, then the next label cannot be YYY. Unfortunately, after the program runs, I see that the tagged file includes such transitions. My question is: is that due to an implementation bug, or can this happen in reality?
>
> Regards

Well, except for the BSegment* packages that implement my ICML 2006 paper
(Efficient inference on sequence segmentation models), the other
packages are quite exploratory and can be ignored.

Xu Chu wrote:
> Hi all
>
> I just checked out CRF.
> I found there are some newly added features to it.
>
> Specifically, there are four new packages: iitb.AStar, iitb.BSegment, iitb.BSegmentCRF, and iitb.KernelCRF. But there isn't enough documentation. Can anyone tell me what they are for? Is there any reference paper about these packages?
>
> Thanks a lot
>
>
> Regards,
> Xu