Hi,
It is possible to use CRF to classify a complete sequence;
the only catch is that your sequence is then treated not as
a sequence but as a single entity.
I am not clear about what kind of features you want to
use. You will probably find the MaxentClassifier in the
CRF package useful; it is nothing but logistic
regression.
Otherwise, if your data has sequential properties and
you want to use the different kinds of
features in the CRF package, then as a hack you can try
labelling all the tokens in a sequence with a single
label.
The segment package's input data format supports this.
You just need to put your text sequence on a single
line and then specify the label at the end:
This is a test line | 1
This is another sequence | 2
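
To make the format concrete, here is a minimal Python sketch that
reads and writes lines of this shape and expands a sequence label onto
every token, as in the hack described above. The function names are
hypothetical and are not part of the CRF package, and the "|" separator
is taken only from the two example lines; check the sample data shipped
with the package before relying on it.

```python
def parse_line(line, sep="|"):
    """Split 'tok tok ... | label' into (tokens, label).

    Assumes the label follows the LAST separator on the line.
    """
    text, found, label = line.rpartition(sep)
    if not found:
        raise ValueError(f"no {sep!r} separator in line: {line!r}")
    return text.split(), label.strip()


def format_line(tokens, label, sep="|"):
    """Inverse of parse_line: one sequence per line, label at the end."""
    return f"{' '.join(tokens)} {sep} {label}"


def per_token_labels(tokens, label):
    """Give every token in the sequence the same (sequence-level) label."""
    return [(token, label) for token in tokens]


lines = [
    "This is a test line | 1",
    "This is another sequence | 2",
]
parsed = [parse_line(line) for line in lines]
```

Each entry of parsed is a (tokens, label) pair, and format_line turns
such a pair back into a line in the same layout.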
Regarding classifying only seq8 and not classifying
seq5, seq6 and seq7: there is no direct support for this
in the CRF package.
To be honest, the CRF package is mainly designed around
features and algorithms; dataset loading and
preprocessing are the job of the
user. So you can do some simple preprocessing and
remove the sequences whose labels were not
present in the training data.
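
As a rough illustration of that preprocessing step, the sketch below
keeps only the test sequences whose label was seen during training. It
is plain Python, independent of the CRF package; the helper names are
hypothetical, and the seq/class values are the ones from your example.

```python
def known_labels(train):
    """Collect the set of labels that occur in the training data."""
    return {label for _, label in train}


def split_by_known_label(test, labels):
    """Separate test items into those with a known label and the rest."""
    kept = [(seq, label) for seq, label in test if label in labels]
    dropped = [(seq, label) for seq, label in test if label not in labels]
    return kept, dropped


train = [("seq1", "A"), ("seq2", "A"), ("seq3", "A"), ("seq4", "A")]
test = [("seq5", "B"), ("seq6", "C"), ("seq7", "D"), ("seq8", "A")]

kept, dropped = split_by_known_label(test, known_labels(train))
```

Here kept holds only seq8, and dropped holds seq5, seq6 and seq7. Note
that this needs the test labels to be known in advance.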
-regards
Amit
--- Joel Bruno Santos da Costa <joel@...>
wrote:
> From: "Joel Bruno Santos da Costa"
> <joel@...>
> To: amitj@...
> Subject: CRF Package
> Date: Sun, 29 Apr 2007 16:35:39 -0200
>
> Dear Amit,
>
> I am a student at the Federal University of Rio de
> Janeiro. Could you help me with the CRF
> Package (crf.sourceforge.net)?
>
> Is it possible to use it for classification
> without tagging, i.e., one class per
> sequence? Do you have an example?
>
> I want to train the model with sequences of
> one class and then test it with
> sequences of that class and sequences of
> other classes.
>
> Something like this:
>
> train:
> seq1: class A
> seq2: class A
> seq3: class A
> seq4: class A
>
> test:
> seq5: class B
> seq6: class C
> seq7: class D
> seq8: class A
>
> The objective is to classify seq8 and not to classify
> seq5, seq6 and seq7.
>
> Thanks a lot.
>
> Best Regards,
>
> Joel Bruno Santos da Costa
> Mestrado - Inteligência Artificial
> PESC - Programa de Engenharia de Sistemas e
> Computação
> COPPE/UFRJ
>