tadm problem with a large data set

  • Francis Bond - 2007-01-16

    G'day,

    We are trying to run tadm/evaluate (with a patch from Stephan Oepen) on a fairly large data set (875809 features).  The data is enhanced semantic dependencies for parse ranking with HPSG.

    On a 32-bit machine, we can't even load the event file without running out of memory (I have added the error message at the end).

    On a 64-bit machine, it runs and produces a parameter file, but when we try to evaluate a test set, evaluate core dumps.

    In fact we have three files (train, devel and test).  Training on test or devel produces parameter files that we can use with evaluate on test, devel or train, so we think the event files themselves are well formed.

    Has anyone seen behavior like this before?  If not, any ideas as to what might be causing it?
    We would be happy to provide the data sets for debugging; they are around 80 MB in all.

    Francis Bond, NICT

    P.S. Error message on the 32-bit machine:
    bond@knut:~/sprg$ tadm -events_in trains_df.eve.gz -params_out para_df-smooth.par -monitor -fatol 1e-32 -frtol 1e-7 -variances variances -malloc_log

    Maximum Entropy Parameter Estimation
      version 0.99.5 (07 August 2005)

    Start: Tue Jan 16 23:34:19 2007

    Events in  = trains_df.eve.gz
    --------------------------------------------------------------------------
    Petsc Release Version 2.3.0, Patch 44, April, 26, 2005
    See docs/changes/index.html for recent updates.
    See docs/faq.html for hints about trouble shooting.
    See docs/index.html for manual pages.
    -----------------------------------------------------------------------
    /home/bond/logon/sdsu/bin/linux.x86.32/tadm on a linux-gnu named knut by bond Tue Jan 16 23:34:19 2007
    Libraries linked from /home/oe/src/logon/anl/petsc-2.3.0/lib/linux-gnu
    Configure run at Sat Feb 11 14:53:06 2006
    Configure options --download-f-blas-lapack=1 --with-default-optimization=O --with-mpi=0 --with-dynamic=0 --with-clanguage=C++ --with-shared=0
    -----------------------------------------------------------------------
    [0]PETSC ERROR: PetscMallocAlign() line 62 in src/sys/src/memory/mal.c
    [0]PETSC ERROR: Out of memory. This could be due to allocating
    [0]PETSC ERROR: too large an object or bleeding by not properly
    [0]PETSC ERROR: destroying unneeded objects.
    [0] Maximum memory PetscMalloc()ed 1213346708 maximum size of entire process 0
    [0] Memory usage sorted by function
    [0] 14060 ClassPerfLogCreate()
    [0] 812 ClassRegLogCreate()
    [0] 1213296340 Dataset::readEvents()
    [0] 30060 EventPerfLogCreate()
    [0] 812 EventRegLogCreate()
    [0] 720 PetscFListAdd()
    [0] 32 PetscPushSignalHandler()
    [0] 244 PetscStackCreate()
    [0] 2336 PetscStrallocpy()
    [0] 524 StackCreate()
    [0] 788 StageLogCreate()
    [0]PETSC ERROR: Memory requested 2402076528!
    [0]PETSC ERROR: PetscTrMallocDefault() line 188 in src/sys/src/memory/mtr.c
    [0]PETSC ERROR: Dataset::readEvents() line 248 in dataset.cc
    [0]PETSC ERROR: VecDuplicate() line 1586 in src/vec/interface/vector.c
    [0]PETSC ERROR: Invalid argument!
    [0]PETSC ERROR: Wrong type of object: Parameter # 1!
    [0]PETSC ERROR: initializeDataset() line 563 in dataset.cc
    Params out = para_df-smooth.par
    [0]PETSC ERROR: VecDuplicate() line 1586 in src/vec/interface/vector.c
    [0]PETSC ERROR: Null argument, when expecting valid pointer!
    [0]PETSC ERROR: Null Object: Parameter # 1!
    [0]PETSC ERROR: VecSet() line 945 in src/vec/interface/vector.c
    [0]PETSC ERROR: Invalid argument!
    [0]PETSC ERROR: Wrong type of object: Parameter # 1!
    Segmentation fault (core dumped)
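
    P.P.S. The log itself seems to explain the 32-bit failure: Dataset::readEvents() has already allocated about 1.2 GB when it requests another 2.4 GB, and the two together exceed what a single 32-bit process can address.  A back-of-the-envelope check (the ~3 GB user-space limit below is an assumption about a typical 32-bit Linux kernel, not something from the log):

        #include <cstdint>
        #include <cstdio>

        int main() {
            // Figures copied from the PETSc log above.
            const std::uint64_t alreadyAllocated = 1213296340ULL; // Dataset::readEvents()
            const std::uint64_t requested        = 2402076528ULL; // "Memory requested"
            // Roughly 3 GiB of user address space on a typical 32-bit Linux
            // kernel (an assumption; the exact split is a kernel config option).
            const std::uint64_t limit = 3ULL << 30;

            const std::uint64_t total = alreadyAllocated + requested; // ~3.4 GiB
            std::printf("needed %.2f GiB, 32-bit limit ~%.2f GiB\n",
                        total / double(1ULL << 30), limit / double(1ULL << 30));
            return total > limit; // non-zero exit: cannot fit in a 32-bit process
        }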

     
    • Jason Baldridge - 2007-01-26

      How much memory do your machines have? I've trained models with 1M features on a 2GB machine. I ran into a similar cap a while back, and had to scale back the number of features I used in order for it to work. Now I have bigger machines, and things have been just fine.

      I wonder if updating to the latest PETSc and TAO would help out here. I won't have a chance to do that for a while, but you might be able to give it a try. There is usually a small amount of function renaming needed to get TADM to compile again.

      BTW, what's in Stephan's patch?

      Jason

       
    • Miles Osborne - 2007-02-14

      Another possible reason is that you have too many active features on each data point.

      How many features does a data point typically have?

      This may cause problems for "evaluate", as it has a buffer-size cap, if I recall correctly.
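
      A hypothetical sketch of that failure mode, assuming evaluate reads each event line into a fixed-size buffer (the buffer size and parsing loop below are illustrative, not the actual evaluate.cc code):

          #include <cstdio>

          int main() {
              // Hypothetical fixed cap on the length of one event line.
              enum { BUFSIZE = 65536 };
              char line[BUFSIZE];

              FILE* f = std::fopen("test.eve", "r");
              if (!f) return 1;
              // fgets() silently truncates lines longer than BUFSIZE - 1
              // characters; an event with very many feature:value pairs can
              // exceed that, and the leftover tail of the line is then
              // misparsed as the start of the next event.
              while (std::fgets(line, BUFSIZE, f)) {
                  // ... parse frequency, feature count, feature/value pairs ...
              }
              std::fclose(f);
              return 0;
          }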

      Miles

       
    • Erik Velldal - 2007-06-28

      Jason: the patch is just a simple check in `evaluate.cc' to ensure that the indexes of the features we read from the event file are not out of range with respect to the parameter vector.
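
      In code terms it amounts to something like the guard below (a sketch only; the function and variable names are not the ones in the actual patch):

          #include <cstdio>
          #include <cstdlib>
          #include <vector>

          // Reject feature indexes from the event file that fall outside the
          // parameter vector, instead of silently indexing past its end (the
          // likely source of the segfault reported above).
          double checkedWeight(const std::vector<double>& params, long index) {
              if (index < 0 || index >= static_cast<long>(params.size())) {
                  std::fprintf(stderr,
                               "evaluate: feature index %ld out of range [0, %zu)\n",
                               index, params.size());
                  std::exit(1);
              }
              return params[index];
          }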

      -erik

       
