I have a data conversion problem with the 'ON ERROR' clause.
I'd like to skip dirty rows using the 'ON DATA CONVERSION ERROR SKIP' clause, but it doesn't work.
DDL:
create table MYLOGS
(anonid int, query varchar, querytime datetime format 'yyyy-MM-dd HH:mm:ss', itemrank int, clickurl varchar )
column sep '\t';
Query:
select
year(querytime) as qy, month(querytime) as qm, count(*) as qcnt
from MYLOGS
group by qy, qm ON DATA CONVERSION ERROR SKIP
;
During the reduce phase, I got this error:
java.io.IOException: Could not convert to date:null
at com.business.cloudbase.hadoop.job.AggFunHandler$AggFunReducer.reduce(Unknown Source)
at com.business.cloudbase.hadoop.job.AggFunHandler$AggFunReducer.reduce(Unknown Source)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
Thanks,
Youngwoo
Can you post a few lines from your log file? You can mask the fields; I just want to see why NULL is returned for the date column.
Hi Taran,
It's a simple table.
My DDL:
create table MYLOGS
(anonid int, query varchar, querytime datetime format 'yyyy-MM-dd HH:mm:ss', itemrank int, clickurl varchar )
column sep '\t';
These are search logs, and the log file contains dirty rows. I just wanted to skip the dirty rows when applying the year() and month() functions.
Thanks,
Youngwoo
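To find sample rows that fail the date conversion, the log file can be scanned outside the database. The following is only a minimal sketch, not part of Cloudbase: it assumes the file is tab-separated with the five columns from the DDL above, that querytime is the third field, and that the Python pattern "%Y-%m-%d %H:%M:%S" corresponds to 'yyyy-MM-dd HH:mm:ss'. The function names are hypothetical.

```python
from datetime import datetime

# Assumed Python equivalent of the 'yyyy-MM-dd HH:mm:ss' pattern in the DDL.
QUERYTIME_FORMAT = "%Y-%m-%d %H:%M:%S"
# Columns from the DDL: anonid, query, querytime, itemrank, clickurl.
EXPECTED_COLUMNS = 5


def is_clean_row(line):
    """Return True if the line has five tab-separated fields and a parseable querytime."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != EXPECTED_COLUMNS:
        return False
    try:
        # querytime is assumed to be the third column (index 2).
        datetime.strptime(fields[2], QUERYTIME_FORMAT)
    except ValueError:
        return False
    return True


def find_dirty_rows(lines):
    """Return (line_number, line) pairs for rows that fail the check."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if not is_clean_row(line)
    ]
```

Passing an open file handle to find_dirty_rows returns the numbered lines with a missing or malformed querytime, which can then be masked and posted. It checks only the column count and the date field, not the int columns.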