I have slightly modified the fuzzy machting function tinytm_get_fuzzy_matches so that it will retrieve all segments for the respective language combination, if an empty string is pass in as the source text. The source text for the PostgreSQL function is below. I thought this could be useful, to have something like a primitive TM export function.
Cheers,
Martin
-- Function: tinytm_get_fuzzy_matches(character varying, character varying, character varying, character varying, character varying)
-- DROP FUNCTION tinytm_get_fuzzy_matches(character varying, character varying, character varying, character varying, character varying);
CREATE OR REPLACE FUNCTION tinytm_get_fuzzy_matches(character varying, character varying, character varying, character varying, character varying)
RETURNS SETOF tinytm_fuzzy_search_result AS
$BODY$
DECLARE
p_source_lang alias for $1;
p_target_lang alias for $2;
p_source_text alias for $3;
p_tag_string alias for $4;
p_penalty_string alias for $5;
result tinytm_fuzzy_search_result%ROWTYPE;
row RECORD;
BEGIN
IF p_source_text = '' THEN
FOR row IN
SELECT
tinytm_levenshtein(ts.source_text,ts.source_text) + 100 as levenshtein,
ts.source_text,
ts.target_text
FROM
tinytm_segments ts
LEFT OUTER JOIN tinytm_languages sl ON (ts.source_lang_id = sl.language_id)
LEFT OUTER JOIN tinytm_languages tl ON (ts.target_lang_id = tl.language_id)
WHERE
source_lang_id in (select * from tinytm_lang_ids_from_lang(p_source_lang)) and
target_lang_id in (select * from tinytm_lang_ids_from_lang(p_target_lang))
LOOP
result.score := row.levenshtein;
result.source_text = row.source_text;
result.target_text = row.target_text;
RETURN NEXT result;
END LOOP;
ELSE
FOR row IN
SELECT
100 * (source_len - levenshtein) / source_len as score,
ts.source_text,
ts.target_text,
source_len
FROM
(select distinct
tinytm_levenshtein(trim(p_source_text), ts.source_text) * 1.0 as levenshtein,
length(p_source_text) * 1.0 as source_len,
sl.language as source_lang,
tl.language as target_lang,
source_text,
target_text
from
tinytm_segments ts
LEFT OUTER JOIN tinytm_languages sl ON (ts.source_lang_id = sl.language_id)
LEFT OUTER JOIN tinytm_languages tl ON (ts.target_lang_id = tl.language_id)
where
source_lang_id in (select * from tinytm_lang_ids_from_lang(p_source_lang)) and
target_lang_id in (select * from tinytm_lang_ids_from_lang(p_target_lang))
order by
levenshtein
LIMIT 20
) ts
where
source_len > 5
LOOP
IF row.score > 50.0 THEN
result.score := row.score::numeric(6,1);
result.source_text = row.source_text;
result.target_text = row.target_text;
RETURN NEXT result;
END IF;
END LOOP;
END IF;
RETURN;
END;$BODY$
LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION tinytm_get_fuzzy_matches(character varying, character varying, character varying, character varying, character varying) OWNER TO "admin";
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I understand your point of an export. But maybe that would be possible with a direct access to the SQL tables?
It really would be no problem to insert your changes into the code. However, my "female intuition" (a difficult to explain collection of experiences during my last 20 years in IT...) tells me that this option will create problems in the future... Imagine some implementation not catching the case of an empty string, and everything will hang then on a slow network connection or something similar...
It would be cleaner to create an explicit API call for that purpose, actually.
Cheers!
Frank
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I guess an explicit API call would be the best solution. You could just added to the stored procedures and then it can easily be replicated by the web services interface.
Cheers,
Martin
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi there,
I have slightly modified the fuzzy machting function tinytm_get_fuzzy_matches so that it will retrieve all segments for the respective language combination, if an empty string is pass in as the source text. The source text for the PostgreSQL function is below. I thought this could be useful, to have something like a primitive TM export function.
Cheers,
Martin
-- Function: tinytm_get_fuzzy_matches(character varying, character varying, character varying, character varying, character varying)
-- DROP FUNCTION tinytm_get_fuzzy_matches(character varying, character varying, character varying, character varying, character varying);
CREATE OR REPLACE FUNCTION tinytm_get_fuzzy_matches(character varying, character varying, character varying, character varying, character varying)
RETURNS SETOF tinytm_fuzzy_search_result AS
$BODY$
DECLARE
p_source_lang alias for $1;
p_target_lang alias for $2;
p_source_text alias for $3;
p_tag_string alias for $4;
p_penalty_string alias for $5;
result tinytm_fuzzy_search_result%ROWTYPE;
row RECORD;
BEGIN
IF p_source_text = '' THEN
FOR row IN
SELECT
tinytm_levenshtein(ts.source_text,ts.source_text) + 100 as levenshtein,
ts.source_text,
ts.target_text
FROM
tinytm_segments ts
LEFT OUTER JOIN tinytm_languages sl ON (ts.source_lang_id = sl.language_id)
LEFT OUTER JOIN tinytm_languages tl ON (ts.target_lang_id = tl.language_id)
WHERE
source_lang_id in (select * from tinytm_lang_ids_from_lang(p_source_lang)) and
target_lang_id in (select * from tinytm_lang_ids_from_lang(p_target_lang))
LOOP
result.score := row.levenshtein;
result.source_text = row.source_text;
result.target_text = row.target_text;
RETURN NEXT result;
END LOOP;
ELSE
FOR row IN
SELECT
100 * (source_len - levenshtein) / source_len as score,
ts.source_text,
ts.target_text,
source_len
FROM
(select distinct
tinytm_levenshtein(trim(p_source_text), ts.source_text) * 1.0 as levenshtein,
length(p_source_text) * 1.0 as source_len,
sl.language as source_lang,
tl.language as target_lang,
source_text,
target_text
from
tinytm_segments ts
LEFT OUTER JOIN tinytm_languages sl ON (ts.source_lang_id = sl.language_id)
LEFT OUTER JOIN tinytm_languages tl ON (ts.target_lang_id = tl.language_id)
where
source_lang_id in (select * from tinytm_lang_ids_from_lang(p_source_lang)) and
target_lang_id in (select * from tinytm_lang_ids_from_lang(p_target_lang))
order by
levenshtein
LIMIT 20
) ts
where
source_len > 5
LOOP
IF row.score > 50.0 THEN
result.score := row.score::numeric(6,1);
result.source_text = row.source_text;
result.target_text = row.target_text;
RETURN NEXT result;
END IF;
END LOOP;
END IF;
RETURN;
END;$BODY$
LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION tinytm_get_fuzzy_matches(character varying, character varying, character varying, character varying, character varying) OWNER TO "admin";
Hi Martin,
I'd be concerned with performance if there are literally millions of segments in the DB. What do you think?
Cheers!
Frank
That's true. There could be a parameter to limit the number of segments returned. But it would be useful to have this option. It's like a TM export.
Cheers,
Martin
Hi Martin,
I understand your point of an export. But maybe that would be possible with a direct access to the SQL tables?
It really would be no problem to insert your changes into the code. However, my "female intuition" (a difficult to explain collection of experiences during my last 20 years in IT...) tells me that this option will create problems in the future... Imagine some implementation not catching the case of an empty string, and everything will hang then on a slow network connection or something similar...
It would be cleaner to create an explicit API call for that purpose, actually.
Cheers!
Frank
Hi Frank,
I guess an explicit API call would be the best solution. You could just added to the stored procedures and then it can easily be replicated by the web services interface.
Cheers,
Martin