2007-08-23 18:43:39 UTC
This is a very useful script, but it has a problem: if the site you are trying to ping with a trackback doesn't list the RDF information, the auto_discovery function won't find anything to ping. The script should probably be modified to handle these cases. The case I have seen most often involves a trackback URI of the form ...trackback.php..., so we can just look for that. My mod looks like:
<code>
function auto_discovery($text)
{
global $uri_array;
// Get a list of UNIQUE links from text...
// ---------------------------------------
// RegExp to look for (0=>link, 4=>host in 'replace')
$reg_exp = "/(http)+(s)?:(\\/\\/)((\\w|\\.)+)(\\/)?(\\S+)?/i";
// Make sure each link ends with [space]
// (str_ireplace/str_replace stand in for eregi_replace, which was removed in PHP 7)
$text = str_ireplace("www.", "http://www.", $text);
$text = str_ireplace("http://http://", "http://", $text);
$text = str_replace("\"", " \"", $text);
$text = str_replace("'", " '", $text);
$text = str_replace(">", " >", $text);
// Create an array with unique links
$uri_array = array();
if (preg_match_all($reg_exp, strip_tags($text, "<a>"), $array, PREG_PATTERN_ORDER)) {
foreach($array[0] as $key => $link) {
// Strip stray punctuation (, . : ;) from both ends of the link in one pass
$link = trim($link, ",.:;");
$uri_array[] = $link;
}
$uri_array = array_unique($uri_array);
}
// Get the trackback URIs from those links...
// ------------------------------------------
// Loop through the URIs array and extract RDF segments
$rdf_array = array(); // <- holds list of RDF segments
foreach($uri_array as $key => $link) {
//echo $link . "<br />";
if ($link_content = @implode("", @file($link))) {
preg_match_all('/(<rdf:RDF.*?<\/rdf:RDF>)/sm', $link_content, $link_rdf, PREG_SET_ORDER);
for ($i = 0; $i < count($link_rdf); $i++) {
if (preg_match('|dc:identifier="' . preg_quote($link, '|') . '"|ms', $link_rdf[$i][1])) {
$rdf_array[] = trim($link_rdf[$i][1]);
}
}
// ------------------------------------------
// --- sometimes trackbacks given w/out RDF
// --- info so let's look for them...JB (08-23-07)
$urlpattern = '/http:([^ ]+)trackback\.php([^<|^ ]+)/i';
// Only record a trackback URI when the pattern actually matched
if (preg_match($urlpattern, $link_content, $link_trackback)) {
//print_r($link_trackback); echo "<hr />";
$no_rdf_array[] = trim($link_trackback[0]);
}
}
}
// Loop through the RDFs array and extract trackback URIs
$tb_array = array(); // <- holds list of trackback URIs
if (!empty($rdf_array)) {
for ($i = 0; $i < count($rdf_array); $i++) {
if (preg_match('/trackback:ping="([^"]+)"/', $rdf_array[$i], $array)) {
$tb_array[] = trim($array[1]);
}
}
}
// ------------------------------------------
// --- if we didn't find any RDF info but did
// --- find a trackback URI via a preg_match
// --- then let's use it instead...
if (empty($rdf_array) && !empty($no_rdf_array)) {
$tb_array = $no_rdf_array;
}
// Return Trackbacks
return $tb_array;
}
</code>
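To try the fallback idea without fetching any pages, here is a minimal sketch that runs the "look for trackback.php" step on a string of HTML. The function name find_trackback_fallback, the sample markup, and the example.com URL are all made up for illustration. The pattern is also a little stricter than the one in my mod: it stops at quotes, whitespace and angle brackets, on the assumption that the link sits inside an href attribute.

```php
<?php
// Hypothetical helper (not part of the original script): return the first
// "...trackback.php..." URL found in a page's HTML, or null if there is none.
function find_trackback_fallback($html)
{
    // Assumption: the URL is delimited by whitespace, quotes, or angle
    // brackets, as it would be inside an href attribute or plain text.
    $pattern = '/https?:\/\/[^\s"\'<>]+trackback\.php[^\s"\'<>]*/i';
    if (preg_match($pattern, $html, $match)) {
        return trim($match[0]);
    }
    return null;
}

// Made-up sample page: a plain trackback link and no RDF block.
$sample = '<p>Trackback: <a href="http://example.com/blog/trackback.php?id=42">link</a></p>';

echo find_trackback_fallback($sample), "\n";
var_dump(find_trackback_fallback('<p>No trackback here</p>'));
```

The first call prints the trackback URL from the sample link. The second finds nothing and returns null, so the caller can tell "no trackback link" apart from an empty string.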
I am sure this could be improved upon greatly.
JB