Todd,
Thanks for the code snippet. I had already written this block, so I think I'll
stick with it. Basically, I use this function where a couple of strcmp() calls
used to be.
This code seems to help, but the site still seems to hang while getting the
playlist. It might just be slow. I am going to commit this to CVS; please let
me know whether it works for you. There are some other changes besides this
one, so check out the latest code from CVS.
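#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef DEBUG
#define DEBUG 0   /* set to 1 for trace output */
#endif

/* Replace every "%20" in s with a single space, in place. */
static void decode_spaces(char *s)
{
    char *tmp;

    while ((tmp = strstr(s, "%20")) != NULL) {
        *tmp = ' ';
        /* shift the rest of the string, including its '\0', left by two;
           memmove is needed because source and destination overlap
           (strcat on overlapping buffers is undefined behavior) */
        memmove(tmp + 1, tmp + 3, strlen(tmp + 3) + 1);
    }
}

/* Compare two URLs, treating "%20" and ' ' as the same character.
   Returns 0 if they match, -1 otherwise (or if memory runs out). */
int URLcmp(char *url1, char *url2)
{
    char *buffer1, *buffer2;
    int result;

    if (DEBUG) printf("url1 %s\nurl2 %s\n", url1, url2);
    if (strcmp(url1, url2) == 0) return 0;

    /* replace %20 with spaces in copies of both strings */
    buffer1 = strdup(url1);
    buffer2 = strdup(url2);
    if (buffer1 == NULL || buffer2 == NULL) {
        free(buffer1);
        free(buffer2);
        return -1;
    }
    decode_spaces(buffer1);
    decode_spaces(buffer2);

    if (DEBUG) printf("buffer1 %s\nbuffer2 %s\n", buffer1, buffer2);
    result = (strcmp(buffer1, buffer2) == 0) ? 0 : -1;
    free(buffer1);
    free(buffer2);
    return result;
}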
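
URLcmp above only folds "%20" into a space. If other escapes turn out to matter for playlist URLs (an assumption; only %20 is confirmed in this thread), the same compare-after-decoding idea generalizes. The following is a minimal sketch with hypothetical helper names (copy_string, hex_value, percent_decode, url_cmp_decoded) that decodes every valid %XX escape in copies of both URLs before comparing them, keeping URLcmp's 0 / -1 return convention:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Portable stand-in for strdup(). */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Decode every %XX escape in s, in place (the string never grows).
   Malformed escapes and %00 are copied through unchanged. */
static void percent_decode(char *s)
{
    const char *src = s;
    char *dst = s;

    while (*src != '\0') {
        if (src[0] == '%' && isxdigit((unsigned char)src[1]) &&
            isxdigit((unsigned char)src[2])) {
            int val = hex_value(src[1]) * 16 + hex_value(src[2]);
            if (val != 0) {
                *dst++ = (char)val;
                src += 3;
                continue;
            }
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}

/* Hypothetical variant of URLcmp: 0 if the URLs match after decoding all
   %XX escapes, -1 if they differ or memory runs out. */
int url_cmp_decoded(const char *url1, const char *url2)
{
    char *a, *b;
    int result;

    if (strcmp(url1, url2) == 0)
        return 0;
    a = copy_string(url1);
    b = copy_string(url2);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }
    percent_decode(a);
    percent_decode(b);
    result = (strcmp(a, b) == 0) ? 0 : -1;
    free(a);
    free(b);
    return result;
}
```

For example, url_cmp_decoded("http://host/my%20song.asx", "http://host/my song.asx") returns 0. One design caveat: RFC 3986 treats a percent-encoded reserved character such as %2F as distinct from a literal '/', so this sketch is looser than the spec; a stricter version would decode only unreserved characters (letters, digits, '-', '.', '_', '~').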
On Thursday 04 December 2003 8:06 am, todd wrote:
> Hi,
> this code might help. I use it in a web crawler I wrote to normalize
> URLs. The second function is the one to call; it does the
> canonicalizing of the URL string. It has some C++, but that should be
> easy enough to replace with the equivalent malloc/free calls.
>
> hope this helps!! :)
> -todd
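> ///////////////////////////////////////////////////////////
> // needs <ctype.h> and <stdlib.h>
> // Normalize a URL: stop at a '#' fragment or whitespace, decode %XX escapes
> // (but keep encoded whitespace and %00 as-is), lowercase everything else,
> // and drop one trailing '/'. *len is the input length on entry and the
> // output length on return; the caller must free() the result.
> static char* normalize_url( const char *url, size_t *len )
> {
>   char *buf = (char*)malloc( (*len) + 1 );
>   char tmp[3] = { '\0', '\0', '\0' };
>   size_t i, j;
>   if( buf == NULL )
>     return NULL;
>   for( i = 0, j = 0; i < (*len); ++i, ++j ){
>     if( url[i] == '#' || isspace( (unsigned char)url[i] ) )
>       break;
>     // both hex digits must exist (i + 2 < len) and be valid
>     if( url[i] == '%' && i + 2 < (*len) &&
>         isxdigit( (unsigned char)url[i+1] ) &&
>         isxdigit( (unsigned char)url[i+2] ) ){
>       // do a transformation
>       tmp[0] = url[i+1];
>       tmp[1] = url[i+2];
>       int val = (int)strtol( tmp, NULL, 16 ); // read in the hex value
>       if( val != 0 && !isspace( val ) ){
>         // replace %XX with the decoded character
>         buf[j] = (char)val;
>         i += 2;
>       }
>       else{
>         // we want spaces to stay encoded: copy the % and continue
>         buf[j] = url[i];
>       }
>     }
>     else{
>       buf[j] = (char)tolower( (unsigned char)url[i] );
>     }
>   }
>   if( j < (*len) ){
>     char *shrunk = (char*)realloc( buf, j + 1 ); // save memory
>     if( shrunk != NULL )
>       buf = shrunk;
>   }
>   if( j > 1 && buf[j-1] == '/' ){
>     --j; // should we realloc to save 1 byte?
>   }
>   buf[j] = '\0';
>   *len = j;
>   return buf;
> }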
> ///////////////////////////////////////////////////////////
> // needs <string> and <glib.h> (for g_strstr_len)
> // Returns NULL for non-http links; otherwise a malloc'd string the caller
> // must free(). url_len is updated to the length of the result.
> char *canonicalize_url( const char *url, size_t &url_len,
>                         const char *host_name )
> {
>   // TODO: clean logic up
>   bool allow = true;
>   // check for http: or another protocol; if it's not http: then we drop it
>   if( !g_strstr_len( url, url_len >= 7 ? 7 : url_len, "http://" ) ){
>     // if we didn't find this, then unless it has just // we drop it
>     allow = false;
>   }
>   // ensure the URL is in absolute form
>   std::string link( "http:" );
>   if( g_strstr_len( url, url_len >= 7 ? 7 : url_len, "//" ) ){
>     // "//" near the start: an absolute or scheme-relative URL
>     // we still need to check for nasty things like // instead of http://
>     if( url_len >= 2 && url[0] == '/' && url[1] == '/' ){
>       link.append( url, url_len );
>       allow = true;
>     }
>     else{
>       link = std::string( url, url_len );
>     }
>     // adns would be used here
>   }
>   else{ // we need to help it out: it's a local (relative) link
>     link += "//";
>     link += host_name;
>     link += ( url_len > 0 && url[0] == '/' ? "" : "/" );
>     link.append( url, url_len );
>     allow = true; // otherwise every relative link would be dropped
>   }
>   url_len = link.length();
>   if( !allow )
>     return NULL;
>   return normalize_url( link.c_str(), &url_len );
> }
> ///////////////////////////////////////////////////////////
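
Todd notes that the C++ parts can be replaced with malloc/free. A minimal C-only sketch of the first step of canonicalize_url (turning a link into an absolute http:// URL) might look like the following. The names make_absolute_url and scheme_length are hypothetical; GLib's g_strstr_len is replaced with plain strncmp, and other schemes are detected with the RFC 3986 scheme grammar rather than by looking for "//" in the first seven characters, so this is not a line-by-line port. Like the original, it joins relative links to the root of host_name rather than to the current page's directory, and it returns them instead of NULL:

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Length of a leading URI scheme name (RFC 3986: a letter followed by
   letters, digits, '+', '-' or '.', then ':'), or 0 if there is none. */
static size_t scheme_length(const char *url)
{
    size_t i = 0;

    if (!isalpha((unsigned char)url[0]))
        return 0;
    while (isalnum((unsigned char)url[i]) || url[i] == '+' ||
           url[i] == '-' || url[i] == '.')
        i++;
    return url[i] == ':' ? i : 0;
}

/* Hypothetical C-only version of canonicalize_url's first step.
   Returns a malloc'd absolute http:// URL, or NULL when the link uses
   another scheme or memory runs out. The caller must free() the result. */
char *make_absolute_url(const char *url, const char *host_name)
{
    size_t n;
    char *link;

    if (strncmp(url, "http://", 7) == 0) {
        /* already absolute: return a plain copy */
        n = strlen(url) + 1;
        link = malloc(n);
        if (link != NULL)
            memcpy(link, url, n);
        return link;
    }
    if (scheme_length(url) > 0)
        return NULL;                 /* ftp:, mms:, mailto:, ... are dropped */
    if (url[0] == '/' && url[1] == '/') {
        /* scheme-relative "//host/path": prepend "http:" */
        n = strlen("http:") + strlen(url) + 1;
        link = malloc(n);
        if (link != NULL)
            snprintf(link, n, "http:%s", url);
        return link;
    }
    /* relative (local) link: "http://" + host + optional '/' + url */
    n = strlen("http://") + strlen(host_name) + 1 + strlen(url) + 1;
    link = malloc(n);
    if (link != NULL)
        snprintf(link, n, "http://%s%s%s", host_name,
                 url[0] == '/' ? "" : "/", url);
    return link;
}
```

The returned string could then be passed to normalize_url with *len set to its strlen(), mirroring the last line of canonicalize_url; both results must be freed by the caller.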
>
> Kevin DeKorte wrote:
> >Paulo,
> >
> >OK, I see roughly what is wrong. It is going to take a little work to fix
> > it. The spaces in the URL are getting translated to %20, which is
> > correct, but I need to change the code to allow for that.
> >
> >Kevin
> >
> >On Wednesday 03 December 2003 10:34 pm, Paulo Moura Guedes wrote:
> >>Good job!
> >>This project really makes a difference...
> >>
> >>I only found one site where the plug-in doesn't work.
> >>Here is the URL: http://www.cotonete.iol.pt/
> >>In the left-hand section, look under "CANAIS" and choose "ANOS 60", for example.
> >>I get a message "getting playlist" and nothing happens.
> >>
> >>I have mplayer-1.0pre2-1.i386.rpm and mplayerplug-in-1.0-1.fc1.i386.rpm
> >>(FC1).
> >>
> >>I hope this helps...
> >>
> >>Regards,
> >>Paulo
> >
> >_______________________________________________
> >Mplayerplug-in-devel mailing list
> >Mplayerplug-in-devel@...
> >https://lists.sourceforge.net/lists/listinfo/mplayerplug-in-devel