I have YouTube videos embedded on a few pages and I am looking for a way to index the videos. I did a little digging and found OSS's YouTube filter.
I updated the StandardAnalyzer schema to include the YouTube filter for both the title and description, then recrawled the page with the embedded video, which didn't work. I half expected this result, since the documentation says to crawl on YouTube.
So I tried a manual crawl of the video itself using the direct YouTube URL (e.g. https://www.youtube.com/watch?v=_hnUMBLH-aw&list=UUGVeGrwqwk20FQe-FjriZbQ). I disabled robots.txt in case that was causing any issues and added the exact URL to the inclusion rules. I got the following results:
Fetch status: Fetched
Parser status: Parsed non-canonical
Index status: Not indexed
Response Code: 200
Content length: -1
I am using OSS v1.5.2.
Which parser would I need to modify or add for the videos to be indexed from YouTube?
Is there a way for OSS to get the YouTube title and description from the embed, so that the information becomes part of that content page?
Thanks in advance!
I changed the HTML parser parameter "Ignore non canonical" to False and it indexed the YouTube video. But the content shows "Upload Sign in, Search, Loading... This video is unavailable", which isn't really ideal... Is there something I am missing?
OpenSearchServer uses the YouTube API, which lets you retrieve a video's title and description.
I suggest creating a new analyzer for the title and description and linking it to the field that you need.
Please find the screenshot.
Here are some steps to set up YouTube extraction:
You can then index some pages containing links to YouTube videos. The video titles will be extracted and indexed in the new field.
As you can see in sf_youtube_search.png, there are two titles in one document, because the page I indexed contained two links to YouTube.
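To make the "two links, two titles" behaviour a bit more concrete, here is a rough Python sketch of the first half of that idea: scanning a page for YouTube links and embeds and collecting one video ID per link. This is only an illustration, not how OSS implements its filter. The names `video_id`, `LinkCollector` and `youtube_ids`, and the list of hosts, are my own assumptions. Looking up the title for each ID would still need a call to the YouTube API, which is not shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs

# Assumption: the hosts worth recognising; this is not taken from OSS.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def video_id(url):
    """Return the video ID found in a YouTube URL, or None if there is none."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    if host not in YOUTUBE_HOSTS:
        return None
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parts.path.lstrip("/") or None
    if parts.path == "/watch":
        # Watch links carry the ID in the "v" query parameter.
        return parse_qs(parts.query).get("v", [None])[0]
    if parts.path.startswith("/embed/"):
        # Embed links carry the ID as the path segment after /embed/.
        return parts.path[len("/embed/"):].split("/")[0] or None
    return None


class LinkCollector(HTMLParser):
    """Collects video IDs from <a href> and <iframe src> attributes."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "iframe": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                vid = video_id(value)
                if vid and vid not in self.ids:
                    self.ids.append(vid)


def youtube_ids(html):
    """Return the distinct video IDs linked or embedded in an HTML page."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.ids
```

A page with two YouTube links yields two IDs, which is why a single indexed document can end up with two values in the title field.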
If one searches directly for a video title, the document is found as well, since the title has been properly tokenized by the StandardTokenizer set on the field (sf_youtube_search_chip_and_dale.png).
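For anyone wondering why tokenization matters here, below is a very simplified Python sketch of a multi-valued title field being tokenized and searched. It is not the actual StandardTokenizer (the regular expression is a crude stand-in that only lowercases and splits on non-word characters), and the function names, the field name `youtube_title` and the sample titles are all made up for illustration.

```python
import re


def tokenize(text):
    """Crude stand-in for a tokenizer: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def build_index(docs, field):
    """Map each token of a (possibly multi-valued) field to the documents containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for value in doc.get(field, []):
            for token in tokenize(value):
                index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query):
    """Return the documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical documents: page1 links to two videos, so its field holds two titles.
docs = {
    "page1": {"youtube_title": ["First Sample Video", "Second Sample Clip"]},
    "page2": {"youtube_title": []},
}
index = build_index(docs, "youtube_title")
matches = search(index, "sample clip")
```

Because each title is broken into lowercase tokens, a query for one of the titles matches the page that links to that video, even though the page stores several titles in the same field. Note that this simple AND over tokens can also match words spread across different titles of the same page.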