[Assorted-commits] SF.net SVN: assorted: [191] movie-lookup
From: <yan...@us...> - 2007-12-24 01:39:37
Revision: 191
          http://assorted.svn.sourceforge.net/assorted/?rev=191&view=rev
Author:   yangzhang
Date:     2007-12-23 17:39:35 -0800 (Sun, 23 Dec 2007)

Log Message:
-----------
added movie-lookup

Added Paths:
-----------
    movie-lookup/
    movie-lookup/trunk/
    movie-lookup/trunk/README
    movie-lookup/trunk/src/
    movie-lookup/trunk/src/MovieLookup.hs
    movie-lookup/trunk/src/Setup.hs
    movie-lookup/trunk/src/hbo.bash
    movie-lookup/trunk/src/lookup.bash
    movie-lookup/trunk/src/movie-lookup.cabal

Added: movie-lookup/trunk/README
===================================================================
--- movie-lookup/trunk/README (rev 0)
+++ movie-lookup/trunk/README 2007-12-24 01:39:35 UTC (rev 191)
@@ -0,0 +1,7 @@
+- Build the `movie-lookup` binary.
+
+- Go to HBO's online schedule and select whatever options yield the largest
+  amount of information in a printable view. Copy the plain-text body of this
+  page into `hbo.txt`.
+
+- Run `hbo.bash`, which uses `lookup.bash`, which uses `movie-lookup`.

Added: movie-lookup/trunk/src/MovieLookup.hs
===================================================================
--- movie-lookup/trunk/src/MovieLookup.hs (rev 0)
+++ movie-lookup/trunk/src/MovieLookup.hs 2007-12-24 01:39:35 UTC (rev 191)
@@ -0,0 +1,66 @@
+module Main where
+
+import Control.Arrow
+import Control.Monad
+import Data.ByteString.Char8 (ByteString)
+import qualified Data.ByteString.Char8 as BS
+import Data.Char
+import Data.List
+import Data.String
+import Debug.Trace
+import System.Environment
+import System.FilePath
+import System.Path.Glob
+import Text.Regex
+
+t x = trace (show x) x
+
+-- TODO Commons
+
+fork :: [Either a b] -> ([a],[b])
+fork = foldr f ([],[])
+  where f e (as,bs) = case e of
+          Left  a -> (a:as,   bs)
+          Right b -> (  as, b:bs)
+
+simplify = BS.map toLower >>>
+           BS.words >>> BS.unwords >>>
+           BS.unpack >>> strip >>> BS.pack
+
+simplifyTitle s =
+  let simplified = BS.unpack $ simplify s
+      rgx = mkRegex "^the |, the$"
+      dearticled = subRegex rgx simplified ""
+  in dearticled
+
+subRegexBS regex target replace = BS.pack $ subRegex regex (BS.unpack target) replace
+
+main = do
+  -- load the indexes; their union is the catalog
+  (dir:_) <- getArgs
+  indexes <- glob $ dir </> "*"
+  catalog <- fmap (BS.lines . BS.concat) $ mapM BS.readFile indexes
+
+  -- process queries
+  queries <- fmap BS.lines BS.getContents
+  let (hits, misses) = query catalog queries
+
+  -- output
+  putStrLn "hits"
+  putStrLn "----"
+  forM_ hits $ \(title, url) -> do
+    putStrLn $ baseUrl ++ BS.unpack url
+  putStrLn ""
+  putStrLn "misses"
+  putStrLn "------"
+  mapM_ BS.putStrLn misses
+
+baseUrl = "http://www.rottentomatoes.com"
+
+query :: [ByteString] -> [ByteString] -> ([(ByteString, ByteString)],[ByteString])
+query catalog = map queryOne >>> fork
+  where queryOne q = case find (cmp q) catalog of
+          Just entry -> Left (q, subRegexBS rgx entry "\\1")
+          Nothing -> Right q
+        cmp q entry = simplifyTitle q == simplifyTitle (subRegexBS rgx entry "\\2")
+        rgx = mkRegex ".*href=\"([^\"]*)\">([^<]*)<.*"

Added: movie-lookup/trunk/src/Setup.hs
===================================================================
--- movie-lookup/trunk/src/Setup.hs (rev 0)
+++ movie-lookup/trunk/src/Setup.hs 2007-12-24 01:39:35 UTC (rev 191)
@@ -0,0 +1,4 @@
+#!/usr/bin/env runhaskell
+
+import Distribution.Simple
+main = defaultMain


Property changes on: movie-lookup/trunk/src/Setup.hs
___________________________________________________________________
Name: svn:executable
   + *

Added: movie-lookup/trunk/src/hbo.bash
===================================================================
--- movie-lookup/trunk/src/hbo.bash (rev 0)
+++ movie-lookup/trunk/src/hbo.bash 2007-12-24 01:39:35 UTC (rev 191)
@@ -0,0 +1,8 @@
+#!/usr/bin/env bash
+
+set -o errexit -o nounset
+
+cat hbo.txt |
+sed 's/[^\.]*\.\.\.\.// ; s/\.\..*//' |
+sort -u |
+./lookup.bash


Property changes on: movie-lookup/trunk/src/hbo.bash
___________________________________________________________________
Name: svn:executable
   + *
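
Note on hbo.bash: the sed stage is easier to follow on sample input. The sketch below assumes schedule lines of the form time....title....rating; that layout and the sample titles are hypothetical, since the real contents of hbo.txt are not part of this commit. It reuses the sed program and the sort from hbo.bash and leaves out the final ./lookup.bash stage.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset

# Hypothetical schedule lines; the real layout of hbo.txt may differ.
sample='8:00 PM....The Departed....R
10:30 PM....Casino Royale....PG-13
11:45 PM....The Departed....R'

# Same sed program and sort as hbo.bash:
#   1st substitution drops everything up to and including the first "....".
#   2nd substitution drops everything from the next ".." to the end of line.
extract_titles() {
  sed 's/[^\.]*\.\.\.\.// ; s/\.\..*//' | sort -u
}

titles="$(printf '%s\n' "$sample" | extract_titles)"
printf '%s\n' "$titles"
```

On this sample the script prints "Casino Royale" and "The Departed", one per line, with the duplicate showing collapsed by sort -u. If the dot leader before a title were longer than four dots, the second substitution would match the leftover dots and delete the title as well, so the input format matters.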

Added: movie-lookup/trunk/src/lookup.bash
===================================================================
--- movie-lookup/trunk/src/lookup.bash (rev 0)
+++ movie-lookup/trunk/src/lookup.bash 2007-12-24 01:39:35 UTC (rev 191)
@@ -0,0 +1,35 @@
+#!/usr/bin/env bash
+
+set -o errexit -o nounset
+
+# get (cache) all the index pages (300-400 of these!)
+
+tmp="${TEMP:-/tmp}"
+movietmp="$tmp/movies.txt"
+if [ ! -d "$tmp/www.rottentomatoes.com" ]
+then wget -P "$tmp" -r -l 1 http://www.rottentomatoes.com/features/stats/index.php
+fi
+
+# search for the movies (read from stdin)
+
+if [ ! -f "$movietmp" ]
+then dist/build/movie-lookup/movie-lookup "$tmp/www.rottentomatoes.com/features/stats/index" > "$movietmp"
+fi
+
+# download the movies
+
+grep ^http: "$movietmp" |
+wget -q -i- -x -P "$tmp" -nc || true
+
+# extract scores
+
+grep 'critics_tomatometer_score_txt_percentage' "$tmp"/www.rottentomatoes.com/m/*/index.html |
+grep -v '<div' |
+sed 's/<span.*//'
+
+# tomatometer looks like:
+# <div id="critics_tomatometer_score_txt">
+# 13<span id="critics_tomatometer_score_txt_percentage">%</span>
+# </div>
+
+# vim:et:sw=2:ts=2


Property changes on: movie-lookup/trunk/src/lookup.bash
___________________________________________________________________
Name: svn:executable
   + *

Added: movie-lookup/trunk/src/movie-lookup.cabal
===================================================================
--- movie-lookup/trunk/src/movie-lookup.cabal (rev 0)
+++ movie-lookup/trunk/src/movie-lookup.cabal 2007-12-24 01:39:35 UTC (rev 191)
@@ -0,0 +1,18 @@
+Name: movie-lookup
+Version: 0.0
+Description: Retrieve ratings from RottenTomatoes.com.
+License: GPL
+License-file: LICENSE
+Author: Yang Zhang
+Maintainer: gmail:yaaang
+Build-Depends: base,
+               FilePath >= 0.11,
+               MissingH >= 0.18,
+               regex-base >= 0.71,
+               regex-compat >= 0.71,
+               regex-posix >= 0.71,
+               unix >= 1.0
+
+Executable: movie-lookup
+Main-is: MovieLookup.hs
+ghc-options: -O


This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
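
Note on lookup.bash: the "extract scores" step can be tried without downloading anything. The sketch below is an illustration, not part of the commit. It builds a throwaway directory standing in for the wget cache, with two made-up movie pages (movie_a, movie_b) that follow the tomatometer snippet quoted in the script's comments; the score 87 is invented for the second page. It then runs the same three-stage grep/grep/sed pipeline over them.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset

# Throwaway stand-in for "$tmp/www.rottentomatoes.com"; names are hypothetical.
work="$(mktemp -d)"
mkdir -p "$work/m/movie_a" "$work/m/movie_b"

# Page body copied from the "tomatometer looks like" comment in lookup.bash.
cat > "$work/m/movie_a/index.html" <<'EOF'
<div id="critics_tomatometer_score_txt">
13<span id="critics_tomatometer_score_txt_percentage">%</span>
</div>
EOF

# Second page with an invented score, so that grep sees more than one file.
cat > "$work/m/movie_b/index.html" <<'EOF'
<div id="critics_tomatometer_score_txt">
87<span id="critics_tomatometer_score_txt_percentage">%</span>
</div>
EOF

# Same three stages as lookup.bash, run relative to the stand-in directory.
scores="$(cd "$work" &&
  grep 'critics_tomatometer_score_txt_percentage' m/*/index.html |
  grep -v '<div' |
  sed 's/<span.*//')"
printf '%s\n' "$scores"

rm -r "$work"
```

Because grep is given several files, each output line is prefixed with the file it came from, which is how a score is tied back to a movie: here the output is m/movie_a/index.html:13 followed by m/movie_b/index.html:87. The sed stage only trims the trailing span markup.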