This package provides routines and a utility to prepare and automate the upload of video streams from EuroPython conferences to the accounts owned by the EuroPython Society (EPS) on archive.org and YouTube (the video sites).
During a EuroPython conference recordings are made of talks, keynotes and other events. This results in video streams that might need additional handling (editing, leader insertion, Code-of-Conduct compliance, cutting, conversion). Eventually this should result in multiple video files, one per event, to be uploaded to YouTube and archive.org.
The event scheduling software has metadata that should be associated with the above mentioned video files. The metadata determines the title and other artefacts that can be associated with an uploaded stream on the video site.
In 2014 the videos were originally uploaded to a non-EuroPython owned account on YouTube only, with metadata that was incompatible with how previous EuroPython streams (2013 and earlier) were uploaded. Re-uploading these streams to the video sites would have resulted in quality loss, but an FTP site with the higher quality original streams was found; these were combined with metadata scraped from the website and uploaded to the video sites. That software was not published and was for a large part specific to the special circumstances of 2014.
The video streams for the 2015 process were delivered as the 430GB content of two NTFS formatted hard drives. The videos were (mostly) split and stored in a directory hierarchy indicating room, date, AM/PM, and talk slot (1-4). Luis J. Salvatierra provided the USB 3 discs from the company hired to record the conference talks. The discs also contained some non-relevant material (such as Western Digital provided backup software). Unfortunately the discs were not exact copies: on one of them two directory names were changed (in a failed attempt to correct the spelling).
The metadata was delivered by Alexandre M. Savio in a single JSON file. This was done after the conference was over to take into account rescheduled, cancelled or otherwise changed info. The information in the JSON file consists of a top-level event type dictionary with keys Keynotes, Talks, etc. and values being a mapping from event number to event specific metadata. The start of the file looked like:
{
  "Keynotes": {
    "364": {
      "track_title": "Google Room",
      "speakers": "Carrie Anne Philbin",
      "abstracts": [
        "The problem of introducing children to programming .....",
        "",
        ""
      ],
      "tags": [
        "python"
      ],
      "duration": 60,
      "title": "Keynote: Designed for Education: A Python Solution",
      "timerange": "2015-07-23 09:30:00, 2015-07-23 10:30:00",
      "have_tickets": [
        true
      ],
      "id": 364,
      "emails": "*******@raspberrypi.org"
    },
    "365": {
    .
    .
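To get a feel for this structure, a minimal sketch (not part of this package) of walking the file could look like; the file name and key names follow the example above:

# Minimal sketch of walking talk_abstracts.json; key names follow the
# example above.
import json

with open('talk_abstracts.json') as fp:
    events = json.load(fp)

for event_type, talks in events.items():        # "Keynotes", "Talks", ...
    for event_id, meta in sorted(talks.items()):
        print('%s %s: %s' % (event_type, event_id, meta['title']))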
The processing targets combining the relevant part of a video stream input (or inputs, in case a talk was split over multiple files) with the appropriate metadata, and then uploading both to the video sites.
For that there are three main inputs that need to be combined:
- the video streams delivered on disc
- the metadata from the event scheduling software
- the in/out points selecting the relevant part of each stream
Not all of this data is available in its final form from the start, and the in/out point info cannot be determined until the other two are available and have been combined.
Given the number of videos, this process should be automated as much as possible and manual intervention should be restricted to preparatory steps, so that lengthy processes, like the actual uploads, can be done unattended. The processing should be able to deal with that and add newly available info as it becomes available. In particular, manually added/changed data should never be lost/overwritten.
While the conference is going on there is only partial info available: future talks have no streams yet and their metadata might still change (reschedule, cancel).
If the live streams stay online for a long time, people will start to link to these files on YouTube. Bringing individual talk videos with metadata online and unlinking/removing the live streams enhances the chances that people link to the final version. This is especially important for any comments made on the live stream, which cannot be transferred to the individual talk uploads.
YouTube is currently arguably the most popular site for watching videos. Videos uploaded there will be scaled and processed which makes it impossible to download the original quality. YouTube is geared towards watching the videos online, not to downloading.
Archive.org has a long-term preservation mission and keeps the videos in their original quality (as long as a file is 10GB or less). It also supports downloading the videos, in addition to watching them on the site.
The whole process of aligning metadata with videos, selecting, and uploading is supported by the various subcommands of the ueps utility, short for uploadeuropythonstreams. The actual steps to bring video and metadata together as an uploaded whole consist of:
- copying the original video data and mapping its irregular file names onto a regular room/date hierarchy
- flattening the delivered JSON metadata and relating it to the individual videos
- determining in- and out-points, and splitting/converting streams where needed
- uploading the result, with its metadata, to archive.org and YouTube
archive-upload is a separate utility to upload single files to archive.org. Originally this was part of ueps, but the packages used by the internetarchive package (which does the uploading) have dependencies on specific versions of other packages (such as six) that clash with, e.g., the ones used by youtube-upload. Therefore this was split off into its own utility, to be installed into a separate virtualenv and run from there.
Apart from the home-brew utilities ueps and archive-upload, the process relies on youtube-upload, mediainfo and avconv (a build including AAC support) being available.
I also used youtube-dl to get live streams from YouTube for those events for which no data was available on disc.
The following assumes wrapper scripts installed under some path (~/bin), with explicit paths to the virtualenvs set up for the utilities.
The ueps utility should be set up in a Python 2.7 virtual environment on Linux using:
virtualenv /home/venv/ueps
/home/venv/ueps/bin/pip install uploadeuropythonstreams
The source is maintained on bitbucket
The original name of the utility was upeuros (for UPloadEUROpythonStream), but I was not sure whether this would conflict with the CoC (try to pronounce it ...).
I have a file ~/bin/ueps, with execute permission bits set, that looks like:
#!/home/venv/ueps/bin/python
from ruamel.uploadeuropythonstreams import main
main()
The archive-upload utility should be set up in a Python 2.7 virtual environment on Linux using:
virtualenv /home/venv/epau
/home/venv/epau/bin/pip install europythonarchiveupload
The source is maintained on bitbucket
I have a file ~/bin/archive-upload, with execute permission bits set, that looks like:
#!/home/venv/epau/bin/python
from ruamel.europythonarchiveupload import main
main()
Compiling and installing the latest version of mediainfo might not be necessary, but it is relatively straightforward. I used the information from the download page to download the command line (CLI) version 0.7.76.
Extract the file, change to the newly created directory and run:
./CLI_Compile.sh
Afterwards install mediainfo using the instructions given by the script.
# yasm
# http://yasm.tortall.net/Download.html
mkdir yasm
cd yasm
wget http://www.tortall.net/projects/yasm/releases/yasm-1.2.0.tar.gz
tar xvf yasm*.gz
cd yasm*
./configure && make && sudo make install
cd ../..
# from: http://stackoverflow.com/a/11236763/1307905
# x264
git clone git://git.videolan.org/x264.git x264
cd x264
mkdir avconv-source
./configure --enable-static
make
sudo make install
cd ..
# fdk-aac
# http://wiki.hydrogenaud.io/index.php?title=Fraunhofer_FDK_AAC#Libav.2Favconv
git clone git://github.com/mstorsjo/fdk-aac.git fdk-aac
cd fdk-aac
./autogen.sh
./configure --prefix=/usr --disable-static
make
sudo make install
cd ..
# avconv (libav)
git clone git://git.libav.org/libav.git avconv
cd avconv
./configure --enable-libx264 --enable-libfdk-aac --enable-nonfree --enable-gpl
make
make install
Check that the AAC and H.264 codecs are now available:
avconv -codecs | grep aac
avconv -codecs | grep 26
Both utilities use the configuration directory path ~/.config/uploadeuropythonstreams, and a YAML config file named ueps.yaml is automatically created when you run:
ueps config --edit
Edit this file as specified below and adjust values according to the EOL comments.
When specifying directories/files in this YAML file, a name starting with / is taken as an absolute path; otherwise it is normally relative to the base or other parent directory.
Example for 2015, edit with ueps config --edit:
global:
  verbose: 0
  year: 2015
  base_dir: /data0/DATA/EuroPython
  # any dir/file if not starting with "/" is relative to base_dir
  original: original          # original video dir
  map-file: map_video.yaml
  video_dir: video            # here we'll have A1, A2, A3, B1, B2
  metadata-dir: metadata      # directory for the individual files
  flat-yaml: flatten.yaml     # used to generate individual metadata files
metadata:
  json: talk_abstracts.json   # file from Alexandre
video:
  splitpoints: video_split_points.yaml
2015:
  location: Bilbao, Euskadi, Spain   # this is used during uploading
  coordinates:                       # for YouTube
  - 43.26679
  - 2.94361
  video_ignore:                      # files/dirs under original to ignore
  - autorun/.*
  - autorun.inf$
  - ReadMe.pdf$
  - \$RECYCLE.BIN/.*
  - System Volume Information/.*
  - WD Smartware Pro Free Trial/.*
  # mappings from regex to new path/filename, processed until one matches
  video_map_path:
    # [^\.] at the beginning of the filename to filter out hidden files
    # that are probably rsync residues
    .*ANYWEAR/.* .* (?P<day>\d*) .*/(?P<ampm>[A|P]M)/[^\.].* (?P<num>\d)( \(Output 1\))?\.(?P<ext>\w{3,4}):
      A3/2015-07-{day}/{ampm}_{num}.{ext}
    .*GOOGLE/(?P<day>\d*) .*(?P<ampm>[A|P]M)/[^\.].* (?P<num>\d)( ?\(PGM\))\.(?P<ext>\w{3,4}):
      A1/2015-07-{day}/{ampm}_{num}.{ext}
    .* A2/\w* \d (?P<day>\d*) .*/(.* )?(?P<ampm>[A|P]M)/[^\.].*\.(?P<num>\d)\.(?P<ext>\w{3,4}):
      A2/2015-07-{day}/{ampm}_{num}.{ext}
    .* A2/\w* \d (?P<day>\d*) .*/(?P<ampm>[A|P]M)/[^\.].*\.(?P<ext>\w{3,4}):
      A2/2015-07-{day}/{ampm}.{ext}
    .* A2/\w* \d (?P<day>\d*) .*/.* (?P<ampm>[A|P]M)/[^\.].*\.(?P<ext>\w{3,4}):
      A2/2015-07-{day}/{ampm}.{ext}
    .*BARRIA 1/\w* \d (?P<day>\d*) .*/(?P<ampm>[A|P]M)/[^\.].* (?P<num>\d)( ?\(Output 1\))?\.(?P<ext>\w{3,4}):
      B1/2015-07-{day}/{ampm}_{num}.{ext}
    .*BARRIA 2/\w* \d (?P<day>\d*) .*/(?P<ampm>[A|P]M)/[^\.].* (?P<num>\d)( \(Output 1\))?\.(?P<ext>\w{3,4}):
      B2/2015-07-{day}/{ampm}_{num}.{ext}
If you name things consistently you should be able to reuse parts of the file from year to year. But in general this is a run-once program that is adapted for the next year without necessarily keeping backwards compatibility (unless we need to upload to another video site at some point, but then checking out an old version might be good enough).
The use of video_map_path is described as part of the steps to massage the data.
The above is the configuration on the server at my home. On the server used to upload to archive.org, filled using rsync by Luis, the original directory is absolute (/home/luis/video) and not under .../EuroPython/2015
In order to be able to upload, user names and passwords for the accounts need to be available. YouTube uploading also needs secrets, which were partly pre-generated and partly generated on first run (with --auth-browser). The relevant files are stored next to ueps.yaml.
The "normal" layout is a base directory with a subdirectory for each year and that "year" directory holding various year specific subdirs and data files:
/data0/DATA/EuroPython/
+-- 2014
| +-- metadata
| +-- videos
| | +-- ...
| | ...
| ...
`-- 2015
+-- flatten.yaml
+-- map_video.yaml
+-- metadata
| +-- 2015_07_20_AM_A1_0_999_Welcome.yaml
| ...
| `-- 2015_07_24_PM_B2_5_053_Speeding_up_search_with_locality_sensitive_hashing.yaml
+-- original
| +-- autorun
| | `-- wdlogo.ico
| +-- autorun.inf
| +-- ReadMe.pdf
| +-- $RECYCLE.BIN
| | ...
| +-- ROOM A2
| | +-- DIA 1 20 Julio
| | | ...
| | +-- DIA 2 21 Julio
| | | ...
| | +-- DIA 3 22 Julio
| | | +-- AM
| | | | `-- Live Streaming Room A2 2015-07-22 AM.f4v
| | | `-- PM
| | | `-- Live streaming from room A2 2015-07-22 PM.f4v
| | +-- DIA 4 23 Julio
| | | ...
| | `-- DIA 5 24 Julio
| | ...
| +-- ROOM BARRIA 1
| | +-- DIA 1 20 Julio
| | | +-- AM
| | | | +-- BARRIA 1 - ponencia 1(Output 1).mp4
| | | | +-- BARRIA 1 - ponencia 2 (Output 1).mp4
| | | | `-- BARRIA 1 - ponencia 3(Output 1).mp4
| | | `-- PM
| | | +-- BARRIA 1 - ponencia 4(Output 1).mp4
| | | +-- BARRIA 1 - ponencia 5 (Output 1).mp4
| | | +-- BARRIA 1 - ponencia 6 (Output 1).mp4
| | | `-- BARRIA 1 - ponencia 7 (Output 1).mp4
| | +-- DIA 2 21 Julio
| | ...
| +-- ROOM BARRIA 2
| | +-- DIA 1 20 Julio
| | | ...
| | +-- DIA 2 21 Julio
| | | ...
| | +-- DIA 3 22 Julio
| | | +-- AM
| | | | +-- Live Streaming from Barria 2 2015-07-22 ponencia 1 (Output 1).mp4
| | | | +-- Live Streaming from Barria 2 2015-07-22 ponencia 2 (Output 1).mp4
| | | | `-- Live Streaming from Barria 2 2015-07-22 ponencia 3 (Output 1).mp4
| | | `-- PM
| | | +-- Live Streaming from Barria 2 2015-07-22 ponencia 4 (Output 1).mp4
| | | +-- Live Streaming from Barria 2 2015-07-22 ponencia 5 (Output 1).mp4
| | | +-- Live Streaming from Barria 2 2015-07-22 ponencia 6 (Output 1).mp4
| | | `-- Live Streaming from Barria 2 2015-07-22 ponencia 7 (Output 1).mp4
| | ...
| +-- ROOM GOOGLE
| | +-- 20 Julio AM
| | | ...
| | +-- 20 Julio PM
| | | +-- Directos Euskalduna101 0(PGM).mov
| | | +-- Directos Euskalduna101 1 (PGM).mov
| | | ...
| | +-- 21 Julio PM
| | | ...
| | +-- 21 Jullio AM
| | | +-- Europython 21 AM 1 (PGM).mp4
| | | +-- Europython 21 AM 2 (PGM).mp4
| | | +-- Europython 21 AM 3 (PGM).mp4
| | | `-- Europython 21 AM 4 (PGM).mp4
| | +-- 22 Julio AM
| | | +-- Europython 22 AM 1 (PGM).mp4
| | | +-- Europython 22 AM 2 (PGM).mp4
| | | +-- Europython 22 AM 3 (PGM).mp4
| | | `-- Europython 22 AM 4 (PGM).mp4
| | ...
| +-- ROOM PHYTON ANYWEAR
| | +-- DIA 1 20 Julio
| | | ...
| | +-- DIA 2 21 Julio
| | | +-- AM
| | | | +-- Sala Python 1 (Output 1).mp4
| | | | +-- Sala Python 2 (Output 1).mp4
| | | | `-- Sala Python 3 (Output 1).mp4
| | | `-- PM
| | | +-- Sala Python 4 (Output 1).mp4
| | | +-- Sala Python 5 (Output 1).mp4
| | | `-- Sala Python 6 (Output 1).mp4
| | ...
| +-- System Volume Information
| | +-- IndexerVolumeGuid
| | +-- MountPointManagerRemoteDatabase
| | `-- _restore{10BF4F30-BD90-46CF-AFA6-76DD512DBC6C}
| | `-- RP532
| | +-- change.log
| | `-- S0083239.Acl
| `-- WD Smartware Pro Free Trial
| +-- WDSmartWareProFreeTrial.exe
| `-- WDSmartWareProFreeTrial.tmx
+-- talk_abstracts.json
+-- video
| +-- A1
| | +-- 2015-07-20
| | | ...
| | +-- 2015-07-21
| | | +-- AM_1.mp4 -> ../../../original/ROOM GOOGLE/21 Jullio AM/Europython 21 AM 1 (PGM).mp4
| | | +-- AM_2.mp4 -> ../../../original/ROOM GOOGLE/21 Jullio AM/Europython 21 AM 2 (PGM).mp4
| | | +-- AM_3.mp4 -> ../../../original/ROOM GOOGLE/21 Jullio AM/Europython 21 AM 3 (PGM).mp4
| | | +-- AM_4.mp4 -> ../../../original/ROOM GOOGLE/21 Jullio AM/Europython 21 AM 4 (PGM).mp4
| | | +-- PM_5.mp4 -> ../../../original/ROOM GOOGLE/21 Julio PM/Europython 21 PM 5 (PGM).mp4
| | | +-- PM_6.mp4 -> ../../../original/ROOM GOOGLE/21 Julio PM/Europython 21 PM 6 (PGM).mp4
| | | +-- PM_7.mp4 -> ../../../original/ROOM GOOGLE/21 Julio PM/Europython 21 PM 7 (PGM).mp4
| | | `-- PM_8.mp4 -> ../../../original/ROOM GOOGLE/21 Julio PM/Europython 21 PM 8 (PGM).mp4
| | +-- 2015-07-22
| | | +-- AM_1.mp4 -> ../../../original/ROOM GOOGLE/22 Julio AM/Europython 22 AM 1 (PGM).mp4
| | | +-- AM_2.mp4 -> ../../../original/ROOM GOOGLE/22 Julio AM/Europython 22 AM 2 (PGM).mp4
| | | +-- AM_3.mp4 -> ../../../original/ROOM GOOGLE/22 Julio AM/Europython 22 AM 3 (PGM).mp4
| | | ...
| | ...
| +-- A2
| | +-- 2015-07-20
| | | ...
| | +-- 2015-07-21
| | | +-- AM_0.f4v -> ../../../original/ROOM A2/DIA 2 21 Julio/Livestreaming Room A2 2015-07-21 AM/Livestreaming From Room A2 2015-07-21 AM.0.f4v
| | | +-- AM.f4v -> ../../../original/ROOM A2/DIA 2 21 Julio/Livestreaming Room A2 2015-07-21 AM/Livestreaming From Room A2 2015-07-21 AM.f4v
| | | +-- PM.f4v -> ../../../original/ROOM A2/DIA 2 21 Julio/Livestreaming Room A2 2015-07-21 PM/sample.f4v
| | | `-- PM.mpg -> ../../../original/ROOM A2/DIA 2 21 Julio/Livestreaming Room A2 2015-07-21 PM/MP2_Jul21_183859_0.mpg
| | ...
| +-- A3
| | ...
| +-- B1
| | +-- 2015-07-20
| | | +-- AM_1.mp4 -> ../../../original/ROOM BARRIA 1/DIA 1 20 Julio/AM/BARRIA 1 - ponencia 1(Output 1).mp4
| | | +-- AM_2.mp4 -> ../../../original/ROOM BARRIA 1/DIA 1 20 Julio/AM/BARRIA 1 - ponencia 2 (Output 1).mp4
| | | +-- AM_3.mp4 -> ../../../original/ROOM BARRIA 1/DIA 1 20 Julio/AM/BARRIA 1 - ponencia 3(Output 1).mp4
| | | +-- PM_4.mp4 -> ../../../original/ROOM BARRIA 1/DIA 1 20 Julio/PM/BARRIA 1 - ponencia 4(Output 1).mp4
| | | +-- PM_5.mp4 -> ../../../original/ROOM BARRIA 1/DIA 1 20 Julio/PM/BARRIA 1 - ponencia 5 (Output 1).mp4
| | | +-- PM_6.mp4 -> ../../../original/ROOM BARRIA 1/DIA 1 20 Julio/PM/BARRIA 1 - ponencia 6 (Output 1).mp4
| | | `-- PM_7.mp4 -> ../../../original/ROOM BARRIA 1/DIA 1 20 Julio/PM/BARRIA 1 - ponencia 7 (Output 1).mp4
| | ...
| +-- B2
| | +-- 2015-07-20
| | | +-- AM_1.mp4 -> ../../../original/ROOM BARRIA 2/DIA 1 20 Julio/AM/Barria2 1 (Output 1).mp4
| | | +-- AM_2.mp4 -> ../../../original/ROOM BARRIA 2/DIA 1 20 Julio/AM/Barria2 2 (Output 1).mp4
| | | +-- AM_3.mp4 -> ../../../original/ROOM BARRIA 2/DIA 1 20 Julio/AM/Barria2 3 (Output 1).mp4
| | | +-- PM_4.mp4 -> ../../../original/ROOM BARRIA 2/DIA 1 20 Julio/PM/Barria2 4 (Output 1).mp4
| | | +-- PM_5.mp4 -> ../../../original/ROOM BARRIA 2/DIA 1 20 Julio/PM/Barria2 5 (Output 1).mp4
| | | +-- PM_6.mp4 -> ../../../original/ROOM BARRIA 2/DIA 1 20 Julio/PM/Barria2 6 (Output 1).mp4
| | | `-- PM_7.mp4 -> ../../../original/ROOM BARRIA 2/DIA 1 20 Julio/PM/Barria2 7 (Output 1).mp4
| | ...
`-- video_split_points.yaml
Hopefully the relationship to the configuration file is clear. The material under 2014 is old and not used; the configuration entry year makes only the material under the 2015 directory relevant.
The directory structure under original is irregular, but the one under video is regular and flattened and can be more easily used to determine in- and out-points for cutting the videos.
The video data discs contained some more than just the video streams, but this extra data is insubstantial. To minimise the risk of deleting something from the original material, use rsync to copy the data to your machine after mounting the drive read-only.
If the drive gets mounted by plugging in, use:
mount -o remount,ro /path/to/mount/point
cd /path/to/mount/point
rsync -av --progress . target
where target should correspond to your resulting video_dir specified in the configuration file (after combining with base_dir and year if applicable).
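The combining rule is the one described for the configuration file: names starting with / are absolute, everything else is taken relative to base_dir (and the year, where applicable). A small sketch of that rule (illustration only, not the actual ueps code):

# Sketch of the path resolution rule: absolute names are used as-is,
# relative names are joined onto base_dir (and the year, where applicable).
import os

def resolve(name, base_dir, year=None):
    if name.startswith('/'):
        return name
    parts = [base_dir] + ([str(year)] if year is not None else []) + [name]
    return os.path.join(*parts)

print(resolve('video', '/data0/DATA/EuroPython', 2015))
# /data0/DATA/EuroPython/2015/video
print(resolve('/home/luis/video', '/data0/DATA/EuroPython', 2015))
# /home/luis/video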
Make sure every user that needs to can read the files:
find original/ -type d -exec chmod 755 {} +
find original/ -type f -exec chmod 644 {} +
Uploading to the server was done using rrsync to restrict access; rsync was then used to upload the data into one specific directory.
No normal ssh access was possible, because of the restrictions in ~/.ssh/authorized_keys.
The initial flattening of the metadata can be done, after storing talk_abstracts.json in the year directory, using:
ueps metadata --flatten
assuming the configuration file values are set. It will generate the flatten.yaml file.
Inspect the file for correctness, but don't edit by hand just yet as this would be overwritten if some programmatic changes are made and the above command re-run.
With the 2015 data the abstracts values in the JSON file were a bit of a problem. Each was a list with 3 entries, of which the second was often empty and the third always was. The second entry was, when available, a more detailed description, sometimes repeating the first entry. Uploading both (to archive.org, which has enough space for metadata) would have led to doubled text.
There was also the problem of newline differences in the abstracts. Some had newlines inserted about every 70 characters, and double newlines to indicate a new paragraph. Others had longer lines and used a single newline to start a new paragraph. The conversion process tries to do the smart thing with this.
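What "the smart thing" amounts to is roughly the following heuristic; this is a sketch of the idea, not the exact code used:

# Heuristic sketch for normalising abstract newlines: if every line is short
# the text is treated as hard-wrapped (single newlines join, blank lines
# separate paragraphs); otherwise every non-empty line is its own paragraph.
def normalise_abstract(text, wrap_width=75):
    lines = [line.rstrip() for line in text.splitlines()]
    non_empty = [line for line in lines if line]
    if not non_empty:
        return ''
    if any(len(line) > wrap_width for line in non_empty):
        # long lines: a single newline already means "new paragraph"
        return '\n\n'.join(non_empty)
    # hard-wrapped: join consecutive lines, keep blank lines as breaks
    paragraphs, current = [], []
    for line in lines:
        if line:
            current.append(line)
        elif current:
            paragraphs.append(' '.join(current))
            current = []
    if current:
        paragraphs.append(' '.join(current))
    return '\n\n'.join(paragraphs)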
The flattened file is a YAML file, with the large abstracts as literal scalars for readability. A single top-level key-value mapping entry of this file looks like:
361:
  track_title: Google Room
  speakers: Guido van Rossum
  tags:
  - python
  duration: 60
  title: 'Keynote: Python now and in the future'
  timerange: 2015-07-21 09:30:00, 2015-07-21 10:30:00
  have_tickets: [true]
  id: 361
  emails: guido@python.org
  type: Keynote
  abstract: |-
    This is *your* keynote! I will have some prepared remarks on the state
    of the Python community and Python's future directions, but first and
    foremost this will be an interactive Q&A session.
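The flattening itself is conceptually simple. A sketch, assuming the JSON layout shown earlier (the real command does more, e.g. the newline normalisation above, and the choice of which abstracts entry to keep is a guess here):

# Sketch of flattening talk_abstracts.json into one mapping keyed on event id,
# adding the event type and reducing the abstracts list to a single text.
import json

def flatten(json_path):
    with open(json_path) as fp:
        data = json.load(fp)
    flat = {}
    for event_type, talks in data.items():            # "Keynotes", "Talks", ...
        for event_id, meta in talks.items():
            entry = dict(meta)
            # "Keynotes" -> "Keynote"; naive singularisation for illustration
            entry['type'] = event_type[:-1] if event_type.endswith('s') else event_type
            abstracts = entry.pop('abstracts', [])
            # keep the last non-empty entry: the more detailed one when present
            entry['abstract'] = next(
                (a for a in reversed(abstracts) if a.strip()), '')
            flat[int(event_id)] = entry
    return flat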
I brought talk_abstracts.json and later flatten.yaml under revision control, just in case something caused a useful/final version to be overwritten.
First specify which directories in the original video data to ignore; this is more flexible than deleting, as rsync-ing new data might get you those freshly deleted dirs/files back again. Use the 2015: video_ignore: sequence in the configuration file for this.
Then run:
ueps video --org
This will show you any unmapped video data; make sure you either delete the original if it is broken-off rsync residue (the dotted files), or adjust the video_map_path entries so that all file names are mapped. The AM/PM directory level is dropped to halve the number of directories in the output, but that info is preserved in the file names.
When done (no unmatched files), run ueps video --org --save and check the map_video.yaml file before proceeding.
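When a file stays unmapped it helps to know how the mapping is applied: each video_map_path key is a regular expression with named groups, its value a template filled from those groups, and the first key that matches wins. A sketch of that idea (not the actual ueps code; the single mapping entry is taken from the 2015 configuration above):

# Sketch of applying a video_map_path style mapping: the first regular
# expression that matches a source path determines the new, flattened path.
import re

def map_path(src_path, video_map_path):
    # video_map_path: ordered (regex, template) pairs; order matters
    for pattern, template in video_map_path:
        m = re.match(pattern, src_path)
        if m:
            return template.format(**m.groupdict())
    return None      # unmapped: ignore it or adjust the mapping

video_map_path = [
    (r'.*BARRIA 1/\w* \d (?P<day>\d*) .*/(?P<ampm>[A|P]M)/'
     r'[^\.].* (?P<num>\d)( ?\(Output 1\))?\.(?P<ext>\w{3,4})',
     'B1/2015-07-{day}/{ampm}_{num}.{ext}'),
]
print(map_path('ROOM BARRIA 1/DIA 1 20 Julio/AM/BARRIA 1 - ponencia 1(Output 1).mp4',
               video_map_path))
# B1/2015-07-20/AM_1.mp4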
Neither the welcome session nor the Lightning Talks had entries in the flattened YAML file. These were added by hand as follows:
990:
  track_title: Google Room
  speakers: Fabio Pliger, Oier Beneitez
  tags:
  - EuroPython
  - conference
  duration: 45
  title: Welcome
  timerange: 2015-07-20 09:00:00, 2015-07-20 09:30:00
  id: 990
  type: Other session
  abstract: |-
    Welcome to EuroPython 2015
991: &lt
  track_title: Google Room
  speakers: Various speakers
  tags:
  - EuroPython
  - lightning talk
  duration: 45
  title: Lightning Talks
  timerange: 2015-07-20 17:15:00, 2015-07-20 18:00:00
  id: 991
  type: Other session
  abstract: |-
    Lightning talks, presented by Harry Percival
992:
  <<: *lt
  timerange: 2015-07-22 17:15:00, 2015-07-22 18:00:00
  id: 992
993:
  <<: *lt
  timerange: 2015-07-23 17:15:00, 2015-07-23 18:00:00
  id: 993
994:
  <<: *lt
  timerange: 2015-07-24 17:15:00, 2015-07-24 18:00:00
  id: 994
995:
  <<: *lt
  timerange: 2015-07-24 18:00:00, 2015-07-24 19:00:00
  id: 995
The << key uses the YAML merge facility: entries 992-995 inherit all key/value pairs from the entry anchored with &lt (991) and only override timerange and id.
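To illustrate: after loading, the merged entries carry all keys of the anchored entry. A quick check, here done with ruamel.yaml (illustration only):

# Quick check of the YAML merge: entry 992 inherits everything from the
# &lt anchored entry 991, except the keys it overrides itself.
import ruamel.yaml

with open('flatten.yaml') as fp:
    flat = ruamel.yaml.safe_load(fp)

assert flat[992]['title'] == 'Lightning Talks'   # merged in via <<: *lt
assert flat[992]['id'] == 992                    # overridden locally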
Once the map_video.yaml file is OK, use it with:
ueps video --map
to create the directory hierarchy under the video directory with links to the original data.
You can remove the links, change the mapping file and rerun the command. I needed to do this as the two drives did not have exactly the same naming.
At this point the video names should be alphabetically ordered within the room/date structure. This is the point where you are going to split the flat-yaml file, so any global editing should be done now (e.g. check user names for correct casing).
Now run:
ueps metadata --relate
and for each of the directories under video that have non-associated videos it will try to find a matching list of talk names/event ids from the flattened YAML file (by using track/date/time). If the number of talks on a day corresponds to the number of videos this should be trivial.
Any remaining stuff needs checking. Either some talk was not recorded and the others need to be assigned by hand, or some live stream needs splitting first.
Once everything matches, a metadata file is written in the metadata directory for each talk, based on the flattened YAML data and the related video. This file has a unique id generated for archive.org. The unique id for YouTube is returned after successful uploading.
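The matching boils down to: for one room and half-day take the videos in name order, take the scheduled events for that track sorted by start time, and pair them up when the counts agree. A simplified sketch of that idea (the room-to-track mapping and the AM/PM cut-off are assumptions, this is not the ueps implementation):

# Sketch of relating videos to flattened metadata for one room directory
# (e.g. video/A1): pair sorted video names with scheduled events sorted by
# start time, per date and AM/PM half, but only when the counts agree.
import os

def half_of(timerange):
    # '2015-07-21 09:30:00, ...' -> 'AM' or 'PM' (13:00 cut-off assumed)
    hour = int(timerange.split()[1].split(':')[0])
    return 'AM' if hour < 13 else 'PM'

def relate_room(room_dir, room, flat, track_to_room):
    # flat: mapping id -> metadata as in flatten.yaml
    # track_to_room: e.g. {'Google Room': 'A1', ...} (assumed helper mapping)
    related = {}
    for date in sorted(os.listdir(room_dir)):               # 2015-07-20, ...
        for half in ('AM', 'PM'):
            videos = sorted(name for name in os.listdir(os.path.join(room_dir, date))
                            if name.startswith(half))
            events = sorted((meta for meta in flat.values()
                             if track_to_room.get(meta.get('track_title')) == room
                             and meta['timerange'].startswith(date)
                             and half_of(meta['timerange']) == half),
                            key=lambda meta: meta['timerange'])
            if videos and len(videos) == len(events):
                for name, meta in zip(videos, events):
                    related[os.path.join(date, name)] = meta['id']
            # any mismatch is left for manual checking (talk not recorded,
            # live stream still to be split, ...)
    return related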
Finding the cut points can be done with VLC. As the mouse is a bit coarse for finding start/end points it is better to use Alt+Left/Right Arrow (10 second jump; use Shift for a 3 second jump, or Ctrl instead of Alt for a 1 minute jump).
Note that the videos for Room A2 included the keynotes (from A1), which was kind of confusing initially.
These in and out points are stored in a single file.
TBD How to get this info, merge it and update the individual metadata files.
Cutting can be done for most files when uploading to archive.org, as there you can have files of up to 10GB in size. Cutting (without conversion) is fast enough (seconds) that little upload time is lost in gaps.
Conversion, as necessary for YouTube, was more of a problem: it can take minutes (on my desktop machine) to hours (on my older co-located server).
So for YouTube I first converted and started uploading in parallel (which was slow from home as well), and for archive.org I cut and uploaded.
TBD, this needs rethinking/reimplementing based on co-located server work.
With the in- and out-points merged into the individual metadata files, you can start uploading by doing:
for i in 2015_07_20_*_A1_* ; do europythonarchiveupload upload "$i"; done
to do only the Monday videos from the Google Room (A1). The metadata is updated when a file has been uploaded, so trying to upload twice is caught and not a problem.
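The guard against double uploads is simply that the result is recorded in the per-talk metadata file and checked before uploading. A sketch of the idea (the key name archive_org_identifier is made up here; the real files may record this differently):

# Sketch of the "upload once" guard: skip a talk whose metadata file already
# records a result, and record the result after a successful upload.
# Round-trip loading keeps manually added comments/keys intact.
import ruamel.yaml

def upload_once(metadata_path, do_upload):
    with open(metadata_path) as fp:
        meta = ruamel.yaml.round_trip_load(fp)
    if meta.get('archive_org_identifier'):             # hypothetical key name
        print('already uploaded, skipping %s' % metadata_path)
        return
    meta['archive_org_identifier'] = do_upload(meta)   # returns the item id
    with open(metadata_path, 'w') as fp:
        ruamel.yaml.round_trip_dump(meta, fp)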
There is a delay in getting the videos to show up on archive.org; it can take hours for them to be processed. This can be very confusing as there is, as far as I know, no way to see what is "in the queue" on the website. (You cannot reuse the same handle.)
The videos on YouTube are uploaded as private; this gives you a chance to review before release and to correct/extend some metadata that cannot be set using the API (AFAIK).
First select all private videos:
https://www.youtube.com/my_videos?o=U&sq=is%3Aprivate&vmo=private
After the "is:private" in the search box add EuroPython2015 (adjust the year, it is one word) and search once more.
Select all videos, if necessary deselecting the top ones that have no timecode yet because they are still being processed (usually only one).
Now select Actions -> More Actions; this will allow you to do multiple actions at the same time.
Select License and Privacy.
(The image had Location selected as well; this is now handled automatically as part of the upload.)
Once all the files for a particular date are uploaded you can also associate the date with each file (select on "20 July 2015").
A list of encountered problems:
- 4 (four!) different file types.
- Room A2: sample.1.f4v appeared twice, in AM and PM (same data), and the live stream for the afternoon was missing except for 19 minutes. Downloaded that from YouTube with:
youtube-dl https://www.youtube.com/watch?v=PJS7aeZTOY8
- Room A2, 07/20: the file MP2_Jul20_135554_0.mpg had no sound.
- Event 272 had no track_title and no timerange.
- The EPS sessions in Barria 1 were not captured; added a video: False key/value pair. The same for S. Wirtel's talk.
- Combined a minimal start of the first Lightning Talks session with the rest using:
mencoder -ovc copy -oac mp3lame 5.mov 6.mov -o /data1/7.mp4
- Converting f4v to mp4 using (-t specifies a duration):
avconv -ss 00:12:00 -i sample.1.f4v -t 15:00 -map 0 -c:v libx264 -c:a copy 1.mp4
- The -t option has to come after -i! Worse than that, avconv silently fails to parse -ss 00:10 as ten seconds; it has to be -ss 00:00:10.
- MP4 files were about 20x larger than corresponding files from 2014.
You cannot delete an item on archive.org without asking the operators. You can however delete all objects and refill the created unique ID.
If you use the web interface read about how to delete multiple files (make a directory, drag-and-drop all items in, remove directory). Also notice the [Update item!] button at the bottom of the page.
You can rename the files using JSON patches
If streams need to be split/converted (in 2015 this was necessary for all the videos from Room A2), you can use avconv (the version used in 2015 on Ubuntu 12.04 was 0.8.17; install libavcodec-extra-53 to enable libx264 output). The basic conversion format is:
avconv -ss HH:MM:SS -i input.f4v -t HH:MM:SS -c:v libx264 -c:a copy out.mp4
The parameter for -t is a duration.
The ueps utility supports splitting based on information in the splitpoints file (specified under video in the config file). This YAML file should look like:
cmd: avconv -ss {start} -i "{src}" -t {length} -c:v libx264 -c:a copy {out}
Room A2/2015-07-20/AM/sample.1.f4v:
- from: 9:35
  to: 42:11
  out: 2.mp4
- from: 48:17
  to: 1:16:25
  out: 3.mp4
Any entry with a "/" in it is considered a source file. The "to" entries are end points; the appropriate length for avconv is calculated from them. The target file is created in the directory of the source file, and if it already exists the entry is skipped.
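The length calculation and command expansion are mechanical; a sketch of turning one splitpoints entry into a command, assuming the cmd template shown above (not the actual ueps code):

# Sketch of expanding a splitpoints entry: the "to" end point is converted to
# a length relative to "from", and the cmd template is filled in. Times are
# written out as full HH:MM:SS, avoiding the -ss parsing pitfall noted earlier.
def to_seconds(timestamp):
    # '9:35' -> 575, '1:16:25' -> 4585; also accepts an already-converted int
    # (YAML 1.1 may load 9:35 as the sexagesimal integer 575)
    seconds = 0
    for part in str(timestamp).split(':'):
        seconds = seconds * 60 + int(part)
    return seconds

def hms(seconds):
    return '%02d:%02d:%02d' % (seconds // 3600, seconds % 3600 // 60, seconds % 60)

def split_command(cmd, src, entry):
    start = to_seconds(entry['from'])
    length = to_seconds(entry['to']) - start
    return cmd.format(start=hms(start), length=hms(length),
                      src=src, out=entry['out'])

cmd = 'avconv -ss {start} -i "{src}" -t {length} -c:v libx264 -c:a copy {out}'
print(split_command(cmd, 'Room A2/2015-07-20/AM/sample.1.f4v',
                    {'from': '9:35', 'to': '42:11', 'out': '2.mp4'}))
# avconv -ss 00:09:35 -i "Room A2/2015-07-20/AM/sample.1.f4v" -t 00:32:36 ...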
You can check the commands to be executed by using:
ueps video --split
If the commands look good, execute them using:
ueps video --split --execute
Conversion runs in about double real-time if going from mp4 to mp4.
This check step only became necessary after trying the first upload and realising that it was going to take 15 hours for a 9 minute opening session video. While checking the files I also noticed that one contained two talks (i.e. was 75 minutes instead of 45).
ueps video --check
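What such a check amounts to is comparing the duration mediainfo reports with the scheduled duration from the metadata. A rough sketch (the slack value and the use of mediainfo's Inform template are my own choices, not necessarily what ueps does):

# Sketch of a duration sanity check: ask mediainfo for the duration in
# milliseconds and flag files that deviate a lot from the scheduled slot.
import subprocess

def duration_minutes(video_path):
    out = subprocess.check_output(
        ['mediainfo', '--Inform=General;%Duration%', video_path])
    return int(out.decode('ascii').strip()) / 60000.0

def check_duration(video_path, scheduled_minutes, slack=20):
    actual = duration_minutes(video_path)
    if abs(actual - scheduled_minutes) > slack:
        print('suspect: %s is %.0f min, scheduled for %d min'
              % (video_path, actual, scheduled_minutes))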
Some stuff that should be done:
proper utility install
figure out if youtube-upload can set the license for YouTube (the default only applies when going through the browser). Answer: no, that has to be done once by hand. To edit all: https://www.youtube.com/my_videos?o=U and select the videos, then Actions; for the playlist: https://www.youtube.com/playlist?list=PL8uoeex94UhGGUH0mFb-StlZ1WYGWiJfP&action_edit=1
some mechanism to gracefully stop long operations (i.e. multiple scheduled uploads or conversions); for uploads, as the program to upload is called multiple times, you can briefly make the upload call return immediately in the source code.
check actual upload speeds to archive.org/YouTube from a Hetzner based server
see if the announcers can mark the start and end timings, so the cut contains no announcement and no afterwards blabla about "The next talk will be in five minutes ...."
it takes about a minute on average per video to get the exact start and end points and enter them; longer if the lead time is big (Anywhere Room videos, live streams)
add testing microphone loudness before announcement?
check that the final output has at least a certain length (in case the input was wrongly specified as a minute of video or so)
tell the announcers to clap their hands away from the microphone?
use VLC --start-time (seconds) and --stop-time to verify cut on videos
should probably include track/room name in description on YouTube
should keep track of how much time taken for uploading and store
put the whole thing under docker?
should we include track name/conference room in the description of the talk:
[EuroPython 2015] [22 July 2015] [Bilbao, Euskadi, Spain] [En Español]
update the https://www.youtube.com/user/PythonItalia EuroPython editions
using UPDATE in a talk description is of no use.
use the full talk name after conversion, before uploading to archive.org, as this name is what is used for downloading as well (currently RR_YYYYMMDD_XM_#.mp4 if cut)
deleting the target of a link using find:
find B2/2015-07-2[012] -type l -printf '%l\0' | xargs -0 rm -v
describe the use of tmux on the server (scrolling: Ctrl+B, PageUp)
use avconv's -metadata or -map_metadata to include metadata in the files that are uploaded (https://libav.org/documentation/avconv.html#Metadata), using an .ini-like file (http://jonhall.info/how_to/dump_and_load_metadata_with_ffmpeg)
Should include:
;FFMETADATA1
title=Configuration file readability: beyond ConfigFile and JSON.
artist=Anthon van der Neut
album=EuroPython2015
date=2016-07-20
genre=lecture
copyright=2014 Creative Commons Attribution
synopsis=bla bla bla bla bla bla bla bla bla bla
dump:
avconv -i "file.mp4" -f ffmetadata metadata.txt
load:
avconv -ss 00:00:10 -i "file.mp4" -t 00:20:15 -i metadata.txt \
    -map_metadata 1 -c:a copy -c:v copy out.mp4
mark old commands as deprecated and remove them.
some lock mechanism to prevent two processes uploading at the same time.
check upload tag set (two times EuroPython2015 on archive.org)