2011-07-08

It seems like the process that gathered the MP3 files for the YT consumer set simply repeated the previous file when it encountered an unavailable file. There are 16 cases where the same file occurs twice (or more) in sequence (here indicated by identical file sizes):

-rw-r--r-- 1 dpwe 1512  762141 2010-11-30 17:16 YT_animal_162.mp3
-rw-r--r-- 1 dpwe 1512  762141 2010-11-30 17:16 YT_animal_163.mp3
-rw-r--r-- 1 dpwe 1512   47140 2010-11-30 17:16 YT_beach_095.mp3
-rw-r--r-- 1 dpwe 1512   47140 2010-11-30 17:16 YT_beach_096.mp3
-rw-r--r-- 1 dpwe 1512 4312474 2010-11-30 17:16 YT_beach_109.mp3
-rw-r--r-- 1 dpwe 1512 4312474 2010-11-30 17:16 YT_beach_110.mp3
-rw-r--r-- 1 dpwe 1512  432952 2010-11-30 17:16 YT_birthday_055.mp3
-rw-r--r-- 1 dpwe 1512  432952 2010-11-30 17:16 YT_birthday_056.mp3
-rw-r--r-- 1 dpwe 1512 3806138 2010-11-30 17:16 YT_birthday_059.mp3
-rw-r--r-- 1 dpwe 1512 3806138 2010-11-30 17:16 YT_birthday_060.mp3
-rw-r--r-- 1 dpwe 1512  224936 2010-11-30 17:16 YT_birthday_123.mp3
-rw-r--r-- 1 dpwe 1512  224936 2010-11-30 17:16 YT_birthday_124.mp3
-rw-r--r-- 1 dpwe 1512  704461 2010-11-30 17:16 YT_dancing_005.mp3
-rw-r--r-- 1 dpwe 1512  704461 2010-11-30 17:16 YT_dancing_006.mp3
-rw-r--r-- 1 dpwe 1512  704461 2010-11-30 17:16 YT_dancing_007.mp3
-rw-r--r-- 1 dpwe 1512  300206 2010-11-30 17:16 YT_dancing_012.mp3
-rw-r--r-- 1 dpwe 1512  300206 2010-11-30 17:16 YT_dancing_013.mp3
-rw-r--r-- 1 dpwe 1512  300206 2010-11-30 17:16 YT_dancing_014.mp3
-rw-r--r-- 1 dpwe 1512  300206 2010-11-30 17:16 YT_dancing_015.mp3
-rw-r--r-- 1 dpwe 1512  864047 2010-11-30 17:16 YT_dancing_019.mp3
-rw-r--r-- 1 dpwe 1512  864047 2010-11-30 17:16 YT_dancing_020.mp3
-rw-r--r-- 1 dpwe 1512  434159 2010-11-30 17:16 YT_dancing_039.mp3
-rw-r--r-- 1 dpwe 1512  434159 2010-11-30 17:16 YT_dancing_041.mp3
-rw-r--r-- 1 dpwe 1512  313967 2010-11-30 17:17 YT_museum_120.mp3
-rw-r--r-- 1 dpwe 1512  313967 2010-11-30 17:17 YT_museum_122.mp3
-rw-r--r-- 1 dpwe 1512  258461 2010-11-30 17:17 YT_music_137.mp3
-rw-r--r-- 1 dpwe 1512  258461 2010-11-30 17:17 YT_music_138.mp3
-rw-r--r-- 1 dpwe 1512 1859267 2010-11-30 17:17 YT_parade_005.mp3
-rw-r--r-- 1 dpwe 1512 1859267 2010-11-30 17:17 YT_parade_006.mp3
-rw-r--r-- 1 dpwe 1512  198445 2010-11-30 17:18 YT_ski_078.mp3
-rw-r--r-- 1 dpwe 1512  198445 2010-11-30 17:18 YT_ski_082.mp3
-rw-r--r-- 1 dpwe 1512  312842 2010-11-30 17:18 YT_volleyball_002.mp3
-rw-r--r-- 1 dpwe 1512  312842 2010-11-30 17:18 YT_volleyball_003.mp3
-rw-r--r-- 1 dpwe 1512  244293 2010-11-30 17:18 YT_wedding_080.mp3
-rw-r--r-- 1 dpwe 1512  244293 2010-11-30 17:18 YT_wedding_081.mp3

I manually inspected the two larger cliques, dancing_005/006/007 and dancing_012/013/014/015. Using the ID-to-YT-URL map, I found that dancing_006, 007, 014 and 015 have indeed been removed, and dancing_013 has some kind of age filter impeding access.

Dan Ellis dpwe@ee.columbia.edu
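
For anyone wanting to repeat the check, the size comparison used above (files adjacent in name order with identical byte counts) might be sketched as follows. This is only an illustration, not the procedure actually used for the listing: the function names repeated_runs and scan_directory are hypothetical, and equal size is only a heuristic, so a checksum or listening test would be needed to confirm that two files are truly the same audio.

```python
import os
from itertools import groupby


def repeated_runs(entries):
    """Group consecutive (name, size) entries that share the same size.

    Returns a list of name lists, one per run of two or more files.
    """
    runs = []
    for _size, group in groupby(entries, key=lambda e: e[1]):
        names = [name for name, _ in group]
        if len(names) > 1:
            runs.append(names)
    return runs


def scan_directory(path, suffix=".mp3"):
    """Return (name, size) pairs for matching files, sorted by name.

    Hypothetical helper; the directory layout is an assumption.
    """
    names = sorted(n for n in os.listdir(path) if n.endswith(suffix))
    return [(n, os.path.getsize(os.path.join(path, n))) for n in names]


# A few entries copied from the listing above.
sample = [
    ("YT_dancing_005.mp3", 704461),
    ("YT_dancing_006.mp3", 704461),
    ("YT_dancing_007.mp3", 704461),
    ("YT_dancing_012.mp3", 300206),
    ("YT_dancing_013.mp3", 300206),
    ("YT_dancing_014.mp3", 300206),
    ("YT_dancing_015.mp3", 300206),
    ("YT_dancing_019.mp3", 864047),
    ("YT_dancing_020.mp3", 864047),
]

for run in repeated_runs(sample):
    print(" ".join(run))
```

On a real copy of the data, repeated_runs(scan_directory(path)) would produce the candidate groups, with the first file in each run presumably being the genuine one and the rest the repeats.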