Monthly Archives: October 2022

Making subs2srs cards using transcripts for FRENCH & german

1wKPU9.md.png
1wKb85.md.png <-gotta find mine too!!!

1wKaOr.md.png

The caveat is that you need subtitles whose timing matches the video. The text of the subtitles does not need to match the dialogue.

Subtitle files that MATCH the French dub do not exist (maybe for a couple of shows on Netflix?), but there are a LOT of transcripts for the French dubs of American shows on hypnoseries. Do note that sometimes they paste the French subtitles rather than the actual dub into the VF section, so the only way to know whether the VF is the transcript of the dub is to get the video/audio and open your ears. So I thought of a way to generate subs2srs decks using a transcript.

So what I need is: the video with French dub audio; subs whose timing matches the video (either English subs, or French subs whose text does not match the audio); notepad; potplayer; and subs2srs. I usually use the English subs since those are the ones with the best timing, plus it helps with getting comprehensible input.

<- absolutely obsessed with this song for some reason. on loop for a day

To finagle a subs2srs deck out of the transcript + vid + subtitles I used notepad, potplayer, aegisub, subs2srs, subtitle edit, and a few websites. This method is not geared towards generating a deck that contains every single line from the episode; it is for mining lines. I love seeing how the number of lines I mine goes down with each ep (well, there might be ups and downs, but the general trend is downward since French is easy for English speakers). A less labor-intensive approach would be to just mine the dub transcript text for anki cards sans audio/screenshot, using vocabtracker.com or the chrome plugin for vocabtracker, or to mark lines to mine with + and then generate the audio with the azure plugin on the latest version of anki. Azure TTS is pretty kickass~ I usually set azure tts audio to the fastest speed I can handle.

1wpjAq.md.png

general steps are

  1. get the transcript
  2. get subtitles + video
  3. use potplayer + autohotkey to mine lines of interest
  4. generate a subs2srs deck + condensed audio

♆➸➸➸♡➸➸➸♡ ♕♘ ♞♜ ❅


BREAKDOWN of the steps

1) After confirming the transcript matches the DUB and is not a copy-paste of the French subtitles, I paste the transcript from hypnoseries into http://www.unit-conversion.info/texttools/add-line-breaks/#data to add line breaks before ( and after ), and also after . and sometimes after ? and !
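For anyone who would rather script this step than use the website, here is a minimal Python sketch of the same idea. The function name `add_line_breaks` and its flag are made up for illustration, and it will also break after abbreviations like "M." since it only looks at punctuation.

```python
import re


def add_line_breaks(transcript, split_on_question_exclaim=True):
    """Put parenthetical notes and sentences on their own lines."""
    text = transcript
    # Break before "(" and after ")" so stage directions get their own line.
    text = re.sub(r"\s*\(", "\n(", text)
    text = re.sub(r"\)\s*", ")\n", text)
    # Break after sentence-ending punctuation that is followed by whitespace.
    enders = r"[.?!]" if split_on_question_exclaim else r"\."
    text = re.sub(rf"({enders}+)\s+", r"\1\n", text)
    # Drop empty lines and stray spaces left over from the substitutions.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(add_line_breaks("Bonjour. Ça va ? (Il rit) Oui !"))
```

Passing `split_on_question_exclaim=False` only breaks after periods, for the episodes where splitting on ? and ! chops things up too much.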

2) get subtitles from subscene and/or SOUS TITRE. You gotta get subs that match the video; they’re either gonna be from dvds or NETFLIX (ie grey’s anatomy). I use Aegisub for editing one subtitle file and subtitleedit to batch process many subs: convert the frame rate from 23.976 to 25 if necessary. I also shift the times if necessary, ie I had to shift by 1 second for some show. If the episode is like 40 minutes long and the subs are 42 minutes long, you have to convert the frame rate of the subs. Then I find and replace \n with a space, THEN find and replace space with _ so it’s like i_am_sad rather than i am sad. I also find and replace . with ._ and ? with ?_ and ! with !_ to ensure all the lines contain an underscore _. The order of the find and replace is important here! I first eliminate line breaks, then replace all the spaces with underscores. I prefer using English subs since the timing is usually better, and some French subs have subpar, abysmal timing.
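The arithmetic behind the frame-rate fix: every timestamp gets multiplied by 23.976 / 25 ≈ 0.959, so a cue at 42:00 lands at roughly 40:17. Below is a rough Python sketch of that rescaling plus the underscore replacements, in the same order as described above. It assumes a plain .srt file with `HH:MM:SS,mmm` timestamps; all the function names are hypothetical, and Aegisub/subtitleedit remain the tools actually used here.

```python
import re

# SRT timestamps look like 00:41:59,500 (hours:minutes:seconds,milliseconds).
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_ms(h, m, s, ms):
    """Convert timestamp fields (as strings) to total milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def from_ms(total):
    """Format milliseconds as an SRT timestamp, clamping at zero."""
    total = max(0, round(total))
    h, rest = divmod(total, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def retime(line, src_fps=23.976, dst_fps=25.0, shift_ms=0):
    """Rescale every timestamp in a line, then apply a fixed shift."""
    factor = src_fps / dst_fps
    return TIMESTAMP.sub(
        lambda match: from_ms(to_ms(*match.groups()) * factor + shift_ms), line
    )


def underscore(text):
    """Apply the find-and-replace steps in the order that matters."""
    text = text.replace("\n", " ")  # 1. eliminate line breaks
    text = text.replace(" ", "_")   # 2. spaces become underscores
    for mark in ".?!":              # 3. ". ? !" become "._ ?_ !_"
        text = text.replace(mark, mark + "_")
    return text


def convert_srt(srt_text, src_fps=23.976, dst_fps=25.0, shift_ms=0):
    """Retime and underscore every cue of an SRT document."""
    blocks = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed or empty cues
        index, timing, body = lines[0], lines[1], "\n".join(lines[2:])
        new_timing = retime(timing, src_fps, dst_fps, shift_ms)
        blocks.append("\n".join([index, new_timing, underscore(body)]))
    return "\n\n".join(blocks) + "\n"
```

One thing the sketch makes visible: a one-word line with no punctuation (say `Oui`) still ends up without an underscore, so it would slip past the _ filter later on.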

3) use potplayer to autoloop the video by subtitle. Went over this in my parallel text post
https://choronghi.wordpress.com/2021/05/11/making-parallel-text-with-deepl/

4) go through the ep in potplayer using loop by subtitle line. When I find a line of interest I mark the line in the transcript with +, make a line break by pressing enter, press numpad 4 to copy the subtitle on the potplayer screen, and paste the line into notepad. Rinse and repeat, and make sure stuff stays in order. I suppose + would be problematic if math is featured in the episode; guess you could come up with something else besides +.

Making PARALLEL TEXT with DeepL

I ended up making 2 autohotkey scripts MEDIAFIRE LINK

paste bin: https://pastebin.com/jwwwikkC && https://pastebin.com/dFGvYRs1

So now when I press Right Enter the script presses numpad 4 to copy the line from potplayer, presses the HOME key to bring you to the beginning of the line you are on in notepad, adds +++, presses enter, inserts ####, then presses control v to paste the potplayer line, then inserts ### under the line. Because I have to rely on the enter key I have to set notepad to format->word wrap OFF. I press the RIGHT enter while notepad is active. The potplayer global shortcut numpad 4 does not require potplayer to be the active window.

1wKyyb.md.png

I also made another script for copying the line the insertion point is on. When I press / it duplicates the line I’m on in notepad. This is necessary since they sometimes break a sentence in half in the subs, and all the manual clicking etc adds up.

so for the most part I am pressing 7, 9, /, right enter, and the arrow buttons to mine my sentences 

1bBpqQ.md.png

5) filter text
https://onlinetexttools.com/filter-text
paste the text and

filter for _
and filter for +
If you didn’t screw up, you should end up with the same number of lines in both. Paste them into excel or an excel-like program to confirm, and also check against the .tsv that subs2srs generates.
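The filter-and-count check can also be done offline. This is a small Python sketch, assuming the notepad text has the marked transcript lines (containing +) and the pasted subtitle lines (containing _) on separate lines; the function names are invented for the example.

```python
def split_mined_lines(notes):
    """Return (subtitle_lines, transcript_lines) from the notepad text."""
    lines = [line.strip() for line in notes.splitlines() if line.strip()]
    # Same two filters as on the website: keep lines with "_" / with "+".
    subtitle_lines = [line for line in lines if "_" in line]
    transcript_lines = [line for line in lines if "+" in line]
    return subtitle_lines, transcript_lines


def describe_counts(subtitle_lines, transcript_lines):
    """Say whether the two filtered lists have the same number of lines."""
    n_subs, n_marked = len(subtitle_lines), len(transcript_lines)
    if n_subs == n_marked:
        return f"OK: {n_subs} subtitle lines, {n_marked} marked transcript lines"
    return f"MISMATCH: {n_subs} subtitle lines vs {n_marked} marked transcript lines"


if __name__ == "__main__":
    notes = "+Je suis triste.\nI_am_sad._\nune ligne ignorée\n+Pourquoi ?\nWhy?_\n"
    subs, marked = split_mined_lines(notes)
    print(describe_counts(subs, marked))
```

Equal counts only prove the totals agree, not that each pair lines up, so the side-by-side check in a spreadsheet is still worth doing.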

1bB7Ma.md.png

1bBOhe.md.png

6) append ; to the end of each line of the text that contains _
using this website https://nimbletext.com/Live/788319538/
this site is very cool! https://nimbletext.com/HowTo/ManipulateText

Paste both into excel or an excel-like program to check that they match. If they don’t, figure out why.
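Appending the ; is a one-liner if you would rather not use the website. A minimal sketch, assuming the filter text should keep one subtitle line per row with a ; at the end of each (that mirrors what the website step produces; how subs2srs treats the line breaks is not something this sketch verifies). The function and file names are hypothetical.

```python
from pathlib import Path


def build_contains_filter(subtitle_lines):
    """Append ';' to every underscore line, one line per row."""
    return "\n".join(line.strip() + ";" for line in subtitle_lines if line.strip())


def save_filter(subtitle_lines, path="filter.txt"):
    """Write the filter to a .txt file that can be opened from subs2srs."""
    Path(path).write_text(build_contains_filter(subtitle_lines), encoding="utf-8")


if __name__ == "__main__":
    print(build_contains_filter(["I_am_sad._", "Why?_"]))
```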

7) generate the subs2srs deck and condensed audio using the filter you generated and the subtitle file. Paste the ; crap into only generate cards for lines that CONTAIN, or save the text into a .txt and open the .txt in the designated area in subs2srs. I think I only have the option of opening the .txt in the designated area because the text is so long. Hit preview a couple of times before generating to ensure the lines are being filtered as desired. I pad the timing for the deck in subs2srs, and I do not pad the timing in subs2srs for the condensed audio. For the condensed audio I open the subtitle file in aegisub, add lead-ins/lead-outs, and tell it to make adjacent subtitles continuous under timing post-processor.

1bBlCk.md.png

You can only generate the subs2srs deck using ONE subtitle file, which is the one with the _ that you used to mine from potplayer. If you use 2 subtitle files it will combine lines etc and mess up the total number of lines generated.

7b) combine the mp3s of condensed audio with makeitone mp3 album maker, then run truncate silence in audacity.

1wp8Z3.md.png

8) paste the subs2srs file into excel. I always find and replace quotation marks to get rid of ” before pasting. Paste the text you filtered for + next to the expression column. If they don’t match up, figure out why and fix it by shifting etc; if it’s off by one it should be an easy fix. Even if the number of lines is the same, it’s possible that the lines are off by one somewhere, so be sure to spot-check here and there.
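The same side-by-side paste can be sketched in Python. This assumes the subs2srs output is tab-separated with one card per line; the actual column layout depends on your subs2srs settings, so nothing here relies on it beyond appending one extra column. The function name is made up.

```python
def paste_side_by_side(tsv_text, transcript_lines):
    """Append each marked transcript line as a new last column of its row."""
    rows = [line.split("\t") for line in tsv_text.splitlines() if line.strip()]
    if len(rows) != len(transcript_lines):
        raise ValueError(
            f"{len(rows)} cards vs {len(transcript_lines)} transcript lines"
        )
    merged = []
    for row, marked in zip(rows, transcript_lines):
        # Get rid of straight and curly quotation marks before pairing up.
        cleaned = [
            cell.replace('"', "").replace("“", "").replace("”", "") for cell in row
        ]
        # Drop the leading + marker(s) from the transcript line.
        merged.append(cleaned + [marked.lstrip("+").strip()])
    return merged
```

The count check only catches missing or extra lines. Since the subtitle text and the transcript are in different languages, an off-by-one in the middle still has to be found by eye.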

9) translate the filtered text that contains + with deepl and paste the translation into excel

10) import into anki and paste the media files into anki’s collection.media folder. I usually replace the _ with a space before importing into anki. Do note that the [.mp3] area etc also has _, so make sure you only do the find and replace on the columns that contain the dialogue.
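Here is a minimal sketch of that column-limited find and replace, assuming the rows are already split into cells. Which column indexes hold dialogue is an assumption you have to supply (`dialogue_cols` is a made-up parameter), since it depends on how the deck was generated.

```python
import re


def restore_spaces(rows, dialogue_cols):
    """Turn underscores back into spaces, but only in the dialogue columns."""
    fixed = []
    for row in rows:
        new_row = list(row)
        for col in dialogue_cols:
            if col < len(new_row):
                # Runs like "._" + "_" collapse to a single space.
                new_row[col] = re.sub(r"_+", " ", new_row[col]).strip()
        fixed.append(new_row)
    return fixed


if __name__ == "__main__":
    rows = [["[sound:ep1_001.mp3]", "I_am_sad.__Really?_"]]
    print(restore_spaces(rows, dialogue_cols=[1]))
```

The media column keeps its underscores, so the file names still match what is in the media folder.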

11) learn using the subs2srs deck! I use lingoes pop-up dictionary and goldendict to look up words. Use morphman if you want. I LIKE morphman because I can feed myself 1t = i + 1 sentences and I can make the unknown word huge.

1bXvaZ.md.png

1bXnLK.md.png

1bXYlm.md.png

1bXMkP.md.png

b is the line before and a is the line after

♆➸➸➸♡➸➸➸♡ ♕♘ ♞♜ ❅

links to transcript sites:

German:

notable shows are x-files, charmed, angel, buffy
http://www.tv-scripte.de/
also, for subs2srs decks for anime dubs, check out the german discord for refold.

French:

harry potter 1st movie
https://docs.google.com/document/d/1x5sKp9CVl1w3EjgJrJyemvzOwMTzbRbw3gHTK2YHwNc/edit?usp=sharing txt version

charmed http://charmed.fantasy.free.fr/charmedfantasy.html

buffy http://bufyvs.free.fr/series/btvs/transcript.php

simpsons https://www.simpsonspark.com/guide-des-episodes

after selecting an episode, click on Voir le script de l’épisode en VF
I personally recommend episodes 6.25 & 7.1 which are infamous mystery episodes.

hypnoseries. After finding the show click on les episodes-> scroll down -> click on vf. If you only see VO there’s no french dub transcript. 

There are some shows where they paste the French subtitles (not the dub dialogue) into the VF section. Usually it has the subtitle timing and stuff, so it’s very obvious.

These are some shows with VF section!  there’s lots more
gossip girl
desperate housewives 1×1 to 1×3, 1×8, 2×1
grey’s anatomy season 1,
lost

heroes
big bang theory
charmed
buffy the vampire slayer
bones
arrow

nothing as in no vf section:
MAD MEN, breaking bad, better call saul
How to get away with murder
CROWN
Schitts creek
x-files
south park

allegedly VF (I didn’t check it myself) : friends, angel, one tree hill, smallville, supernatural, flash, prison break, veronica mars , pretty little liars, lucifer, glee

stuff pasted in VF isn’t the dub dialogue: game of thrones, malcolm in the middle, how i met your mother

♆➸➸➸♡➸➸➸♡ ♕♘ ♞♜ ❅

for the VIDS – find the French dub by googling the show name plus vf regarder, or get the dvds since they have French dubs on them… though for the simpsons I think it’s the Canadian French dub rather than the French dub, so the transcript will not match

for German dubs, search for the show name plus folge on dailymotion (angel, charmed, buffy). there’s also some other special sites out there~

https://i.postimg.cc/qvHXNZWp/111.png

https://i.postimg.cc/7Y2nYysB/222.png

learning from a dub definitely saves you the hassle of reading through a lotta entries in the dictionary, since they did a good job with the French dub. i don’t own any rubber stamps… marge is known for having an unpleasant voice and it’s more unpleasant in the French dub imo lol