Get JBP's lecture transcripts and inject them into their appropriate forum topics

jbp

(Benjamin Lupton) #1

It would be fantastic for search and SEO, as well as for our own study, to get the transcripts/captions of Peterson’s lectures into this forum. We could then search the videos by what he says and jump to the specific part of a video where he says it.

YouTube does have an official download UI for the timecoded captions, but it is only available to the channel owner, which is not us.

They also seem to have an API, which I have not yet tried:

https://developers.google.com/youtube/v3/docs/captions/download

However, searching GitHub, there do seem to be several tools:

It would also be easy enough to build a headless browser script to scrape them from the page in real time. It could even become a SaaS.

Additional resources:


(Nick Redmark) #2

http://search.jordanbpeterson.com/


(Benjamin Lupton) #3

Pity they don’t make the data public. Jordan would be able to go a lot further if he embraced open source. Especially as those transcriptions he is making searchable were submitted by the community. He should at least give them access too.


(Nick Redmark) #4

Perhaps you will find more info here (I don’t have access to Reddit right now):


(Benjamin Lupton) #5

It seems he is using one of the scraping tools above, but he is not going into details because he wanted it to be a private commercial venture, which I guess he then sold to Peterson.


(Benjamin Lupton) #6

Great, it seems the v3 API doesn’t require you to be the channel owner:

https://developers.google.com/youtube/v3/docs/captions/list#usage

/**
 * API response
 */
{
  "kind": "youtube#captionListResponse",
  "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/E_2T0GwW9dmWmtzjw7RwvYg5_1o\"",
  "items": [
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/6cGn6jKHpY3WcGkOVfk6GeanGro\"",
      "id": "ymvDzem3dRLeGFRG4tgivCl20PIRXLQ8kDKPRdAP6Sg=",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2017-09-08T05:27:56.760Z",
        "trackKind": "ASR",
        "language": "en",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/QCcw_9DBqRz4-yxMFFQlDZBWZn0\"",
      "id": "MpHP7NxwDIYqCDMsT329zrHkKqXzmNkm",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-03-05T00:31:29.320Z",
        "trackKind": "standard",
        "language": "cs",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/1EwbwicP0kBtg46fLasU7EKyX2U\"",
      "id": "MpHP7NxwDIYhuvON-RXqEFnsHxzyiTRS",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-05-17T21:40:57.897Z",
        "trackKind": "standard",
        "language": "el",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/aKaTWucjFqav4Oz2k1DpHSRM9SI\"",
      "id": "MpHP7NxwDIZ_HlG7HDXPSqzj9FR9zS3u",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2017-10-15T15:02:07.930Z",
        "trackKind": "standard",
        "language": "en",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/7hOnRg7GPYVK2yWHOCr63XyjAZU\"",
      "id": "vvV5Whe6EHaHRmbz__y-vvK6M_OdZLsod0ASc9D0v_c=",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2017-06-22T18:19:13.117Z",
        "trackKind": "standard",
        "language": "es-ES",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/Ig06awSpVMFnowORt8rFx83XFXo\"",
      "id": "MpHP7NxwDIZ48u-oIxyQ-CmWZMRXHcw4",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-10-02T02:33:15.650Z",
        "trackKind": "standard",
        "language": "es",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/5_Lxz46WT3XGgagjG-lxsGCLk44\"",
      "id": "MpHP7NxwDIbE8GEQxiz4CwN7Gf50RT28",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2017-12-13T21:40:28.478Z",
        "trackKind": "standard",
        "language": "fr",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/UwswSanYmZTb0NRwVTqs1jxJBIg\"",
      "id": "MpHP7NxwDIYgSdM-HPqKmH4f3Zq-peRn",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-03-30T17:50:12.543Z",
        "trackKind": "standard",
        "language": "hr",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/upGN5ZyF0ECVGG3g8Hxvw-QnVBM\"",
      "id": "MpHP7NxwDIbgoQmVp1GQNyudzjaJ26r9",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2017-09-05T11:46:10.482Z",
        "trackKind": "standard",
        "language": "pl",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/CbZhgPSn6JU14MPDuJShcyJ730Q\"",
      "id": "MpHP7NxwDIb2Y8vixR1OO4p48lXTDnPc",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-02-02T14:38:52.507Z",
        "trackKind": "standard",
        "language": "pt",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/2XCRWLTJG-rKKJbVSSYdDxv-1-c\"",
      "id": "MpHP7NxwDIa6PTKl4TPWbASgIXcBu9E6",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-01-17T22:05:54.769Z",
        "trackKind": "standard",
        "language": "ro",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/-jBcs6MgqQmuvBBX2HN24-DZhaA\"",
      "id": "MpHP7NxwDIYJv0RmOObEaqoH04xA_wFv",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2017-06-22T18:12:17.678Z",
        "trackKind": "standard",
        "language": "ru",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/Z5oA9p3nspajEtYwsQiMdURLcAQ\"",
      "id": "MpHP7NxwDIZ4v9LGWyx57aI_5dDJE4PW",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-02-02T16:33:07.953Z",
        "trackKind": "standard",
        "language": "sk",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/7gERfTFZ3A_zWsNoxjwWR0av6sA\"",
      "id": "vvV5Whe6EHa0DAKipDDdkPkrr4ZSddXPVnkcKay08Uw=",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-05-01T21:44:58.222Z",
        "trackKind": "standard",
        "language": "zh-CN",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/U4yE0JUgIqizBOLphon5Nesn04o\"",
      "id": "ymvDzem3dRKZpT_d37M1p6xf1Tg3qHiHPJqYKiuthB0=",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-05-01T21:43:14.956Z",
        "trackKind": "standard",
        "language": "zh-Hans",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/8WCIS0L0YFb5rjHjKl3nJ9JTcu8\"",
      "id": "ymvDzem3dRKZpT_d37M1p4pCmiUawPGXTGPhHKptcqs=",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-05-01T21:44:04.105Z",
        "trackKind": "standard",
        "language": "zh-Hant",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/pi-IEqGB5LzcVayleM3r4Wk5tRY\"",
      "id": "vvV5Whe6EHa0DAKipDDdkGLYOz5FfWKP85Yxb1FotZM=",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-05-01T21:46:26.935Z",
        "trackKind": "standard",
        "language": "zh-TW",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"XI7nbFXulYBIpL0ayR_gDh3eu1k/pQOm04hpGNBuUInDwPcutKHkWSg\"",
      "id": "MpHP7NxwDIbnl_FaO1FLwBXxSoy9Fh3U",
      "snippet": {
        "videoId": "f-wWBGo6a2w",
        "lastUpdated": "2018-05-01T21:47:53.879Z",
        "trackKind": "standard",
        "language": "zh",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    }
  ]
}

https://developers.google.com/apis-explorer/#p/youtube/v3/youtube.captions.list?part=id&videoId=f-wWBGo6a2w&_h=3&

https://developers.google.com/apis-explorer/#p/youtube/v3/youtube.captions.download?id=ymvDzem3dRLeGFRG4tgivCl20PIRXLQ8kDKPRdAP6Sg%3D&_h=4&
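Since the list response contains both an auto-generated English track (trackKind "ASR") and a human-made one (trackKind "standard"), something has to choose which id to download. Here is a minimal sketch of that choice. The function name `pick_caption_track` and the "prefer standard over ASR" rule are my own assumptions, not part of the API; the sample only keeps the fields the function reads.

```python
from typing import Optional


def pick_caption_track(response: dict, language: str = "en") -> Optional[dict]:
    """Return the preferred caption track for `language`, or None if there is none.

    Human-made tracks (trackKind "standard") are preferred over
    auto-generated speech recognition tracks (trackKind "ASR").
    """
    candidates = [
        item for item in response.get("items", [])
        if item["snippet"]["language"] == language
    ]
    # False sorts before True, so non-ASR tracks come first.
    # The sort is stable, so tracks of the same kind keep their API order.
    candidates.sort(key=lambda item: item["snippet"]["trackKind"].lower() == "asr")
    return candidates[0] if candidates else None


# Trimmed-down copy of the response above: only the two English tracks.
sample = {
    "items": [
        {"id": "ymvDzem3dRLeGFRG4tgivCl20PIRXLQ8kDKPRdAP6Sg=",
         "snippet": {"language": "en", "trackKind": "ASR"}},
        {"id": "MpHP7NxwDIZ_HlG7HDXPSqzj9FR9zS3u",
         "snippet": {"language": "en", "trackKind": "standard"}},
    ]
}

track = pick_caption_track(sample)
print(track["id"])  # MpHP7NxwDIZ_HlG7HDXPSqzj9FR9zS3u
```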


And it seems youtube-dl has us covered already:

> youtube-dl --skip-download --write-sub --all-subs f-wWBGo6a2w
[youtube] f-wWBGo6a2w: Downloading webpage
[youtube] f-wWBGo6a2w: Downloading video info webpage
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.hr.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.el.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.fr.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.en.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.zh.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.pt.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.ru.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.zh-Hans.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.zh-TW.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.zh-Hant.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.sk.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.zh-CN.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.pl.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.cs.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.ro.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.es-ES.vtt
[info] Writing video subtitles to: Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.es.vtt
> cat "Biblical Series I - Introduction to the Idea of God-f-wWBGo6a2w.en.vtt" | less

WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:09.040
[CLASSICAL MUSIC]

00:00:09.340 --> 00:00:29.020
[APPLAUSE AND CHEERS]

00:00:29.100 --> 00:00:32.300
Well, thank you all very much for coming to this.

00:00:32.300 --> 00:00:37.900
It's really shocking to me that you don't have anything better to do on a Tuesday night. [AUDIENCE LAUGHTER]

00:00:38.800 --> 00:00:40.960
No, but seriously, though, it is.

00:00:40.960 --> 00:00:52.240
I mean, it's very strange in some sense that there's so many of you here to listen to a sequence of lectures on the psychological significance of the Biblical stories.

00:00:52.320 --> 00:01:02.120
It's something I've wanted to do for a long time, but it still does surprise me that there's a ready audience for it.

00:01:03.220 --> 00:01:07.440
So that's good, so we'll see how it goes.

00:01:08.920 --> 00:01:11.280
I'll start with this because this is the right question.

00:01:11.280 --> 00:01:13.720
The right question is why bother doing this.

00:01:13.720 --> 00:01:16.380
And I don't mean why should I bother doing it.
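To get from a .vtt file like this to "jump to the part where he says X", the cues need to be turned into start times and text. Below is a rough sketch that handles only the simple layout shown above (a timing line followed by caption lines, blocks separated by blank lines); it is not a full WebVTT parser. The names `parse_vtt` and `jump_link` are made up for illustration, and the link relies on YouTube's `t=` URL parameter.

```python
import re
from typing import List, Tuple

# Matches timing lines such as "00:00:29.100 --> 00:00:32.300".
CUE_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3}) --> (\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)


def parse_vtt(text: str) -> List[Tuple[float, float, str]]:
    """Parse simple WebVTT text into (start_seconds, end_seconds, caption) cues."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for index, line in enumerate(lines):
            match = CUE_TIMING.match(line.strip())
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                caption = " ".join(lines[index + 1:]).strip()
                cues.append((start, end, caption))
                break  # blocks without a timing line (the header) are skipped
    return cues


def jump_link(video_id: str, start_seconds: float) -> str:
    """Build a YouTube URL that starts playback at the given cue."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start_seconds)}s"


sample = """WEBVTT
Kind: captions
Language: en

00:00:29.100 --> 00:00:32.300
Well, thank you all very much for coming to this.

00:01:08.920 --> 00:01:11.280
I'll start with this because this is the right question.
"""

for start, end, caption in parse_vtt(sample):
    print(jump_link("f-wWBGo6a2w", start), caption)
```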

(Benjamin Lupton) #7

Now the next step will be injecting it all into Discourse.

API Docs:

https://docs.discourse.org

Create:

https://docs.discourse.org/#tag/Topics%2Fpaths%2F~1posts.json%2Fpost

https://docs.discourse.org/#tag/Posts%2Fpaths%2F~1posts.json%2Fpost

Update:

https://docs.discourse.org/#tag/Topics%2Fpaths%2F~1t~1{slug}~1{id}.json%2Fput

https://docs.discourse.org/#tag/Posts%2Fpaths%2F~1posts~1{id}%2Fput

Search:

https://docs.discourse.org/#tag/Search

https://docs.discourse.org/#tag/Categories%2Fpaths%2F~1c~1{id}.json%2Fget
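As a rough sketch of the "Create" step, this builds (but does not send) a request against the `/posts.json` endpoint linked above. It assumes the header-based authentication (`Api-Key` and `Api-Username`) described in the current API docs; the forum URL, key, username, and category id are placeholders, and `build_create_topic_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Optional


def build_create_topic_request(
    base_url: str,
    api_key: str,
    api_username: str,
    title: str,
    raw: str,
    category_id: Optional[int] = None,
) -> urllib.request.Request:
    """Build a POST /posts.json request that creates a new topic."""
    payload = {"title": title, "raw": raw}
    if category_id is not None:
        payload["category"] = category_id
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/posts.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Api-Key": api_key,
            "Api-Username": api_username,
        },
        method="POST",
    )


# Placeholder values: replace with a real forum URL and credentials.
request = build_create_topic_request(
    base_url="https://forum.example.com",
    api_key="PLACEHOLDER_KEY",
    api_username="system",
    title="Biblical Series I: Introduction to the Idea of God (transcript)",
    raw="Transcript text goes here.",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Keeping the request building separate from the sending makes it easy to inspect the payload before anything is posted to a live forum.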


(Benjamin Lupton) #8

So I just went to add the English transcript to Bible 1, and got this:

Body is limited to 64000 characters; you entered 211434.

I went to raise the limit, but the maximum the setting allows is 99000:

max_post_length: Value must be between 0 and 99000.

So I’m not sure how to proceed.
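One possible way around the limit (just a sketch of one option) is to split the transcript at line boundaries into several posts that each fit. At 211434 characters and 64000 per post, that means at least four posts. The helper name `split_into_posts` is made up for illustration.

```python
from typing import List


def split_into_posts(lines: List[str], max_chars: int = 64000) -> List[str]:
    """Group lines into chunks of at most `max_chars` characters each.

    Lines are never cut in half, so a caption stays in one post.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in lines:
        if len(line) > max_chars:
            raise ValueError("a single line exceeds the post limit")
        # +1 accounts for the newline joining this line to the previous one.
        added = len(line) + (1 if current else 0)
        if size + added > max_chars:
            chunks.append("\n".join(current))
            current, size = [line], len(line)
        else:
            current.append(line)
            size += added
    if current:
        chunks.append("\n".join(current))
    return chunks


# Tiny demonstration with a small limit instead of 64000.
demo_lines = ["a" * 10] * 5
demo_posts = split_into_posts(demo_lines, max_chars=25)
print([len(post) for post in demo_posts])  # [21, 21, 10]
```

Each chunk could then go into the topic as its own reply, in order.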


(Benjamin Lupton) #9

Okay, solution here is:


#10

Hi all, just spotted this. For the record, I manually downloaded all the lecture files, plus many from outside his channel, and built a database that indexes each file with metadata including channel, series (like a playlist), etc. I also built code that effectively serializes the content while keeping the boxes of text, that is, the timestamp index still applies to the serialized lecture content, and a search that looks in single and side-by-side boxes.
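To illustrate what searching "single and side-by-side boxes" could look like, here is a small sketch. It is an assumption about the general idea, not the actual code behind the search described above: a phrase matches if it appears within one caption box, or if it straddles the boundary between two adjacent boxes. The name `search_cues` and the `(start, text)` cue shape are hypothetical.

```python
from typing import List, Tuple

Cue = Tuple[float, str]  # (start time in seconds, caption text)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores layout."""
    return " ".join(text.lower().split())


def search_cues(cues: List[Cue], phrase: str) -> List[float]:
    """Return start times of cues where `phrase` occurs in one cue,
    or spans the boundary between that cue and the next one."""
    needle = normalize(phrase)
    hits = []
    for index, (start, text) in enumerate(cues):
        single = normalize(text)
        if needle in single:
            hits.append(start)
            continue
        if index + 1 < len(cues):
            following = normalize(cues[index + 1][1])
            # Only count the pair when the phrase really crosses the boundary;
            # a phrase fully inside the next cue is reported for that cue.
            if needle in single + " " + following and needle not in following:
                hits.append(start)
    return hits


cues = [
    (68.92, "I'll start with this because this is the right question."),
    (71.28, "The right question is why bother doing this."),
    (73.72, "And I don't mean why should I bother doing it."),
]

print(search_cues(cues, "right question"))                # [68.92, 71.28]
print(search_cues(cues, "doing this. And I don't mean"))  # [71.28]
```

The returned start times are what would be used to link a search result back to the right moment in the video.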

I can search all of his lectures, plus additional websites. After a search is performed, it also provides a list of the different channels (i.e. Transliminal, Rubin Report, Pangburn Philosophy… etc.) to narrow down the search results.

I built it for the grand global community at large. I started way back when the search engine built by the other guy, who created the daemon life search, went down for a period of time because he wasn’t maintaining it. That triggered me to start building.

And in November I mentioned I wanted to bring others on to help, because some chaos…


#11

I asked Jordan Peterson, privately and in person, after one of the biblical-stories lectures, when he might be looking for help with the online university, because I was interested in that. I have since determined that he uses some opaque HR methods: I could advertise to him and his team, but I am fairly comfortable assuming I would be firewalled. The only touchpoint between him and my projects was in September 2017, when he tweeted a link and a favourable remark about the Facebook group I built (JBP Liberal discussion group).

I have seen at least one other person doing labours like this mentally crumble in resentment from being ignored. But I’m driven by LOGOS/SOPHIA, and building for everyone. That’s my observation, philosophy and rationale… and why I want to bring the source code over to this site.