How I converted an AWS whitepaper pdf to audio mp3 using Amazon Polly and S3…
The AWS Certified Solutions Architect - Professional exam is infamously difficult, not just because it covers such a wide range of topics, serverless among them, but because it also expects a strong intuition for how to approach various use cases. Even seasoned pros have failed it more than once. 🤭
One of the top test-prep recommendations I’ve heard from certified professionals, besides getting hands-on experience, is to read the recommended whitepapers, listed under “Exam Preparation” in the exam blueprint.
As I mentioned in my article about the Developer Associate certification, the AWS whitepapers for Kindle from Amazon.com make it easier to make steady progress through them.
However, I mostly stopped reading books ten years ago and have been listening to them ever since. Unfortunately, the AWS whitepapers are not yet on Audible.
So I decided to convert them to audio and upload them to SoundCloud to help myself and other auditory AWS learners. If you’d like to know how to take a pdf and convert it to mp3, read on. Otherwise, you can listen to the two mp3s on SoundCloud.
For now, we’ll do this manually. In the future, we may cover how to automate this with Lambda.
1. Download the pdf from the AWS whitepapers page or from the exam blueprint.
2. Convert it to .txt with a converter such as zamzar.com/convert/pdf-to-txt.
3. Delete the Notices, the Contents, and the sentence about feedback.
4. You may also want to remove the page headers so they don’t interrupt the flow.
5. Remove the page numbers with the regex /Page \d* of \d*/gi. regexr.com is useful if that didn’t make sense.
6. If the whitepaper is 100k to 200k characters long (longer than about 45 pages), split it in half into a Part 1 and a Part 2, as I did with my SoundCloud tracks. A single asynchronous Polly synthesis task accepts at most 100,000 billed characters, so each half has to stay under that limit.
7. Create a bucket in S3, or decide which existing bucket you want to use.
8. Assuming you’ve already set up your AWS CLI, start the conversion with the aws polly start-speech-synthesis-task command, passing your .txt file as the text, your bucket as the output location, and mp3 as the output format.
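If you convert whitepapers often, the text cleanup and splitting steps in this list are easy to script. Here’s a minimal Python sketch, not part of the manual workflow above: it applies the page-number regex from the list (translated to Python’s re module), drops lines that exactly match header strings you pass in, and splits long text into roughly equal parts at line breaks. The function names are hypothetical, and the 100,000-character ceiling is an assumption based on Polly’s billed-character limit for asynchronous tasks.

```python
import math
import re

# The page-number regex from the list, /Page \d* of \d*/gi, in Python form:
# re.sub replaces every match (like the "g" flag) and IGNORECASE is the "i" flag.
PAGE_NUMBER = re.compile(r"Page \d* of \d*", re.IGNORECASE)

# Assumed ceiling: Polly's asynchronous synthesis tasks accept up to 100,000
# billed characters, so each part should stay at or below this length.
MAX_CHARS = 100_000


def clean_whitepaper_text(text: str, header_lines: tuple[str, ...] = ()) -> str:
    """Remove "Page N of M" markers and drop lines that exactly match a header."""
    text = PAGE_NUMBER.sub("", text)
    headers = {h.strip() for h in header_lines}
    kept = [line for line in text.splitlines() if line.strip() not in headers]
    return "\n".join(kept)


def split_for_polly(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into roughly equal parts of at most max_chars characters.

    Each cut lands on the last newline (or, failing that, space) before the
    target size, so a 100k-200k character paper usually becomes two halves.
    """
    parts = []
    rest = text
    while len(rest) > max_chars:
        parts_left = math.ceil(len(rest) / max_chars)  # parts still needed
        target = math.ceil(len(rest) / parts_left)     # equal share, never above max_chars
        cut = rest.rfind("\n", 0, target + 1)
        if cut <= 0:
            cut = rest.rfind(" ", 0, target + 1)
        if cut <= 0:
            cut = target  # no break point found: hard cut at the target size
        parts.append(rest[:cut])
        rest = rest[cut:].lstrip()  # drop the whitespace we split on
    if rest:
        parts.append(rest)
    return parts
```

For example, split_for_polly(clean_whitepaper_text(raw_text)) returns a list you can write out as part1.txt, part2.txt, and so on, then convert each file separately.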
You can use the Amazon Polly console to experiment with voices and languages before you settle on a --voice-id and --language-code. Polly even has Welsh and Indian English.
The start-speech-synthesis-task page in the AWS CLI Command Reference lists the valid values for both options.
I chose a British voice, with --language-code en-GB.
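You may have also noticed that I used three forward slashes after the file protocol. The AWS CLI reads the file at whatever path follows file://, and on a Mac an absolute path starts with its own slash, hence the third one. On Windows, the path starts with a drive letter, so it’s just two (for example, file://C:\whitepaper.txt). On a Mac, the easiest way to get the full path is to open your favorite browser, choose Open File..., and pick the .txt file you want to use. The browser will open the file, and you can copy the address from the address bar and paste it into your CLI. If the file name contains spaces, the browser shows them as %20, so rename the file first.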
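If you’d rather script the Polly step than type it, here’s a minimal Python sketch that assembles an aws polly start-speech-synthesis-task argument list. It builds the file:// argument by prefixing an absolute path, which is exactly where the third slash comes from on a Mac. The default voice, Brian, is an assumption: it’s one of Polly’s British English voices, not necessarily the one used for my SoundCloud tracks. The helper names are hypothetical.

```python
import os


def polly_text_arg(txt_path: str) -> str:
    """Build the AWS CLI file:// argument for a local .txt file.

    The CLI reads the file at whatever path follows "file://". On macOS/Linux an
    absolute path starts with "/", giving file:///...; on Windows it starts with
    a drive letter, giving file://C:\\...
    """
    return "file://" + os.path.abspath(txt_path)


def build_polly_command(
    bucket: str,
    txt_path: str,
    voice_id: str = "Brian",  # assumption: any en-GB Polly voice works here
    language_code: str = "en-GB",
    key_prefix: str = "",
) -> list[str]:
    """Assemble the argument list for an asynchronous Polly synthesis task."""
    cmd = [
        "aws", "polly", "start-speech-synthesis-task",
        "--output-format", "mp3",
        "--output-s3-bucket-name", bucket,
        "--text", polly_text_arg(txt_path),
        "--voice-id", voice_id,
        "--language-code", language_code,
    ]
    if key_prefix:
        # Optional: group the mp3s under a "folder" in the bucket.
        cmd += ["--output-s3-key-prefix", key_prefix]
    return cmd
```

To actually start the task, pass the list to subprocess.run, e.g. subprocess.run(build_polly_command("my-whitepaper-audio", "whitepaper-part1.txt"), check=True), where the bucket and file names are placeholders. The CLI prints a JSON description of the task, including its TaskId.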
9. Wait a couple of minutes for Polly to finish the synthesis task.
10. Go to the S3 console, or use the CLI, to download your mp3. Put it on a playlist and listen on your phone while you do the dishes or fold the laundry.
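Rather than guessing at the wait in step 9, you can poll the task until it finishes. Here’s a minimal Python sketch: wait_for_task takes any function that returns the task’s current status and polls until it reports completed or failed, which are Polly’s terminal TaskStatus values. cli_task_status is one hypothetical way to supply that function, using aws polly get-speech-synthesis-task with the TaskId printed when the task started.

```python
import subprocess
import time
from typing import Callable

# Polly reports TaskStatus as "scheduled", "inProgress", "completed", or "failed".
TERMINAL_STATUSES = {"completed", "failed"}


def cli_task_status(task_id: str) -> str:
    """Hypothetical helper: ask the AWS CLI for a synthesis task's current status."""
    result = subprocess.run(
        [
            "aws", "polly", "get-speech-synthesis-task",
            "--task-id", task_id,
            "--query", "SynthesisTask.TaskStatus",  # JMESPath filter on the JSON response
            "--output", "text",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def wait_for_task(
    get_status: Callable[[], str],
    poll_seconds: float = 15.0,
    timeout_seconds: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until the task reaches a terminal status, then return it."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"task still {status!r} after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

For example, wait_for_task(lambda: cli_task_status("your-task-id")) blocks until the task is done. Once it reports completed, the OutputUri field in the get-speech-synthesis-task output points at the mp3 in your bucket.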