This article was originally posted on WomenLearnThai.com.
Intensive Reading vs. Extensive Reading…
Extensive reading is a language learning technique characterized by reading a lot, at or slightly below your current level of proficiency, without looking up unknown words. If the level of books or texts chosen is appropriate, unknown words or grammatical structures can be inferred from context. Extensive reading is basically reading for pleasure, but it is very beneficial in terms of solidifying existing knowledge, acquiring new vocabulary, increasing reading speed, and (depending on what you read) expanding cultural understanding. The nice thing about extensive reading is that it is fun (if you like reading, of course), with language learning being just a by-product. The focus is on meaning, not on language. Extensive reading is often neglected in language schools because it has to be done alone and can’t be assessed or tested.
Intensive reading, on the other hand, is slow, careful reading of a short text. Here, the focus is on understanding (almost) every word, every sentence. Often the text is beyond your current reading ability, but because you go slowly, you can tackle it. Intensive reading can be used to familiarize yourself with new vocabulary, to study vocabulary related to a specific topic, or to find information. It is certainly less fun than extensive reading, but it can have an important role in language learning. As a matter of fact, intensive reading is often the only reading activity used in classroom settings, and it is heavily used by self-learners as well.
I have seen recommendations to balance the time spent on extensive and intensive reading at a ratio of about 4:1, which seems quite reasonable to me. In this blog post, however, instead of championing the extensive reading cause, I want to talk about intensive reading assisted by a freely available, open-source program.
Intensive reading is quite time-consuming, and most of that time is spent looking up vocabulary, taking notes, searching for notes, and looking up the same words again. Unless you’re extremely well organized, you will find that you look up many words more than once when you encounter them again in a new text. Some time also often goes into highlighting new words and expressions, or otherwise visually structuring the text. This has inspired some people to write software that handles these more tedious tasks and makes intensive reading easier. One of those projects is the Foreign Language Text Reader (FLTR), which is open-source and can be installed and configured quite easily.
Foreign Language Text Reader…
FLTR basically works as follows: You load a text. The text is then displayed for reading, but words come color-coded. Words never seen before are blue, unknown words take shades between red and yellow/green, and known words are a pale green. While going through the text, you will mark new words as either known or unknown. If they are unknown, you can look them up in up to three online dictionaries with a single mouse click. Then you annotate the words (translations, explanations, pronunciation etc.), and this information is stored. When you encounter the word again, it will show up in its color code (there are five or six of them, from unknown to well known), and hovering over the word will reveal the notes you typed (or rather copied) in earlier. As time progresses, FLTR will learn which words you know and which you don’t, and will help you to focus on new and unknown words.
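The bookkeeping behind this can be pictured as a small table of terms, each with a learning status and a note. The Python sketch below only models the behavior described above. It is not FLTR's code or file format. The five-step status scale, the color names, and the names `Term`, `color` and `hover` are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Illustrative status scale and shades (assumed, not taken from FLTR itself).
SHADES = {1: "red", 2: "orange", 3: "yellow", 4: "yellow-green", 5: "pale green"}

@dataclass
class Term:
    status: int     # 1 = unknown ... 5 = known (assumed scale)
    note: str = ""  # translation, explanation, pronunciation, ...

def color(word: str, vocab: Dict[str, Term]) -> str:
    """Color a word: blue if never seen, otherwise the shade for its status."""
    term = vocab.get(word)
    return "blue" if term is None else SHADES[term.status]

def hover(word: str, vocab: Dict[str, Term]) -> Optional[str]:
    """Return the stored note for a word (what hovering would show), if any."""
    term = vocab.get(word)
    return term.note if term is not None else None

vocab = {"เครื่องกล": Term(status=2, note="my dictionary note")}
print(color("เครื่องกล", vocab), hover("เครื่องกล", vocab))  # prints: orange my dictionary note
print(color("ตลาด", vocab))  # prints: blue
```

Looking a word up once and storing the note is what makes the later hover review possible.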
In this picture, the mouse is hovering over เครื่องกล.
What’s cool about this? Firstly, you look up words only once, and then you can review them just by hovering over them. Secondly, instead of leafing through paper dictionaries or typing words into an online search box, you look them up with a single mouse click. Thirdly, the color coding helps you identify what’s new, what you’ve seen before but is still unfamiliar, and so on. Instead of letting you read over those words, the colors make them stand out a bit and remind you of their existence. The color coding is also a good visual indication of how difficult the text is going to be: lots of blue and red words mean work ahead.
There are also testing options as well as the possibility to export terms to Anki, but I haven’t used those features and can’t comment on them.
Setting up FLTR is pretty straightforward, with simple and clear instructions. Language configuration is also simple: options include setting the font and font size and specifying up to three dictionaries for automated look-up (if the website allows that). Below you’ll find a screenshot of my settings. I link to the monolingual Royal Institute Dictionary (which doesn’t support automated look-up), Google image search, and a longdo dictionary containing many Thai-Thai definitions. (I don’t use translations, but if you do, you’ve got many more choices.)
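Automated look-up works when a dictionary site accepts the search term as part of its URL. The sketch below shows the idea: the Thai word is percent-encoded as UTF-8 and substituted into a URL template. Several details are hypothetical: the `{word}` placeholder, the `lookup_url` name, and the `dictionary.example` address. FLTR's own configuration may use a different placeholder syntax, so check its instructions.

```python
from urllib.parse import quote

def lookup_url(template: str, word: str) -> str:
    """Fill a hypothetical {word} placeholder with the UTF-8 percent-encoded term."""
    return template.replace("{word}", quote(word))

url = lookup_url("https://dictionary.example/search?q={word}", "ไป")
print(url)  # prints: https://dictionary.example/search?q=%E0%B9%84%E0%B8%9B
```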
The only problem with Thai is this: Thai doesn’t use spaces to separate words, but FLTR relies on spaces to identify words. So, unlike with languages such as French or Indonesian that use spaces to mark word boundaries, we need to prepare (‘parse’) the text before loading it into FLTR.
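Splitting on whitespace is the simplest way a program can find words, and it shows the problem quickly: it returns a whole Thai sentence as a single "word". The example sentences are made up for illustration.

```python
thai = "ฉันไปตลาด"             # "I go to the market", written without spaces
french = "je vais au marché"   # the same sentence in French

print(thai.split())    # prints: ['ฉันไปตลาด']  (one token for the whole sentence)
print(french.split())  # prints: ['je', 'vais', 'au', 'marché']
```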
A Thai Parser…
I haven’t been able to find a Thai parser on the web. It wouldn’t even have occurred to me to write my own, but a visitor to my website Thai Recordings told me that he had written one, and that gave me the idea (thanks! :)). Coming up with a basic parser is actually quite simple: if you have some programming skills, you can do it yourself within a few hours. The parser requires a list of words (I use the FLTR vocabulary file for that) and inserts zero-width spaces (Unicode U+200B) into the Thai text. Zero-width spaces are invisible, but FLTR recognizes them as word separators. It was very important to me to find a space character that is invisible, because I’m so used to reading Thai without spaces that I get confused when I have to read spaced-out Thai.
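The two ingredients, a word list and the zero-width space, are easy to handle in Python. The sketch below assumes the vocabulary file has one term per line, with the term in the first tab-separated column. That layout is an assumption, so check it against your own FLTR file. `load_words` is a hypothetical helper name.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible when rendered

def load_words(path: str) -> set:
    """Collect the first tab-separated column of each line as a word (assumed layout)."""
    words = set()
    # utf-8-sig also copes with a byte-order mark at the start of the file
    with open(path, encoding="utf-8-sig") as f:
        for line in f:
            term = line.split("\t", 1)[0].strip()
            if term:
                words.add(term)
    return words

spaced = ZWSP.join(["ฉัน", "ไป", "ตลาด"])
print(spaced)              # displays like ฉันไปตลาด
print(len(spaced))         # prints: 11 (9 Thai characters plus 2 zero-width spaces)
print(spaced.split(ZWSP))  # prints: ['ฉัน', 'ไป', 'ตลาด']
```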
I use Python, which comes with my Mac, and keep a terminal open to process texts.
Here’s what the parser does:
1. Read in dictionary D (from the FLTR vocabulary file, which is a tab-separated text file).
2. Read in the text.
3. For every ‘sentence’ S (a run of Thai characters between two spaces) of the text, set i = j = 1 and repeat the following steps until i moves past the end of S:
4. Define the snippet X = S(i, j), i.e., the characters in S from position i to position j.
5. If X is a word in D, note down this snippet.
6. If j has reached the end of S, go to 7; otherwise set j = j+1 and go to 4.
7. If snippets have been identified as words, choose the longest of them, insert zero-width spaces to separate it from its neighbors, set i = j = the index of the character right after that word, and start over at 4.
8. If no snippets have been identified as words, set i = j = i+1 and start over at 4.
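The eight steps above can be sketched in Python as follows. This is a minimal sketch of the greedy longest-match idea, not the author's actual script. The function names `segment_chunk` and `segment_text` are hypothetical, and indices are 0-based instead of 1-based. D is assumed to be a Python set of words, for example built from the FLTR vocabulary file.

```python
import re

ZWSP = "\u200b"  # zero-width space: invisible, but usable as a word separator

def segment_chunk(chunk, words):
    """Segment one space-free 'sentence' S using greedy longest match (steps 4-8)."""
    pieces = []     # known words and runs of unmatched characters, in order
    unmatched = ""  # characters at which no dictionary word starts (step 8)
    i = 0
    while i < len(chunk):
        # Steps 4-6: try every snippet chunk[i:j+1] and keep the longest word.
        longest = ""
        for j in range(i, len(chunk)):
            if chunk[i:j + 1] in words:
                longest = chunk[i:j + 1]  # later matches are always longer
        if longest:
            # Step 7: close off any unmatched run, emit the word, jump past it.
            if unmatched:
                pieces.append(unmatched)
                unmatched = ""
            pieces.append(longest)
            i += len(longest)
        else:
            # Step 8: no word starts here; keep the character and move on.
            unmatched += chunk[i]
            i += 1
    if unmatched:
        pieces.append(unmatched)
    return ZWSP.join(pieces)

def segment_text(text, words):
    """Step 3: segment each whitespace-separated chunk, keeping the original spacing."""
    parts = re.split(r"(\s+)", text)  # the capture group keeps the separators
    return "".join(p if p.isspace() else segment_chunk(p, words) for p in parts)

words = {"ฉัน", "ตลาด"}  # tiny example dictionary; ไป is deliberately missing
print(segment_text("ฉันไปตลาด", words).split(ZWSP))  # prints: ['ฉัน', 'ไป', 'ตลาด']
```

The unknown characters between two known words (here ไป) stay together as one chunk. That chunk is the new word that should then appear in blue. The inner loop tries every snippet up to the end of S. Stopping once j exceeds the length of the longest dictionary word would be an easy speed-up.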
The parser finds the longest word at the current position and then restarts on the remainder. If no word starts at the first character, it tries the second, then the third, and so on, so the first word it finds may lie in the middle of the ‘sentence’. The more words the parser has in its dictionary, the more likely it is that new words end up isolated between known words. Those words will then show up in blue in FLTR and can be marked as already known or still unknown. Once they have been marked, they’re in the database and increase parsing accuracy.
This parser is not perfect. It doesn’t work very well at first: if several new words appear in a row, they stay together as one unsegmented chunk, and a manual update of the database might be required to split it. It also can’t distinguish between มา-กลับ and มาก-ลับ: both are valid splits of มากลับ, and the parser always takes the longest word first. The first issue disappears over time, but the second stays (and would require semantic parsing to be resolved). If you have ideas on how to deal with those issues, please let me know in the comments!
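A word list alone cannot settle this kind of ambiguity. Enumerating every way a chunk splits into dictionary words shows why. The sketch below uses plain recursion; the function name `all_segmentations` and the tiny dictionary are hypothetical. With that dictionary, มากลับ has two valid splits, and longest-match always picks the one starting with มาก.

```python
def all_segmentations(chunk, words):
    """Return every way to split chunk entirely into words from the set `words`."""
    if not chunk:
        return [[]]  # exactly one way to split the empty string: no words
    results = []
    for end in range(1, len(chunk) + 1):
        prefix = chunk[:end]
        if prefix in words:
            # Combine this first word with every split of the remainder.
            for rest in all_segmentations(chunk[end:], words):
                results.append([prefix] + rest)
    return results

words = {"มา", "มาก", "กลับ", "ลับ"}
print(all_segmentations("มากลับ", words))
# prints: [['มา', 'กลับ'], ['มาก', 'ลับ']]
```

Both candidates are made entirely of known words. Choosing between them needs information the word list doesn't contain, which is where semantic parsing would come in. The recursion can grow exponentially with chunk length, so this is only meant for short examples.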
FLTR is a great little piece of software. It supports intensive reading and facilitates vocabulary work (whether monolingual or using translation). Look-ups are one click away, notes (or translations) are stored and show up when hovering over the word, and the color coding can be a useful visual aid. The only inconvenience is the necessity to have a parser, but a basic parser is not too difficult to write yourself.