JagPDF

3.3.  Text

JagPDF supports the following font types:

  • Standard 14 Fonts - A set of Type 1 fonts which needn't be embedded into a PDF document. Using these fonts significantly reduces the size of a PDF document in comparison with other font types.
  • TrueType, OpenType - The OpenType font format is a backward-compatible extension of the TrueType font format. TrueType is fully supported, including subsetting. OpenType comes in two flavors:
    • With TrueType outlines - fully supported including subsetting.
    • With PostScript outlines - supported without subsetting.
[Note]

For certain font formats, the PDF specification recommends embedding of font files into PDF documents. Font subsetting is a technique that embeds only a subset of a font. A font subset is a portion of a regular font file that contains information only about the glyphs that are actually used in a document. Font subsetting results in smaller PDF documents.
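
To make the idea of a subset concrete, the sketch below collects the distinct characters used by a few strings; a subsetting embedder would need glyph data only for these. This is a simplification that assumes one glyph per character, and used_characters() is a hypothetical helper, not a JagPDF function.

```python
def used_characters(strings):
    """Return the sorted distinct characters appearing in the given strings.

    Hypothetical helper for illustration only. It assumes one glyph per
    character, which real fonts do not guarantee (e.g. ligatures).
    """
    return sorted(set("".join(strings)))


# The strings shown on a page determine which glyphs a subset must contain.
page_strings = ["Text", "Courier Text", "DejaVu Sans"]
chars = used_characters(page_strings)
print(len(chars), "distinct characters:", "".join(chars))
```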

Let's look at the following line which places a string on a page:

canvas.text(50, 760, "Text")

The text is shown using the current font in the graphics state. As we did not specify a font, a default one is used. The default font is specified by the 'fonts.default' option (see Profile). The following code snippet changes the current font:

courier = doc.font_load("standard; name=Courier; size=14")
canvas.text_font(courier)
canvas.text(50, 780, "Courier Text")

Now that we have loaded a standard font with font_load(), we can use it anywhere in the document with text_font(). Let's look at an example of loading a font from a file:

dejavu = doc.font_load("file=DejaVuSans.ttf; size=14")
canvas.text_font(dejavu)
canvas.text(50, 800, "DejaVu Sans")

The current font is reset to the default font (i.e. 'fonts.default') at the beginning of each page.

Font Matching

On Windows, it is possible to select a font by specifying some of its attributes:

verdana = doc.font_load("name=Verdana; size=14; bold; enc=windows-1252")
canvas.text_font(verdana)
canvas.text(50, 800, "Verdana Text")

This invokes the Windows system font selection mechanism, which selects a font from the system font database according to our font specification.
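
The font specification strings used so far share a simple layout: items separated by semicolons, each either a key=value pair or a bare keyword such as 'standard' or 'bold'. The sketch below splits such a string into a dictionary to make that layout explicit. parse_font_spec() is a hypothetical helper written for this illustration; it does not show how JagPDF itself parses the string.

```python
def parse_font_spec(spec):
    """Split a font specification string into a dict (illustration only)."""
    result = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        if "=" in item:
            key, value = item.split("=", 1)
            result[key.strip()] = value.strip()
        else:
            result[item] = True  # bare keyword such as 'standard' or 'bold'
    return result


verdana_spec = parse_font_spec("name=Verdana; size=14; bold; enc=windows-1252")
courier_spec = parse_font_spec("standard; name=Courier; size=14")
print(verdana_spec)
print(courier_spec)
```

Values are kept as strings here; converting 'size' to a number is left out to keep the sketch short.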

Font Selection Summary

Examples

font selection, Windows font matching

Font Selection Reference

font_load(), text()

PDF Reference

the PDF Reference, chapter Text

We should distinguish between the following encoding types:

  • A character encoding encodes letters, numerals, and other symbols as code points.
  • A font encoding is a correspondence between code points and glyph descriptions. Every font has a built-in encoding.

To show text correctly, the character encoding of the text being shown must match the current font's built-in encoding. PDF allows changing the font's built-in encoding. This is useful, for instance, when the text uses a different encoding than the one built into the font.

When a font is loaded by font_load(), we can change the font's built-in encoding by using the 'enc' option. Note that we must ensure that the character set defined by the new encoding (or at least the subset we use in our document) is a subset of the character set present in the font.
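
Before choosing a value for 'enc', it can help to check whether a given 8-bit encoding can represent the text at all. The sketch below does this with Python's standard codecs. It only tests the character set of the encoding; whether the font actually contains the corresponding glyphs is a separate question that this check does not answer. can_encode() is a hypothetical helper.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Czech sample text containing characters outside Western European encodings.
sample = u'\u017elu\u0165ou\u010dk\u00fd k\u016f\u0148'

candidates = ('iso-8859-2', 'windows-1250', 'windows-1252', 'utf-8')
results = dict((enc, can_encode(sample, enc)) for enc in candidates)
print(results)

# The same character maps to different code points in different encodings.
z_latin2 = u'\u017e'.encode('iso-8859-2')
z_cp1250 = u'\u017e'.encode('windows-1250')
print(z_latin2, z_cp1250)
```

The last two lines illustrate why the text's character encoding and the font encoding have to agree: the byte for the same letter differs between ISO-8859-2 and windows-1250.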

JagPDF internally supports a subset of encodings listed here. For more details see font_load().

Refer to Language Specific Notes for details on how JagPDF handles strings passed from Python and Java.

Before we proceed, let's define a Unicode string variable which we will use in the rest of this section. The human-readable form of this string is žluťoučký kůň úpěl.

unicode_text = u'\u017elu\u0165ou\u010dk\u00fd k\u016f\u0148 \u00fap\u011bl'

The Type 1 font format supports only 8-bit encodings. However, JagPDF allows Unicode text to be used with this type of font.

In the following example we will load the standard Helvetica font, change its built-in encoding to the ISO-8859-2 encoding and set it as the current font:

font = doc.font_load("standard; name=Helvetica; size=14; enc=iso-8859-2")
canvas.text_font(font)

Now, we can show our ISO-8859-2 encoded text on the page:

text = unicode_text.encode('iso-8859-2')
canvas.text(50, 800, text)

Here is an example of using Unicode text with a standard font.

font = doc.font_load("standard; name=Helvetica; size=14; enc=utf-8")
canvas.text_font(font)
canvas.text(50, 780, unicode_text)
[Note]

Even though JagPDF allows changing the font's built-in encoding to UTF-8, we are in practice limited to the Latin alphabet because of the character set used by the standard fonts.

If no encoding is specified, the font's built-in encoding is used. Character sets, built-in encodings and other details about the standard 14 fonts can be found here.

The TrueType and OpenType formats support various encodings. We can specify any encoding JagPDF internally supports.

In the following example we will load DejaVuSans.ttf (assuming that it resides in the current directory), specify the windows-1250 encoding and set it as the current font:

font = doc.font_load("file=DejaVuSans.ttf; size=14; enc=windows-1250")
canvas.text_font(font)

Now, we can show our windows-1250 encoded text on the page:

text = unicode_text.encode('windows-1250')
canvas.text(50, 800, text)

We can also specify the UTF-8 encoding:

font = doc.font_load("file=DejaVuSans.ttf; size=14; enc=utf-8")
canvas.text_font(font)
text = unicode_text.encode('utf8')
canvas.text(50, 780, text)

Python unicode objects are automatically converted to UTF-8 when passed to JagPDF's text-showing functions, so we can use unicode_text here as well:

canvas.text(50, 760, unicode_text)

If no encoding is specified, the windows-1252 encoding is used.

Examples

encoding

Encoding Reference

font_load(), text()

PDF Reference

the PDF Reference, chapter Text

So far we have used only a simple method for showing text. We will generalize it by introducing a text object. The following line should be familiar to us:

canvas.text(50, 800, "text")

It is actually a shortcut for the following sequence:

canvas.text_start(50, 800)
canvas.text("text")
canvas.text_end()

Here, we started a text object with text_start() and moved the origin of text space by (50, 800). Then we showed a text string and closed the text object with text_end().

A text object encloses a sequence of operations that show text, move the text position or adjust the text state. The text state is described in detail in the Text State section. Here we will mention two of its parameters which are defined only within a text object:

  • Text matrix - together with some of the text state parameters, defines text space.
  • Text line matrix - captures the value of the text matrix at the beginning of a line of text.

The text coordinates are interpreted in text space. The transformation from text space to user space is specified by the combination of the text matrix and several text state parameters. A text-showing operation shows the first glyph of a text string at the origin of text space. Initially, the origin of text space corresponds to the origin of user space, translated by the values passed to text_start(). The text matrix is updated by both text-showing and text-positioning operations. Text space is described in detail in the PDF Reference, chapter Text | Text Objects | Text Space Details.

There are certain restrictions imposed on a text object. Text objects cannot be nested. The following table lists canvas operations that can appear in a text object:

Category          Description
Text State        All operations described in section Text State.
Text Positioning  text_translate_line()
Text Showing      Showing operations that do not specify coordinates, i.e. the four overloads of text() without coordinate arguments.
Graphics State    Any operations modifying graphics state except state_save(), state_restore() and user space transformations.

So when should we use a text object? Because text operations inside a text object update the text matrix, a text object is suitable for a sequence of text strings, such as a paragraph of text.

To illustrate, let's look at the following example:

canvas.text_start(50, 800)
for n in ['1st', '2nd', '3rd']:
    canvas.text(n)
    canvas.text(' line')
    canvas.text_translate_line(0, -15)
canvas.text_end()

In this example we have shown three lines of text within a text object. We used text_translate_line() to move to the next line.
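
To see where each line starts, the toy model below tracks only the line origin of a text object. It assumes that text_translate_line() moves relative to the start of the current line, as the PDF Td operator does; TextCursor is a hypothetical class for illustration and is not part of JagPDF.

```python
class TextCursor(object):
    """Toy model of the line origin inside a text object (not JagPDF code)."""

    def __init__(self, x, y):
        # corresponds to text_start(x, y)
        self.line_x, self.line_y = x, y

    def translate_line(self, tx, ty):
        # assumed semantics: offset from the start of the current line
        self.line_x += tx
        self.line_y += ty


cursor = TextCursor(50, 800)
line_origins = []
for label in ['1st', '2nd', '3rd']:
    line_origins.append((label, cursor.line_x, cursor.line_y))
    cursor.translate_line(0, -15)

for entry in line_origins:
    print(entry)
```

Under this assumption the three lines start 15 units apart, one below the other, regardless of how wide the text shown on each line is.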

The following example is similar. It shows multi-line text in which each line is shown in a different shade of gray.

canvas.text_start(50, 800)
for perc in range(10, 100, 10):
    canvas.color('f', perc/100.0)
    canvas.text('gray %d%%' % perc)
    canvas.text_translate_line(5, -15)
canvas.text_end()

See other examples of using a text object in the Font Information section.

Examples

text

Text Object Reference

text_start(), text_end(), text_translate_line(), text(), text(), text(), text()

PDF Reference

the PDF Reference, chapter Text | Text Objects

The text state is a subset of the graphics state parameters which are related to text operations. The parameters are initialized with default values at the beginning of each page. Text state changes made by the following operations can appear outside a text object and are retained across text objects:

Text Font

We are already familiar with text_font(), which sets the current font.

Character Spacing

canvas.text_character_spacing(0.25)
canvas.text(50, 640, "character spacing")

Horizontal Scaling

canvas.text_horizontal_scaling(50)
canvas.text(50, 660, "horizontal scaling")

Text Rise

canvas.text_start(50, 680)
canvas.text("text rise ")
canvas.text_rise(5)
canvas.text("superscripted ")
canvas.text_rise(-5)
canvas.text("subscripted")
canvas.text_end()

Text Rendering Mode

font = doc.font_load('standard; name=Helvetica; size=36')
canvas.text_font(font)
canvas.color('f', 0.5)
canvas.text_rendering_mode('f')
canvas.text(50, 720, "fill text")
canvas.text_rendering_mode('s')
canvas.text(50, 760, "stroke text")
canvas.text_rendering_mode('fs')
canvas.text(50, 800, "fill and stroke text")

Refer to text_rendering_mode() to see all available rendering modes.

Text State Summary

Examples

text state

Text State Reference

text_character_spacing(), text_horizontal_scaling(), text_rendering_mode(), text_rise(), text_font()

PDF Reference

the PDF Reference, chapter Text | Text State Parameters and Operators

PDF allows individual glyph positioning. When showing a string we can specify additional values for individual glyphs. These values adjust the text position. The value is expressed in thousandths of a unit of text space. It is subtracted from the current horizontal coordinate.

To illustrate, let's look at the following example:

canvas.text(50, 750, "AWAY")
canvas.text(50, 765, "AWAY", [-220, -220, -195], [1, 2, 3])

The first string is shown according to the glyph metrics found in the current font. In the second case, we passed two lists along with the string. The first list specifies offsets, the second one their positions in the string. Let's describe exactly what happens. First, A is painted. Then the text position is moved by 220 units to the right and W is painted. The same is repeated for each offset value in the list. So the distances between A and W, W and A, and A and Y are 220, 220 and 195 units respectively. In the next example we will change the width of the space between two words to 300 units:

canvas.text(50, 785, "AWAY again")
canvas.text(50, 800, "AWAY again", [-300], [4])
[Note]

When kerning is turned on, both a glyph adjustment and a kerning adjustment can be specified for the same position. In such a case, the glyph adjustment takes precedence.
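
Because offsets are given in thousandths of a text space unit and are subtracted from the horizontal coordinate, it can be handy to translate them into user space units. The sketch below does that under two assumptions that are not stated in the examples above: a font size of 14 and horizontal scaling left at 100%. adjustment_in_user_units() is a hypothetical helper.

```python
def adjustment_in_user_units(offset, font_size):
    """Convert a glyph offset to a horizontal shift in user space units.

    offset    -- value in thousandths of a text space unit
    font_size -- current font size (assumes 100% horizontal scaling)

    The offset is subtracted from the horizontal coordinate, so a
    negative offset yields a positive shift, i.e. a move to the right.
    """
    return -offset / 1000.0 * font_size


shift = adjustment_in_user_units(-220, 14)
print("shift to the right: %.2f" % shift)
```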

Glyph Positioning Summary

Examples

glyph positioning

Individual Glyph Positioning Reference

text(), text(), text(), text()

PDF Reference

the PDF Reference, chapter Text | Text Objects | Text-Showing Operators

Advanced layout engines (e.g. ICU Layout Engine) usually output glyph indices instead of Unicode code points. The reason is that a font can contain glyphs that do not have corresponding Unicode code points. Typical examples are fonts for complex scripts.

Let's show an example of how JagPDF supports showing of text specified by glyph indices:

dejavu = doc.font_load("file=DejaVuSans.ttf; size=14")
canvas.text_font(dejavu)
# glyph indices for 'g', 'l', 'y', 'p', 'h', and 's'
glyphs = [74, 79, 92, 83, 75, 86]
canvas.text_glyphs(50, 800, glyphs)
canvas.text_glyphs(50, 780, glyphs, [-130.0, -130.0], [2, 3])

Note that the second text_glyphs() call individually positions the glyphs at positions 2 and 3. We can also retrieve the width of the glyph at a given index:

# width of the glyph at index 74, i.e. 'g'
dejavu.glyph_width(74)

Glyph Indices Summary

Examples

glyph indices

Glyph Indices Reference

text_glyphs(), text_glyphs()

Besides uniquely identifying a font, an instance of class Font retrieved from font_load() also provides information about the font. Let's look at an example that explores font properties.

font = doc.font_load('standard; size=14; name=Helvetica')
family = font.family_name()
is_bold = font.is_bold()
is_italic = font.is_italic()
size = font.size()
baseline_distance = font.height()
ascender = font.ascender()
descender = font.descender()
bbox = (font.bbox_xmin(), font.bbox_ymin(),
        font.bbox_xmax(), font.bbox_ymax())

In the next example we will illustrate font metrics visually. Let's define a function that draws the metrics of a given glyph.

def show_glyph_metrics(x, y, font, char):
    advance = font.advance(char)
    # bounding box
    bbox_w = font.bbox_xmax() - font.bbox_xmin()
    bbox_h = font.bbox_ymax() - font.bbox_ymin()
    canvas.rectangle(x+font.bbox_xmin(), y+font.bbox_ymin(), bbox_w, bbox_h)
    # advance
    canvas.move_to(x, y)
    canvas.line_to(x + advance, y)
    # ascender & descender
    canvas.rectangle(x, y + font.descender(),
                     advance, font.ascender() - font.descender())
    # show it
    canvas.path_paint('s')
    canvas.text(x, y, char)

Note that all numeric values retrieved from Font are in default user space units. Now we can load a font and show glyph metrics, e.g. for 'g':

f = doc.font_load('standard; size=60; name=Helvetica')
canvas.text_font(f)
show_glyph_metrics(140, 740, f, 'g')

The result of the above example can be found in the PDF file referenced from the Summary of this section.

Font properties are useful when we want to perform more advanced text formatting. In the following example we will show how to use font properties to implement various line alignment styles. Let's start by defining the required line width:

line_width = 558

Now we will load a font and retrieve the width of the text string which is going to be aligned:

font = doc.font_load('standard; size=16; name=Times-Roman')
txt = "Everything should be made as simple as possible, but no simpler."
text_width = font.advance(txt)

First, let's align our text to the left, which requires nothing special:

canvas.text_start(20, 420)
canvas.text_font(font)
canvas.text(txt)
canvas.text_translate_line(0, font.height())

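Now we will calculate the padding, which represents the amount of whitespace that must be added to our text string to match line_width. The gap is measured in user space units, so we convert it to thousandths of a text space unit: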

gap = line_width - text_width
padding = -1000.0 / font.size() * gap

We can align the string to the right by offsetting the first glyph by padding:

canvas.text(txt, [padding], [0])
canvas.text_translate_line(0, font.height())

Or we can center it:

canvas.text(txt, [padding / 2.0], [0])
canvas.text_translate_line(0, font.height())

Finally, a justified text string can be achieved by distributing the padding evenly among the spaces:

spaces = [i for i in range(len(txt)) if txt[i] == ' ']
num_spaces = len(spaces)
canvas.text(txt, num_spaces * [padding / num_spaces], spaces)
canvas.text_end()
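
The four alignment styles differ only in the offsets and positions passed to text(). The sketch below gathers that arithmetic into one pure function so it can be tried with concrete numbers and without a document. alignment_offsets() and the sample widths are hypothetical; the formulas are the ones used in the snippets above.

```python
def alignment_offsets(txt, text_width, line_width, font_size, style):
    """Return (offsets, positions) for aligning txt within line_width.

    Widths are in user space units; offsets are in thousandths of a
    text space unit, as expected by the glyph positioning lists.
    """
    gap = line_width - text_width
    padding = -1000.0 / font_size * gap
    if style == 'left':
        return [], []
    if style == 'right':
        return [padding], [0]
    if style == 'centered':
        return [padding / 2.0], [0]
    if style == 'justified':
        spaces = [i for i, ch in enumerate(txt) if ch == ' ']
        if not spaces:
            return [], []  # nothing to stretch
        return [padding / len(spaces)] * len(spaces), spaces
    raise ValueError("unknown alignment style: %r" % style)


# Made-up numbers: a 80-unit-wide string on a 100-unit line at size 10.
right = alignment_offsets("a b c", 80.0, 100.0, 10.0, 'right')
justified = alignment_offsets("a b c", 80.0, 100.0, 10.0, 'justified')
print(right)
print(justified)
```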

In this example we will construct a very primitive text formatter capable of various paragraph alignment styles.

The first thing we need is a line-breaking function which, for a given paragraph and line width, finds line breaks and calculates the width of the individual lines.

def line_breaking(font, para, width):
    lines = []
    space = font.advance(' ')
    line, curr_width = [], 0
    for word in para.split():
        adv = font.advance(word)
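        # break only if the current line already holds a word; otherwise
        # an over-wide first word would produce an empty line
        if line and curr_width + space + adv > width: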
            lines.append((' '.join(line), curr_width - space))
            line, curr_width = [], 0
        curr_width += space + adv
        line.append(word)
    if line:
        lines.append((' '.join(line), curr_width - space))
    return lines
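
The line-breaking logic can be tried without a PDF document by substituting a stand-in font. In the sketch below, FixedWidthFont is a hypothetical class in which every character has the same advance; the function restates the logic above, with a guard so that an over-wide first word does not produce an empty line.

```python
class FixedWidthFont(object):
    """Hypothetical stand-in for a Font: every character is equally wide."""

    def __init__(self, char_width):
        self.char_width = char_width

    def advance(self, text):
        return self.char_width * len(text)


def line_breaking(font, para, width):
    lines = []
    space = font.advance(' ')
    line, curr_width = [], 0
    for word in para.split():
        adv = font.advance(word)
        # start a new line when the next word would not fit
        if line and curr_width + space + adv > width:
            lines.append((' '.join(line), curr_width - space))
            line, curr_width = [], 0
        curr_width += space + adv
        line.append(word)
    if line:
        lines.append((' '.join(line), curr_width - space))
    return lines


font = FixedWidthFont(char_width=5)
lines = line_breaking(font, "the quick brown fox jumps", 60)
for text, text_width in lines:
    print("%-12s %d" % (text, text_width))
```

Each returned pair holds the line text and its width, which is exactly what an alignment step needs to compute the remaining gap.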

Now we can implement a text formatting function. We will find line breaks for each paragraph and align its lines according to alignment.

def format_text(x, y, canvas, font, text, width, alignment):
    canvas.text_font(font)
    canvas.text_start(x, y)
    for para in text.split('\n'):
        lines = line_breaking(font, para, width)
        for line, line_width in lines:
            gap = width - line_width
            padding = -1000.0 / font.size() * gap
            if alignment == 'left':
                canvas.text(line)
            elif alignment == 'right':
                canvas.text(line, [padding], [0])
            elif alignment == 'centered':
                canvas.text(line, [padding / 2.0], [0])
            elif alignment == 'justified':
                spaces = [i for i in range(len(line)) if line[i] == ' ']
                num_spaces = len(spaces)
                if not num_spaces or line == lines[-1][0]:
                    canvas.text(line)
                else:
                    canvas.text(line, num_spaces * [padding / num_spaces], spaces)
            canvas.text_translate_line(0, -font.height())
        canvas.text_translate_line(0, 1.5 * -font.height())
    canvas.text_end()

Now that we have our text formatter, we can try it out. long_text() is a function returning a multi-paragraph text string.

font = doc.font_load('standard; size=10; name=Helvetica')
format_text(20, 822, canvas, font, long_text(), 558, "justified")

Examples

font info

Font Information Reference

class Font
