- Segment the text region of the image. For text region detection, you can apply morphological operations, such as dilation and erosion, to the binary image.
- Filter the candidate regions by area, aspect ratio, or any other characteristic that is relevant to your test cases.
- Measure the font size: once the text regions have been identified, you can analyze the bounding box dimensions to estimate the font size. You can calculate the height, width, or diagonal length of the bounding box to approximate the font size in pixels.
How do you know the font size in an image?
KALYAN ACHARJYA on 8 Jun 2023
Edited: KALYAN ACHARJYA on 9 Jun 2023
One way (with limitations depending on the input data). Steps:
Please note that the segmentation steps, such as the morphological operations and the text region identification, may vary depending on your image and the desired results. You may have to adjust those steps or their parameters accordingly.
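As a rough illustration of the steps in this answer (segment, filter, measure), here is a minimal sketch in plain Python. It is not the answerer's code: the function names, the toy image, the structuring element size, and the `min_area` and `min_aspect` thresholds are all assumptions made up for the example. It assumes the image is already binarized and the text is horizontal, and it only reports the bounding box height in pixels, which is not the same thing as the nominal font size.

```python
from collections import deque

def dilate(img, rx=1, ry=0):
    """Binary dilation with a (2*ry+1) x (2*rx+1) rectangular element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for yy in range(max(0, y - ry), min(h, y + ry + 1)):
                    for xx in range(max(0, x - rx), min(w, x + rx + 1)):
                        out[yy][xx] = 1
    return out

def bounding_boxes(img):
    """Inclusive (top, left, bottom, right) box of each 4-connected blob."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

def text_region_heights(img, min_area=6, min_aspect=1.5):
    """Merge glyphs into word blobs, filter them, return box heights in px."""
    # Horizontal-only dilation (ry=0) bridges gaps between neighbouring
    # glyphs without changing the height of the resulting blob.
    merged = dilate(img, rx=1, ry=0)
    heights = []
    for top, left, bottom, right in bounding_boxes(merged):
        box_h, box_w = bottom - top + 1, right - left + 1
        # Assumed filter: drop specks and blobs that are not wider than tall.
        if box_h * box_w >= min_area and box_w / box_h >= min_aspect:
            heights.append(box_h)
    return heights

# Toy image: three 2x4 "glyphs" separated by 1 px gaps, plus one noise pixel.
glyph_row = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0]
blank_row = [0] * 12
noise_row = [0] * 11 + [1]
image = [blank_row, glyph_row, glyph_row, glyph_row, glyph_row,
         blank_row, noise_row]

print(text_region_heights(image))  # [4]
```

On real images the dilation radius and the filter thresholds would need tuning per image, as noted above.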
DGM on 8 Jun 2023
Edited: DGM on 8 Jun 2023
The short answer is that you probably can't -- at least not accurately, consistently, or in the manner that's probably expected. The lack of stated constraints or requirements only makes the problem more complicated and less likely to have a satisfactory solution.
You have text in an image. That could be any number of things.
It could be plain grid-aligned text, but that doesn't mean that it's necessarily going to be easy to segment.
There's nothing saying it's grid-aligned. The cases where one wants to know about the fonts used in a design are also cases where it's likely that the designer did something awful with the text.
There's also nothing saying that the text is a synthetic component of the image. Was this paper printed in 14pt or 16pt?
Or maybe you expect a miracle.
In all cases, you need to be able to
- isolate the text from the background
- orient the text so that the characters have consistent orientation and scale
- get information about its scale with respect to whatever the desired "size" metric is
It's easy to come up with cases where the first of these (isolating the text) is difficult.
Orienting the text might be possible, but probably difficult. Would you be able to tell the difference between text that's been shaped by baseline displacement and text that's been shaped by deforming the characters?
If the text has been scaled, would you know? Would you have any reference to indicate what it once was? If you had some clean synthetic text like this, which part of the text is indicative of the "size"? Is it 40pt text that's been scaled up at one end, or is it 56pt text that's been scaled down at one end? Is it 72pt text that's been scaled both ways and then resized afterward? Similarly, if it's a photo, is there any spatial calibration information available?
Would you be able to do any of these things automatically with any degree of reliability?
Let's now assume all our images can be reduced to clean binarized text that's grid-aligned. Now it's worth asking at this point what "size" actually means. What are the units (px, pt, in, mm)? Are we talking about the size of the text within the image space, or the size of the text within some physical space represented in a photo? Are we talking about the nominal size of the font (i.e. the em height), or are we talking about the size of the particular text (i.e. the bounding box of a word)?
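To make the units question concrete, here is a small sketch of how one pixel measurement maps to different physical sizes. The only fixed facts used are 72 pt per inch and 25.4 mm per inch. The function name and the `dpi` values are assumptions for the example, and in practice the resolution is often unknown or meaningless for an arbitrary image.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4

def px_to_units(height_px, dpi):
    """Express a pixel height in in, pt, and mm for an assumed resolution."""
    return {
        "px": height_px,
        "in": height_px / dpi,
        "pt": height_px * POINTS_PER_INCH / dpi,
        "mm": height_px * MM_PER_INCH / dpi,
    }

# The same 40 px measurement under two assumed resolutions.
for dpi in (96, 300):
    sizes = px_to_units(40, dpi)
    print(dpi, "dpi:", round(sizes["pt"], 2), "pt,", round(sizes["mm"], 2), "mm")
```

The same 40 px comes out as 30 pt at 96 dpi and 9.6 pt at 300 dpi, so a size in points only means something once the resolution (or a spatial calibration) is known.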
Let's make another simplifying assumption. Let's say that our text is purely synthetic and all we care about is the em height in pixels in the image space. We don't need any resolution information or anything. We should just be able to measure the height of the characters directly in the image, right? ... right?
The answer is no. The "size" of a font is its body height or em height. This is often approximately the height between descending and ascending features (or approximately 1.4 times the cap height), but that's often not the case, and the relationship between the em height and character features is inconsistent between fonts. In order to get a better estimate, you would need to identify the font, and you would need knowledge of that particular font's characteristics. Can it be done? Sure, but it's not a simple task.
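A short sketch of why the font matters: if the cap height is measured in pixels, the em height estimate depends entirely on the assumed cap-height-to-em ratio. The 1/1.4 ratio comes from the rule of thumb above. The other two ratios are hypothetical values chosen only to show the spread and do not describe any particular font.

```python
def em_from_cap_height(cap_px, cap_ratio):
    """Estimate em height in px from a measured cap height.

    cap_ratio is the font's cap height as a fraction of its em height.
    """
    return cap_px / cap_ratio

cap_px = 50  # measured cap height in pixels (made-up measurement)

# Assumed ratios: the first is the rule of thumb, the others are hypothetical.
assumed_ratios = {
    "rule of thumb (1/1.4)": 1 / 1.4,
    "hypothetical font A": 0.66,
    "hypothetical font B": 0.73,
}

for label, ratio in assumed_ratios.items():
    print(label, "->", round(em_from_cap_height(cap_px, ratio), 1), "px")
```

With these assumed ratios, the same 50 px cap height gives estimates between roughly 68 and 76 px, which is more than the gap between common nominal sizes.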
If instead you wanted to settle for a simpler approximation based on ascender-descender distance or cap size, then you would need a large enough sample of characters to even get that information. Would you be able to programmatically determine whether you do have enough characters? Would you be able to tell if a bounding box is defined by cap height or ascender/descender height? Do you know whether the ascenders rise above cap height for the given font? Do you know where the baseline is?
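As a rough sketch of the "enough characters" question, suppose the text content is already known (for example from OCR, which is itself an assumption). A crude check can then say which features could be defining the top and bottom of a line's bounding box. The letter sets below are a simplification for typical Latin fonts and are assumptions, not rules: many fonts break them, and the check says nothing about whether ascenders rise above the cap height.

```python
# Assumed letter classes for a typical Latin text font (a simplification).
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")

def extent_features(text):
    """Report which vertical features the characters in text can provide."""
    chars = set(text)
    has_caps = any(c.isupper() for c in chars)
    has_ascenders = bool(chars & ASCENDERS)
    has_descenders = bool(chars & DESCENDERS)
    return {
        "caps": has_caps,
        "ascenders": has_ascenders,
        "descenders": has_descenders,
        # Only then does the box height approximate ascender-descender distance.
        "spans_full_extent": (has_caps or has_ascenders) and has_descenders,
    }

print(extent_features("more cameras"))
print(extent_features("Typography"))
```

The bounding box of "more cameras" is bounded by the x-height and the baseline only, so measuring it as though it spanned ascender to descender would badly underestimate the size.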
Open up the following image.
These are three fairly mundane fonts of the same size and weight. The height of the yellow rectangle is 1 em. The other four rectangles describe the nominal distance between ascenders and descenders (green), the height of ascenders above the baseline (blue), the cap height (purple), and the x-height (orange). These are sized relative only to the first font sample.
Note that ascender and descender heights may vary within a font, and the relationships between cap height, x-height, and ascender height will vary between fonts. None of these things are equal to em height, and while the differences seem subtle in these three examples, the introduction of a script font will throw all subtlety out the window.
I'm aware that the original question was insincere, and that those motives justified its complete lack of specific details. That said, I'll take one last swing and point out that nobody said we were talking strictly about Latin script.