Module odt_inspector

This module provides methods for inspecting ODT files.

Author: Vili Auvinen, Juho Tammela, Olli Kauppinen

Functions

_getStyleElementByStyleName(documentDict, styleName)
Gets the style element by the given style name.

_getStyleElementByDisplayName(documentDict, styleName)
Gets the style element by the given style display name.

_getStyleDisplayNameByStyleName(documentDict, styleName)
Gets the style display name by the given style name.

_getMasterPageStyleElement(documentDict, masterPageStyleName)
Gets the master page style element by the given master page style name.

_getPageLayoutElement(documentDict, pageLayoutName)
Gets the page layout element by the given page layout name.

getPageMarginals(documentDict)
Gets the page margins.

getPageSize(documentDict)
Gets the page size.

_getUsedMasterPageElements(documentDict)
Gets all master page elements that are used in the document.

_getDefaultStyleElement(documentDict, styleFamily)
Gets the default style element by the given style family.

_getMasterPageStyleName(documentDict, styleElement)
Gets the master page style name by the given style element.

checkEmptyParagraphs(documentDict)
Checks the document for empty paragraphs.

checkDoubleWhitespaces(documentDict)
Checks for double spaces.

checkTabs(documentDict)
Checks the document for tabulators.

checkAsterisk(documentDict)
Checks the document for asterisks.

_getDocumentParagraphs(documentDict)
Gets all the paragraphs from the document.

_getListOfUsedStyleElements(documentDict)
Gets all used style elements.

_getListOfUsedStyleNames(documentDict)
Gets all used style names.

_checkPageBreakStyleElement(styleElement)
Checks if the given style element contains a page break.

_getPageBreakStyleNames(documentDict)
Gets the names of the styles that contain a page break.

_getAllStyleNamesWithDifferentMasterPage(documentDict)
Gets all the style names that change the master page.

_getSectionBreakElements(documentDict)
Gets the section break elements from the document.

_getTOC(documentDict)
Gets the table of contents.

checkTocContent(documentDict)
Compares the document headings to the TOC entries.

checkTOC(documentDict)
Checks if the document contains a table of contents.

checkIndex(documentDict)
Checks if the document has an alphabetical index.

_getIndexContentFromDocument(documentDict)
Gets the marked alphabetical index entries from the document.

_getIndexContent(documentDict)
Gets the alphabetical index content.

checkIndexContent(documentDict)
Compares the marked texts in the document to the alphabetical index entries.

_getHeadingList(documentDict)
Gets all headings from the document and the used outline level.

checkTable(documentDict)
Checks if the document has a table.

_getTableDict(documentDict)
Gets the tables as a dictionary.

checkPageNumberFromFooterAndHeader(documentDict, masterPageElement, element)
Checks the page number format by the given element and master page element.

getAuthorAndPageNumberFormat(documentDict, masterPageElement)
Gets the author and the number format from the header and the footer.

checkHeadingNumbering(documentDict, errorIdsAndPositions)
Checks the outline style.

_getImagePaths(documentDict)
Gets the image paths.

checkImages(documentDict)
Checks if the document contains an image.

checkList(documentDict)
Checks if the document contains a list.

printLists(documentDict)
Prints the lists of the document.

getObjectPaths(documentDict)
Gets the object paths.

_isStyleUsed(documentDict, styleName)
Checks if the given style is used in the document.

getStyle(documentDict, styleName)
Gets the style definition attributes by the given style name.

_getStyleAttributes(documentDict, styleAttributeList)
Checks whether the style has each wanted attribute. If it does, the attribute value is replaced; otherwise the old value is kept.

_getParentStyleList(documentDict, styleName)
Gets the parent style list for the given style name.

_checkParentStyle(documentDict, styleName)
Checks if the style has a parent style.

checkEndnotesAndFootnotes(documentDict)
Checks the endnotes and the footnotes.

checkImageCaptions(documentDict)
Checks the caption and the reference of the image.

checkCoverPage(documentDict)
Checks that the cover page is done correctly.

getPageNumberFormatAndAuthor(documentDict, section)
Gets the page number format and the author name from the document.

checkHeadersAndFooters(documentDict)
Checks that the headers and the footers of the document are made correctly.

_getSectionElements(documentDict, section)
Gets the elements of the wanted section.

checkSections(documentDict, errorList)
Checks that the document sections have been made correctly.

_getMeta(documentDict)
Gets all the meta information.

getMetaAuthor(documentDict)
Gets the author who last modified the document.

getMetaTitle(documentDict)
Gets the document title from the meta information.

getMetaEdited(documentDict)
Gets the last modified date and time from the meta information.

checkStyleUsage(documentDict, errorIdsAndPositions)
Goes through all the elements in the document that use a style.

Variables

__package__ = 'src.inspectors'

Function Details

_getStyleElementByStyleName(documentDict, styleName)

Gets the style element by the given style name. It searches content.xml first and, if the style is not found there, searches styles.xml.

Returns: The style element if the style exists, otherwise None.

_getStyleElementByDisplayName(documentDict, styleName)

Gets the style element by the given style display name. It searches content.xml first and, if the style is not found there, searches styles.xml.

Returns: The style element with the given display name if it exists, otherwise None.

Note: XML: <style:style style:name="Text_20_body" style:display-name="Text body"> <style:style style:name="tutkielma">

Display name --> style name: " " --> "_20_", "_" --> "_5f_"
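The mapping in the note can be illustrated with a minimal sketch. The helper name display_name_to_style_name is hypothetical and is not part of this module; it handles only the two substitutions listed above.

```python
def display_name_to_style_name(display_name):
    """Encode a display name into the internal style name.

    Hypothetical helper: only the two substitutions from the note
    are handled (space -> _20_, underscore -> _5f_).
    """
    encoded = []
    # Encode character by character so that the underscores produced
    # by "_20_" are not encoded a second time.
    for char in display_name:
        if char == " ":
            encoded.append("_20_")
        elif char == "_":
            encoded.append("_5f_")
        else:
            encoded.append(char)
    return "".join(encoded)


print(display_name_to_style_name("Text body"))
```

A name without spaces or underscores, such as "tutkielma", is returned unchanged.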

_getStyleDisplayNameByStyleName(documentDict, styleName)

Gets the style display name by the given style name. It uses _getStyleElementByStyleName to find the style. If the style has no display-name attribute, the display name is the style name itself.

Returns: The style display name.

Note: XML example:

Display name --> style name: " " --> "_20_", "_" --> "_5f_"

_getMasterPageStyleElement(documentDict, masterPageStyleName)

Gets the master page style element by the given master page style name.

Returns: The master page element.

_getPageLayoutElement(documentDict, pageLayoutName)

Gets the page layout element by the given page layout name.

Returns: The page layout element.

getPageMarginals(documentDict)

Gets the page margins. Only the master pages in use are searched.

Returns: The page margins. If the margins differ between the used master pages, returns False.

See Also: convertCmOrInDictToString

getPageSize(documentDict)

Gets the page size.

Returns: The converted page size. If the size differs between the used master pages, returns False.

See Also: convertCmOrInDictToString

_getUsedMasterPageElements(documentDict)

Gets all master page elements that are used in the document. The 'Standard' master page style is used if there are no other definitions.

Returns: The list of the used master page elements.

_getDefaultStyleElement(documentDict, styleFamily)

Gets the default style element by the given style family.

Parameters:

styleFamily - the style family whose default style is wanted.
The style family can be paragraph, graphic, table or table-row.

Returns:

The default style element.

Note:

Every style is based on a style family.

_getMasterPageStyleName(documentDict, styleElement)

Gets the master page style name by the given style element.

Returns: The master page name. If the master page name is '', returns 'Standard'.

checkEmptyParagraphs(documentDict)

Checks the document for empty paragraphs. _getDocumentParagraphs collects all the paragraphs to be checked. An empty paragraph is permitted after the table of contents and in page break elements.

Returns: The number of empty paragraphs if any are found, otherwise False.

checkDoubleWhitespaces(documentDict)

Checks for double spaces by checking whether the document has text:s tags.

Returns: The number of double spaces.

Note: XML example:

<text:s text:c="2"/> --> 3 spaces

<text:s/> --> 2 spaces
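In ODF, a run of spaces is stored as one literal space followed by a text:s element whose text:c attribute (default 1) gives the number of additional spaces, which is why each text:s marks at least a double space. The sketch below counts those elements with the standard library. The function name count_space_runs and the sample paragraph are assumptions for illustration, not the module's own code.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def count_space_runs(content_xml):
    """Count text:s elements, each of which marks a run of repeated spaces."""
    root = ET.fromstring(content_xml)
    return len(root.findall(".//{%s}s" % TEXT_NS))


# Hypothetical paragraph with one double space and one triple space.
sample = (
    "<text:p xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'>"
    "one <text:s/>two <text:s text:c='2'/>three"
    "</text:p>"
)
print(count_space_runs(sample))
```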

checkTabs(documentDict)

Checks the document for tabulators. _getDocumentParagraphs collects all the paragraphs to be checked.

Returns: The number of tabulators if any are found, otherwise False.

checkAsterisk(documentDict)

Checks the document for asterisks. _getDocumentParagraphs collects all the paragraphs to be checked.

Returns: The number of asterisks if any are found, otherwise False.

_getDocumentParagraphs(documentDict)

Gets all the paragraphs from the document, including all text:p (text paragraph) and text:h (heading) elements. It is used in checkTabs and checkEmptyParagraphs.

Returns: The list of the paragraph and heading elements.
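As a rough illustration of collecting text:p and text:h elements, the following sketch uses xml.etree.ElementTree. The function names, the emptiness test and the sample document are assumptions and may differ from how the module works on documentDict.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def get_paragraph_elements(root):
    """Return every text:p and text:h element in document order."""
    wanted = {"{%s}p" % TEXT_NS, "{%s}h" % TEXT_NS}
    return [element for element in root.iter() if element.tag in wanted]


def is_empty(element):
    """Assumed definition: no text content and no child elements."""
    return len(element) == 0 and not "".join(element.itertext()).strip()


# Hypothetical document body: one heading, one paragraph, one empty paragraph.
SAMPLE = (
    "<office:text"
    " xmlns:office='urn:oasis:names:tc:opendocument:xmlns:office:1.0'"
    " xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'>"
    "<text:h text:outline-level='1'>Title</text:h>"
    "<text:p>Body text</text:p>"
    "<text:p/>"
    "</office:text>"
)

paragraphs = get_paragraph_elements(ET.fromstring(SAMPLE))
empty_count = sum(1 for element in paragraphs if is_empty(element))
print(len(paragraphs), empty_count)
```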

_getListOfUsedStyleElements(documentDict)

Gets all used style elements. In content.xml, the office:body element contains the used styles.

Returns: The list of the used style elements.

_getListOfUsedStyleNames(documentDict)

Gets all used style names. For an automatic style named P<i> (where i is an integer), the parent style is used instead; for example, the parent of P1 may be Heading_20_1.

Returns: The list of all the style names.

_checkPageBreakStyleElement(styleElement)

Checks if the given style element contains a page break.

Returns: The style element if it contains a page break, otherwise False.

_getPageBreakStyleNames(documentDict)

Gets the names of the styles that contain a page break.

Returns: The list of page break style names.

_getAllStyleNamesWithDifferentMasterPage(documentDict)

Gets all the style names that change the master page. The master page changes when a style has a master-page-name attribute whose value is nonempty. If the value is empty (""), the master page is 'Standard'; if the style has no such attribute, the previous master page stays in use.

Returns: The dictionary of the styles that change the master page.

Note: masterPageDict contains the style name as the key and the master page name as the value.
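The three cases for the master-page-name attribute can be sketched as follows. Styles are represented here as plain attribute dictionaries, which is an assumption; the function names and the 'First_20_Page' master page are hypothetical.

```python
def master_page_after(attributes, previous_master_page):
    """Return the master page in effect after a style is applied."""
    if "master-page-name" not in attributes:
        # No attribute: the previous master page stays in use.
        return previous_master_page
    # An empty value means the 'Standard' master page.
    return attributes["master-page-name"] or "Standard"


def styles_changing_master_page(styles):
    """Map style name -> master page name for nonempty attributes only."""
    return {
        name: attributes["master-page-name"]
        for name, attributes in styles.items()
        if attributes.get("master-page-name")
    }


styles = {
    "P1": {"master-page-name": "First_20_Page"},
    "Text_20_body": {"master-page-name": ""},
    "P2": {},
}
print(styles_changing_master_page(styles))
```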

_getSectionBreakElements(documentDict)

Gets the section break elements from the document. Finds all the elements (including text, list, heading...) that change the section.

Returns: The list of the elements that change the section.

_getTOC(documentDict)

Gets the table of contents. Each TOC entry is its own entry in tocList.

Returns: The list of the elements in the table of contents.

checkTocContent(documentDict)

Compares the document headings to the TOC entries.

Returns: True if all entries match, otherwise an error message.

checkTOC(documentDict)

Checks if the document contains a table of contents.

Returns: True if there is a table of contents, otherwise False.

checkIndex(documentDict)

Checks if the document has an alphabetical index.

Returns: True if the alphabetical index exists, otherwise False.

_getIndexContentFromDocument(documentDict)

Gets the marked alphabetical index entries from the document.

Returns: The list of the marked alphabetical index entries.

_getIndexContent(documentDict)

Gets the alphabetical index content. Each alphabetical index entry is its own entry in the list.

Returns: The list of the alphabetical index content.

checkIndexContent(documentDict)

Compares the marked texts in the document to the alphabetical index entries.

Returns: True if all entries match, otherwise an error code.

_getHeadingList(documentDict)

Gets all headings from the document and the used outline level. Each heading is its own entry in the list.

Returns: A dictionary in which ['headings'] contains the list of headings and ['level'] contains the highest used heading outline level.
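A minimal sketch of building such a dictionary from text:h elements is shown below. It assumes that the highest level means the largest text:outline-level value and that a missing attribute counts as level 1. The function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def get_heading_list(root):
    """Collect heading texts and the largest outline level in use."""
    headings = []
    level = 0
    for element in root.iter("{%s}h" % TEXT_NS):
        headings.append("".join(element.itertext()))
        # Assumption: a heading without the attribute counts as level 1.
        outline_level = int(element.get("{%s}outline-level" % TEXT_NS, "1"))
        level = max(level, outline_level)
    return {"headings": headings, "level": level}


SAMPLE = (
    "<office:text"
    " xmlns:office='urn:oasis:names:tc:opendocument:xmlns:office:1.0'"
    " xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'>"
    "<text:h text:outline-level='1'>Introduction</text:h>"
    "<text:p>Some text.</text:p>"
    "<text:h text:outline-level='2'>Details</text:h>"
    "</office:text>"
)

heading_dict = get_heading_list(ET.fromstring(SAMPLE))
print(heading_dict)
```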

checkTable(documentDict)

Checks if the document has a table.

Returns: True if there is a table, otherwise False.

_getTableDict(documentDict)

Gets the tables as a dictionary. Every table is its own entry in tablesDict (key = table1, table2...). Every tableDict has the cell address as the key (A1, A2...) and the cell value as the value.

Returns: The dictionary of the table dictionaries.
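The nested layout described above can be sketched with plain lists standing in for the parsed table rows. The input shape (a list of tables, each a list of rows of cell values) and both function names are assumptions for illustration.

```python
def column_letters(index):
    """Convert a zero-based column index to letters: 0 -> A, 26 -> AA."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def build_tables_dict(tables):
    """Build {'table1': {'A1': value, ...}, 'table2': {...}, ...}."""
    tables_dict = {}
    for table_number, rows in enumerate(tables, start=1):
        table_dict = {}
        for row_number, row in enumerate(rows, start=1):
            for column_index, value in enumerate(row):
                address = column_letters(column_index) + str(row_number)
                table_dict[address] = value
        tables_dict["table%d" % table_number] = table_dict
    return tables_dict


print(build_tables_dict([[["Name", "Age"], ["Ann", "31"]]]))
```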

checkPageNumberFromFooterAndHeader(documentDict, masterPageElement, element)

Checks the page number format by the given element and master page element.

Parameters:

masterPageElement - the master page element to look for.
element - a footer or a header element.

Returns:

The number format if it exists, otherwise returns False.

The number format is optional in the element (footer or header). If the element does not define the number format, the page-layout element defines it.

getAuthorAndPageNumberFormat(documentDict, masterPageElement)

Gets the author and the number format from the header and the footer.

Parameters:

masterPageElement - the master page element to look for.

Returns:

The dictionary which contains the author and the page number format.

checkHeadingNumbering(documentDict, errorIdsAndPositions)

Checks the outline style. Level is the highest used heading outline level. Normally Heading 1 should be 1 and Heading 2 should be 2.

Returns: True if the outline style is OK, otherwise False.

Note: XML example:

<text:outline-style style:name="Outline">

<text:outline-level-style text:level="1" style:num-format="1">

</style:list-level-properties>

</text:outline-level-style>

<text:outline-level-style text:level="2" style:num-format="1" text:display-levels="2">

</style:list-level-properties>

</text:outline-level-style>

<text:outline-level-style text:level="3" style:num-format="">

</style:list-level-properties>

</text:outline-level-style>

...

</text:outline-style>
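One possible reading of this check is sketched below. It assumes that every outline level up to the highest used level needs a nonempty style:num-format and a text:display-levels value equal to its own level (the attribute defaults to 1 when missing). That interpretation, the function name and the sample XML are assumptions, not the module's confirmed logic.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def check_outline_numbering(outline_style, highest_level):
    """Return True if all levels up to highest_level are numbered."""
    for level_style in outline_style.findall("{%s}outline-level-style" % TEXT_NS):
        level = int(level_style.get("{%s}level" % TEXT_NS))
        if level > highest_level:
            continue  # levels that are not used do not matter
        num_format = level_style.get("{%s}num-format" % STYLE_NS, "")
        display_levels = int(level_style.get("{%s}display-levels" % TEXT_NS, "1"))
        if not num_format or display_levels != level:
            return False
    return True


SAMPLE = (
    "<text:outline-style"
    " xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'"
    " xmlns:style='urn:oasis:names:tc:opendocument:xmlns:style:1.0'"
    " style:name='Outline'>"
    "<text:outline-level-style text:level='1' style:num-format='1'/>"
    "<text:outline-level-style text:level='2' style:num-format='1'"
    " text:display-levels='2'/>"
    "<text:outline-level-style text:level='3' style:num-format=''/>"
    "</text:outline-style>"
)

outline = ET.fromstring(SAMPLE)
print(check_outline_numbering(outline, 2))
```

With the sample above, the check passes when only levels 1 and 2 are in use and fails when level 3 is also in use, because level 3 has an empty number format.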

_getImagePaths(documentDict)

Gets the image paths. Checks if the document has an image. Images are located in the Pictures folder.

Returns: The list of the image paths found, otherwise False.

checkImages(documentDict)

Checks if the document contains an image.

Returns: True if there is an image, otherwise False.

checkList(documentDict)

Checks if the document contains a list.

Returns: True if there is a list, otherwise False.

printLists(documentDict)

Prints the lists of the document.

To Do: getListContent

getObjectPaths(documentDict)

Gets the object paths. Checks if the document has an image.

Returns: The list of object paths if an image is found, otherwise an error message.

_isStyleUsed(documentDict, styleName)

Checks if the given style is used in the document.

Returns: True if the style is used, otherwise False.

getStyle(documentDict, styleName)

Gets the style definition attributes by the given style name. parentStyleList is used to apply the inheritance of styles.

Returns: The style definition dictionary.

Notes:

Inheritance of the styles:
default paragraph style -> Standard style -> style (Text body) -> P-style -> T-style
XML example (styles.xml):
<style:style style:name="Standard" style:family="paragraph" style:class="text">

<style:paragraph-properties fo:orphans="2" fo:widows="2" style:writing-mode="lr-tb"/>

<style:text-properties style:use-window-font-color="true" style:font-name="Courier New" fo:font-size="10pt" fo:language="fi" fo:country="FI" style:font-name-asian="Times New Roman" style:font-size-asian="10pt" style:font-name-complex="Times New Roman" style:font-size-complex="10pt" style:language-complex="ar" style:country-complex="SA"/>

</style:style>

<style:style style:name="Text_20_body" style:display-name="Text body" style:family="paragraph" style:parent-style-name="Standard" style:class="text" style:master-page-name="">

<style:paragraph-properties fo:margin-left="1cm" fo:margin-right="0cm" fo:margin-top="0.247cm" fo:margin-bottom="0.247cm" fo:text-indent="0cm" style:auto-text-indent="false" style:page-number="auto" fo:break-before="auto" fo:break-after="auto"/>

<style:text-properties style:font-name="Tahoma"/>

</style:style>
XML example (content.xml):
<style:style style:name="P2" style:family="paragraph" style:parent-style-name="Text_20_body">

<style:paragraph-properties fo:text-align="start" style:justify-single-word="false"/>

</style:style>
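The inheritance order in the notes can be illustrated by merging attribute dictionaries from the most general style to the most specific one, so that later styles override earlier ones. The attribute values are taken from the XML examples above; the function name and the flat dictionary representation are assumptions.

```python
def merge_style_chain(chain):
    """Merge attribute dicts ordered from most general to most specific."""
    resolved = {}
    for attributes in chain:
        # A more specific style overrides inherited values.
        resolved.update(attributes)
    return resolved


standard = {"style:font-name": "Courier New", "fo:font-size": "10pt"}
text_body = {"style:font-name": "Tahoma", "fo:margin-left": "1cm"}
p2 = {"fo:text-align": "start"}

resolved_style = merge_style_chain([standard, text_body, p2])
print(resolved_style)
```

Here P2 inherits the Tahoma font from Text body and the 10pt font size from Standard, and adds its own text alignment.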

_getStyleAttributes(documentDict, styleAttributeList)

Checks whether the style has each wanted attribute. If it does, the attribute value is replaced; otherwise the old value is kept. The style attribute list contains all the relevant style information.

Returns: The list of the style's attributes.

_getParentStyleList(documentDict, styleName)

Gets the parent style list for the given style name.

Returns: The list of parent styles (the first entry of the list is the style itself).
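A minimal sketch of walking the parent chain follows. The parents mapping (style name to parent style name) is a hypothetical stand-in for looking up style:parent-style-name in the document, and the function name is an assumption.

```python
def get_parent_style_list(parents, style_name):
    """Return [style, parent, grandparent, ...] for the given style."""
    chain = [style_name]
    seen = {style_name}
    parent = parents.get(style_name)
    # Stop at a style without a parent; the seen set guards against cycles.
    while parent and parent not in seen:
        chain.append(parent)
        seen.add(parent)
        parent = parents.get(parent)
    return chain


parents = {"P2": "Text_20_body", "Text_20_body": "Standard", "Standard": None}
print(get_parent_style_list(parents, "P2"))
```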

_checkParentStyle(documentDict, styleName)

Checks if the style has a parent style.

Returns: The parent style name.

checkEndnotesAndFootnotes(documentDict)

Checks the endnotes and the footnotes.

Returns: True if there is an endnote or a footnote in the document, otherwise False.

checkImageCaptions(documentDict)

Checks the caption and the reference of the image.

Returns: True if the document images have a caption and a reference, otherwise False.

checkCoverPage(documentDict)

Checks that the cover page is done correctly.

Parameters:

title - True if the title on the cover page is the same as in the document meta.
name - True if the cover page contains the same author name as in the document meta.
email - True if the cover page contains an e-mail address.

Returns:

The cover definitions in a dictionary.

getPageNumberFormatAndAuthor(documentDict, section)

Gets the page number format and the author name from the document.

Parameters:

section - can have a value 'cover', 'toc' or 'text'.

Returns:

The dictionary which contains the author and the page number information.

checkHeadersAndFooters(documentDict)

Checks that the headers and the footers of the document are made correctly.

Assumes that the document has three sections:

the cover section,
the table of contents section or the toc section and
the actual content section or the text section.

See Also: checkSections method must pass in order to run this method

Places findings in the headerAndFooterDict as key-boolean pairs:

'frontPage' - whether there are headers or footers in the cover section.
'tocPageNumbering' - whether there is page numbering in the toc section.
'differentPageNumbering' - whether the page numbering is different in the cover and text sections.
'nameInToc' - whether the last modifier's name is in the toc section header or footer.
'nameInText' - whether the last modifier's name is in the text section header or footer.
'pageNumbering' - whether there is page numbering in the text section.
'tocNumStart' - whether the toc section page numbering starts at 1.
'textNumStart' - whether the text section page numbering starts at 1.

_getSectionElements(documentDict, section)

Gets the elements of the wanted section. Section break elements change the section. The function searches through the whole document and adds each element to the right section in the sectionElements dictionary. When it finds a section break element, it switches to the next section: the first elements go to the cover section, the next ones to the toc section and the last ones to the text section. The document must have at least 3 sections.

Returns: The section elements in a list.
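The splitting described above can be sketched with strings standing in for document elements. The sketch assumes that a section break element is the first element of the section it starts and that the very first element always belongs to the cover section. The function name and the sample data are hypothetical.

```python
def split_into_sections(elements, break_elements):
    """Distribute elements into the cover, toc and text sections."""
    names = ["cover", "toc", "text"]
    section_elements = {name: [] for name in names}
    current = 0
    for index, element in enumerate(elements):
        is_break = index > 0 and element in break_elements
        if is_break and current < len(names) - 1:
            current += 1  # a break element starts the next section
        section_elements[names[current]].append(element)
    return section_elements


elements = ["title", "author", "contents", "heading", "paragraph"]
sections = split_into_sections(elements, {"contents", "heading"})
print(sections["text"])
```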

checkSections(documentDict, errorList)

Checks that the document sections have been made correctly. If the number of section breaks is not over 3, returns the error message list.

Returns: True if the sections are OK, otherwise errorList.

_getMeta(documentDict)

Gets all the meta information.

Returns: All the meta information in a dictionary.

See Also: ooo_meta_inspector.getMeta

getMetaAuthor(documentDict)

Gets the author who last modified the document.

Returns: The author who last modified the document.

getMetaTitle(documentDict)

Gets the document title from the meta information.

Returns: The title defined in the meta information.

getMetaEdited(documentDict)

Gets the last modified date and time from the meta information.

Returns: The last modified date in ISO 8601 format (yyyy-mm-ddThh:mm:ss).
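A caller that needs a datetime object rather than a string could parse the returned value as sketched below. The helper name parse_meta_date and the sample timestamp are assumptions; the format string matches the pattern given above.

```python
from datetime import datetime


def parse_meta_date(value):
    """Parse a yyyy-mm-ddThh:mm:ss string into a datetime object."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")


edited = parse_meta_date("2011-05-17T09:30:00")  # hypothetical value
print(edited.year, edited.hour)
```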

checkStyleUsage(documentDict, errorIdsAndPositions)

Goes through all the elements in the document that use a style. Checks that the elements use the correct styles (i.e. not the Standard or Default style) and that no manual style definitions are made (like T1).