Module odt_inspector

This module provides functions for inspecting ODT files.


Author: Vili Auvinen, Juho Tammela, Olli Kauppinen

Functions

_getStyleElementByStyleName(documentDict, styleName)
Gets the style element by the given style name.

_getStyleElementByDisplayName(documentDict, styleName)
Gets the style element by the given style display name.

_getStyleDisplayNameByStyleName(documentDict, styleName)
Gets the style display name by the given style name.

_getMasterPageStyleElement(documentDict, masterPageStyleName)
Gets the master page style element by the given master page style name.

_getPageLayoutElement(documentDict, pageLayoutName)
Gets the page layout element by the given page layout name.

getPageMarginals(documentDict)
Gets the page margins.

getPageSize(documentDict)
Gets the page size.

_getUsedMasterPageElements(documentDict)
Gets all master page elements that are used in the document.

_getDefaultStyleElement(documentDict, styleFamily)
Gets the default style element by the given style family.

_getMasterPageStyleName(documentDict, styleElement)
Gets the master page style name by the given style element.

checkEmptyParagraphs(documentDict)
Checks the document for empty paragraphs.

checkDoubleWhitespaces(documentDict)
Checks for double spaces.

checkTabs(documentDict)
Checks the document for tabulators.

checkAsterisk(documentDict)
Checks the document for asterisks.

_getDocumentParagraphs(documentDict)
Gets all the paragraphs from the document.

_getListOfUsedStyleElements(documentDict)
Gets all used style elements.

_getListOfUsedStyleNames(documentDict)
Gets all used style names.

_checkPageBreakStyleElement(styleElement)
Checks if the given style element contains a page break.

_getPageBreakStyleNames(documentDict)
Gets the names of the styles that contain a page break.

_getAllStyleNamesWithDifferentMasterPage(documentDict)
Gets all the style names that change the master page.

_getSectionBreakElements(documentDict)
Gets the section break elements from the document.

_getTOC(documentDict)
Gets the table of contents.

checkTocContent(documentDict)
Compares the document headings to the TOC entries.

checkTOC(documentDict)
Checks if the document contains a table of contents.

checkIndex(documentDict)
Checks if the document has an alphabetical index.

_getIndexContentFromDocument(documentDict)
Gets the marked alphabetical index entries from the document.

_getIndexContent(documentDict)
Gets the alphabetical index content.

checkIndexContent(documentDict)
Compares the marked texts of the document to the alphabetical index entries.

_getHeadingList(documentDict)
Gets all headings from the document and the used outline level.

checkTable(documentDict)
Checks if the document has a table.

_getTableDict(documentDict)
Gets the tables as a dictionary.

checkPageNumberFromFooterAndHeader(documentDict, masterPageElement, element)
Checks the page number format by the given element and master page element.

getAuthorAndPageNumberFormat(documentDict, masterPageElement)
Gets the author and the page number format from the header and the footer.

checkHeadingNumbering(documentDict, errorIdsAndPositions)
Checks the outline style.

_getImagePaths(documentDict)
Gets the image paths.

checkImages(documentDict)
Checks if the document contains an image.

checkList(documentDict)
Checks if the document contains a list.

printLists(documentDict)
Prints the lists of the document.

getObjectPaths(documentDict)
Gets the object paths.

_isStyleUsed(documentDict, styleName)
Checks if the given style is used in the document.

getStyle(documentDict, styleName)
Gets the style definition attributes by the given style name.

_getStyleAttributes(documentDict, styleAttributeList)
Checks whether the style has the wanted attribute; if it does, replaces the attribute value, otherwise keeps the old value.

_getParentStyleList(documentDict, styleName)
Gets the parent style list for the given style name.

_checkParentStyle(documentDict, styleName)
Checks if the style has a parent style.

checkEndnotesAndFootnotes(documentDict)
Checks the endnotes and the footnotes.

checkImageCaptions(documentDict)
Checks the caption and the reference of the image.

checkCoverPage(documentDict)
Checks that the front page is done correctly.

getPageNumberFormatAndAuthor(documentDict, section)
Gets the page number format and the author name from the document.

checkHeadersAndFooters(documentDict)
Checks that the headers and the footers of the document are made correctly.

_getSectionElements(documentDict, section)
Gets the elements of the wanted section.

checkSections(documentDict, errorList)
Checks that the document sections have been made correctly.

_getMeta(documentDict)
Gets all the meta information.

getMetaAuthor(documentDict)
Gets the author who last modified the document.

getMetaTitle(documentDict)
Gets the document title from the meta information.

getMetaEdited(documentDict)
Gets the last modified date and time from the meta information.

checkStyleUsage(documentDict, errorIdsAndPositions)
Goes through all the elements in the document that use a style.

Variables
  __package__ = 'src.inspectors'
Function Details

_getStyleElementByStyleName(documentDict, styleName)

Gets the style element by the given style name. It searches content.xml first and, if the style is not found there, styles.xml.

Returns:
The style element if the style exists, otherwise returns None.

_getStyleElementByDisplayName(documentDict, styleName)

Gets the style element by the given style display name. It searches content.xml first and, if the style is not found there, styles.xml.

Returns:
The style element with the given display name if it exists, otherwise None.

Note: XML example:

<style:style style:name="Text_20_body" style:display-name="Text body">

<style:style style:name="tutkielma">

Display name --> style name: " " --> "_20_", "_" --> "_5f_"

_getStyleDisplayNameByStyleName(documentDict, styleName)

Gets the style display name by the given style name. It uses the _getStyleElementByStyleName method to find the style. If the style doesn't have a display-name attribute, the display name is just the style name.

Returns:
The style display name.

Note: XML example:

<style:style style:name="Text_20_body" style:display-name="Text body">

<style:style style:name="tutkielma">

Display name --> style name: " " --> "_20_", "_" --> "_5f_"
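
The name mapping in the note can be sketched in Python. This is an illustration only, not the module's code: the helper names are hypothetical, and it covers just the two escapes listed in the note (space and underscore).

```python
import re

# The two escapes named in the note; other characters are left untouched.
_ESCAPES = {"20": " ", "5f": "_"}


def display_name_to_style_name(display_name):
    """Encode a display name such as 'Text body' as 'Text_20_body'."""
    # Encode underscores first so the underscores added by "_20_" stay intact.
    return display_name.replace("_", "_5f_").replace(" ", "_20_")


def style_name_to_display_name(style_name):
    """Decode 'Text_20_body' back to 'Text body'."""
    # Scan left to right so each escape is decoded exactly once.
    return re.sub(r"_(20|5f)_", lambda match: _ESCAPES[match.group(1)], style_name)
```

A style without spaces or underscores, such as "tutkielma", passes through both helpers unchanged, which matches the rule that the display name defaults to the style name.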

_getMasterPageStyleElement(documentDict, masterPageStyleName)

Gets the master page style element by the given master page style name.

Returns:
The master page element.

_getPageLayoutElement(documentDict, pageLayoutName)

Gets the page layout element by the given page layout name.

Returns:
The page layout element.

getPageMarginals(documentDict)

Gets the page margins. Searches only the used master pages.

Returns:
The page margins. If the margins differ between the used pages, returns False.

See Also: convertCmOrInDictToString

getPageSize(documentDict)

Gets the page size.

Returns:
The converted page size. If the size differs between the used pages, returns False.

See Also: convertCmOrInDictToString
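
The rule that differing page sizes yield False can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: common_page_size and the plain dictionaries standing in for page layout elements are hypothetical, and the conversion done by convertCmOrInDictToString is left out.

```python
def common_page_size(page_layouts):
    """Return the shared page size, or False if the layouts disagree.

    page_layouts is a hypothetical stand-in: a list of dicts with
    'page-width' and 'page-height' strings, one per used master page.
    """
    sizes = {(layout["page-width"], layout["page-height"]) for layout in page_layouts}
    if len(sizes) != 1:
        # Either no layouts at all or at least two different sizes.
        return False
    width, height = sizes.pop()
    return {"width": width, "height": height}
```

The same set-based comparison would apply to the margins checked by getPageMarginals.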

_getUsedMasterPageElements(documentDict)

Gets all master page elements that are used in the document. The 'Standard' master page style is used if there are no other definitions.

Returns:
The list of the used master page elements.

_getDefaultStyleElement(documentDict, styleFamily)

Gets the default style element by the given style family.

Parameters:
  • styleFamily - the style family whose default style is wanted.

    The style family can be paragraph, graphic, table or table-row.

Returns:
The default style element.

Note:

Every style is based on a style family.

<style:style style:name="Heading_20_1" style:display-name="Heading 1" style:family="paragraph">

_getMasterPageStyleName(documentDict, styleElement)

Gets the master page style name by the given style element.

Returns:
The master page name; if the master page is '', returns 'Standard'.

checkEmptyParagraphs(documentDict)

Checks the document for empty paragraphs. The _getDocumentParagraphs method gets all the paragraphs to be checked. An empty paragraph is permitted after the table of contents and in page break elements.

Returns:
The number of empty paragraphs if any are found, otherwise False.

checkDoubleWhitespaces(documentDict)

Checks for double spaces. Checks if the document has text:s tags.

Returns:
The number of double spaces.

Note: XML example:

<text:s text:c="2"/> --> 3 spaces

<text:s/> --> 2 spaces

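The text:s counting shown in the XML example can be sketched with the standard library. This is not the module's code: count_space_runs and SAMPLE are hypothetical, and the sketch assumes, as the example implies, that each text:s element follows one literal space, so a run is one longer than its text:c value (which defaults to 1).

```python
import xml.etree.ElementTree as ET

# OpenDocument namespace bound to the "text" prefix.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def count_space_runs(xml_fragment):
    """Return the length of every run of spaces marked by a text:s element."""
    root = ET.fromstring(xml_fragment)
    runs = []
    for space in root.iter("{%s}s" % TEXT_NS):
        # text:c gives the number of spaces the element stands for (default 1).
        repeated = int(space.get("{%s}c" % TEXT_NS, "1"))
        # One literal space precedes the element, so the run is one longer.
        runs.append(repeated + 1)
    return runs


# Hypothetical sample paragraph with a double and a triple space.
SAMPLE = (
    '<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    'one <text:s/>two <text:s text:c="2"/>three</text:p>'
)
print(count_space_runs(SAMPLE))  # [2, 3]
```
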
checkTabs(documentDict)

Checks the document for tabulators. The _getDocumentParagraphs method gets all the paragraphs to be checked.

Returns:
The number of tabulators if any are found, otherwise False.

checkAsterisk(documentDict)

Checks the document for asterisks. The _getDocumentParagraphs method gets all the paragraphs to be checked.

Returns:
The number of asterisks if any are found, otherwise False.
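
The "count if found, otherwise False" return convention shared by checkEmptyParagraphs, checkTabs and checkAsterisk can be sketched as below. The function name and the list of plain strings standing in for paragraph elements are assumptions made for this illustration.

```python
def count_asterisks(paragraph_texts):
    """Return the number of asterisks, or False when there are none.

    paragraph_texts is a hypothetical stand-in for the paragraph
    elements: a list of plain strings.
    """
    total = sum(text.count("*") for text in paragraph_texts)
    # Mirror the documented contract: a count when found, otherwise False.
    return total if total else False
```

Because False compares equal to 0 in Python, callers that need to tell "nothing found" from a count should test the result with `is False`.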

_getDocumentParagraphs(documentDict)

Gets all the paragraphs from the document, including all text:p (text paragraph) and text:h (heading) elements. It is used in checkTabs, checkAsterisk and checkEmptyParagraphs.

Returns:
The list of the used elements.

_getListOfUsedStyleElements(documentDict)

Gets all used style elements. In content.xml, the office:body element contains the used styles.

Returns:
The element list of the used styles.

_getListOfUsedStyleNames(documentDict)

Gets all used style names. Gets the parent style of each Pn style (n is an integer); for example, the parent of P1 may be Heading_20_1.

Returns:
The list of all the style names.

_checkPageBreakStyleElement(styleElement)

Checks if the given style element contains a page break.

Returns:
The style element if it contains a page break, otherwise False.

_getPageBreakStyleNames(documentDict)

Gets the names of the styles that contain a page break.

Returns:
The list of page break style names.

_getAllStyleNamesWithDifferentMasterPage(documentDict)

Gets all the style names that change the master page. The master page changes when a style has a master-page-name attribute and its value is nonempty. If the value is empty (""), the master page is Standard; if the style has no such attribute, the master page stays the same as the previous one.

Returns:
The dictionary of the styles that change the master page.

Note: masterPageDict contains the style name as the key and the master page name as the value.
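
The three cases for the master-page-name attribute can be written out as a small sketch. It is an illustration, not the module's code: both function names and the (style name, attribute value) pairs are hypothetical, and whether styles with an empty value also belong in masterPageDict is not stated in the description, so they are left out here.

```python
def effective_master_page(master_page_attr, previous):
    """Return the master page in effect after a style is applied.

    master_page_attr is the value of style:master-page-name, or None
    when the style has no such attribute.
    """
    if master_page_attr is None:
        return previous       # no attribute: the previous master page stays
    if master_page_attr == "":
        return "Standard"     # empty value: the Standard master page
    return master_page_attr   # nonempty value: switch to that master page


def styles_changing_master_page(styles):
    """Map style name -> master page name for styles with a nonempty value.

    styles is a hypothetical list of (style_name, master_page_attr) pairs.
    """
    return {name: attr for name, attr in styles if attr}
```
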

_getSectionBreakElements(documentDict)

Gets the section break elements from the document. Finds all the elements (including text, list, heading...) that change the section.

Returns:
The list of the elements that change the section.

_getTOC(documentDict)

Gets the table of contents. Each TOC entry is its own entry in tocList.

Returns:
The list of the elements in the table of contents.

checkTocContent(documentDict)

Compares the document headings to the TOC entries.

Returns:
True if all entries match, otherwise an error message.

checkTOC(documentDict)

Checks if the document contains a table of contents.

Returns:
True if there is a table of contents, otherwise False.

checkIndex(documentDict)

Checks if the document has an alphabetical index.

Returns:
True if the alphabetical index exists, otherwise False.

_getIndexContentFromDocument(documentDict)

Gets marked alphabetical index entries from the document.

Returns:
The content list of the alphabetical index entries.

_getIndexContent(documentDict)

Gets the alphabetical index content. Each alphabetical index entry is its own entry in the list.

Returns:
The list of the alphabetical index content.

checkIndexContent(documentDict)

Compares the marked texts of the document to the alphabetical index entries.

Returns:
True if all entries match, otherwise an error code.

_getHeadingList(documentDict)

Gets all headings from the document and the used outline level. Each heading is its own entry in the list.

Returns:
A dictionary in which ['headings'] contains the list of headings and ['level'] contains the value of the highest used heading outline level.

checkTable(documentDict)

Checks if the document has a table.

Returns:
True if there is a table and False if not.

_getTableDict(documentDict)

Gets the tables as a dictionary. Every table is its own entry in tablesDict (key = table1, table2, ...). Every tableDict has the cell address as the key (A1, A2, ...) and the cell value as the value.

Returns:
The dictionary of the table dictionaries.
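
The nested dictionary layout can be illustrated with a short sketch. It is not the module's code: build_tables_dict and column_letters are hypothetical helpers, and the input is assumed to be already extracted as lists of rows of cell values.

```python
def column_letters(index):
    """Convert a zero-based column index to letters: 0 -> 'A', 26 -> 'AA'."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def build_tables_dict(tables):
    """Build {'table1': {'A1': value, ...}, 'table2': {...}, ...}.

    tables is a hypothetical stand-in for the parsed document: a list
    of tables, each a list of rows, each row a list of cell values.
    """
    tables_dict = {}
    for table_number, rows in enumerate(tables, start=1):
        table_dict = {}
        for row_number, row in enumerate(rows, start=1):
            for column_index, value in enumerate(row):
                address = column_letters(column_index) + str(row_number)
                table_dict[address] = value
        tables_dict["table%d" % table_number] = table_dict
    return tables_dict
```

With this layout a cell is looked up as tables_dict["table1"]["B2"].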

checkPageNumberFromFooterAndHeader(documentDict, masterPageElement, element)

Checks the page number format by the given element and master page element.

Parameters:
  • masterPageElement - the master page element to look for.
  • element - a footer or a header element.
Returns:
The number format if it exists, otherwise returns False.

The number format is optional in the element (footer or header). If the number format is not in the element, the page-layout element defines the number format.

getAuthorAndPageNumberFormat(documentDict, masterPageElement)

Gets the author and the page number format from the header and the footer.

Parameters:
  • masterPageElement - the master page element to look for.
Returns:
The dictionary which contains the author and the page number format.

checkHeadingNumbering(documentDict, errorIdsAndPositions)

Checks the outline style. Level is the highest used heading outline level. Normally Heading 1 should be at level 1 and Heading 2 at level 2.

Returns:
True if ok, False if not.

Note: XML example:

<text:outline-style style:name="Outline">

<text:outline-level-style text:level="1" style:num-format="1">

<style:list-level-properties text:list-level-position-and-space-mode="label-alignment">

<style:list-level-label-alignment text:label-followed-by="listtab" text:list-tab-stop-position="0.762cm" fo:text-indent="-0.762cm" fo:margin-left="0.762cm"/>

</style:list-level-properties>

</text:outline-level-style>

<text:outline-level-style text:level="2" style:num-format="1" text:display-levels="2">

<style:list-level-properties text:list-level-position-and-space-mode="label-alignment">

<style:list-level-label-alignment text:label-followed-by="listtab" text:list-tab-stop-position="1.016cm" fo:text-indent="-1.016cm" fo:margin-left="1.016cm"/>

</style:list-level-properties>

</text:outline-level-style>

<text:outline-level-style text:level="3" style:num-format="">

<style:list-level-properties text:list-level-position-and-space-mode="label-alignment">

<style:list-level-label-alignment text:label-followed-by="listtab" text:list-tab-stop-position="1.27cm" fo:text-indent="-1.27cm" fo:margin-left="1.27cm"/>

</style:list-level-properties>

</text:outline-level-style>

...

</text:outline-style>

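One way to read the outline XML is sketched below: collect style:num-format per text:level and require a nonempty format for every level up to the highest used one. This is an assumption about what "ok" means, not the documented behaviour of checkHeadingNumbering; the function names and the shortened SAMPLE are hypothetical.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def numbering_formats(outline_xml):
    """Map each outline level to its style:num-format value."""
    root = ET.fromstring(outline_xml)
    formats = {}
    for level_style in root.iter("{%s}outline-level-style" % TEXT_NS):
        level = int(level_style.get("{%s}level" % TEXT_NS))
        formats[level] = level_style.get("{%s}num-format" % STYLE_NS, "")
    return formats


def headings_are_numbered(outline_xml, highest_used_level):
    """True if every level from 1 to highest_used_level has a number format."""
    formats = numbering_formats(outline_xml)
    return all(formats.get(level, "") != "" for level in range(1, highest_used_level + 1))


# Shortened version of the XML example: levels 1 and 2 numbered, level 3 not.
SAMPLE = (
    '<text:outline-style'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'
    ' xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"'
    ' style:name="Outline">'
    '<text:outline-level-style text:level="1" style:num-format="1"/>'
    '<text:outline-level-style text:level="2" style:num-format="1" text:display-levels="2"/>'
    '<text:outline-level-style text:level="3" style:num-format=""/>'
    '</text:outline-style>'
)
```
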
_getImagePaths(documentDict)

Gets the image paths. Checks if the document has an image. Images are located in the Pictures folder.

Returns:
The list of the found image paths, otherwise False.

checkImages(documentDict)

Checks if the document contains an image.

Returns:
True if there is an image, otherwise False.

checkList(documentDict)

Checks if the document contains a list.

Returns:
True if there is a list, otherwise False.

printLists(documentDict)

Prints the lists of the document.

To Do: getListContent

getObjectPaths(documentDict)

Gets the object paths. Checks if the document has an image.

Returns:
The object path list if an image is found, otherwise an error message.

_isStyleUsed(documentDict, styleName)

Checks if the given style is used in the document.

Returns:
True if used, otherwise False.

getStyle(documentDict, styleName)

Gets the style definition attributes by the given style name. parentStyleList is used to resolve the inheritance of styles.

Returns:
The style definition dictionary.
Notes:
  • Inheritance of the styles:

    default paragraph style -> Standard style -> style (Text body) -> P-style -> T-style

  • XML example (styles.xml):

    <style:style style:name="Standard" style:family="paragraph" style:class="text">

    <style:paragraph-properties fo:orphans="2" fo:widows="2" style:writing-mode="lr-tb"/>

    <style:text-properties style:use-window-font-color="true" style:font-name="Courier New" fo:font-size="10pt" fo:language="fi" fo:country="FI" style:font-name-asian="Times New Roman" style:font-size-asian="10pt" style:font-name-complex="Times New Roman" style:font-size-complex="10pt" style:language-complex="ar" style:country-complex="SA"/>

    </style:style>

    <style:style style:name="Text_20_body" style:display-name="Text body" style:family="paragraph" style:parent-style-name="Standard" style:class="text" style:master-page-name="">

    <style:paragraph-properties fo:margin-left="1cm" fo:margin-right="0cm" fo:margin-top="0.247cm" fo:margin-bottom="0.247cm" fo:text-indent="0cm" style:auto-text-indent="false" style:page-number="auto" fo:break-before="auto" fo:break-after="auto"/>

    <style:text-properties style:font-name="Tahoma"/>

    </style:style>

  • XML example (content.xml):

    <style:style style:name="P2" style:family="paragraph" style:parent-style-name="Text_20_body">

    <style:paragraph-properties fo:text-align="start" style:justify-single-word="false"/>

    </style:style>

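The inheritance chain in the notes can be sketched as a walk up the parent styles followed by a merge from the most general style down to the most specific one. This is an illustration under stated assumptions, not the module's code: the STYLES dictionary is a hypothetical, simplified model holding a few attributes from the XML examples, and both function names are made up.

```python
# Hypothetical model of the XML examples: parent name plus a few attributes.
STYLES = {
    "Standard": {"parent": None,
                 "attrs": {"font-name": "Courier New", "font-size": "10pt"}},
    "Text_20_body": {"parent": "Standard",
                     "attrs": {"font-name": "Tahoma", "margin-left": "1cm"}},
    "P2": {"parent": "Text_20_body",
           "attrs": {"text-align": "start"}},
}


def parent_style_list(styles, style_name):
    """Return the style itself followed by its ancestors."""
    chain = []
    while style_name in styles and style_name not in chain:
        chain.append(style_name)
        style_name = styles[style_name]["parent"]
    return chain


def resolve_style(styles, style_name):
    """Merge attributes so that a child overrides its parents."""
    resolved = {}
    # Apply the most general style first, the style itself last.
    for name in reversed(parent_style_list(styles, style_name)):
        resolved.update(styles[name]["attrs"])
    return resolved
```

For P2 the chain is P2, Text_20_body, Standard, so the Tahoma font of Text body replaces the Courier New inherited from Standard while the 10pt size is kept.
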
_getStyleAttributes(documentDict, styleAttributeList)

Checks whether the style has the wanted attribute; if it does, replaces the attribute value, otherwise keeps the old value. The style attribute list contains all the relevant style information.

Returns:
The list of the style attributes.

_getParentStyleList(documentDict, styleName)

Gets the parent style list for the given style name.

Returns:
The list of parent styles (the first entry of the list is the style itself).

_checkParentStyle(documentDict, styleName)

Checks if the style has a parent style.

Returns:
The parent style name.

checkEndnotesAndFootnotes(documentDict)

Checks the endnotes and the footnotes.

Returns:
True if there is an endnote or a footnote in the document, otherwise False.

checkImageCaptions(documentDict)

Checks the caption and the reference of the image.

Returns:
True if the document images have a caption and a reference, otherwise False.

checkCoverPage(documentDict)

Checks that the front page is done correctly.

Parameters:
  • title - True if the title on the cover page is the same as in the document meta information.
  • name - True if the cover page contains the same author name as the document meta information.
  • email - True if the cover page contains an e-mail address.
Returns:
The cover definitions in a dictionary.

getPageNumberFormatAndAuthor(documentDict, section)

Gets the page number format and the author name from the document.

Parameters:
  • section - can have a value 'cover', 'toc' or 'text'.
Returns:
The dictionary which contains the author and the page number information.

checkHeadersAndFooters(documentDict)

Checks that the headers and the footers of the document are made correctly.

Assumes that the document has three sections:

  1. the cover section,
  2. the table of contents section or the toc section and
  3. the actual content section or the text section.

See Also: checkSections, which must pass before this method can be run.

Places its findings in headerAndFooterDict as key-boolean pairs:

  • 'frontPage': whether there are headers or footers in the cover section.
  • 'tocPageNumbering': whether there is page numbering in the toc section.
  • 'differentPageNumbering': whether the page numbering differs between the cover and text sections.
  • 'nameInToc': whether the name of the last modifier is in the toc section header or footer.
  • 'nameInText': whether the name of the last modifier is in the text section header or footer.
  • 'pageNumbering': whether there is page numbering in the text section.
  • 'tocNumStart': whether the toc section page numbering starts at 1.
  • 'textNumStart': whether the text section page numbering starts at 1.

_getSectionElements(documentDict, section)

Gets the elements of the wanted section. The section break elements change the section. The function searches through the whole document and adds each element to the right section in the sectionElements dictionary. When it finds a section break element, it moves on to the next section: the first elements go to the cover section, the next to the toc section and the last to the text section. The document has to have at least 3 sections.

Returns:
The section elements in a list.
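
The section walk described above can be sketched as follows. It is an illustration only: split_into_sections and the is_section_break callback are hypothetical, a section break element is assumed to open the new section it starts, and any breaks after the second one are assumed to stay in the text section.

```python
def split_into_sections(elements, is_section_break):
    """Distribute elements into 'cover', 'toc' and 'text' sections.

    elements is the document content in order; is_section_break is a
    hypothetical callback that recognises section break elements.
    """
    names = ["cover", "toc", "text"]
    sections = {name: [] for name in names}
    current = 0
    for element in elements:
        # A break element moves on to the next section and belongs to it.
        if is_section_break(element) and current < len(names) - 1:
            current += 1
        sections[names[current]].append(element)
    return sections
```

A caller that wants a single section, as the section parameter suggests, would pick sections[section] from the result.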

checkSections(documentDict, errorList)

Checks that the document sections have been made correctly. If the number of section breaks is not over 3, returns the error message list.

Returns:
True if the sections are OK, otherwise errorList.

_getMeta(documentDict)

Gets all the meta information.

Returns:
All the meta information in a dictionary.

See Also: ooo_meta_inspector.getMeta

getMetaAuthor(documentDict)

Gets the author who last modified the document.

Returns:
The author who last modified the document.

getMetaTitle(documentDict)

Gets the document title from the meta information.

Returns:
The title defined in the meta information.

getMetaEdited(documentDict)

Gets the last modified date and time from the meta information.

Returns:
The last modified date in ISO 8601 standard (yyyy-mm-ddThh:mm:ss)
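
A caller can turn the returned string into a datetime object with the standard library. The sketch below is an assumption about how the value might be consumed, not part of the module: parse_meta_date is a hypothetical helper, and dropping fractional seconds is a precaution for editors that append them.

```python
from datetime import datetime


def parse_meta_date(value):
    """Parse a 'yyyy-mm-ddThh:mm:ss' string into a datetime object."""
    # Drop any fractional seconds before parsing the fixed format.
    return datetime.strptime(value.split(".")[0], "%Y-%m-%dT%H:%M:%S")
```

For example, parse_meta_date("2011-03-14T09:26:53") gives a datetime for 14 March 2011 at 09:26:53; the date is a made-up sample value.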

checkStyleUsage(documentDict, errorIdsAndPositions)

Goes through all the elements in the document that use a style. Checks that the elements use the correct styles (i.e. not the Standard or Default style) and that no manual style definitions (like T1) are made.