
Table 2 Equations used in SEAM

From: Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system

C-value (a) [18]

\( \left\{\begin{array}{ll}\log_2\left|a\right|\cdot f(a), & a\ \text{is not nested}\\ \log_2\left|a\right|\left(f(a)-\frac{1}{P\left(T_a\right)}\displaystyle\sum_{b\in T_a}f(b)\right), & \text{otherwise}\end{array}\right. \)

where:

\( a \) is the candidate string

f(.) is its frequency of occurrence in the corpus

\( T_a \) is the set of extracted candidate terms that contain a

\( P(T_a) \) is the number of these candidate terms
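As an illustration, the C-value definition above can be sketched in Python. This is a minimal sketch and not the SEAM implementation: the function name `c_value`, the representation of candidates as tuples of words, and the reading of |a| as the length of the candidate in words are all assumptions made for the example.

```python
import math


def c_value(candidate, freq):
    """Return the C-value of `candidate`.

    `candidate` is a tuple of words; `freq` maps every extracted candidate
    term (also a tuple of words) to its frequency in the corpus.
    Assumption: |a| is the number of words in the candidate.
    """
    def contains(longer, shorter):
        # True if `shorter` occurs as a contiguous word sequence in `longer`.
        n, m = len(longer), len(shorter)
        return n > m and any(
            longer[i:i + m] == shorter for i in range(n - m + 1)
        )

    # T_a: the extracted candidate terms that contain the candidate.
    nesting_terms = [t for t in freq if contains(t, candidate)]
    weight = math.log2(len(candidate))

    if not nesting_terms:  # a is not nested
        return weight * freq[candidate]

    # P(T_a) is the number of nesting terms.
    mean_nested = sum(freq[t] for t in nesting_terms) / len(nesting_terms)
    return weight * (freq[candidate] - mean_nested)


# Hypothetical frequencies, for illustration only.
example_freq = {
    ("soft", "contact", "lens"): 3,
    ("contact", "lens"): 10,
}
nested_score = c_value(("contact", "lens"), example_freq)
top_score = c_value(("soft", "contact", "lens"), example_freq)
```

With these made-up counts, "contact lens" is nested in one longer candidate, so its frequency of 10 is reduced by that candidate's frequency of 3 before weighting. Under the word-length assumption, a single-word candidate always scores 0 because \( \log_2 1 = 0 \).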

Termhood (a) \( \log \left(\frac{P\left( vote= yes\right)}{P\left( vote= no\right)}\right) \) [53]

= −0.7836

+ 0.7541 * FirstPOS_ADJECTIVE

− 1.3722 * FirstPOS_ADVERB

+ 0.3541 * FirstPOS_NOUN

+ 1.4182 * FirstPOS_VERB

− 0.7722 * LastPOS_ADJECTIVE

+ 2.2576 * LastPOS_ADVERB

+ 0.0285 * LastPOS_NOUN

+ 0.6038 * LastPOS_VERB

+ 1.2899 * NP_VALUE

+ 1.0475 * REPEAT_SUP_GREATER_MEDIAN

+ 0.8417 * REPEAT_SUB_GREATER_MEDIAN

+ 0.8422 * DISTINCT_PER_HOST_GREATER_THAN_MEDIAN

where:

POS is the part-of-speech tag

REPEAT_SUP is the number of supra terms (candidate terms containing a) = \( P(T_a) \)

REPEAT_SUB is the number of sub terms (candidate terms that are contained within a) = P (Αt)

NP_VALUE indicates whether a is a noun phrase

DISTINCT_PER_HOST is equivalent to document frequency

MEDIAN is calculated for the whole document set
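The termhood equation is a logistic-regression log-odds, so it can be evaluated directly from the listed coefficients. The sketch below is illustrative only: it assumes every feature is a 0/1 indicator, writes the feature names without the spaces around underscores, and uses hypothetical function names (`termhood_log_odds`, `termhood_probability`).

```python
import math

INTERCEPT = -0.7836
COEFFICIENTS = {
    "FirstPOS_ADJECTIVE": 0.7541,
    "FirstPOS_ADVERB": -1.3722,
    "FirstPOS_NOUN": 0.3541,
    "FirstPOS_VERB": 1.4182,
    "LastPOS_ADJECTIVE": -0.7722,
    "LastPOS_ADVERB": 2.2576,
    "LastPOS_NOUN": 0.0285,
    "LastPOS_VERB": 0.6038,
    "NP_VALUE": 1.2899,
    "REPEAT_SUP_GREATER_MEDIAN": 1.0475,
    "REPEAT_SUB_GREATER_MEDIAN": 0.8417,
    "DISTINCT_PER_HOST_GREATER_THAN_MEDIAN": 0.8422,
}


def termhood_log_odds(features):
    """Return log(P(vote=yes) / P(vote=no)) for a candidate term.

    `features` maps feature names to 0 or 1 (assumed indicator variables);
    features that are not listed are treated as 0.
    """
    unknown = set(features) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown features: {sorted(unknown)}")
    return INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )


def termhood_probability(features):
    """Convert the log-odds into P(vote=yes) with the logistic function."""
    return 1.0 / (1.0 + math.exp(-termhood_log_odds(features)))


# Hypothetical candidate: starts with an adjective, ends with a noun,
# and is a noun phrase.
example = {"FirstPOS_ADJECTIVE": 1, "LastPOS_NOUN": 1, "NP_VALUE": 1}
example_log_odds = termhood_log_odds(example)
example_probability = termhood_probability(example)
```

For this hypothetical candidate the log-odds are −0.7836 + 0.7541 + 0.0285 + 1.2899 = 1.2889, which corresponds to a probability of roughly 0.78 that the candidate is voted a term.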

TF-IDF: \( w_{i,j}=TF_{i,j}\times IDF_i \) [43]

\( TF_{i,j}=\frac{f_{i,j}}{\max_z f_{z,j}} \)

where:

\( TF_{i,j} \) is the term frequency for keyword \( k_i \) in document \( d_j \)

\( f_{i,j} \) is the number of times \( k_i \) appears in \( d_j \)

\( \max_z f_{z,j} \) is the maximum frequency across all keywords \( k_z \) in \( d_j \)

\( IDF_i=\log \frac{N}{n_i} \)

where:

\( IDF_i \) is the inverse document frequency for keyword \( k_i \)

N is the total number of documents in the corpus

\( n_i \) is the number of documents in which \( k_i \) appears
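The TF and IDF definitions above combine into per-document keyword weights, which can be sketched as follows. This is a minimal sketch under stated assumptions: documents are already tokenised into keyword lists, the logarithm is natural (the table does not state a base), and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one dict per document mapping keyword k_i to weight w_ij.

    `documents` is a list of token lists. TF is the keyword count divided
    by the largest keyword count in the same document; IDF is
    log(N / n_i). The natural logarithm is an assumption.
    """
    n_docs = len(documents)

    # n_i: number of documents in which keyword k_i appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)                      # f_ij
        max_count = max(counts.values(), default=1)   # max_z f_zj
        weights.append({
            keyword: (count / max_count) * math.log(n_docs / doc_freq[keyword])
            for keyword, count in counts.items()
        })
    return weights


# Hypothetical two-document corpus, for illustration only.
example_weights = tf_idf([
    ["gene", "gene", "protein"],
    ["gene", "cell"],
])
```

In this toy corpus "gene" appears in both documents, so its IDF is log(2/2) = 0 and its weight vanishes everywhere, while "protein" and "cell" keep non-zero weights.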

Cosine similarity [43] \( cosine\left(\overrightarrow{w_c},\overrightarrow{w_s}\right)=\frac{\overrightarrow{w_c}\cdot \overrightarrow{w_s}}{\left\Vert \overrightarrow{w_c}\right\Vert \left\Vert \overrightarrow{w_s}\right\Vert } \)

\( =\frac{{\displaystyle {\sum}_{i=1}^K}{w}_{i,c}{w}_{i,s}}{\sqrt{{\displaystyle {\sum}_{i=1}^K}{w}_{i,c}^2}\sqrt{{\displaystyle {\sum}_{i=1}^K}{w}_{i,s}^2}} \)

where:

\( w_{i,j} \) is the TF-IDF weight defined above
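The cosine formula above can be computed directly from two weight vectors of equal length K. The sketch below is illustrative: the function name `cosine` and the choice to return 0.0 for an all-zero vector are assumptions, since the table does not cover that case.

```python
import math


def cosine(w_c, w_s):
    """Cosine similarity between two equal-length weight vectors."""
    if len(w_c) != len(w_s):
        raise ValueError("vectors must have the same length K")

    dot = sum(a * b for a, b in zip(w_c, w_s))
    norm_c = math.sqrt(sum(a * a for a in w_c))
    norm_s = math.sqrt(sum(b * b for b in w_s))

    # Assumption: similarity with an all-zero vector is defined as 0.0
    # to avoid division by zero.
    if norm_c == 0.0 or norm_s == 0.0:
        return 0.0
    return dot / (norm_c * norm_s)


# Hypothetical weight vectors, for illustration only.
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])
close = cosine([3.0, 4.0], [4.0, 3.0])
```

Vectors with no shared non-zero keywords score 0, and vectors pointing in the same direction score 1 regardless of their magnitudes.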