Table 2 Equations used in SEAM

From: Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system

C-value (a) [18] \( \begin{cases} \log_2 |a| \cdot f(a), & \text{if } a \text{ is not nested} \\ \log_2 |a| \left( f(a) - \frac{1}{P(T_a)} \sum_{b \in T_a} f(b) \right), & \text{otherwise} \end{cases} \)
where:
\( a \) is the candidate string
\( f(\cdot) \) is its frequency of occurrence in the corpus
\( T_a \) is the set of extracted candidate terms that contain \( a \)
\( P(T_a) \) is the number of these candidate terms
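The two cases of the C-value formula can be sketched in Python as follows. This is a minimal illustration rather than the SEAM implementation: it assumes that \( |a| \) is the number of words in the candidate, that candidates are stored as tuples of words, and that the hypothetical `freq` dictionary and the example frequencies are made up for demonstration.

```python
import math


def c_value(candidate, freq, containing_terms=()):
    """Return the C-value of `candidate`.

    candidate        -- tuple of words (assumption: |a| is its word count)
    freq             -- dict mapping each candidate term to its corpus frequency f(.)
    containing_terms -- the longer candidate terms that contain `candidate` (T_a)
    """
    length_weight = math.log2(len(candidate))
    f_a = freq[candidate]
    if not containing_terms:
        # Case 1: the candidate is not nested in any longer candidate term.
        return length_weight * f_a
    # Case 2: subtract the mean frequency of the terms that contain it,
    # i.e. (1 / P(T_a)) * sum of f(b) over b in T_a.
    mean_containing = sum(freq[b] for b in containing_terms) / len(containing_terms)
    return length_weight * (f_a - mean_containing)


# Illustrative (made-up) frequencies.
freq = {
    ("basal", "cell"): 10,
    ("basal", "cell", "carcinoma"): 4,
    ("basal", "cell", "layer"): 2,
}
nested = c_value(
    ("basal", "cell"),
    freq,
    [("basal", "cell", "carcinoma"), ("basal", "cell", "layer")],
)
standalone = c_value(("basal", "cell", "carcinoma"), freq)
print(nested)      # 7.0, since log2(2) * (10 - (4 + 2) / 2) = 7
print(standalone)  # 4 * log2(3)
```

Under the word-count assumption, a single-word candidate always scores 0 because \( \log_2 1 = 0 \).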
Termhood (a) \( \log \left( \frac{P(vote = yes)}{P(vote = no)} \right) \) [53] = −0.7836
+ 0.7541 * FirstPOS_ADJECTIVE
− 1.3722 * FirstPOS_ADVERB
+ 0.3541 * FirstPOS_NOUN
+ 1.4182 * FirstPOS_VERB
− 0.7722 * LastPOS_ADJECTIVE
+ 2.2576 * LastPOS_ADVERB
+ 0.0285 * LastPOS_NOUN
+ 0.6038 * LastPOS_VERB
+ 1.2899 * NP_VALUE
+ 1.0475 * REPEAT_SUP_GREATER_MEDIAN
+ 0.8417 * REPEAT_SUB_GREATER_MEDIAN
+ 0.8422 * DISTINCT_PERHOST_GREATER_THAN_MEDIAN
where:
POS is the part-of-speech tag
REPEAT_SUP is the number of supra-terms (candidate terms containing \( a \)) = \( P(T_a) \)
REPEAT_SUB is the number of sub-terms (candidate terms that are contained within \( a \)) = P(Αt)
NP_VALUE indicates whether \( a \) is a noun phrase
DISTINCT_PER_HOST is equivalent to document frequency
MEDIAN is calculated for the whole document set
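The termhood equation is a logistic regression, so its left-hand side is a log-odds that can be converted into a probability. The sketch below copies the intercept and coefficients from the equation above. It assumes, without confirmation from the table, that every feature is a 0/1 indicator and that the logarithm is natural. The function names are hypothetical.

```python
import math

INTERCEPT = -0.7836
COEFFICIENTS = {
    "FirstPOS_ADJECTIVE": 0.7541,
    "FirstPOS_ADVERB": -1.3722,
    "FirstPOS_NOUN": 0.3541,
    "FirstPOS_VERB": 1.4182,
    "LastPOS_ADJECTIVE": -0.7722,
    "LastPOS_ADVERB": 2.2576,
    "LastPOS_NOUN": 0.0285,
    "LastPOS_VERB": 0.6038,
    "NP_VALUE": 1.2899,
    "REPEAT_SUP_GREATER_MEDIAN": 1.0475,
    "REPEAT_SUB_GREATER_MEDIAN": 0.8417,
    "DISTINCT_PERHOST_GREATER_THAN_MEDIAN": 0.8422,
}


def termhood_log_odds(features):
    """Log-odds of vote=yes; features absent from the dict count as 0."""
    return INTERCEPT + sum(
        coef * features.get(name, 0) for name, coef in COEFFICIENTS.items()
    )


def termhood_probability(features):
    """Convert the log-odds into P(vote = yes), assuming a natural logarithm."""
    return 1.0 / (1.0 + math.exp(-termhood_log_odds(features)))


# Hypothetical candidate: a noun phrase starting with an adjective, ending with a noun.
example = {"FirstPOS_ADJECTIVE": 1, "LastPOS_NOUN": 1, "NP_VALUE": 1}
print(round(termhood_log_odds(example), 4))  # 1.2889
print(termhood_probability(example))
```

A positive log-odds corresponds to a probability above 0.5, that is, the model favors accepting the candidate as a term.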
TF-IDF [43] \( w_{i,j} = TF_{i,j} \times IDF_i \)
\( TF_{i,j} = \frac{f_{i,j}}{\max_z f_{z,j}} \)
where:
\( TF_{i,j} \) is the term frequency for keyword \( k_i \) in document \( d_j \)
\( f_{i,j} \) is the number of times \( k_i \) appears in \( d_j \)
\( \max_z f_{z,j} \) is the maximum frequency across all keywords \( k_z \) in \( d_j \)
\( IDF_i = \log \frac{N}{n_i} \)
where:
\( IDF_i \) is the inverse document frequency for keyword \( k_i \)
\( N \) is the total number of documents in the corpus
\( n_i \) is the number of documents in which \( k_i \) appears
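The TF and IDF definitions above combine into a weight per keyword and document. The following sketch computes these weights for a tiny made-up corpus. It assumes documents are already tokenized into lists of keywords and that the logarithm is natural, since the table does not state a base. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one dict per document mapping keyword k_i to its weight w_{i,j}.

    documents -- list of token lists (assumption: already tokenized)
    """
    n_docs = len(documents)  # N
    doc_freq = Counter()     # n_i: number of documents containing k_i
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)  # f_{i,j}
        max_count = max(counts.values(), default=1)  # max_z f_{z,j}
        weights.append({
            keyword: (count / max_count) * math.log(n_docs / doc_freq[keyword])
            for keyword, count in counts.items()
        })
    return weights


# Illustrative (made-up) corpus of two documents.
corpus = [["gene", "gene", "protein"], ["protein", "cell"]]
w = tf_idf(corpus)
print(w[0]["gene"])     # ln(2): TF = 2/2, IDF = ln(2/1)
print(w[0]["protein"])  # 0.0: the keyword appears in every document
```

A keyword that occurs in every document receives weight 0 regardless of its term frequency, because \( \log \frac{N}{n_i} = 0 \) when \( n_i = N \).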
Cosine similarity [43] \( cosine\left(\overrightarrow{w_c},\overrightarrow{w_s}\right)=\frac{\overrightarrow{w_c}\cdot \overrightarrow{w_s}}{\left\Vert \overrightarrow{w_c}\right\Vert \times \left\Vert \overrightarrow{w_s}\right\Vert} = \frac{\sum_{i=1}^K w_{i,c} w_{i,s}}{\sqrt{\sum_{i=1}^K w_{i,c}^2}\sqrt{\sum_{i=1}^K w_{i,s}^2}} \)
where:
\( w_{i,j} \) is defined above
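The cosine similarity between two weight vectors can be sketched as follows. This example assumes each vector is stored as a sparse dictionary from keyword to weight, with missing keywords counting as 0. Returning 0.0 when either vector has zero length is a convention chosen here, not something the table specifies.

```python
import math


def cosine_similarity(w_c, w_s):
    """Cosine of the angle between two sparse weight vectors (keyword -> weight)."""
    dot = sum(weight * w_s.get(keyword, 0.0) for keyword, weight in w_c.items())
    norm_c = math.sqrt(sum(weight * weight for weight in w_c.values()))
    norm_s = math.sqrt(sum(weight * weight for weight in w_s.values()))
    if norm_c == 0.0 or norm_s == 0.0:
        # Assumed convention: an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_c * norm_s)


same = cosine_similarity({"gene": 3.0, "cell": 4.0}, {"gene": 3.0, "cell": 4.0})
disjoint = cosine_similarity({"gene": 1.0}, {"cell": 1.0})
print(same)      # 1.0
print(disjoint)  # 0.0
```

Because the weights defined above are non-negative, the result lies between 0 (no shared keywords) and 1 (same direction).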