Notes/Text Mining
Last updated: 2009-12-11 (Fri) 13:10:13


Starting point: Online Resources for Studying English Syntax, Words and Usage

Arrived at: The Stanford Parser: A statistical parser

09/03/04 Trying out the Stanford Parser

Online demo page: http://nlp.stanford.edu:8080/parser/index.jsp

An experiment using a PubMed abstract.

Input

Lung cancer has become increasingly common in women, and gender differences
in the physiology and pathogenesis of the disease have suggested a role for
estrogens. In the lung recent data have shown local production of estrogens
from androgens via the action of aromatase enzyme and higher levels of
estrogen in tumor tissue as compared with surrounding normal lung tissue.

Output

Tagging: omitted

Parse
(ROOT
  (S
    (S
      (NP (NNP Lung) (NN cancer))
      (VP (VBZ has)
        (VP (VBN become)
          (ADJP (RB increasingly) (JJ common))
          (PP (IN in)
            (NP (NNS women))))))
    (, ,)
    (CC and)
    (S
      (NP
        (NP (NN gender) (NNS differences))
        (PP (IN in)
          (NP
            (NP (DT the) (NN physiology)
              (CC and)
              (NN pathogenesis))
            (PP (IN of)
              (NP (DT the) (NN disease))))))
      (VP (VBP have)
        (VP (VBN suggested)
          (NP
            (NP
              (NP (DT a) (NN role))
              (PP (IN for)
                (NP (NNS estrogens)))
              (. .))
            (SBAR
              (S
                (PP (IN In)
                  (NP (DT the) (NN lung)))
                (NP (JJ recent) (NNS data))
                (VP (VBP have)
                  (VP (VBN shown)
                    (NP
                      (NP (JJ local) (NN production))
                      (PP (IN of)
                        (NP
                          (NP (NNS estrogens))
                          (PP (IN from)
                            (NP
                              (NP (NNS androgens))
                              (PP (IN via)
                                (NP
                                  (NP
                                    (NP (DT the) (NN action))
                                    (PP (IN of)
                                      (NP (JJ aromatase) (NN enzyme))))
                                  (CC and)
                                  (NP
                                    (NP (JJR higher) (NNS levels))
                                    (PP (IN of)
                                      (NP
                                        (NP (NN estrogen))
                                        (PP (IN in)
                                          (NP (NN tumor) (NN tissue)))))))))))))
                    (PP (IN as))
                    (PP (VBN compared)
                      (PP (IN with)
                        (S
                          (VP (VBG surrounding)
                            (NP (JJ normal) (NN lung) (NN tissue))))))))))))))
    (. .)))
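
The bracketed output above follows the Penn Treebank convention, in which every word sits in a preterminal of the form (TAG word). As a minimal sketch, not part of the original experiment, the tokens and their part-of-speech tags can be pulled back out of such a string with a regular expression. The function name `read_leaves` is hypothetical, and the tree below is a shortened excerpt of the first clause of the parse above.

```python
import re


def read_leaves(bracketed):
    """Return (tag, word) pairs for every preterminal such as (NN cancer)."""
    # A preterminal is "(" TAG SPACE WORD ")" with no nested parentheses,
    # so phrase nodes like "(NP (NNP Lung) ...)" never match.
    return re.findall(r"\(([^\s()]+) ([^\s()]+)\)", bracketed)


# Shortened excerpt of the first parse (assumption: trimmed for illustration).
tree = """
(ROOT
  (S
    (NP (NNP Lung) (NN cancer))
    (VP (VBZ has)
      (VP (VBN become)
        (ADJP (RB increasingly) (JJ common))
        (PP (IN in)
          (NP (NNS women)))))))
"""

leaves = read_leaves(tree)
words = [word for _, word in leaves]
print(leaves)
```

Punctuation nodes such as (, ,) and (. .) are returned as ordinary leaves, so the same pattern also shows where the parser placed sentence-final periods.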

Typed dependencies

nn(cancer-2, Lung-1)
nsubj(common-6, cancer-2)
aux(common-6, has-3)
cop(common-6, become-4)
advmod(common-6, increasingly-5)
prep_in(common-6, women-8)
nn(differences-12, gender-11)
nsubj(suggested-22, differences-12)
det(physiology-15, the-14)
prep_in(differences-12, physiology-15)
prep_in(differences-12, pathogenesis-17)
conj_and(physiology-15, pathogenesis-17)
det(disease-20, the-19)
prep_of(physiology-15, disease-20)
aux(suggested-22, have-21)
conj_and(common-6, suggested-22)
det(role-24, a-23)
dobj(suggested-22, role-24)
prep_for(role-24, estrogens-26)
det(lung-30, the-29)
prep_in(shown-34, lung-30)
amod(data-32, recent-31)
nsubj(shown-34, data-32)
aux(shown-34, have-33)
rcmod(role-24, shown-34)
amod(production-36, local-35)
dobj(shown-34, production-36)
prep_of(production-36, estrogens-38)
prep_from(estrogens-38, androgens-40)
det(action-43, the-42)
prep_via(androgens-40, action-43)
amod(enzyme-46, aromatase-45)
prep_of(action-43, enzyme-46)
amod(levels-49, higher-48)
prep_via(androgens-40, levels-49)
conj_and(action-43, levels-49)
prep_of(levels-49, estrogen-51)
nn(tissue-54, tumor-53)
prep_in(estrogen-51, tissue-54)
prep(shown-34, as-55)
prepc_compared_with(shown-34, surrounding-58)
amod(tissue-61, normal-59)
nn(tissue-61, lung-60)
dobj(surrounding-58, tissue-61)

Typed dependencies, collapsed

nn(cancer-2, Lung-1)
nsubj(common-6, cancer-2)
aux(common-6, has-3)
cop(common-6, become-4)
advmod(common-6, increasingly-5)
prep_in(common-6, women-8)
nn(differences-12, gender-11)
nsubj(suggested-22, differences-12)
det(physiology-15, the-14)
prep_in(differences-12, physiology-15)
conj_and(physiology-15, pathogenesis-17)
det(disease-20, the-19)
prep_of(physiology-15, disease-20)
aux(suggested-22, have-21)
conj_and(common-6, suggested-22)
det(role-24, a-23)
dobj(suggested-22, role-24)
prep_for(role-24, estrogens-26)
det(lung-30, the-29)
prep_in(shown-34, lung-30)
amod(data-32, recent-31)
nsubj(shown-34, data-32)
aux(shown-34, have-33)
rcmod(role-24, shown-34)
amod(production-36, local-35)
dobj(shown-34, production-36)
prep_of(production-36, estrogens-38)
prep_from(estrogens-38, androgens-40)
det(action-43, the-42)
prep_via(androgens-40, action-43)
amod(enzyme-46, aromatase-45)
prep_of(action-43, enzyme-46)
amod(levels-49, higher-48)
conj_and(action-43, levels-49)
prep_of(levels-49, estrogen-51)
nn(tissue-54, tumor-53)
prep_in(estrogen-51, tissue-54)
prep(shown-34, as-55)
prepc_compared_with(shown-34, surrounding-58)
amod(tissue-61, normal-59)
nn(tissue-61, lung-60)
dobj(surrounding-58, tissue-61)
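
Each dependency line above has the shape relation(governor-index, dependent-index). The sketch below, which is an illustration rather than anything from the original notes, parses such lines into tuples and diffs two lists. Comparing the two listings above by eye, the first one appears to contain two lines that the collapsed one lacks: prep_in(differences-12, pathogenesis-17) and prep_via(androgens-40, levels-49). The helper name `parse_dep` is hypothetical, and only a four-line excerpt is used as data.

```python
import re

# relation(governor-index, dependent-index); words may contain hyphens,
# so the index is taken as the digits after the LAST hyphen.
DEP = re.compile(r"^(\w+)\((.+)-(\d+), (.+)-(\d+)\)$")


def parse_dep(line):
    """Parse one typed-dependency line into (rel, gov, gov_idx, dep, dep_idx)."""
    match = DEP.match(line.strip())
    if match is None:
        raise ValueError(f"not a dependency line: {line!r}")
    rel, gov, gov_idx, dep, dep_idx = match.groups()
    return (rel, gov, int(gov_idx), dep, int(dep_idx))


# Excerpts copied from the two listings above.
first_list = [
    "det(physiology-15, the-14)",
    "prep_in(differences-12, physiology-15)",
    "prep_in(differences-12, pathogenesis-17)",
    "conj_and(physiology-15, pathogenesis-17)",
]
collapsed_list = [
    "det(physiology-15, the-14)",
    "prep_in(differences-12, physiology-15)",
    "conj_and(physiology-15, pathogenesis-17)",
]

only_in_first = set(map(parse_dep, first_list)) - set(map(parse_dep, collapsed_list))
print(only_in_first)
```

In this excerpt the extra line attaches the second conjunct (pathogenesis) directly to the head that governs the first conjunct (physiology).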
         
Statistics

Tokens: 62
Time: 10.816 s

Another similar experiment

Input:

High levels of aromatase expression are also maintained in metastases as
compared with primary tumors. Consistent with these findings, clinical studies
suggest that aromatase expression may be a useful predictive biomarker for
prognosis in the management of non-small cell lung cancer (NSCLC), the most
common form of lung malignancy. Low levels of aromatase associate with a
higher probability of long-term survival in older women with early stage
NSCLC. Treatment of lung NSCLC xenografts in vivo with an aromatase inhibitor
(exemestane) alone or combined with standard cisplatin chemotherapy elicits a
significant reduction in tumor progression as compared to paired controls.

Output:

Parse

(ROOT
  (S
    (NP
      (NP (JJ High) (NNS levels))
      (PP (IN of)
        (NP (JJ aromatase) (NN expression))))
    (VP (VBP are)
      (ADVP (RB also))
      (VP
        (VP (VBN maintained)
          (PP (IN in)
            (NP (NNS metastases)))
          (SBAR (IN as)
            (S
              (PP (VBN compared)
                (PP (IN with)
                  (FRAG
                    (NP (JJ primary) (NN tumors) (. .))
                    (: Consistent)
                    (S
                      (PP (IN with)
                        (NP (DT these) (NNS findings)))
                      (, ,)
                      (NP (JJ clinical) (NNS studies))
                      (VP (VBP suggest)
                        (SBAR (IN that)
                          (S
                            (NP (JJ aromatase) (NN expression))
                            (VP (MD may)
                              (VP (VB be)
                                (NP
                                  (NP (DT a) (JJ useful) (JJ predictive) (NN biomarker))
                                  (PP (IN for)
                                    (NP
                                      (NP (NN prognosis))
                                      (PP (IN in)
                                        (NP
                                          (NP (DT the) (NN management))
                                          (PP (IN of)
                                            (NP
                                              (NP
                                                (NP (JJ non-small) (NN cell) (NN lung) (NN cancer))
                                                (PRN (-LRB- -LRB-)
                                                  (NP (NNP NSCLC))
                                                  (-RRB- -RRB-)))
                                              (, ,)
                                              (NP
                                                (NP (DT the)
                                                  (ADJP (RBS most) (JJ common))
                                                  (NN form))
                                                (PP (IN of)
                                                  (NP (NN lung) (NN malignancy)))))))))))))))))
                    (. .))))
              (NP
                (NP (JJ Low) (NNS levels))
                (PP (IN of)
                  (NP (JJ aromatase) (NN associate)))
                (PP (IN with)
                  (NP
                    (NP (DT a) (JJR higher) (NN probability))
                    (PP (IN of)
                      (NP
                        (NP (JJ long-term) (NN survival))
                        (PP (IN in)
                          (NP
                            (NP (JJR older) (NNS women))
                            (PP (IN with)
                              (NP (JJ early) (NN stage) (NNP NSCLC) (. .) (NNP Treatment)))))))))
                (PP (IN of)
                  (NP (NN lung) (NN NSCLC))))
              (VP (VBZ xenografts)
                (ADVP (FW in) (FW vivo))
                (PP (IN with)
                  (NP
                    (NP (DT an) (JJ aromatase) (NN inhibitor))
                    (PRN (-LRB- -LRB-)
                      (NP (NN exemestane))
                      (-RRB- -RRB-))))
                (ADVP (RB alone))))))
        (CC or)
        (VP (VBN combined)
          (PP (IN with)
            (NP (JJ standard) (NN cisplatin) (NN chemotherapy) (NNS elicits))
            (ADVP
              (NP
                (NP (DT a) (JJ significant) (NN reduction))
                (PP (IN in)
                  (NP (NN tumor) (NN progression))))
              (RB as))))
        (PP (VBN compared)
          (PP (TO to)
            (NP (JJ paired) (NNS controls))))))
    (. .)))

Typed dependencies

amod(levels-2, High-1)
nsubjpass(maintained-8, levels-2)
nsubjpass(combined-92, levels-2)
amod(expression-5, aromatase-4)
prep_of(levels-2, expression-5)
auxpass(maintained-8, are-6)
advmod(maintained-8, also-7)
prep_in(maintained-8, metastases-10)
mark(xenografts-80, as-11)
prep(xenografts-80, compared-12)
dep(compared-12, with-13)
amod(tumors-15, primary-14)
dep(suggest-24, tumors-15)
dep(suggest-24, Consistent-17)
det(findings-20, these-19)
prep_with(suggest-24, findings-20)
amod(studies-23, clinical-22)
nsubj(suggest-24, studies-23)
dep(with-13, suggest-24)
complm(biomarker-33, that-25)
amod(expression-27, aromatase-26)
nsubj(biomarker-33, expression-27)
aux(biomarker-33, may-28)
cop(biomarker-33, be-29)
det(biomarker-33, a-30)
amod(biomarker-33, useful-31)
amod(biomarker-33, predictive-32)
ccomp(suggest-24, biomarker-33)
prep_for(biomarker-33, prognosis-35)
det(management-38, the-37)
prep_in(prognosis-35, management-38)
amod(cancer-43, non-small-40)
nn(cancer-43, cell-41)
nn(cancer-43, lung-42)
prep_of(management-38, cancer-43)
abbrev(cancer-43, NSCLC-45)
det(form-51, the-48)
advmod(common-50, most-49)
amod(form-51, common-50)
appos(cancer-43, form-51)
nn(malignancy-54, lung-53)
prep_of(form-51, malignancy-54)
amod(levels-57, Low-56)
nsubj(xenografts-80, levels-57)
amod(associate-60, aromatase-59)
prep_of(levels-57, associate-60)
det(probability-64, a-62)
amod(probability-64, higher-63)
prep_with(levels-57, probability-64)
amod(survival-67, long-term-66)
prep_of(probability-64, survival-67)
amod(women-70, older-69)
prep_in(survival-67, women-70)
amod(Treatment-76, early-72)
nn(Treatment-76, stage-73)
nn(Treatment-76, NSCLC-74)
prep_with(women-70, Treatment-76)
nn(NSCLC-79, lung-78)
prep_of(levels-57, NSCLC-79)
advcl(maintained-8, xenografts-80)
dep(vivo-82, in-81)
advmod(xenografts-80, vivo-82)
det(inhibitor-86, an-84)
amod(inhibitor-86, aromatase-85)
prep_with(xenografts-80, inhibitor-86)
appos(inhibitor-86, exemestane-88)
advmod(xenografts-80, alone-90)
conj_or(maintained-8, combined-92)
amod(elicits-97, standard-94)
nn(elicits-97, cisplatin-95)
nn(elicits-97, chemotherapy-96)
prep_with(combined-92, elicits-97)
det(reduction-100, a-98)
amod(reduction-100, significant-99)
dep(as-104, reduction-100)
nn(progression-103, tumor-102)
prep_in(reduction-100, progression-103)
dep(combined-92, as-104)
amod(controls-108, paired-107)
prep_compared_to(maintained-8, controls-108)

Typed dependencies, collapsed

amod(levels-2, High-1)
nsubjpass(maintained-8, levels-2)
amod(expression-5, aromatase-4)
prep_of(levels-2, expression-5)
auxpass(maintained-8, are-6)
advmod(maintained-8, also-7)
prep_in(maintained-8, metastases-10)
mark(xenografts-80, as-11)
prep(xenografts-80, compared-12)
dep(compared-12, with-13)
amod(tumors-15, primary-14)
dep(suggest-24, tumors-15)
dep(suggest-24, Consistent-17)
det(findings-20, these-19)
prep_with(suggest-24, findings-20)
amod(studies-23, clinical-22)
nsubj(suggest-24, studies-23)
dep(with-13, suggest-24)
complm(biomarker-33, that-25)
amod(expression-27, aromatase-26)
nsubj(biomarker-33, expression-27)
aux(biomarker-33, may-28)
cop(biomarker-33, be-29)
det(biomarker-33, a-30)
amod(biomarker-33, useful-31)
amod(biomarker-33, predictive-32)
ccomp(suggest-24, biomarker-33)
prep_for(biomarker-33, prognosis-35)
det(management-38, the-37)
prep_in(prognosis-35, management-38)
amod(cancer-43, non-small-40)
nn(cancer-43, cell-41)
nn(cancer-43, lung-42)
prep_of(management-38, cancer-43)
abbrev(cancer-43, NSCLC-45)
det(form-51, the-48)
advmod(common-50, most-49)
amod(form-51, common-50)
appos(cancer-43, form-51)
nn(malignancy-54, lung-53)
prep_of(form-51, malignancy-54)
amod(levels-57, Low-56)
nsubj(xenografts-80, levels-57)
amod(associate-60, aromatase-59)
prep_of(levels-57, associate-60)
det(probability-64, a-62)
amod(probability-64, higher-63)
prep_with(levels-57, probability-64)
amod(survival-67, long-term-66)
prep_of(probability-64, survival-67)
amod(women-70, older-69)
prep_in(survival-67, women-70)
amod(Treatment-76, early-72)
nn(Treatment-76, stage-73)
nn(Treatment-76, NSCLC-74)
prep_with(women-70, Treatment-76)
nn(NSCLC-79, lung-78)
prep_of(levels-57, NSCLC-79)
advcl(maintained-8, xenografts-80)
dep(vivo-82, in-81)
advmod(xenografts-80, vivo-82)
det(inhibitor-86, an-84)
amod(inhibitor-86, aromatase-85)
prep_with(xenografts-80, inhibitor-86)
appos(inhibitor-86, exemestane-88)
advmod(xenografts-80, alone-90)
conj_or(maintained-8, combined-92)
amod(elicits-97, standard-94)
nn(elicits-97, cisplatin-95)
nn(elicits-97, chemotherapy-96)
prep_with(combined-92, elicits-97)
det(reduction-100, a-98)
amod(reduction-100, significant-99)
dep(as-104, reduction-100)
nn(progression-103, tumor-102)
prep_in(reduction-100, progression-103)
dep(combined-92, as-104)
amod(controls-108, paired-107)
prep_compared_to(maintained-8, controls-108)
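
A dependency list like the one above is easier to inspect once it is turned into a head-to-children map. The following is a minimal sketch under that assumption: it builds the map and reports tokens that govern others but are never governed themselves, which are candidate heads of the sentence. The function name `build_graph` is hypothetical, and the data is only the first seven lines of the collapsed list above.

```python
import re
from collections import defaultdict

# relation(governor, dependent); tokens keep their "-index" suffix.
DEP = re.compile(r"(\w+)\((\S+), (\S+)\)")


def build_graph(lines):
    """Return (children, roots) for a list of typed-dependency lines."""
    children = defaultdict(list)
    dependents = set()
    for line in lines:
        match = DEP.fullmatch(line.strip())
        if match is None:
            continue  # skip blank or malformed lines
        rel, gov, dep = match.groups()
        children[gov].append((rel, dep))
        dependents.add(dep)
    # A root governs at least one token and is governed by none.
    roots = sorted(set(children) - dependents)
    return children, roots


lines = [
    "amod(levels-2, High-1)",
    "nsubjpass(maintained-8, levels-2)",
    "amod(expression-5, aromatase-4)",
    "prep_of(levels-2, expression-5)",
    "auxpass(maintained-8, are-6)",
    "advmod(maintained-8, also-7)",
    "prep_in(maintained-8, metastases-10)",
]

children, roots = build_graph(lines)
print(roots)
print(children["levels-2"])
```

Listing the children of a suspicious head, for example xenografts-80, which the parse above tags as a verb (VBZ), is a quick way to see how far a wrong attachment spreads.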
          
Statistics

Tokens: 109
Time: 47.074 s
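
The two Statistics blocks can be compared directly. The small calculation below uses only the token counts and times reported above. It shows the throughput dropping from roughly 5.7 to roughly 2.3 tokens per second as the input grows. With only two measurements from a shared demo server whose load is unknown, this is an observation, not a benchmark.

```python
# Figures copied from the two "Statistics" blocks on this page.
runs = [
    {"label": "first abstract", "tokens": 62, "seconds": 10.816},
    {"label": "second abstract", "tokens": 109, "seconds": 47.074},
]


def tokens_per_second(run):
    """Throughput of one parser run."""
    return run["tokens"] / run["seconds"]


rates = [tokens_per_second(run) for run in runs]
token_ratio = runs[1]["tokens"] / runs[0]["tokens"]
time_ratio = runs[1]["seconds"] / runs[0]["seconds"]

for run, rate in zip(runs, rates):
    print(f"{run['label']}: {rate:.2f} tokens/s")
print(f"tokens x{token_ratio:.2f}, time x{time_ratio:.2f}")
```

The time grows faster than the token count, which fits the fact that each input was parsed as one long sentence (a single ROOT in both outputs).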

Using the Stanford Parser from Python

See the page ノート/PythonからJavaを呼出す (Notes/Calling Java from Python).
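
The approach taken on the referenced page is not reproduced here. As a different and simpler technique, the sketch below launches the parser's command-line class in a separate JVM with `subprocess`. The class name `edu.stanford.nlp.parser.lexparser.LexicalizedParser` and the `-outputFormat` option belong to the Stanford Parser distribution. The jar path, the model path, the heap size and the input file name are assumptions about a local installation and will differ between releases. The function names `build_command` and `parse_file` are hypothetical.

```python
import subprocess

PARSER_CLASS = "edu.stanford.nlp.parser.lexparser.LexicalizedParser"


def build_command(text_file,
                  jar="stanford-parser.jar",       # assumed local path
                  model="englishPCFG.ser.gz",      # assumed local path
                  output_format="penn,typedDependenciesCollapsed",
                  heap="-Xmx500m"):                # assumed heap size
    """Build the java command line for parsing one plain-text file."""
    return ["java", heap, "-cp", jar, PARSER_CLASS,
            "-outputFormat", output_format, model, text_file]


def parse_file(text_file, **options):
    """Run the parser and return its standard output as text.

    Requires Java and the parser files to be installed; raises
    CalledProcessError if the JVM exits with a non-zero status.
    """
    result = subprocess.run(build_command(text_file, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout


# Only the command is built here; nothing is executed.
cmd = build_command("abstract.txt")
print(" ".join(cmd))
```

Calling `parse_file("abstract.txt")` would then return the bracketed tree and the dependency list as one string, in the same layout as the demo output on this page, provided the assumed paths are correct.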

