Notes/Text Mining
Last updated: 2009-12-11 (Fri) 13:10:13
> Notes/Text Mining/Text Mining and Thesauri
> Notes/Text Mining/PubMed Analysis
> Notes/Text Mining/MeSH
> Notes/Text Mining/Trying Out the Stanford Parser 2
Starting point: Online Resources for Studying English Syntax, Words and Usage
Arrived at: The Stanford Parser: A statistical parser
Trial page: http://nlp.stanford.edu:8080/parser/index.jsp
An experiment using a PubMed abstract.
Input
Lung cancer has become increasingly common in women, and gender differences in the physiology and pathogenesis of the disease have suggested a role for estrogens. In the lung recent data have shown local production of estrogens from androgens via the action of aromatase enzyme and higher levels of estrogen in tumor tissue as compared with surrounding normal lung tissue.
Output
Tagging: omitted
Parse
(ROOT (S (S (NP (NNP Lung) (NN cancer)) (VP (VBZ has) (VP (VBN become) (ADJP (RB increasingly) (JJ common)) (PP (IN in) (NP (NNS women)))))) (, ,) (CC and) (S (NP (NP (NN gender) (NNS differences)) (PP (IN in) (NP (NP (DT the) (NN physiology) (CC and) (NN pathogenesis)) (PP (IN of) (NP (DT the) (NN disease)))))) (VP (VBP have) (VP (VBN suggested) (NP (NP (NP (DT a) (NN role)) (PP (IN for) (NP (NNS estrogens))) (. .)) (SBAR (S (PP (IN In) (NP (DT the) (NN lung))) (NP (JJ recent) (NNS data)) (VP (VBP have) (VP (VBN shown) (NP (NP (JJ local) (NN production)) (PP (IN of) (NP (NP (NNS estrogens)) (PP (IN from) (NP (NP (NNS androgens)) (PP (IN via) (NP (NP (NP (DT the) (NN action)) (PP (IN of) (NP (JJ aromatase) (NN enzyme)))) (CC and) (NP (NP (JJR higher) (NNS levels)) (PP (IN of) (NP (NP (NN estrogen)) (PP (IN in) (NP (NN tumor) (NN tissue))))))))))))) (PP (IN as)) (PP (VBN compared) (PP (IN with) (S (VP (VBG surrounding) (NP (JJ normal) (NN lung) (NN tissue)))))))))))))) (. .)))
Typed dependencies
nn(cancer-2, Lung-1) nsubj(common-6, cancer-2) aux(common-6, has-3) cop(common-6, become-4) advmod(common-6, increasingly-5) prep_in(common-6, women-8) nn(differences-12, gender-11) nsubj(suggested-22, differences-12) det(physiology-15, the-14) prep_in(differences-12, physiology-15) prep_in(differences-12, pathogenesis-17) conj_and(physiology-15, pathogenesis-17) det(disease-20, the-19) prep_of(physiology-15, disease-20) aux(suggested-22, have-21) conj_and(common-6, suggested-22) det(role-24, a-23) dobj(suggested-22, role-24) prep_for(role-24, estrogens-26) det(lung-30, the-29) prep_in(shown-34, lung-30) amod(data-32, recent-31) nsubj(shown-34, data-32) aux(shown-34, have-33) rcmod(role-24, shown-34) amod(production-36, local-35) dobj(shown-34, production-36) prep_of(production-36, estrogens-38) prep_from(estrogens-38, androgens-40) det(action-43, the-42) prep_via(androgens-40, action-43) amod(enzyme-46, aromatase-45) prep_of(action-43, enzyme-46) amod(levels-49, higher-48) prep_via(androgens-40, levels-49) conj_and(action-43, levels-49) prep_of(levels-49, estrogen-51) nn(tissue-54, tumor-53) prep_in(estrogen-51, tissue-54) prep(shown-34, as-55) prepc_compared_with(shown-34, surrounding-58) amod(tissue-61, normal-59) nn(tissue-61, lung-60) dobj(surrounding-58, tissue-61)
Typed dependencies, collapsed
nn(cancer-2, Lung-1) nsubj(common-6, cancer-2) aux(common-6, has-3) cop(common-6, become-4) advmod(common-6, increasingly-5) prep_in(common-6, women-8) nn(differences-12, gender-11) nsubj(suggested-22, differences-12) det(physiology-15, the-14) prep_in(differences-12, physiology-15) conj_and(physiology-15, pathogenesis-17) det(disease-20, the-19) prep_of(physiology-15, disease-20) aux(suggested-22, have-21) conj_and(common-6, suggested-22) det(role-24, a-23) dobj(suggested-22, role-24) prep_for(role-24, estrogens-26) det(lung-30, the-29) prep_in(shown-34, lung-30) amod(data-32, recent-31) nsubj(shown-34, data-32) aux(shown-34, have-33) rcmod(role-24, shown-34) amod(production-36, local-35) dobj(shown-34, production-36) prep_of(production-36, estrogens-38) prep_from(estrogens-38, androgens-40) det(action-43, the-42) prep_via(androgens-40, action-43) amod(enzyme-46, aromatase-45) prep_of(action-43, enzyme-46) amod(levels-49, higher-48) conj_and(action-43, levels-49) prep_of(levels-49, estrogen-51) nn(tissue-54, tumor-53) prep_in(estrogen-51, tissue-54) prep(shown-34, as-55) prepc_compared_with(shown-34, surrounding-58) amod(tissue-61, normal-59) nn(tissue-61, lung-60) dobj(surrounding-58, tissue-61)
Statistics
Tokens: 62 Time: 10.816 s
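Each typed dependency in the output above has the form relation(governor-index, dependent-index), so the list can be turned into structured records with a regular expression. The following is a minimal sketch, not part of the Stanford Parser itself; the names `Dependency` and `parse_dependencies` are hypothetical and introduced only for illustration.

```python
import re
from typing import List, NamedTuple


class Dependency(NamedTuple):
    """One typed dependency (hypothetical record type for this sketch)."""
    relation: str
    governor: str
    governor_index: int
    dependent: str
    dependent_index: int


# Matches e.g. "prep_in(common-6, women-8)". The word part may itself
# contain hyphens ("non-small-40"), so the index is taken as the digits
# after the last hyphen before the comma or closing parenthesis.
_DEP_RE = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def parse_dependencies(text: str) -> List[Dependency]:
    """Extract every relation(governor-i, dependent-j) item from text."""
    return [
        Dependency(rel, gov, int(gov_i), dep, int(dep_i))
        for rel, gov, gov_i, dep, dep_i in _DEP_RE.findall(text)
    ]


# A few items copied from the output above.
sample = (
    "nn(cancer-2, Lung-1) nsubj(common-6, cancer-2) "
    "prep_in(common-6, women-8)"
)
deps = parse_dependencies(sample)

# Example query: all dependents attached to the word "common".
attached_to_common = [d.dependent for d in deps if d.governor == "common"]
```

Keeping the token index next to each word makes it possible to tell apart repeated words such as the two occurrences of "tissue" (tissue-54 and tissue-61).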
Another similar experiment
Input
High levels of aromatase expression are also maintained in metastases as compared with primary tumors. Consistent with these findings, clinical studies suggest that aromatase expression may be a useful predictive biomarker for prognosis in the management of non-small cell lung cancer (NSCLC), the most common form of lung malignancy. Low levels of aromatase associate with a higher probability of long-term survival in older women with early stage NSCLC. Treatment of lung NSCLC xenografts in vivo with an aromatase inhibitor (exemestane) alone or combined with standard cisplatin chemotherapy elicits a significant reduction in tumor progression as compared to paired controls.
Output
Parse
(ROOT (S (NP (NP (JJ High) (NNS levels)) (PP (IN of) (NP (JJ aromatase) (NN expression)))) (VP (VBP are) (ADVP (RB also)) (VP (VP (VBN maintained) (PP (IN in) (NP (NNS metastases))) (SBAR (IN as) (S (PP (VBN compared) (PP (IN with) (FRAG (NP (JJ primary) (NN tumors) (. .)) (: Consistent) (S (PP (IN with) (NP (DT these) (NNS findings))) (, ,) (NP (JJ clinical) (NNS studies)) (VP (VBP suggest) (SBAR (IN that) (S (NP (JJ aromatase) (NN expression)) (VP (MD may) (VP (VB be) (NP (NP (DT a) (JJ useful) (JJ predictive) (NN biomarker)) (PP (IN for) (NP (NP (NN prognosis)) (PP (IN in) (NP (NP (DT the) (NN management)) (PP (IN of) (NP (NP (NP (JJ non-small) (NN cell) (NN lung) (NN cancer)) (PRN (-LRB- -LRB-) (NP (NNP NSCLC)) (-RRB- -RRB-))) (, ,) (NP (NP (DT the) (ADJP (RBS most) (JJ common)) (NN form)) (PP (IN of) (NP (NN lung) (NN malignancy))))))))))))))))) (. .)))) (NP (NP (JJ Low) (NNS levels)) (PP (IN of) (NP (JJ aromatase) (NN associate))) (PP (IN with) (NP (NP (DT a) (JJR higher) (NN probability)) (PP (IN of) (NP (NP (JJ long-term) (NN survival)) (PP (IN in) (NP (NP (JJR older) (NNS women)) (PP (IN with) (NP (JJ early) (NN stage) (NNP NSCLC) (. .) (NNP Treatment))))))))) (PP (IN of) (NP (NN lung) (NN NSCLC)))) (VP (VBZ xenografts) (ADVP (FW in) (FW vivo)) (PP (IN with) (NP (NP (DT an) (JJ aromatase) (NN inhibitor)) (PRN (-LRB- -LRB-) (NP (NN exemestane)) (-RRB- -RRB-)))) (ADVP (RB alone)))))) (CC or) (VP (VBN combined) (PP (IN with) (NP (JJ standard) (NN cisplatin) (NN chemotherapy) (NNS elicits)) (ADVP (NP (NP (DT a) (JJ significant) (NN reduction)) (PP (IN in) (NP (NN tumor) (NN progression)))) (RB as)))) (PP (VBN compared) (PP (TO to) (NP (JJ paired) (NNS controls)))))) (. .)))
Typed dependencies
amod(levels-2, High-1) nsubjpass(maintained-8, levels-2) nsubjpass(combined-92, levels-2) amod(expression-5, aromatase-4) prep_of(levels-2, expression-5) auxpass(maintained-8, are-6) advmod(maintained-8, also-7) prep_in(maintained-8, metastases-10) mark(xenografts-80, as-11) prep(xenografts-80, compared-12) dep(compared-12, with-13) amod(tumors-15, primary-14) dep(suggest-24, tumors-15) dep(suggest-24, Consistent-17) det(findings-20, these-19) prep_with(suggest-24, findings-20) amod(studies-23, clinical-22) nsubj(suggest-24, studies-23) dep(with-13, suggest-24) complm(biomarker-33, that-25) amod(expression-27, aromatase-26) nsubj(biomarker-33, expression-27) aux(biomarker-33, may-28) cop(biomarker-33, be-29) det(biomarker-33, a-30) amod(biomarker-33, useful-31) amod(biomarker-33, predictive-32) ccomp(suggest-24, biomarker-33) prep_for(biomarker-33, prognosis-35) det(management-38, the-37) prep_in(prognosis-35, management-38) amod(cancer-43, non-small-40) nn(cancer-43, cell-41) nn(cancer-43, lung-42) prep_of(management-38, cancer-43) abbrev(cancer-43, NSCLC-45) det(form-51, the-48) advmod(common-50, most-49) amod(form-51, common-50) appos(cancer-43, form-51) nn(malignancy-54, lung-53) prep_of(form-51, malignancy-54) amod(levels-57, Low-56) nsubj(xenografts-80, levels-57) amod(associate-60, aromatase-59) prep_of(levels-57, associate-60) det(probability-64, a-62) amod(probability-64, higher-63) prep_with(levels-57, probability-64) amod(survival-67, long-term-66) prep_of(probability-64, survival-67) amod(women-70, older-69) prep_in(survival-67, women-70) amod(Treatment-76, early-72) nn(Treatment-76, stage-73) nn(Treatment-76, NSCLC-74) prep_with(women-70, Treatment-76) nn(NSCLC-79, lung-78) prep_of(levels-57, NSCLC-79) advcl(maintained-8, xenografts-80) dep(vivo-82, in-81) advmod(xenografts-80, vivo-82) det(inhibitor-86, an-84) amod(inhibitor-86, aromatase-85) prep_with(xenografts-80, inhibitor-86) appos(inhibitor-86, exemestane-88) advmod(xenografts-80, alone-90) conj_or(maintained-8, combined-92) amod(elicits-97, standard-94) nn(elicits-97, cisplatin-95) nn(elicits-97, chemotherapy-96) prep_with(combined-92, elicits-97) det(reduction-100, a-98) amod(reduction-100, significant-99) dep(as-104, reduction-100) nn(progression-103, tumor-102) prep_in(reduction-100, progression-103) dep(combined-92, as-104) amod(controls-108, paired-107) prep_compared_to(maintained-8, controls-108)
Typed dependencies, collapsed
amod(levels-2, High-1) nsubjpass(maintained-8, levels-2) amod(expression-5, aromatase-4) prep_of(levels-2, expression-5) auxpass(maintained-8, are-6) advmod(maintained-8, also-7) prep_in(maintained-8, metastases-10) mark(xenografts-80, as-11) prep(xenografts-80, compared-12) dep(compared-12, with-13) amod(tumors-15, primary-14) dep(suggest-24, tumors-15) dep(suggest-24, Consistent-17) det(findings-20, these-19) prep_with(suggest-24, findings-20) amod(studies-23, clinical-22) nsubj(suggest-24, studies-23) dep(with-13, suggest-24) complm(biomarker-33, that-25) amod(expression-27, aromatase-26) nsubj(biomarker-33, expression-27) aux(biomarker-33, may-28) cop(biomarker-33, be-29) det(biomarker-33, a-30) amod(biomarker-33, useful-31) amod(biomarker-33, predictive-32) ccomp(suggest-24, biomarker-33) prep_for(biomarker-33, prognosis-35) det(management-38, the-37) prep_in(prognosis-35, management-38) amod(cancer-43, non-small-40) nn(cancer-43, cell-41) nn(cancer-43, lung-42) prep_of(management-38, cancer-43) abbrev(cancer-43, NSCLC-45) det(form-51, the-48) advmod(common-50, most-49) amod(form-51, common-50) appos(cancer-43, form-51) nn(malignancy-54, lung-53) prep_of(form-51, malignancy-54) amod(levels-57, Low-56) nsubj(xenografts-80, levels-57) amod(associate-60, aromatase-59) prep_of(levels-57, associate-60) det(probability-64, a-62) amod(probability-64, higher-63) prep_with(levels-57, probability-64) amod(survival-67, long-term-66) prep_of(probability-64, survival-67) amod(women-70, older-69) prep_in(survival-67, women-70) amod(Treatment-76, early-72) nn(Treatment-76, stage-73) nn(Treatment-76, NSCLC-74) prep_with(women-70, Treatment-76) nn(NSCLC-79, lung-78) prep_of(levels-57, NSCLC-79) advcl(maintained-8, xenografts-80) dep(vivo-82, in-81) advmod(xenografts-80, vivo-82) det(inhibitor-86, an-84) amod(inhibitor-86, aromatase-85) prep_with(xenografts-80, inhibitor-86) appos(inhibitor-86, exemestane-88) advmod(xenografts-80, alone-90) conj_or(maintained-8, combined-92) amod(elicits-97, standard-94) nn(elicits-97, cisplatin-95) nn(elicits-97, chemotherapy-96) prep_with(combined-92, elicits-97) det(reduction-100, a-98) amod(reduction-100, significant-99) dep(as-104, reduction-100) nn(progression-103, tumor-102) prep_in(reduction-100, progression-103) dep(combined-92, as-104) amod(controls-108, paired-107) prep_compared_to(maintained-8, controls-108)
Statistics
Tokens: 109 Time: 47.074 s
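The Parse output is a bracketed tree in which each node is written as (LABEL child child ...) and literal parentheses in the text appear as -LRB- and -RRB-. Such a string can be read into a nested structure to inspect the labels or recover the tokens. The following is a minimal sketch under that assumption; `read_tree` and `leaves` are hypothetical helper names, not functions of the Stanford Parser.

```python
from typing import List, Tuple, Union

# A node is (label, children); a child is either a word or another node.
Tree = Tuple[str, List[Union[str, "Tree"]]]


def read_tree(text: str) -> Tree:
    """Read one bracketed tree such as "(NP (JJ High) (NNS levels))"."""
    # Pad the brackets so that a plain split() yields the tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children: List[Union[str, Tree]] = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()


def leaves(tree: Tree) -> List[str]:
    """Return the words of the tree in left-to-right order."""
    words: List[str] = []
    for child in tree[1]:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


# The subject noun phrase copied from the parse above.
sample = "(NP (NP (JJ High) (NNS levels)) (PP (IN of) (NP (JJ aromatase) (NN expression))))"
tree = read_tree(sample)
words = leaves(tree)
```

Walking the tree this way makes it easier to check where sentence-final periods and part-of-speech tags ended up than reading the single-line output directly.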
See Notes/Calling Java from Python.