Last updated: 2007-11-23 (Fri) 17:10:50

Extracting the Abstract from each paper

To do this, switch retmode, the mode used when accessing efetch in the previous version, from text (text) to XML (xml).
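A minimal sketch (Python 3) of building the efetch request with retmode=xml. The E-utilities base URL, the function name build_efetch_url and the example query_key/WebEnv values are assumptions for illustration; on the previous page the base URL lives in utils, rettype comes from report, and query_key/WebEnv come from the esearch step.

```python
from urllib.parse import urlencode

# Assumed E-utilities base URL (held in `utils` on the previous page).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def build_efetch_url(db, query_key, webenv, retstart, retmax=1, rettype=None):
    """Build an efetch URL that requests XML (retmode=xml) rather than text."""
    params = {
        "db": db,
        "query_key": query_key,   # from the esearch step (history server)
        "WebEnv": webenv,         # from the esearch step (history server)
        "retstart": retstart,
        "retmax": retmax,
        "retmode": "xml",         # the one change this step needs: xml instead of text
    }
    if rettype is not None:       # the original program passes its `report` value here
        params["rettype"] = rettype
    return EUTILS + "/efetch.fcgi?" + urlencode(params)

# Hypothetical query_key/WebEnv values, for illustration only.
print(build_efetch_url("pubmed", "1", "NCID_1_example", 0))
```

urlencode takes care of escaping, which plain string concatenation does not.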

Example of the data returned in XML mode:

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2007//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_070101.dtd">
<PubmedArticleSet>
<PubmedArticle>
    <MedlineCitation Owner="NLM" Status="Publisher">
        <PMID>18030674</PMID>
        <DateCreated>
            <Year>2007</Year>
            <Month>11</Month>
            <Day>21</Day>
        </DateCreated>
        <Article PubModel="Print-Electronic">
            <Journal>
                <ISSN IssnType="Electronic">1098-1004</ISSN>
                <JournalIssue CitedMedium="Internet">
                    <PubDate>
                        <Year>2007</Year>
                        <Month>Nov</Month>
                        <Day>20</Day>
                    </PubDate>
                </JournalIssue>
            </Journal>
            <ArticleTitle>RNA-based mutation analysis identifies an unusual MSH6 splicing defect and circumvents PMS2 pseudogene interference.</ArticleTitle>
            <Pagination>
                <MedlinePgn/>
            </Pagination>
            <Abstract>
                <AbstractText>Heterozygous germline mutations in one of the mismatch repair (rest omitted) </AbstractText>
            </Abstract>
            <Affiliation>Department of Medical Genetics, Medical University Vienna, Vienna, Austria.</Affiliation>
            <AuthorList>
                <Author>
                    <LastName>Etzler</LastName>
                    <FirstName>J</FirstName>
                    <Initials>J</Initials>
                </Author>
                <Author>
                    <LastName>Peyrl</LastName>
                    <FirstName>A</FirstName>
                    <Initials>A</Initials>
                </Author>
                <Author>
                    <LastName>Zatkova</LastName>
                    <FirstName>A</FirstName>
                    <Initials>A</Initials>
                </Author>
                <Author>
                    <LastName>Schildhaus</LastName>
                    <FirstName>H-U</FirstName>
                    <Initials>HU</Initials>
                </Author>
                <Author>
                    <LastName>Ficek</LastName>
                    <FirstName>A</FirstName>
                    <Initials>A</Initials>
                </Author>
                <Author>
                    <LastName>Merkelbach-Bruse</LastName>
                    <FirstName>S</FirstName>
                    <Initials>S</Initials>
                </Author>
                <Author>
                    <LastName>Kratz</LastName>
                    <FirstName>C P</FirstName>
                    <Initials>CP</Initials>
                </Author>
                <Author>
                    <LastName>Attarbaschi</LastName>
                    <FirstName>A</FirstName>
                    <Initials>A</Initials>
                </Author>
                <Author>
                    <LastName>Hainfellner</LastName>
                    <FirstName>J A</FirstName>
                    <Initials>JA</Initials>
                </Author>
                <Author>
                    <LastName>Yao</LastName>
                    <FirstName>S</FirstName>
                    <Initials>S</Initials>
                </Author>
                <Author>
                    <LastName>Messiaen</LastName>
                    <FirstName>L</FirstName>
                    <Initials>L</Initials>
                </Author>
                <Author>
                    <LastName>Slavc</LastName>
                    <FirstName>I</FirstName>
                    <Initials>I</Initials>
                </Author>
                <Author>
                    <LastName>Wimmer</LastName>
                    <FirstName>K</FirstName>
                    <Initials>K</Initials>
                </Author>
            </AuthorList>
            <Language>ENG</Language>
            <PublicationTypeList>
                <PublicationType>JOURNAL ARTICLE</PublicationType>
            </PublicationTypeList>
            <ArticleDate DateType="Electronic">
                <Year>2007</Year>
                <Month>11</Month>
                <Day>20</Day>
            </ArticleDate>
        </Article>
        <MedlineJournalInfo>
            <MedlineTA>Hum Mutat</MedlineTA>
            <NlmUniqueID>9215429</NlmUniqueID>
        </MedlineJournalInfo>
    </MedlineCitation>
    <PubmedData>
        <History>
            <PubMedPubDate PubStatus="pubmed">
                <Year>2007</Year>
                <Month>11</Month>
                <Day>22</Day>
                <Hour>9</Hour>
                <Minute>0</Minute>
            </PubMedPubDate>
            <PubMedPubDate PubStatus="medline">
                <Year>2007</Year>
                <Month>11</Month>
                <Day>22</Day>
                <Hour>9</Hour>
                <Minute>0</Minute>
            </PubMedPubDate>
        </History>
        <PublicationStatus>aheadofprint</PublicationStatus>
        <ArticleIdList>
            <ArticleId IdType="doi">10.1002/humu.20657</ArticleId>
            <ArticleId IdType="pubmed">18030674</ArticleId>
        </ArticleIdList>
    </PubmedData>
</PubmedArticle>

</PubmedArticleSet>
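Because the response is now XML, an XML parser is one alternative to regular expressions. A minimal sketch using the standard library's xml.etree.ElementTree; the function name extract_articles and the trimmed SAMPLE string (cut down from the record above) are illustrative. It reads only the first AbstractText, so structured abstracts split into several AbstractText elements would need a loop.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the efetch XML shown above (most elements removed).
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2007//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_070101.dtd">
<PubmedArticleSet>
<PubmedArticle>
    <MedlineCitation Owner="NLM" Status="Publisher">
        <PMID>18030674</PMID>
        <Article PubModel="Print-Electronic">
            <Journal>
                <JournalIssue CitedMedium="Internet">
                    <PubDate><Year>2007</Year><Month>Nov</Month><Day>20</Day></PubDate>
                </JournalIssue>
            </Journal>
            <ArticleTitle>RNA-based mutation analysis identifies an unusual MSH6 splicing defect and circumvents PMS2 pseudogene interference.</ArticleTitle>
            <Abstract>
                <AbstractText>Heterozygous germline mutations in one of the mismatch repair ...</AbstractText>
            </Abstract>
            <AuthorList>
                <Author><LastName>Etzler</LastName><Initials>J</Initials></Author>
                <Author><LastName>Peyrl</LastName><Initials>A</Initials></Author>
            </AuthorList>
        </Article>
    </MedlineCitation>
</PubmedArticle>
</PubmedArticleSet>"""


def extract_articles(xml_text):
    """Return one dict per <PubmedArticle>; missing fields become '' (or [])."""
    root = ET.fromstring(xml_text)
    results = []
    for art in root.iter("PubmedArticle"):
        pubdate = art.find(".//PubDate")
        # Join the child elements (Year, Month, Day) into one string.
        date = " ".join(c.text or "" for c in pubdate) if pubdate is not None else ""
        authors = [a.findtext("LastName", "") + " " + a.findtext("Initials", "")
                   for a in art.iterfind(".//AuthorList/Author")]
        results.append({
            "pmid": art.findtext(".//PMID", ""),
            "pubdate": date,
            "title": art.findtext(".//ArticleTitle", ""),
            # Only the first AbstractText; structured abstracts have several.
            "abstract": art.findtext(".//AbstractText", ""),
            "authors": authors,
        })
    return results


for article in extract_articles(SAMPLE):
    print(article["pubdate"], "|", article["title"])
    print(", ".join(article["authors"]))
```

Unlike a single combined regex, a missing element here only blanks its own field, and the author list comes out as names rather than raw tags.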

Extracting these fields requires reworking the pattern matching once more. First, the fields contain spaces, hyphens, periods and so on, so the pattern must capture those characters as well. The test program is testabstract.py:

import re

# efetch_result holds the XML text returned by efetch with retmode=xml
# (one article, as in the sample above).
# Non-greedy (.+?) groups plus re.S let each field span several lines and
# contain spaces, hyphens and periods.
mref = re.compile(r'<PubDate>(.+?)</PubDate>.*?<ArticleTitle>(.+?)</ArticleTitle>'
                  r'.*?<AbstractText>(.+?)</AbstractText>.*?<AuthorList>(.+?)</AuthorList>', re.S)
a = mref.search(efetch_result)
print(a)   # the match object, or None if any of the four tags is missing

if a:
    pubdate, articletitle, abstracttext, authorlist = a.groups()
else:
    pubdate = articletitle = abstracttext = authorlist = ""

print("[PubDate]");    print(pubdate)
print("[Title]");      print(articletitle)
print("[Abstract]");   print(abstracttext)
print("[Authorlist]"); print(authorlist)

The result is:

<_sre.SRE_Match object at 0x00ADE860>
[PubDate]
                       <Year>2007</Year>
                       <Month>Nov</Month>
                       <Day>20</Day>
[Title]
RNA-based mutation analysis (omitted) PMS2 pseudogene interference.
[Abstract]
Heterozygous germline mutations (omitted) Wiley-Liss, Inc.
[Authorlist]
                <Author>
                   <LastName>Etzler</LastName>
                   <FirstName>J</FirstName>
                   <Initials>J</Initials>
               </Author>
               <Author>
                   <LastName>Peyrl</LastName>
                   <FirstName>A</FirstName>
                   <Initials>A</Initials>
               </Author>
               (rest omitted)
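The result shows why each group has to accept any character and why re.S is needed: the title contains spaces, a hyphen and a period, and PubDate and AuthorList span several lines. A small check with strings cut from the sample XML (the variable names are illustrative):

```python
import re

title_xml = ("<ArticleTitle>RNA-based mutation analysis identifies an unusual MSH6 "
             "splicing defect and circumvents PMS2 pseudogene interference.</ArticleTitle>")
pubdate_xml = "<PubDate>\n    <Year>2007</Year>\n    <Month>Nov</Month>\n</PubDate>"

# \w+ stops at the first hyphen, so it can never reach </ArticleTitle>.
narrow = re.search(r"<ArticleTitle>(\w+)</ArticleTitle>", title_xml)

# .+? accepts spaces, hyphens and periods.
wide = re.search(r"<ArticleTitle>(.+?)</ArticleTitle>", title_xml)

# Without re.S, "." does not match a newline, so a multi-line field is missed.
no_dotall = re.search(r"<PubDate>(.+?)</PubDate>", pubdate_xml)
dotall = re.search(r"<PubDate>(.+?)</PubDate>", pubdate_xml, re.S)

print(narrow)        # None
print(wide.group(1))
print(no_dotall)     # None
print(dotall.group(1))
```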

Adding this mechanism to the program from the previous page gives the following (the complete program is getlist_access_abstract.py):

import re
from urllib.request import urlopen

# utils, report, db, querykey, webenv and count come from the esearch part
# of the program on the previous page; count may arrive as a string, hence int().
count = min(int(count), 3)   # For experiments, limit the run to at most 3 papers.

## Next, get the abstract of each article
aref = re.compile(r'<PubDate>(.+?)</PubDate>.*?<ArticleTitle>(.+?)</ArticleTitle>'
                  r'.*?<AbstractText>(.+?)</AbstractText>.*?<AuthorList>(.+?)</AuthorList>', re.S)
articles = []

for i in range(count):
    # Fetch one record at a time (retstart=i, retmax=1) in XML mode.
    efetch = (utils + "/efetch.fcgi?rettype=" + report + "&retmode=xml&retstart=" + str(i) +
              "&retmax=1&db=" + db + "&query_key=" + querykey + "&WebEnv=" + webenv)
    efetch_result = urlopen(efetch).read().decode("utf-8")
    # print(efetch_result)   # uncomment to inspect the raw XML
    a = aref.search(efetch_result)
    if a:
        articles.append(list(a.groups()))   # [pubdate, title, abstract, authorlist]
    else:
        articles.append(["", "", "", ""])

for i, article in enumerate(articles):
    print("--[" + str(i) + "]---------")
    for field in article:
        print(field)

¤³¤ì¤Î½ÐÎϤϡ¢fileabstractout.txt¤Î¤è¤¦¤Ë¤Ê¤ë¡£


Attached files: getlist_access_abstract.py, abstractout.txt, testabstract.py
