Notes/Python Part 3 https://pepper.is.sci.toho-u.ac.jp:443/pepper/index.php?%A5%CE%A1%BC%A5%C8%2FPython%A4%BD%A4%CE%A3%B3
Visitors: 4147    Last updated: 2007-11-23 (Fri) 17:10:50
To do that, change retmode, the mode used for the access on the previous page, from text (text) to XML (xml).
Example of the data obtained in XML mode:
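As a rough illustration of the retmode switch, the sketch below assembles an efetch URL with retmode=xml. It is written for Python 3, whereas the programs on this page are Python 2. The helper name build_efetch_url, the default rettype="abstract", the base URL and the WebEnv value are all placeholders assumed for the example; the real values come from the program on the previous page.

```python
from urllib.parse import urlencode


def build_efetch_url(utils, db, query_key, webenv,
                     rettype="abstract", retstart=0, retmax=1):
    """Build an efetch URL that requests XML instead of plain text.

    All argument values are supplied by the caller; the defaults here
    are assumptions made for this sketch.
    """
    params = {
        "rettype": rettype,
        "retmode": "xml",      # the only change from the text-mode request
        "retstart": retstart,
        "retmax": retmax,
        "db": db,
        "query_key": query_key,
        "WebEnv": webenv,
    }
    return utils + "/efetch.fcgi?" + urlencode(params)


# Placeholder values, for illustration only.
url = build_efetch_url("https://example.org/entrez/eutils",
                       "pubmed", "1", "HYPOTHETICAL_WEBENV")
print(url)
```

Using urlencode rather than string concatenation also takes care of escaping any special characters in the parameter values.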
    <?xml version="1.0"?>
    <!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2007//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/pubmed_070101.dtd">
    <PubmedArticleSet>
    <PubmedArticle>
      <MedlineCitation Owner="NLM" Status="Publisher">
        <PMID>18030674</PMID>
        <DateCreated> <Year>2007</Year> <Month>11</Month> <Day>21</Day> </DateCreated>
        <Article PubModel="Print-Electronic">
          <Journal>
            <ISSN IssnType="Electronic">1098-1004</ISSN>
            <JournalIssue CitedMedium="Internet">
              <PubDate> <Year>2007</Year> <Month>Nov</Month> <Day>20</Day> </PubDate>
            </JournalIssue>
          </Journal>
          <ArticleTitle>RNA-based mutation analysis identifies an unusual MSH6 splicing defect and circumvents PMS2 pseudogene interference.</ArticleTitle>
          <Pagination> <MedlinePgn/> </Pagination>
          <Abstract>
            <AbstractText>Heterozygous germline mutations in one of the mismatch repair (rest omitted) </AbstractText>
          </Abstract>
          <Affiliation>Department of Medical Genetics, Medical University Vienna, Vienna, Austria.</Affiliation>
          <AuthorList>
            <Author> <LastName>Etzler</LastName> <FirstName>J</FirstName> <Initials>J</Initials> </Author>
            <Author> <LastName>Peyrl</LastName> <FirstName>A</FirstName> <Initials>A</Initials> </Author>
            <Author> <LastName>Zatkova</LastName> <FirstName>A</FirstName> <Initials>A</Initials> </Author>
            <Author> <LastName>Schildhaus</LastName> <FirstName>H-U</FirstName> <Initials>HU</Initials> </Author>
            <Author> <LastName>Ficek</LastName> <FirstName>A</FirstName> <Initials>A</Initials> </Author>
            <Author> <LastName>Merkelbach-Bruse</LastName> <FirstName>S</FirstName> <Initials>S</Initials> </Author>
            <Author> <LastName>Kratz</LastName> <FirstName>C P</FirstName> <Initials>CP</Initials> </Author>
            <Author> <LastName>Attarbaschi</LastName> <FirstName>A</FirstName> <Initials>A</Initials> </Author>
            <Author> <LastName>Hainfellner</LastName> <FirstName>J A</FirstName> <Initials>JA</Initials> </Author>
            <Author> <LastName>Yao</LastName> <FirstName>S</FirstName> <Initials>S</Initials> </Author>
            <Author> <LastName>Messiaen</LastName> <FirstName>L</FirstName> <Initials>L</Initials> </Author>
            <Author> <LastName>Slavc</LastName> <FirstName>I</FirstName> <Initials>I</Initials> </Author>
            <Author> <LastName>Wimmer</LastName> <FirstName>K</FirstName> <Initials>K</Initials> </Author>
          </AuthorList>
          <Language>ENG</Language>
          <PublicationTypeList> <PublicationType>JOURNAL ARTICLE</PublicationType> </PublicationTypeList>
          <ArticleDate DateType="Electronic"> <Year>2007</Year> <Month>11</Month> <Day>20</Day> </ArticleDate>
        </Article>
        <MedlineJournalInfo>
          <MedlineTA>Hum Mutat</MedlineTA>
          <NlmUniqueID>9215429</NlmUniqueID>
        </MedlineJournalInfo>
      </MedlineCitation>
      <PubmedData>
        <History>
          <PubMedPubDate PubStatus="pubmed"> <Year>2007</Year> <Month>11</Month> <Day>22</Day> <Hour>9</Hour> <Minute>0</Minute> </PubMedPubDate>
          <PubMedPubDate PubStatus="medline"> <Year>2007</Year> <Month>11</Month> <Day>22</Day> <Hour>9</Hour> <Minute>0</Minute> </PubMedPubDate>
        </History>
        <PublicationStatus>aheadofprint</PublicationStatus>
        <ArticleIdList>
          <ArticleId IdType="doi">10.1002/humu.20657</ArticleId>
          <ArticleId IdType="pubmed">18030674</ArticleId>
        </ArticleIdList>
      </PubmedData>
    </PubmedArticle>
    </PubmedArticleSet>
To extract fields from this, the pattern matching has to be reworked once more.
First, the fields contain spaces, minus signs, periods and so on, so the pattern must capture those as well. The test program is testabstract.py:
    import re

    # efetch_result holds the XML text returned by efetch (retmode=xml).
    # re.S lets "." match newlines too, so each (.+) captures the whole
    # field, including spaces, minus signs and periods.
    mref = re.compile('.*<PubDate>(.+)</PubDate>.*<ArticleTitle>(.+)</ArticleTitle>.*<AbstractText>(.+)</AbstractText>.*<AuthorList>(.+)</AuthorList>', re.S)
    a = mref.search(efetch_result)
    print a

    pubdate = []
    articletitle = []
    abstracttext = []
    authorlist = []
    if a:
        pubdate.append(a.group(1))
        articletitle.append(a.group(2))
        abstracttext.append(a.group(3))
        authorlist.append(a.group(4))
    else:
        # No match: store empty strings so the lists stay aligned.
        pubdate.append("")
        articletitle.append("")
        abstracttext.append("")
        authorlist.append("")

    print "[PubDate]" ; print pubdate[0]
    print "[Title]" ; print articletitle[0]
    print "[Abstract]"; print abstracttext[0]
    print "[Authorlist]"; print authorlist[0]
The result obtained is:
    <_sre.SRE_Match object at 0x00ADE860>
    [PubDate]
    <Year>2007</Year> <Month>Nov</Month> <Day>20</Day>
    [Title]
    RNA-based mutation analysis (middle omitted) PMS2 pseudogene interference.
    [Abstract]
    Heterozygous germline mutations (middle omitted) Wiley-Liss, Inc.
    [Authorlist]
    <Author> <LastName>Etzler</LastName> <FirstName>J</FirstName> <Initials>J</Initials> </Author>
    <Author> <LastName>Peyrl</LastName> <FirstName>A</FirstName> <Initials>A</Initials> </Author>
    (rest omitted)
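The point about fields containing spaces, minus signs and periods can be checked on a small made-up string. The sketch below (Python 3, not the Python 2 of the test program) compares three patterns; the \w+ pattern is only a hypothetical narrower alternative for comparison, not necessarily what the previous page used, and the sample title is invented.

```python
import re

# Invented sample: the field contains a hyphen, spaces, a period and a newline.
sample = "<ArticleTitle>RNA-based analysis of MSH6.\nSecond line</ArticleTitle>"

# \w+ stops at the first hyphen, space or period, so the closing tag
# is never reached and the search fails.
narrow = re.search(r"<ArticleTitle>(\w+)</ArticleTitle>", sample)

# Without re.S, "." does not match the newline inside the field.
no_dotall = re.search(r"<ArticleTitle>(.+)</ArticleTitle>", sample)

# With re.S, "." matches every character, newlines included.
wide = re.search(r"<ArticleTitle>(.+)</ArticleTitle>", sample, re.S)

print(narrow)            # None
print(no_dotall)         # None
print(repr(wide.group(1)))
```

Only the last pattern captures the whole field, which is why the test program combines (.+) with the re.S flag.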
Adding this mechanism to the program from the previous page gives the following. The full program is getlist_access_abstract.py.
    # For experiments, limit "count" to at most 3 papers.
    # Convert first: if count is a string, count > 3 is always true in Python 2.
    count = min(int(count), 3)

    ## Next, get the abstract of each article
    # Compile the pattern once, outside the loop.
    aref = re.compile('.*<PubDate>(.+)</PubDate>.*<ArticleTitle>(.+)</ArticleTitle>' + \
                      '.*<AbstractText>(.+)</AbstractText>.*<AuthorList>(.+)</AuthorList>', re.S)
    articles = []
    for i in range(count):
        # Fetch one article per request (retmax=1), starting at index i.
        efetch = utils + "/efetch.fcgi?rettype=" + report + "&retmode=xml&retstart=" + str(i) + \
                 "&retmax=" + "1" + "&db=" + db + "&query_key=" + querykey + "&WebEnv=" + webenv
        f = urllib.urlopen(efetch)
        efetch_result = f.read()
        ## print efetch_result
        onearticle = []
        a = aref.search(efetch_result)
        if a:
            onearticle.append(a.group(1))
            onearticle.append(a.group(2))
            onearticle.append(a.group(3))
            onearticle.append(a.group(4))
        else:
            # No match: keep four empty fields for this article.
            onearticle.append("")
            onearticle.append("")
            onearticle.append("")
            onearticle.append("")
        articles.append(onearticle)

    for i in range(count):
        print "--[" + str(i) + "]---------"
        print articles[i][0]
        print articles[i][1]
        print articles[i][2]
        print articles[i][3]
Its output looks like abstractout.txt.
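The per-article extraction step of the program can also be tried without any network access. The following is a minimal Python 3 sketch that reuses the same regular expression; the function name extract_fields and both sample strings are invented for this illustration and do not appear in getlist_access_abstract.py.

```python
import re

# Same pattern as in the program above, split over several lines.
FIELD_PATTERN = re.compile(
    r".*<PubDate>(.+)</PubDate>"
    r".*<ArticleTitle>(.+)</ArticleTitle>"
    r".*<AbstractText>(.+)</AbstractText>"
    r".*<AuthorList>(.+)</AuthorList>",
    re.S,
)


def extract_fields(efetch_result):
    """Return [pubdate, title, abstract, authorlist] from one efetch XML text.

    Returns four empty strings when the pattern does not match.
    """
    m = FIELD_PATTERN.search(efetch_result)
    if m:
        return list(m.groups())
    return ["", "", "", ""]


# Invented sample records, for illustration only.
sample = (
    "<PubmedArticle>"
    "<PubDate><Year>2007</Year></PubDate>"
    "<ArticleTitle>A made-up title.</ArticleTitle>"
    "<AbstractText>A made-up abstract.</AbstractText>"
    "<AuthorList><Author><LastName>Doe</LastName></Author></AuthorList>"
    "</PubmedArticle>"
)
no_abstract = "<PubDate>2007</PubDate><ArticleTitle>T</ArticleTitle>"

print(extract_fields(sample))
print(extract_fields(no_abstract))
```

Because the pattern requires all four tags in this order, a record that lacks any one of them, for example an article without an abstract, comes back as four empty strings rather than a partial result.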