PTU - Polskie Towarzystwo Urologiczne

Technology of searching information in the bibliographic database MedLine
Article published in Urologia Polska 2008/61/4.

Authors

Ihor Shadyorkin (1), Viktoriya Shadyorkina (2), Alexander Shulyak (3), Marian Tarchynets (4)
(1) FGU „Research Institute for Urology of Rusmedtechnologies”, Moscow, Russia
(2) District Clinical Oncological Dispensary, Krasnodar, Russia
(3) Lviv National Medical University named after Danylo Halytskyy, Lviv, Ukraine
(4) Tripharma Ilac Sanayi A.S. Representative Office, Kiev, Ukraine

Keywords

biomedical information, Internet, MedLine database

Every educational or scientific project begins with a literature search on the problem of interest. Advances in information and computer technology have made this search significantly faster, easier and more convenient. The wealth of biomedical information generated over many years has been accumulated in a number of major scientific centers and is stored in computer databases.

Given the great interest in such services and the significant cost of collecting and maintaining these databases, many organizations provide access on a commercial basis. Alongside them there are centers that offer their services free of charge and without substantial limitations, which is made possible by government support for the development of science.

MedLine, a project of the U.S. National Library of Medicine, is the largest bibliographic database and covers over 75% of the world’s biomedical periodicals [1]. It leads comparable databases in the amount and quality of the information offered, and the service is free of charge and convenient.

At the time this article was written, the MedLine database contained citations from articles in 5835 journals worldwide. These include a large number of urological and andrological journals, such as Andrologia, BJU International, British Journal of Urology, European Urology, International Journal of Andrology, The Journal of Sexual Medicine, The Journal of Urology, Urologiia (Moscow, Russia), and many others.

The MedLine database consists of several sections. MedLine itself comprises nearly 12 million citations (a citation is a short record describing a publication) and article abstracts published in international biomedical journals since 1966. OldMedLine contains 1.7 million citations and article abstracts published between 1950 and 1965. PreMedline contains bibliographic information and abstracts of publications not yet entered into MedLine and is refreshed daily. Approximately half a million new citations and abstracts are added to MedLine annually.

The database alone is not enough to build an information search system; an access interface is also required. The most convenient is Internet-based access. This is the purpose of the PubMed system, which in turn is the bibliographic component of the Entrez search system [1].

PubMed is not the only service that allows searching the MedLine database. Among the best-known search engines are Ovid, Scopus and CDL MEDLINE (formerly MELVYL MEDLINE). Each of these systems has its own particular features, which open up multiple opportunities for scientific work with the aggregated information.

PubMed is a free resource. It was created and is maintained by the NCBI (National Center for Biotechnology Information) and the NLM (U.S. National Library of Medicine), both of which are components of the National Institutes of Health, USA.

The PubMed search system makes it possible to search not only the MedLine bibliographic database but also a number of other resources: e-books, clinical trial materials and periodicals in fields adjacent to medicine.

The PubMed search service offers two types of interface. The main interface is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed and also at http://www.pubmed.com/ (both links refer to the same web-site). The light interface (plain text) is accessible via the «Text Version» link in the main interface, or by typing the following address in your browser: http://www.ncbi.nlm.nih.gov/entrez/queryd.fcgi?linkbar=plain.

On entering the PubMed web-site you arrive directly at the main window of the program, from which queries to the MedLine database can be submitted. The PubMed system provides the following possibilities:

  • Forming simple queries to the MedLine database.
  • Forming complex queries with multiple instruments that allow fine adjustment of the search parameters.
  • Creating queries based on key words using the controlled thesaurus MeSH®.
  • Creating a personal page that provides certain additional options: saving queries and search results, additional search options, additional service settings, and automated searches with results sent to your e-mail (monitoring of the MedLine database).
  • Obtaining search results in the form of citations or abstracts in different electronic formats, which can later be sent to an e-mail address.
  • Following the offered links to the web-pages of the electronic editions of journals, where full-text copies of articles can be obtained. Some publishing houses offer this free of charge, but in most cases you will have to pay for a full-text article.
  • Document delivery services.
  • A help system with convenient video clips showing the main stages of working with PubMed.

All of the above-mentioned services are provided free of charge.

To form a simple query to the MedLine database it is enough to enter a search word in the “query” window and, without changing any other options on the web-site, click the «Go» button. For simple search conditions this is sufficient, but complex tasks call for a more detailed study of the query language used in the PubMed system.

This language is based on Boolean logic, a mathematical system named after the English mathematician George Boole. This logic connects terms by means of logical operators: AND, OR, NOT.

By default, if you type a query that consists of several words, the system breaks it into individual words and inserts the AND operator between them. For instance, if you type «cancer penis» in the query string, the search engine will represent your query as «cancer AND penis» before running the search, which can be read as «search the database for articles that contain both cancer AND penis».
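The default behaviour described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text: `implicit_and` is a hypothetical helper name, and the real PubMed query processing is not reproduced here.

```python
def implicit_and(query: str) -> str:
    """Join the whitespace-separated words of a query with AND.

    Hypothetical helper: it mimics only the rule described in the text,
    not the actual query processing performed by PubMed.
    """
    words = query.split()  # split() also collapses repeated spaces
    return " AND ".join(words)


print(implicit_and("cancer penis"))  # cancer AND penis
```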

The AND operator, placed between words, means that each returned citation must contain all of the words connected by AND.

The OR operator means that the user is searching for citations in which either one word or the other is found; the words are not required to appear together in the same article. For example, if you want to find all articles on penile cancer or urethral cancer, you must place the OR operator between these search items.

The NOT operator instructs the system to return only citations in which the term to the right of the operator is absent. For example, suppose the objective is to find articles on ultrasound examination of the prostate gland that do not involve prostate cancer. The query in that case will look as follows: «prostate AND ultrasound NOT cancer».

Queries can be grouped using brackets. For instance, the previous query can be written as «(prostate AND ultrasound) NOT cancer». Queries can become very complex through combinations of operators, grouping of words and nesting of groups within each other.
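One way to picture the three operators is as set operations on the articles that contain each word. The sketch below uses a made-up toy index (the article numbers are invented for illustration and do not refer to real MedLine records); AND corresponds to intersection, OR to union and NOT to set difference.

```python
# Hypothetical toy index: word -> set of article numbers containing that word.
index = {
    "prostate": {1, 2, 3, 4},
    "ultrasound": {2, 3, 5},
    "cancer": {3, 4, 6},
}


def ids(word):
    """Return the set of article numbers for a word (empty set if unknown)."""
    return index.get(word, set())


# prostate AND ultrasound: articles containing both words
and_result = ids("prostate") & ids("ultrasound")   # {2, 3}

# prostate OR cancer: articles containing at least one of the words
or_result = ids("prostate") | ids("cancer")        # {1, 2, 3, 4, 6}

# (prostate AND ultrasound) NOT cancer
not_result = (ids("prostate") & ids("ultrasound")) - ids("cancer")  # {2}

# Brackets can change the outcome:
# (prostate OR ultrasound) NOT cancer
grouped_a = (ids("prostate") | ids("ultrasound")) - ids("cancer")   # {1, 2, 5}
# prostate OR (ultrasound NOT cancer)
grouped_b = ids("prostate") | (ids("ultrasound") - ids("cancer"))   # {1, 2, 3, 4, 5}
```

The last two lines show why grouping matters: the same words and operators give different result sets depending on where the brackets are placed.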

If you want the words in the search results to follow each other directly, in exactly the sequence you entered them, place them in quotation marks: “prostate cancer”. If part of a word is unknown, or you want to find all lexical forms of a term, you can use the asterisk operator «*» in place of the variable part of the word; for example, «uro*» stands for all words beginning with “uro-”. The system also provides a useful way to indicate where in the article citation the required words should be searched for. There are several instruments for that.
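The difference between a quoted phrase and an asterisk pattern can be illustrated locally with Python's standard library. The helper names and the sample words are assumptions made for this sketch; it models only the matching rules described above (and ignores punctuation), not the PubMed search engine itself.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern, words):
    """Return the words that match an asterisk pattern such as 'uro*'."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]


def contains_phrase(phrase, text):
    """True if the words of `phrase` occur consecutively and in order in `text`."""
    p = phrase.lower().split()
    t = text.lower().split()
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


words = ["urology", "urologist", "Urothelium", "neurology"]
print(wildcard_matches("uro*", words))
# ['urology', 'urologist', 'Urothelium']

print(contains_phrase("prostate cancer", "Screening for prostate cancer in men"))  # True
print(contains_phrase("prostate cancer", "Cancer of the prostate"))                # False
```

Note that «uro*» does not match "neurology": the pattern is anchored at the beginning of the word, as in the description above.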

The most intuitive instrument is search limitation (the «Limits» tab on the search page). On this additional page you can specify the authors whose materials you are looking for; the name of the journal; the presence of terms in the abstract; the date and language of the publication; the type of article; and special limiting tags, which are described below. After you define the limits of the search, click the «Go» button so that the system searches the database with the new parameters.

Another instrument for narrowing the search is the search tag. Search tags are strictly defined letter abbreviations for search fields, placed in square brackets. For example, [AU] is the tag for the «Author» field of the article citation. A tag is placed to the right of the word entered in the query and assigns this word to a certain category. Thus «Debruyne[AU]» means that we want Debruyne to be one of the authors of the articles returned by the query.
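The term-plus-tag notation is simple enough to build and take apart programmatically. The following sketch assumes nothing beyond the notation described above; `tagged` and `split_tag` are hypothetical helper names, and only the [AU] tag from the text is used.

```python
import re

# A term followed by a tag in square brackets, e.g. "Debruyne[AU]".
TAGGED_TERM = re.compile(r"^(?P<term>.+?)\[(?P<tag>[A-Za-z]+)\]$")


def tagged(term, tag):
    """Append a field tag in square brackets to the right of the term."""
    return f"{term}[{tag}]"


def split_tag(token):
    """Split 'Debruyne[AU]' into ('Debruyne', 'AU'); untagged words get tag None."""
    m = TAGGED_TERM.match(token)
    if m:
        return m["term"], m["tag"].upper()
    return token, None


query = " AND ".join([tagged("Debruyne", "AU"), "prostate"])
print(query)                      # Debruyne[AU] AND prostate
print(split_tag("Debruyne[AU]"))  # ('Debruyne', 'AU')
print(split_tag("prostate"))      # ('prostate', None)
```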

A very flexible search instrument is the controlled vocabulary of terms MeSH [2]. It is a hierarchical dictionary with 16 main branches, such as Anatomy; Organisms; Diseases; Chemical Substances and Drugs; Diagnostics, Treatment, Therapeutic Technique and Equipment; Healthcare; Geography terms and certain others. The dictionary contains 25 thousand main terms, 172 thousand additional terms and around 100 thousand accessory terms. The higher ‘branches’ of the hierarchy divide into smaller ones, each of which refines a definition. For example, the term ‘prostatitis’ belongs to the branch “diseases › male urogenital diseases › prostatic diseases › prostatitis”.
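The hierarchy can be modelled as a nested dictionary. The sketch below contains only the single branch quoted above, so it is a tiny illustrative fragment rather than the real MeSH tree; the function names are hypothetical.

```python
# Illustrative fragment: only the branch quoted in the text is included.
tree = {
    "diseases": {
        "male urogenital diseases": {
            "prostatic diseases": {
                "prostatitis": {},
            },
        },
    },
}


def find_path(node, target, trail=()):
    """Return the chain of terms from the root down to `target`, or None."""
    for name, children in node.items():
        here = trail + (name,)
        if name == target:
            return here
        found = find_path(children, target, here)
        if found:
            return found
    return None


def subtree(node, target):
    """Return the dictionary of terms directly below `target`, or None."""
    for name, children in node.items():
        if name == target:
            return children
        found = subtree(children, target)
        if found is not None:
            return found
    return None


def narrower_terms(node):
    """List every term in `node`, including all nested (narrower) terms."""
    out = []
    for name, children in node.items():
        out.append(name)
        out.extend(narrower_terms(children))
    return out


print(" › ".join(find_path(tree, "prostatitis")))
# diseases › male urogenital diseases › prostatic diseases › prostatitis

print(narrower_terms(subtree(tree, "male urogenital diseases")))
# ['prostatic diseases', 'prostatitis']
```

Moving up the path broadens a query, and collecting the narrower terms of a branch shows what a search at that level would cover.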

Every citation placed in the MedLine database is matched to terms contained in this dictionary; in other words, the article is classified. In database terms this process is called indexing of articles. As a rule, an article is matched not to one term but to many, usually 5-25 terms per article.
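The effect of indexing can be sketched as an inverted index that maps each term to the articles assigned to it. The article numbers and term assignments below are invented for illustration, and each article carries fewer terms than the 5-25 mentioned above to keep the example short.

```python
from collections import defaultdict

# Hypothetical articles with the terms assigned to them during indexing.
articles = {
    101: ["Prostatitis", "Ultrasonography"],
    102: ["Prostatitis"],
    103: ["Prostatic Neoplasms", "Ultrasonography"],
}

# Inverted index: term -> set of article numbers indexed with that term.
term_index = defaultdict(set)
for article_id, terms in articles.items():
    for term in terms:
        term_index[term].add(article_id)

print(sorted(term_index["Prostatitis"]))      # [101, 102]
print(sorted(term_index["Ultrasonography"]))  # [101, 103]
```

Because every article is attached to several terms, a search by term retrieves it regardless of the exact words its authors used.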

The main PubMed page has a «MeSH Database» link that leads to a search page for the MeSH database. On this page you can enter the keyword or key phrase that you consider central to your information search. The system will then display all related terms from the MeSH dictionary, and you can review possible search options more thoroughly by moving through the dictionary branches. This allows you to either narrow or broaden your query.

To the right of each MeSH term there is an active «Links» link. Clicking it opens a drop-down menu where you can choose the source of the search and proceed to the results page. On this page you will see the citations of the articles connected to this term.

A large team of computer engineers with specialist knowledge in the field of medicine works on the dictionary, and the indexing of articles is performed with artificial intelligence technologies [3]. For this reason, searching with MeSH is probably the most effective method of finding information in the MedLine database.

After a query is composed and executed, the PubMed system displays the search results on the main page as a list of the article citations found. The upper part of this page contains instruments for managing citations. With this panel you can perform the following actions:

  • Change the display format of citations. Information can be given in a short form, an abstract can be shown, search results can be presented in XML format, and a number of other useful options are available.
  • Change the number of citations shown on one page.
  • Sort results by date, by author or by journal name.
  • Send the article to the printer, save it as a text file, copy it to the clipboard, send it by e-mail or publish it as an RSS feed.
  • Navigate through pages by clicking the «Next» link (next page) or by entering the page number in the window and clicking the «Page» button.
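The sorting and paging actions in the list above can be modelled on a small list of citation records. The records here are placeholders invented for the sketch (not real publications), and `sort_citations` and `page` are hypothetical helper names.

```python
from datetime import date

# Placeholder citation records; authors, journals and dates are invented.
citations = [
    {"author": "Author C", "journal": "Journal B", "date": date(2007, 5, 1)},
    {"author": "Author A", "journal": "Journal C", "date": date(2006, 1, 15)},
    {"author": "Author B", "journal": "Journal A", "date": date(2008, 3, 9)},
]


def sort_citations(items, key):
    """Return a new list sorted by 'date', 'author' or 'journal'."""
    return sorted(items, key=lambda c: c[key])


def page(items, number, per_page=20):
    """Return the citations shown on page `number` (pages are counted from 1)."""
    start = (number - 1) * per_page
    return items[start:start + per_page]


by_author = sort_citations(citations, "author")
print([c["author"] for c in by_author])
# ['Author A', 'Author B', 'Author C']

print([c["author"] for c in sort_citations(citations, "date")])
# ['Author A', 'Author C', 'Author B']

print([c["author"] for c in page(by_author, 2, per_page=2)])
# ['Author C']
```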

Below the navigation menu is the part of the page containing the citations found. By default (the «Summary» display mode) each citation is accompanied by a pictograph, which shows visually whether the citation contains an abstract and whether links to free full-text material are available on the Internet. The system labels every citation with its sequence number in the list. The citation contains the following fields, which are highlighted in Fig. 4:

  • the authors of the article, as an active link. Clicking this link opens a page containing all citations of articles by the given author that are available in MedLine;
  • the title of the article;
  • the abbreviated name of the journal where the article was published;
  • the date the article was published;
  • the number of the issue in which the article was published.

Additional options on this page are arranged in tabs. The «Limits» tab makes it possible to narrow the search results, as described above.

On the «Preview/Index» tab you can add extra limiting terms and indexes to your query, including tags and logical search operators [4]. Pressing the «Preview» button shows the preliminary number of citations that the query you created would find. By following the active link on the number of citations found, you can study the results in more detail. This instrument thus lets you work out the tactics and strategy of the search quickly without overloading the page with citations, while the different variants of the query remain visible on one page.

The «History» tab stores the results of the recent queries you made in the PubMed system. Clicking the «Clear History» button clears that list.

The «Clipboard» tab is a convenient instrument for temporary storage of citations and works by analogy with the clipboard in the Windows operating system. By selecting the required citations on the search results page, you can place them on the clipboard for temporary storage.

To do this, select the «Clipboard» option in the «Send to» drop-down menu. The citations will then be placed in temporary storage and will remain available in this tab for as long as eight hours after you log out of the system. The clipboard can hold no more than 500 citations. Citations placed there are highlighted in green both in the clipboard itself and in the search results.

The «Details» tab lets you view the queries sent to the database in detail and edit them, which is useful for complex and cumbersome constructions.

The «Authority Index» tab is where you can enter a search word and see the authors found for it in the indexes created by the system. Clicking an author’s name takes you to a page with the bibliographic list of citations of that author’s articles.

To get to the page with a detailed description of an article, click the pictograph next to it, which looks like a small sheet of paper and stands for a journal.

If a citation contains an abstract, you can read it on this page (Fig. 5). Here, to the right of the abstract, there is a link to the full-text version as well as thematic links to articles that the developers of the search engine considered semantically related.

The single citation page also has a set of options similar to those on the general search results page. From this page you can change the display of the citation and save it, place citations on the clipboard or send them by e-mail.

Additional opportunities open up once you set up a personal page in the PubMed system (the web-site refers to it as «My NCBI»). To do this, complete a simple registration process by clicking the «Register» link in the top right corner of the PubMed main page. You will then receive a login (username) and password that allow you to sign in to the web-site using the «Sign In» link (Fig. 6).

Registration on the web-site is optional, but it can make frequent information searches in MedLine easier. After logging in to the personal page (point A in Figure 7) you have the option of saving search results. To do so, complete the search and click the «Save Search» link (point B in Figure 7).

In a separate window the system prompts you to enter a name for the saved search so that it can be identified later (Fig. 8). In the same window you can set up automatic delivery of new query results to your e-mail address.

After you save search results, the accumulated queries are stored on the personal page. You can re-access the found materials by clicking the “Query” link (point A in Figure 9), check the date of the last update (point B in Figure 9), customize automatic e-mail delivery (point C in Figure 9) or manually check what new material has appeared in the MedLine database by ticking the boxes next to the queries and clicking the «What’s New for Selected» button (point D in Figure 9).

Despite the variety and power of PubMed’s capabilities, the required information is often not easy to find. In many cases this results from the poor knowledge of English among many urologists and andrologists. However, there are also problems that cannot be overcome given the current state of information technology. These are primarily related to the accumulation of an immense bulk of information, which is often poorly structured, non-uniform and imprecise [5,6,7]. The few articles that a researcher actually needs can be drowned in a vast ocean of information, especially if the issue studied is highly specific.

To solve such problems, the whole flow of information must be organized and structured and, most importantly and most difficult of all, mutual semantic connections between individual fragments of knowledge must be created [5,6,7].

This is the path that the development of the MedLine database consistently follows. The creation of the MeSH controlled vocabulary and the indexing of citations according to this structure will allow the stored data to become self-describing and open to programmed analysis. The U.S. National Library of Medicine has united the efforts of many scientific disciplines by creating the UMLS (Unified Medical Language System), which is the basis for medical knowledge bases that differ in principle from conventional databases.

The practical value of creating such knowledge bases is that the user can put questions to the system in a convenient language. The system will then generate a response that meets the researcher’s expectations, no matter how complex the subject is.

References

  1. PubMed Help / National Center for Biotechnology Information, U.S. National Library of Medicine [Electronic resource]. 8600 Rockville Pike, Bethesda, MD 20894, 2004-2007. Access mode: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helppubmed, free. Title from screen.
  2. Medical Subject Headings (MeSH) / U.S. National Library of Medicine [Electronic resource]. 8600 Rockville Pike, Bethesda, MD 20894, 2003-2007. Access mode: http://www.nlm.nih.gov/mesh/, free. Title from screen.
  3. Ruch P, Geisbuhler A, Gobeill J et al: Using discourse analysis to improve text categorization in MEDLINE. Medinfo 2007, 12 (Pt 1), 710-715.
  4. Gerasevich VA, Avetisov AR: MedLine. A practical guide [in Russian]. Belorusskii meditsinskii zhurnal 2004, 1, pp. 26-27.
  5. Andreichikov AV, Andreichikova ON: Intelligent information systems: a textbook [in Russian]. Finansy i statistika, 2006, pp. 92-103.
  6. Bersegyan AA, Kupriyanov MS, Stepanenko VV, Kholod II: Data analysis technology: Data Mining, Visual Mining, Text Mining, OLAP [in Russian]. 2nd ed., revised and expanded. St. Petersburg: BHV-Petersburg, 2007, pp. 206-223.
  7. Chubukova IA: Data Mining: a study guide [in Russian]. Moscow: Internet University of Information Technologies; BINOM. Laboratoriya znanii, 2006, pp. 47-61: ill., tables (series «Fundamentals of Information Technologies»).