Adventures in metadata analysis (bibliometrics) with Ruby and the Journal of Pediatric Surgery

Charles L. Snyder

I analyzed metadata for the articles published in the Journal of Pediatric Surgery over the preceding 10 years. The aims were to determine: a) where the articles came from geographically (country, zip code for US sources, and institution); b) authorship data, including the average number of authors per paper per year and the "top 25" authors with their number of publications; c) article content by keyword analysis, including the "top 25" most common topics/keywords; and d) content by title and abstract, identifying the most common words found in both the title and the abstract and their relative frequencies. I also looked for trends over the ten-year period in: a) the number of articles per year, b) the number of authors per article per year, c) the site of origin of the articles, and d) "popular topics" as they changed over time.
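Two of the analyses listed above, the mean number of authors per article per year and the words shared by an article's title and abstract, can be sketched in Ruby as follows. This is a minimal sketch, not the original program: the `Article` record and its fields, the short stop-word list, and the sample records are all hypothetical.

```ruby
# Hypothetical record type; the original program's data structures are not described.
Article = Struct.new(:year, :authors, :title, :abstract)

# Hypothetical, deliberately short stop-word list (an assumption for this sketch).
STOP_WORDS = %w[a an and the of in with for to on by is are was were we].freeze

# Maps each year to the mean number of authors per article in that year.
def mean_authors_per_year(articles)
  articles.group_by(&:year)
          .transform_values { |group| group.sum { |a| a.authors.size }.fdiv(group.size) }
          .sort.to_h
end

# Lowercases the text, splits it into alphabetic words, and drops stop words.
def words(text)
  text.downcase.scan(/[a-z]+/).reject { |w| STOP_WORDS.include?(w) }
end

# Counts how many articles use each word in both the title and the abstract,
# counting each word at most once per article; most frequent first.
def shared_word_counts(articles)
  counts = Hash.new(0)
  articles.each do |a|
    (words(a.title) & words(a.abstract)).each { |w| counts[w] += 1 }
  end
  counts.sort_by { |w, n| [-n, w] }
end

# Made-up sample records, for illustration only.
articles = [
  Article.new(1996, %w[A B], "Hirschsprung disease in infants",
              "We reviewed infants with Hirschsprung disease."),
  Article.new(1996, %w[C], "Outcomes of gastroschisis repair",
              "Gastroschisis repair outcomes were reviewed."),
  Article.new(2005, %w[A D E], "Gastroschisis in infants",
              "Infants with gastroschisis were studied.")
]

p mean_authors_per_year(articles)
p shared_word_counts(articles).first(2)
```

Using the set intersection `&` means a word repeated several times in one abstract still counts once for that article, so the tally reflects how many articles share the word rather than raw word frequency.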

Methods

The United States National Library of Medicine "PubMed" database was queried via EndNote software (Windows version 7, Thomson ISI). The data were sorted by year and exported to an XML (Extensible Markup Language) file. The text analysis program was written in Ruby (a high-level scripting language) and used to search and parse the XML files. The queries used regular expressions ("regex"). For example, "/\d{5}-\d{4}/ =~ x" means 'search the text x for a string of five digits, followed by a hyphen, followed by four more digits' (the ZIP+4 format, used for the zip code search). The data were analyzed with Mathematica 5.1 (Wolfram Research, Champaign, IL).
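The zip code search described above can be sketched in Ruby as follows. This is an illustration under stated assumptions, not the original program: the `\b` word boundaries and the capture group are additions, truncating ZIP+4 codes to five digits is assumed from the form of the results table, and the affiliation strings are made up.

```ruby
# ZIP+4 pattern from the Methods text (five digits, hyphen, four digits).
# Additions for this sketch: \b boundaries so the pattern does not match inside
# a longer run of digits, and a capture group that keeps the five-digit zip code.
ZIP_PLUS_FOUR = /\b(\d{5})-\d{4}\b/

# Mirrors the "regex =~ x" test: =~ returns the match offset, or nil if none.
def contains_zip?(text)
  !(ZIP_PLUS_FOUR =~ text).nil?
end

# Tallies five-digit zip codes across affiliation strings, most frequent first.
def zip_counts(affiliations)
  counts = Hash.new(0)
  affiliations.each do |text|
    # scan returns one array of captures per match; flatten to plain strings.
    text.scan(ZIP_PLUS_FOUR).flatten.each { |zip| counts[zip] += 1 }
  end
  counts.sort_by { |zip, n| [-n, zip] }
end

# Made-up affiliation strings, for illustration only.
affiliations = [
  "Department of Surgery, Example Hospital, Boston, MA 02115-1234, USA",
  "Example Medical Center, San Francisco, CA 94143-0570, USA",
  "Another Example Hospital, Boston, MA 02115-9999, USA",
  "Example University Hospital, Tokyo 113-8421, Japan"
]

p zip_counts(affiliations)
p contains_zip?(affiliations.last)
```

The Japanese postal code in the last string has a 3-4 digit form, so the pattern correctly ignores it; affiliations that give only a five-digit zip code (without the "-nnnn" suffix) would also be missed by this pattern.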

Brief Results

The following table lists, in descending order, the top 25 authors by number of articles published in the Journal of Pediatric Surgery (JPS) over the last 10 years:

Last name       Initials  Articles
Miyano          T.        71
Adzick          N. S.     66
Harrison        M. R.     64
Yamataka        A.        62
Tanyel          F. C.     62
Puri            P.        59
Spitz           L.        56
Okada           A.        52
Kobayashi       H.        52
Buyukpamukcu    N.        49
Pierro          A.        48
Suita           S.        43
Langer          J. C.     41
Ciftci          A. O.     40
Senocak         M. E.     38
Coran           A. G.     38
Lane            G. J.     38
Grosfeld        J. L.     38
Albanese        C. T.     37
Laberge         J. M.     36
Hirschl         R. B.     36
Tam             P. K.     36
Glick           P. L.     34
Tovar           J. A.     32
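A table like the one above can be produced by tallying author names across articles. The sketch below is a minimal illustration, not the original program; it assumes each article's authors are already available as an array of name strings, and the names in the example are placeholders.

```ruby
# Counts articles per author and returns the n most frequent as [name, count]
# pairs, ties broken alphabetically. Each article is assumed to be an array of
# author name strings; an author listed twice on one paper is counted once.
def top_authors(author_lists, n = 25)
  counts = Hash.new(0)
  author_lists.each { |authors| authors.uniq.each { |name| counts[name] += 1 } }
  counts.sort_by { |name, count| [-count, name] }.first(n)
end

# Placeholder author lists, for illustration only.
papers = [
  ["Author, A.", "Author, B."],
  ["Author, B.", "Author, C."],
  ["Author, B.", "Author, A."]
]

p top_authors(papers, 2) # => [["Author, B.", 3], ["Author, A.", 2]]
```

Because names are compared as exact strings, the same person listed with different initials (for example "Smith, J." and "Smith, J. A.") would be counted as two authors; in practice some name normalization would be needed.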

The top 25 countries from which the articles originated were as follows:

Country         Articles
USA             1710
Japan           498
Turkey          272
England         259
Canada          235
Germany         104
Italy           100
India           91
Netherlands     81
Australia       78
China           78
Ireland         74
France          66
Spain           66
UK              61
Israel          55
Taiwan          45
Hong Kong       42
Brazil          37
Finland         33
Scotland        25
Belgium         22
Switzerland     22
Mexico          21
Saudi Arabia    18
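Assigning each article to a country requires some rule for reading the affiliation string. One simple, hypothetical rule, sketched below, takes the last comma-separated field as the country; it is an assumption for illustration, not necessarily how the original program worked, and the affiliation strings are made up.

```ruby
# Hypothetical rule: the country is the last comma-separated field of the
# affiliation, with surrounding whitespace and a trailing period removed.
def country_of(affiliation)
  affiliation.split(",").last.to_s.strip.chomp(".")
end

# Tallies countries across affiliation strings, most frequent first,
# skipping affiliations that yield an empty field.
def country_counts(affiliations)
  counts = Hash.new(0)
  affiliations.each do |aff|
    country = country_of(aff)
    counts[country] += 1 unless country.empty?
  end
  counts.sort_by { |name, n| [-n, name] }
end

# Made-up affiliation strings, for illustration only.
affiliations = [
  "Department of Paediatric Surgery, Example Hospital, London, UK.",
  "Example University Hospital, Tokyo, Japan",
  "Example Children's Hospital, Boston, MA, USA.",
  "Example Clinic, Osaka, Japan."
]

p country_counts(affiliations)
```

A rule this literal would count "England", "UK", and "Scotland" as separate countries, as they appear in the table above; merging them would require an explicit alias list.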

For articles from the United States, the most frequent zip codes were:

Zip code   Articles
02115      65
94143      61
19104      60
48109      40
15213      36
45229      33
64108      31
90027      24
46202      23
14222      22
38105      19
77030      19
43205      18
02114      17
10032      17
90095      17
80218      16
29425      15
84113      14
10021      14
52242      13
23298      12
63104      11
35233      10
98105      10

There are a number of zip code finders available; the coolest is probably Ben Fry's Zipdecode. A number of other parameters were examined, and those results are available on request.