Next Generation Scientific Articles: Reflections on a changing landscape
Elsevier’s “Article of the Future” looks like an article from the past, with some embedded hyperlinks, some AJAX tabs, two basic social media elements, and not much else.
There is no doubt that the “Article of the Future” failed to convince the scientific community in several key areas. That does not mean Elsevier should abandon experimenting with its content; in fact, it is experimenting with several other good features, such as GenBank linking, ThermoML linking and, most notably, NextBio integration. Several ScienceDirect articles contain GenBank sequences, and these now link to the description of that particular sequence in GenBank. Most of the mathematical content on ScienceDirect is now rendered as MathML. I find these kinds of enhancements more innovative and appealing from a user perspective, and for me these are the real features that next generation scientific articles should deliver. Integrating NextBio technology with the ScienceDirect interface delivers a number of powerful benefits. It not only accelerates the discovery of key articles but also draws more insight from existing content through NextBio’s ontology-based semantic framework, correlations extracted from the existing knowledge base, and tight integration with publicly available data sources. In his article, Michael Nielsen gives several examples of how scientific publishing is becoming progressively more tech savvy, and he also asks
how many scientific publishers are as knowledgeable about technology as Steve Jobs, Sergey Brin, or Larry Page?
In my opinion, scientific publishers of the future do not necessarily need to be purely technology-oriented companies; that is where tech-savvy companies like NextBio, ChemSpider and Talis will play a major role by working side by side with traditional publishers. Technological outsourcing seems a quick way to upgrade the current publishing system. You may also call it a vertical disintegration of services, which will open up new opportunities for small innovative startups, but in the end next generation publishing technology will be modular and integrated.
To divert this discussion from Elsevier, I would also like to point out that Elsevier is neither the first nor the only group experimenting with scientific articles. In the last year PLoS, Nature, Science and RSC have implemented several major changes to their web interfaces. In fact, RSC was one of the very first publishers to adopt semantically rich publishing in the chemical sciences, with what it called Project Prospect, launched at the beginning of February 2007. Project Prospect pioneered the use of ontologies (Gene Ontology, Sequence Ontology, Cell Ontology and ChEBI), unique compound identifiers (InChI) and structural information (InChI, SMILES and CML) within research articles. The project used Open Source Chemical Analysis Routines (OSCAR) for text mining in order to attach structural information to chemical names. These enhancements have made identifying and highlighting key content easier than ever, and by the end of April 2008 it was possible to do structure and substructure searching of compounds within enhanced RSC articles. Interestingly, most of the technology used for Project Prospect was open source, and it was developed together with two UK academic groups, the Unilever Centre for Molecular Informatics and the Computer Laboratory, both at the University of Cambridge. Earlier this year RSC acquired ChemSpider, which reflects RSC’s commitment to providing semantic and data-rich scientific articles.
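To make the idea of attaching structural information to chemical names a bit more concrete, here is a minimal sketch in Python. It is not how OSCAR or Project Prospect works; it only assumes a tiny hand-written lexicon (the `LEXICON` dictionary, the function names `annotate` and `to_html`, and the `data-smiles`/`data-inchi` attributes are all my own illustrative choices) and shows how a recognised name could carry its SMILES and InChI along with it in the article markup.

```python
import html
import re

# Hypothetical hand-written lexicon for illustration only; a real text-mining
# system recognises far more names and resolves them against curated databases.
LEXICON = {
    "ethanol": {"smiles": "CCO", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},
    "benzene": {"smiles": "c1ccccc1", "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"},
    "water": {"smiles": "O", "inchi": "InChI=1S/H2O/h1H2"},
}

# Match any lexicon entry as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in LEXICON) + r")\b",
    re.IGNORECASE,
)


def annotate(text):
    """Return (start, end, name, identifiers) for each lexicon name found in text."""
    hits = []
    for match in _PATTERN.finditer(text):
        name = match.group(1).lower()
        hits.append((match.start(), match.end(), name, LEXICON[name]))
    return hits


def to_html(text):
    """Wrap each recognised name in a span that carries its identifiers."""
    parts, pos = [], 0
    for start, end, _name, ids in annotate(text):
        parts.append(html.escape(text[pos:start]))
        parts.append(
            '<span class="compound" data-smiles="%s" data-inchi="%s">%s</span>'
            % (html.escape(ids["smiles"]), html.escape(ids["inchi"]),
               html.escape(text[start:end]))
        )
        pos = end
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(to_html("Benzene was dissolved in ethanol."))
```

Once identifiers travel with the text like this, structure searching becomes a matter of indexing the attributes rather than re-reading the prose, which is roughly the benefit the enhanced articles are after.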
Later this year Nature Chemistry followed in the footsteps of RSC: links to PubChem and ChemSpider, a data-rich compound section that aggregates all molecular information, support for InChI, InChIKey and SMILES, and download in CML are a few of the early implementations available in the very first issue of Nature Chemistry. No doubt Nature Chemistry has done a great job with existing tools in its very first issue. Egon Willighagen at Chem-bla-ics wrote a very brief analysis of the Nature Chemistry features. He points out
Like many other chemistry journals, Nature Chemistry does not consider properties of the molecule interesting, and NMR spectra are hidden in the Supplementary Information.
Of course raw spectral data is of prime importance. Raw data, which is mostly hidden either in supplementary material or in the images of the article, appears to be the biggest barrier to a data-rich publishing framework. Unfortunately text mining does not work well with images, although it is technically feasible to publish images either in a searchable SVG format or with additional information embedded in them. Ironically, despite being a W3C standard, SVG is not considered a major requirement for semantically rich publishing; I guess the road to semantically rich publishing goes beyond RDF and metadata. Also, most journals have no strong policy to encourage authors to deposit their raw data in public repositories or to make it available through the journal website. This is left to the author’s discretion and convenience. Ideally the original data should be linked with the figure in the respective standard file formats. If I am not wrong, the journal Molecular Systems Biology (MSB) has implemented a similar pattern, where the data behind a graph image can be downloaded as a Microsoft Excel file. The structured data section of Molecular Systems Biology, which archives structured data files such as SBML, is a small but significant step towards data-rich publishing. It is important to note that the systems biology community is also pushing hard to develop a new kind of model publication paradigm, which will make it possible to publish a complete working implementation of a model, encoded and annotated according to the various open standards. David Nickerson at the National University of Singapore has developed several working prototypes of this concept. Apparently everything is out there; what publishers really need is to adopt and implement the domain-specific standards, ontologies and guidelines.
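As a rough illustration of what embedding additional information in an image could mean, the Python sketch below draws a small plot as SVG and stores the underlying numbers inside the SVG `<metadata>` element, so a reader can pull the values back out instead of digitizing the curve. Putting JSON text in `<metadata>` is my own assumption for the sake of a short example (no journal or standard prescribes it), and the function names `make_svg` and `extract_data` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialise with a default xmlns, no prefix


def make_svg(points, x_label, y_label):
    """Draw points as a polyline and keep the raw values in <metadata>."""
    svg = ET.Element(
        "{%s}svg" % SVG_NS,
        {"width": "200", "height": "200", "viewBox": "0 0 200 200"},
    )

    # Assumption: raw data stored as JSON text; a production format would
    # more likely use a namespaced XML vocabulary or a link to a data file.
    meta = ET.SubElement(svg, "{%s}metadata" % SVG_NS)
    meta.text = json.dumps({"x": x_label, "y": y_label, "points": points})

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_span = (max(xs) - min(xs)) or 1.0  # avoid division by zero
    y_span = (max(ys) - min(ys)) or 1.0

    # Scale data into a 180x180 drawing area with a 10 px margin;
    # SVG's y axis points down, so the y coordinate is flipped.
    coords = " ".join(
        "%.1f,%.1f" % (
            10 + 180 * (x - min(xs)) / x_span,
            190 - 180 * (y - min(ys)) / y_span,
        )
        for x, y in points
    )
    ET.SubElement(
        svg,
        "{%s}polyline" % SVG_NS,
        {"points": coords, "fill": "none", "stroke": "black"},
    )
    return ET.tostring(svg, encoding="unicode")


def extract_data(svg_text):
    """Recover the raw values that make_svg embedded in the figure."""
    root = ET.fromstring(svg_text)
    meta = root.find("{%s}metadata" % SVG_NS)
    return json.loads(meta.text)


if __name__ == "__main__":
    figure = make_svg([[0.0, 1.0], [1.0, 2.5], [2.0, 4.0]], "time / s", "signal")
    print(extract_data(figure)["points"])
```

The point of the sketch is only that the figure and its data can live in one file; whether the data sits inside the image or is linked from it in a standard format matters less than having it available at all.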
One major area where the PLoS family of journals is working hard is enabling key web 2.0 community features such as annotations, discussion threads, commenting, trackbacks, ratings, and so-called article-level metrics. For example, the Related Content section of PLoS articles aggregates related articles on the web, related blog posts, and citations in Scopus, PubMed Central and Google Scholar. Annotation is one of the most wanted features for next generation scientific articles, especially shared community annotation and tagging. Last but not least, for God’s sake stop promoting PDF.
Conclusion: It is hard to predict what an article of the future will look like, but like everyone else I anticipate that it will be data and semantic rich. Publishers need to experiment a lot more before they claim the tagline “Article of the Future”. Such experiments should be tested in community-wide assessments and not in boardrooms.




interesting commentary, lots to think about
Next Generation Scientific Articles: Reflections on a changing landscape: Michael Nielsen wrote a very interesti.. http://bit.ly/zAZeW
Nature Chemistry does indeed consider the properties of the molecule interesting. It is very easy to make pithy statements about where NMR (or other data) is ‘hidden’ in a chemistry manuscript, when you have no idea how scientific publishing actually works on a day-to-day basis – I have worked at three different journals, and getting authors to put their references in the correct format or submit their figures using the right file format is sometimes a Herculean challenge. If we required authors to submit raw spectroscopic data as a matter of course, this would be a nightmare. We certainly encourage authors to submit raw data, but the vast majority do not, and send us pdfs of NMR spectra and such instead. How about people stop criticising journals for hiding data, and try and persuade the people who could really make it happen to get on board – the authors!!
First of all, this is not mere criticism (everyone appreciates what publishers are doing with the articles). Secondly, I certainly have no idea how scientific publishing actually works on a day-to-day basis, but I do have some common sense. Why are there more than 200 citation styles? What exactly is stopping publishers from agreeing on a single citation style? No, they will not, because it would make things easy for the authors. If Nature or Nature Chemistry think that they have enough experience to deal with these kinds of issues, then why did they ask for feedback and input from readers, and why would Elsevier organize the Article 2.0 contest?
Analysis of several recent high-profile blog posts: Next-gen scientific articles http://ow.ly/jUyf | (fisheye perspective)
Thanks for the “FishEye Perspective”. The article of the future is morphing quickly. In many cases the technologies already exist to enhance the articles and both you and Stuart are correct in your comments. You are correct to challenge in regards to “one format” for references. I’ve written over 100 peer-reviewed publications and that specific issue of reformatting citations is a depressing trial every time..and if resolved “someone” will win with their format and I judge that is part of the challenge.
I can agree with Stuart however that until you have “been” a publisher it’s easy to criticize from the outside. Having now moved from outside to inside publishing I am more aware of the challenges. And..I am only slightly immersed in the processes at present so I can only imagine what I am going to learn while at the RSC!
Stuart comments that authors are generally not willing to submit data. I think this is generally true. It is work to prepare data and binary file formats are not supported for even general spectral data and have to be converted to JCAMP for example. It’s possible, but it’s work. Some people will do it but they are few and far between. In my judgment many of the enhancements of the future will come from greater support from the authors and we have to provide tools to make it easy for them. We’re all busy and MORE work is unlikely to create a following. We’re building tools to make it easy for authors…
Thanks Antony for your comment. I also agree with Stuart’s comment that authors are generally not willing to submit data, but we are talking about top-notch journals; at least the data that was used to create the plots in the figures should be available for download. From time to time I use a digitizer to get the numerical values hidden in those plots. Can’t we make it a little simpler by just asking authors to provide the Excel file used to draw the plots? Community-wide efforts in systems biology have made it easier for authors to publish their models in SBML format. Surprisingly, chemistry is a very old discipline compared to systems biology, but best practices are still not well established. I hope this will not continue for much longer.
Abhishek – apologies for the tone of my comment earlier today, I realise reading it back that it was a little sharper than it needed to be. But, I do maintain that it is easy to criticise publishers and journals – and some of it we deserve, but some of it we don’t. Or at least that’s what I believe – but I am obviously somewhat biased! However, we are willing to have a dialogue with the community – we do want input. We may not want (or be able) to act on all of it, but at least we’re asking! Other publishers are asking too – but there are some notable exceptions…
Tony makes a great point in that to encourage authors to submit data, we need to offer them tools to make it easy – we (and by ‘we’ I mean journals in general, not just Nature ones) make authors jump through a lot of hoops to submit a paper – probably too many hoops; but these things don’t change overnight – however much some of us on the ‘inside’ might wish them to. As long as journals are receptive to authors wanting to submit data – which I know Nature Chemistry and those at the RSC are, then at least there is a possibility for change; but it must be driven by the academic community. Whether it’s a top-notch journal or otherwise, it is not a practical model to force authors to submit data. The ‘make it easy’ argument is supported by the fact that the only type of data file we actually host as Supplementary Information are cif files – and these are standard when you get a crystal structure. The problem is that for other types of experimental data, there is no widely recognised universal standard (ask a bunch of practising chemists what JCAMP files are and I bet not many will know).
Perhaps it is because chemistry is an older discipline that it is harder for change to be embraced. Because systems biology is a newer field, it is not weighed down by ‘old-fashioned’ ways of doing things – or not so much anyway.
The other thing to consider is that while there are a number of people very vocal about publishing chemistry in new ways – this is still very much a small number. There is a large silent majority – what do they want? I do not say this to justify not changing anything, but I really do believe that this needs to be a community driven effort. If more people submit data to journals that are capable and willing to host it, then this practice will likely gain traction. As new generations of graduate students become academics and in turn their students know nothing other than the digital age, things will get easier. But, a journal cannot mandate this.
And as for references, I agree. But if you put the major chemistry publishers in a room and try and get them to agree on a format for references, it would get very messy…
No worries Stuart, it’s quite natural. You guys are working hard to come out with the best you can; maybe our (the vocal guys’) expectations of Nature Chemistry are very high. The whole idea of writing this article was to offer a commentary on how different publishers are working in different directions to deliver the article of the future. No one knows what it will look like. Otherwise I strongly agree with the different points you detailed here. Most authors are not as tech savvy as we think, and that may be the reason there is a large silent majority in the community. In the past many ideas and implementations have been dropped due to a cold response from the community, which was really disappointing. Now we all agree that we need to encourage authors to submit raw data in standardized formats. How will you do that? Get a reward system in place first. For example, many OA publishers offer a concession on OA charges if the manuscript is formatted with EndNote or another reference manager. So all we need is a reward system for the authors who put in the extra effort to submit raw data in a standardized format. Reference styles and other issues between publishers can be resolved in the same way that DOI and CrossRef were worked out in the past.
Next Generation Scientific Articles: Reflections on a changing landscape http://bit.ly/45wA7i