Video details

DSpace 7: Angular UI for the leading repository platform | Bram Luyten

Angular
08.19.2020
English

Bram Luyten

Angular UI instead of JSPUI and XMLUI for DSpace 7? Bram Luyten tells you why at FOSDEM 2020.

Transcript

OK, good evening, everybody. I'm not sure if I 100 percent agree with the organizers' policy to keep the very best presentation for last. But given the fact that you're all still here, it's probably not that bad of a policy. I will try to do my best to make sure you're not wasting your time here. And hello, Dylan, nice to see you. For the past 13 years, I've been working with NGOs and academic institutions to help them put their academic research and their digital objects online, accessible for free. I've been doing that together with a team of twenty-eight people at Atmire. We are a spin-off of KU Leuven. And just in respect to our hosts: ULB and KU Leuven are both very good, respectable universities.
In my talk today, I was a little bit afraid that Emily was going to say practically 100 percent of what I'm going to bring to you, so there's a little bit of overlap. But I will first comment a little bit on the state of academic publication today, before talking about the DSpace community and institutional repositories and how they fit into this picture. And the last part, the part that is a little bit technical, though I won't go into too much detail, is how the DSpace community, as a mature community, is currently tackling a very big overhaul: discontinuing two existing UIs and introducing a new UI to replace the two legacy ones.
But first, because it's been such a long day, I know that maybe all of those papers and all the information will be kind of a blur when you're back in your rooms or hotel rooms tonight. So if I just want to give you one idea to take away today, it is that everybody in the room, not just us at Atmire but everybody here, really has the potential to accelerate scientific progress, just by contributing to getting the results of scientific research, both positive and negative, into the hands of more people, quicker.
And the speed there, compared to researchers who are just in their lab trying to make their claim and their scope as big as possible, and who just sit on their data for years: that's just something that makes me cry. But it's a very difficult trade-off. So in that communication of results, how fast are we already going today? What's the state of the art? Let me take you back to the pre-Internet age, where the publishers basically had a marginal cost to print an issue of a journal and also carried the distribution costs to get these journals out to the libraries. They don't have that kind of marginal cost anymore, because thanks to the Internet it can go a lot faster than it did then. I also wanted to mention that in that old model, authors didn't have to pay to get their research published, but, as is still the case today, they also didn't get paid for handing over the copyright that the publishers needed, and the publishers made their margins, or recovered their costs, by selling subscriptions to the libraries.
When I go online today, and especially when we are looking at this threatening development of the coronavirus, I see prestigious journals and publishers making their content available for free, and apparently also instantly. So if we think about the discussions that we had about review: apparently The Lancet has pages on there, Elsevier has pages on there. So somehow they have found the time to review, or to make sure that this content is accessible online, which is an amazing development. It kind of makes me wonder: OK, is the problem solved, and is it even solved thanks to the generous help of the publishers, those benevolent actors in our space? But let me ask you this before I continue. Can you raise your hand if you know somebody today who has been diagnosed with the coronavirus? Anybody? That would already be kind of scary.
But can you now raise your hand if you know somebody who's been diagnosed with a form of cancer? That's a lot more people. So when I want to read this recent article, from February 3, a very recent article in Clinical Oncology, and I'm not privileged to be on the network of an institution that pays for this, I need to pay forty dollars for one article. And if you're approaching this as somebody who has a family member who's affected, my point of view is always: look, it's only when I get to the PDF that I can assess whether it's relevant or interesting in my case. So I kind of want to look at maybe hundreds of these PDFs, and maybe one of them will contain something useful. So it's totally not helping me, or helping people who don't have an institutional subscription.
But when I go to the page of this journal, immediately under the title of the journal it says "supports open access". And then I click on it and I see that "supports" actually means: we allow authors the privilege of paying an article publishing charge. So basically: we as the publisher do both open access and closed access, and if you as an author don't take the paid open access option, then you're the bad guy who doesn't want to set your article free. That wouldn't really be a problem if these fees were like 50 dollars or maybe a couple of hundred dollars. But in some of these journals, it's thousands of dollars for an article. And I was very happy to hear that the Journal of Open Source Software said that they charge neither the author nor the reader. So that's great. So we're in this weird hybrid open access situation, with a page like this, which I hit just by Googling, where I as a reader don't understand why, for some articles, there's a green button that says "open access" and I can download it, and then there's one that I have to pay for. It's not even tied to the identity of the journal anymore.
So on the question of whether we have solved the problem or not, I think there's still space where improvements can be made. One example, and I took the screenshot a couple of days ago, but apparently now there are already, I think, hundreds of results, is bioRxiv, which is an example of a preprint server; that preprint culture is really catching on in some academic disciplines. But, similar to the comments and questions from people earlier, along the lines of "I wish we could do longer reviews to get to a higher quality": I just saw a tweet this afternoon saying that the people from bioRxiv have now added an extra banner as a warning, because so many people, even journalists, are now going to these corona preprints. They had to add this extra banner to say: OK, it's nice that this appeared so fast, but please watch out, this is a preprint that hasn't been reviewed by anybody else. So even though it's a positive initiative, bioRxiv itself could be abused by people who say, hey, I have proven that this homeopathic therapy works on five people that I treated for the coronavirus. I mean, it could be up there today. Who knows?
Another angle on this is the embargo problem: many publishers today have 12 months or up to 24 months of embargo. And there's a big initiative driven by the scientific funders, called Plan S, which really wants to bring that embargo to zero for the work that they fund. So that's a really encouraging trend. I don't know what the exact timeline will be, but let's be optimistic that in the next five years, for all of the funders that are in Plan S, we will see zero embargoes. So the next step is that I will bring this to my world, and show you a little bit about the role of institutions and what they do with their institutional repositories in this field.
By the way, I've heard the word repository mentioned so many times today, all with different definitions, that I will try to say "institutional repository". What I mean by that exactly is, for example, in the case of student theses, a very famous repository: Apollo, from Cambridge. When they decided to put Stephen Hawking's thesis up there, I think it was last year or two years ago, they got so many downloads that their DSpace repository wasn't able to survive the load those first days. But today, if you go to it, you can access the thesis there, together with the supplementary files. It might have different colors, but this is a typical DSpace repository, where you can find the publications and the associated metadata, accessible for anyone.
And it doesn't stop at academia. If you follow the news, I think it was last week, you have the annually recurring report from Oxfam, who report on inequality. Even though it's a different model, where they write the papers without any involvement from publishers or peer review, they also put those PDFs and their metadata into a DSpace repository. It also doesn't stop at PDFs or text-based materials. Here is an example from the University of Exeter. I could open it up, but if you click that little "view more" button, then you can scroll down for half a minute, because this is actually one item that has 20 terabytes of data, all packaged up into zip files of 17 gigabytes. So if you put some storage solution under there, there is actually no theoretical limit as to how much data you can offer. Of course, the upload challenge of getting it in there, or the download, is another story. But the 17 gigabyte packages seem to work pretty well.
So the three examples before were all situations where the materials are openly accessible for download. But unfortunately, because of the academic prestige linked to high-impact journals, it's still the case that the libraries who operate these repositories are tied to the situation that their staff also publish in journals that have embargoes. So, for example, 8H in Germany: they tend to submit the items and the PDFs as soon as possible, and you can see the embargo dates that are on there. And it's not just that you can go into your Google Calendar and say, aha, in November 2020 I will go back to the repository and get the paper. No, if you click on the download button, you get the "request a copy" feature. Request a copy is a technical feature built around a privilege in academia: that you as an individual researcher, wherever you have published, can always share your work in a peer-to-peer fashion with your peers. So even though it's now a little bit more automated, in that I can say I want access for reason X or Y, it still requires an individual to say, yes, you get this copy. But it's a very effective way to make sure that the metadata and the title and everything get out there, and that if you press the button and wait a few days, you have a copy anyway, even though there's an embargo until November.
So I've shown you different repositories and their URLs, and the best thing is that you, as an interested user, don't need to know these sites. You don't even need to know that they exist, because in the development of the DSpace project, and of all the repositories out there, we have very good relations with Google Scholar, who actually give us a very hard job keeping our systems up while they are constantly hitting them with crawler traffic.
But that's, of course, to make sure that they can reference both the metadata and the direct PDF links in Google Scholar. So if the system works completely, you just get the PDF straight from Google Scholar, completely oblivious to the fact that there is such a thing as institutional repositories that do all of this for you.
So, a little bit of history on the DSpace project. One of the reasons that I submitted a talk here today is that... it's back to normal; if it works, you shouldn't touch it, but I need to touch it to advance the slides... is that, as a community of over 12 years old, some of the original committers have moved on. It's like any project: everybody needs developers and contributors. So if you have some experience in Java, or now in Angular, and if you hear something that is of interest, you're always welcome to join. But the reason why DSpace is today the most successful institutional repository platform is, in my opinion, and I don't know, maybe it was all just pure luck, the fact that it had localization support very early in the process. So today there's a fork in China called CSpace, and there is a whole range of installed repositories in Taiwan based on that localization support. And, of course, there is the MIT licensing, which allows anybody to do whatever they want with the software.
I'm going to go a little bit faster, because I see that I only have five minutes left. I don't know where all that time went. To talk about the DSpace 7 version and where we're going, and also maybe as a learning from a project that is 10 years further along, and a little bit in response to the earlier question of how we deal with the evolution of the platform and what we allow: there used to be a tradition of time-based releases in DSpace, where basically any contributor could contribute or suggest something before a feature freeze deadline.
And if it didn't break other features, and if it was generally perceived as a good idea or as not being in somebody's way, any contribution could just get in there. So we started off in 2002 with one UI, until Texas A&M University in the US decided: well, JSPUI is kind of getting to be old technology by now, and we want to have something that can give a specific look and feel to specific collections. So they came up with a new UI, and it got accepted into the mainline. But the problem at that point was that there was not 100 percent feature parity with the old UI. So it had interesting features, but it was not convincing enough for the entire community to say, let's put all of our customizations in the trash. So the bad thing, well, whether it's bad or good depends on how you look at it, is that since that point the DSpace project has actually split its community in two: an XMLUI community and a JSPUI community. And I personally believe that's a bad thing, because two groups of people that deal with the same problem space have been working on creating the same features, or dealing with the same problems, in two different ways.
So that's why, with the new attempt at unifying these, on a technology that is more modern and geared towards the future, everybody was really on the same page: we really want to create something that can replace the two of them for the entire community, and not just create a third UI where we will have our community split up into three different groups. That's also the reason why this has been quite a lengthy process. If you look at the timeline, in 2016 we went through a formal UI prototyping challenge, where stakeholders in the community could say: we're fans of this technology, here are the use cases that we can build and demonstrate. And we as a company contributed a prototype in Ember, but we joined in the discourse with the community and we all aligned together on another technology. So we threw away our own prototype in Ember, and we aligned on the fact that we were all going to build this on Angular for DSpace 7 together.
So right now we're already at more code commits than some of the previous major releases combined, and we are really in the final stretch to get to the end. There is one video here where, if it works, I can just show you a bit of the search feature. Let me find out how to put it full screen. Basically, the whole approach towards faceted searching in metadata is made a lot more responsive, thanks to the fact that with Angular we can break it up into all of these individual components, and they can get their data individually instead of always relying on more page loads to bring this data in.
And maybe as a final thing, hooking on to the story of the neuroscience presentations in the morning: one big evolution that we're bringing here is that, instead of the rigid model of having metadata and a few objects that represent a paper or a data set, DSpace 7 will also have rich entities, to make an entity model of anything that makes sense in your context. So when we think about the DataLad example that we saw this morning, where it's like, OK, we want data sets and files, or some of these other neuro examples where it was a more intricate structure: we will be able to build that whole representation of entities in DSpace 7, which is something that we're very excited about. So thank you very much for your attention today.
I want to talk briefly about the implications for open access. Because there is a definition, and it is a very good one, called the Berlin Declaration. You had Stephen Hawking's thesis; I think this is the...
They describe this as open access, and I wrote to them and complained that the copyright center was free to download but nothing else, and they replied very politely and quickly and said, here it is. Here is a pretty good definition of open access. Particularly right now.
So the question is about the many definitions of open access that are currently going around, and how we can get around that problem of different understandings. The major issue that I see with this is that some of the traditional publishers have actually co-opted the term open access. And the fashion is that every year you add a new color or a new kind of substance in front of it. So you have green open access, gold, then you have diamond open access and whatever. And sometimes I feel like it's intentionally made to be confusing, and to put commercial offerings in there that seem like something unique or a twist on things. But it's really a different thing whether you say a result is free in the sense that you can just read it, or whether you really have, for example, text and data mining rights or some kind of reproducibility, so that you can do more than just basic reading. So I'm not sure if we will end up in a situation where there will be one agreed definition of open access. Maybe you just have to find a very original color. You can say, this is pink open access dot org. You make a website and you say, this is what it now means. And if you get enough traction, maybe pink open access is the future.
Um, do you have a feature in DSpace to actually expose a bibliographic API, typically for research websites where they want to provide people with a feed of their researchers' work?
Yeah, the question is whether DSpace 7 will have APIs, so that bibliographies can be easily integrated in other websites. Absolutely.
So in past versions, we already had a REST API that allowed this kind of behavior: getting the publications by collection or community or others. But we really decided that we want to eat our own dog food, and in DSpace 7 the new UI has to be a primary consumer of the REST API. So in DSpace 7, we are really exposing all of the business logic, including workflows, everything that you can imagine, statistics on the downloads and everything. Everything is now also in the REST API. So everything that the UI can do as an application, you could also transfer to other applications by calling the API yourself.
It's a really simple question, and if you've covered it already, I apologize, but one of the things that...
I have absolutely no idea why the software is called DSpace. It's just the name. But the funny thing, if you really go back in history, is that there's another package, coming out of the University of Southampton, called EPrints. So in the beginning, EPrints and DSpace were the two competitors, and it's actually the same software developer, and now his name just escapes me; the same person was at the start of both of them. Rob. I will add it to my page on... Thank you very much.