by Maurizio Cimadamore
At: FOSDEM 2020 https://video.fosdem.org/2020/H.1302/bytebuffers.webm
Abstract: Direct buffers are, to date, the only way to access foreign, off-heap memory. Despite their success, direct buffers suffer from some limitations --- statefulness, lack of addressing space, non-deterministic deallocation, to name a few --- which makes them a less-than-ideal choice under certain workloads. In this talk we paint the path to the future: a safe, supported and efficient foreign memory access API for Java. By providing a more targeted solution to the problem of accessing foreign memory, not only will developers be freed from the above limitations, but they will also enjoy improved performance, as the new API is designed from the ground up with JIT optimizations in mind --- all without sacrificing memory access safety.
Room: H.1302 (Depage) Scheduled start: 2020-02-01 11:20:00
This talk is about the Memory Access API, a new API which we added as an incubating API to Java 14 just as the gate was closing, basically at the very last minute. The title of this talk is deliberately inflammatory: I hope that by the end of it I will have convinced you that the role of the new Memory Access API is not to completely replace the ByteBuffer API. You can decide whether you want to use the new API alongside ByteBuffers, or whether you want to completely replace your ByteBuffer usage with the new memory API. It's up to you; hopefully you will simply get another tool in the toolbox that helps your work. The usual disclaimer applies: don't believe a word I say.

There are a number of reasons why people may want to reach for off-heap memory. Probably the primary one is to avoid the costs associated with GC. Now we have Shenandoah, we have ZGC, we have much better GCs than we did in the past; still, there are cases, for example real-time applications, where you may want to avoid GC pauses entirely. There are also other circumstances where using off-heap memory may be necessary: for example, when you want to share memory across multiple processes, or when you want to share memory with a native library. So it is not an accident that we landed on this API while working on Project Panama, which, as Mark showed before, is all about native interop.

The de facto Java API for accessing this kind of off-heap memory is the ByteBuffer API. There are also other APIs hidden in the JDK: sun.misc.Unsafe is one of those. You can use it if you want; it's fast, but it's unsafe, so if the VM comes crashing down, it's your fault.

What about ByteBuffers? ByteBuffers arrived in Java 1.4, as part of the big push towards buffer-oriented input/output (NIO). The ByteBuffer API is a stateful API: one of its main drivers was to make it simple to write I/O code, so it keeps all the state internally to help you prevent buffer overruns and underruns, and it helps with things like character set encoding and decoding. Crucially, a ByteBuffer can be allocated both on the Java heap and off the heap, so you can allocate a slice of off-heap memory and access it through a ByteBuffer.

Here is a very typical example of ByteBuffer usage: we want to read the contents of a file channel into a buffer, and then loop over all the characters we have read from the buffer. When we allocate the buffer, the buffer is initially empty. Notice that we have quite a few state variables here. There is a position, which is initially set to zero. There is a capacity, which is essentially how big the buffer is; in this case, ten bytes. And there is a limit, another mutable part of the ByteBuffer state, which is initially set to the capacity. The first thing we have to do is read some bytes from the channel, which means we are actually writing into the buffer. So now we have some characters, and we want to start reading them into our application. The first thing we have to do is flip the buffer from writing mode into reading mode: the position is set back to zero, and the limit is set to the position just after the last character that was read.
Now I can run my loop and read the characters one by one, until eventually I end up in a state where the position is equal to the limit. At that point the hasRemaining predicate returns false, so I exit the loop, and then I have to get ready for yet another read from the file channel. To do that, I call clear. What does clear do? It essentially resets the state of the ByteBuffer to its initial state: the position goes back to zero, and the limit goes back to the capacity value, so I can do another iteration. That's basically how you work with buffers. And if you want the buffer to be allocated off-heap instead, you just change a single line of code: use the allocateDirect method instead of allocate. This is called a direct buffer, and it is associated with off-heap memory.

With direct buffers, we developers actually have a new weapon, because we can write code that accesses off-heap memory. Off-heap access through a ByteBuffer is quite efficient, because at the end of the day ByteBuffer is implemented on top of Unsafe, so we still get the advantage of all the JIT intrinsics underneath. The access is also safe because, as we've seen, ByteBuffer has all these concepts of capacity, limit and position: every access is checked against the bounds of the buffer, otherwise we get an exception.
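Put together, the read loop I've just described looks roughly like this. This is a minimal sketch of the pattern from the slide; printing each byte stands in for whatever processing you actually do:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

static void readAll(FileChannel channel) throws IOException {
    ByteBuffer buf = ByteBuffer.allocateDirect(10); // or ByteBuffer.allocate(10) for on-heap
    while (channel.read(buf) != -1) { // writing mode: the read advances the position
        buf.flip();                   // switch to reading mode: limit = position, position = 0
        while (buf.hasRemaining()) {
            byte b = buf.get();       // read one byte at a time
            System.out.print((char) b);
        }
        buf.clear();                  // back to writing mode: position = 0, limit = capacity
    }
}
```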
But how good are ByteBuffers if we want to write general off-heap programs? Let's try to look at some numbers. Here I have a benchmark which essentially allocates a small slab of memory, a few hundred bytes, and then writes ints into that slab. The benchmark has been somewhat cherry-picked, because I think it is characteristic of what happens a lot when you do native interop, which is something we care about a great deal in Panama: you allocate a small buffer of memory, you fill it, you pass a pointer to this memory to some native function, and then you free the memory after the function returns. This is maybe not a use case that comes up a lot when you're doing I/O, but it is something you do quite typically when working with native libraries.

If you use Unsafe, you get a throughput of around nine operations per microsecond. That's fine. Now let's replace this code with the ByteBuffer API, the safe API. We can see that the throughput is almost 9x lower compared to Unsafe. This is due to the extra safety that the ByteBuffer API provides, but also to a number of extra factors; there are at least two that are hindering performance here. The first is that I'm using the relative positioning scheme: I do a putInt and rely on that mutable position field, which is incremented on every access, and that slows things down a little. The second, and most important, is that every direct ByteBuffer has to be registered with a GC Cleaner, so that its off-heap memory can be deallocated once the GC can prove that the ByteBuffer is no longer referenced by anything in our application. The GC has to do a lot of work here, and that work shows up in the benchmark.

In fact, we can change the benchmark a little: first, to use the absolute putInt method; and secondly, more importantly, to use the Unsafe.invokeCleaner method, which allows us to free the memory explicitly without relying on the GC Cleaner. With that, performance rises a little: not as fast as Unsafe, but a bit better. To be fair, in this benchmark (and with ByteBuffers in general), Unsafe doesn't zero memory while ByteBuffer does zero memory. I'm not allocating a very big chunk of memory here, so zeroing is not affecting performance too much, but there is still an extra cost when using ByteBuffers.

Let's also look at GC activity. With Unsafe, the GC is basically not doing any work at all during the benchmark, which is what you would expect. But with the first ByteBuffer version, the naive usage, the GC was actually spinning for five seconds during this benchmark, which is quite a lot considering that you wanted to use off-heap memory to get rid of GC in the first place. With the third version, things get a little more under control and GC time goes back to zero, but performance is still not quite as good as it could be. The biggest problem here is that the ByteBuffer.allocateDirect invocation is quite heavy: the buffer has to be registered with a Cleaner even if we never use the Cleaner in any way, and there is quite a lot of state needed to track how much memory we are using. There is a limit on direct memory, so a couple of atomic instructions are required to check whether we are allocating too much. This is quite expensive, and it shows up in this allocation-intensive benchmark.
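To make that third variant concrete, here is roughly what it looks like: a sketch of my own, using the standard reflective trick to get hold of the Unsafe instance (sun.misc.Unsafe.invokeCleaner is available from JDK 9 onwards):

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

class DirectBufferBench {
    static void fillAndFree() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        ByteBuffer buf = ByteBuffer.allocateDirect(100);
        for (int i = 0; i < 25; i++) {
            buf.putInt(i * 4, 42); // absolute put: the mutable position is not involved
        }
        unsafe.invokeCleaner(buf); // free the off-heap memory eagerly, no GC Cleaner needed
    }
}
```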
So where does this leave the ByteBuffer API? Is it a bad API? No, it's not a bad API; it's just that here we are trying to use it in a way it was never meant to be used. Direct buffers work very well if, for example, you allocate one very big buffer and then keep sharing it; and when the cost of your I/O operations dominates every other cost, all the overheads I showed you before don't really matter. Unfortunately, though, ByteBuffers fail to scale when you consider more general use cases. You have no way to deterministically release the memory, so you are basically either relying on the GC, or using some unsafe operation; and either way, you still pay a lot up front to allocate the buffer. Then there is the two-gigabyte limit, which is starting to hurt, especially now that we have support for mapping persistent-memory files: persistent memory can easily be bigger than two gigabytes, and we have no way to address it using the ByteBuffer API, because all the indices we can specify are ints. And then there are limitations in the expressiveness of this API when it comes to accessing memory: you can only choose between sequential access, essentially one element at a time, and an absolute addressing scheme where I have to pass the offset every time. There is no support for structured access: if I have a struct in memory, there is no way for me to say "I want to access that particular field"; I have to compute offsets manually in order to get to that location in memory.

So we think that, rather than investing more in the ByteBuffer API, the time has come to build a new memory API from the ground up. Of course, this new API is interoperable with ByteBuffer, so you don't have to throw away all your code. But as I was discussing with Paul last week, we think ByteBuffers have reached their functional capacity: some of these limitations, such as the two-gigabyte limit or deterministic deallocation, are very hard to fix in the current ByteBuffer API. It would require a pretty big redesign of the entire API, which is probably not going to be very compatible, so it's probably better to start from scratch and design a new API. And this is what happens when ByteBuffer fails to meet expectations: Netty is a big client of ByteBuffers, it allocates a lot of them, and starting from version 4 they rolled their own version of ByteBuffer, called ByteBuf (no pun intended). It is based on a different allocation scheme: they have specialized allocators which use memory pools, essentially a jemalloc implementation written in Java, and with these they were able to get a lot more scalability out of their buffer infrastructure. This is unfortunately something we cannot support in Java today, so people have to reach for different abstractions. We'd like that code to be able to come back to the JDK eventually.

So, enter the Memory Access API. It's a new API, and it's a safe API. The goal here, and we will see this later in more detail, is absolutely no VM crashes: you should never get a VM crash while trying to access off-heap memory using this API. It is as safe as ByteBuffer. There are three key abstractions. The first is called MemorySegment, which is just a region of memory, contiguous bytes stored somewhere; they can be on-heap or off-heap, and the API is actually neutral as to where the bytes are stored. Then we have MemoryAddress, which is essentially an offset into a segment: you can think of it as a long that points to some location inside the segment. And then we have MemoryLayout, an optional description of the contents of memory: you can decide to use layouts or you can just ignore them, but we will see the advantages of attaching a memory layout to a segment.

If you look at the javadoc of this API, you will find no method called getInt or putInt, nothing of the kind, and we received some questions about this when the request for review went out. It is not an omission because we forgot about them: it's because there are plenty of ways to get data out of a memory segment. You can, for example, take a memory segment, map it into a ByteBuffer, and then use the good old ByteBuffer API to get your ints and floats and longs, never looking at segments ever again. Or, if you want to reach a lower level and go down the VarHandle rabbit hole, you can create VarHandles that are able to dereference memory using memory addresses; this is actually a good option, which we'll explore in the final part of this talk. The main idea here is that we don't need to reinvent the wheel: we don't need a lot of accessors for raw memory; we can just leverage the good APIs we already have.
This is what a segment looks like. Let's imagine we have an array of point structs, where a point is two int coordinates; a very simple thing. We can picture this array flattened in memory, so that all the coordinates are consecutive: x0, y0, x1, y1, and so on up to x4, y4. This segment will have natural spatial bounds. It starts from a base address, which points to x0, and it has a limit address, which is the maximum address associated with the segment; actually, it is the address of the first byte that is outside the segment. As long as an access occurs within the segment, everything is fine. If I have an address, I can add an offset to it and obtain a new address: for example, if I add sixteen to the base address, I obtain a new address that, instead of pointing to x0, points to x2. And if I have a segment, I can also slice it; this is similar to the slice operation that ByteBuffer also provides. I specify a new start address and a new length, and I get a subsegment which is contained in the original segment. Nothing too fancy here. The main thing to notice, maybe, is that this API is immutable: none of the bounds you see here can be mutated, and whenever you add an offset to an address, you actually create a new instance with the new offset. Nothing mutates in place, which will hopefully enable better JIT optimizations in the future.
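In code, addresses and slices look roughly like this. This is a sketch against the Java 14 incubator API (jdk.incubator.foreign); exact names may differ in later releases:

```java
import jdk.incubator.foreign.MemoryAddress;
import jdk.incubator.foreign.MemorySegment;

// 5 points x 2 ints x 4 bytes = 40 bytes
try (MemorySegment segment = MemorySegment.allocateNative(40)) {
    MemoryAddress base = segment.baseAddress();   // points at x0
    MemoryAddress x2   = base.addOffset(16);      // 16 bytes further: points at x2
    MemorySegment tail = segment.asSlice(16, 24); // subsegment covering the last three points
    // every dereference through these is bounds-checked against the owning segment
}
```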
The big goal of this API, the big bet, is deterministic deallocation: whenever you are sure that your memory is no longer going to be used, you should be able to explicitly free it. The way this is done is that you essentially have a segment, you use it, and when you're done, you close the segment. Of course, with power comes responsibility: if you forget to close your segment and the segment goes out of scope, you have a memory leak, because now there is some off-heap memory that is never cleaned up. To help with that, MemorySegment implements the AutoCloseable interface, so you can use a memory segment with a try-with-resources construct; hopefully that will reduce the occasions on which these leaks occur. Another thing we could do to improve on this is something similar to what Netty has done: add a debugging mode where we do register a Cleaner, and use it to keep track of segments that go out of scope without the close method ever being called.

So how do you work with segments? As I said, you don't need to do an awful lot. You can just allocate a segment of the right size. Here, if we want to allocate a segment big enough to contain our array of point structs, we have to do a little computation: there are four bytes for each int, two ints for each point, and five points in the array. Then I can just derive a ByteBuffer from the segment and pretend the segment doesn't even exist: I use the ByteBuffer API to put the x and y coordinates in a loop. At the end of the try-with-resources, when the segment is closed, a close operation happens and the memory associated with the segment is actually released. Do we gain anything from this round trip between memory segments and ByteBuffers? Actually, quite a bit, because we got rid of that expensive allocateDirect operation: the ByteBuffer we are creating now is just a view over the segment's memory, which is a much cheaper operation, and we also get deterministic deallocation at the end, so we no longer rely on the garbage collector to go in and free the memory. We decide when the memory is freed.
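Here is what that looks like, sketched against the Java 14 incubator API:

```java
import jdk.incubator.foreign.MemorySegment;
import java.nio.ByteBuffer;

// 4 bytes per int, 2 ints per point, 5 points
try (MemorySegment segment = MemorySegment.allocateNative(4 * 2 * 5)) {
    ByteBuffer bb = segment.asByteBuffer(); // a cheap view: no allocateDirect involved
    for (int i = 0; i < 5; i++) {
        bb.putInt(i); // x coordinate
        bb.putInt(i); // y coordinate
    }
} // close() releases the off-heap memory, deterministically
```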
If you write this code, you get the same performance as the last benchmark I showed you, the one that was using an unsafe method to clean up the memory. Actually, this should go even faster, because with the memory segment view you pay a lot less for the initial allocation of memory.

There are problems with this code, though, I'm not going to lie. We had to compute the size of the memory we want to allocate manually, and there are offset constants spread all over the code. This is very fragile: if I change the coordinates from ints to longs, for example, because maybe I'm on a 64-bit machine, this example is no longer going to work. So how can we make this code a little more robust? Our idea was to introduce an abstraction called a memory layout. The goal of this abstraction is to replace that comment shown on the previous slides, the thing at the top, with an actual object creation: you create an object which specifies the layout of this array of structs. The advantage of doing that is that, once you have an object, you can derive all sorts of important information out of it. How big is the layout? What is the alignment of some of the components in the layout? And since layouts compose, you can nest layouts inside other layouts, and you can use layout paths to ask trickier questions, such as: what is the offset of the field y inside a point? Normally that would be computed by hand; now you can actually ask the API for it, and you can imagine that, when working with more complex structs, this will be genuinely useful. The big bet here is that with more declarative code there will be fewer places for bugs to hide.

So this is how we model our point struct using a layout. We have to start from the outside: we create a sequence layout, for which you have to specify a size, which is five in this case, and then you have to specify what goes inside the sequence. In this case we have a struct, so we call the struct layout factory, and then we have to specify the fields of the struct: two 32-bit values, assuming ints map to 32-bit values. I can even attach names to the fields, so that I can later perform queries on this particular layout. I've done a little simplification here: in reality, if you try this out on the Java 14 code, the factory for value layouts also takes an endianness, because of course we have to specify whether the value is big-endian or little-endian; it just didn't fit on the slide.
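Reconstructed against the Java 14 incubator API, the layout looks roughly like this:

```java
import jdk.incubator.foreign.MemoryLayout;
import jdk.incubator.foreign.SequenceLayout;
import java.nio.ByteOrder;

// 5 points, each a struct of two named 32-bit values
SequenceLayout points = MemoryLayout.ofSequence(5,
    MemoryLayout.ofStruct(
        MemoryLayout.ofValueBits(32, ByteOrder.nativeOrder()).withName("x"),
        MemoryLayout.ofValueBits(32, ByteOrder.nativeOrder()).withName("y")));
```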
So let's say we have this big layout, which represents my array, and I want to compute the offset of the y field inside a point. How can I do that? Well, there is a handy method on the layout object which is called, without too much fantasy maybe, offset. You pass it a path that enables the method to find the field you want the offset for, starting from the outer layout. Here we start from the sequence layout, so the first thing I do is choose an element of the sequence; let's say we pick element number zero, because that's the one with the lowest offset. Then, inside that element, I have to choose which of the two struct fields I want for computing the offset; in this case, I want the y field. By doing this, I have specified a path from the sequence all the way down to the y field, and now I can ask for the offset. As you can see, I've been able to obtain the offset without writing any numbers: I'm just querying the API.
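As a sketch, again using the Java 14 incubator names (later releases renamed some of these methods) and the points layout from the earlier snippet:

```java
import static jdk.incubator.foreign.MemoryLayout.PathElement.groupElement;
import static jdk.incubator.foreign.MemoryLayout.PathElement.sequenceElement;

long yOffset   = points.offset(sequenceElement(0), groupElement("y")); // 4
long pointSize = points.elementLayout().byteSize();                    // 8
```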
And that's exactly what we do in the next version of the example. We got rid of the comment at the beginning of the example we had before, and we replaced it with an actual memory layout instantiation: now we have an object that describes the layout of the thing we are going to work on. In the middle part of the slide, I use the layout to derive the constants I need, such as how big a point is, or what the offset of the y field inside the point struct is. Then I can use all this inside my loop to get rid of all the numeric constants. And most importantly, I can use the layout directly in the allocate call for the memory segment, which means I don't have to compute the size myself; I just delegate to the layout API to do the right thing. This is much easier to read; we gain quite a lot in terms of expressiveness.

There is still something I don't quite like, though. If we look inside the loop, we see that there are two calls to the ByteBuffer putInt method, and it's pretty hard, looking at this code, to tell that one call is meant to set the x field of the struct and the other is meant to set a different field. Only the offset computation gives that information away; so that's yet another place where bugs can hide. Can we do a little better? The idea we had, maybe a crazy idea, was to introduce a new breed of VarHandles called memory access VarHandles. For those of you who are not familiar with the VarHandle API: you can think of a VarHandle as java.lang.reflect.Field on steroids, in the sense that where a reflective Field gives you access to Java fields, a VarHandle gives you efficient access to Java fields but also to more kinds of variables, such as array elements or ByteBuffer elements. So it felt natural to also provide a new kind of VarHandle that is able to give you access to, for example, off-heap memory, by taking a memory address as a coordinate.

The big gains you get with this approach are two. Number one, you get all the atomic operations that the VarHandle API supports. Let's say the ByteBuffer API is not enough for you: a simple get is not good enough, you want an atomic get, or you want memory fencing because you're working with multiple threads. Then the VarHandle API is probably the best API to do that kind of thing with. And the second point is that, if you are using memory layouts, you don't have to do anything particularly fancy in order to get these VarHandles: you can just ask the layout API to give you the right handle for accessing a field, and you will basically get it. If you want to see how these VarHandles work under the hood, there is a factory inside MemoryHandles that allows you to construct them by hand. Typically you won't have to do that because, as I said before, you will derive the handles from the layout; but let's go through the process and create the handle bit by bit anyway.

When you create a memory access VarHandle, the first thing you have to specify is a carrier type: you have to tell the handle which Java type you want to come out of, for example, a get operation. In this case we want to read the values as ints, because the values are four bytes, so we pass int.class to the factory, and we get back a VarHandle that, if I give it the base address of the segment, gives me back the value of the x0 coordinate. Can I do more? Yes, of course. If I want, for example, to read y0 rather than x0, I can combine my previous handle with an offset: I essentially take the address that comes in, attach an extra offset to it, and read at the resulting address. So if I pass the base address, I get y0 out. But can I do something fancier, like accessing all the y coordinates in the array? It turns out I can: I can construct a strided VarHandle, where the stride, in this case, is the size of the point. I get back a handle that takes an extra coordinate: not just a memory address, but also a long, a logical index, which says which point I want to get the y from. So if I do a get with an index of zero, I get y0; if I specify one as the index, I get y1, and so forth.

Now, constructing VarHandles like that may be a little painful, so we integrated this VarHandle machinery with the memory layout API. As you can see in the middle of this slide, we can actually derive the handles for accessing x and y with two simple calls to the layout API. There is a varHandle method: you specify a carrier type, then you specify a path down to the element you want to access, and so in two lines you can construct a handle for the x element and a handle for the y element. And now, inside the loop, you can see that I'm using the x handle for setting the x elements and the y handle for setting the y elements. The code is more explicit; and if I change anything in the layout, everything just flows, so I won't need to update the code at the bottom ever again.
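Sketched against the Java 14 incubator API, both the hand-rolled handles and the layout-derived ones look roughly like this, reusing the points layout from before:

```java
import jdk.incubator.foreign.MemoryAddress;
import jdk.incubator.foreign.MemoryHandles;
import jdk.incubator.foreign.MemorySegment;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import static jdk.incubator.foreign.MemoryLayout.PathElement.groupElement;
import static jdk.incubator.foreign.MemoryLayout.PathElement.sequenceElement;

// Hand-rolled: an int handle, shifted by 4 bytes to reach y0, then strided by 8 bytes (one point).
VarHandle intHandle = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());
VarHandle y0Handle  = MemoryHandles.withOffset(intHandle, 4);
VarHandle yHandle   = MemoryHandles.withStride(y0Handle, 8); // takes (address, index)

// Layout-derived: the same handles, with no numbers in sight.
VarHandle xElem = points.varHandle(int.class, sequenceElement(), groupElement("x"));
VarHandle yElem = points.varHandle(int.class, sequenceElement(), groupElement("y"));

try (MemorySegment segment = MemorySegment.allocateNative(points)) {
    MemoryAddress base = segment.baseAddress();
    for (long i = 0; i < 5; i++) {
        xElem.set(base, i, (int) i); // explicitly the x field of point i
        yElem.set(base, i, (int) i); // explicitly the y field of point i
    }
}
```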
So let's switch gears a little and talk about safety. As I said at the beginning, this is a safe API, and one of the main goals is to avoid any kind of VM crash. It is beyond the scope of this API to prevent silly user mistakes, such as writing an int and then reading the four-byte value back as a float; that's not something we want to protect you against. But there are a couple of conditions we do want to protect you against: accessing memory out of bounds, which, if the memory is off-heap, can result in a crash; and accessing memory after that memory has already been freed. The second problem in particular is a nasty one, especially when you consider multiple threads accessing memory at the same time, because you can have one thread doing the access while another thread does the release. So how do we make this safe? It's actually pretty tricky. You could lock everything, but that basically kills performance.

Instead, what we decided to do was, by default, to enforce a strong confinement model: whenever you create a segment, the segment is confined to the particular thread that created it, so only that thread has access to the memory associated with the segment. Any other thread that wants to join in can, but it has to perform an explicit operation called acquire. With an acquire call, we create a view of the segment that is specific to that second thread, and you can only close the original segment after all the acquired views are gone. So we still have deterministic deallocation, even in the presence of multiple threads; but if you are working with multiple threads, you have to be explicit, and always acquire first.
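A sketch of how that plays out, assuming the Java 14 incubator semantics (where closing an acquired view releases the view rather than the memory):

```java
import jdk.incubator.foreign.MemoryAddress;
import jdk.incubator.foreign.MemorySegment;

MemorySegment segment = MemorySegment.allocateNative(100); // confined to the creating thread

Thread worker = new Thread(() -> {
    MemorySegment view = segment.acquire(); // explicit acquire: a view for this thread
    MemoryAddress base = view.baseAddress();
    // ... access memory through 'base' from the worker thread ...
    view.close();                           // releases the view, not the memory
});
worker.start();
// ... later, once the worker is done and its view is closed ...
// segment.close() succeeds only after all acquired views have been released
```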
So how does this translate in terms of performance? Remember that the best result we could squeeze out of the ByteBuffer API required cheating a little by using Unsafe. These are the numbers coming out of the memory segment API today: they are still not as good as the Unsafe numbers, but they are a little better than the ByteBuffer ones. The main contributor to this, I think, is the fact that allocation got a lot less expensive compared to the ByteBuffer allocateDirect method. But there are other factors too: all the bounds are final variables, so C2 can hoist them almost at will. There is still a bit of a gap, and we have to keep in mind that we are also zeroing memory in the memory segment API, so we can never go quite as fast as Unsafe here.

But we are also trying to look ahead a little. We don't want to give you something that is just a bit better than ByteBuffer; we want to give you something that is actually more scalable, as I was mentioning at the beginning. So we are working on a different allocator. Jim Laskey is doing a bunch of work on this new allocator, and the numbers so far, just to tease you a little, are pretty impressive: in my experiments, plugging this allocator into the memory segment API, I was able to get performance close to Unsafe, even though we are still zeroing memory. This allocator does a lot of the tricks that the Netty allocator presumably does, but there are a couple of tricks that I think are new. For example, instead of committing memory eagerly, we reserve a big chunk of memory, like four gigabytes, and then we only commit when a client requests memory, and that helps performance quite a bit. But the big saving is that we don't need a native call to do a malloc each time we allocate a new segment, or each time we have to free one, because the allocator is able to recycle the memory segments that are handed out and released. So this is where the future of this API is heading, and I think if we deliver something like this, maybe, or at least hopefully, some of the alternative APIs to ByteBuffer will disappear over time, or people who are going to write new APIs will decide to stay on the JDK API.

How hard was it to get here? Well, I'm going to be honest, it was a little hard. Andrew did some benchmarks of these APIs and found some issues; we fixed some of them in Java 14. For example, the C2 support was very conservative with respect to memory barriers: every time there was an unsafe access, it would immediately add barriers around the call. Now C2 behaves a little better and removes the barriers when the access is provably off-heap; and this also improved the performance of the existing ByteBuffer API, so that's actually a good result. Thread confinement checks were also not being hoisted very well, mostly because the current thread was not being perceived as a constant by C2. We did some work to fix that, and now performance is better, although we had to disable this optimization on the Loom branch, because it creates all sorts of havoc with fibers.

But we are not done. There are still issues around escape analysis. This API, as I said, is immutable: every time you call baseAddress you create a new instance, and every time you do addOffset on an address you create a new instance. Sometimes these instances get in the way and perturb C2's optimizations, in the sense that C2 is not always able to see through some of the allocations. There is also another problem, and this is probably the main one: this API accepts longs as indices. That is good, because it gives the API more room to grow, but at the same time we are running into some bottlenecks with C2, because C2 is optimized to remove bounds checks in loops that work on ints. As soon as you step out of ints, you are in trouble: bounds check elimination no longer works, loops are no longer unrolled, you don't get vectorization or any of that. Right now we are using some tricks to try to get the right code generated, but we think the right approach, longer term, is to fix this performance gap in C2 itself: at the very least, teach C2 to check whether a segment is bigger than two gigabytes or not, and if it is not, let it revert to the int logic and the optimizations we already have. In other words, there's more work to be done.

So, to sum up: I think the Memory Access API is a great alternative to the ByteBuffer API, or a great complement. It is a fully immutable API, so it should lend itself, over time, to better JIT optimizations. There is deterministic deallocation, which you didn't have with ByteBuffer, and that makes quite a difference in terms of GC load. The addressing scheme is not limited to two gigabytes, which also makes a difference if you are using persistent memory or things like that. And there is real support for structured access, with the VarHandles and the memory layouts. So I think it's a very compelling alternative to ByteBuffer. And if what you want is Unsafe-style access to off-heap memory, the Memory Access API is safe; it's a safer ByteBuffer, and a good safe replacement for the Unsafe API. There are spatial and temporal checks on every access, and there is a robust ownership model which allows you to remain safe even when working with multiple threads, while still retaining deterministic deallocation.

So where does all this fit in the bigger Panama picture? I'm not going to talk about JNI, but I just want to give you a taste of where it all fits together. As Mark showed earlier, in Panama we want to give you tools so that you can start from a header file, do some work, and derive a set of Java bindings. Initially we thought these bindings would be interfaces with some annotations, with a runtime component that reads the annotations and generates some code on the fly. We actually realized that there was no need for that. We only need two pieces. One is the memory access API piece, which gives you a bunch of VarHandles for accessing memory; for example, struct fields at particular offsets, and things like that. The second piece is our foreign function API, which we are probably going to deliver in 15, also as an incubating API, and which allows you to map foreign functions as method handles. On top of these two pieces, you are able to create a very low-level set of bindings, and then you can also build on top of those: if you want, you can add plugins to the basic tool we will provide and generate higher-level bindings. But the low-level bindings, as Mark showed before, are not so bad; we are generating static wrappers, so they are relatively usable.

As I said at the beginning of this talk, this API is actually available in Java 14. So my recommendation is to try it out and report back: performance problems, usability issues, or whatever you can find. The next steps, of course, are to round off the performance work, because we know we have to do better here, and to finish the work on the allocator, because we think there's a lot of room for improvement there. We also have to polish and finalize the API: right now it's an incubating API, and there are probably methods that need to be polished or renamed or whatever. And then we have to integrate this API into the overarching Panama story. You can follow the progress on panama-dev; or, if you are more familiar with core-libs, you can also report issues there. We'll be looking at both. So, thank you.