Video details

"Streaming video analysis in Rust using Pravega" by Tom Kaitchuck


How does Rust hold up when developing complex real world applications?
This talk is a case study based on building a scalable real time video analysis pipeline in Rust using:
* GStreamer - a real time video processing framework
* Pravega - a streaming data storage system
This talk covers:
* The challenges involved in creating a scalable video processing pipeline
* How and why Pravega, GStreamer, and Rust fit together
* How to take advantage of the language features of Rust to manage complexity
* Patterns we learned during development
Tom Kaitchuck Dell
Tom Kaitchuck is one of the original developers who started the Pravega project and is currently a core contributor. Tom has been using Rust since 2015 and is the author of the popular aHash crate. He holds a degree in Computer Science from Valparaiso University. Currently working at Dell, Tom previously held software engineering positions at Google and Amazon. His interests include: Concurrency models, Consistency algorithms, Distributed system design, and Effects of language design.
Recorded at Strange Loop 2021


My name is Tom Kaitchuck. I'm a distinguished engineer at Dell, and today I'm going to be talking about streaming video analysis in Rust using GStreamer and Pravega. Pravega is an open source project that I've been part of for a long time, and this is our experience using Rust. This isn't an introduction-to-Rust talk; it's just us sharing our experience and hopefully providing some lessons that will be useful to all of you. So let me define the problem we were trying to solve on this project. Our goal was to build a streaming video service that could be deployed to a data center or to a customer, with many cameras feeding into it, that could do analysis on those videos in real time: things like object detection and seeking, all scaling up to large numbers of video streams with good performance. As an example of what that looks like, here's Nvidia's DeepStream running on a live video stream. This is a generic object detector, and it is detecting that there are cars on this highway. It's a very basic application, but you can imagine much more sophisticated ones being built on top of an architecture that supports this. To do this, we relied on an application called GStreamer, which has been around for a very long time. It's a pipeline-based framework for chaining together different processing elements related to video and audio. It was originally, and most commonly, used for client-side processing work, but it has since been generalized and made pluggable, and there are over 2,000 plugins for it. So this is an ideal technology to build upon. There have been several existing approaches that people have tried, the most obvious of which is a batch architecture: you write one day's worth of video data into a file in S3, and then you perform analysis on it through some batch job or something like that. It's a very common paradigm.
The obvious trade-offs of that are latency and dealing with recovery in the case that the device doing the writing dies midway through. The traditional solution has been to move to smaller-sized batching. This is the natural transition people made from Hadoop to things like Spark, where you go from a large batch of a day to a small batch of a minute. This still has a bit of latency, which may not be an issue for many applications; being within one minute of real time is probably acceptable for a lot of them. But failure recovery is still complicated, especially if you're dealing with things moving through a pipeline and arriving potentially out of order. And if you think about it, it also messes with your data format. One thing we actually saw customers doing was encoding video by grabbing every frame, re-encoding it as a JPEG, sending the JPEGs through Kafka, extracting them on the other side with a time window, reordering them into the appropriate order, and then re-encoding them as video. This is an awful lot of code, it destroys your compression ratio, it ruins what would otherwise be a nice format, and it dramatically escalates storage costs, because Kafka is obviously going to be storing the data on what need to be high-performance drives, and we'd like to be storing months, if not years, worth of data. So we want something that does better than this: something that is truly a streaming model from the beginning, that treats recent and historical data as contiguous, so you can move between them just by reading. To do that, we really want to lean on an object store as our long-term persistence. We don't want to try to scale up local storage on all these nodes because, well, cloud storage is cheaper, and if you're mainly worried about historical data, that's what you want. But doing that through the same API is a little bit tricky.
Similarly, we want easy recovery: just have a point in time, "oh, you failed at this point, you can resume from that point." So the notion of atomic writes is very important, as is being able to declare an end to data, like "I only want to retain data for one month," and have that data disappear after one month without screwing up your format. If you just put a retention policy on S3 and say "throw out all my data after one month," well, your format had better support that, and whatever you're using for indexing had better be aware of it. So we want something that automates all of this. Fortunately, that already exists, and it's called Pravega. That is exactly the use case Pravega is built for, and it does all of these things very well. The basic architecture is as follows: you write data in, it goes to Pravega, and read access goes through Pravega, whether it be real-time or batch. In the background, Pravega interfaces with your long-term storage, whatever that may be (HDFS, a cloud store, or whatever), and intermediates those calls while providing a streaming interface, real-time access, atomic writes, and all that good stuff. So you just operate on the abstraction of the stream. The other advantage Pravega brings to the table is scalability. I need to explain this graph a bit. This is a test we did just to demonstrate scalability in terms of the number of writers. Here the total throughput is 2 GB/second, which is very modest and something that all of Pravega, Kafka, and Pulsar can easily handle from a single source. What we did is scale up the number of sources while holding the total data rate the same. So with ten writers, each of them is writing one tenth as much data, and with 100 writers, one one-hundredth as much data. This just demonstrates what happens when you scale up the number of sources independent of the actual data rate.
And as you can see, Pulsar literally collapses, and Kafka is barely any better: it doesn't hold up to a large number of concurrent writers. Pravega, by contrast, only cares about your total data rate; if it can support that rate, it can support it regardless of the number of writers. So an end-to-end pipeline looks something like this: you have a camera producing data, which is ingested into a video stream. Then you run some analysis algorithm on it; in this case we're running an Nvidia GPU-based object detector that outputs JSON. So we have a modified stream with bounding boxes, plus additional metadata that can be pulled off to the side for analysis with, say, a Flink or Spark application. Here is a demo of that... assuming this plays... maybe it's unable to play the video. Well, great; the whole point of prerecording this demo was to avoid exactly that. Anyway, if it had played, it would have been a demo of this working through a web UI with multiple video streams: jumping back and forth between them, fast-forwarding and rewinding, showing the object detector working and labeling things, and then piping that into a Flink job that lets you run queries against it with a SQL-like syntax to produce graphs showing how many cars go by on the highway. So we wrote this in Rust, and that's an interesting decision, because why would we do that? There just weren't a lot of Rust users, and none of our customers were asking for Rust; it wasn't a priority. But we still think this made sense, or at least I do. I think this comes down to something from a talk a while back by Bryan Cantrill, where he talks about the idea of platform values: the idea that software and languages have certain values that they bring with them, and that people who use that software are going to be attracted to those values, build on them, and get stronger in those dimensions. Not every piece of software is going to have all of these values.
These are just the ones he identified, and not every piece of software is going to have all of them, because when push comes to shove, you can't prioritize everything. Certain ones are higher priority for different types of applications and frameworks, and naturally those applications and frameworks are going to prioritize what's important to them. Of these, he identified ten that he considered Rust's values, but I would group them into three personas. There's the one we always hear about: performance, safety, and security. But I think there's also a second, interoperability, integrity, and robustness, and a third, composability, extensibility, expressiveness, and rigor, and you can visualize each of those differently. The usual approach everyone talks about, and the one that gets the most attention, is performance plus safety and security. The reason I think it gets so much attention is that it's very hard to achieve in a lot of other languages. This is a graphic from the Google Android team, where they talk about where security exploits come from: unsafe code that's not sandboxed, running on untrustworthy input. But sometimes you can't change the fact that your input is untrustworthy, and you don't want a sandbox for performance reasons. That leaves you in a tight spot, where for those same performance reasons you don't want to be using C, but there aren't a lot of other options with extremely high performance if you really care about getting every last ounce of power out of your CPU. And this story has been repeated many times: for example, Amazon's Firecracker, the KVM-based virtual machine monitor that underpins AWS Lambda, and Google's virtual networking infrastructure that they run in their cloud to provide each user with their own virtual network.
Facebook's cryptocurrency Diem. rustls, a clean-room TLS implementation; similarly trust-dns, a full DNS stack; Tor, the anonymity network; and 1Password, which has its stack implemented in Rust for this reason. Servo was a classic one: a CSS renderer that they wanted to be higher performance and multithreaded, but couldn't do in C++, because the security risk of running something as complex as CSS rendering in C was a problem. And since they did that, Firefox has also shipped its CSS renderer written in Rust. TiKV is very much the same story, and Wasmer is another interesting one that mirrors the CSS story: you want to execute WebAssembly, so you have a full-blown language that you want to containerize, but you don't want a very high performance cost, because the cost is paid on every single CPU instruction. Those are all great use cases, and if you find yourself in that boat, by all means, Rust is an excellent choice. But that just wasn't our problem. Our problem was much more basic. We didn't have a performance issue, because we were going to be network-bound anyway. We didn't have an extreme security concern, because we're dealing with trusted code on both sides of the wire. Our problem was that we were dealing with a stateful client, and to understand why, you need to think about recovery. Imagine a client connects to a server and sends data. The server can track where the client is, and when the connection goes away, it can tell the client "you left off after that point," and the client can resume and resend the same data. But this means the client code can't be codegen'd; it can't just be automatically generated from a template or something.
And as soon as you say you're going to have a client with state that you're going to write by hand, it makes sense to start offloading a lot of other things to the client rather than putting them on the server, because you can do things more effectively and efficiently, and, by the way, your CPU usage isn't centralized. So what we ended up with was a client design that looks like this: we implement a core client library once, and then we write bindings for each of multiple different languages that connect to it, so that each language interacts with that same client library and we don't have to write the client logic over and over again. This is basically FFI (foreign function interfaces), and there a different set of values matters, the second of Rust's personas: interoperability, integrity, and robustness. FFI can be fairly complicated. You need to think about things like calling conventions and alignment. Is the memory garbage collected in one or both languages? Can the objects that are returned contain references to each other? When is the data freed? Is there a particular allocator that's used? What about error handling? Are there restrictions on threading? For example, Python has a global interpreter lock that you need to hold before you execute any Python code, and there are OS-specific threading requirements in languages like Go. And character encoding: different languages often have very opinionated views on what a string looks like, and those views differ from one language to another, so aligning all of this on a cross-language basis is not as straightforward as it seems. I think this is an area where Rust really shines, and to understand why, it's important to look at a method signature. If you look at C here, we have a fairly generic method. It's just write, and it's writing some bytes. But what's going on here?
We have two parameters that must agree with one another, and if they don't, it's a bug. So that's already one sort of oddness. Then we're encoding an error as a number in the return value. And memory management, of course, is just through documentation, because you don't know whether the caller or the callee is supposed to free that buffer. Go improves on this a bit: the length is included in the slice parameter, so those are unified, and there's a separate error value, so it's not encoded into the normal return value, which is useful. And memory management is done by garbage collection, which is certainly better than relying on documentation and human effort, but can be difficult to work with on a cross-language basis; if you've ever tried to get the Go and Python garbage collectors to cooperate when handing objects back and forth, you'll understand why. Compare this to Rust. Rust has errors that can't be ignored by accident, unlike the Go code. It has compile-time memory management, which is an interesting approach: it's not garbage collected, but it's also not manual in the way that C is. And it has very explicit contracts. For example, here, the first parameter dictates that this object is safe to pass between threads and can be invoked concurrently from multiple threads. And it's very explicit about the semantics of the buffer: that it won't be modified, that neither it nor any pointers into it will be retained after the call returns, and that it is the responsibility of the caller to free it. This is amazing, and exactly what you want when you're dealing with cross-language function calls, because you can put the specifications you want into the method signature very explicitly and have them enforced. In fact, this is so ideal that people have automated it.
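To make that concrete, here is a minimal sketch of a write signature whose contracts live in the types. This is a hypothetical illustration, not the actual Pravega client API; the `ByteWriter` and `CountingWriter` names are invented for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical sketch, not the real Pravega client API.
// `&self` plus the `Sync` bound says a writer may be shared across
// threads and called concurrently. `&[u8]` says the buffer is borrowed
// read-only for the duration of the call, nothing from it is retained
// afterwards, and the caller keeps ownership of it. The Result cannot
// be silently dropped without a compiler warning.
pub trait ByteWriter: Sync {
    fn write(&self, buf: &[u8]) -> Result<usize, String>;
}

// A toy implementation that just counts the bytes "written".
pub struct CountingWriter {
    pub written: AtomicUsize,
}

impl ByteWriter for CountingWriter {
    fn write(&self, buf: &[u8]) -> Result<usize, String> {
        self.written.fetch_add(buf.len(), Ordering::SeqCst);
        Ok(buf.len())
    }
}
```

All of the contracts described above are visible in the one-line signature, which is exactly what the bindings for other languages need to rely on.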
Three such frameworks: PyO3, which automates calling from Python; Neon, which does the same for Node.js; and Helix, which does the same for Ruby. As an example of what this looks like in your code: if you want to call a function from Python, you annotate the function with #[pyfunction] and import PyO3, and that function becomes callable from Python with no extra work on your behalf. So it really is a dramatic improvement in the ability to operate on a cross-language basis, and it is made possible by the explicitness of the function signatures in the language. So what lessons did we learn while doing all of this? The first, which I think is very interesting, concerns ownership. Ownership is a central concept in Rust that doesn't appear explicitly in any other language, but I would argue that it is present, implicitly, in most other languages. Here I've written some Java on the slide, with two alternative processFoos methods that process Foos in different ways. The first step in both of them is to sort the list of Foos, and this is done in two different ways: in one case, they're put into a TreeSet, which is a sorted collection, so adding them to it implicitly sorts them; in the other, Collections.sort, a standard library function, is called on the list itself. Now, these differ in ownership, and the concept exists in Java because they are semantically different in the following way. If, inside the first function, you were to grab an element from the sorted set and modify it in such a way that its comparison order with respect to other elements changed, that would violate the invariants of the TreeSet. It would break the data structure's ability to function.
It would just cause assertion errors or any number of other problems inside the TreeSet, and you wouldn't be able to predict accurately what it's going to do, because there's an implicit contract on TreeSet: when you pass items into it, the collection now owns them. Similarly, in the processFoos example, Collections.sort does not take ownership of the data passed into it; the assumption is that the caller retains ownership. So if the implementation of Collections.sort were modified such that it, for example, handed the list off to another thread and called sort concurrently in the background, that would break the caller's invariants. In both cases, any change that goes against the implicit assumptions about ownership in the function signature results in an error on one side or the other. So this concept really does permeate other languages. The next interesting thing we encountered was that Rust just doesn't need the visitor pattern, which is all over the place in our Java code. To understand why, consider doing request-reply. It seems very straightforward, but you end up with a bunch of different reply types behind one interface, and you say, okay, now I need to process these, but your processing code is almost certainly type-specific. So you end up writing something like this, where you have a ReplyProcessor, and for each of your possible replies you have a method that is a one-liner that just calls an overloaded method on the object that was passed in. This is definitely not ideal, but it is what you sort of live with in Java and C#. Rust has something that is perhaps boring to everyone in the functional programming crowd, but absolutely amazing for somebody coming from Java: it has enums where the individual variants have member variables that are specific to them, just like an object. And this is just amazingly better, because you can replace the Hello implementation up on the upper right with a Hello reply variant, and any fields it has can simply be fields in the enum. This allows you to replace that mess with something much more straightforward: just a match statement. The other thing I want to throw out there, not because it's Rust-specific but because it's just such a great pattern and I really want to see everyone do it everywhere, is the notion of using the type system to enforce invariants. Here's a great example: instead of having a User with a boolean on it for whether or not the user is authenticated, we really want authentication to be something that's validated. A good way to do that is to embed it into the type system by separating a User and an AuthenticatedUser into separate types, and then using visibility rules so that the one and only way to create an AuthenticatedUser is to call authenticate, passing in a User. Another example of the exact same thing would be to do it with a Request and an AuthorizedRequest. This pattern is a great way to enforce a happens-before relationship that you can put into a method signature, because you can say: this method takes an AuthorizedRequest. Rust definitely goes beyond this, though, and you can use the same sort of pattern to leverage the type system to enforce additional invariants. Here's a good example: we have a connection pool, and it has a get_connection method, unsurprisingly, but the Connection object it returns has an interesting parameter at the end.
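Since the slide isn't reproduced here, this is a minimal sketch of the kind of signature being described; the names are hypothetical, not the real Pravega API.

```rust
// Hypothetical sketch of a pool whose connections borrow from it.
pub struct ConnectionPool {
    addr: String,
}

// The lifetime parameter ties every Connection to the pool it came
// from: the borrow checker rejects any program in which a Connection
// outlives (or is used after dropping) its ConnectionPool.
pub struct Connection<'pool> {
    pool: &'pool ConnectionPool,
}

impl ConnectionPool {
    pub fn new(addr: &str) -> Self {
        ConnectionPool { addr: addr.to_string() }
    }

    pub fn get_connection(&self) -> Connection<'_> {
        Connection { pool: self }
    }
}

impl<'pool> Connection<'pool> {
    pub fn remote_addr(&self) -> &str {
        &self.pool.addr
    }
}
```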
The syntax is a little obscure, but what it's saying is that this Connection cannot outlive the pool it came from: by definition, the connection must be closed before the pool is closed. You can't close the pool first and leave a connection from it alive, which would then error or do whatever would happen otherwise. So that whole runtime consideration never even has to be brought up, because the situation is ruled out at compile time. Another way this can be done is with &mut. The classic thing &mut is used for is exclusivity, but we found it very effective as a way to impose ordering semantics. Here's an example where a Writer object has a write method that takes &mut self. When that write method is invoked, it is, by definition, ordered with respect to all other calls to that write method. So when we want to impose ordering semantics on, say, writing to a stream, we can say the writes go onto the stream in the order of the write invocations, which is well defined because the method signature mandates it. So this can really be leveraged. One of the classic problems we faced exists in basically any application that manages connections and multiple concurrent requests: you want multiple requests going out over the wire and multiple replies coming back, so you end up with some state about what's going on that is being accessed from both sides, from below by replies coming back from the server and from above by data coming down from the application. This gets very messy very quickly. It's hard enough to deal with that people have built patterns around it.
The classic example is actor frameworks, where you have a reactor pattern with a queue in front of it: an actor runs in an event loop, takes a state and an update, applies the update, and does the transition. That's a much easier model to deal with, except you have to buy into a framework and code to its specification, and it has a way of intruding on your code, in addition to introducing runtime overhead. What we discovered we could do with Rust, which was sort of remarkable, is what I would call the async reactor pattern, where you replace the queue with an async rendezvous channel. A rendezvous channel is a channel with essentially zero capacity, where you hand off directly from one task to another. Now, with async/await in Rust, futures aren't reified the way a CompletableFuture in Java or a Task in C# is; they're driven in the background by a polling mechanism. So when you await on a future, it's as though the actual physical thread is being switched to another task. An async rendezvous channel therefore does almost nothing; it's essentially a stack switch behind the scenes. It looks like you're working with two distinct threads, but there aren't two distinct threads, just two distinct code paths. If you combine that with a state that is just an enum, you can write essentially an actor framework in about twelve lines of code, and it's all just normal code, out of the box. Nothing is fancy, you don't need a framework, and there's no overhead: you just write your async block, have a loop, poll your channel, and decide what to do based on the state and the message. And this has defined ordering semantics. So you get all the niceties you would get with an actor framework, except there's no queue, so there's no overhead associated with it.
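As a rough thread-based analogue of this pattern using only the standard library (the talk's version is async; here `sync_channel(0)` serves as the rendezvous channel, and the state and event types are purely illustrative):

```rust
use std::sync::mpsc::{sync_channel, Receiver};

// Illustrative state and event types for a toy connection "actor".
enum State {
    Disconnected,
    Connected { sent: u64 },
}

enum Event {
    Connect,
    Send(String),
    Shutdown,
}

// The "actor": a plain loop that rendezvouses on the channel and
// transitions the state enum. Returns how many messages were sent.
fn run_actor(rx: Receiver<Event>) -> u64 {
    let mut state = State::Disconnected;
    for event in rx {
        state = match (state, event) {
            (State::Disconnected, Event::Connect) => State::Connected { sent: 0 },
            (State::Connected { sent }, Event::Send(_msg)) => {
                State::Connected { sent: sent + 1 }
            }
            (s, Event::Shutdown) => {
                return match s {
                    State::Connected { sent } => sent,
                    State::Disconnected => 0,
                };
            }
            (s, _) => s, // ignore events that don't apply in this state
        };
    }
    0
}
```

In the async version the second "thread" is just another code path on the same task, so even this hand-off cost disappears; the shape of the loop is the same.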
There are no state transitions or locks or anything else, and because Rust futures are zero-cost, this all compiles away into nothing. It's as though you wrote the perfectly optimal handle-every-possible-state code, except you didn't have to write it that way. So it's a very neat pattern. What we came to love, which we didn't choose Rust for but really came to appreciate later, was the third persona I outlined: composability, extensibility, expressiveness, and rigor. What that results in is a lot of published libraries and crates that are just really great. They have very clear APIs; they're very explicit about exactly what they expect, how you can interoperate with them, how you can extend them, and how you can work with them. And by virtue of having such a clear definition, they've been very carefully crafted to do what they say extremely well. In virtually every other language I've worked in, adding a dependency was horrible and avoided at all costs: "maybe I'll just write it myself, because I don't want to take on this other thing and be worried about it." That just isn't the case in Rust, and we had a great experience with our dependencies. First and foremost of these is Serde. Serde is by far the best serialization framework I've ever worked with in any language. You define your object just as you normally would, you put two annotations on it, Serialize and Deserialize, and it does the rest. It supports all of those formats out of the box and generates the code you need on demand, with no separate compilation step. Best of all, you can define your own format, which is what we did: we wrote our own format, and there was no per-object, per-format code at all, which is absolutely wonderful. You can just define what it means to write an integer to the wire, and Serde takes care of the rest.
I wish every language would emulate this, because it's such a great experience. Here are some other tools we found super useful; feel free to snapshot this. I would absolutely endorse every single one of these crates: they were really useful, and I have nothing but positive things to say about them. Clippy, rustfmt, and rustdoc are all built in, and I would strongly encourage everyone to run them as pre-submit checks, because they do exactly what you want. tracing is a great way to get context automatically inserted into your logs, snafu is good for error handling, et cetera. I can't say enough about how great Rust's libraries are. What we also really came to love in the end was the borrow checker, which is the thing everyone says you're going to hate and that you're geared up to fight with. It did something I expected, which was that we no longer had to think about thread safety. When we were writing in Java, we spent 20% to 30% of our design time thinking about how to design things to be absolutely sure they were thread-safe, that we would never have any errors, and how to structure our components so we could easily validate their correctness at code-review time. That's a huge time sink; you spend a lot of time focusing on it and thinking about it, and I was really looking forward to working in Rust, where we just would not think about it. But what I didn't anticipate was that not only would it save us that time, it would change the way we wrote code: when you no longer have to worry about that, the things you contorted your design around to make it provably correct, or to convince a human that it was correct, just aren't there anymore, because you can lean on the compiler to do that enforcement.
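As a tiny, generic illustration of leaning on the compiler for thread safety (a sketch, not code from the Pravega client): shared mutable state has to be wrapped in something like `Arc<Mutex<_>>`, and the compiler rejects any attempt to share it unsynchronized, so there is nothing left to validate at code-review time.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Sum a vector across several worker threads. If `total` were not
// wrapped in Arc<Mutex<_>>, this would not compile: the Send/Sync
// traits and the borrow checker enforce thread safety at compile time.
fn parallel_sum(values: Vec<i64>, workers: usize) -> i64 {
    if values.is_empty() {
        return 0;
    }
    let workers = workers.max(1);
    let total = Arc::new(Mutex::new(0i64));
    // Split the input into roughly equal chunks, one per worker.
    let chunk_size = (values.len() + workers - 1) / workers;
    let mut handles = Vec::new();
    for chunk in values.chunks(chunk_size) {
        let chunk: Vec<i64> = chunk.to_vec();
        let total = Arc::clone(&total);
        handles.push(thread::spawn(move || {
            let partial: i64 = chunk.iter().sum();
            *total.lock().unwrap() += partial;
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}
```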
And that allows designs you just wouldn't attempt in other languages, because you have more freedom, and more power to leverage some other aspect you're concerned about and make it better. So, with all of Rust's personas: there are three different ones, but I think at the end of the day they're all really about one thing, empowerment, and that's why I love it. Thank you. Let me see if I understand the question. The question was: how do we manage the data in a distributed way such that we're trading off chunk size versus latency? Normally, suppose you had an object store and you were writing to it naively and directly. It won't let you read back the data until you've finished writing it. So if you're writing data for an hour into a single file, then your latency is, by definition, at least an hour. Pravega doesn't have that issue. You can write to Pravega and then read the data back within single-digit milliseconds, and have a live feed directly off of Pravega. In the meantime, Pravega, in the background, batches that data up into a very large object and puts it into your object store. But just because the data hasn't arrived in your object store doesn't mean you can't read it, and it's still durable as soon as Pravega acks it, which is, again, within single-digit milliseconds. Pravega is able to do this with very fine-grained individual writes, which are guaranteed to be atomic, and as soon as they're acknowledged, the writer can forget about them. So the writer's buffer is only a couple of megabytes, and that buffer exists for the case of failures, when something goes wrong with the connection or whatever. In general, if everything is up and running fine, the end-to-end latency will be very low, because the data is going to be read by the reader before it ever hits the object store.