You can't diff massive data streams in JavaScript. Or can you? 🤔
I know, it seems impossible. JavaScript isn't a very performant language, mostly because of its single-threaded model. So if I told you we were going to diff 1.8 million votes from a fictional city election, directly in the browser, you'd probably think I'm crazy: the browser would freeze and crash, every time. Guess what? You'd be right. Here is the result with a very popular diff library:
Clearly, that's not a great time. But with the right techniques (efficient memory management, a streaming architecture, web workers, and batch updates), you can do it. Here's the proof:
As you can see, even with 1.8 million items being processed in real time, the UI stays fully responsive. Most importantly, by injecting small batches of data over time, we've turned a wait-for-it delay into a watch-it-happen experience.
Alright, now let's dive into how it's made.
Challenges
We have three challenges to solve.
- Achieving top performance.
- Handling different input formats.
- Being easy for developers to use.
Performance
The first thing is to avoid blocking the main thread at all costs. So we need asynchronous code, a worker to handle the heavy lifting, and linear complexity, O(n), which means the execution time grows proportionally to the input size. Efficient memory management is also crucial.
Efficient memory management means freeing memory as soon as some data has been processed, and sending data progressively in chunks to minimize UI updates. For example, rather than sending one million diffs in a row, we can send groups of a thousand objects. This drastically reduces the number of DOM updates.
Input format
The second challenge is handling different types of input formats. We need to be versatile: we should accept arrays of objects, JSON files, and even data streams. So before starting the diff, we need to convert everything into a single format: a readable stream.
Developer experience
Finally, we need to provide a great developer experience. The function should be easy to use and easy to customize. Basically, the user gives two lists to our function; we compare them and progressively send back the result.
The simplest way to do this is to expose an event listener with three types of events: ondata, onerror, and onfinish. Easy.
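To make the idea concrete, here is a minimal sketch of such a listener in TypeScript. DiffEmitter and DiffEvents are hypothetical names chosen for illustration, not superdiff's actual API (check the library's documentation for that). The typed event map is what gives callers autocompletion on event names and payloads.

```typescript
// Event names mirror the ondata / onerror / onfinish events described above.
// This map is a hypothetical design, not superdiff's exported type.
type DiffEvents<T> = {
  data: (chunk: T[]) => void;
  error: (message: string) => void;
  finish: () => void;
};

// A tiny typed emitter: TypeScript checks each listener against its event.
class DiffEmitter<T> {
  private listeners = new Map<keyof DiffEvents<T>, Array<(...args: any[]) => void>>();

  // Register a listener and return `this` so calls can be chained.
  on<K extends keyof DiffEvents<T>>(event: K, listener: DiffEvents<T>[K]): this {
    const list = this.listeners.get(event) ?? [];
    list.push(listener);
    this.listeners.set(event, list);
    return this;
  }

  // Call every listener registered for `event` with the given arguments.
  emit<K extends keyof DiffEvents<T>>(event: K, ...args: Parameters<DiffEvents<T>[K]>): void {
    for (const listener of this.listeners.get(event) ?? []) listener(...args);
  }
}
```

Typing the event map once means that a typo such as on("dta", ...), or a data listener expecting the wrong payload, fails at compile time.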
The ondata event receives an array of diff objects, called a chunk. Each diff should be clear, with the previous value, the current value, the index change, and a status: equal, updated, moved, or deleted.
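One possible TypeScript shape for such a diff object is sketched below. The field names are illustrative assumptions rather than superdiff's exported types, and the "added" status is also an assumption, for objects that exist only in the next list. Deep equality is approximated with JSON.stringify, a simplification that is sensitive to key order.

```typescript
// Statuses from the text, plus "added" (an assumption for next-only objects).
type DiffStatus = "added" | "deleted" | "equal" | "updated" | "moved";

// Hypothetical shape of one diff entry.
interface ItemDiff<T> {
  previousValue: T | null; // null when the object was added
  currentValue: T | null; // null when the object was deleted
  previousIndex: number | null;
  currentIndex: number | null;
  indexDiff: number | null; // currentIndex - previousIndex when both exist
  status: DiffStatus;
}

// Build the diff for two objects already matched by their reference key.
function diffMatchedPair<T>(prev: T, prevIndex: number, curr: T, currIndex: number): ItemDiff<T> {
  // Stand-in for deep equality; order of object keys matters here.
  const sameContent = JSON.stringify(prev) === JSON.stringify(curr);
  const indexDiff = currIndex - prevIndex;
  let status: DiffStatus;
  if (!sameContent) status = "updated";
  else if (indexDiff !== 0) status = "moved";
  else status = "equal";
  return {
    previousValue: prev,
    currentValue: curr,
    previousIndex: prevIndex,
    currentIndex: currIndex,
    indexDiff,
    status,
  };
}
```

In this sketch a content change wins over a position change: an object that was both edited and moved is reported as updated, and indexDiff still tells you how far it moved.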
Because I know most of you love TypeScript, we'll also get autocompletion. Cherry on the cake: users will be able to specify options to refine the output. Let's see how the algorithm works.
Algorithm walkthrough
For the sake of simplicity, the code we'll walk through comes from the streamListDiff function of the @donedeal0/superdiff library.
Let's take a look at the main function. It takes two lists to compare, a reference key shared by all objects (for example, id) used to match objects between the two lists, and some options to refine the output.
In the function's body, we can see that it returns a listener before starting the diff. The trick is to trigger the logic block asynchronously: the event loop executes all the synchronous code first, and only then starts the real work.
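The sketch below shows the same trick in isolation, assuming a hypothetical startDiff function and a deliberately trivial piece of "work" (listing the items that disappeared). The handle is returned synchronously, and setTimeout pushes the work to a later turn of the event loop, so listeners registered right after the call are already in place when the first chunk is emitted.

```typescript
// Hypothetical listener type: receives one chunk of results.
type Listener = (chunk: string[]) => void;

// Hypothetical handle returned to the caller (not superdiff's API).
interface DiffHandle {
  onData(listener: Listener): DiffHandle;
  onFinish(listener: () => void): DiffHandle;
}

function startDiff(prevList: string[], nextList: string[]): DiffHandle {
  const dataListeners: Listener[] = [];
  const finishListeners: Array<() => void> = [];
  const handle: DiffHandle = {
    onData(listener) {
      dataListeners.push(listener);
      return handle;
    },
    onFinish(listener) {
      finishListeners.push(listener);
      return handle;
    },
  };

  // Runs only after the caller's synchronous code, including any
  // onData/onFinish registrations, has completed.
  setTimeout(() => {
    // Placeholder work: items present in prevList but missing from nextList.
    const nextSet = new Set(nextList);
    const removed = prevList.filter((item) => !nextSet.has(item));
    dataListeners.forEach((listener) => listener(removed));
    finishListeners.forEach((listener) => listener());
  }, 0);

  // Returned before any work has started.
  return handle;
}
```

queueMicrotask would also work here, because microtasks only run once the current synchronous code has finished.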
Next, we convert our lists into readable streams, using a different method for each input type (array, file, etc.).
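Here is a minimal sketch of that normalization step for two input types, in-memory arrays and existing streams, assuming the global ReadableStream available in modern browsers and Node.js 18+. The toReadableStream and collect helpers are hypothetical; file and JSON inputs are left out because they need their own parsing step.

```typescript
// Inputs this sketch accepts: an in-memory array or an existing stream.
type ListInput<T> = T[] | ReadableStream<T>;

// Normalize any supported input into a ReadableStream.
function toReadableStream<T>(input: ListInput<T>): ReadableStream<T> {
  if (input instanceof ReadableStream) return input; // already a stream
  const items: T[] = input;
  let index = 0;
  return new ReadableStream<T>({
    // pull() runs whenever the consumer wants more data,
    // so array items are enqueued lazily, one at a time.
    pull(controller) {
      if (index < items.length) {
        controller.enqueue(items[index++]);
      } else {
        controller.close();
      }
    },
  });
}

// Helper to drain a stream into an array (handy for testing).
async function collect<T>(stream: ReadableStream<T>): Promise<T[]> {
  const reader = stream.getReader();
  const out: T[] = [];
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    out.push(result.value);
  }
  return out;
}
```

Because pull() is only invoked when the consumer asks for more data, the array is fed into the stream lazily rather than copied into the stream's queue all at once.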
Once we have two valid streams, we iterate over both of them concurrently thanks to Promise.all(). At each iteration, we do two things. First, we check whether the object is valid; if not, we emit an error. Second, we check whether an object with the same reference property is already present in the other list's data buffer.
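A sketch of this step, under a few assumptions: both inputs are exposed as async iterables (async generators in the test; a reader over a ReadableStream would do too), "valid" simply means "an object that has the reference property", and iterateBoth, isValidItem, and the onItem/onError callbacks are hypothetical names.

```typescript
type Side = "prev" | "next";

// Assumed validity rule: a non-null object that carries the reference property.
function isValidItem(item: unknown, referenceProperty: string): item is Record<string, unknown> {
  return typeof item === "object" && item !== null && referenceProperty in item;
}

// Consume both lists concurrently. Promise.all interleaves the two loops on the
// event loop; it does not create extra threads.
async function iterateBoth(
  prevList: AsyncIterable<unknown>,
  nextList: AsyncIterable<unknown>,
  referenceProperty: string,
  onItem: (side: Side, item: Record<string, unknown>, index: number) => void,
  onError: (message: string) => void,
): Promise<void> {
  const consume = async (side: Side, source: AsyncIterable<unknown>): Promise<void> => {
    let index = 0;
    for await (const item of source) {
      if (isValidItem(item, referenceProperty)) {
        onItem(side, item, index); // hand valid items to the buffer logic
      } else {
        onError(`Invalid item at ${side}[${index}]`); // report, then keep going
      }
      index++;
    }
  };
  await Promise.all([consume("prev", prevList), consume("next", nextList)]);
}
```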
What is a data buffer? There are two buffers, one for each list. The idea is to store unmatched objects in a hashmap so that the other list, which is parsed at the same time, can check in real time whether there is a match for its current object. This avoids doing two full iterations, with no results and high memory consumption, before starting the real diff. We don't waste time, and we stay efficient.
We use hashmaps to store unmatched objects because they accept any kind of value as keys, are very performant, and provide iterators out of the box.
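A quick illustration of the "any kind of value as keys" point: a plain object coerces keys to strings, so a numeric id 1 and a string id "1" would collide, while a JavaScript Map keeps them apart. The snippet also shows the built-in iteration that is handy when the buffers are drained at the end.

```typescript
// A plain object coerces every key to a string: 1 and "1" end up as the same key.
const asObject: Record<string, string> = {};
asObject[1] = "number id";
asObject["1"] = "string id"; // overwrites the previous entry

// A Map keeps keys as-is, so both entries survive.
const asMap = new Map<number | string, string>();
asMap.set(1, "number id");
asMap.set("1", "string id");

// Maps are iterable out of the box, in insertion order.
const keys = Array.from(asMap.keys());
```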
Long story short: if there is a match in the buffer, we immediately remove it to free memory and compare it with the current object. Once we have a diff, we immediately send it to the user. If we can't make the comparison yet, we insert the object into the relevant buffer, where it waits to be compared. Afterward, we simply iterate over each buffer and process the remaining objects the same way.
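The matching logic can be sketched as a small synchronous class that the iteration loop feeds one item at a time. This is a reconstruction from the description above, not superdiff's source: BufferedMatcher and the Outcome type are hypothetical, a matched pair is simply handed to a callback (where the real diff would compare the two objects), and reference values are assumed to be unique within each list.

```typescript
type Side = "prev" | "next";
type ListItem = { [key: string]: unknown };
interface BufferEntry {
  item: ListItem;
  index: number;
}
// Hypothetical outcomes: a matched pair (to be compared) or a leftover object.
type Outcome =
  | { kind: "pair"; prev: BufferEntry; next: BufferEntry }
  | { kind: "deleted"; prev: BufferEntry }
  | { kind: "added"; next: BufferEntry };

class BufferedMatcher {
  // One buffer per list, keyed by the value of the reference property.
  private buffers: Record<Side, Map<unknown, BufferEntry>> = { prev: new Map(), next: new Map() };
  private referenceProperty: string;
  private emit: (outcome: Outcome) => void;

  constructor(referenceProperty: string, emit: (outcome: Outcome) => void) {
    this.referenceProperty = referenceProperty;
    this.emit = emit;
  }

  // Called for every item as it arrives from either list.
  push(side: Side, item: ListItem, index: number): void {
    const key = item[this.referenceProperty];
    const other = this.buffers[side === "prev" ? "next" : "prev"];
    const match = other.get(key);
    const entry: BufferEntry = { item, index };
    if (match === undefined) {
      // No counterpart yet: park the object until the other list reaches it.
      this.buffers[side].set(key, entry);
      return;
    }
    other.delete(key); // free the slot as soon as the pair is found
    this.emit(
      side === "prev"
        ? { kind: "pair", prev: entry, next: match }
        : { kind: "pair", prev: match, next: entry },
    );
  }

  // Once both lists are exhausted, leftovers have no counterpart at all.
  flush(): void {
    this.buffers.prev.forEach((prev) => this.emit({ kind: "deleted", prev }));
    this.buffers.next.forEach((next) => this.emit({ kind: "added", next }));
    this.buffers.prev.clear();
    this.buffers.next.clear();
  }
}
```

Each object is inserted, looked up, and deleted at most once, which keeps the work linear; when both lists arrive in a similar order, matches are found quickly and the buffers tend to stay small.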
Now, you might wonder why we don't simply iterate over the first list and find each object's match in the second list. Well, if you have a million objects, a find() lookup would be highly inefficient: it scans the second list again for every object, which leads to a massive number of iterations. The data buffer approach instead retrieves a match in (average) constant time thanks to the has() method.
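To put numbers on it, the sketch below counts the work each strategy does when matching every id of one list against another. Both functions are hypothetical and purely illustrative; the Map version also pays one insertion per item to build its index, which is not counted here.

```typescript
// Linear search: for every id, scan nextIds until the match is found.
function countFindComparisons(prevIds: number[], nextIds: number[]): number {
  let comparisons = 0;
  for (const id of prevIds) {
    nextIds.find((candidate) => {
      comparisons++; // one comparison per element visited
      return candidate === id;
    });
  }
  return comparisons;
}

// Hash lookup: index nextIds once, then one has() call per id.
function countMapLookups(prevIds: number[], nextIds: number[]): number {
  const index = new Map<number, number>();
  nextIds.forEach((id, position) => index.set(id, position));
  let lookups = 0;
  for (const id of prevIds) {
    lookups++;
    index.has(id);
  }
  return lookups;
}
```

For two lists holding the same 1,000 ids in the same order, countFindComparisons returns 500,500 (1 + 2 + ... + 1,000) while countMapLookups returns 1,000. At a million objects, the find() approach grows to roughly 5 × 10^11 comparisons.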
Earlier, I said we send each diff object to the user immediately. That's only partly true. Imagine the user receiving a million objects in a row: that could overload the main thread. What we do instead is store the object diffs in another buffer. Once this buffer reaches a certain size, say a thousand objects, we flush it and send a single batch of data to the user.
To do this, we use a closure. Let's look at the outputDiffChunk function. It holds an array that stores the diffs and returns two functions: handleDiffChunk and releaseLastChunks.
handleDiffChunk receives a diff object and adds it to the buffer if it isn't full yet. If it is full, it sends the batch to the user. Because we use a single instance of handleDiffChunk, the context of outputDiffChunk is preserved, which is why we can access the diff buffer every time we process a new object diff.
Finally, releaseLastChunks is self-explanatory. Once all the diffs are processed, it flushes the diff buffer one last time and sends the remaining data to the user.
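The closure can be sketched as follows. The function names come from the article, but the body is reconstructed from the description above and may differ from superdiff's actual implementation; emitChunk and chunkSize are hypothetical parameters standing in for the library's event emission and options.

```typescript
function outputDiffChunk<T>(emitChunk: (chunk: T[]) => void, chunkSize = 1000) {
  // Private diff buffer, kept alive by the closure between calls.
  let buffer: T[] = [];

  // Add one diff; as soon as the buffer is full, send it as a single batch.
  function handleDiffChunk(diff: T): void {
    buffer.push(diff);
    if (buffer.length >= chunkSize) {
      emitChunk(buffer);
      buffer = []; // a fresh array, so the emitted chunk is never mutated later
    }
  }

  // Flush whatever is left once every diff has been processed.
  function releaseLastChunks(): void {
    if (buffer.length > 0) {
      emitChunk(buffer);
      buffer = [];
    }
  }

  return { handleDiffChunk, releaseLastChunks };
}
```

With a chunkSize of 1,000, a million diffs reach the user as 1,000 data events instead of a million. Replacing the buffer with a fresh array after each flush, rather than clearing it in place, keeps chunks that were already emitted intact.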
In the end, we emit a finish event, and that's all.
One more thing
The demo we saw earlier uses a virtual list to render huge numbers of elements in the DOM, and it also leverages requestAnimationFrame to avoid updating the DOM too often.
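Both ideas can be sketched in a few lines. visibleRange computes which rows of a fixed-height virtual list need DOM nodes, and createFrameBatcher accumulates incoming chunks and renders them at most once per frame. Both functions are hypothetical; the scheduler is injected so the sketch runs outside a browser, where you would pass something like (cb) => requestAnimationFrame(cb).

```typescript
// Rows of a fixed-height virtual list that should exist in the DOM,
// with a small overscan margin above and below the viewport.
function visibleRange(scrollTop: number, viewportHeight: number, rowHeight: number, total: number, overscan = 5) {
  const first = Math.max(0, Math.floor(scrollTop / rowHeight) - overscan);
  const last = Math.min(total - 1, Math.ceil((scrollTop + viewportHeight) / rowHeight) + overscan);
  return { first, last };
}

// Accumulate chunks and render them at most once per scheduled frame.
function createFrameBatcher<T>(render: (items: T[]) => void, schedule: (callback: () => void) => void) {
  let pending: T[] = [];
  let scheduled = false;
  return function enqueue(chunk: T[]): void {
    for (const item of chunk) pending.push(item);
    if (scheduled) return; // a frame is already booked; just accumulate
    scheduled = true;
    schedule(() => {
      scheduled = false;
      const items = pending;
      pending = [];
      render(items); // one DOM update for everything received since the last frame
    });
  };
}
```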
As a result, everything is really smooth. It looks like John Doe has been elected; congratulations to him!
Links
Repository, Documentation, npm