You can't diff large data streams in JavaScript. Or can you? 🤔

News Fetcher, 7 hours ago


I know, it sounds impossible. JavaScript is not a very performant language, mostly because of its single-threaded model. So if I tell you that we are going to diff 1.8 million votes from a fictional city election, directly in the browser, you will probably think I am crazy. The browser will freeze and crash, every time. Well, guess what? You would be right. Here is the result with a very popular diff library:

Demo of an application trying to diff 1.8 million election votes.

Clearly, this is not a great showing. But with the right techniques (efficient memory management, a streaming architecture, a web worker, and batched updates) you can do it. Here is the proof:

Demo of an application successfully diffing 1.8 million election votes.

As you can see, even with 1.8 million objects being processed in real time, the user interface stays fully responsive. More importantly, by injecting small batches of data over time, we have turned a “wait for it” delay into a “watch it happen” experience.

Alright, now let’s dive into how it was made.

Challenges

We have three challenges to solve.

Achieving the best performance.
Dealing with different input formats.
Being easy for developers to use.

Performance

The first thing is to avoid blocking the main thread at all costs. So we need asynchronous code, a worker to handle the heavy lifting, and linear complexity, O(n), which means that the execution time grows in proportion to the input size. In addition, efficient memory management is crucial.

Efficient memory management requires freeing memory as soon as some data has been processed, as well as sending data progressively in chunks to reduce user interface updates. For example, instead of sending a million object diffs in a row, we can send groups of a thousand. This would significantly reduce the number of DOM updates.

Input format

The second challenge is to handle different kinds of input formats. We need to be versatile: we must accept arrays of objects, JSON files, and even data streams. So before starting the diff, we need to convert everything into a single format: a readable stream.

Developer experience

Finally, we need to provide a great developer experience. The function should be easy to use and easy to customize. Basically, the user will be able to pass two lists to our function. We will compare them and progressively send back the result.

The simplest way to do this is to expose an event listener with three types of events: ondata, onerror, and onfinish. Easy.

Our streamListDiff function and its event listener

The ondata event will receive an array of object diffs, called a chunk. Each diff should be clear, with a previous value, a current value, an index tracker, and a status: equal, updated, moved, or deleted.
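To make the shape of a chunk more concrete, here is a rough TypeScript sketch of what one diff entry could look like. The type names (`DiffEntry`, `DiffChunk`, `DiffStatus`) and the exact field names are assumptions made for illustration, not the library's actual definitions, and the sample vote is invented.

```typescript
// Hypothetical shape of one diff entry. Field names are illustrative only.
type DiffStatus = "equal" | "updated" | "moved" | "deleted";

interface DiffEntry<T> {
  previousValue: T | null; // the object as found in the first list, if any
  currentValue: T | null; // the object as found in the second list, if any
  prevIndex: number | null; // position in the first list
  newIndex: number | null; // position in the second list
  status: DiffStatus;
}

// A chunk is simply an array of diff entries delivered together.
type DiffChunk<T> = DiffEntry<T>[];

interface Vote {
  id: number;
  candidate: string;
}

// Invented sample data: one vote whose candidate changed between the lists.
const sampleChunk: DiffChunk<Vote> = [
  {
    previousValue: { id: 1, candidate: "John Doe" },
    currentValue: { id: 1, candidate: "Jane Doe" },
    prevIndex: 0,
    newIndex: 0,
    status: "updated",
  },
];

// A consumer can filter on `status` to react only to the entries it cares about.
const updatedIds = sampleChunk
  .filter((entry) => entry.status === "updated")
  .map((entry) => entry.currentValue?.id);
```

Typing the status as a union of string literals is what gives the autocompletion: the editor can list every possible value.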

Because I know most of you love TypeScript, we will also get autocompletion. Icing on the cake, users will be able to set options to refine the output. Let’s see how the algorithm works.

Example usage with options and typing

Algorithm walkthrough

For the sake of simplicity, the code we will walk through comes from the streamListDiff function of the @donedeal0/superdiff library.

Let’s take a look at the main function. It takes two lists to compare, a key shared by all objects (id, for example) to match the objects between the two lists, and some options to refine the output.

In the function body, we can see that it returns an event listener before starting the diff. The trick is to trigger the logic block asynchronously. Basically, the event loop will execute all the synchronous code first, and only then begin the real work.

streamListDiff: our main function
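The "return the listener first, work later" trick can be sketched in a few lines. This is a simplified illustration, not the library's code: `DiffEmitter`, `startDeferred`, and the `on()` registration style are hypothetical names, and using `setTimeout` to defer the work is an assumption (other scheduling mechanisms would work too).

```typescript
type EventName = "data" | "error" | "finish";
type Listener = (payload?: unknown) => void;

// Minimal emitter covering the three events described above.
class DiffEmitter {
  private listeners: Record<EventName, Listener[]> = { data: [], error: [], finish: [] };

  on(event: EventName, listener: Listener): this {
    this.listeners[event].push(listener);
    return this;
  }

  emit(event: EventName, payload?: unknown): void {
    for (const listener of this.listeners[event]) listener(payload);
  }
}

// Hypothetical entry point: hands back the emitter first, runs the work later.
function startDeferred(work: (emitter: DiffEmitter) => void): DiffEmitter {
  const emitter = new DiffEmitter();
  // setTimeout(..., 0) queues the work as a task, so it only runs once the
  // caller's synchronous code (including its .on() calls) has finished.
  setTimeout(() => {
    try {
      work(emitter);
      emitter.emit("finish");
    } catch (err) {
      emitter.emit("error", err);
    }
  }, 0);
  return emitter;
}

const received: string[] = [];
const emitter = startDeferred((e) => e.emit("data", ["first chunk"]));
// These listeners are attached after the call, yet they miss nothing.
emitter
  .on("data", () => received.push("data"))
  .on("finish", () => received.push("finish"));
```

Without the deferral, the work would run inside `startDeferred` itself and the events would fire before the caller had a chance to subscribe.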

Next, we convert our lists into readable streams, using a different method for each input type (array, file, etc.).

Converting file or array inputs into readable streams
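The idea of normalizing every input can be illustrated as follows. Note the substitution: this sketch uses async iterables as a stand-in for readable streams, and it only covers arrays and existing async sources, not files. `toAsyncStream`, `collect`, and `remoteVotes` are hypothetical helpers introduced for the example.

```typescript
// Wraps either an in-memory array or an existing async source
// behind one common interface: an async generator.
async function* toAsyncStream<T>(input: T[] | AsyncIterable<T>): AsyncGenerator<T> {
  if (Array.isArray(input)) {
    for (const item of input) yield item;
  } else {
    for await (const item of input) yield item;
  }
}

// Helper for the demo: drains a stream into an array.
async function collect<T>(stream: AsyncIterable<T>): Promise<T[]> {
  const items: T[] = [];
  for await (const item of stream) items.push(item);
  return items;
}

// Stand-in for data that arrives asynchronously (network, file reader, etc.).
async function* remoteVotes(): AsyncGenerator<{ id: number }> {
  yield { id: 3 };
  yield { id: 4 };
}

// Both inputs can now be consumed in exactly the same way.
const fromArray = collect(toAsyncStream([{ id: 1 }, { id: 2 }]));
const fromStream = collect(toAsyncStream(remoteVotes()));
```

Once everything looks the same to the consumer, the diff logic no longer needs to know where the data came from.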

Once we have two valid streams, we iterate over each of them in parallel thanks to Promise.all(). At each iteration, we do two things. First, we check whether the object is valid; if not, we emit an error. Second, we check whether an object with the same reference property is already present in the data buffer of the other list.

Iterating over the streams

What is a data buffer? There are two buffers, one for each list. The idea is to store unmatched objects in a hashmap so that the other list, which is being parsed at the same moment, can check in real time whether there is a match for its current object. This avoids doing two full iterations, with no results and high memory consumption, before starting the real diff. We waste no time, and we stay efficient.

We use a hashmap to store unmatched objects because it supports any kind of value as keys, is very performant, and provides an iterator out of the box.

Simultaneously inserting and retrieving matching entries

Long story short, if there is a match in the buffer, we immediately remove it to free memory and compare it to the current object. Once we have the diff object, we immediately send it to the user. If we cannot do the comparison, we insert the object into the relevant buffer, where it waits to be compared. Afterwards, we simply iterate over each buffer and process the remaining objects in the same way.

Sending the diffs to the user
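Here is a simplified, synchronous sketch of the two-buffer idea. It is not the library's implementation: `diffWithBuffers` is a hypothetical function, the two lists are plain arrays read in alternation to mimic two streams parsed in parallel, the comparison only looks at one field, and the `added` status for leftovers of the second list is my own assumption.

```typescript
interface Vote {
  id: number;
  candidate: string;
}

type MatchStatus = "equal" | "updated" | "added" | "deleted";

interface VoteDiff {
  previousValue: Vote | null;
  currentValue: Vote | null;
  status: MatchStatus;
}

function diffWithBuffers(prevList: Vote[], nextList: Vote[]): VoteDiff[] {
  const prevBuffer = new Map<number, Vote>(); // unmatched objects from prevList
  const nextBuffer = new Map<number, Vote>(); // unmatched objects from nextList
  const output: VoteDiff[] = [];

  const compare = (prev: Vote, next: Vote): VoteDiff => ({
    previousValue: prev,
    currentValue: next,
    status: prev.candidate === next.candidate ? "equal" : "updated",
  });

  // Alternate between the lists to mimic two streams being read in parallel.
  const length = Math.max(prevList.length, nextList.length);
  for (let i = 0; i < length; i++) {
    const prev = prevList[i];
    if (prev) {
      const match = nextBuffer.get(prev.id);
      if (match) {
        nextBuffer.delete(prev.id); // free memory as soon as the pair is resolved
        output.push(compare(prev, match));
      } else {
        prevBuffer.set(prev.id, prev); // wait for the other list to catch up
      }
    }
    const next = nextList[i];
    if (next) {
      const match = prevBuffer.get(next.id);
      if (match) {
        prevBuffer.delete(next.id);
        output.push(compare(match, next));
      } else {
        nextBuffer.set(next.id, next);
      }
    }
  }

  // Whatever is left in the buffers never found a partner.
  prevBuffer.forEach((prev) => output.push({ previousValue: prev, currentValue: null, status: "deleted" }));
  nextBuffer.forEach((next) => output.push({ previousValue: null, currentValue: next, status: "added" }));
  return output;
}

const bufferedResult = diffWithBuffers(
  [{ id: 1, candidate: "A" }, { id: 2, candidate: "B" }, { id: 3, candidate: "C" }],
  [{ id: 2, candidate: "B" }, { id: 1, candidate: "X" }, { id: 4, candidate: "D" }],
);
```

Each object is visited once, and a buffer only ever holds the objects still waiting for their counterpart, which is what keeps memory usage down.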

Now, you may wonder why we do not iterate once over the first list and look for each match in the second list. Well, if you have a million objects, a find() lookup will be very inefficient, because it leads to a huge number of iterations. But the data buffer approach lets us retrieve a match at virtually no performance cost thanks to the has() method.

Map() is more efficient given our constraints
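To get a feel for the difference, the small experiment below counts the work done by each approach on two lists of a thousand objects. The list contents and counter names are made up for the demonstration, and the second list is in the same order as the first, so other orderings would give a different count for `find()`.

```typescript
interface Ballot {
  id: number;
}

const listSize = 1000;
const firstList: Ballot[] = Array.from({ length: listSize }, (_, id) => ({ id }));
const secondList: Ballot[] = Array.from({ length: listSize }, (_, id) => ({ id }));

// Approach 1: for every object, scan the other list with find().
let findComparisons = 0;
for (const item of firstList) {
  secondList.find((candidate) => {
    findComparisons++; // one comparison per visited element
    return candidate.id === item.id;
  });
}

// Approach 2: index the other list once, then use keyed lookups.
const secondById = new Map<number, Ballot>();
for (const item of secondList) secondById.set(item.id, item);

let mapLookups = 0;
let matches = 0;
for (const item of firstList) {
  mapLookups++; // a single has() call per object
  if (secondById.has(item.id)) matches++;
}

console.log({ findComparisons, mapLookups, matches });
```

With `find()`, the number of comparisons grows roughly with the square of the list size, while the `Map` needs one lookup per object. At a million objects, that gap is the difference between a frozen tab and a usable one.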

Earlier, I said we send every object diff to the user immediately. This is only partly true. Imagine the user receiving a million objects in a row: it could overload the main thread. What we do instead is store the object diffs in another buffer. Once this buffer reaches a certain size, say a thousand objects, we flush it and send a single batch of data to the user.

To do this, we use a closure. Let’s look at the outputDiffChunk function. It contains an array to store the diffs and returns two functions: handleDiffChunk and releaseLastChunks. handleDiffChunk receives an object diff and adds it to the buffer if it is not yet full. If it is full, it sends the batch to the user. Because we use a single instance of handleDiffChunk, the context of outputDiffChunk is preserved, which is why we can access the diff buffer every time we process a new object diff.

Instead of sending the data one item at a time, we send it in batches.

Finally, releaseLastChunks is self-explanatory. Once all the diffs have been processed, it flushes the diff buffer one last time and sends the remaining data to the user.
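The closure-based batching can be sketched like this. The function names come from the article, but the signature (a chunk size and a `send` callback) is an assumption made for the example. This version also flushes as soon as the buffer reaches the chunk size, which may differ slightly from the real code.

```typescript
function outputDiffChunk<T>(chunkSize: number, send: (chunk: T[]) => void) {
  let buffer: T[] = []; // only reachable through the two functions below

  function handleDiffChunk(diff: T): void {
    buffer.push(diff);
    if (buffer.length >= chunkSize) {
      send(buffer); // one batch instead of chunkSize separate messages
      buffer = []; // start a fresh batch
    }
  }

  function releaseLastChunks(): void {
    if (buffer.length > 0) {
      send(buffer); // flush whatever is left at the very end
      buffer = [];
    }
  }

  return { handleDiffChunk, releaseLastChunks };
}

// Demo: 2,500 diffs with a chunk size of 1,000.
const sentSizes: number[] = [];
const { handleDiffChunk, releaseLastChunks } = outputDiffChunk<number>(1000, (chunk) => {
  sentSizes.push(chunk.length);
});

for (let i = 0; i < 2500; i++) handleDiffChunk(i);
releaseLastChunks();

console.log(sentSizes); // [1000, 1000, 500]
```

The user receives three messages instead of 2,500, and the last partial batch is not lost.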

In the end, we emit a finish event, and that’s all.

One more thing

The demo we saw earlier uses a virtualized list to render huge numbers of items in the DOM, and also takes advantage of requestAnimationFrame to avoid updating it too often.
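The demo's code is not shown here, so what follows is only a guess at the general pattern: coalescing several incoming chunks into at most one render per frame. `createFrameBatcher` is a hypothetical helper. The scheduler is passed in as a parameter so the sketch also runs outside a browser, where `requestAnimationFrame` does not exist.

```typescript
type Schedule = (callback: () => void) => void;

function createFrameBatcher<T>(render: (items: T[]) => void, schedule: Schedule) {
  let pending: T[] = [];
  let scheduled = false;

  return function enqueue(items: T[]): void {
    pending = pending.concat(items);
    if (scheduled) return; // a flush is already planned for the next frame
    scheduled = true;
    schedule(() => {
      scheduled = false;
      const batch = pending;
      pending = [];
      render(batch); // a single UI update for everything received meanwhile
    });
  };
}

// In a browser, the scheduler would be: (cb) => requestAnimationFrame(cb).
// Here a manual queue stands in for the frame loop.
const queuedFrames: Array<() => void> = [];
const renderedSizes: number[] = [];

const enqueue = createFrameBatcher<number>(
  (items) => {
    renderedSizes.push(items.length);
  },
  (cb) => {
    queuedFrames.push(cb);
  },
);

enqueue([1, 2]);
enqueue([3]); // arrives before the frame fires, so it joins the same render

const nextFrame = queuedFrames.shift();
if (nextFrame) nextFrame(); // simulate the browser painting the next frame
```

Two chunks came in, but the list was only updated once.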

As a result, everything is really smooth. It looks like John Doe has been elected, congratulations to him!

Links

Repository | Documentation | npm
