Защо да научаваме структури от данни и алгоритми?

В тази статия ще научим защо всеки програмист трябва да научи структури от данни и алгоритми с помощта на примери.

Тази статия е за онези, които току-що са започнали да изучават алгоритми и са се чудили колко въздействащо ще бъде повишаването на техните кариерни / програмни умения. Също така е за тези, които се чудят защо големи компании като Google, Facebook и Amazon наемат програмисти, които са изключително добри в оптимизирането на алгоритмите.

Какво представляват алгоритмите?

Неформално алгоритъмът не е нищо друго освен споменаване на стъпки за решаване на проблем. Те по същество са решение.

Например алгоритъм за решаване на проблема с факториали може да изглежда по следния начин:

Проблем: Намерете факториала на n

 Инициализиране на факт = 1 За всяка стойност v в диапазон 1 до n: Умножете факта по v факт съдържа факториал на n

Тук алгоритъмът е написан на английски. Ако беше написано на език за програмиране, вместо това бихме го извикали да кодира . Ето код за намиране на факториал на число в C ++.

 int factorial(int n) ( int fact = 1; for (int v = 1; v <= n; v++) ( fact = fact * v; ) return fact; )

Програмирането е свързано със структури от данни и алгоритми. Структурите на данни се използват за задържане на данни, докато алгоритмите се използват за решаване на проблема, като се използват тези данни.

Структурите на данни и алгоритмите (DSA) преминават през решения на стандартни проблеми в детайли и ви дават представа колко ефективно е да използвате всеки един от тях. Освен това ви научава на науката за оценка на ефективността на даден алгоритъм. Това ви позволява да изберете най-добрия от различните избори.

Използване на структури от данни и алгоритми, за да направите вашия код мащабируем

Времето е ценно.

Да предположим, че Алис и Боб се опитват да решат прост проблем за намиране на сумата от първите 10 ¹¹ естествени числа. Докато Боб пише алгоритъма, Алис го прилага, доказвайки, че е толкова просто, колкото да критикува Доналд Тръмп.

Алгоритъм (от Боб)

 Инициализирайте сума = 0 за всяко естествено число n в диапазон от 1 до 1011 (включително): добавете n към сума сума е вашият отговор

Код (от Алис)

 int findSum() ( int sum = 0; for (int v = 1; v <= 100000000000; v++) ( sum += v; ) return sum; )

Алис и Боб изпитват еуфория от себе си, че биха могли да построят нещо свое почти за нула време. Нека се промъкнем в работното им пространство и да слушаме техния разговор.

 Алис: Нека пуснем този код и да разберем сумата. Боб: Пуснах този код преди няколко минути, но той все още не показва изхода. Какво не е наред с него?

Ами сега, нещо се обърка! Компютърът е най-детерминираната машина. Връщането назад и опитът да го стартирате отново няма да помогне. Така че нека анализираме какво не е наред с този прост код.

Два от най-ценните ресурси за компютърна програма са времето и паметта .

Времето, необходимо на компютъра за изпълнение на кода, е:

 Време за изпълнение на код = брой инструкции * време за изпълнение на всяка инструкция

Броят на инструкциите зависи от кода, който сте използвали, а времето, необходимо за изпълнението на всеки код, зависи от вашата машина и компилатор.

В този случай общият брой изпълнени инструкции (да речем x) е , което еx = 1 + (10¹¹ + 1) + (10¹¹) + 1x = 2 * 10¹¹ + 3

Нека приемем, че компютърът може да изпълнява инструкции за една секунда (може да варира в зависимост от конфигурацията на машината). Времето, необходимо за стартиране над кода еy = 10⁸

 Време, необходимо за стартиране на код = x / y (повече от 16 минути)

Възможно ли е да оптимизираме алгоритъма, така че Алис и Боб да не трябва да чакат 16 минути всеки път, когато изпълняват този код?

Сигурен съм, че вече познахте правилния метод. Сумата от първите N естествени числа се дава по формулата:

 Сума = N * (N + 1) / 2

Преобразуването му в код ще изглежда по следния начин:

 int сума (int N) (връщане N * (N + 1) / 2;)

This code executes in just one instruction and gets the task done no matter what the value is. Let it be greater than the total number of atoms in the universe. It will find the result in no time.

The time taken to solve the problem, in this case, is 1/y (which is 10 nanoseconds). By the way, the fusion reaction of a hydrogen bomb takes 40-50 ns, which means your program will complete successfully even if someone throws a hydrogen bomb on your computer at the same time you ran your code. :)

Note: Computers take a few instructions (not 1) to compute multiplication and division. I have said 1 just for the sake of simplicity.

More on Scalability

Scalability is scale plus ability, which means the quality of an algorithm/system to handle the problem of larger size.

Consider the problem of setting up a classroom of 50 students. One of the simplest solutions is to book a room, get a blackboard, a few chalks, and the problem is solved.

But what if the size of the problem increases? What if the number of students increased to 200?

The solution still holds but it needs more resources. In this case, you will probably need a much larger room (probably a theater), a projector screen and a digital pen.

What if the number of students increased to 1000?

The solution fails or uses a lot of resources when the size of the problem increases. This means, your solution wasn't scalable.

What is a scalable solution then?

Consider a site like Khanacademy, millions of students can see videos, read answers at the same time and no more resources are required. So, the solution can solve the problems of larger size under resource crunch.

If you see our first solution to find the sum of first N natural numbers, it wasn't scalable. It's because it required linear growth in time with the linear growth in the size of the problem. Such algorithms are also known as linearly scalable algorithms.

Our second solution was very scalable and didn't require the use of any more time to solve a problem of larger size. These are known as constant-time algorithms.

Memory is expensive

Memory is not always available in abundance. While dealing with code/system which requires you to store or produce a lot of data, it is critical for your algorithm to save the usage of memory wherever possible. For example: While storing data about people, you can save memory by storing only their age not the date of birth. You can always calculate it on the fly using their age and current date.

Examples of an Algorithm's Efficiency

Here are some examples of what learning algorithms and data structures enable you to do:

Example 1: Age Group Problem

Problems like finding the people of a certain age group can easily be solved with a little modified version of the binary search algorithm (assuming that the data is sorted).

The naive algorithm which goes through all the persons one by one, and checks if it falls in the given age group is linearly scalable. Whereas, binary search claims itself to be a logarithmically scalable algorithm. This means that if the size of the problem is squared, the time taken to solve it is only doubled.

Suppose, it takes 1 second to find all the people at a certain age for a group of 1000. Then for a group of 1 million people,

the binary search algorithm will take only 2 seconds to solve the problem
the naive algorithm might take 1 million seconds, which is around 12 days

The same binary search algorithm is used to find the square root of a number.

Example 2: Rubik's Cube Problem

Imagine you are writing a program to find the solution of a Rubik's cube.

This cute looking puzzle has annoyingly 43,252,003,274,489,856,000 positions, and these are just positions! Imagine the number of paths one can take to reach the wrong positions.

Fortunately, the way to solve this problem can be represented by the graph data structure. There is a graph algorithm known as Dijkstra's algorithm which allows you to solve this problem in linear time. Yes, you heard it right. It means that it allows you to reach the solved position in a minimum number of states.

Example 3: DNA Problem

DNA is a molecule that carries genetic information. They are made up of smaller units which are represented by Roman characters A, C, T, and G.

Imagine yourself working in the field of bioinformatics. You are assigned the work of finding out the occurrence of a particular pattern in a DNA strand.

It is a famous problem in computer science academia. And, the simplest algorithm takes the time proportional to

 (number of character in DNA strand) * (number of characters in pattern)

A typical DNA strand has millions of such units. Eh! worry not. KMP algorithm can get this done in time which is proportional to

 (number of character in DNA strand) + (number of characters in pattern)

The * operator replaced by + makes a lot of change.

Considering that the pattern was of 100 characters, your algorithm is now 100 times faster. If your pattern was of 1000 characters, the KMP algorithm would be almost 1000 times faster. That is, if you were able to find the occurrence of pattern in 1 second, it will now take you just 1 ms. We can also put this in another way. Instead of matching 1 strand, you can match 1000 strands of similar length at the same time.

And there are infinite such stories…

Final Words

Generally, software development involves learning new technologies on a daily basis. You get to learn most of these technologies while using them in one of your projects. However, it is not the case with algorithms.

Ако не познавате добре алгоритмите, няма да можете да определите дали можете да оптимизирате кода, който пишете в момента. От вас се очаква да ги знаете предварително и да ги прилагате, когато е възможно и критично.

Специално говорихме за мащабируемостта на алгоритмите. Софтуерната система се състои от много такива алгоритми. Оптимизирането на някой от тях води до по-добра система.

Важно е обаче да се отбележи, че това не е единственият начин да направите система мащабируема. Например техника, известна като разпределено изчисление, позволява на независими части от програма да се изпълняват на множество машини заедно, което я прави още по-мащабируема.