big data

Python as the Primary Language for a Spark Application

Recently, I was working on a script to measure and compare the performance of our dev and prod environments. The script has two phases, data generation and data processing (aggregation), which together simulate a high cluster load. I chose Python (PySpark) for the Spark application in order to discover the advantages and disadvantages of using Python with Spark.

The experiment revealed the following advantages and disadvantages:

Python is a common language among data scientists, which makes it easy to get started in a well-coordinated team. Moreover, not having to spend time on compiling makes experimenting with data more efficient and productive. A huge advantage of Python is the existence of mature data analysis libraries, which facilitate a quick start.

Problems appear when you start working on sophisticated data transformations and analysis that rely on third-party libraries. Before Spark distributes your Python code, you need to be sure that the third-party dependencies are available on all of the Spark nodes. Spark can ship eggs, but in that case you need to be sure that the compiled Python code will run in the environments of the other nodes.
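To illustrate the dependency problem, here is a minimal sketch of one way to ship a pure-Python package together with the application: zip it and pass the archive to spark-submit through its --py-files option. The helper names (bundle_package, spark_submit_command) and the package name used further down are made up for the example, and the approach assumes the dependency contains no compiled extensions; those still have to be installed on every node.

```python
import os
import zipfile


def bundle_package(package_dir, zip_path):
    """Zip a pure-Python package so the package folder sits at the archive root."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".py"):
                    full_path = os.path.abspath(os.path.join(root, name))
                    # Store paths relative to the package's parent folder,
                    # e.g. "mylib/util.py", so "import mylib" works on the nodes.
                    archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


def spark_submit_command(app_script, zip_path):
    """Build the command line that ships the archive along with the application."""
    return ["spark-submit", "--py-files", zip_path, app_script]
```

For a hypothetical package folder mylib and a script job.py, bundle_package("mylib", "deps.zip") followed by spark_submit_command("job.py", "deps.zip") gives the argument list to hand to subprocess or to paste into a shell.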

My other concern with PySpark was debugging: it is not even comparable to what you get with Scala/Java.

linux

MidnightCommander doesn’t support ssh connection to non-default port

I think that Midnight Commander (mc) is the simplest and most robust file manager for the terminal, but this simplicity comes at a price: some features that Linux terminal users often need have been cut.
I guess everyone has faced the need to connect via ssh over a port other than the default one (22). The problem is that mc doesn’t have an option for specifying the port, so we have to go another way.

One solution, which works perfectly for me, is:

  • go to ~/.ssh/
  • open the “config” file, or create it if it does not exist
  • add

    host somename
    user username
    hostname somename.com (or IP)
    port 2222

  • go to mc and make a connection using the host alias (e.g. somename)

linux

User switcher shows [Invalid UTF-8] after upgrading to Ubuntu 10.11

The problem is that the login configuration sets a minimum value for UID and GID. The default minimum is 1000, so for a user whose IDs are below it the label “Invalid UTF-8” appears.

Solution:

The minimal value for UID and GID has to be changed.

If you do not know your ID, execute:

cat /etc/passwd |grep [USERNAME]
[USERNAME]:x:500:500:XXX,,,:/home/[USERNAME]:/bin/bash

  • The third field is the user identifier, the number that the operating system uses for internal purposes. It does not have to be unique.[*]
  • The fourth field is the group identifier. This number identifies the primary group of the user; all files that are created by this user may initially be accessible to this group.[*]

Your current UID and GID have to be set as UID_MIN and GID_MIN in /etc/login.defs:

sudo gedit  /etc/login.defs
# Min/max values for automatic uid selection in useradd
UID_MIN             500

# Min/max values for automatic gid selection in groupadd
GID_MIN             500

Everything should work after re-login or reboot.
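The check described above can also be scripted. The following is only a rough sketch: it reads UID_MIN and GID_MIN from login.defs-style text and reports whether a user’s IDs fall below them. The function names are my own, and the parser assumes the simple “NAME value” layout shown above.

```python
import pwd
import re


def read_login_defs(text):
    """Return the UID_MIN and GID_MIN values found in login.defs-style text."""
    limits = {}
    for line in text.splitlines():
        # Commented-out lines and names such as SYS_UID_MIN do not match.
        match = re.match(r"^\s*(UID_MIN|GID_MIN)\s+(\d+)", line)
        if match:
            limits[match.group(1)] = int(match.group(2))
    return limits


def ids_below_minimum(uid, gid, limits):
    """True if the UID or the GID is lower than the configured minimum."""
    return uid < limits.get("UID_MIN", 0) or gid < limits.get("GID_MIN", 0)


def check_user(username, login_defs_path="/etc/login.defs"):
    """Look up the user's IDs and compare them with the limits in login.defs."""
    entry = pwd.getpwnam(username)
    with open(login_defs_path) as handle:
        limits = read_login_defs(handle.read())
    return entry.pw_uid, entry.pw_gid, ids_below_minimum(entry.pw_uid, entry.pw_gid, limits)
```

If check_user returns True as its last element, the user is in the situation described in this post and the minimum values need to be lowered.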

Uncategorized

DB2 SQL Error: SQLCODE=-668, SQLSTATE=57016

After some manipulation of a table I ran into the exception DB2 SQL Error: SQLCODE=-668, SQLSTATE=57016. Its description does not properly explain the cause of the problem or the solution. It appeared after changing the table structure; in my case, a new column with a constraint was added.

Solution:
Execute the following command. The parameter has to be the name of the table that was changed.
REORG TABLE <NAME OF TABLE>

linux

Install Google Chrome 4 for OpenSuse 11.1

This nice browser works fine for me too. I would like to note that installing it is easy.
First, download the latest build from http://build.chromium.org/buildbot/snapshots/chromium-rel-linux/ and unpack it into a suitable folder.
After that, make some libraries available to the application (this has to be done with root permissions):

ln -s /usr/lib/libnss3.so /usr/lib/libnss3.so.1d
ln -s /usr/lib/libnssutil3.so /usr/lib/libnssutil3.so.1d
ln -s /usr/lib/libsmime3.so /usr/lib/libsmime3.so.1d
ln -s /usr/lib/libssl3.so /usr/lib/libssl3.so.1d
ln -s /usr/lib/libplds4.so /usr/lib/libplds4.so.0d
ln -s /usr/lib/libplc4.so /usr/lib/libplc4.so.0d
ln -s /usr/lib/libnspr4.so /usr/lib/libnspr4.so.0d
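If you prefer not to type the links one by one, the same step can be sketched as a small script. This is just an illustration of the commands above: the create_links helper is my own name, it skips libraries that are not installed and links that already exist, and it still needs root permissions when pointed at /usr/lib.

```python
import os

# Installed library -> link name the Chromium build looks for (same pairs as above).
LINKS = {
    "libnss3.so": "libnss3.so.1d",
    "libnssutil3.so": "libnssutil3.so.1d",
    "libsmime3.so": "libsmime3.so.1d",
    "libssl3.so": "libssl3.so.1d",
    "libplds4.so": "libplds4.so.0d",
    "libplc4.so": "libplc4.so.0d",
    "libnspr4.so": "libnspr4.so.0d",
}


def create_links(lib_dir="/usr/lib", links=LINKS):
    """Create the missing symbolic links and return the paths that were created."""
    created = []
    for target, link_name in links.items():
        target_path = os.path.join(lib_dir, target)
        link_path = os.path.join(lib_dir, link_name)
        if not os.path.exists(target_path):
            continue  # the library is not installed in this folder
        if os.path.lexists(link_path):
            continue  # the link (or a file with that name) is already there
        os.symlink(target_path, link_path)
        created.append(link_path)
    return created
```

Running create_links() a second time creates nothing, so the script is safe to repeat.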

The application will become runnable, but Flash won’t work. To fix that, create a “plugins” folder and, inside it, make a symbolic link to the proper Flash library.

ln -s /usr/lib/browser-plugins/libflashplayer.so

and run the application: ./chrome --enable-plugins

programming

Interesting features of using Spring Injection in Grails

After some frustration with integration testing, I carried on investigating Spring injection in Grails.

Grails has a predefined place for storing Spring configuration: /grails-app/spring. Grails offers two formats for defining beans: the standard (well-known) XML format and Groovy syntax. The names of the Spring configuration files are predefined as well: resources.groovy and/or resources.xml. The “and/or” is not a mistake, as you can use both formats at the same time, which I think is useful. Moreover, those files share bean configurations. In other words, you can reference a bean defined in resources.groovy from resources.xml, but not vice versa.

It looks like:

resources.groovy

beans = {

testString1(String, "test string")

}

resources.xml

<bean name="testString3" class="java.lang.String">
<constructor-arg ref="testString1"/>
</bean>

programming

The frustrating experience of integration testing a Grails application

After digging into the testing capabilities of Grails, I found that they are at a really crude stage. Frankly, I expected a little more… I think that if you don’t have a decent tool for testing an application, you can’t use the framework for serious tasks.
Complaints aside, I will try to explain the annoying things I bumped into.

It was strange that my favorite IDE (IntelliJ IDEA) doesn’t have proper functionality for running integration tests for a Grails application. I understand that this is not a problem of Grails itself; the guys from IDEA probably just don’t see any reason to implement it. But it is more of a disadvantage for Grails than for IDEA.
After googling this problem I found a workaround. The guys from Grails advise running the tests like this:

mvn grails:exec -Dcommand=test-app -Dargs="-integration"

Hence, a Maven run configuration should be created in IDEA with the goal

grails:exec -Dcommand=test-app -Dargs="-integration"

but unfortunately it doesn’t work perfectly for me, as it runs both unit and integration tests. 😦 Let’s go further. Running two really simple tests takes about 31 seconds. Come on… integration testing is unbelievably slow. 😦 Moreover, I can run only one test from a test case, which is awful.

So what can I say about integration testing as a result? It seems to have been developed as a facility to be used in a (pre-)production environment for checking the application before deployment. But it is really bad for test-driven development.

linux

StarDict works for me again

Yesterday, I decided to improve my communication setup, as my Skype didn’t allow me to make voice calls. When I called somebody, Skype responded with the message “Problem with Audio Playback”. It turned out that Skype doesn’t work correctly with PulseAudio. The solution was found in the Ubuntu community documentation (https://help.ubuntu.com/community/Skype). Using YaST (my favorite OS is openSUSE), I removed everything that depended on PulseAudio and installed esound.

The next challenge came when I started the StarDict application (version 3.0.1): it didn’t start. It froze at the startup stage and, of course, there was nothing in the log 😦 . Suspicion fell on a conflict with esound. As a first step, I decided to try a trick on my StarDict installation.

su

cd /usr/lib64/stardict/plugins

mv stardict_espeak.so stardict_espeak.so.bak

and start stardict again. 🙂 HE IS ALIVE 🙂